Adding a counter to the proc interface
Introduction
The proc interface of the Linux kernel is a pseudo-filesystem that provides a window into the kernel’s runtime state. Unlike regular files, the data in /proc is generated on the fly by the kernel and reflects real-time system information.
The proc interface exposes information about all major kernel subsystems:
- Memory management (/proc/meminfo, /proc/vmstat)
- Process information (/proc/[pid]/)
- System interrupts (/proc/interrupts)
- CPU information (/proc/cpuinfo)
- Kernel modules (/proc/modules)
This tutorial demonstrates how to add custom performance counters to track kernel behavior, which is essential for:
- Performance analysis - Understanding system bottlenecks
- Debugging - Tracking code path execution
- Research - Validating hypotheses about kernel behavior
Why Add Custom Counters?
When the built-in counters don’t provide the metrics you need, custom counters let you track specific kernel events. Benefits include:
- Low overhead - Counter increments are cheap per-CPU additions with minimal performance impact
- Real-time visibility - Read current counter values from /proc at any time while the system is running
- Flexible placement - Add counters anywhere in kernel code
Performance Considerations:
- Counters in hot paths (e.g., pte_alloc) can impact performance if the function is called millions of times per second
- Use judiciously in critical sections
- Consider using tracepoints for more detailed analysis if needed
Counter Scopes
Counters can track data at different granularities:
Global-level counters (system-wide):
- Located in /proc/vmstat, /proc/meminfo, etc.
- Aggregate metrics across all processes
- Example: Total page faults, swap operations
Process-level counters (per-process):
- Located in /proc/[pid]/status, /proc/[pid]/stat
- Track individual process behavior
- Example: Memory usage, CPU time for specific processes
How the Kernel Differentiates Counter Scopes
The kernel uses different data structures and APIs for global vs. process-level counters:
Global counters:
- Stored in per-CPU arrays (vm_event_states)
- Incremented using count_vm_events() or count_vm_event()
- Aggregated across all CPUs when read from /proc/vmstat
- No process context needed
Process-level counters:
- Stored in task_struct (the process descriptor)
- Incremented via direct field updates on the current process
- Accessed through /proc/[pid]/ interfaces
- Require process context
Key difference: Global counters use per-CPU increments that are summed when read, while process-level counters update fields in the current task’s task_struct.
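The per-CPU idea behind global counters can be modeled in user space. The following is only a toy sketch: PerCpuCounter, its add and read methods, and the CPU count are made-up names for illustration, not kernel APIs.

```python
class PerCpuCounter:
    """Toy model of a per-CPU event counter (not a kernel API)."""

    def __init__(self, nr_cpus):
        # One private slot per CPU, so writers never share a slot.
        self.slots = [0] * nr_cpus

    def add(self, cpu, delta=1):
        # Models a per-CPU add: only the calling CPU's slot changes.
        self.slots[cpu] += delta

    def read(self):
        # Models the read side: fold every CPU's slot into one total.
        return sum(self.slots)


counter = PerCpuCounter(nr_cpus=4)
counter.add(cpu=0)
counter.add(cpu=2, delta=512)
counter.add(cpu=2)
print(counter.slots, counter.read())
```

Writers touch only their own slot, so the cost of combining the slots is paid by whoever reads the total.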
Example: Process-Level Counter Increment
Consider the min_flt counter (minor page faults per process) in /proc/[pid]/stat:
// Simplified illustration of the fault path in mm/memory.c, not verbatim
// kernel source; in the real kernel these two updates live in different functions.
static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
{
    vm_fault_t ret = 0;
    // ... page fault handling logic sets ret ...
    // Increment process-level counter
    current->min_flt++; // current points to task_struct of running process
    // Also increment global counter
    count_vm_event(PGFAULT);
    return ret;
}
Key points:
- current is a macro that returns a pointer to the current process’s task_struct
- task_struct contains fields like min_flt, maj_flt, utime, stime, etc.
- Direct field access (no locks needed as each process updates its own counters)
- Both global and process counters can be updated for the same event
Accessing Process-Level Counters
Process counters are read from task_struct fields:
// In fs/proc/array.c - generates /proc/[pid]/stat (simplified)
static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
                        struct pid *pid, struct task_struct *task, int whole)
{
    // ... other fields ...
    seq_put_decimal_ull(m, " ", task->min_flt); // Minor page faults
    seq_put_decimal_ull(m, " ", task->maj_flt); // Major page faults
    seq_put_decimal_ull(m, " ", task->utime);   // User CPU time
    seq_put_decimal_ull(m, " ", task->stime);   // System CPU time
    // ... more fields ...
    return 0;
}
Location of task_struct definition: include/linux/sched.h
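From user space, the same fields can be pulled back out of a /proc/[pid]/stat line. The sketch below assumes the field numbering documented in proc(5) (minflt is field 10, majflt 12, utime 14, stime 15, with times in clock ticks); parse_stat_faults and the sample line are made up for illustration.

```python
def parse_stat_faults(stat_line):
    """Extract fault and CPU-time fields from one /proc/[pid]/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the last ')' instead of on spaces.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N of proc(5) sits at rest[N - 3].
    return {
        "minflt": int(rest[10 - 3]),
        "majflt": int(rest[12 - 3]),
        "utime_ticks": int(rest[14 - 3]),
        "stime_ticks": int(rest[15 - 3]),
    }


# Made-up sample line, truncated after field 15 for brevity.
sample = "1645 (gsd media-keys) S 1329 1645 1645 0 -1 4194304 2500 0 12 0 150 75"
print(parse_stat_faults(sample))
```

Splitting on the last closing parenthesis matters because a process name containing a space would otherwise shift every later field by one.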
Adding a Custom Process-Level Counter
To add a process-level counter:
- Add a field to task_struct in include/linux/sched.h:
struct task_struct {
    // ... existing fields ...
    unsigned long my_custom_counter;
    // ... more fields ...
};
- Initialize it during process creation (kernel/fork.c):
static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
{
    // ... allocation code ...
    tsk->my_custom_counter = 0;
    // ... rest of initialization ...
}
- Increment it in your code:
// Anywhere in kernel code with process context
current->my_custom_counter++;
- Expose it via /proc/[pid]/status in fs/proc/array.c:
// Call this helper from proc_pid_status() so the line is actually printed
static inline void task_custom_stats(struct seq_file *m, struct task_struct *task)
{
    seq_printf(m, "MyCustomCounter:\t%lu\n", task->my_custom_counter);
}
Important: Like global counters, process-level counters require rebuilding the kernel. They are more invasive than global counters because they modify core kernel structures.
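Once such a kernel is running, the new line can be read like any other /proc/[pid]/status field. Here is a minimal user-space sketch; parse_status is a made-up helper, and the sample text simply assumes the MyCustomCounter line from the steps above is present.

```python
def parse_status(text):
    """Turn 'Key:<tab>value' lines from /proc/[pid]/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip anything that is not a 'Key: value' line
            fields[key.strip()] = value.strip()
    return fields


# Sample text; the MyCustomCounter line is what the hypothetical
# seq_printf() call in the steps above would emit.
sample = "Name:\tgsd-media-keys\nPid:\t1645\nMyCustomCounter:\t42\n"
status = parse_status(sample)
print(int(status["MyCustomCounter"]))
```

On a live system the text would come from reading /proc/[pid]/status for the process of interest.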
This is global-level data:
sandeep@sandeep-Precision-3630-Tower:~$ cat /proc/meminfo
MemTotal: 32585780 kB
MemFree: 28039452 kB
MemAvailable: 29530716 kB
Buffers: 99868 kB
Cached: 2151404 kB
SwapCached: 0 kB
Active: 649700 kB
Inactive: 3242992 kB
Active(anon): 6512 kB
Inactive(anon): 2164592 kB
Active(file): 643188 kB
Inactive(file): 1078400 kB
Unevictable: 345384 kB
Mlocked: 64 kB
SwapTotal: 2097148 kB
...
This is process-level data:
sandeep@sandeep-Precision-3630-Tower:~$ cat /proc/1645/status
Name: gsd-media-keys
Umask: 0002
State: S (sleeping)
Tgid: 1645
Ngid: 0
Pid: 1645
PPid: 1329
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 64
Groups: 4 24 27 30 46 120 131 132 1000
NStgid: 1645
NSpid: 1645
NSpgid: 1645
NSsid: 1645
VmPeak: 844100 kB
Understanding the Counter Architecture
This guide focuses on adding global-level counters to /proc/vmstat. The same principles apply to other proc interfaces with minor variations.
Target: Linux kernel v5.9 (steps are similar for v5.x kernels)
Key files involved:
- mm/vmstat.c - Counter name definitions (display layer)
- include/linux/vm_event_item.h - Counter enum declarations
- mm/migrate.c - Example usage of counters
cat /proc/vmstat
...
pgmigrate_success 0
...
When we list the statistics in /proc/vmstat, we see an entry named pgmigrate_success. Its display name is defined in mm/vmstat.c:
...
#ifdef CONFIG_MIGRATION
"pgmigrate_success",
"pgmigrate_fail",
"thp_migration_success",
"thp_migration_fail",
"thp_migration_split",
#endif
...
Understanding pgmigrate_success
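The pgmigrate_success counter tracks pages that were migrated successfully. NUMA page migration is a typical source, but other callers of migrate_pages(), such as memory compaction, also contribute, so the value is not guaranteed to stay at 0 on non-NUMA systems.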
Important: The string name in vmstat.c is just the display name. The actual counter is an enum constant defined elsewhere.
Two-part definition:
- Display name - String in mm/vmstat.c (what you see in /proc/vmstat)
- Counter enum - Constant in include/linux/vm_event_item.h (what code uses)
This separation allows the kernel to efficiently use integer enums internally while presenting readable names to users.
The actual counter enum is defined in include/linux/vm_event_item.h:
#ifdef CONFIG_MIGRATION
PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
THP_MIGRATION_SUCCESS,
THP_MIGRATION_FAIL,
THP_MIGRATION_SPLIT,
#endif
Searching for PGMIGRATE_SUCCESS (case-sensitive) shows that it is used in two other places, both in the same file, mm/migrate.c:
int migrate_pages(... {
...
count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
...
}
int migrate_misplaced_transhuge_page(st
{
...
count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
...
}
How Counters are Incremented
Base page migration (migrate_pages):
- Migrates standard 4KB pages
- Increments by nr_succeeded (total number of successfully migrated pages)
Huge page migration (migrate_misplaced_transhuge_page):
- Migrates 2MB transparent huge pages
- Increments by HPAGE_PMD_NR (512 base pages per huge page)
The API count_vm_events(COUNTER_NAME, count) adds count to the named counter. The update goes to the calling CPU’s copy of the counter, so no lock is needed.
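The figure of 512 follows directly from the page sizes. As a quick worked check, assuming the x86-64 defaults of 4KB base pages (a page shift of 12) and 2MB huge pages (a PMD-level shift of 21); other architectures and configurations use different values:

```python
PAGE_SHIFT = 12       # 4KB base pages (x86-64 default, assumed here)
HPAGE_PMD_SHIFT = 21  # 2MB PMD-level huge pages (x86-64, assumed here)

# Number of base pages covered by one huge page.
hpage_pmd_nr = 1 << (HPAGE_PMD_SHIFT - PAGE_SHIFT)
print(hpage_pmd_nr)

# Three huge pages migrated through the huge page path shown above
# would therefore add 3 * 512 to pgmigrate_success.
print(3 * hpage_pmd_nr)
```

This is why a single huge page migration moves the counter by hundreds rather than by one.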
Step-by-Step: Adding a Custom Counter
We’ll create a counter called custom_test to track invocations of the migrate_pages system call. This demonstrates the complete workflow from definition to usage.
Prerequisites
- Linux kernel source (v5.9 recommended)
- Development tools: build-essential, libncurses-dev, bison, flex, libssl-dev, libelf-dev
- Root access for kernel installation
Step 1: Add Display Name
In mm/vmstat.c, add the human-readable string that will appear in /proc/vmstat:
...
"numa_hint_faults_local",
"numa_pages_migrated",
#endif
"custom_test", // <-- here
#ifdef CONFIG_MIGRATION
"pgmigrate_success",
"pgmigrate_fail",
...
Note: The array indices must match between vmstat.c and vm_event_item.h, so add your counter in the same relative position.
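To see why the positions must line up, here is a toy user-space model. It assumes, as the note above describes, that the i-th value is displayed under the i-th name; the three-entry lists are trimmed stand-ins for illustration, not the real kernel tables.

```python
from enum import IntEnum


class Event(IntEnum):
    # Stand-in for the enum in vm_event_item.h (trimmed to three entries).
    NUMA_PAGE_MIGRATE = 0
    CUSTOM_TEST = 1
    PGMIGRATE_SUCCESS = 2


def render(names, values):
    """Pair the i-th name with the i-th value, like a positional table."""
    return [f"{name} {value}" for name, value in zip(names, values)]


values = [0] * len(Event)
values[Event.CUSTOM_TEST] += 1  # one event recorded through the enum

aligned = ["numa_pages_migrated", "custom_test", "pgmigrate_success"]
misaligned = ["numa_pages_migrated", "pgmigrate_success", "custom_test"]

print(render(aligned, values))     # custom_test shows the event
print(render(misaligned, values))  # the event is reported under the wrong name
```

In the misaligned case nothing fails loudly; the event is simply attributed to pgmigrate_success, which is what makes this mistake easy to miss.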
Step 2: Add Counter Enum
In include/linux/vm_event_item.h, add the enum constant that code will reference:
NUMA_HINT_FAULTS_LOCAL,
NUMA_PAGE_MIGRATE,
#endif
CUSTOM_TEST, // <-- Here
#ifdef CONFIG_MIGRATION
PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
THP_MIGRATION_SUCCESS,
Important: Use uppercase for the enum (e.g., CUSTOM_TEST) and lowercase with underscores for the display string (e.g., "custom_test").
Step 3: Instrument the Code
Now use the counter to track the migrate_pages system call. This syscall moves process memory pages between NUMA nodes.
Location: mm/mempolicy.c
Usage: We’ll track every invocation of the syscall, regardless of success or failure:
SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
const unsigned long __user *, old_nodes,
const unsigned long __user *, new_nodes)
{
count_vm_events(CUSTOM_TEST, 1); // <- Our counter
return kernel_migrate_pages(pid, maxnode, old_nodes, new_nodes);
}
Key points:
- count_vm_events(CUSTOM_TEST, 1) adds 1 to the counter on the calling CPU
- Placed before kernel_migrate_pages() to count every attempt
- Counter increments even if the syscall fails (useful for debugging)
Building and Installing the Kernel
Compilation
# Configure kernel (if needed)
make menuconfig
# Build kernel with parallel jobs
make -j$(nproc)
# Install modules
sudo make modules_install
# Install kernel
sudo make install
# Update bootloader
sudo update-grub # For Ubuntu/Debian
# OR
sudo grub2-mkconfig -o /boot/grub2/grub.cfg # For RHEL/CentOS
Build time: Expect 30-60 minutes depending on your system.
Reboot
sudo reboot
Verify kernel version after reboot:
uname -r
Testing the Counter
Initial State
After booting into the new kernel, verify the counter exists and starts at 0:
grep custom_test /proc/vmstat
Expected output:
custom_test 0
Triggering the Counter
Use the migratepages command to invoke the migrate_pages syscall:
# Install numactl if not present
sudo apt-get install numactl # Ubuntu/Debian
# OR
sudo yum install numactl # RHEL/CentOS
# Attempt to migrate pages (PID doesn't need to be valid)
migratepages 1234 0 1
What’s happening:
- migratepages calls the migrate_pages() syscall
- Our counter increments before the actual migration logic
- Even invalid PIDs trigger the counter (by design)
Verify Increment
grep custom_test /proc/vmstat
Expected output:
custom_test 1
Success! The counter incremented.
Continuous Monitoring
Watch the counter in real-time:
watch -n 1 'cat /proc/vmstat | grep custom_test'
Or use a script to trigger and monitor:
for i in {1..5}; do
migratepages 1234 0 1
echo "Attempt $i:"
grep custom_test /proc/vmstat
sleep 1
done
Important Considerations
Counter Persistence
Limitation: Counters reset to 0 on reboot. They are not persistent across boots.
Workaround: Write a monitoring script that:
- Polls /proc/vmstat periodically
- Logs values to a file or time-series database
- Calculates deltas for rate-based metrics
Example monitoring script:
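#!/bin/bash
# Append "epoch-seconds value" once a minute (writing to /var/log needs root)
while true; do
    timestamp=$(date +%s)
    # Match the name exactly so names like custom_test_success are not picked up
    value=$(awk '$1 == "custom_test" {print $2}' /proc/vmstat)
    echo "$timestamp $value" >> /var/log/custom_counter.log
    sleep 60
done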
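The script above only logs raw values. The delta step from the list could be sketched as follows; rates_from_log is a made-up helper, the log excerpt is invented, and treating any decrease as a reset is an assumption of this sketch rather than something the kernel reports.

```python
def rates_from_log(lines):
    """Yield (timestamp, events_per_second) from 'timestamp value' lines."""
    prev = None
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        ts, value = (int(field) for field in line.split())
        if prev is not None:
            prev_ts, prev_value = prev
            elapsed = ts - prev_ts
            # A smaller value than before means the counter restarted
            # from 0 (for example after a reboot), so skip that interval.
            if elapsed > 0 and value >= prev_value:
                yield ts, (value - prev_value) / elapsed
        prev = (ts, value)


# Made-up log excerpt in the format written by the script above.
log = ["1700000000 10", "1700000060 40", "1700000120 5", "1700000180 35"]
print(list(rates_from_log(log)))
```

Skipping the interval that spans a reset avoids reporting a large negative rate for the sample taken right after a reboot.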
Performance Impact
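- Per-CPU updates: count_vm_events() adds to the calling CPU’s copy of the counter, so increments need no locks (low overhead)
- Cache effects: Because each CPU updates its own copy, vm event counters avoid the cache line bouncing that a single shared counter would cause on multi-core systems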
- Critical paths: Avoid instrumenting functions called millions of times per second
Debugging Tips
If counter doesn’t appear:
- Check array alignment in vmstat.c and vm_event_item.h
- Verify that you booted the newly installed kernel: uname -r
- Check the build output for compilation errors, and look for boot-time problems with:
dmesg | grep -i error
If counter doesn’t increment:
- Verify code path is actually executed
- Add printk() statements for debugging
- Check that the syscall is being invoked:
strace migratepages 1234 0 1
Advanced Usage
Multiple Counters
Add multiple related counters to track success/failure separately:
// In vm_event_item.h
CUSTOM_TEST_SUCCESS,
CUSTOM_TEST_FAIL,
// In vmstat.c
"custom_test_success",
"custom_test_fail",
// In code
if (result >= 0)
count_vm_events(CUSTOM_TEST_SUCCESS, 1);
else
count_vm_events(CUSTOM_TEST_FAIL, 1);
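With both counters exposed, a failure ratio can be derived in user space. A minimal sketch, assuming the two display names above and a dict of already-parsed /proc/vmstat values; failure_ratio is a made-up helper and the numbers are invented.

```python
def failure_ratio(stats):
    """Fraction of attempts that failed, or None if nothing was counted."""
    ok = stats.get("custom_test_success", 0)
    failed = stats.get("custom_test_fail", 0)
    total = ok + failed
    return failed / total if total else None


# Made-up values standing in for a parsed /proc/vmstat.
print(failure_ratio({"custom_test_success": 9, "custom_test_fail": 1}))
print(failure_ratio({}))
```

Returning None when nothing has been counted keeps "no data yet" distinct from "no failures".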
Per-CPU Counters
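The vm event counters are already per-CPU. For a hot-path counter that lives outside the vmstat framework, you can define your own per-CPU variable to reduce contention:
DEFINE_PER_CPU(unsigned long, custom_counter); // DECLARE_PER_CPU only declares it
// Increment the calling CPU's copy
this_cpu_inc(custom_counter);
// Read: sum per_cpu(custom_counter, cpu) inside for_each_possible_cpu(cpu)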
Analysis Tools
Parse /proc/vmstat programmatically:
#!/usr/bin/env python3
import time


def read_vmstat():
    """Return /proc/vmstat as a dict of counter name -> integer value."""
    stats = {}
    with open('/proc/vmstat') as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    return stats


# Monitor rate of change over a one-second window
prev = read_vmstat()
time.sleep(1)
curr = read_vmstat()
rate = curr['custom_test'] - prev['custom_test']
print(f"Custom counter rate: {rate} events/sec")
Conclusion
Adding custom counters to /proc provides lightweight, real-time instrumentation for kernel analysis. This technique is invaluable for:
- Performance tuning and bottleneck identification
- Validating kernel modifications
- Understanding system behavior under specific workloads
The low overhead makes it suitable for production environments, unlike more invasive debugging techniques. Combined with tools like perf, ftrace, and eBPF, custom counters form a complete kernel observability toolkit.