Building Proc-Monitor: A Lightweight Linux Process Monitoring Tool

As system administrators and developers, we’ve all been there: your server suddenly slows down, CPU usage spikes, or memory consumption skyrockets. By the time you open top or htop, the culprit process has already disappeared. Those short-lived, resource-hungry processes are notoriously difficult to catch and diagnose.

This frustration led me to create Proc-Monitor – a lightweight, dependency-free Linux process monitoring tool designed specifically to catch these elusive resource hogs and trace them back to their source.

The Problem: Short-Lived Resource Consumers

Traditional monitoring tools like top, htop, or even ps are excellent for real-time snapshots, but they share a common weakness: they only show what’s happening right now. If a process spawns, consumes 90% CPU for 200 milliseconds, and then exits, you’ll likely miss it entirely.

These brief but intense resource spikes can cause:

  • Service latency and timeouts
  • Degraded user experience
  • Cascading failures in microservice architectures
  • Mysterious performance issues that seem to “just happen”

Even worse, when you finally do catch a high-resource process, you’re left wondering: Which service or parent process spawned this? Understanding the process hierarchy is crucial for effective troubleshooting.

Introducing Proc-Monitor

Proc-Monitor solves these problems by continuously monitoring the /proc filesystem with configurable check intervals as low as 100ms. When it detects high CPU or RAM usage, it captures comprehensive information including:

  • Complete process details (PID, name, command line)
  • Resource usage (CPU percentage, RAM usage)
  • Parent service detection (which systemd service owns the process)
  • Process hierarchy (full parent chain up to init)
  • User information (who owns the process)
  • Timestamps (when the spike occurred)

All of this happens with zero external dependencies – just Python 3.6+ and the standard library.

Key Features

1. Dual Monitoring Modes

Proc-Monitor supports two distinct monitoring strategies:

Threshold Mode: Captures any process exceeding configured CPU or RAM thresholds. Perfect for catching unexpected spikes and anomalies.

Top-N Mode: Continuously tracks the top N processes by resource usage. Ideal for identifying your system’s most resource-intensive processes over time.

2. Systemd Service Integration

One of Proc-Monitor’s most powerful features is its ability to identify which systemd service spawned a process by parsing cgroup information:

def get_systemd_service(pid):
    """Get the systemd service name for a process."""
    content = read_file_safe(f'/proc/{pid}/cgroup')
    if not content:
        return "Unknown"
    
    for line in content.split('\n'):
        if '.service' in line:
            parts = line.strip().split('/')
            for part in reversed(parts):
                if '.service' in part:
                    return part

    return "Unknown"

This means instead of just knowing that process 12345 was consuming resources, you’ll know it was spawned by apache2.service, docker.service, or your custom application service.
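To make the cgroup parsing concrete, here is a standalone sketch of the same extraction logic applied to a sample cgroup v2 line (the helper name and sample string are illustrative, not taken from Proc-Monitor's source):

```python
def service_from_cgroup_text(content):
    """Extract the last '.service' path component from /proc/<pid>/cgroup text."""
    for line in content.split('\n'):
        if '.service' in line:
            # Walk path components right-to-left so nested slices still
            # resolve to the owning service unit.
            for part in reversed(line.strip().split('/')):
                if '.service' in part:
                    return part
    return "Unknown"

# A typical cgroup v2 line for a process owned by apache2.service:
sample = "0::/system.slice/apache2.service"
print(service_from_cgroup_text(sample))  # apache2.service
```

Processes not owned by any service unit (for example, those under `user.slice`) fall through to `"Unknown"`.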

3. Fast Detection with Configurable Intervals

The tool supports check intervals as low as 0.1 seconds (100ms), making it possible to catch even very brief resource spikes:

{
    "cpu_threshold": 30.0,
    "check_interval": 0.1,
    "track_ram": false
}

4. Comprehensive Reporting

When you stop monitoring (CTRL+C), Proc-Monitor generates a detailed JSON report with:

  • Aggregated statistics by service
  • Individual event details
  • Configuration used during monitoring
  • Timestamps and resource usage trends
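The per-service aggregation can be sketched as a simple fold over the captured events; a minimal example, assuming hypothetical event field names (`service`, `name`) rather than Proc-Monitor's exact schema:

```python
import json
from collections import defaultdict

def build_report(events, config):
    """Group captured events by owning service and attach the run's config."""
    by_service = defaultdict(lambda: {"count": 0, "processes": []})
    for ev in events:
        bucket = by_service[ev["service"]]
        bucket["count"] += 1
        bucket["processes"].append(ev["name"])
    return {
        "summary": {"total_events": len(events), "by_service": dict(by_service)},
        "config": config,
        "events": events,
    }

events = [{"service": "apache2.service", "name": "apache2", "cpu": 72.0}]
report = build_report(events, {"cpu_threshold": 50.0})
print(json.dumps(report["summary"], indent=2))
```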

How It Works: Under the Hood

Proc-Monitor directly reads the Linux /proc filesystem without relying on external tools or libraries. Here’s how the core monitoring works:

CPU Usage Calculation

CPU percentage is calculated by comparing process CPU ticks between check intervals:

def calculate_cpu_percent(pid, stat, current_time, total_cpu_delta):
    """Calculate CPU percentage for a process."""
    proc_time = stat['utime'] + stat['stime']
    
    if pid in prev_proc_stats:
        prev_proc_time, prev_time = prev_proc_stats[pid]
        time_delta = current_time - prev_time
        
        if time_delta > 0 and total_cpu_delta > 0:
            proc_delta = proc_time - prev_proc_time
            cpu_percent = (proc_delta / total_cpu_delta) * 100 * NUM_CPUS
            return cpu_percent
    
    return 0.0

This approach accounts for multi-core systems and provides accurate CPU usage percentages even for short-lived processes.
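The `total_cpu_delta` used above comes from sampling the aggregate `cpu` line of `/proc/stat` at each check interval. A minimal sketch of that sampling step (the helper name is illustrative, not necessarily what Proc-Monitor calls it):

```python
import time

def read_total_cpu_ticks():
    """Sum all jiffy counters on the aggregate 'cpu' line of /proc/stat."""
    with open('/proc/stat') as f:
        fields = f.readline().split()[1:]  # skip the 'cpu' label
    return sum(int(x) for x in fields)

prev_ticks = read_total_cpu_ticks()
time.sleep(0.1)  # one check interval
total_cpu_delta = read_total_cpu_ticks() - prev_ticks
```

Dividing a process's own tick delta by this system-wide delta yields its share of total CPU time over the interval, which is then scaled to a per-core percentage.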

Memory Usage Tracking

Memory information is extracted from /proc/<pid>/statm:

def get_process_memory(pid):
    """Get memory usage from /proc/<pid>/statm."""
    content = read_file_safe(f'/proc/{pid}/statm')
    if not content:
        return 0, 0.0
    
    parts = content.split()
    rss_pages = int(parts[1])
    page_size = os.sysconf('SC_PAGE_SIZE')
    rss_bytes = rss_pages * page_size
    
    # Calculate percentage of total RAM
    meminfo = read_file_safe('/proc/meminfo')
    # ... calculate percentage
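The elided percentage step boils down to reading `MemTotal` from `/proc/meminfo` (reported in kB) and dividing; a standalone sketch, with an illustrative helper name:

```python
def ram_percent(rss_bytes):
    """Express an RSS value as a percentage of MemTotal from /proc/meminfo."""
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemTotal:'):
                total_kb = int(line.split()[1])  # MemTotal is reported in kB
                return rss_bytes / (total_kb * 1024) * 100
    return 0.0

print(f"{ram_percent(64 * 1024 * 1024):.2f}% of RAM")  # 64 MB as a share of total
```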

Parent Chain Discovery

Understanding which process spawned the high-resource consumer is crucial:

def get_parent_chain(pid, max_depth=10):
    """Get the parent process chain up to init (PID 1)."""
    chain = []
    current_pid = pid
    
    for _ in range(max_depth):
        stat = get_process_stat(current_pid)
        if not stat:
            break
        
        chain.append((current_pid, stat['name']))
        
        if current_pid == 1:
            break
        
        current_pid = stat['ppid']
    
    return chain
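The `get_process_stat` helper this relies on parses `/proc/<pid>/stat`. One subtlety worth showing: the process name (`comm`) is wrapped in parentheses and may itself contain spaces, so the robust approach splits at the last `)`. A minimal sketch, assuming only the fields the chain walk needs (field layout per proc(5)):

```python
import os

def get_process_stat(pid):
    """Parse name and ppid from /proc/<pid>/stat; return None if the PID is gone."""
    try:
        with open(f'/proc/{pid}/stat') as f:
            raw = f.read()
    except OSError:
        return None
    lparen, rparen = raw.index('('), raw.rindex(')')
    name = raw[lparen + 1:rparen]          # comm, may contain spaces
    rest = raw[rparen + 2:].split()        # fields after the comm: state, ppid, ...
    return {'name': name, 'ppid': int(rest[1])}

print(get_process_stat(os.getpid()))
```

Splitting naively on whitespace would misparse processes with names like `tmux: server`, which is why the parenthesis search matters.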

Getting Started

Installation and Basic Usage

The simplest way to try Proc-Monitor is with a one-line command:

curl -sL https://raw.githubusercontent.com/cagatayuresin/proc-monitor/main/proc_monitor.py | sudo python3 -

Or download and run locally:

wget https://raw.githubusercontent.com/cagatayuresin/proc-monitor/main/proc_monitor.py
sudo python3 proc_monitor.py

Configuration

Create a config.json file for customized monitoring:

{
    "mode": "threshold",
    "top_n": 5,
    "cpu_threshold": 50.0,
    "ram_threshold": 10.0,
    "check_interval": 0.3,
    "output_file": "resource_report.json",
    "track_cpu": true,
    "track_ram": true
}

Real-World Use Cases

1. Finding Memory Leaks

Configure aggressive RAM monitoring with a low threshold:

{
    "mode": "threshold",
    "ram_threshold": 5.0,
    "check_interval": 1.0,
    "track_cpu": false,
    "track_ram": true
}

This helps identify processes with growing memory consumption over time.

2. Catching CPU Spikes

For debugging sudden CPU usage spikes:

{
    "mode": "threshold",
    "cpu_threshold": 30.0,
    "check_interval": 0.1,
    "track_cpu": true,
    "track_ram": false
}

The 100ms check interval ensures even brief spikes are captured.

3. Production System Auditing

Use Top-N mode for continuous monitoring of your most resource-intensive processes:

{
    "mode": "top_n",
    "top_n": 10,
    "check_interval": 0.5
}

Example Output

[2024-01-15 10:30:45] [CPU] stress (PID:12345)
    CPU: 98.5% | RAM: 0.3% (12.4 MB)
    Service: stress-test.service
    User: root
    Chain: stress(12345) -> bash(12300) -> systemd(1)
    Cmd: /usr/bin/stress --cpu 1

The generated JSON report provides even more detail for post-analysis:

{
  "generated_at": "2024-01-15 10:35:22",
  "summary": {
    "total_events": 150,
    "by_service": {
      "apache2.service": {
        "count": 100,
        "processes": [...]
      }
    }
  },
  "events": [...]
}

Design Decisions and Trade-offs

Why No External Dependencies?

I deliberately designed Proc-Monitor to use only Python’s standard library for several reasons:

  1. Universal compatibility: Works on any Linux system with Python 3.6+
  2. Easy deployment: No pip installs or virtual environments needed
  3. Minimal attack surface: Fewer dependencies mean fewer security concerns
  4. Lightweight: No overhead from heavy monitoring frameworks

Why Direct /proc Access?

Reading /proc directly provides:

  • Maximum portability across Linux distributions
  • No dependency on system utilities that might not be installed
  • Fine-grained control over what data to collect
  • Minimal performance overhead

Limitations and Considerations

Proc-Monitor is designed for specific use cases and has some intentional limitations:

  • Linux only: Requires the /proc filesystem
  • Root recommended: Full process information requires elevated privileges
  • Syscall overhead: Frequent /proc reads add a small CPU cost (though /proc is an in-memory filesystem, so no disk activity is involved)
  • Not a replacement: Complements, not replaces, traditional monitoring tools

Performance Considerations

At its default 0.3-second check interval, Proc-Monitor has minimal system impact. However, you can tune performance based on your needs:

  • Lower intervals (0.1s): Catches more short-lived processes but uses more CPU
  • Higher intervals (1.0s): Lower overhead but might miss brief spikes
  • Selective tracking: Disable CPU or RAM tracking if you only need one metric

Future Enhancements

While Proc-Monitor is feature-complete for its intended purpose, potential future additions could include:

  • Network usage tracking
  • Disk I/O monitoring
  • Container/cgroup-aware monitoring
  • Alert notifications (email, webhook)
  • Web-based dashboard
  • Historical trend analysis

Contributing

Proc-Monitor is open source (MIT License) and welcomes contributions! Whether it’s bug reports, feature requests, or pull requests, community involvement helps make the tool better for everyone.

Check out the GitHub repository to get involved.

Conclusion

Proc-Monitor fills a specific gap in the Linux monitoring ecosystem: catching and identifying short-lived, high-resource processes. Its zero-dependency design, dual monitoring modes, and systemd service integration make it a valuable tool for system administrators, DevOps engineers, and developers troubleshooting performance issues.

Whether you’re debugging mysterious CPU spikes, hunting memory leaks, or simply want better visibility into your system’s resource usage, Proc-Monitor provides the insights you need without the complexity of heavyweight monitoring solutions.

Try it today and never wonder “what was that process?” again.


Download: GitHub – proc-monitor

Quick Start: curl -sL https://raw.githubusercontent.com/cagatayuresin/proc-monitor/main/proc_monitor.py | sudo python3 -

License: MIT