Preliminary Investigation: Finally, the Real Culprits

Hey everyone,

My investigation into the high CPU usage on the "moon" server has been a process of refining the data to get a clear answer. After first identifying `python3` processes as the general cause, I [updated my metrics script](https://gist.github.com/TheCrazyGM/9c5945224d89152827036b76e84bbbb1) to capture more detailed information, specifically the full command line of the top process.

This new data allows for a much more precise analysis. Instead of just guessing which Python scripts are the problem, I can now query the metrics database for exactly which commands are running when the server's total CPU usage spikes.
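The updated script itself lives in the gist linked above; as a rough illustration of the idea, here is one way such a sample could be taken on Linux, using GNU `ps` to grab the busiest process and its full command line (the function name and approach are mine, not necessarily what the script does):

```python
import subprocess

def top_cpu_cmdline():
    """Sample the single busiest process and its full command line.

    Assumes GNU ps (procps-ng): sort by %CPU descending, print the
    %CPU value plus the full args column, no header row.
    """
    out = subprocess.run(
        ["ps", "-eo", "pcpu,args", "--sort=-pcpu", "--no-headers"],
        capture_output=True, text=True, check=True,
    ).stdout
    # The first line is the top consumer: "%CPU <full command line>".
    first = out.splitlines()[0].strip()
    pcpu, _, cmdline = first.partition(" ")
    return float(pcpu), cmdline.strip()
```

Storing that command-line string alongside each metrics row is what makes the queries below possible.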

#### The Analysis

![New Metrics, Fresh Data](https://files.peakd.com/file/peakd-hive/thecrazygm/23tSym8QC5mXEaVHwaSsBjSWtM4peCJYdtvVYVuNm5suDcs8TsgdR7F4686thdcSZbVCP.png)

I started with the extreme case, pulling the top CPU command line from every data point where the server was completely pegged at 100%:

```bash
sqlite3 -header -csv server_metrics.db "SELECT top_cpu_cmdline FROM metrics WHERE cpu_percent = 100"
```

![The Top Spikes](https://files.peakd.com/file/peakd-hive/thecrazygm/23tGRgytvKxmzq4pWjXMWuTBVvzHDJd6EjS2bS27DbyjjvPX1tFgkDzYovLD1j6hgr4Rn.png)

Loosening the filter to anything above 50% catches the smaller spikes as well:

```bash
sqlite3 -header -csv server_metrics.db "SELECT top_cpu_cmdline FROM metrics WHERE cpu_percent > 50"
```

![Showing what made all 8 spikes](https://files.peakd.com/file/peakd-hive/thecrazygm/23t72XFAktXC2XMb5FShG6dDHz7uFE5cSiSQqfdyqdsSDqB1FPhA4g65qJ26N3FgWG4qE.png)

The output was a list of the commands that were at the top of the process list during these high-load moments. After tallying the results from the last 24 hours, two scripts are clearly responsible for the vast majority of the CPU spikes:

- `python3 sendNotifications.py`
- `python3 monitorDeposits.py`
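The tallying can be done directly in SQL with a `GROUP BY` instead of counting by hand. A minimal sketch against the same schema (the threshold default is just the 50% cutoff used above):

```python
import sqlite3

def tally_spikes(db_path, threshold=50.0):
    """Count how often each command line tops the process list
    while total CPU is above the given threshold."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            "SELECT top_cpu_cmdline, COUNT(*) AS spikes "
            "FROM metrics WHERE cpu_percent > ? "
            "GROUP BY top_cpu_cmdline ORDER BY spikes DESC",
            (threshold,),
        ).fetchall()
```

This returns `(cmdline, count)` pairs sorted by frequency, so the worst offenders land at the top of the list.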

These two background bots appeared with roughly equal frequency during the periods of high load. Interestingly, the metrics collection script itself showed up once, but its impact is negligible compared to the notification and deposit monitoring scripts.

#### Conclusion

This finally gives us a clear, actionable target. The high server load isn't coming from the web applications, as one might suspect, but from background daemons running constantly.

The next step is to dive into the code for `sendNotifications.py` and `monitorDeposits.py`. I'll be looking for any inefficiencies, such as loops that don't have a `sleep` interval, or resource-intensive calculations that can be optimized. This has been a great lesson in the importance of collecting the _right_ data. It took a few tries, but we now have a precise answer.
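If the culprit does turn out to be a tight polling loop, the usual fix is to sleep out the remainder of a fixed interval each cycle. A minimal sketch of that pattern (the bots' actual structure is unknown to me, and the 30-second interval is a made-up value):

```python
import time

POLL_INTERVAL = 30.0  # seconds per cycle; hypothetical tuning value

def remaining_sleep(elapsed, interval=POLL_INTERVAL):
    """How long to sleep after a check so each cycle takes a full interval."""
    return max(0.0, interval - elapsed)

def run_forever(check_once):
    """Polling loop that yields the CPU between checks instead of spinning."""
    while True:
        started = time.monotonic()
        check_once()
        # Sleep out the rest of the interval; if the check itself ran
        # long, skip the sleep rather than drift further behind.
        time.sleep(remaining_sleep(time.monotonic() - started))
```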

As always,
Michael Garcia a.k.a. TheCrazyGM
