Saturday, October 24, 2020

OPNsense - High CPU utilization after upgrading to 20.7


When I installed the Sensei module on OPNsense, the system crashed. I powered off the Protectli hardware and then turned it back on.
After it powered back on, the dhcpd (DHCPv4 server) service was no longer running on OPNsense. Restarting the service did not help, and rebooting did not bring it back up either.

Eventually, I upgraded the OPNsense firmware from version 20.1 to 20.7.4. The upgrade was successful. However, over time the internet connection slowed down and packet loss started to occur.
The CPU usage on OPNsense's Lobby: Dashboard showed 90-100%.

First, I searched the OPNsense forum but couldn't find a solution. I SSHed into the shell and started troubleshooting with the top and ps commands.

As shown below, the NetFlow (flowd_aggregate.py) and Maltrail (sensor.py) processes were using a lot of CPU, along with several /usr/local/bin/php-cgi processes, so I stopped both services, as other users had tried.

▶ Related article:
How to stop Maltrail service on OPNsense


root@firewall:~ # top -aSH

last pid: 33259;  load averages:  5.23,  5.69,  5.22            up 0+18:19:08  19:14:50
164 threads:   12 running, 137 sleeping, 15 waiting
CPU: 80.5% user,  0.0% nice, 13.1% system,  0.6% interrupt,  5.8% idle
Mem: 700M Active, 5678M Inact, 2272K Laundry, 708M Wired, 270M Buf, 990M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  298 root         89    0    33M    22M CPU3     3 213:19  56.56% /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.p
54030 root         76    0    43M    24M RUN      2   0:02  39.78% /usr/local/bin/php-cgi
95230 root         76    0    43M    24M RUN      0   0:02  37.76% /usr/local/bin/php-cgi
19078 root         76    0    43M    24M CPU0     0   0:02  32.14% /usr/local/bin/php-cgi
84701 root         52    0    43M    24M RUN      1   0:02  31.10% /usr/local/bin/php-cgi
39280 root         30    0  1080M  1054M select   2 106:23  14.45% python3 /usr/local/share/maltrail/sensor.py (python3.7){python3.7}
87302 root         31    0  1080M  1052M select   2 106:08  14.39% python3 /usr/local/share/maltrail/sensor.py (python3.7){python3.7}
55418 root         30    0  1080M  1054M select   3 106:06  14.37% python3 /usr/local/share/maltrail/sensor.py (python3.7){python3.7}
71585 root         28    0    23M    10M kqread   3  50:29  14.16% /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf
32191 root         24    0  1507M  1396M bpf      3  46:02   9.19% python3 /usr/local/share/maltrail/sensor.py (python3.7){python3.7}
32191 root         25    0  1507M  1396M bpf      0  74:48   9.06% python3 /usr/local/share/maltrail/sensor.py (python3.7){python3.7}
   11 root        155 ki31      0    64K RUN      2 653:39   7.31% [idle{idle: cpu2}]
   11 root        155 ki31      0    64K RUN      3 653:28   7.20% [idle{idle: cpu3}]
32191 root         23    0  1507M  1396M bpf      3  42:39   5.44% python3 /usr/local/share/maltrail/sensor.py (python3.7){python3.7}
   11 root        155 ki31      0    64K RUN      1 663:02   5.24% [idle{idle: cpu1}]
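Reading the top summary: the CPU line above reports 5.8% idle, so the box was roughly 100 - 5.8 = 94.2% busy, which matches the 90-100% on the dashboard. The small helper below is a sketch that does that subtraction for you. The function name cpu_busy_from_top is hypothetical, and it assumes the single-line FreeBSD top summary format shown here ("CPU: ... X% idle").

```shell
#!/bin/sh
# Hypothetical helper: print overall CPU busy % (100 - idle) from FreeBSD top output.
# Assumes the one-line summary format: "CPU: 80.5% user, ... 5.8% idle".
cpu_busy_from_top() {
  awk '/^CPU:/ {
    for (i = 2; i < NF; i++) {
      # The value right before the word "idle" is the idle percentage.
      if ($(i + 1) ~ /^idle/) {
        gsub(/%/, "", $i)
        printf "%.1f\n", 100 - $i
      }
    }
  }'
}

# Example with the CPU line captured above (prints 94.2):
echo "CPU: 80.5% user,  0.0% nice, 13.1% system,  0.6% interrupt,  5.8% idle" | cpu_busy_from_top
```

On the firewall you could pipe live output in, e.g. `top -b -d 2 | cpu_busy_from_top`, which should print one value per display. I'm assuming FreeBSD's batch mode here; the later value is the one to trust, since the first display may not cover a real sampling interval.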


root@firewall:~ # ps auxww
USER     PID %CPU %MEM     VSZ     RSS TT  STAT STARTED       TIME COMMAND
root      11 72.3  0.0       0      64  -  RNL  00:55   2633:29.12 [idle]
root   32191 24.7 17.2 1542872 1429180  -  S    00:57    172:19.93 python3 /usr/local/share/maltrail/sensor.py (python3.7)
root     298 20.1  0.2   32128   20656  -  Rs   00:57    212:30.30 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)
root   39280 17.3 13.0 1106020 1078888  -  R    00:58    105:58.22 python3 /usr/local/share/maltrail/sensor.py (python3.7)
root   87302 16.2 13.0 1105976 1077144  -  R    00:58    105:43.55 python3 /usr/local/share/maltrail/sensor.py (python3.7)
root   55418 15.7 13.0 1106012 1078848  -  R    00:58    105:41.62 python3 /usr/local/share/maltrail/sensor.py (python3.7)
root   54193 14.9  0.3   43640   24700  -  S    19:11      0:02.09 /usr/local/bin/php-cgi
root   71585 12.7  0.1   22420    9032  -  S    01:03     50:07.72 /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf
root    1130 12.1  0.3   43644   24652  -  S    19:11      0:01.63 /usr/local/bin/php-cgi
root   77060 11.7  0.3   43596   24668  -  S    19:11      0:01.44 /usr/local/bin/php-cgi
root   69006 10.8  0.3   44440   26376  -  S    19:11      0:01.57 /usr/local/bin/php-cgi
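top and ps list each process separately, so four php-cgi workers at 10-15% each, or several sensor.py processes at 15-25% each, are easy to underestimate. As a rough sketch (cpu_by_command is a hypothetical name, not an OPNsense tool), the snippet below sums %CPU per identical command line from `ps -Ao pcpu,args`-style output and sorts the totals.

```shell
#!/bin/sh
# Hypothetical helper: total %CPU per identical command line.
# Input: output of `ps -Ao pcpu,args` (a header line, then "%CPU COMMAND...").
cpu_by_command() {
  awk 'NR > 1 {
    pct = $1
    $1 = ""              # drop the %CPU column, keep the full command line
    sub(/^ +/, "")       # strip the leading space left behind
    total[$0] += pct
  }
  END {
    for (cmd in total) printf "%.1f %s\n", total[cmd], cmd
  }' | LC_ALL=C sort -rn
}

# Example with the non-idle lines from the ps output above.
# Expected totals: sensor.py 73.9, php-cgi 49.5, flowd_aggregate.py 20.1, lighttpd 12.7
printf '%s\n' \
  "%CPU COMMAND" \
  "24.7 python3 /usr/local/share/maltrail/sensor.py (python3.7)" \
  "20.1 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.7)" \
  "17.3 python3 /usr/local/share/maltrail/sensor.py (python3.7)" \
  "16.2 python3 /usr/local/share/maltrail/sensor.py (python3.7)" \
  "15.7 python3 /usr/local/share/maltrail/sensor.py (python3.7)" \
  "14.9 /usr/local/bin/php-cgi" \
  "12.7 /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf" \
  "12.1 /usr/local/bin/php-cgi" \
  "11.7 /usr/local/bin/php-cgi" \
  "10.8 /usr/local/bin/php-cgi" |
  cpu_by_command
```

On the firewall itself, you would feed it the live process list instead: `ps -Ao pcpu,args | cpu_by_command | head -n 10`. Expect the kernel [idle] entry to show up near the top; that is idle time, not load.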


The CPU usage decreased a bit, but it still stayed at 80-90%.

After searching and trying a few things, I finally found a solution.

The solution was disabling IPv6.

OPNsense disable IPv6

After disabling IPv6 as described on that site, the CPU utilization dropped below 10%.
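The actual steps are in the linked article. As a rough sanity check afterwards (my own assumption, not something from that article), you can count the inet6 addresses in ifconfig output that are neither link-local (fe80::/10) nor loopback (::1). The function name count_global_inet6 is hypothetical, and it only looks at interface addresses, not at every IPv6-related setting in OPNsense.

```shell
#!/bin/sh
# Hypothetical helper: count IPv6 addresses in `ifconfig` output that are
# neither link-local (fe80::/10) nor the loopback address (::1).
count_global_inet6() {
  awk '$1 == "inet6" {
    addr = tolower($2)
    sub(/%.*/, "", addr)   # drop a zone suffix such as "%em0"
    if (addr !~ /^fe[89ab]/ && addr != "::1") n++
  }
  END { print n + 0 }'
}

# Example with sample ifconfig lines (prints 1: only 2001:db8::1 is counted):
printf '%s\n' \
  "  inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2" \
  "  inet6 ::1 prefixlen 128" \
  "  inet 127.0.0.1 netmask 0xff000000" \
  "  inet6 2001:db8::1 prefixlen 64" |
  count_global_inet6
```

On the firewall, `ifconfig | count_global_inet6` printing 0 would suggest that no interface still holds a routable IPv6 address.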



Afterward, I restarted the NetFlow and Maltrail services, and the CPU usage has stayed under 15%. Thanks to Werner Fischer.


1 comment:

Diego said...

So in my setup IPv6 has been disabled all along ... and I'm still getting high CPU usage from flowd_aggregate.py ... any other ideas?