Monday, August 24, 2020

Palo Alto firewall - Troubleshooting High MP CPU



    • How to Interpret: show system resources

    The command show system resources gives a snapshot of Management Plane (MP) resource utilization including memory and CPU. This is similar to the ‘top’ command in Linux.

    show system resources provides information about used and available memory and whether the MP is using swap. If swap usage remains consistently high, it implies that processes are either failing to release memory or legitimately require more memory than is available.

    To verify whether the MP has been swapping for an extended period of time, refer to the kswapd process shown in the sample output below. If this process has accumulated hours of run time, it could indicate consistent swapping, and you may need to take steps to reduce your memory footprint.

    The values of interest are Swap, Mem, CPU I/O wait time (%wa), virtual memory usage (VIRT), and CPU usage (%CPU). These values are helpful when investigating high MP CPU and/or slow MP response. If the CPU wait time (%wa) is high, it indicates the MP is spending time waiting for I/O (typically disk) to complete, which slows processes down.
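    As an illustration of how these header values relate (a minimal Python sketch for offline analysis, not a Palo Alto Networks tool), the snippet below computes memory and swap utilization percentages from the "KiB Mem" and "KiB Swap" lines of the output:

    import re

    def usage_from_top_header(line):
        """Return (used_kib, total_kib, percent_used) from a 'KiB Mem :' or 'KiB Swap:' line."""
        total = int(re.search(r"(\d+)\s+total", line).group(1))
        used = int(re.search(r"(\d+)\s+used", line).group(1))
        return used, total, 100.0 * used / total

    # Header lines copied from the sample output further below.
    mem_line = "KiB Mem :  4119516 total,   515792 free,  1325328 used,  2278396 buff/cache"
    swap_line = "KiB Swap:  6104084 total,  4836884 free,  1267200 used.  1112294 avail Mem"

    for label, line in (("Mem", mem_line), ("Swap", swap_line)):
        used, total, pct = usage_from_top_header(line)
        # Swap that stays high (or keeps growing), combined with kswapd accumulating
        # run time, suggests sustained swapping rather than a transient spike.
        print(f"{label}: {used} / {total} KiB used ({pct:.1f}%)")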

    To view real-time memory and CPU usage, run the command: show system resources follow

    Most Palo Alto Networks platforms have multi-core CPUs. Some platforms have dedicated processors for the MP and DP, while others use a single processor shared between the MP and DP.

    For example,

     Platform   MP Processor                          MP Cores
     PA-200     Single shared dual-core processor     1
     PA-220     Single shared quad-core processor     2
     PA-800     Single shared 8-core processor        3
     PA-3000    Dedicated dual-core processor         2
     PA-3200    Dedicated quad-core processor         4
     PA-5000    Dedicated quad-core processor         4
     PA-5200    Dedicated multi-core processor        16-24

    The MP processor can be a multi-core processor (some cores may also be used for the DP, depending on the platform), and by default the output lists usage for all cores combined.

    You can press "1" after running "show system resources follow" to toggle view to show separate states:

    Sample output of the command is provided below:

    show system resources
    admin@PA-850 (active)> show system resources

    top - 15:15:55 up 15 days,  6:53,  1 user,  load average: 7.50, 6.82, 6.57
    Tasks: 177 total,   8 running, 168 sleeping,   0 stopped,   1 zombie
    %Cpu(s): 71.1 us,  2.7 sy,  0.3 ni, 25.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem :  4119516 total,   515792 free,  1325328 used,  2278396 buff/cache
    KiB Swap:  6104084 total,  4836884 free,  1267200 used.  1112294 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     4532       20   0   99752   4036   3308 R  94.4  0.1  22001:32 pan_task
     4533       20   0   74712   3808   3252 R  94.4  0.1  22001:56 pan_task
     4537       20   0   74732   3872   3272 R  94.4  0.1  22001:01 pan_task
     4538       20   0   74712   3788   3188 R  94.4  0.1  22002:18 pan_task
     4534       20   0   74712   3724   3132 R  88.9  0.1  22002:01 pan_task
    22131       20   0    4012   1620   1244 R  66.7  0.0   0:37.80 genindex.sh
    30475       20   0   16912   7020   1904 R   5.6  0.2   0:00.04 top
        1       20   0    2320    252    252 S   0.0  0.0   0:10.94 init
        2       20   0       0      0      0 S   0.0  0.0   0:03.84 kthreadd
        3       20   0       0      0      0 S   0.0  0.0   1:13.88 ksoftirqd/0
        5        0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+
    ...snipped...
       46        0 -20       0      0      0 S   0.0  0.0   0:00.00 khelper
       47        0 -20       0      0      0 S   0.0  0.0   0:00.00 netns
      235        0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback
      238        0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
      239        0 -20       0      0      0 S   0.0  0.0   0:00.00 kblockd
      245        0 -20       0      0      0 S   0.0  0.0   0:00.00 ata_sff
      255       20   0       0      0      0 S   0.0  0.0   0:00.00 khubd
      262        0 -20       0      0      0 S   0.0  0.0   0:00.00 edac-poller
      280        0 -20       0      0      0 S   0.0  0.0   0:00.00 rpciod
      327       20   0       0      0      0 S   0.0  0.0  40:24.01 kswapd0
      328       20   0       0      0      0 S   0.0  0.0   0:00.00 fsnotify_m+
      329        1 -19       0      0      0 S   0.0  0.0   0:00.00 nfsiod
      335        0 -20       0      0      0 S   0.0  0.0   0:00.00 kthrotld
      904       20   0       0      0      0 S   0.0  0.0   0:00.00 scsi_eh_0
      907       20   0       0      0      0 S   0.0  0.0   0:00.00 scsi_eh_1
      921       20   0       0      0      0 S   0.0  0.0   0:00.00 spi32766
      968       20   0       0      0      0 S   0.0  0.0   0:43.90 kworker/5:1
      ...snipped...
     5047 nobody    20   0  303308    812    812 S   0.0  0.0   2:55.11 nginx
    ...snipped...
    30466 admin     20   0    2728    704    580 S   0.0  0.0   0:00.01 less
    30471       20   0   17248   7100   2204 S   0.0  0.2   0:00.02 sh
    30476       20   0   16724   6204   1560 S   0.0  0.2   0:00.02 sed


    You can press "1" after running "show system resources follow" to toggle view to show separate states:
    Example from a PA-850 firewall:

    show system resources
    admin@PA-850 (active)> show system resources
    top - 14:21:05 up 11 days, 5:58, 1 user, load average: 6.45, 6.44, 6.45
    Tasks: 173 total, 8 running, 165 sleeping, 0 stopped, 0 zombie
    %Cpu0 : 35.0 us, 7.3 sy, 0.0 ni, 57.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu2 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu4 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu5 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu6 : 20.5 us, 3.0 sy, 0.0 ni, 76.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu7 : 28.1 us, 7.3 sy, 0.0 ni, 64.4 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 4119516 total, 456576 free, 1381004 used, 2281936 buff/cache
    KiB Swap: 6104084 total, 5046640 free, 1057444 used. 1434722 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    4532 root 20 0 99752 5796 4784 R 100.0 0.1 16189:00 pan_task
    4533 root 20 0 74712 5540 4920 R 100.0 0.1 16189:29 pan_task
    4534 root 20 0 74712 5460 4860 R 100.0 0.1 16189:29 pan_task
    4537 root 20 0 74732 5360 4756 R 100.0 0.1 16188:40 pan_task
    4538 root 20 0 74712 5564 4932 R 100.0 0.1 16189:40 pan_task
    21848 root 20 0 4012 1664 1256 R 96.0 0.0 5:27.98 genindex.sh
    3330 root 20 0 48976 4520 2560 S 1.0 0.1 69:47.02 ehmon
    4488 root 20 0 639500 139056 117412 S 0.3 3.4 48:58.79 useridd
    4490 root 20 0 364060 5096 3720 S 0.3 0.1 98:37.45 distributord
    4523 root 20 0 58676 2892 2576 S 0.3 0.1 21:11.89 tund
    4761 root 20 0 131032 14696 3064 S 0.3 0.4 75:23.93 identityclient
    4783 nobody 20 0 288456 8620 3936 S 0.3 0.2 4:48.71 appweb3
    4785 root 20 0 1656580 195980 7320 S 0.3 4.8 32:45.77 logrcvr
    25172 nobody 20 0 2186392 159940 7900 S 0.3 3.9 2:41.97 httpd
    1 root 20 0 2320 508 504 S 0.0 0.0 0:08.10 init


    To toggle Irix mode off, press Shift+I:

    - You can see above that this box has 8 CPUs, 5 of which (CPUs 1-5) are used for the DP. The DP process is pan_task, which is designed to run at all times, so those cores show no idle CPU cycles.
    - In the output above, each individual pan_task (the DP process) shows 100% CPU. This is expected behavior for the top command in Irix mode, where the %CPU column can display values that total more than 100%.
    - Solaris mode divides the %CPU for each process by the number of CPUs in the system so that the total stays at or below 100%, as illustrated in the sketch below.
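
    As a minimal worked example of the difference (Python, using the figures from the PA-850 output above purely for illustration):

    # Irix mode: %CPU is measured against a single CPU, so busy processes can sum to > 100%.
    # Solaris mode: %CPU is divided by the number of CPUs, so the total stays at or below 100%.
    num_cpus = 8                       # the PA-850 example above has 8 CPUs
    irix_pct = 100.0                   # one fully busy pan_task as reported in Irix mode
    solaris_pct = irix_pct / num_cpus  # the same process as reported in Solaris mode
    print(f"Irix: {irix_pct:.1f}%  ->  Solaris: {solaris_pct:.1f}%")  # 100.0% -> 12.5%

    # Five pan_task processes at ~100% each therefore account for roughly
    # 5 * 12.5% = 62.5% of total system CPU when viewed in Solaris mode.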


    top - 15:37:56 up 15 days, 7:15, 1 user, load average: 6.59, 6.65, 6.66
    Tasks: 176 total, 7 running, 167 sleeping, 0 stopped, 2 zombie
    %Cpu0 : 28.0 us, 6.4 sy, 0.5 ni, 64.2 id, 0.0 wa, 0.0 hi, 0.9 si, 0.0 st
    %Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu2 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu4 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu5 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu6 : 17.9 us, 6.0 sy, 0.0 ni, 75.7 id, 0.5 wa, 0.0 hi, 0.0 si, 0.0 st
    %Cpu7 : 43.6 us, 7.3 sy, 0.0 ni, 48.6 id, 0.0 wa, 0.0 hi, 0.5 si, 0.0 st
    KiB Mem : 4119516 total, 292220 free, 1224748 used, 2602548 buff/cache
    KiB Swap: 6104084 total, 4752288 free, 1351796 used. 848512 avail Mem

    Irix mode On
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    4532 root 20 0 99752 3508 2832 R 100.0 0.1 22023:31 pan_task
    4534 root 20 0 74712 3460 2868 R 100.0 0.1 22024:01 pan_task
    4537 root 20 0 74732 3560 2956 R 100.0 0.1 22023:01 pan_task
    4538 root 20 0 74712 3372 2776 R 100.0 0.1 22024:18 pan_task
    4533 root 20 0 74712 3320 2776 R 99.5 0.1 22023:55 pan_task
    23368 root 20 0 4012 1620 1212 R 95.4 0.0 7:05.25 genindex.sh
    3104 root 0 -20 206236 19628 2372 S 5.5 0.5 98:36.40 masterd_apps
    4488 root 20 0 639500 131072 114584 S 4.6 3.2 68:00.68 useridd
    3124 root 15 -5 133252 8904 1240 S 1.8 0.2 202:33.41 sysd
    3307 root 30 10 202592 7432 1928 S 1.4 0.2 85:45.42 python
    41 root 20 0 0 0 0 S 0.5 0.0 17:19.33 rcuc/7
    3686 admin 20 0 4208 1616 1128 R 0.5 0.0 0:00.23 top
    4761 root 20 0 131032 13484 2124 S 0.5 0.3 103:03.36 identityclient
    4804 root 20 0 1204484 3332 1440 S 0.5 0.1 18:27.52 authd
    7675 root 20 0 0 0 0 S 0.5 0.0 0:04.91 kworker/0:2


    Additionally, note that the average CPU usage reported for this box also includes the DP cores, so your monitoring tools will show higher MP usage than the MP cores alone account for. The sketch below shows one way to derive an MP-only average from the per-CPU output.
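
    A minimal Python sketch, assuming (as in the PA-850 example above) that CPUs 1-5 are the DP cores and taking the idle percentages from the "%CpuN" lines:

    # Per-CPU idle percentages taken from the '%CpuN' lines in the PA-850 output above.
    idle_by_cpu = {0: 64.2, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 75.7, 7: 48.6}
    dp_cores = {1, 2, 3, 4, 5}  # assumption: the cores running pan_task on this platform

    all_busy = [100.0 - idle for idle in idle_by_cpu.values()]
    mp_busy = [100.0 - idle for cpu, idle in idle_by_cpu.items() if cpu not in dp_cores]

    print(f"Average across all cores (what many monitors report): {sum(all_busy) / len(all_busy):.1f}%")
    print(f"Average across MP cores only (0, 6, 7):               {sum(mp_busy) / len(mp_busy):.1f}%")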

    Example from a PA-5060 device:
    show system resources follow

     top - 21:23:26 up 4 days,  1:57,  1 user,  load average: 0.00, 0.00, 0.00
    Tasks: 131 total,   1 running, 130 sleeping,   0 stopped,   0 zombie
    Cpu0  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu1  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   4053832k total,  2953796k used,  1100036k free,   163596k buffers
    Swap:  2007992k total,     1080k used,  2006912k free,  1298156k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    2402 root 0 -20 52960 16m 4440 S 0.3 0.4 5:31.90 masterd_apps
    2421 root 15 -5 21680 4408 2508 S 0.3 0.1 18:59.06 sysd
    3106 nobody 20 0 156m 47m 10m S 0.3 1.2 21:01.78 appweb3
    1 root 20 0 2084 696 576 S 0.0 0.0 0:01.88 init


    On this platform, all of the cores shown are dedicated to the MP, so you do not see any pan_task processes. The rest of the logic remains the same.


    • MP-Log

    To check the management plane's historical resource usage, use the following command (this information is logged every 15 minutes). A sketch for mining this log programmatically follows the sample output below.

    less mp-log mp-monitor.log
    admin@firewall1(active)> less mp-log mp-monitor.log
    2022-03-16 06:34:33.901 +0000  --- processes
    Total num processes: 39
    Name                   PID      CPU%  FDs Open   Virt Mem     Res+Swap     State
    pppoe                  6655     0     7          446568       27148        S
    ha_agent               6648     0     11         239500       29040        S
    satd                   6649     0     12         562168       39420        S
    l2ctrl                 6646     0     8          519204       32788        S
    dp-console             4234     0     5          21992        4596         S
    varrcvr                6645     0     34         3011300      145616       S
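
    Because mp-monitor.log accumulates a snapshot roughly every 15 minutes, it is useful for identifying which processes were consuming CPU at a particular time in the past. The sketch below (Python, illustration only; it assumes the "--- processes" table format shown in the excerpt above, which may differ between PAN-OS releases) pulls per-process CPU% out of an exported copy of the log:

    import re

    def top_cpu_from_mp_monitor(path, threshold=10):
        """Yield (timestamp, process_name, cpu_pct) for table rows at or above 'threshold' CPU%."""
        timestamp = None
        with open(path) as fh:
            for line in fh:
                ts = re.match(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", line)
                if ts and "--- processes" in line:
                    timestamp = ts.group(1)
                    continue
                # Rows look like: Name  PID  CPU%  FDs Open  Virt Mem  Res+Swap  State
                row = re.match(r"(\S+)\s+(\d+)\s+(\d+)\s+\d+\s+\d+\s+\d+\s+\S+\s*$", line)
                if timestamp and row and int(row.group(3)) >= threshold:
                    yield timestamp, row.group(1), int(row.group(3))

    # Example usage against a copy of the log exported from the firewall (path is hypothetical):
    # for ts, name, cpu in top_cpu_from_mp_monitor("mp-monitor.log"):
    #     print(ts, name, cpu)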


    • Genindex.sh causing high MP CPU

    show system resources follow
    top - 14:17:29 up 11 days, 5:55, 1 user, load average: 6.84, 6.46, 6.45
    Tasks: 176 total, 8 running, 168 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 73.0 us, 2.6 sy, 0.1 ni, 24.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 4119516 total, 319332 free, 1415160 used, 2385024 buff/cache
    KiB Swap: 6104084 total, 5050440 free, 1053644 used. 1295050 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    4532 root 20 0 99752 5800 4788 R 100.0 0.1 16185:23 pan_task
    4533 root 20 0 74712 5544 4924 R 100.0 0.1 16185:53 pan_task
    4534 root 20 0 74712 5460 4860 R 100.0 0.1 16185:52 pan_task
    4538 root 20 0 74712 5568 4936 R 100.0 0.1 16186:04 pan_task
    4537 root 20 0 74732 5364 4760 R 99.7 0.1 16185:04 pan_task
    21848 root 20 0 4012 1664 1256 R 97.0 0.0 2:03.25 genindex.sh
    3124 root 15 -5 133232 12636 2568 S 1.0 0.3 140:10.34 sysd
    3104 root 0 -20 206236 22044 4280 S 0.7 0.5 70:13.62 masterd_apps
    3307 root 30 10 202592 10124 4512 S 0.7 0.2 62:48.48 python
    4785 root 20 0 1656580 195988 7104 S 0.3 4.8 32:45.55 logrcvr

    ▶ Symptom
    The genindex.sh process is utilizing a large amount of CPU.

    ▶ Environment
    All firewall platforms can be affected after upgrading to PAN-OS 8.1 or later.

    ▶ Cause
    genindex.sh runs after a PAN-OS upgrade to re-index the logs with any new fields that have been added. It runs every 15 minutes and is known to cause CPU spikes on the management plane. Once it has finished indexing all the logs, it stops running.

    This should not cause any issues or be detrimental to the firewall, and it only lasts until all of your logs have been indexed. However, depending on how many logs you have, which varies with your platform and log-retention settings, it can take up to several weeks.

    One way to reduce the amount of time genindex.sh has to run is to make use of log retention: the fewer logs there are to index, the less time it will take for the process to finish.

    ▶ Resolution
    There is no resolution; this is working as expected. Once it has finished indexing all the logs, it will stop running and you should see CPU utilization drop. It is also worth keeping in mind that this process only uses one core, so devices with multiple MP cores are not significantly affected, although the logs and GUI will still show that the management CPU is high.

    In many cases, the impact can be reduced by lowering the amount of logging.


    • Varrcvr process is causing high MP CPU

    ▶ Symptom
    The MP CPU is consistently hitting 100% on this firewall. A recent major change was migration to Panorama; no reports were being run, and the firewall does not generate a large volume of logs.
    When the command "show system resources follow" was executed multiple times, the varrcvr process consistently showed 66% or more CPU in the output.


    show system resources follow


    admin@pa-220> show system resources follow
    top - 17:15:18 up 63 days, 18:28,  1 user,  load average: 3.77, 3.93, 4.10
    Tasks: 139 total,   6 running, 133 sleeping,   0 stopped,   0 zombie
    %Cpu0  : 90.5 us,  8.5 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu2  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  : 38.4 us, 14.1 sy, 47.2 ni,  0.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem :  4119652 total,   254280 free,  2471820 used,  1393552 buff/cache
    KiB Swap:     1972 total,     1972 free,        0 used.  1165668 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     3219 root      20   0   71944  32292   4972 R  99.7  0.8  91773:26 pan_task
     3220 root      20   0   46976   7160   4724 R  99.7  0.2  91776:16 pan_task
     4478 root      20   0 1549004  25420   8980 S  66.6  0.6  28695:37 varrcvr
    27267 root      20   0    3876   1536   1260 R   8.2  0.0   0:05.09 genindex.sh
    27252 root      20   0   17312   7524   2280 S   4.9  0.2   0:00.76 logpurger.sh
     2915 root       0 -20  199232  32176   4548 S   2.0  0.8 432:48.61 masterd_apps
    27212 admin     20   0    4184   1576   1128 R   0.7  0.0   0:00.28 top
    28316 root      20   0   15444   3092   1152 R   0.7  0.1   0:00.02 cat
       10 root      20   0       0      0      0 R   0.3  0.0  57:09.10 rcuc/0
       16 root      20   0       0      0      0 S   0.3  0.0  24:36.92 rcuc/2
     1504 nobody    20   0 2316468 161600  11072 S   0.3  3.9   6:58.72 httpd
    ...snipped...


    ▶ Resolution
    Restart the varrcvr (vardata-receiver) process with the following CLI command.

    debug software restart process vardata-receiver
    admin@pa-220> debug software restart process vardata-receiver
    Process varrcvr was restarted by user admin

    vardata-receiver (varrcvr) is the Vardata Receiver server process. It records URL filtering logs and packet captures sent by the dataplane, and it is also involved with WildFire logs.


    • PA-500 High Management CPU and poor performance with high logging

    ▶ Symptom
    The PA-500 runs very slowly, and the management CPU can be observed at 98-100% almost all of the time.

    The MP CPU spikes up very frequently, and "logrcvr", "mgmtsrvr", and "genindex.sh" appear to be the offending processes:

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
    3265 20 0 1095m 282m 4780 S 35.4 14.2 93995:05 logrcvr 
    14961 20 0 5260 1560 1232 R 16.9 0.1 0:51.34 genindex.sh 

    3265 20 0 1095m 205m 3264 S 76.7 10.4 94000:37 logrcvr 
    3217 20 0 1198m 315m 7280 S 14.7 15.9 38133:24 mgmtsrvr


    High IOWait was seen as well; however, disk space utilization was fine:

    CPU USAGE DISTRIBUTION     Avg     Max      Max Date
    -----------------------------------------------------
    User                       54%     54.3%    01-08-06:15
    System                     13%     13.9%    01-05-12:11
    Nice                        2%      2.6%    01-09-02:36
    Idle                       25%     26.6%    01-04-11:29
    IOWait                      1%      1.1%    01-04-11:29


    Due to poor management plane performance, commits and other tasks may be affected and get stuck; using commit force should help push them through.

    ▶ Environment
    Palo Alto PA-500 Firewall.
    PAN-OS 8.0 and 8.1.
    High rate of logging configured.

    ▶ Cause
    When the average IOWait is greater than 1%, it is likely causing processes to run slower than normal, and it is important to reduce the amount of logging.
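
    As a minimal illustration of that guideline (Python, with hypothetical sample values), the sketch below averages %wa readings collected from repeated "show system resources" snapshots and flags when the 1% threshold is exceeded:

    # Hypothetical %wa samples collected from repeated 'show system resources' snapshots.
    iowait_samples = [0.3, 1.8, 2.4, 0.9, 1.6]

    avg_iowait = sum(iowait_samples) / len(iowait_samples)
    print(f"Average IOWait: {avg_iowait:.1f}%")
    if avg_iowait > 1.0:
        # Sustained IOWait above ~1% suggests disk I/O is the bottleneck;
        # reducing the logging rate is the first step recommended above.
        print("IOWait is above 1% - consider reducing the amount of logging.")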

    ▶ Resolution
    1. Set the maximum number of days to keep logs. In this case, the max days value was configured to '90', which automatically purged all logs older than 90 days.
    2. The device was also upgraded to the latest stable release. Both of these actions improved performance remarkably.



