Friday, January 18, 2013

How to find and kill zombie processes on Linux

SkyHi @ Friday, January 18, 2013

A process is called a zombie process if it has completed, but its PID and process entry remain in the Linux process table. A process is removed from the process table when it completes and its parent process reads its exit status by using the wait() system call. If a parent process fails to call wait() for whatever reason, its child process is left in the process table, becoming a zombie.
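The life cycle described above can be reproduced in a few lines. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names read_state and zombie_demo are made up for this illustration. The parent forks a child that exits immediately, delays its wait() call long enough to observe the child in state Z, and then reaps it.

```python
import os
import time


def read_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state letter is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo(exit_code=7, timeout=5.0):
    """Fork a child that exits at once; return (state_before_wait, exit_code, state_after_wait)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: finish immediately without running cleanup handlers

    # Parent: deliberately delay wait() and poll until the child shows up as a zombie.
    deadline = time.monotonic() + timeout
    before = read_state(pid)
    while before != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        before = read_state(pid)

    _, status = os.waitpid(pid, 0)  # reap: this removes the process-table entry
    after = read_state(pid)
    return before, os.WEXITSTATUS(status), after


if __name__ == "__main__":
    print(zombie_demo())
```

The first element of the result is the state seen while the parent had not yet called wait(), and the last is the state after reaping, when the /proc entry should no longer exist.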
To find zombie processes on Linux:
$ ps axo stat,ppid,pid,comm | grep -w defunct
Z     250 10242 fake-prog <defunct>
The above command searches for processes with zombie (defunct) state, and displays them in (state, PPID, PID, command-name) format. The sample output shows that there is one zombie process associated with “fake-prog”, and it was spawned by a parent process with PID 250.
Killing zombie processes is not obvious since zombie processes are already dead. You can try two options to kill a zombie process on Linux as follows.
First, you can try sending the SIGCHLD signal to the zombie’s parent process using the kill command. A parent that handles SIGCHLD responds by calling wait() and reaping its dead children. Note that the ps command above gives you the PPID (PID of the parent process) of each zombie. In our example, the PPID of the zombie is 250.
$ sudo kill -s SIGCHLD 250
If the zombie process still does not go away, you can kill its parent process (e.g., 250).
$ sudo kill -9 250
Once its parent process is killed, the zombie is adopted by the init process, which is the ancestor of all processes in Linux. The init process periodically calls wait() to reap any zombie process.
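Sending SIGCHLD only helps if the parent actually handles that signal by calling wait(). As an illustration of what such a parent might look like, here is a small sketch in Python; the names reaped, reap_children and run_demo are hypothetical, and it assumes the code runs in the main thread of a Unix process (Python only allows signal handlers to be installed there).

```python
import os
import signal
import time

reaped = {}  # pid -> exit code of every child collected by the handler


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


def run_demo(n=3, timeout=5.0):
    """Fork n short-lived children and let the SIGCHLD handler reap them."""
    old = signal.signal(signal.SIGCHLD, reap_children)
    pids = []
    try:
        for i in range(n):
            pid = os.fork()
            if pid == 0:
                os._exit(i)  # child: exit with a distinct code
            pids.append(pid)
        deadline = time.monotonic() + timeout
        while not all(p in reaped for p in pids) and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore the previous handler (None means it was not set from Python).
        signal.signal(signal.SIGCHLD, old if old is not None else signal.SIG_DFL)
    return pids


if __name__ == "__main__":
    children = run_demo()
    print({pid: reaped.get(pid) for pid in children})
```

The handler loops with WNOHANG because several children can exit before the signal is delivered once. A parent written this way does not accumulate zombies, and a manual kill -s SIGCHLD would simply trigger one more pass of the same loop.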

A zombie process is an inactive computer process. According to the Wikipedia article, "...On Unix operating systems, a zombie process or defunct process is a process that has completed execution but still has an entry in the process table, allowing the process that started it to read its exit status. In the term's colorful metaphor, the child process has died but has not yet been reaped..."

So how do I find zombie processes?

Use the top or ps command:
# top
# ps aux | awk '{ print $8 " " $2 }' | grep -w Z
Z 4104
Z 5320
Z 2945

How do I kill a zombie process?

You cannot kill zombies, as they are already dead. If you have too many zombies, kill the parent process or restart the service that spawned them.
You can try sending a signal to a zombie using a PID obtained from either of the above commands. For example, for the zombie process with PID 4104:
# kill -9 4104
Please note that kill -9 is not guaranteed to remove a zombie process: the zombie has already exited, so the signal has no effect on it, and its entry goes away only when its parent reaps it.

How do I automate zombie process killing?

Write a script and schedule it as a cron job.
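As one possible starting point for such a script, the sketch below scans /proc and groups zombies by parent PID, so a cron job could log or alert on the parents that need attention. It only reports; it does not kill anything. The function names parse_stat and find_zombies are invented for this example, and it assumes a Linux /proc layout.

```python
import os
from collections import defaultdict


def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or parentheses,
    # so split on the first "(" and the last ")".
    left, rest = text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    fields = rest.split()
    return int(pid_str), comm, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """Return {ppid: [(pid, comm), ...]} for every process currently in state Z."""
    zombies = defaultdict(list)
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process vanished or its entry was unreadable
        if state == "Z":
            zombies[ppid].append((pid, comm))
    return dict(zombies)


if __name__ == "__main__":
    for ppid, children in sorted(find_zombies().items()):
        print(f"parent {ppid} has {len(children)} zombie(s): {children}")
```

Reporting the parent rather than the zombie is deliberate: the parent is the process that has to call wait(), so it is the one to investigate or restart.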

Zombies don’t just appear in scary movies anymore; sometimes they also appear on your Linux systems. But don’t fret, they are mostly harmless.

What is a Zombie Process?

Before we get started, I want to cover what exactly a Zombie process is.
Linux and Unix both allow a process to create a sub-process, otherwise known as a “Child Process”. Once a process creates a new sub-process, the first process becomes a “Parent Process”, as it has spawned a child process during its execution.
A Zombie or defunct process is a process that has finished its execution and is waiting for its Parent Process to read its exit status. Because the child process has finished, it is technically a “dead” process; however, since it is waiting for its parent, there is still an entry in the process table. Zombies are most commonly seen when the parent is still running but has not called wait(), for example because it is stalled or never collects its children’s exit status. If the parent dies, its zombies are re-parented to the init process, which normally reaps them.

How to spot a Zombie Process

Zombie processes can be found easily with the ps command. Within the ps output there is a STAT column which shows each process's current status; a zombie process will have Z as the status. In addition to the STAT column, zombies commonly have the word <defunct> in the CMD column as well.
$ ps -elf | grep Z
1 Z madflojo 28827 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]
1 Z madflojo 28828 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]
1 Z madflojo 28829 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]
1 Z madflojo 28830 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]
1 Z madflojo 28831 28821 0 80 0 - 0 exit 12:28 pts/4 00:00:00 [zombies.aahhh]

What is the difference between a Zombie and Orphaned Process?

Orphaned processes are very similar to Zombie processes; however, there is one major difference. An Orphaned process is a child process that is still active but whose parent has died. Unlike a zombie, an orphaned process is still running; it is adopted by the init process.

How to spot an Orphaned Process

Orphaned processes can be found easily with the ps command as well. Within the ps output there is a PPID column which shows each process's parent process ID; an orphaned process will have a PPID of 1, which is the init process.
You may be thinking to yourself, how do I differentiate an Orphaned process from a Daemon process? Well, in short, there is no difference. For all intents and purposes a daemon process is an orphaned process; however, the exiting of the parent process is on purpose rather than by error.
$ ps -elf | grep sshd
4 S root 718 1 0 80 0 - 12487 poll_s Jun07 ? 00:00:00 /usr/sbin/sshd -D

What to do about Zombie Processes?

Before performing any activity to clean up zombie processes, it is best to identify the root cause of the issue. Zombie processes do not indicate a normal state for your system. They may be benign for now; however, like real zombies, they become more troublesome in large numbers. They also indicate either a system issue or an application issue, depending on the source of the processes.
The steps necessary to clean up zombie processes are complicated and very situational. Below are a couple of high-level answers that can guide you to a solution.

If the parent process is still active

If the parent process of the zombie or zombies is still active (not process ID 1), then this is an indication that the parent process is stalled on a certain task and has not yet read the exit status of the child processes. At this point the resolution is extremely situational; you can use the strace command to attach to the parent process and troubleshoot from there.
You may also be able to make the parent process exit cleanly, taking its zombie children with it, by issuing the kill command. If you do run the kill command, I suggest that you use the default signal -15 (SIGTERM) rather than -9 (SIGKILL), as SIGTERM tells the parent process to exit cleanly, which makes it more likely to read the exit status of its zombie children.

If the parent process is no longer active

If the parent process is no longer active, the zombies are re-parented to the init process (PID 1), which normally reaps them on its own, so no clean-up is needed. If zombies with a PPID of 1 linger anyway, init is not reaping them, and the clean-up becomes a choice: you can leave the zombie processes on your system, or you can simply reboot. If the zombie processes are only in small numbers and not recurring or multiplying, then it may be best to leave them be until the next reboot. If, however, they are multiplying or in large numbers, then this is an indication that there is a significant issue with your system.


Wednesday, January 16, 2013

Linux Iptables ip_conntrack: table full, dropping packet error and solution

SkyHi @ Wednesday, January 16, 2013

Some readers may be interested to know what ip_conntrack is in the first place, and why it fills up. If you run an iptables firewall and have rules that act upon the state of a packet, then the kernel uses ip_conntrack to keep track of which state each connection is in, so that the firewall rule logic can be applied against them. If you have a system that's getting a lot of network activity (high rates of connections, lots of concurrent connections, etc.) then the table will accumulate entries.
An entry remains until the connection is torn down, for example by an RST packet sent from the original IP address, or until its timeout expires. If you have a flaky network somewhere between you and the clients accessing your server, packet loss can cause the RST packets to be dropped and leave orphaned entries lingering in your ip_conntrack table. This can also happen if you have a malfunctioning switch or NIC... not necessarily a routing problem out on the internet somewhere.
Typically I've seen this trouble crop up when a server is the target of a DDoS attack. Filling up the ip_conntrack table is a relatively easy way to knock a server offline, and attackers know this.
As Major suggested, you can get short term relief by increasing the size of the table. However, these entries are held in memory by the kernel. The bigger you make the table, the more memory it will consume. That memory could be used by your server to serve requests if you really don't need the stateful firewall capability. Don't waste resources on this feature if you really don't need it.
Another option to consider is turning OFF iptables rules that use ip_conntrack so the state table is not used at all. Anything with "-m state" or "-t nat" can be turned off. If you want to just flush all your iptables rules, you can do an "iptables -P" to set a default allow policy and "iptables -F" to flush all the rules. On an RHEL or CentOS system you can just do "service iptables stop".
Once iptables is no longer using ip_conntrack, you can reclaim the memory the table was using by unloading the related kernel modules.
rmmod ipt_MASQUERADE
rmmod iptable_nat
rmmod ipt_state
rmmod ip_conntrack
Then you will have an empty ip_conntrack that will stay empty. I mention this because a lot of sysadmins have hordes of iptables rules installed as a matter of course, and don't recognize the downside of having them present. You can still use iptables, but to avoid the use of ip_conntrack simply don't use rules based on stateful logic.

One other aspect to consider when raising your max conntrack setting is the depth of the memory objects used to track these connections, henceforth referred to as "buckets".
On RedHat the default hashsize for the conntrack module is 8192. The rule of thumb is to allow for no more than 8 connections per bucket so you would set your conntrack size to be equal to 8 * hashsize. This is why RedHat defaults the ip_conntrack_max to 65536.
You can tweak these settings by adjusting not just the ip_conntrack_max setting but the hashsize option to the ip_conntrack module.
So, for example, if you were to set your ip_conntrack_max to 131072 without modifying the default hashsize of 8k, you are allowing for a bucket depth of 16 entries. Thus the kernel potentially has to dig deeper to find that one connection object in its bucket.
There are a number of schools of thought on how best to address this but in practice I have found that, given the resources, a shallower bucket is better.
For a server that does extremely heavy network traffic, and of course has the memory to spare, you would want to keep the average bucket depth to 2 or 4.
Hashsize, to my knowledge, isn't a dynamic setting so you will need to load the ip_conntrack module with the option:
hashsize =
So in Major's example above, if you want to double your server's capacity for tracked connections while not doubling the lookups you would reload the module with:
options ip_conntrack hashsize=16384
This keeps the items per bucket to 8. I have seen machines with a depth beyond 8 get completely cowed under heavy network load. Since memory is relatively plentiful nowadays, you can increase the efficiency of the lookups by making this 4 connections per bucket, or even 2, by doing the simple math and reloading the module with the right options.
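The "simple math" is just a division of ip_conntrack_max by hashsize. A small sketch, using the numbers quoted in this comment (the function names bucket_depth and hashsize_for are invented for the example):

```python
def bucket_depth(conntrack_max, hashsize):
    """Average number of tracked connections per hash bucket when the table is full."""
    return conntrack_max / hashsize


def hashsize_for(conntrack_max, target_depth):
    """Smallest hashsize that keeps the average bucket depth at or below target_depth."""
    return -(-conntrack_max // target_depth)  # ceiling division


if __name__ == "__main__":
    print(bucket_depth(65536, 8192))   # default max with default hashsize
    print(bucket_depth(131072, 8192))  # doubled max, unchanged hashsize
    for depth in (8, 4, 2):
        print(depth, hashsize_for(131072, depth))
```

For a maximum of 131072 entries, depths of 8, 4 and 2 work out to hashsize values of 16384, 32768 and 65536 respectively; the value you pick is what goes into the module's hashsize option.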
Hope that helps.
Here is some relatively dated yet still applicable information on the subject:

Q. My Red Hat Enterprise Linux 5 server is reporting the following message in /var/log/messages (syslog):
ip_conntrack: table full, dropping packet.
How do I fix this error?
A. If you notice the above message in syslog, it looks like the conntrack database doesn't have enough entries for your environment. Connection tracking by default handles up to a certain number of simultaneous connections. This number depends on your system's maximum memory size.
You can easily increase the number of maximal tracked connections, but be aware that each tracked connection eats about 350 bytes of non-swappable kernel memory!
To print current limit type:
# sysctl net.ipv4.netfilter.ip_conntrack_max
To increase this limit to e.g. 12000, type:
# sysctl -w net.ipv4.netfilter.ip_conntrack_max=12000
Alternatively, add the following line to the /etc/sysctl.conf file:
net.ipv4.netfilter.ip_conntrack_max=12000
The following will tell you how many sessions are open right now:
# wc -l /proc/net/ip_conntrack
5000 /proc/net/ip_conntrack
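To put the current count in context, it can be compared against the limit and the rough memory cost. The sketch below is one way to do that, assuming the limit is readable under /proc/sys at the path corresponding to the sysctl name above and using the approximate figure of 350 bytes per entry quoted earlier; conntrack_usage and read_counts are hypothetical helper names.

```python
def conntrack_usage(count, limit, bytes_per_entry=350):
    """Return how full the table is and roughly how much kernel memory it holds."""
    return {
        "percent_full": 100.0 * count / limit,
        "approx_bytes": count * bytes_per_entry,  # estimate only, not a measured value
    }


def read_counts(table="/proc/net/ip_conntrack",
                limit_file="/proc/sys/net/ipv4/netfilter/ip_conntrack_max"):
    """Read the current entry count and the configured maximum from /proc."""
    with open(table) as f:
        count = sum(1 for _ in f)  # one tracked connection per line
    with open(limit_file) as f:
        limit = int(f.read().strip())
    return count, limit


if __name__ == "__main__":
    try:
        count, limit = read_counts()
    except OSError as exc:
        print(f"could not read conntrack data: {exc}")
    else:
        print(conntrack_usage(count, limit))
```

With the sample figures from this article (5000 entries against a limit of 12000), the table would be a little under 42% full and hold roughly 1.75 MB of entries.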

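Regarding open connections, the best way to keep track of them is by checking /proc/net/ip_conntrack. For example, to count the entries with a destination port of 80 or 443 (the trailing space in the pattern keeps ports such as 8080 from matching):
egrep 'dport=(80|443) ' /proc/net/ip_conntrack | wc -l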
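Text matching on dport= can over-count, because each entry carries a second dport= field for the reply direction. The following sketch counts per port using only the first dport= on each line, which belongs to the original direction. The sample lines are made up for illustration and only approximate the real file format; count_by_dport is a hypothetical helper.

```python
import re
from collections import Counter

DPORT = re.compile(r"\bdport=(\d+)\b")

# Hypothetical sample entries, shaped like lines from /proc/net/ip_conntrack.
SAMPLE = [
    "tcp 6 431999 ESTABLISHED src=10.0.0.5 dst=10.0.0.1 sport=40000 dport=80 "
    "src=10.0.0.1 dst=10.0.0.5 sport=80 dport=40000 [ASSURED] use=1",
    "tcp 6 100 TIME_WAIT src=10.0.0.6 dst=10.0.0.1 sport=40001 dport=443 "
    "src=10.0.0.1 dst=10.0.0.6 sport=443 dport=40001 [ASSURED] use=1",
    "tcp 6 300 ESTABLISHED src=10.0.0.7 dst=10.0.0.1 sport=40002 dport=8080 "
    "src=10.0.0.1 dst=10.0.0.7 sport=8080 dport=40002 [ASSURED] use=1",
    "tcp 6 50 ESTABLISHED src=10.0.0.1 dst=10.0.0.9 sport=80 dport=51000 "
    "src=10.0.0.9 dst=10.0.0.1 sport=51000 dport=80 [ASSURED] use=1",
]


def count_by_dport(lines, ports=(80, 443)):
    """Count entries whose original-direction destination port is in ports."""
    counts = Counter()
    for line in lines:
        match = DPORT.search(line)  # first match is the original direction
        if match and int(match.group(1)) in ports:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    print(count_by_dport(SAMPLE))
```

To run it against a live system, pass an open file object for /proc/net/ip_conntrack instead of SAMPLE.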