Deadlock in SIGHUP signal handler
I noticed that icecast sometimes stops responding while reloading its configuration. This seems to be caused by the mutex code in the signal handlers. It is possible to end up in a state where the mutex has been acquired by global_lock() (e.g. due to an incoming connection) but not yet released when the signal handler is called. global_lock() is then called again in the signal handler, but since the mutex is already locked by the same thread, that call blocks forever and the server stops handling new incoming requests.
- icecast2=2.4.0-1.1+deb8u1 (the current version in Debian Jessie)
- icecast2=2.4.2-1~bpo8+1 (icecast2 binary only)

I have encountered and reproduced this problem on icecast2=2.4.0-1.1+deb8u1. While trying to isolate the problem, I accidentally used the sources for icecast2=2.4.2-1~bpo8+1 and copied only the icecast2 binary over the already installed binary, but that does not really change the outcome.
Steps to reproduce:
- Simulate a lot of concurrent connection attempts
- While doing this, kill -HUP icecast2
- Server handles those requests, reloads the config a couple of times
- Server stops handling requests suddenly
# ab -t 20 -c 50 http://localhost:8000/ & for i in `seq 1 20`; do sleep 1; kill -HUP $(pidof icecast2); done
2409
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
apr_socket_recv: Connection reset by peer (104)
Total of 15175 requests completed
+ Exit 104 ab -t 20 -c 50 http://localhost:8000/
#
I added some extra logging to global_lock(), and a usleep(100000) call there right after the mutex is acquired. In this setup, a simple

curl http://localhost:8000/ & kill -HUP $(pidof icecast2)

is enough to trigger the problem every time. The log clearly shows the mutex being acquired for the incoming connection, followed by an attempt to lock it again in the signal handler before it is released. The daemon then stops responding.
The underlying cause seems to be the attempt to acquire the global lock in the signal handling code. The same problem may occur in other scenarios as well. It looks to me as if only _sig_hup() can trigger this bug.
I'm not too familiar with the icecast code, so I'm not really aware of any best/common practices yet, but I think an approach where the signal handler sets a global flag (global.reload_config = true), which is then handled in the main event loop (outside the signal handler), would do the trick, much the same way as