Deadlock in SIGHUP signal handler
I noticed that icecast sometimes stops responding while it reloads its configuration. This seems to be caused by the mutex code in the signal handlers. The server can end up in a state where the mutex has been acquired by global_lock() (for example, while handling an incoming connection) but not yet released when the signal handler is called. global_lock() is then called again in the signal handler, but since the mutex is already held by the same thread, that call blocks forever and the server stops handling new incoming requests.
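To illustrate why the second global_lock() call never returns, here is a minimal sketch. It assumes global_lock() wraps a plain pthread mutex (an assumption; I have not checked which mutex type icecast actually uses), and relock_from_same_thread() is a hypothetical name. The sketch uses an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) only so that the relock is reported as EDEADLK instead of hanging. With a normal mutex, a second pthread_mutex_lock() from the owning thread deadlocks, and for the default type POSIX leaves the behaviour undefined (on glibc it also hangs).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: lock a mutex twice from the same thread and return
 * the result of the second lock. An error-checking mutex is used so the
 * relock fails with EDEADLK; with a normal mutex it would block forever. */
int relock_from_same_thread(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);      /* first lock, e.g. connection handling */
    rc = pthread_mutex_lock(&mutex); /* second lock, e.g. the SIGHUP handler */

    pthread_mutex_unlock(&mutex);    /* release the one successful lock */
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Calling relock_from_same_thread() returns EDEADLK, which is exactly the situation that turns into a silent hang when the mutex does not do error checking.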
Affected versions:
- icecast2=2.4.0-1.1+deb8u1 (the current version in Debian Jessie)
- icecast2=2.4.2-1~bpo8+1 (icecast2 binary only)
I have encountered and reproduced this problem on icecast2=2.4.0-1.1+deb8u1. While trying to isolate the problem, I accidentally used the sources for icecast2=2.4.2-1~bpo8+1 and copied only the icecast2 binary over the already installed version, but that does not change the outcome.
Steps to reproduce:
- Simulate a lot of concurrent connection attempts
- While doing this, repeatedly send SIGHUP to the server: kill -HUP $(pidof icecast2)
Expected result:
- Server handles those requests, reloads the config a couple of times
Actual result:
- Server stops handling requests suddenly
```
# ab -t 20 -c 50 http://localhost:8000/ & for i in `seq 1 20`; do sleep 1; kill -HUP $(pidof icecast2); done
[1] 2409
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
apr_socket_recv: Connection reset by peer (104)
Total of 15175 requests completed
[1]+ Exit 104 ab -t 20 -c 50 http://localhost:8000/
#
```
Probable cause:
I added some extra logging to _sig_hup() and global_lock(), and a usleep(100000) call to the latter right after it acquires the mutex. With this setup, a simple curl http://localhost:8000/ & kill -HUP $(pidof icecast2) is enough to trigger the problem every time. The log clearly shows the mutex being acquired for the incoming connection, followed by a second attempt to lock it in the signal handler before it has been released. The daemon then stops responding.
The underlying cause seems to be the attempt to acquire the global lock inside the signal handling code. The same deadlock could possibly occur in other scenarios as well, but as far as I can see, only _sig_hup() can trigger it.
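The window that the usleep() instrumentation widens can also be hit deterministically in a small standalone program. This is a sketch with hypothetical names (probe_mutex, probe_sig_hup, probe_signal_during_lock), not icecast code: the thread takes the mutex and raises SIGHUP while still holding it, so the handler runs on the thread that owns the lock, as in the scenario described above. The handler uses pthread_mutex_trylock() instead of a blocking lock, so it reports the would-be deadlock (EBUSY) rather than hanging. Neither call is on the POSIX list of async-signal-safe functions, which is a further reason to keep locking out of signal handlers; the trylock here is for demonstration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the global mutex taken by global_lock(). */
static pthread_mutex_t probe_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t probe_saw_busy = 0;

/* Hypothetical SIGHUP handler. A blocking lock here would hang forever;
 * trylock (demonstration only, not async-signal-safe) just reports it. */
static void probe_sig_hup(int sig)
{
    (void)sig;
    if (pthread_mutex_trylock(&probe_mutex) == EBUSY)
        probe_saw_busy = 1;
    else
        pthread_mutex_unlock(&probe_mutex);
}

/* Take the mutex, then deliver SIGHUP to this same thread while holding it.
 * raise() does not return until the handler has run. Returns 1 if the
 * handler found the mutex already held. */
int probe_signal_during_lock(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_sig_hup;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;

    pthread_mutex_lock(&probe_mutex);  /* like global_lock() for a connection */
    raise(SIGHUP);                     /* handler runs inside the locked window */
    pthread_mutex_unlock(&probe_mutex);
    return probe_saw_busy;
}
```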
Proposed solution:
I'm not very familiar with the icecast code yet, so I'm not aware of its best or common practices, but I think an approach where the signal handler only sets a global flag (global.reload_config = true), which is then handled in the main event loop outside the signal handler, would do the trick, much like the way _sig_die() works.
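A minimal sketch of this approach follows. All names (demo_reload_config, demo_sig_hup, demo_main_loop_iteration, demo_reload_configuration) are hypothetical stand-ins, not icecast's actual code or its global structure. Instead of a plain boolean field, the sketch uses a volatile sig_atomic_t, the type C guarantees can be safely written from a signal handler. The handler only records the request; the mutex is taken later in the main loop, in normal thread context, where blocking on it is harmless.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-ins for the global state and the global mutex. */
static pthread_mutex_t demo_global_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t demo_reload_config = 0; /* written by the handler */
static int demo_reload_count = 0;                    /* changed under the mutex */

/* Async-signal-safe handler: it only sets the flag, it never locks. */
static void demo_sig_hup(int sig)
{
    (void)sig;
    demo_reload_config = 1;
}

int demo_install_sighup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_sig_hup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Hypothetical placeholder for the real configuration reload. */
static void demo_reload_configuration(void)
{
    demo_reload_count++;
}

/* One pass of the main event loop: handle a pending reload request. */
void demo_main_loop_iteration(void)
{
    if (demo_reload_config) {
        demo_reload_config = 0; /* clear first so a new SIGHUP is not lost */
        pthread_mutex_lock(&demo_global_mutex);
        demo_reload_configuration();
        pthread_mutex_unlock(&demo_global_mutex);
    }
}
```

Because the flag is cleared before the reload runs, a SIGHUP that arrives during a reload is picked up on the next loop iteration instead of being dropped.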