A few weeks ago I was looking into a problem that one of our customers encountered, where an Android application incorrectly used an Android Timer to run a heartbeat check, i.e. to check whether the server side of the application is accessible and alive.
In essence, the application started generating a large number of requests to the heartbeat resource after a very specific kind of network outage.
The issue surfaced only under a very specific set of circumstances: a network request simply hangs. I managed to replicate it by fiddling with the router and blocking the network traffic in such a way that the initial handshake still succeeds.
This is typically the scenario in which a lot of application design problems come to the surface. It can also be replicated by blocking the traffic at the firewall. In this case the client sends the request and never times out on it, but never receives a response either, so the socket is kept open while no data arrives from the server.
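A hung request like this usually means the client has no timeouts configured. The sketch below shows one way to bound a heartbeat call, assuming the request is made with HttpURLConnection (the post does not say which HTTP client the application used). The class name, method name and timeout values are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class HeartbeatRequest {

    /**
     * Prepares (but does not send) a GET request with explicit timeouts.
     * Hypothetical helper: the name and parameters are illustrative only.
     */
    public static HttpURLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            // openConnection() only creates the object; no network I/O happens yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Both timeouts default to 0, which means "wait forever".
            conn.setConnectTimeout(connectTimeoutMs); // limit on establishing the connection
            conn.setReadTimeout(readTimeoutMs);       // limit on waiting for data once connected
            conn.setRequestMethod("GET");
            return conn;
        } catch (IOException e) {
            // Wrapped only to keep the sketch's signature simple.
            throw new UncheckedIOException(e);
        }
    }
}
```

With a read timeout set, the case described above (handshake succeeds, response never arrives) ends in a SocketTimeoutException instead of blocking the timer thread indefinitely.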
The problem we eventually uncovered was that the Timer job was started with the Timer.scheduleAtFixedRate method.
I can only assume that the person who originally used the method misunderstood its meaning: I believe the idea was that it guarantees the task is fired at the specified interval. The problem is that this method tries to make up for every call that was missed when one of the calls takes too long. If one run of the timer's task takes N minutes (potentially because the socket thread was hanging and no timeout occurred) and the original interval is one minute, then as soon as the timer is back to normal, i.e. the hanging thread returns, it attempts to fire all the missed calls in rapid succession, which is effectively a great tool for mounting a DDoS attack against the server side.
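The catch-up behaviour is easy to reproduce on a desktop JVM. This is a minimal sketch, not the customer's code: the first run of the task sleeps to stand in for the hung request, and the class name, method name and timings are invented for the demonstration.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

class TimerCatchUpDemo {

    /**
     * Schedules a task every periodMs whose first run blocks for stallMs
     * (simulating a hung network call), then returns how many times the
     * task ran within observeMs.
     */
    public static int countRuns(boolean fixedRate, long periodMs, long stallMs, long observeMs) {
        AtomicInteger runs = new AtomicInteger();
        Timer timer = new Timer(true); // daemon thread, so the JVM can exit
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                if (runs.incrementAndGet() == 1) {
                    sleep(stallMs); // only the first run "hangs"
                }
            }
        };
        if (fixedRate) {
            timer.scheduleAtFixedRate(task, 0, periodMs); // replays missed runs
        } else {
            timer.schedule(task, 0, periodMs);            // skips missed runs
        }
        sleep(observeMs);
        timer.cancel();
        return runs.get();
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // 100 ms period, 1 s stall, observed for 1.25 s
        System.out.println("scheduleAtFixedRate runs: " + countRuns(true, 100, 1000, 1250));
        System.out.println("schedule runs:            " + countRuns(false, 100, 1000, 1250));
    }
}
```

With these timings the fixed-rate variant should report well over ten runs, most of them fired back to back right after the stall ends, while the fixed-delay variant reports only a handful.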
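So, in such circumstances remember to use the Timer.schedule method instead: it schedules each run relative to the actual execution time of the previous one, so missed runs are not replayed.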
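For a heartbeat, the wiring might look roughly like this. It is only a sketch: HeartbeatScheduler and its start method are hypothetical names, and the actual check is passed in as a Runnable because the post does not show the original request code.

```java
import java.util.Timer;
import java.util.TimerTask;

class HeartbeatScheduler {

    /**
     * Starts a fixed-delay heartbeat. Call cancel() on the returned Timer to stop it.
     * Hypothetical helper, not part of any Android or JDK API.
     */
    public static Timer start(Runnable check, long periodMs) {
        Timer timer = new Timer("heartbeat", true); // daemon timer thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    check.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would terminate the timer thread and
                    // silently stop all future heartbeats, so log and carry on.
                    System.err.println("heartbeat failed: " + e);
                }
            }
        }, 0, periodMs); // fixed delay: a slow run pushes later runs back
        return timer;
    }
}
```

If a check stalls, the next one simply starts late; there is no backlog of missed runs waiting to be fired.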