Wednesday, September 21, 2016

Apache Http client gets stuck


The Apache libraries are extremely popular in the development community. In my experience, however, they are frequently misused because of a lack of proper documentation and meaningful examples.

The examples provided usually cover only the basic use cases. More importantly, they say little about some aspects of using the library that can have dire consequences.

I came across one such "hidden secret" a couple of weeks ago. Our customer used the HttpClient class in an Android application to check whether the server was alive. The idea behind the code was to send a GET request for a non-existent server resource: if the server responds with a 404 error, it is alive.
I don't want to discuss the design here. I don't agree with this approach, but it works.

To do this, the following code was used:
private final HttpClient client;
private final HttpGet request;
....
try {
    request.setURI(URI.create(getServerURL() + validEndpoint));
    HttpResponse response = client.execute(request); // the response entity is never consumed or closed
} catch (Throwable e) {
...
}
This looks acceptable for the scenario described above. However, while investigating an unrelated issue in the application, I found that this code got stuck while executing the request and blocked subsequent attempts to use the client.

Further investigation showed that the input stream HttpClient had opened was waiting for more data from the socket, which blocked attempts to shut down the client.
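The blocking is easier to see in a simplified model. The sketch below is hypothetical and is not HttpClient's code: it assumes a client with a single connection that is handed back only when the caller closes the response stream, which is roughly how HttpClient 4.x treats streamed entities. The names `SingleConnectionModel` and `execute` are made up for illustration, and the timeout exists only so that the demo terminates instead of hanging.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of a client that owns exactly one connection.
 * The connection is returned only when the response stream is closed.
 */
final class SingleConnectionModel {
    private final Semaphore connection = new Semaphore(1);

    /**
     * "Executes" a request whose response body is {@code body}.
     * Waits at most {@code timeoutMs} for the connection and returns null
     * if it is still in use (a real client would simply keep waiting).
     */
    public InputStream execute(byte[] body, long timeoutMs) throws InterruptedException {
        if (!connection.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            return null;
        }
        return new ByteArrayInputStream(body) {
            private boolean released;

            @Override
            public void close() throws IOException {
                super.close();
                if (!released) { // hand the connection back only once
                    released = true;
                    connection.release();
                }
            }
        };
    }
}
```

In this model, the first `execute` call gets the connection; a second call made before the first response stream is closed returns `null` after the timeout; once the first stream is closed, the connection is available again.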

I remember very well the rule of thumb for Java streams in general: you MUST close streams no matter what. Failing to do so causes resource leaks and can cause blocking. It is not always straightforward, but it has to be done.
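A minimal illustration of that rule, using plain JDK classes rather than HttpClient: try-with-resources closes the stream whether the block returns normally or throws. `CloseDemo`, `TrackingStream` and `readFirstByte` are hypothetical names used only for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CloseDemo {
    /** Stream that records whether close() was called (for illustration only). */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the first byte; the stream is closed even if processing throws. */
    static int readFirstByte(InputStream in, boolean fail) throws IOException {
        try (InputStream is = in) {
            int b = is.read();
            if (fail) {
                throw new IOException("simulated failure while processing");
            }
            return b;
        } // is.close() runs here on both the normal and the exceptional path
    }
}
```

On Android, try-with-resources requires API level 19 or later; on older targets an explicit try/finally with `close()` in the finally block does the same job.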

The code above was written, accidentally or deliberately, on the assumption that the web server returns only a status code and headers, for example:

HTTP/1.1 404 Not Found
Date: Wed, 21 Sep 2016 08:24:07 GMT
Server: Apache/2.4.6 (CentOS) mod_auth_gssapi/1.3.1 mod_nss/2.4.6 NSS/3.19.1 Basic ECC mod_wsgi/3.4 Python/2.7.5
Connection: close
In this case the HttpClient does recognise that the response has no body, so it does not create an input stream for it.
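Whether a response carries a body is signalled in its headers, which end at the first empty line. The helper below is a hypothetical, JDK-only sketch (not HttpClient's parser) that reads the head of a raw response and returns the declared `Content-Length`, or -1 when that header is absent. Applied to the Apache default response shown below, it would report the 209 bytes that the client still has to read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;

final class ResponseHead {
    /**
     * Returns the declared Content-Length of a raw HTTP response, or -1 when
     * the header is absent. Parsing stops at the first empty line, which
     * separates the headers from the body.
     */
    static long declaredContentLength(String rawResponse) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(rawResponse));
        reader.readLine(); // status line, e.g. "HTTP/1.1 404 Not Found"
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Name: value" header line; skip it
            }
            // header names are case-insensitive
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (name.equals("content-length")) {
                return Long.parseLong(line.substring(colon + 1).trim());
            }
        }
        return -1;
    }
}
```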

However, most web servers these days respond with something fancier. By default, Apache sends the following response:

HTTP/1.1 404 Not Found
Date: Wed, 21 Sep 2016 08:24:07 GMT
Server: Apache/2.4.6 (CentOS) mod_auth_gssapi/1.3.1 mod_nss/2.4.6 NSS/3.19.1 Basic ECC mod_wsgi/3.4 Python/2.7.5
Content-Length: 209
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
    <title>404 Not Found</title>
</head><body>
    <h1>Not Found</h1>
    <p>The requested URL /index1.html was not found on this server.</p>
</body></html>
In this case an InputStream is created for the body. To shut the HttpClient down gracefully, the code has to close that stream, for example:

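try {
  request.setURI(URI.create(getServerURL() + validEndpoint));
  HttpResponse response = client.execute(request);
  HttpEntity entity = response.getEntity();
  // A 404 with a body (such as Apache's HTML error page) comes back as a
  // streaming entity; its content stream must be closed so the connection
  // can be released.
  if (entity != null && entity.isStreaming()) {
    InputStream is = entity.getContent();
    if (is != null) {
      try {
        is.close();
      } catch (IOException ignored) {
        // nothing useful can be done if close() fails
      }
    }
  }
} catch (IOException e) {...}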
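HttpClient 4.1 and later wrap this pattern in `EntityUtils.consume(entity)`, and 4.3 added `CloseableHttpResponse`, which can be closed in a try-with-resources block; older copies of the library, such as the one bundled with Android at the time, may have neither. The helper below is a hypothetical, library-independent alternative: it drains whatever is left in a stream and then closes it, swallowing I/O errors as the snippet above does. The class and method names are made up for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;

final class Streams {
    private Streams() {}

    /**
     * Hypothetical helper: reads whatever is left in the stream and then
     * closes it, ignoring I/O errors. Returns the number of bytes drained.
     */
    static long drainAndCloseQuietly(InputStream in) {
        if (in == null) {
            return 0;
        }
        long drained = 0;
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                drained += n;
            }
        } catch (IOException ignored) {
            // a liveness check has no use for the body; still close below
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // nothing sensible to do if close() fails
            }
        }
        return drained;
    }
}
```

With it, the inner block of the fixed snippet reduces to `if (entity != null) Streams.drainAndCloseQuietly(entity.getContent());`. Reading the rest of the body before closing is what allows a keep-alive connection to be reused rather than discarded.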
