AWS API Gateway behind Nginx


If you happen to have an Nginx upstream that points to AWS API Gateway, you may get this error: ‘SSL_do_handshake() failed (SSL: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure) while SSL handshaking to upstream’.

Here is the fix – you need to add ‘proxy_ssl_server_name on;’ to your nginx.conf. The directive has only been available since version 1.7.0.

Reference: proxy_ssl_server_name

Syntax: proxy_ssl_server_name on | off;
Default: proxy_ssl_server_name off;
Context: http, server, location

This directive appeared in version 1.7.0.

Enables or disables passing of the server name through TLS Server Name Indication extension (SNI, RFC 6066) when establishing a connection with the proxied HTTPS server.
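
For example, a minimal proxy block might look like the sketch below – the upstream hostname is a placeholder for your own API Gateway invoke URL:

location /api/ {
  # API Gateway serves many domains from shared endpoints, so it needs SNI
  # during the TLS handshake to pick the right certificate.
  proxy_ssl_server_name on;
  proxy_pass https://abcd1234.execute-api.ap-southeast-2.amazonaws.com/prod/;
}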

Run a query via the SumoLogic API


SumoLogic queries can also be run via the API. Here is a bash example that I wrote to fetch the nginx access logs.

By default it searches the logs from the past 10 minutes, but you can override that by passing the number of minutes as the first argument.


Here is the script. It can also be found on my GitHub.

#!/bin/bash

# Sumo credential format username:password
SUMOACCESS="username:password"

# Default 10 minutes
TIME=${1:-10}

# Wait interval in seconds
WAITFOR="10"

# Setup time range
FROM_TIME=`date  "+%Y-%m-%dT%R:%S" -d "$TIME min ago"`
TO_TIME=`date  "+%Y-%m-%dT%R:%S"`

# Check proxy
if [[ -n "${http_proxy}" ]]; then
  echo "Found proxy"
  PROXY="-x ${http_proxy}:80"
fi

# Current time
/bin/date +%D-%R

# Generate the search job JSON. The quotes inside the parse expression are escaped
# twice: once for JSON and once more because the heredoc delimiter is unquoted.
cat > search.json << EOF
{
  "query": "_sourceCategory=my-nginx-access | parse \"* - - [*] \\\\\"*\\\\\" * *\" as client, timestamp, request, response, size",
  "from": "${FROM_TIME}",
  "to": "${TO_TIME}",
  "timeZone": "Australia/Sydney"
}
EOF

echo "Searching log in the past $TIME minutes... "
job_id=`curl $PROXY -s -b cookies.txt -c cookies.txt -H 'Content-type: application/json' -H 'Accept: application/json' -X POST -T search.json --user $SUMOACCESS "https://api.au.sumologic.com/api/v1/search/jobs" | jq -r .id`

job_status="STARTED"
while [ "${job_status}" != "DONE GATHERING RESULTS"  ]
do
  sleep $WAITFOR
  echo search job status is ${job_status}
  job_status=`curl $PROXY -s -b cookies.txt -c cookies.txt -H 'Accept: application/json' --user $SUMOACCESS https://api.au.sumologic.com/api/v1/search/jobs/${job_id}| jq -r .state`
done

echo "Generating search result..."
curl $PROXY -s -b cookies.txt -c cookies.txt -H 'Accept: application/json' --user $SUMOACCESS "https://api.au.sumologic.com/api/v1/search/jobs/${job_id}/messages?offset=0&limit=1000" -o results
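
To run it, assuming the script is saved as sumo-query.sh (a name made up here), searching the past 30 minutes instead of the default 10 looks like this:

chmod +x sumo-query.sh
./sumo-query.sh 30
head results   # the raw JSON messages returned by the search job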

Troubleshoot a high CPU usage Java process


This is a real troubleshooting example that I did just yesterday for a high CPU usage Java application. The application uses Tomcat and runs on AWS EC2.

  1. Log in to the box and switch to the root user so you can see all users’ processes.
    sudo su -
  2. Install htop if you have not installed it before, then run it.
    yum -y install htop
    htop
  3. Press F6, then choose to sort by CPU (or use the keyboard shortcut P).


  4. Press F4 to filter by the keyword tomcat.


  5. Press H to toggle the display of threads.


  6. Write a thread dump to the log by sending SIGQUIT. For Tomcat, the log is /var/log/tomcat7/catalina.out. Here the busy thread ID found in htop is 7328:
    kill -3 7328
  7. Get the hex value of 7328.
    echo "obase=16; 7328" | bc
    1CA0
  8. Search for 1ca0 in /var/log/tomcat7/catalina.out, and you get the stack of Java thread 7328. At a quick glance, it has something to do with AWS S3 bucket operations. Send the stack trace to the developers so they can look into that piece of code and find out why it takes so much CPU time.
"http-bio-8080-exec-425" daemon prio=10 tid=0x00007fa8e046e000 nid=0x1ca0 runnable [0x00007fa8c94cf000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.read(InputRecord.java:480)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
- locked <0x00000000d62bcaf0> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:891)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
- locked <0x00000000d62bc6d0> (a sun.security.ssl.AppInputStream)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:464)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:273)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3645)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3597)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:621)
at newsapi.provider.aws.S3BlobStorageProvider.deleteByIdPrefix(S3BlobStorageProvider.java:173)
at newsapi.service.image.ImageUpdateServiceImpl.removeFromBlobRepository(ImageUpdateServiceImpl.java:84)
at newsapi.service.image.ImageUpdateServiceImpl.updateImages(ImageUpdateServiceImpl.java:53)
at newsapi.service.ContentIngestionServiceImpl.storeContent(ContentIngestionServiceImpl.java:711)
at newsapi.service.ContentIngestionServiceImpl.storeRelatedContent(ContentIngestionServiceImpl.java:844)
at newsapi.service.ContentIngestionServiceImpl.processContent(ContentIngestionServiceImpl.java:390)
at newsapi.service.ContentIngestionServiceImpl.addContentForDocument(ContentIngestionServiceImpl.java:318)
at newsapi.service.ContentIngestionServiceImpl.processDocument(ContentIngestionServiceImpl.java:174)
at newsapi.controller.DocumentController.processDocument(DocumentController.java:406)
at newsapi.controller.DocumentController.post(DocumentController.java:162)
at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.threewks.thundr.action.method.MethodAction.invoke(MethodAction.java:60)
at com.threewks.thundr.action.method.MethodActionResolver.invokeAction(MethodActionResolver.java:121)
at com.threewks.thundr.action.method.MethodActionResolver.resolve(MethodActionResolver.java:101)
at com.threewks.thundr.action.method.MethodActionResolver.resolve(MethodActionResolver.java:47)
at com.threewks.thundr.route.Routes.resolveAction(Routes.java:117)
at com.threewks.thundr.route.Routes.invoke(Routes.java:97)
at com.threewks.thundr.ThundrServlet.applyRoute(ThundrServlet.java:122)
at com.threewks.thundr.ThundrServlet.service(ThundrServlet.java:161)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at newsapi.util.filter.CacheFilter.doFilter(CacheFilter.java:93)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:683)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
- locked <0x00000000d144da00> (a org.apache.tomcat.util.net.SocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
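
If you do this often, the hex conversion and the log search can be wrapped in a small helper. This is only a sketch; the default log path is taken from the Tomcat 7 setup above and the script name is made up:

#!/bin/bash
# Usage: ./find-thread.sh <decimal-thread-id> [catalina.out path]
TID=${1:?usage: $0 <thread-id> [logfile]}
LOG=${2:-/var/log/tomcat7/catalina.out}

# Convert the decimal thread ID shown by htop to the hex nid used in the thread dump
NID=$(printf '0x%x' "$TID")

# Print the matching thread header plus the next 40 lines of its stack
grep -A 40 "nid=${NID} " "$LOG"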

LuaXML_lib.so: wrong ELF class: ELFCLASS32


In a recent project, we use OpenResty (a scalable web platform that extends NGINX with Lua) as an API gateway. Nginx hosts the web server, and Lua scripts hold the logic.

One of our Lua files needs the LuaXml module to parse XML files, but the module is not installed by default in OpenResty. So we get this error when testing the nginx config – nginx: [error] init_by_lua error: /opt/openresty/nginx//conf/errors.lua:1: module ‘LuaXml’ not found.

So I downloaded the module and copied LuaXML_lib.so and LuaXml.lua to /opt/openresty/lualib; they are the only two files needed. Trying again, I got a different error: /opt/openresty/lualib/LuaXML_lib.so: wrong ELF class: ELFCLASS32

Google tells me this means the shared library was built for a 32-bit OS. Oh, OK. Then I either need to find a 64-bit build or compile the source code for 64-bit myself.

Let’s do it the hard way first – compiling the LuaXML source code for a 64-bit OS. Here is how:

wget http://viremo.eludi.net/LuaXML/LuaXML_101012.zip
mkdir LuaXml
unzip LuaXML_101012.zip -d LuaXml/

yum install libtermcap-devel ncurses-devel libevent-devel readline-devel
wget https://www.lua.org/ftp/lua-5.1.4.tar.gz
tar zxpf lua-5.1.4.tar.gz

cd LuaXml/
gcc -shared -fPIC -o LuaXML_lib.so LuaXML_lib.c -I ../lua-5.1.4/src
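
Before copying it over, you can check that the new library is really 64-bit; the expected output is shown as a comment:

file LuaXML_lib.so
# LuaXML_lib.so: ELF 64-bit LSB shared object, x86-64 ...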

Then copy LuaXML_lib.so and LuaXml.lua to /opt/openresty/lualib.

The easy way is to use LuaRocks (the Lua package manager):

wget http://luarocks.org/releases/luarocks-2.3.0.tar.gz
tar zxpf luarocks-2.3.0.tar.gz
cd luarocks-2.3.0/
./configure --with-lua-include='/home/ec2-user/app/files/ngx_openresty-1.9.7.2/bundle/lua-5.1.5/src'
make bootstrap
luarocks install luaxml
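
Either way, re-running the nginx config test should no longer report the missing module or the ELF class error (the path assumes the install prefix shown in the error message above):

/opt/openresty/nginx/sbin/nginx -t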

Allow the DELETE method to pass a payload in Akamai


We have a RESTful API endpoint for DELETE requests sitting behind Akamai, and the DELETE request contains some data in the payload. By default, Akamai strips the payload off the DELETE method when you only turn on ‘Allow DELETE’.


To allow Akamai to pass the payload in the DELETE method, you have to:

  1. Enable ‘Allow All Methods on Parent Servers’ on the top level rule.


  2. Add a new rule to enable WebDAV for the DELETE method.


I know it sounds a bit strange, but it is Akamai’s implementation of WebDAV. According to their internal documents:

In the current implementation, allowing the DELETE method with the tag security:allow-delete tag enables use of the DELETE method, but does not support passing a body as part of the request. If you want to support passing a body with the DELETE method, you need to enable support for WebDAV. You can limit this support to the DELETE method by enclosing it inside a match on the DELETE method, like this:

<match:request.method value="HTTP_DELETE">
  <edgeservices:webdav.status>on</edgeservices:webdav.status>
</match:request.method>
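
Once the rule is in place, a quick end-to-end check is to send a DELETE with a body through Akamai and confirm the payload reaches the origin. The endpoint below is just a placeholder:

curl -v -X DELETE "https://api.example.com/v1/items" \
  -H "Content-Type: application/json" \
  -d '{"ids": ["a1", "b2"]}'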

I understand why Akamai disallows it by default, as that is what RFC 7231 recommends. Still, my opinion is that architects and developers should take the RFC’s recommendations into consideration when doing the design:

A payload within a DELETE request message has no defined semantics; sending a payload body on a DELETE request might cause some existing implementations to reject the request.


Nginx upstream key exchange issue


[Diagram: Akamai → Nginx (origin) → upstream]

Continue with my previous post.

Now it is about phase #3. Theoretically, it should be very straightforward – all Nginx needs to do is forward the Akamai requests to the upstream, right? But in real life, whatever can go wrong will go wrong 😦

In testing, we always get a ‘502 Bad Gateway’ error. I enabled debug mode in Nginx and found the relevant message: ‘upstream prematurely closed connection while reading response header from upstream’.

At the very beginning I thought it could be caused by IP whitelisting, but the upstream vendor confirmed there is no whitelisting and all settings look good to them. On our end, all settings look good to me too.

I used tcpdump to capture the conversation between Nginx and the upstream server. All I could see is that the upstream server terminates the connection after 20 seconds. We use Nginx’s default proxy timeout of 60 seconds, so that should not be the cause. For some reason, the upstream just refuses Nginx’s request. As it is HTTPS, the first step is the SSL handshake. With this thought in mind, I scanned the upstream’s certificate with SSL Labs, and it does not look nice at all.
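
For reference, the capture was along these lines; the interface name and upstream host are placeholders:

# Capture the TLS conversation with the upstream for later inspection in Wireshark
tcpdump -i eth0 host upstream.example.com and port 443 -w nginx-upstream.pcap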


Not sure if that is the cause, but it is worthwhile asking them to fix it anyway. The vendor did some fixes, and the second scan looks better.


Test again, and it works!! I don’t know exactly what the magic is, but I do notice the major difference is the ‘Key Exchange’ rating of the cert. My assumption is that Nginx requires a more secure SSL handshake than the previous certificate setup could support, so the upstream just terminated the connection.
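
If you hit something similar, checking the handshake directly from the Nginx box also helps narrow it down; the hostname is a placeholder:

# Show which protocol and cipher (key exchange) the upstream negotiates
openssl s_client -connect upstream.example.com:443 -servername upstream.example.com < /dev/null 2>/dev/null | grep -E 'Protocol|Cipher'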

Akamai protocol rewrite issue


I have been working on an API gateway project (Akamai + Nginx + Lua) for quite a few weeks. There are lots of things that I would like to write about and share. This post is one of them.

The high level architecture looks like this:

[Diagram: users → Akamai → Nginx (origin) → Apache upstream]

Phases:

  1. Users send http(s) GET or POST requests to Akamai.
  2. Akamai forwards the requests to the origin server, which is an Nginx server.
  3. Nginx sends the requests to the upstream, which is an Apache server.

For security reasons, we would like to encrypt all three phases.

  • phase #1, #2: The plan is to let Akamai rewrite http to https if users send http requests, as certain paths (e.g. /security) contain sensitive information.
  • phase #3: The upstream only does https, so Nginx just needs to forward the requests sent from Akamai as they are https already.

Sounds all right to you so far? Or is it too early to say that 🙂

According to the HTTP status codes Wikipedia page, a 301/302 redirect does not preserve the POST method, so I have to use 307; otherwise Akamai will rewrite the POST to a GET and send it to Nginx. My test proves this is correct.

307 Temporary Redirect (since HTTP/1.1)
The request should be repeated with another URI; however, future requests should still use the original URI. In contrast to how 302 was historically implemented, the request method is not allowed to be changed when reissuing the original request. For example, a POST request should be repeated using another POST request

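A quick way to see the difference from the client side (the hostname is just a placeholder):

# Look at the redirect status Akamai returns for a POST
curl -si -X POST -d 'a=1' http://api.example.com/security | grep -Ei '^HTTP|^location'

# -L follows the redirect; curl re-sends the POST method and body only for 307/308,
# while a 301/302 would be replayed as a GET
curl -siL -X POST -d 'a=1' http://api.example.com/security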

While testing, we found the GET method works fine everywhere, but the POST method works in Postman and does not work in JMeter or Java code. Finally, we found the answer in RFC 2616:

If the 307 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Now what are the solutions?

  • phase #1: We force users to use https for certain paths. If the request is http, it won’t work as it will be redirected to a custom page, e.g. 404.html.


  • phase #2: We ask Akamai to change the metadata to force the forward protocol to be https. Currently this feature is Akamai-internal only, which is why it is greyed out – I don’t have permission to change it.


To be continued …

I will discuss the issue found in phase #3 in the next post.