Cross-Region S3 Replication Monitor



OpenShift V3 Persistent Storage Nagios Plugin

At the time of writing, OpenShift V3 comes with poor monitoring capabilities. The built-in monitoring only checks memory, CPU, and network metrics, and it does not even support alerting! Its finest granularity only goes down to the last hour. So you have to build your own monitoring if you want to keep a close eye on your …

Elasticache Redis Unreachable Issue

We have an ElastiCache Redis replication group with two nodes: one primary and one replica. Last week, we noticed that the primary Redis node had suddenly stopped working: every connection to the primary node eventually timed out. According to the log, there was a load burst, after which Redis rebooted itself. Unfortunately, …

Troubleshoot high CPU usage java process

This is a real troubleshooting example that I worked through just yesterday for a Java application with high CPU usage. The application uses Tomcat and runs on AWS EC2. Log in to the box and switch to the root user so you can see all users' processes: sudo su - Then install htop if you have not installed it before, and run it. …