Log Collection With Graylog on AWS

Bozhidar Bozhanov, May 11th, 2015 (Last Updated: May 10th, 2015)


Log collection is essential for properly analyzing issues in production. An interface to search and be notified about exceptions on all your servers is a must. If you have one server, you can of course ssh to it and check the logs, but for larger deployments, collecting logs centrally is far preferable to logging in to 10 machines in order to find “what happened”.

There are many options for doing that, roughly separated into two groups: 3rd party services and software that you install yourself.

3rd party (or “cloud-based”, if you prefer) log collection services include Splunk, Loggly, Papertrail and Sumologic. They are very easy to set up and you pay for what you use. Basically, you send each message (e.g. via a custom logback appender) to a provider’s endpoint, and then use the dashboard to analyze the data. In many cases that would be the preferred way to go.

In other cases, however, company policy may frown upon using 3rd party services to store company-specific data, or additional costs may be undesired. In these cases extra effort needs to be put into installing and managing internal log collection software. Such tools work in a similar way, but implementation details may differ (e.g. instead of sending messages with an appender to a target endpoint, the software collects local logs through some sort of agent and aggregates them). Open-source options include Graylog, Fluentd, Flume and Logstash.

After some very quick research, I considered Graylog to fit our needs best, so below is a description of the installation procedure on AWS (though the first part applies regardless of the infrastructure).

The first thing to look at is the set of ready-to-use images provided by Graylog, including Docker, OpenStack, Vagrant and AWS. Unfortunately, the AWS version has two drawbacks. The first is that it uses Ubuntu rather than the Amazon Linux AMI. That’s not a huge issue, although some generic scripts you use in your stack may have to be rewritten. The second was the dealbreaker: when you start it, it doesn’t run a web interface, although it claims it should. Only mongodb, elasticsearch and graylog-server are started. Having 2 instances (one for the web interface and one for the rest) would complicate things, so I opted for manual installation.

Graylog has two components: the server, which handles the input, indexing and searching, and the web interface, which is a nice UI that communicates with the server. MongoDB holds the metadata behind the web interface (dashboards, alerts, etc.), and elasticsearch stores the incoming logs. Below is a bash script (written for CentOS) that handles the installation. Note that there is no “sudo”, because initialization scripts are executed as root on AWS.

```
#!/bin/bash

# install pwgen for password-generation
# (-y is needed because the script runs non-interactively)
yum -y upgrade ca-certificates --enablerepo=epel
yum --enablerepo=epel -y install pwgen

# mongodb
cat >/etc/yum.repos.d/mongodb-org.repo <<'EOT'
[mongodb-org]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
EOT

yum -y install mongodb-org
chkconfig mongod on
service mongod start

# elasticsearch
rpm --import https://packages.elasticsearch.org/GPG-KEY-elasticsearch

cat >/etc/yum.repos.d/elasticsearch.repo <<'EOT'
[elasticsearch-1.4]
name=Elasticsearch repository for 1.4.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
EOT

yum -y install elasticsearch
chkconfig --add elasticsearch

# configure elasticsearch
sed -i -- 's/#cluster.name: elasticsearch/cluster.name: graylog2/g' /etc/elasticsearch/elasticsearch.yml
sed -i -- 's/#network.bind_host: localhost/network.bind_host: localhost/g' /etc/elasticsearch/elasticsearch.yml

service elasticsearch stop
service elasticsearch start

# java
yum -y update
yum -y install java-1.7.0-openjdk
update-alternatives --set java /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java

# graylog
wget https://packages.graylog2.org/releases/graylog2-server/graylog-1.0.1.tgz
tar xvzf graylog-1.0.1.tgz -C /opt/
mv /opt/graylog-1.0.1/ /opt/graylog/
cp /opt/graylog/bin/graylogctl /etc/init.d/graylog
sed -i -e 's/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=graylog.jar}/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=\/opt\/graylog\/graylog.jar}/' /etc/init.d/graylog
sed -i -e 's/LOG_FILE=\${LOG_FILE:=log\/graylog-server.log}/LOG_FILE=\${LOG_FILE:=\/var\/log\/graylog-server.log}/' /etc/init.d/graylog

# NOTE: this heredoc overwrites /etc/init.d/graylog, so the copied and edited
# file above is superseded by a thin wrapper that delegates to graylogctl
cat >/etc/init.d/graylog <<'EOT'
#!/bin/bash
# chkconfig: 345 90 60
# description: graylog control
sh /opt/graylog/bin/graylogctl "$1"
EOT

chkconfig --add graylog
chkconfig graylog on
chmod +x /etc/init.d/graylog

# graylog web
wget https://packages.graylog2.org/releases/graylog2-web-interface/graylog-web-interface-1.0.1.tgz
tar xvzf graylog-web-interface-1.0.1.tgz -C /opt/
mv /opt/graylog-web-interface-1.0.1/ /opt/graylog-web/

cat >/etc/init.d/graylog-web <<'EOT'
#!/bin/bash
# chkconfig: 345 91 61
# description: graylog web interface
sh /opt/graylog-web/bin/graylog-web-interface > /dev/null 2>&1 &
EOT

chkconfig --add graylog-web
chkconfig graylog-web on
chmod +x /etc/init.d/graylog-web

# configure
mkdir --parents /etc/graylog/server/
cp /opt/graylog/graylog.conf.example /etc/graylog/server/server.conf
# random secret used by the server
sed -i -e 's/password_secret =.*/password_secret = '$(pwgen -s 96 1)'/' /etc/graylog/server/server.conf

# "password" is a placeholder admin password; replace it with your own
sed -i -e 's/root_password_sha2 =.*/root_password_sha2 = '$(echo -n password | shasum -a 256 | awk '{print $1}')'/' /etc/graylog/server/server.conf

# point the web interface at the local server's REST API
sed -i -e 's/application.secret=""/application.secret="'$(pwgen -s 96 1)'"/g' /opt/graylog-web/conf/graylog-web-interface.conf
sed -i -e 's/graylog2-server.uris=""/graylog2-server.uris="http:\/\/127.0.0.1:12900\/"/g' /opt/graylog-web/conf/graylog-web-interface.conf

# give the server time to start before launching the web interface
service graylog start
sleep 30
service graylog-web start
```
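
The fixed `sleep 30` in the script above is a guess at how long the server needs to start. As a rough sketch of an alternative, the script could poll until the server's REST port (12900, as configured above) accepts connections. The helper names `wait_for` and `port_open` are hypothetical and not part of Graylog, and an open port does not strictly guarantee that the API is fully initialized.

```shell
#!/bin/bash

# wait_for TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
wait_for() {
    local tries=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= tries; i++)); do
        if "$@"; then
            return 0
        fi
        # no need to sleep after the last failed attempt
        if ((i < tries)); then
            sleep "$delay"
        fi
    done
    return 1
}

# port_open HOST PORT
# Succeeds if a TCP connection can be opened, using bash's /dev/tcp redirection.
# The subshell keeps the file descriptor from leaking into the caller.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}
```

With these helpers, `sleep 30` could become something like `wait_for 60 2 port_open 127.0.0.1 12900`, which gives up after roughly two minutes instead of always waiting half a minute.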

You may also want to set a TTL (auto-expiration) for messages, so that you don’t store old logs forever. Here’s how:

```
# wait for the index to be created
INDEXES=$(curl --silent "http://localhost:9200/_cat/indices")
until [[ "$INDEXES" =~ "graylog2_0" ]]; do
	sleep 5
	echo "Index not yet created. Indexes: $INDEXES"
	INDEXES=$(curl --silent "http://localhost:9200/_cat/indices")
done

# set each indexed message auto-expiration (ttl)
curl -XPUT "http://localhost:9200/graylog2_0/message/_mapping" -d'{"message": {"_ttl" : { "enabled" : true, "default" : "15d" }}}'
```
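
The `curl` call above hard-codes both the index name and the 15-day retention. If you want to vary them, one possible sketch is to build the mapping body from a parameter. The function names `ttl_mapping_body` and `set_message_ttl` are made up for illustration, and the list of accepted time units is an assumption about what elasticsearch 1.x accepts, so adjust it to your version.

```shell
#!/bin/bash

# ttl_mapping_body TTL
# Prints the JSON mapping body that enables _ttl on the "message" type
# with the given default, e.g. "15d" or "36h".
ttl_mapping_body() {
    local ttl=$1
    # accept only <number><unit> so the value is safe to embed in JSON
    # (the unit list is an assumption, see the note above)
    local re='^[0-9]+(ms|s|m|h|d|w)$'
    if [[ ! $ttl =~ $re ]]; then
        echo "invalid ttl: $ttl" >&2
        return 1
    fi
    printf '{"message": {"_ttl": {"enabled": true, "default": "%s"}}}\n' "$ttl"
}

# set_message_ttl INDEX TTL
# Applies the mapping to one index on the local elasticsearch node,
# using the same endpoint as the curl call above.
set_message_ttl() {
    local body
    body=$(ttl_mapping_body "$2") || return 1
    curl --silent -XPUT "http://localhost:9200/$1/message/_mapping" -d "$body"
}
```

For example, `set_message_ttl graylog2_0 30d` would apply a 30-day default to the same index as above.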

Now you have everything running on the instance. Next come some AWS-specific steps (if you use CloudFormation, that will involve a pile of JSON). Here’s the list:

- You can either have an auto-scaling group (ASG) with one instance, or a single instance. I prefer the ASG, though the single instance is a bit simpler. The ASG gives you auto-respawn if the instance dies.
- Set the above script to be invoked in the UserData of the launch configuration of the instance/ASG (e.g. by getting it from S3 first).
- Allow UDP port 12201 (the default logging port). That has to be done in the instance/ASG security group (inbound), in the application nodes' security group (outbound), and also in the network ACL of your VPC. Test the UDP connection to make sure it really goes through. Keep access closed to all sources except your instances.
- You need to pass the private IP address of your Graylog server instance to all the application nodes. That’s tricky on AWS, as private IP addresses change, so you need something stable. You can’t use an ELB (load balancer), because it doesn’t support UDP. There are two options:
  - Associate an Elastic IP with the node on startup and pass that IP to the application nodes. But there’s a catch: if they connect to the Elastic IP, the traffic goes via NAT (if you have one), and you may have to open your instance “to the world”. So you must turn the Elastic IP into its corresponding public DNS name, which will then be resolved to the private IP. You can do that manually, in a hacky way:
    ```
    GRAYLOG_ADDRESS="ec2-${GRAYLOG_ADDRESS//./-}.us-west-1.compute.amazonaws.com"
    ```
    or you can use the AWS EC2 CLI to obtain the details of the instance that the Elastic IP is associated with, and then obtain its public DNS with another call.
  - Instead of using an Elastic IP, which limits you to a single instance, you can use Route53 (the AWS DNS manager). When a Graylog server instance starts, it can append itself to a Route53 record, which allows for round-robin DNS across multiple Graylog instances in a cluster. Manipulating the Route53 records is again done via the AWS CLI. Then you just pass the domain name to the application nodes, so that they can send messages.
- Alternatively, you can install graylog-server on all the nodes (as an agent), and point them to an elasticsearch cluster. But that’s more complicated and probably not the intended way to do it.
- Configure your logging framework to send messages to Graylog. There are standard GELF (the Graylog format) appenders, and the only thing you have to do is use the public DNS environment variable in the logback.xml (which supports environment variable resolution).
- You should make the web interface accessible outside the network; you can use an ELB for that, or the round-robin DNS mentioned above. Just make sure the security rules are tight and do not allow external tampering with your log data.
- If you are not running a Graylog cluster (which I won’t cover), then the single instance can potentially fail. That isn’t a great loss, as log messages can be obtained from the instances, and they are short-lived anyway. But the metadata of the web interface (dashboards, alerts, etc.) is important, so it’s good to do regular backups (e.g. with mongodump). Using an EBS volume is also an option.
- Even though you send your log messages to the centralized log collector, it’s a good idea to also keep local logs, with proper log rotation and cleanup.
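
To make the Elastic IP trick and the UDP test from the list above more concrete, here is a minimal sketch. The function names are hypothetical. The `us-east-1` special case reflects the different DNS suffix that region uses. The test message is a bare-bones uncompressed GELF 1.1 payload, and sending it assumes a GELF UDP input is already running on the Graylog server.

```shell
#!/bin/bash

# public_dns_for_ip IP REGION
# Prints the public DNS name EC2 conventionally assigns to a public IPv4 address.
public_dns_for_ip() {
    local ip=$1 region=$2
    local dashed=${ip//./-}
    if [ "$region" = "us-east-1" ]; then
        # us-east-1 uses a different suffix from the other regions
        echo "ec2-${dashed}.compute-1.amazonaws.com"
    else
        echo "ec2-${dashed}.${region}.compute.amazonaws.com"
    fi
}

# gelf_test_message SOURCE_HOST TEXT
# Prints a minimal GELF 1.1 payload (uncompressed JSON).
# Both arguments must be free of characters that need JSON escaping.
gelf_test_message() {
    printf '{"version":"1.1","host":"%s","short_message":"%s","level":6}' "$1" "$2"
}

# send_gelf_udp TARGET_HOST PAYLOAD [PORT]
# Sends the payload over UDP using bash's /dev/udp redirection.
# PORT defaults to 12201, the logging port mentioned above.
send_gelf_udp() {
    local port=${3:-12201}
    printf '%s' "$2" > "/dev/udp/$1/$port"
}
```

On an application node this could be used as `GRAYLOG_ADDRESS=$(public_dns_for_ip "$ELASTIC_IP" us-west-1)` followed by `send_gelf_udp "$GRAYLOG_ADDRESS" "$(gelf_test_message "$(hostname)" "udp connectivity test")"`, where `ELASTIC_IP` is a placeholder for your own address. UDP is fire-and-forget, so a successful send proves nothing by itself; the real check is whether the message shows up in the Graylog search.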