Logstash

chenryn, Oct 05, 2012

About This Presentation

How to collect nginx access logs into ElasticSearch with Logstash or Message::Passing.


Slide Content

Logstash::Intro
@ARGV

Why use Logstash?
•We already have splunk, syslog-ng, chukwa, graylog2, scribe, flume and so on.
•But we want a free, lightweight and well-integrated framework for our logs:
•not free --> splunk
•heavyweight Java --> scribe, flume
•loses data --> syslog
•not flexible --> nxlog

How does Logstash work?
•Ah, just like the others, Logstash has input/filter/output plugins.
•Attention: Logstash processes events, not (only) log lines!
•"Inputs generate events, filters modify them, outputs ship them elsewhere." -- [the life of an event in logstash]
•"events are passed from each phase using internal queues......Logstash sets each queue size to 20." -- [the life of an event in logstash]
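•A minimal sketch to watch one event flow through all three phases (stdin/grok/stdout are real plugins; the grok pattern here is just illustrative):
–input  { stdin  { type => "demo" } }
–filter { grok   { type => "demo" pattern => "%{WORD:first}" } }
–output { stdout { debug => true } }
•Then pipe a line through it:
–echo "hello world" | java -jar logstash-1.1.0-monolithic.jar agent -f demo.conf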

Existing plugins

Most popular plugins (inputs)
•amqp
•eventlog
•file
•redis
•stdin
•syslog
•ganglia

Most popular plugins (filters)
•date
•grep
•grok
•multiline (sketch below)
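•For example, a minimal sketch of the multiline filter joining indented continuation lines onto the previous event (the pattern is illustrative):
–filter {
–  multiline {
–    type => "nginx"
–    pattern => "^\s"
–    what => "previous"
–  }
–}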

Most popular plugins (outputs)
•amqp
•elasticsearch
•email
•file
•ganglia
•graphite
•mongodb
•nagios
•redis
•stdout
•zabbix
•websocket

Usage in cluster - agent install
•Only an 'all in one' jar is available for download at http://logstash.net/
•All the source, Ruby and JRuby alike, is at http://github.com/logstash/
•But we want a lightweight agent on cluster nodes.

Usage in cluster - agent install
•Edit the Gemfile like:
–source "http://ruby.taobao.org/"
–gem "cabin", "0.4.1"
–gem "bunny"
–gem "uuidtools"
–gem "filewatch", "0.3.3"
•Clone logstash/[bin|lib]:
–git clone https://github.com/chenryn/logstash.git
–git checkout pure-ruby
•Install the gems:
–gem install bundler
–bundle
•Run:
–ruby logstash/bin/logstash -f logstash/etc/logstash-agent.conf

Usage in cluster - agent configuration
–input {
– file {
– type => "nginx"
– path => ["/data/nginx/logs/access.log" ]
– }
–}
–output {
– redis {
– type => "nginx"
– host => "5.5.5.5"
– key => "nginx"
– data_type => "channel"
– }
–}

Usage in cluster - server install
•The server is just another agent that runs some filters and storage outputs.
•Message queue (RabbitMQ is too heavy, Redis is just enough):
–yum install redis-server
–service redis-server start
•Storage: mongo/elasticsearch/Riak
•Visualization: kibana/statsd/riemann/opentsdb
•Run:
–java -jar logstash-1.1.0-monolithic.jar agent -f logstash/etc/server.conf

Usage in cluster - server configuration
–input {
– redis {
– type => "nginx"
– host => "5.5.5.5"
– data_type => "channel"
– key => "nginx"
– }
–}
–filter {
– grok {
– type => "nginx"
– pattern => "%{NGINXACCESS}"
– patterns_dir => ["/usr/local/logstash/etc/patterns"]
– }
–}
–output {
– elasticsearch {
– cluster => 'logstash'
– host => '10.5.16.109'
– port => 9300
– }
–}

Usage in cluster - grok
•jls-grok is a pattern-matching tool written in Ruby.
•Lots of examples can be found at:
https://github.com/logstash/logstash/tree/master/patterns
•Here are my "nginx" patterns:
–NGINXURI %{URIPATH}(?:%{URIPARAM})*
–NGINXACCESS \[%{HTTPDATE}\] %{NUMBER:code:int} %{IP:client} %{HOSTNAME} %{WORD:method} %{NGINXURI:req} %{URIPROTO}/%{NUMBER:version} %{IP:upstream}(:%{POSINT:port})? %{NUMBER:upstime:float} %{NUMBER:reqtime:float} %{NUMBER:size:int} "(%{URIPROTO}://%{HOST:referer}%{NGINXURI:referer}|-)" %{QS:useragent} "(%{IP:x_forwarder_for}|-)"
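•A hypothetical log line this pattern should match (every value made up for illustration):
–[16/Sep/2012:10:00:00 +0800] 200 1.2.3.4 web01 GET /index.html?a=b HTTP/1.1 10.0.0.5:8080 0.010 0.012 1024 "http://www.example.com/prev" "Mozilla/5.0" "-"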

Usage in cluster - elasticsearch
•ElasticSearch is a production-ready search engine built on Lucene for cloud computing.
•More information at:
–http://www.elasticsearch.cn/
•Logstash already ships with an embedded ElasticSearch!
•Attention: if you want to build your own distributed ElasticSearch cluster, make sure the server version is equal to the client version used by Logstash!

Usage in cluster - elasticsearch
•elasticsearch/config/elasticsearch.yml:
–cluster.name: logstash
–node.name: "ES109"
–node.master: true
–node.data: false
–index.number_of_replicas: 0
–index.number_of_shards: 1
–path.data: /data1/ES/data
–path.logs: /data1/ES/logs
–network.host: 10.5.16.109
–transport.tcp.port: 9300
–transport.tcp.compress: true
–gateway.type: local
–discovery.zen.minimum_master_nodes: 1
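•A quick way to verify the node came up (host taken from the config above; 9200 is the default HTTP port):
–curl http://10.5.16.109:9200/_cluster/health?pretty=true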

Usage in cluster - elasticsearch
•The embedded web front end for ES is too simple, sometimes naïve~ Try Kibana and EShead:
•https://github.com/rashidkpc/Kibana
•https://github.com/mobz/elasticsearch-head.git
•Attention: there is a bug around ES ---- ifdown your external network interface before starting ES and ifup it later. Otherwise your Ruby client cannot connect to the ES server!

Try it please!
•Ah, you don't want to install, install, install and install?
•Here is a killer application:
–sudo zypper install virtualbox rubygems
–gem install vagrant
–git clone https://github.com/mediatemple/log_wrangler.git
–cd log_wrangler
–PROVISION=1 vagrant up

Other output example
•For monitoring (example):
–filter {
– grep {
– type => "linux-syslog"
– match => [ "@message","(error|ERROR|CRITICAL)" ]
– add_tag => [ "nagios-update" ]
– add_field => [ "nagios_host", "%{@source_host}", "nagios_service", "the name of your nagios service check" ]
– }
–}
–output{
– nagios {
– commandfile => "/usr/local/nagios/var/rw/nagios.cmd"
– tags => "nagios-update"
– type => "linux-syslog"
– }
– }

Other output example
•For metrics:
–output {
– statsd {
– increment => "apache.response.%{response}"
– count => [ "apache.bytes", "%{bytes}" ]
– }
–}

Advanced Questions
•Is Ruby 1.8.7 stable enough?
•Try the Message::Passing module on CPAN, I love Perl~
•Is ElasticSearch speedy enough?
•Try Sphinx, see the report in the ELSA project:
–"In designing ELSA, I tried the following components but found them too slow. Here they are ordered from fastest to slowest for indexing speeds (non-scientifically tested):"
1. Tokyo Cabinet
2. MongoDB
3. TokuDB MySQL plugin
4. Elastic Search (Lucene)
5. Splunk
6. HBase
7. CouchDB
8. MySQL Fulltext
•http://code.google.com/p/enterprise-log-search-and-archive/wiki/Documentation#Why_ELSA?

Advanced Testing
•How many events/sec can ElasticSearch hold?
•- Logstash::Output::Elasticsearch(HTTP) can only index 200+ msg/sec per thread.
•- So I tried the _bulk API myself, using the Perl ElasticSearch::Transport::HTTPLite module.
•-- speed test result: 2500+ msg/sec
•-- test record: http://chenlinux.com/2012/09/16/elasticsearch-bulk-index-speed-testing/
WHY?!

Maybe…
•Logstash uses an experimental module: Logstash::Output::ElasticsearchHTTP uses ftw as its HTTP client, and it cannot hold a bulk size larger than 200!!
•So we suggest using a multi-output block in agent.conf (a sketch follows).
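•A minimal sketch of such a multi-output block. Plain duplicate outputs would ship every event twice, so this assumes events were tagged upstream; the "shard_a"/"shard_b" tags and the second host are hypothetical:
–output {
–  elasticsearch {
–    tags => [ "shard_a" ]
–    cluster => 'logstash'
–    host => '10.5.16.109'
–    port => 9300
–  }
–  elasticsearch {
–    tags => [ "shard_b" ]
–    cluster => 'logstash'
–    host => '10.5.16.110'
–    port => 9300
–  }
–}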

Advanced ES Settings(1)--problems
•Kibana can search data using the facets APIs. But when you index URLs, they get auto-split on '/'~~
•Faceting on ip over 10,000,000 messages takes 0.1s, but on urls it... ah, timeout!
•When you check your indices size, you will find that (index size / document count) : message length ~~ 10:1 !!

Advanced ES Settings(2)--solution
•Set an ElasticSearch default _mapping template!
•In fact, ES "stores" the indexed data, and then "stores" the stored data... Yes! If you don't set "store": "no", all the data is stored twice.
•And ES has many analyzer plugins. They automatically split words by whitespace, path hierarchy, keyword, etc.
•So set "index": "not_analyzed", and faceting over 100k+ URLs can finish in 1s (template sketch below).
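•A minimal sketch of such a template (the template name and field layout are illustrative, following the @fields structure used earlier):
–curl -XPUT http://10.5.16.109:9200/_template/logstash -d '{
–  "template" : "logstash-*",
–  "mappings" : {
–    "nginx" : {
–      "properties" : {
–        "@fields" : {
–          "properties" : {
–            "url" : { "type" : "string", "index" : "not_analyzed" }
–          }
–        }
–      }
–    }
–  }
–}'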

Advanced ES Settings(2)--solution
•Optimize:
•Calling the _optimize API every day may decrease the index size~ (example below)
•You can find those solutions at:
•https://github.com/logstash/logstash/wiki/Elasticsearch-Storage-Optimization
•https://github.com/logstash/logstash/wiki/Elasticsearch----Using-index-templates-&-dynamic-mappings
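•For example, merging a day's index down to one segment (host and index name taken from the examples above):
–curl -XPOST 'http://10.5.16.109:9200/logstash-2012.09.18/_optimize?max_num_segments=1'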

Advanced Input -- question
•Now we know how to disable the _all field, but there are still duplicated fields: @fields and @message!
•Logstash searches ES in the @message field by default, but Logstash::Filter::Grok by default captures variables into @fields from @message!
•How do we solve this?

Advanced Input -- solution
•We know some other systems like Message::Passing have encode/decode in addition to input/filter/output.
•In fact Logstash has them too~ but renames them 'format'.
•So we can define the message format ourselves, using log_format in nginx.conf.
•(example as follows)

Advanced Input -- nginx.conf
–log_format json '{"@timestamp":"$time_iso8601",'
–                '"@source":"$server_addr",'
–                '"@fields":{'
–                '"client":"$remote_addr",'
–                '"size":$body_bytes_sent,'
–                '"responsetime":$request_time,'
–                '"upstreamtime":$upstream_response_time,'
–                '"oh":"$upstream_addr",'
–                '"domain":"$host",'
–                '"url":"$uri",'
–                '"status":"$status"}}';
–access_log /data/nginx/logs/access.json json;
•See
http://cookbook.logstash.net/recipes/apache-json-logs/

Advanced Input -- json_event
•Now define input block with format:
–input {
– stdin {
– type => "nginx"
– format => "json_event"
– }
–}
•And start in command line:
–tail -F /data/nginx/logs/access.json \
–| sed 's/upstreamtime":-/upstreamtime":0/' \
–| /usr/local/logstash/bin/logstash -f /usr/local/logstash/etc/agent.conf &
•Attention: upstreamtime may be "-" if the status is 400; that is what the sed above fixes.

Advanced Web GUI
•Write your own website using the ElasticSearch RESTful API to search, as follows:
–curl -XPOST http://es.domain.com:9200/logstash-2012.09.18/nginx/_search?pretty=1 -d '
{
  "query": {
    "range": {
      "@timestamp": {
        "from": "now-1h",
        "to": "now"
      }
    }
  },
  "facets": {
    "curl_test": {
      "date_histogram": {
        "key_field": "@timestamp",
        "value_field": "url",
        "interval": "5m"
      }
    }
  },
  "size": 0
}'

Additional Message::Passing demo
•I wrote a demo using the Message::Passing, Regexp::Log, ElasticSearch and other Perl modules, working similarly to the Logstash usage shown here.
•See:
–http://chenlinux.com/2012/09/16/message-passing-agent/
–http://chenlinux.com/2012/09/16/regexp-log-demo-for-nginx/
–http://chenlinux.com/2012/09/16/message-passing-filter-demo/

Reference
•http://logstash.net/docs/1.1.1/tutorials/metrics-from-logs
•http://logwrangler.mtcode.com/
•https://www.virtualbox.org/wiki/Linux_Downloads
•http://vagrantup.com/v1/docs/getting-started/index.html
•http://www.elasticsearch.cn
•http://search.cpan.org/~bobtfish/Message-Passing-0.010/lib/Message/Passing.pm