I was working on a feature to enrich some log events with extra data. In this case, the session setup event has most of the info (client, server, etc.), and the subsequent events only contain info about the event itself plus the session ID. I wanted to denormalize all of the session info onto each event so I could facet along those dimensions in Kibana.
My first attempt used the elasticsearch filter plugin in Logstash to look up the session setup event.
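Here is a minimal sketch of that kind of lookup. It assumes hypothetical field names (event_type, session_id, client, server), a hypothetical index pattern (sessions-*) and Elasticsearch on localhost:9200. Each follow-up event searches for its session setup event and copies fields from the match:

```
filter {
  # Only enrich follow-up events; the setup event already carries the session info.
  # event_type, session_setup, session_id, client and server are hypothetical names.
  if [event_type] != "session_setup" {
    elasticsearch {
      hosts => ["localhost:9200"]
      # Hypothetical index pattern holding the session setup events
      index => "sessions-*"
      # Lucene query string; %{[session_id]} is filled in from the current event
      query => "event_type:session_setup AND session_id:%{[session_id]}"
      # Copy fields from the matched setup event onto this one (source => target)
      fields => {
        "client" => "client"
        "server" => "server"
      }
    }
  }
}
```

With this shape, every non-setup event issues its own search request against Elasticsearch.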
Hmm, it works, but it's fairly slow at ~150 events per second. The Logstash pipeline analyzer showed the elasticsearch filter taking several orders of magnitude longer than the next slowest filter, likely because it runs a search for every event. My second try used the memcached filter plugin to set and get the session info.
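A minimal sketch of the memcached approach follows. It uses the same hypothetical field names and assumes memcached is running on localhost:

- The session setup event writes each session field under a key built from the session ID.
- Later events read those keys back into fields.

The namespace value and the TTL are also illustrative assumptions.

```
filter {
  if [event_type] == "session_setup" {
    memcached {
      hosts => ["localhost"]
      # Static key prefix; this value is used verbatim (no %{...} expansion)
      namespace => "session"
      # set: event field => memcached key (the key is expanded per event)
      set => {
        "[client]" => "%{[session_id]}_client"
        "[server]" => "%{[session_id]}_server"
      }
      # Hypothetical TTL in seconds so old sessions eventually expire
      ttl => 3600
    }
  } else {
    memcached {
      hosts => ["localhost"]
      namespace => "session"
      # get: memcached key => event field to populate
      get => {
        "%{[session_id]}_client" => "[client]"
        "%{[session_id]}_server" => "[server]"
      }
    }
  }
}
```

Compared with the elasticsearch lookup, each follow-up event now does a simple key/value get instead of a search.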
Boom! Events are now processed 10x faster (~1,500 events per second) with minimal changes.
A couple of tips for working with this plugin:
- Don’t put a field reference in the namespace setting; the plugin uses the value verbatim and does not dereference it.
- If a key name resolves incorrectly, you will get weird results where the wrong session info is attached to an event. To check whether the plugin is using the key names you expect, dump the keys from memcached.
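To illustrate how the first tip leads to the second, here is a hypothetical misconfiguration using the same assumed field names. A %{...} reference in namespace stays literal. If the key itself is a constant, every event reads the same key and picks up whichever session was written last. Putting the session ID in the key avoids this:

```
filter {
  memcached {
    hosts => ["localhost"]
    # Avoid: namespace => "%{[session_id]}" together with a constant key like "client".
    # The namespace is not expanded, so every event would read the same literal key.
    namespace => "session"
    get => {
      # The session ID belongs in the key, where %{...} is expanded per event
      "%{[session_id]}_client" => "[client]"
    }
  }
}
```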
Note: Elastic recently released Logstash 6.6.0 and touts the memcached filter plugin as a new feature. Don’t call it a comeback; it’s been around for more than a year!