Sunday, May 14, 2017

Lessons Learned with Logstash (so far)

So far my experiences with the Elasticsearch stack (ES) have been very positive. I've had it installed on OpenBSD 6.0 (amd64 on VirtualBox on Windows 10) and 6.1 (amd64 on 2012R2 Hyper-V), with significant version differences between the two. I've learned some things along the way and thought I'd take a second to account for them, specifically as they relate to Logstash.


Use a grok tool to assist in filter building

A lot of forum/blog posts advocate ditching the tedious process of regex-building, opting instead for the kv filter and then moving on with your life. Perhaps that's a possibility for some, but in my case the kv filter (kv stands for key/value, for fields like 'box=blah') simply wasn't going to work. Specifically, Windows event logs (translated to syslog via syslogagent) often break whatever logic works for other event logs, and some defy any logic whatsoever.
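For reference, here's a minimal sketch of what that kv approach looks like; the separator settings are illustrative, not pulled from any of my configs:

    filter {
      kv {
        # split "key=value" pairs out of the message field; fine for
        # well-behaved logs, not so much for Windows event text
        source      => "message"
        field_split => " "
        value_split => "="
      }
    }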

So if regex filters are as important to you in Logstash as they are to me, you're going to be spending a lot of time working with them. My philosophy regarding the ES stack is that Logstash is on the front lines of the battle and can significantly ease the work of both Elasticsearch and the end user, so in my estimation spending a proportionally large amount of time in Logstash versus ES itself or Kibana is time well spent.

Don't beat your head against a wall testing your grok regexes against live (or even test) logs in Logstash. I don't have any affiliation with the following site, but grok constructor really has been helpful to me. Like a lot of similar sites, it lets you load multiple sample logs to run multiple parse tests against a single regex, and it lets you specify your own list of grok patterns so you can test exactly what you will use. Once I've successfully built and tested a regex there, I literally copy-and-paste it into my Logstash configuration.
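As an illustration (the pattern below is hypothetical, not one of my production filters), a pattern that passes in the tester drops straight into a grok stanza like so:

    filter {
      grok {
        # pattern verified against sample lines in grok constructor first,
        # then pasted here verbatim
        match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:device} %{GREEDYDATA:event_text}" }
      }
    }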

Break up your logging devices by port and type

You can easily and permanently limit any 'damage' done when one device's regexes end up catching another device's logs by categorizing logging devices by the destination port they report to on the Logstash server. Syslog is 514 and Logstash seems to like 5000, so in my implementations I start with 5001 for Juniper, 5002 for Fortinet, 5003 for Windows, and so on.
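A rough sketch of what that looks like as Logstash inputs (UDP here; swap in tcp blocks if your devices report over TCP):

    input {
      udp {
        port => 5001
        type => "juniper"
      }
      udp {
        port => 5002
        type => "fortinet"
      }
      udp {
        port => 5003
        type => "windows"
      }
    }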

This also helps with _grokparsefailure identification in Kibana because now you can easily filter by type and be sure you're only looking at what you really want.

The port/type backstory

I stumbled into this practice by accident, grappling with what is likely Logstash's dirty little secret: problems with non-standard reporting devices. If you've scratched your head over a perfectly constructed grok filter that works but still produces a _grokparsefailure tag, this is for you. It turns out that Logstash applies its own grok filtering internally to logs of type 'syslog', separately from anything you configure. For devices that don't "play nice" with their syslog reporting (read: Fortinet, Juniper, and more), Logstash will make its displeasure known by attaching the _grokparsefailure tag to what it ships to ES.

Once I realized that all these maddening _grokparsefailures were not my fault (a whole bunch more were my fault), I did three things:
  1. Moved my error log file to a larger-capacity portion of my filesystem (from /var/log/ to /home/)
  2. Confirmed that every filter had a tag attached to it (see the sketch below)
  3. Separated incoming devices by port and assigned each a type.
This had a great outcome: as soon as I re-classified incoming logs as anything other than syslog, Logstash itself stopped fretting over whether the incoming log was of a 'proper' format or not. My failure logs instantly shrank, which also eased the strain on my filesystem's capacity. Lots of big wins here!
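Here's a rough sketch of items 2 and 3 in filter form; the pattern and tag names are illustrative, but the shape is the point: route by type, and tag every match (and failure) so it's traceable in Kibana.

    filter {
      if [type] == "fortinet" {
        grok {
          # hypothetical pattern; what matters is the explicit tagging
          match          => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{GREEDYDATA:fortinet_event}" }
          add_tag        => [ "fortinet_grok" ]
          tag_on_failure => [ "_grokparsefailure_fortinet" ]
        }
      }
    }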

I did not find this on my own. Aside from numerous Google hits for things related to _grokparsefailure and matching, there was this very helpful article by James Turnbull on kartar.net.

Break up your configuration file

I've found that there is a sense of order and sanity in breaking up your configuration into multiple configuration files. If there's more than one cook messing with your stew, you can limit any nuclear damage they do to a single file, and several smaller files are easier to read than one giant file. I've broken mine into
  • intro file with inputs and opening filter invocation
  • files by device or application
  • a general/misc configuration file for smaller stuff
  • outro file with closing filter bracket and output stanza.
You'd think it would be easy to keep track of all those nested brackets, but after a couple thousand lines it can be a nightmare.
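This works because Logstash, when pointed at a directory, concatenates the files in lexical order before parsing, so the opening filter bracket in the intro file and the closing bracket in the outro file line up. My layout looks roughly like this (the path and file names are just my convention, not anything Logstash requires):

    /etc/logstash/conf.d/
      00-inputs.conf      # input stanzas and the opening 'filter {'
      10-juniper.conf     # per-device filters
      11-fortinet.conf
      12-windows.conf
      50-general.conf     # misc/smaller filters
      99-outputs.conf     # the closing '}' and the output stanza

A config test run against the whole directory (logstash -t -f /etc/logstash/conf.d/; the long form of -t is --configtest on older releases and --config.test_and_exit on 5.x) catches bracket mismatches before you restart anything.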


I'll put more here as I come up with it.