Friday, June 30, 2017

Lessons Learned with Logstash - Part III

I self-marketed my last post on Reddit and got encouraging replies, for which I'm truly grateful. In one of the threads I replied with what has become this latest installment- I was just too excited! Anyway, this is the nitty-gritty of how to do the yeoman's work of proper field mapping in ES-

I use OpenBSD 6.1 (on Hyper-V!) so apologies for OS-specific calls-

  • since I have four distinct types of sources, I have each type log to LS on a port specific to that type: all of my Junipers log to LS on 5001, my Fortigates on 5002, my Windows Servers on 5000, and my Nutanix Cluster Nodes on 5005. I comment out all but one at a time to isolate the mapping work.
  • assuming LS and ES are on the same box (and whatever the current state of the setup, assuming you want to start over from wherever it currently is), I wrote the following script to stop LS, clear all the storage and logs for LS and ES, kill any existing mappings in ES, and then restart ES so that the system is ready for a new round of mapping work:

[root@noc-05: Fri, Jun-30 10:32PM]
/root/#cat /usr/sbin/stopes
echo "\t\t\t ##### stopping logstash ##### \t\t\t"
rcctl stop logstash
sleep 2
echo "\t\t\t ##### clearing ES mappings ##### \t\t\t"
curl -XPOST 'localhost:9200/.kibana/_update_by_query?pretty&wait_for_completion&refresh' -H 'Content-Type: application/json' -d'
{
  "script": { "inline": "ctx._source.defaultIndex = null", "lang": "painless" },
  "query":  { "term": { "_type": "config" } }
}'
rcctl stop elasticsearch
sleep 1
echo "\t\t\t ##### clearing ES and LS logs, storage ##### \t\t\t"
rm /var/log/logstash/logstash.log; touch /var/log/logstash/logstash.log
chown _logstash:_logstash /var/log/logstash/*
rm -rf /storage/elasticsearch/
rm /var/log/elasticsearch/elasticsearch.log; touch /var/log/elasticsearch/elasticsearch.log
chown _elasticsearch:_elasticsearch /var/log/elasticsearch/*
sleep 1
echo "\t\t\t ##### starting ES ##### \t\t\t"
rcctl start elasticsearch
[root@noc-05: Fri, Jun-30 10:32PM]
/root/#

  • For the current source category I'm working with, I pick through my logstash filters once again, being sure not to inadvertently introduce a field in two spots with slightly different spellings (which equates to two separate fields in ES), like dst-ip and dst_ip.



  • I then start logstash with a single device category reporting in

rcctl -d start logstash

watch the stuff come in, re-visiting _grokparsefailures, and repeatedly refresh the index pattern for new fields coming in (whether dynamically, if you still have that on, or because a manually defined field simply hasn't seen a log come in that triggers its use). Some dynamically-mapped errors are ES's fault; others are because you are using the wrong UTF encoding (8 vs 16) or not the appropriate codec. Either way, now is the time to see those, correct them in LS, and restart it until you hone in on what's going crazy. Now is when those online grok filter tools come in REAL handy. Keep using the stopes script, correct your logstash filtering, and restart logstash... repeatedly.
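
If the encoding turns out to be the culprit, the fix lives on the input side. Here's a rough sketch (not my exact config) of forcing a charset on one of the per-port inputs via the plain codec- the port, type, and charset below are just examples, so check the codec docs for the exact charset string your source actually sends:

input {
  tcp {
    port  => 5000
    type  => "windows"
    # example only- pick the charset your device/agent really emits
    codec => plain { charset => "UTF-16LE" }
  }
}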

  • When you feel you've a) rooted out all the _grokparsefailures (hint: put the pesky, corner-case logs in a catch-all filter so you can move on with life), and b) rooted out the dynamic-mapping crap fields, you're ready to pull down the mapping from ES and convert it into the mapping you tell it to pay attention to (which just so happens to be ONLY the fields your logstash config files are telling it to pay attention to)-

rcctl stop logstash
curl -XGET 'http://127.0.0.1:9200/logstash-*/_mapping?pretty' > my_mapping.json
cp my_mapping.json my_template.json 

The above gets you a file to edit, and this is where you tighten up the fields themselves. You will notice duplicate field entries (remember dst-ip and dst_ip) and you'll have to go back into LS and mutate => rename one of the two to match the other. Then you'll make a decision on every field based on what you've observed its data to be, and decide whether it's going to be treated as text, an integer, an IP address, time/date, etc. (I say etc. but I don't know any more types lol). Doing this is a huge favor not only to you but to the performance of your system. Improperly typed fields are the bane of our existence. For one thing, I could not get geomapping working in Kibana until I set the geoip fields correctly.
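
The rename itself is a one-liner back in LS. A minimal sketch, using the dst-ip/dst_ip pair from above (which of the two spellings you keep is entirely your call):

filter {
  mutate {
    # fold the stray spelling into the one the mapping will keep
    rename => { "dst-ip" => "dst_ip" }
  }
}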

  • And if you are only doing one category of log sources, then you can skip to the end, upload the mapping into ES, restart LS, and you're in production!

curl -XPUT 'http://localhost:9200/_template/logstash-*_template?pretty' -H 'Content-Type: application/json' -d @my_template.json
curl -XDELETE 'http://localhost:9200/logstash-*?pretty'
rcctl -d start logstash

The above pushes the template to ES, clears any existing indices, and then fires up logstash to feed it production docs.
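
For reference, here's roughly the shape a hand-trimmed my_template.json takes on ES 5.x. The field names below are purely illustrative (not my actual mapping), and "dynamic": false is the piece that tells ES to stop inventing fields on its own- use "strict" instead if you'd rather have ES outright reject documents with unexpected fields:

{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic": false,
      "properties": {
        "@timestamp": { "type": "date" },
        "src_ip":     { "type": "ip" },
        "dst_ip":     { "type": "ip" },
        "bytes_sent": { "type": "long" },
        "message":    { "type": "text" }
      }
    }
  }
}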

Conclusion

If you are like me, you have to repeat this for each category of logging source you deal with, then concatenate each of the sources into a single my_template.json file. I'm not there yet, still working on Windows Server (last of my 4 source categories). Also, the editing tools on this blog platform are deplorable- my Reddit post had better formatting than this blog post, sigh.

Thursday, June 29, 2017

Lessons Learned with Logstash - Part II

In Lessons Learned with Logstash (so far) I went into some of the problems and challenges of Logstash. Since then, I've discovered what I refer to as the 'chasm' between an out-of-the-box Elasticsearch solution and a large-scale, full-blown enterprise-level implementation. TL;DR: if you're going to use ES out-of-the-box, don't expect to use it like that 'at scale' for very long.

The Chasm - A field mapping explosion!

The chasm between simple-ES and Hawking-complicated-ES is what is known in the documentation as a 'field mapping explosion'. In a sweet turn of irony, that which draws most newcomers to ES is exactly what will cut them down at a certain scale of logging without a major refactoring of their understanding of the stack.

side rant: I suspect this is why most blog articles related to syslogging with ES really only focus on a shallow treatment of Logstash... and never get to Elasticsearch itself- ES is complicated. Sadly this group doesn't just include personal bloggers like yours truly, but commercial vendors (traditional log handling application vendors) vying for a piece of the ES pie- I find their lackluster attempts to market their ES chops especially deplorable. If ES is being used by Netflix and Ebay and Wikipedia, then it stands to reason that the average toe-dipping blog article is wholly insufficient for, well, just about anything seriously related to ES (this article included!).

Back to business: Dynamic field mapping is a feature of ES meant to bring neophytes in, because it attempts to automatically create fields based on the data being fed to it, without any configuration or interaction by the administrator. Literally feed it a stream or store of information and ES will get down to work creating all the fields for you, then populating them with data. Sounds fantastic! What could possibly go wrong?

The Two Strikes Against ES

Unfortunately the key word in all of this isn't 'dynamic' or 'automatically', but rather 'attempt', because depending on the data being fed to it, ES can either shine or crash and burn. Both endings are bright, but only one of them is happy. To help explain, it's useful to know the two strikes against Elastic putting this feature into ES-

  1. Those that rely the most on dynamic mapping are those most likely to suffer from it (newbies).
  2. One of the best use-cases of ES creates an embarrassingly hot mess of an ELK stack when using dynamic field mapping (syslog).

I know this because I fell into both categories- a newbie using ES for syslog analysis.

How does it (not) work?

So what happens? At its core, ES suffers from separation anxiety- but in the opposite manner we're used to. Instead of being clingy, it tries to separate lots of information, and when that information is syslog data, much of it shouldn't be separated. Specifically, the hyphen signals to ES that whatever is on either side of it should be split apart. In the world of syslog, that usually includes hostnames, dates, and times.
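
You can watch this happen with the _analyze API (assuming an ES 5.x instance listening on localhost, like mine). Feed it a hostname and a date and the standard analyzer happily chops them up at the hyphens:

curl -XGET 'localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "analyzer": "standard",
  "text": "noc-05 2017-06-29"
}'

The tokens that come back are along the lines of "noc", "05", "2017", "06", and "29"- none of which is the hostname or the date you actually wanted to search for.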

If it seems like a glaring issue, that's because it is. Your ES index will be littered with faulty fields, and those fields will contain data- data that should have landed somewhere else, in the field you specifically needed it to be in. That's bad when you're using ES to find needles in your haystack, and some needles get permanently lost in those bogus fields.

Your system will start to churn on itself. At one point I thought I needed around 500 to 600 fields. I had 906. Then I had 3342, and that was effectively data corruption on a mass scale (the system takes in ~4 million events per day). It made searches against appropriately stored information undeniably slow.
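
A quick-and-dirty way to keep an eye on the bloat (assuming ES is listening on localhost:9200) is to count the type declarations in the mapping- it's only a rough number, but it's enough to spot a field explosion in progress:

# rough count of mapped fields across the logstash indices
curl -s 'localhost:9200/logstash-*/_mapping?pretty' | grep -c '"type"'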

Simply crafting detailed logstash rules with plenty of fields guarantees nothing. If ES sees some data as a field, it can wreck your grok-parse rules and leave you wondering why they don't work when they once did. Add to this the variability of Unicode amongst your data sources (Juniper and MS-Windows are UTF-16, Nutanix and Fortinet are UTF-8, for instance) and you can spend a great deal of time hunting down your parsing issues, all while munging your data into oblivion.

The Solution

The solution is to prepare ES ahead of time for exactly what it should expect, using manual field mappings. Via the API, you can turn off dynamic mapping and manually map every field you will be using. The added benefit of this work is that while defining the fields you also set the type of data ES should expect per field. When tuned properly with appropriate data types, ES can index and search significantly faster than if it's left with the default field mapping definition. Speaking of which, Logstash will ship in fields typed as strings, which all but necessitates a static mapping exercise, since there are performance and capability penalties for incorrectly typing your data.

Friends with Benefits

For instance, if IP addresses are properly typed as such, then you can do fancy things like search through CIDR subnets and perform IP space calculations. As another example, properly typing numeric data (integers or floats, of various maximum sizes) allows ES to perform math functions on it, whereas leaving it as the default (string) never will. A static field mapping fed to ES ahead of document ingestion ensures that the right field typing is applied to the incoming data, avoiding disaster, wasted CPU cycles, and the loss of extended functionality.
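
As a taste of the CIDR trick, here's roughly what such a query looks like once a field is genuinely mapped as type ip (src_ip is a made-up field name here- substitute your own):

curl -XGET 'localhost:9200/logstash-*/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": { "src_ip": "10.10.0.0/16" }
  }
}'

Try the same thing against a string-typed field and you get nothing useful back- the CIDR notation only means something to ES when the mapping says the field is an ip.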

How-to, Next Time

In the next article I'll go over the gory details of how to properly define the fields you intend to use. At a basic level, you use strict LS filter rules along with the dynamic field mapping capability to see what fields come across from LS to ES. It allows you to tweak your filter rules until only the fields you want are traversing the stack. From there you feed them back in as a template (along with changes to the field types) and voila! ES and LS suddenly seem to get along with each other.

SPECIAL BONUS TIP: in part one I talked about breaking up your logging devices by port and type. Along with that, do not forget to conditionally filter each set of device types by their assigned type before parsing that section of rules. I forgot to do this and ended up having my Nutanix logs parsed by my Nutanix grokparse filters... and then again by my Fortinet grokparse filters, leading to all sorts of chaos. Start your constituent logstash configuration files (the ones I described previously as being dedicated to a particular device/vendor/application) with this (using Nutanix as the example):

if [type] == "nutanix" { 

   {nested nutanix-filter_01}
   .
   .
   .
   {nested nutanix-filter_99}
}



Art - Mechlab WIP

Another in my series of "unfinished art projects that shall be finished soon"... here's mechlab!











Art - Quadville WIP shots

It's been a while since I've worked on Quadville, but I'm aiming to get back into it. Until then, here are some of the latest screenshots from this work in progress. Enjoy!


















Sunday, May 14, 2017

Lessons Learned with Logstash (so far)

So far my experiences with the Elasticsearch stack (ES) have been very positive. I've had it installed on OpenBSD 6.0 (amd64 on VirtualBox on Win10) and 6.1 (amd64 on 2012R2 Hyper-V), with vast version differences between the two. I've learned some things along the way and thought I'd take a second to account for them, specifically as they relate to Logstash.


Use a grok tool to assist in filter building

A lot of forum/blog posts advocate ditching the tedious process of regex-building, opting instead for the kv filter, and then moving on with your life. Perhaps that's a possibility for some, but in my case the kv filter (kv stands for key/value, for things like 'box=blah') simply wasn't going to work. Specifically, Windows event logs (translated to syslog via syslogagent) often break any kind of logic shared with other event logs, and some defy any logic whatsoever.
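
For the curious, the kv approach that works for other people amounts to something like this minimal sketch (space-separated key=value parsing- not something I actually run):

filter {
  kv {
    # split "box=blah foo=bar" style messages into fields automatically
    source      => "message"
    field_split => " "
    value_split => "="
  }
}

For consistently formatted 'box=blah'-style logs that really is all it takes, which is why so many posts recommend it. Windows event logs just refuse to be that tidy.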

So if regex filters are as important to you in Logstash as they are to me, you're going to be spending a lot of time working with them. My philosophy regarding the ES stack is that Logstash is on the front-lines of the battle, and can significantly ease Elasticsearch's work as well as the end-user's, so spending a proportionally large amount of time in Logstash versus ES itself or Kibana in my estimation is time well-spent.

Don't beat your head against a wall testing your grok regexes with live (or even test) logs in Logstash. I don't have any affiliation with the following site, but grok constructor really has been helpful to me. It's probably like a lot of similar sites: multiple logs can be loaded for multiple parse tests against a single regex, and you can specify your own list of grok patterns so you actually test what you will use. Once I successfully test-build a regex there, I literally copy-n-paste it into my Logstash configuration.
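
For illustration, the kind of thing that comes out of a session there is a filter along these lines- the pattern and field names are examples, not one of my production filters:

filter {
  grok {
    # timestamp, reporting host, then everything else as the message body
    match   => { "message" => "%{SYSLOGTIMESTAMP:log_time} %{HOSTNAME:src_host} %{GREEDYDATA:event_msg}" }
    add_tag => [ "syslog_generic" ]
  }
}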

Break up your logging devices by port and type

You can easily and permanently limit any 'damage' done by regexes for some devices that end up catching logs from others by categorizing logging devices by the destination port they report to the Logstash server on. Syslog is 514 and Logstash seems to like 5000, so in my implementations I start with 5001 for Juniper, 5002 for Fortinet, 5003 for Windows, and so on.
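
In practice the input side of that looks something like the sketch below- ports and type names mirror the ones above, and whether you need tcp, udp, or both depends on what your devices actually send:

input {
  # add matching tcp { } blocks if any of your devices log over tcp
  udp { port => 5001 type => "juniper" }
  udp { port => 5002 type => "fortinet" }
  udp { port => 5003 type => "windows" }
}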

This also helps with _grokparsefailure identification in Kibana because now you can easily filter by type and be sure you're only looking at what you really want.

The port/type backstory

I stumbled into this practice by accident, grappling with what is likely Logstash's dirty little secret- problems with non-standard reporting devices. If you've scratched your head over a perfectly-constructed grok filter that works but still produces a _grokparsefailure tag, this is for you. It turns out that Logstash uses grok filtering internally on logs of type 'syslog', separately from anything you configure. For devices that don't "play nice" with their syslog reporting (read: Fortinet, Juniper, and more) Logstash will make its displeasure known by attaching the _grokparsefailure tag to what it ships to ES.

Once I realized that all these maddening _grokparsefailures were not my fault (a whole bunch more were my fault), I did three things-
  1. Moved my error log file to a larger capacity portion of my filesystem (from /var/log/ to /home/)
  2. Confirmed that every filter had a tag attached to it
  3. Separated incoming devices by port and assigned a type.
This had a great outcome- as soon as I re-classified incoming logs as anything other than syslog, Logstash itself stopped fretting over whether the incoming log was of a 'proper' format or not. That instantly made my failure logs shrink, which also eased the strain on my filesystem's capacity. Lots of big wins here!

I did not find this on my own. Aside from numerous google hits for things related to _grokparsefailure and matching, there was this very helpful article by James Turnbull on kartar.net.

Break up your configuration file

I've found that there is a sense of order and sanity in breaking up your configuration file into multiple configuration files. If there's more than one cook messing with your stew, you can limit any nuclear damage they do to a single file, and multiple smaller files are easier to read than one giant file. I've broken mine into
  • intro file with inputs and opening filter invocation
  • files by device or application
  • a general/misc configuration file for smaller stuff.
  • outro file with closing filter bracket and output stanza.
You'd think it would be easy to keep track of all those nested brackets, but after a couple thousand lines it can be a nightmare.
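
As a purely hypothetical example of the layout (paths are made up- use wherever your install keeps its configs), numeric prefixes keep everything in order, since Logstash combines every file in the config directory (commonly in lexical order- hence the number prefixes):

/etc/logstash/conf.d/00-inputs.conf      # inputs plus the opening filter { bracket
/etc/logstash/conf.d/10-juniper.conf
/etc/logstash/conf.d/20-fortinet.conf
/etc/logstash/conf.d/30-windows.conf
/etc/logstash/conf.d/40-nutanix.conf
/etc/logstash/conf.d/80-misc.conf
/etc/logstash/conf.d/99-outputs.conf     # closing } bracket plus the output stanza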


I'll put more here as I come up with it.

Monday, March 20, 2017

dannomech

I created this a year or two ago and never got around to posting it here- there's more than just this, but I think this was decently representative of most of what I'd accomplished.