Friday, June 30, 2017

Lessons Learned with Logstash - Part III

I self-marketed my last post on Reddit and got encouraging replies, for which I'm truly grateful. In one of the threads I replied with what has become this latest installment. I was just too excited! Anyway, this is the nitty-gritty of how to do the yeoman's work of proper field mapping in ES-

I use OpenBSD 6.1 (on Hyper-V!), so apologies for the OS-specific calls-

  • since I have four distinct types of sources, I have each type log to LS on a port specific to that type: all of my Junipers log to LS on 5001, my Fortigates on 5002, my Windows servers on 5000, and my Nutanix cluster nodes on 5005. I comment out all but one at a time to isolate the mapping work (there's a sketch of what those inputs look like right after the script below).
  • (assuming LS and ES are on the same box, and that you want to start over regardless of the current state of the setup) I wrote the following script to stop LS, clear all the storage and logs for LS and ES, kill any existing mappings in ES, and then restart it so that the system is ready to start a new round of mapping work:

[root@noc-05: Fri, Jun-30 10:32PM]
/root/#cat /usr/sbin/stopes
echo "\t\t\t ##### stopping logstash ##### \t\t\t"
rcctl stop logstash
sleep 2
echo "\t\t\t ##### clearing ES mappings ##### \t\t\t"
# reset the Kibana default index pointer before the indices get wiped
curl -XPOST 'localhost:9200/.kibana/_update_by_query?pretty&wait_for_completion&refresh' -H 'Content-Type: application/json' -d'
{
  "script": {
    "inline": "ctx._source.defaultIndex = null",
    "lang": "painless"
  },
  "query": {
    "term": { "_type": "config" }
  }
}'
rcctl stop elasticsearch
sleep 1
echo "\t\t\t ##### clearing ES and LS logs, storage ##### \t\t\t"
rm /var/log/logstash/logstash.log
touch /var/log/logstash/logstash.log
chown _logstash:_logstash /var/log/logstash/*
rm -rf /storage/elasticsearch/
rm /var/log/elasticsearch/elasticsearch.log
touch /var/log/elasticsearch/elasticsearch.log
chown _elasticsearch:_elasticsearch /var/log/elasticsearch/*
sleep 1
echo "\t\t\t ##### starting ES ##### \t\t\t"
rcctl start elasticsearch
[root@noc-05: Fri, Jun-30 10:32PM]
/root/#
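
For reference, this is roughly what the port-per-source-type inputs from the first bullet look like in a logstash config. The ports are the ones listed above, but whether you use udp or tcp depends on how your gear ships its logs, and the type names are just examples that line up with the conditional filtering trick from the previous post:

input {
  # one input per source category; the type set here is what the filter conditionals key on
  # (swap udp for tcp depending on your sources)
  udp { port => 5000 type => "windows" }
  udp { port => 5001 type => "juniper" }
  udp { port => 5002 type => "fortinet" }
  udp { port => 5005 type => "nutanix" }
}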

  • For the current source category I'm working with, I pick through my logstash filters for it once again, making sure I haven't inadvertently introduced the same field in two spots with slightly different spellings (which equates to two separate fields in ES), like dst-ip and dst_ip. If one slips through anyway, the fix is a mutate => rename, sketched below.
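
A minimal sketch of that rename, assuming dst-ip is the stray spelling you want to fold into dst_ip:

filter {
  mutate {
    # collapse the stray spelling into the one field ES should know about
    rename => { "dst-ip" => "dst_ip" }
  }
}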



  • I then start logstash with a single device category reporting in

rcctl -d start logstash

watch the stuff come in, revisiting _grokparsefailures and repeatedly refreshing the index for new fields coming in (whether dynamically, if you still have that on, or because a manually defined field simply hasn't seen a log come in that triggers its use). Some dynamically-mapped errors are ES's fault; others are because you're using the wrong UTF encoding (8 vs. 16) or not using an appropriate codec. Either way, now is the time to see those problems, correct them in LS, and restart it until you've homed in on what's going crazy. Now is when those online grok filter tools come in REAL handy. Keep using the stopes script, correct your logstash filtering, and restart logstash... repeatedly.
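
Two curls I lean on constantly during this phase (assuming the default logstash-* index names): one to pull back a handful of the documents that got tagged _grokparsefailure, and one to dump the current mapping so you can see exactly which fields have shown up so far:

curl -XGET 'localhost:9200/logstash-*/_search?q=tags:_grokparsefailure&size=5&pretty'
curl -XGET 'localhost:9200/logstash-*/_mapping?pretty' | less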

  • When you feel you've a) rooted out all the _grokparsefailures (hint: put the pesky, corner-case logs in a catch-all filter so you can move on with life) and b) rooted out the dynamically-mapped crap fields, you're ready to pull down the mapping from ES and convert it into the mapping you tell it to pay attention to (which just so happens to be ONLY the fields your logstash config files are telling it to pay attention to)-

rcctl stop logstash
curl -XGET 'http://127.0.0.1:9200/logstash-*/_mapping?pretty' > my_mapping.json
cp my_mapping.json my_template.json

The above gets you a file to edit; this is where you tighten up the fields themselves. You will notice duplicate field entries (remember dst-ip and dst_ip), and you'll have to go back into LS and mutate => rename one of the two to match the other. Then you'll make a decision on every field, based on what you observed its data to be, about whether it's going to be treated as text, an integer, an IP address, a time/date, etc. (I say etc. but I don't know any more lol). Doing this is a huge favor not only to you but to the performance of your system. Improperly typed fields are the bane of our existence. For one thing, I could not get geomapping working in Kibana until I set the geoip fields correctly.
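
To give you an idea of where this ends up, here's a stripped-down sketch of what an edited my_template.json can look like on ES 5.x. The field names are only examples, but note the explicit types, the dynamic setting (false silently ignores unmapped fields, "strict" rejects them outright), and geoip.location mapped as geo_point, which is what finally made Kibana's geomapping happy for me:

{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic": false,
      "properties": {
        "@timestamp": { "type": "date" },
        "src_ip":     { "type": "ip" },
        "dst_ip":     { "type": "ip" },
        "bytes_sent": { "type": "long" },
        "message":    { "type": "text" },
        "geoip": {
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  }
}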

  • And if you are only doing one category of log sources, then you skip to the end, upload the mapping into ES, restart LS, and you're in production!

curl -XPUT 'http://localhost:9200/_template/logstash-*_template?pretty' -H 'Content-Type: application/json' -d @my_template.json
curl -XDELETE 'http://localhost:9200/logstash-*?pretty'
rcctl -d start logstash

The above pushes the template to ES, clears any existing indices, and then fires up logstash to feed it production docs.
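
Before trusting it, I like to confirm the template actually took (using the template name from the PUT above), and then spot-check the mapping on the first fresh index once logs start arriving:

curl -XGET 'localhost:9200/_template/logstash-*_template?pretty'
curl -XGET 'localhost:9200/logstash-*/_mapping?pretty'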

Conclusion

If you are like me, you have to repeat this for each category of logging source you deal with, then concatenate the results into a single my_template.json file. I'm not there yet; I'm still working on Windows Server (the last of my four source categories). Also, the editing tools on this blog platform are deplorable- my Reddit post had better formatting than this blog post, sigh.

Thursday, June 29, 2017

Lessons Learned with Logstash - Part II

In Lessons Learned with Logstash (so far) I went into some of the problems and challenges of Logstash. Since then, I've discovered what I refer to as the 'chasm' between an out-of-the-box Elasticsearch solution and a large-scale, full-blown, enterprise-level implementation. TL;DR: if you're going to use ES out-of-the-box, don't expect to use it like that 'at scale' for very long.

The Chasm - A field mapping explosion!

The chasm between simple-ES and Hawking-complicated-ES is what is known in the documentation as a 'field mapping explosion'. In a sweet turn of irony, the very thing that draws most newcomers to ES is exactly what will cut them down at a certain scale of logging, absent a major refactoring of their understanding of the stack.

side rant: I suspect this is why most blog articles related to syslogging with ES only offer a shallow treatment of Logstash... and never get to Elasticsearch itself- ES is complicated. Sadly this group doesn't just include personal bloggers like yours truly, but also commercial vendors (traditional log-handling application vendors) vying for a piece of the ES pie- I find their lackluster attempts to market their ES chops especially deplorable. If ES is being used by Netflix and eBay and Wikipedia, then it stands to reason that the average toe-dipping blog article is wholly insufficient for, well, just about anything seriously related to ES (this article included!).

Back to business: Dynamic field mapping is a feature of ES meant to bring neophytes in, because it attempts to automatically create fields based on the data being fed to it, without any configuration or interaction from the administrator. Literally feed it a stream or store of information, and ES will get down to work creating all the fields for you and then populating them with data. Sounds fantastic! What could possibly go wrong?

The Two Strikes Against ES

Unfortunately the key word in all of this isn't 'dynamic' or 'automatically', but rather 'attempts', because depending on the data being fed to it, ES can either shine or crash and burn. Both endings are bright, but only one is happy. To help explain, it's useful to know the two strikes against Elastic's decision to put this feature into ES-

  1. Those that rely the most on dynamic mapping are those most likely to suffer from it (newbies).
  2. One of the best use-cases of ES creates an embarrassingly hot mess of an ELK stack when using dynamic field mapping (syslog).

I know this because I fell into both categories- a newbie using ES for syslog analysis.

How does it (not) work?

So what happens? At its core, ES suffers from separation anxiety- but in the opposite manner we're used to. Instead of being clingy, it tries to separate lots of information, and when it's syslog data, much of it shouldn't be separated. Specifically, the hyphen signals to ES that whatever is on either side of it should be treated separately. In the world of syslog, that usually includes hostnames, dates, and times.
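
You can watch the hyphen-splitting happen with the _analyze API. Feed the standard analyzer a made-up hostname like fw-edge-01 and it comes back as three separate tokens (fw, edge, 01):

curl -XGET 'localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "analyzer": "standard",
  "text": "fw-edge-01"
}'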

If it seems like a glaring issue, that's because it is. Your ES index will be littered with faulty fields, and those fields will hold data that should have landed somewhere else- somewhere you specifically needed it to be. That's bad when you're using ES to find needles in your haystack, and some needles get permanently lost in those bogus fields.

Your system will start to churn on itself. At one point I thought I needed around 500 to 600 fields. I had 906. Then I had 3342, and that was effectively data corruption on a mass scale (the system takes in ~4 million events per day). It made searches against appropriately stored information undeniably slow.

Simply crafting detailed logstash rules with plenty of fields guarantees nothing. If ES sees some data as a field, it can wreck your grok-parse rules and leave you wondering why they don't work when they once did. Add to this the variability of character encodings among your data sources (Juniper and MS Windows are UTF-16, Nutanix and Fortinet are UTF-8, for instance) and you can spend a great deal of time hunting down your parsing issues, all while munging your data into oblivion.

The Solution

The solution is to prepare ES ahead of time for exactly what it should expect, using manual field mappings. Via the API, you can turn off dynamic mapping and manually map every field you will be using. The added benefit of this work is that while defining the fields, you also set the type of data ES should expect per field. When tuned properly with appropriate data types, ES can index and search significantly faster than if it's left with the default field mapping behavior. Speaking of which, Logstash will ship in fields typed as strings, which almost necessitates a static mapping exercise, as there are performance and capability penalties for incorrectly typing your data.
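
The switch itself is just a mapping setting. A bare-bones sketch (the template name here is a placeholder) of turning it off in an index template- set dynamic to "strict" to have ES reject documents carrying unmapped fields, or false to have it silently ignore them, then list every field you actually want under properties:

curl -XPUT 'localhost:9200/_template/my_syslog_template?pretty' -H 'Content-Type: application/json' -d'
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic": "strict",
      "properties": {
        "message": { "type": "text" }
      }
    }
  }
}'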

Friends with Benefits

For instance, if IP addresses are properly typed as such, then you can do fancy things like search through CIDR subnets and perform IP-space calculations. As another example, properly typing numeric data as integers (whole or floating-point, of various maximum sizes) allows ES to perform math functions on them, whereas leaving them as the default (string) never will. A static field mapping fed to ES ahead of document ingestion ensures the right field typing is applied to the incoming data, avoiding disaster, wasted CPU cycles, and loss of extended functionality.
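
For example, once a field like src_ip (a hypothetical name) is mapped with type ip, a plain term query will accept CIDR notation directly:

curl -XGET 'localhost:9200/logstash-*/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": { "src_ip": "10.0.0.0/8" }
  }
}'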

How-to, Next Time

In the next article I'll go over the gory details of how to properly define the fields you intend to use. At a basic level, you use strict LS filter rules along with the dynamic field mapping capability to see what fields come across from LS to ES. That lets you tweak your filter rules until only the fields you want are traversing the stack. From there you feed them back in as a template (along with changes to the field types) and voila! ES and LS suddenly seem to get along with each other.

SPECIAL BONUS TIP: in part one I talked about breaking up your logging devices by port and type. Along with that, do not forget to conditionally filter each set of device types by their assigned type before parsing that section of rules. I forgot to do this and ended up having my Nutanix logs parsed by my Nutanix grokparse filters... and then by my Fortinet grokparse filters, leading to all sorts of chaos. Start your constituent logstash configuration files (the ones I described previously as being dedicated to a particular device/vendor/application) with this (using Nutanix as the example, and remembering that the conditional has to live inside the filter block):

filter {
  if [type] == "nutanix" {

     {nested nutanix-filter_01}
     .
     .
     .
     {nested nutanix-filter_99}
  }
}



Art - Mechlab WIP

Another in my series of "unfinished art projects that shall be finished soon"... here's mechlab!











Art - Quadville WIP shots

It's been a while since I've worked on Quadville, but I'm aiming to get back into it. Until then, here are the latest screenshots from this work in progress. Enjoy!