danno's blog

Sunday, May 06, 2018

PowerShell Effective Route Table Lookup

Looking up the routes on a Windows host via PowerShell can be misleading; at work (www.appliedi.net) our principal use is to see not just the normal destination next-hops associated with actual adapters, but more so those of SSL-VPN connections. A disclaimer: what follows is ugly, and while I applaud the ability to derive this information from a single PowerShell statement, I abhor the statement itself and Microsoft's continued inability to easily provide what is universally needed. With that chiding disclaimer, here ya go:

Get-NetIPInterface -ConnectionState Connected |
    Where-Object -FilterScript { $_.InterfaceAlias -notmatch '^Lo' } |
    Select-Object -Unique -Property ifIndex |
    Get-NetRoute |
    Where-Object -FilterScript { $_.NextHop -notmatch '0\.0\.0\.0|::' } |
    Format-Table -Property @{L='Destination';E='DestinationPrefix'}, @{L='Next Hop';E='NextHop'}, @{L='Interface';E='InterfaceAlias'}

We grab routes from the IP interfaces they originate from, ferret out the useless usual suspects, and format what's left with human-readable, non-CamelCase column headers. Here's an example:

Destination       Next Hop    Interface
-----------       --------    ---------
174.136.79.138/32 192.168.1.1 Wi-Fi 4
0.0.0.0/0         192.168.1.1 Wi-Fi 4
216.167.192.0/20  10.95.0.8   fortissl
192.168.75.0/24   10.95.0.8   fortissl
174.136.88.0/21   10.95.0.8   fortissl
174.136.86.0/23   10.95.0.8   fortissl
174.136.85.0/24   10.95.0.8   fortissl
174.136.84.224/27 10.95.0.8   fortissl
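For anyone who wants to reason about what that pipeline keeps and drops without a Windows box handy, here is a rough Python sketch of the same filtering idea. The route records are made up, and the field names simply mirror the Get-NetRoute properties used above. One deliberate difference, and it is an assumption about intent: the sketch compares the next hop exactly against 0.0.0.0 and ::, whereas the unanchored pattern in the one-liner also hides any IPv6 next hop written with "::" (fe80::1, for instance).

```python
import re

# Hypothetical route records; values are invented for illustration.
ROUTES = [
    {"DestinationPrefix": "0.0.0.0/0", "NextHop": "192.168.1.1", "InterfaceAlias": "Wi-Fi 4"},
    {"DestinationPrefix": "192.168.1.0/24", "NextHop": "0.0.0.0", "InterfaceAlias": "Wi-Fi 4"},
    {"DestinationPrefix": "127.0.0.0/8", "NextHop": "0.0.0.0", "InterfaceAlias": "Loopback Pseudo-Interface 1"},
    {"DestinationPrefix": "192.168.75.0/24", "NextHop": "10.95.0.8", "InterfaceAlias": "fortissl"},
    {"DestinationPrefix": "::/0", "NextHop": "fe80::1", "InterfaceAlias": "Wi-Fi 4"},
]

LOOPBACK = re.compile(r"^Lo")   # same idea as the -notmatch '^Lo' test
ON_LINK = {"0.0.0.0", "::"}     # assumption: only exact on-link next hops are hidden


def effective_routes(routes):
    """Keep routes that leave via a real gateway on a non-loopback interface."""
    return [
        r for r in routes
        if not LOOPBACK.search(r["InterfaceAlias"]) and r["NextHop"] not in ON_LINK
    ]


for route in effective_routes(ROUTES):
    print(f'{route["DestinationPrefix"]:<18} {route["NextHop"]:<12} {route["InterfaceAlias"]}')
```

Whether IPv6 gateways belong in the output is a judgment call; swap the set test for a regex search if you want the one-liner's behavior instead.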

Enjoy!

Friday, January 26, 2018

Logstash Imap input plugin works, but not as described

I've been re-entering the world of Logstash as I attempt to leverage it for workflow automation solutions. If that sounds fun, it's because it is.

However, applications that don't work as described can cause a large amount of non-fun when onboarding them into your operations. Don't get me started on Autotask, for instance (whoa Nelly!).

I will call out the Logstash Imap input plugin here, however. It works, and is awesome. But the description is woefully incorrect-

Description

Read mails from IMAP server

Periodically scan an IMAP folder (INBOX by default) and move any read messages to the trash.

So when setting this up for testing I thought it would look at emails in the "INBOX" and act on them via the filter rules and then move them to the trash. But in testing nothing happened at all. I was using two Gmail accounts and the Ubuntu server had its logstash and postfix properly set up. I was debugging and tailing and everything... and nothing was happening.

But send it an email, and then it works. Well, sorta- it acts on that new email via filter rules, but does not move it to the trash. And it leaves all the other emails alone.

I didn't post this to beat up on the Logstash team, I think they're doing wonderful work. It's just that I've perused at least three or four blogposts out in Internetland that show how to use this input... and none of them reference this documentation error. I thought that someone should acknowledge it so others could avoid it.

Saturday, November 04, 2017

picture of the day - luxrender logos

Another selection of blast-from-the-past illustrations, this time a collection of LuxRender logos I was working on.

Enjoy!

Tracking Fortigate SSL-VPN User Activity with Kibana

The dream of the Elasticsearch stack is that you will be able to glean information that might not be easily derived (or, at all) from other sources. The real sizzle is that it might be able to tell you more than the things it has information on. Here's a real-world example using Fortinet Fortigate SSL-VPN activity (Forticlient User VPN connections).

The Use-Case

The Elasticsearch Stack can be used to find SSL-VPN concurrent use problems, which cannot be found with the Fortigate itself (without a true parsing of the raw logs, including the ones that are useless). To briefly review Fortigate SSL-VPN functionality, the administrator can disallow multiple concurrent logins with the same VPN account. This is good for both hosting revenue and security, as a client is forced to purchase a separate account for each user, and each client user's usage can be individually tracked for accounting/security purposes.

A common problem with this is that individual end-users may have more than one device that uses the SSL-VPN account, or a group of users may be attempting to use the same account. For instance, a remote-worker user may have the same SSL-VPN account in use on a desktop and a laptop, or a whole office of employees might be trying to use the same account. In either case, they'll experience a race condition between the competing devices attempting to use the same account (if concurrent account usage is disallowed). How do we see it in a particular user? We modify the query for them.

Pre-requisites

You must have users that use a Fortigate vdom SSL-VPN gateway that logs to Elasticsearch (ES). Using Kibana we'll construct a query based on the collected information, so if this is a single unit and not a vdom in a multi-chassis setup, pay attention to the query language changes below. You are encouraged to have Logstash brokering the information between the Fortigate and your Elasticsearch back-end. This is encouraged because it's "relatively easy" to have Logstash auto-bucket your categories of data so that ES can store it and Kibana can more easily reference it for the search we're going to perform. (Hint- I use the kv plugin, but be careful about merely accepting the labels as Fortigate reports them, src-port is different than source-port or source-prt, etc., especially if Fortinet devices are not the only ones reporting to your ES back-end.)
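To make the label hint concrete, here is a small Python sketch of the idea (it is not Logstash configuration, just the logic): split key=value pairs the way a kv-style filter roughly would, then fold every spelling of a field into one canonical name before it gets stored. Only src-port, source-port and source-prt come from the text above; the canonical name src_port and the sample log line are assumptions.

```python
import shlex

# Hypothetical alias table: each vendor spelling mapped to the one field name
# you actually want in ES. The canonical name "src_port" is an assumption.
FIELD_ALIASES = {
    "src-port": "src_port",
    "source-port": "src_port",
    "source-prt": "src_port",
}


def parse_kv(line):
    """Split 'key=value key2="quoted value"' pairs into a dict."""
    pairs = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value
            pairs[key] = value
    return pairs


def normalize(event):
    """Rename known aliases; leave every other field name untouched."""
    return {FIELD_ALIASES.get(key, key): value for key, value in event.items()}


sample = 'user="test account" source-prt=51515 action=tunnel-up'
print(normalize(parse_kv(sample)))
```

In Logstash itself the same effect would come from renaming fields after the kv step; the point is that the rename table lives in one place instead of being scattered across vendor-specific filters.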

Construct the basic search Columns

In my setup I have multiple vendor sources reporting to ES (via Logstash), not just Fortigates. Regardless, I set up the columns in the Kibana discovery window including the following: user, group, bytes_received, bytes_sent, dst_ip, dst_port, duration, action, reason, remote_ip.

Construct the Query

Kibana can help a query along visually by being able to add filters literally with your mouse... but real men/women/trans know their true grit shows when the query itself includes all that filtering. It also makes it easy to communicate in a blog post like this one where there are no screenshots (maybe I'll add some later. LOL, who am I kidding? Just read it.) So, here's the complete query:

vd: "vdom_name" AND _exists_: "user" AND NOT user: "N/A" AND (( action: "tunnel-up" AND NOT reason: "N/A" ) OR ( action: "tunnel-down") ) AND NOT bytes_sent: "0"

You will want to substitute '"vdom_name"' with the name of your particular vdom. If you are running this against a standalone Fortigate unit and not a vdom in a Fortigate chassis, then you'll want to remove the 'vd: "vdom_name" AND ' entirely.

Viewing what we have so far, and what are we looking at?

Okay so we have the query and the result table set up. If your gateway is busy like ours you'll see lots of users. This query will only show their logins and logouts, with the time and data used within those sessions. Pretty handy, but let's narrow the query down to a single user to see if concurrent account usage is happening.

Modified query for one user

Using the account name "testaccount01" as the example, the revised query would be-

vd: "vdom_name" AND _exists_: "user" AND NOT user: "N/A" AND (( action: "tunnel-up" AND NOT reason: "N/A" ) OR ( action: "tunnel-down") ) AND NOT bytes_sent: "0" AND user: "testaccount01"

Once that's running (and a relevant amount of time to review access attempts, like 12 or 24 hours, has been defined) simply look at the "remote_ip" column. If there are different IP addresses showing up at the same time, then we have a winner! This is somewhat complicated if the competing devices are all behind the same NAT firewall, but their contentious activity will still be logged, and it should show busier than if a single user on a single device was successfully using the account by themselves.
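If you'd rather not eyeball the "remote_ip" column, the same check can be sketched in a few lines of Python over exported rows. Everything here is an assumption for illustration: the event dictionaries stand in for whatever you export from Kibana, the IP addresses come from documentation ranges, and the ten-minute window is an arbitrary starting point.

```python
from datetime import datetime, timedelta

# Made-up events shaped like the columns used in the query above.
EVENTS = [
    {"time": datetime(2017, 11, 4, 9, 0), "user": "testaccount01", "action": "tunnel-up", "remote_ip": "198.51.100.7"},
    {"time": datetime(2017, 11, 4, 9, 2), "user": "testaccount01", "action": "tunnel-up", "remote_ip": "203.0.113.20"},
    {"time": datetime(2017, 11, 4, 9, 3), "user": "testaccount01", "action": "tunnel-down", "remote_ip": "198.51.100.7"},
    {"time": datetime(2017, 11, 4, 9, 0), "user": "testaccount02", "action": "tunnel-up", "remote_ip": "198.51.100.9"},
    {"time": datetime(2017, 11, 4, 17, 0), "user": "testaccount02", "action": "tunnel-down", "remote_ip": "198.51.100.9"},
]


def contested_accounts(events, window=timedelta(minutes=10)):
    """Return {user: sorted remote_ips} for users seen from 2+ IPs within `window`."""
    by_user = {}
    for event in events:
        by_user.setdefault(event["user"], []).append(event)

    flagged = {}
    for user, user_events in by_user.items():
        user_events.sort(key=lambda event: event["time"])
        for i, first in enumerate(user_events):
            for later in user_events[i + 1:]:
                if later["time"] - first["time"] > window:
                    break  # events are sorted, nothing later can be inside the window
                if later["remote_ip"] != first["remote_ip"]:
                    flagged.setdefault(user, set()).update(
                        (first["remote_ip"], later["remote_ip"])
                    )
    return {user: sorted(ips) for user, ips in flagged.items()}


print(contested_accounts(EVENTS))
```

As noted above, devices behind the same NAT firewall share a remote_ip, so this sketch will not flag them; for that case, counting tunnel-up/tunnel-down events per window would be the thing to look at.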

Friday, October 27, 2017

OpenBSD 6.2 on Azure

I can confirm multiple successful installations of OpenBSD 6.2 (AMD64 GENERIC-MP) running in Azure. There's an official document by MS that I ran off of:
https://docs.microsoft.com/en-us/azure/virtual-machines/linux/create-upload-openbsd

Of course, there are some differences in my implementation, hence this post. To use an OpenBSD VM in Azure, you have to-

create it outside Azure (Windows10 Pro or any other Hyper-V capable system).
do some basic configuration work.
create the Azure resources of the VM.
push the VHD up to Azure.
Turn on and test connectivity to the VM.

Create the VM

Simply create a VM on Hyper-V. Caveats:

must be VHD, fixed-size. The drive can be small if you are savvy about attaching other larger drives to the system later.
must have some basic configuration work done on it or it won't be accessible
must have either DHCP on it, or the IP address the VM gets in the 'lab' must be the IP address it will have in Azure.

VM Creation Steps

Create the fixed-size vhd disk first (to allow the creation of a vhd and not a vhdx)
Create the VM - 2-core, 2048MB RAM, one nic. No console redirection.
Attach the created drive and the latest OpenBSD iso file as a DVD disc
Boot

OpenBSD Installer instructions

give it the name you want it to be in Azure
create an extra user aside from root.
set DNS to be globally accessible like 8.8.8.8
Don't use DHCP, instead designate the IP address you want to be used in Azure

About IP addressing: it might be easier to have the LAN address you use re-created in Azure. This allows you to update the VM as much as you want in your environment prior to uploading it to Azure. Just remember that you will want to change the address later in Azure if you are going to have a site-to-site tunnel between that Azure environment and the one you created the VM in.

Post-Creation Configuration Steps

As root:

grab a pre-created doas.conf file and load it into /etc/. be sure it has the user accounted for in it so that you can ssh in as the user and then elevate to root.
it's best to also have pre-created .profile files for root and the other user to sit in their home directory, which can do things like color-highlight root's prompt, or account for the ftp server to install packages from.
install the following packages and make the following links:

pkg_add nano py-setuptools openssl git

ln -sf /usr/local/bin/python2.7 /usr/local/bin/python

ln -sf /usr/local/bin/python2.7-2to3 /usr/local/bin/2to3

ln -sf /usr/local/bin/python2.7-config /usr/local/bin/python-config

ln -sf /usr/local/bin/pydoc2.7 /usr/local/bin/pydoc

git the latest Azure agent, install and run it

git clone https://github.com/Azure/WALinuxAgent

cd WALinuxAgent

python setup.py install

waagent -register-service

waagent -force

confirm the Azure agent is running

ps auxw| grep waagent

Create the Azure Resources of the VM

When the VM is created it will take the specs you configured in VMM, like CPU Count and RAM. For the file system itself, however, some work needs to be done, namely a logical storage space needs to be created in Azure where the VM VHD file can be uploaded and subsequently used. The logic to Azure storage is Account=>Container=>blob (file).

Log into the target Azure Environment

In this post, this is done via Azure CLI (2.0) on Windows on your local workstation (perhaps later I'll update with a straight-PS method). Confirm you're in the right environment by listing the resource group of the target environment you are provisioning into:

az group list --output table

Create the Storage Account

A storage account needs to be created and then a storage container, then the VHD can be uploaded to it and used. Here's the storage account creation command:

az storage account create --resource-group <rgrpname> --name <storageaccountname> --location <azure dc region> --sku Premium_LRS

This could look like this:

az storage account create --resource-group obsdrgrp --name obsd62storacc1 --location eastus --sku Premium_LRS

Create the Container

In ARM go to the storage account that was just created (in our example, obsd62storacc1) and copy one of the two keys in the Settings => Access keys page of the storage account. Then create the storage container with the following command that utilizes said key:

az storage container create --name <container name> --account-name <storageaccountname> --account-key <storage account key>

Implement the VM in Azure

We're in the action phase, uploading the VHD and creating the VM.

Upload the VHD

Issue the following command to push the VHD (you shut down the VM, right?) up to Azure in the storage container you just created:

az storage blob upload --container-name <container name> --file <full-path-filename> --name <name you want for VHD in Azure> --account-name <storageaccountname> --account-key <storage account key>

Be sure to enclose the local path of the VHD in double-quotes, especially if the path includes directories with spaces in their labeling.

Create the VM in Azure

This is the final command to issue!

az vm create --resource-group <rgrpname> --name <VM name> --image "https://<storageaccountname>.blob.core.windows.net/<container name>/<VHD name in Azure>" --use-unmanaged-disk --os-type linux --admin-username <created user> --generate-ssh-keys
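The --image value is just the Account=>Container=>blob path spelled out as a URL, and it is easy to fat-finger. A small Python sketch for assembling it follows; the container name "vhds" and blob name "obsd62.vhd" are hypothetical, and the only check applied is the storage account naming rule (3 to 24 characters, lowercase letters and digits).

```python
import re


def vhd_blob_url(storage_account, container, blob_name):
    """Build the blob URL that 'az vm create --image' is pointed at."""
    # Storage account names: 3-24 characters, lowercase letters and digits only.
    if not re.fullmatch(r"[a-z0-9]{3,24}", storage_account):
        raise ValueError(f"not a valid storage account name: {storage_account!r}")
    return f"https://{storage_account}.blob.core.windows.net/{container}/{blob_name}"


# "vhds" and "obsd62.vhd" are made-up names for illustration.
print(vhd_blob_url("obsd62storacc1", "vhds", "obsd62.vhd"))
```

The container and blob names must match exactly what was used in the container-create and blob-upload commands, which this sketch does not verify.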

I've had the command 'fail' due to a timeout, seen it as a failure in ARM, and then restarted the VM and it was fine. Yep. In fact, I think it's supposed to fail if you follow this method because of the IP addressing that needs to be wonked in this next step.

Wonk the IP addressing

You may have stipulated the address in the VM, but Azure doesn't care; it will assign it a random address from the first subnet in the VNET this VM is being provisioned into. In the environment I based this post on, this was a pain. I had to-

Change the base subnet the VM nic was looking at, and wait for a save.
Change the IP address to 'static', then to the address configured in the VM itself.

While this may seem like a pain (well yeah, it is), this imho is better than Azure DHCP. Perhaps there's some mac-address reservation capability, but as of this writing I hadn't run into it, and I prefer actual static addressing anyway. I suspect many OBSD users do as well. Here's one little bonus I'll include for your troubles- in the NSG created for the VM, if you create an inbound allow-all policy, you'll actually be able to ping your VM. That's a bonus in Azure territory, as you can't do that with Windows VMs in Azure.

Connect to the VM

Now it's time to test connectivity to the VM and get on with your life. We'll use Putty on Windows to accomplish this- presumably from the same workstation you created and uploaded the VM from. If you did not, then you'll need to get the keys that were created in the last command and use them where you will be using Putty from. First let's see what public IP address the VM got so we can SSH to it:

az vm list-ip-addresses --resource-group <rgrpname> --name <VM name>

Convert the Key for Putty

Unless you already have it, download and then run puttygen. In the File tab, select "Load private key". There were two keys generated in the Windows User .ssh directory that were created, id_rsa and id_rsa.pub. You want to select id_rsa. It's emphatic about your success or failure to find the right file.

Upon success, on the bottom-right of the menu, choose "Save private key", ensuring it was set to "RSA" key type and "2048" bits. Give it whatever name you want, but naming convention adherence is something I've never taken lightly :)

Use Putty to connect to the VM

In the putty configuration for this VM connection, under Connection => SSH => Auth, hit the browse button on the right-side and select the key file you generated with puttygen. Also under Connection => Data, put in the username that was created in the VM prior to uploading it to Azure.

That 'should' be all it takes to get in! You won't even need the user's password, it should authenticate via the key and you should be on your way! Let me know if this works for you, or if not, why!

Thanks for reading!

Friday, June 30, 2017

Lessons Learned with Logstash - Part III

I self-marketed my last post on Reddit and got encouraging replies for which I'm truly grateful. In one of the posts, I replied with what will be the latest installment here. I was just too excited! Anyway, this is the nitty-gritty of how to do the yeoman's work of proper field mapping in ES-

I use OpenBSD 6.1 (on Hyper-V!) so apologies for OS-specific calls-

Since I have four distinct types of sources, I have each type log to LS on a port specific to that type. So all of my Junipers are logging to LS on 5001, my Fortigates on 5002, my Windows Servers on 5000, and my Nutanix Cluster Nodes on 5005. I comment all but one out at a time to isolate the mapping work.
Assuming LS and ES are on the same box, assuming nothing about the current state of the setup, and assuming you want to start over wherever it is, I wrote the following script to stop LS, clear all the storage and logs for LS and ES, kill any existing mappings in ES and then restart it so that the system is ready to start a new round of mapping work:

[root@noc-05: Fri, Jun-30 10:32PM]
/root/#cat /usr/sbin/stopes
echo "\t\t\t ##### stopping logstash ##### \t\t\t"
rcctl stop logstash
sleep 2
echo "\t\t\t ##### clearing ES mappings ##### \t\t\t"
curl -XPOST 'localhost:9200/.kibana/_update_by_query?pretty&wait_for_completion&refresh' -H 'Content-Type: application/json' -d'{ "script": { "inline": "ctx._source.defaultIndex = null", "lang": "painless" }, "query": { "term": { "_type": "config" } }}'
rcctl stop elasticsearch
sleep 1
echo "\t\t\t ##### clearing ES and LS logs, storage ##### \t\t\t"
rm /var/log/logstash/logstash.log ;touch /var/log/logstash/logstash.log ;chown _logstash:_logstash /var/log/logstash/*;rm -rf /storage/elasticsearch/;rm /var/log/elasticsearch/elasticsearch.log ;touch /var/log/elasticsearch/elasticsearch.log ;chown _elasticsearch:_elasticsearch /var/log/elasticsearch/*
sleep 1
echo "\t\t\t ##### starting ES ##### \t\t\t"
rcctl start elasticsearch
[root@noc-05: Fri, Jun-30 10:32PM]
/root/#

For the current source category I'm working with, I pick through my logstash filters for them once again, being sure to not inadvertently introduce a field in two spots with slightly different spellings (equating to two separate fields in ES) like dst-ip and dst_ip.

I then start logstash with a single device category reporting in

rcctl -d start logstash

Watch the stuff come in, re-visiting _grokparsefailures, and repeatedly refreshing the index for new field types coming in (whether dynamically if you still have that on, or because a manually defined field simply hasn't yet seen a log come in that triggers its use). Some dynamically-mapped errors are ES's fault- others are because you are using the wrong UTF (8 vs 16) or not an appropriate codec that could be used. Either way, now is the time to see those, correct them in LS and restart it until you hone down what's going crazy. Now is when those online grok filter tools come in REAL handy. Keep using the stopes script, correct your logstash filtering, and restart logstash... repeatedly.

When you feel you've a) rooted out all the _grokparsefailures (hint, put the pesky, corner-case logs in a catch-all filter so you can move on with life), and b) rooted out the dynamic-mapping crap fields, you're ready to pull down the mapping from ES and convert it to the mapping you tell it to pay attention to (which just so happens to be ONLY the filtering your logstash config files are telling it to pay attention to)-

rcctl stop logstash
curl -XGET 'http://127.0.0.1:9200/logstash-*/_mapping?pretty' > my_mapping.json
cp my_mapping.json my_template.json

The above gets you a file to edit, and this is where you tighten up the fields themselves. You will notice duplicate field entries (remember dst-ip and dst_ip) and you'll have to go back into LS and mutate => rename one of the two to match the other. Then you'll make a decision on every field, based on what you observed its data to be, about whether it's going to be treated like text, an integer, an ip address, or time/date, etc. (I say etc. but I don't know any more lol). Doing this is a huge favor not only to you but to the performance of your system. Improperly typed fields are the bane of our existence. For one thing, I could not get geomapping working in Kibana until I set the geoip fields correctly.
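Spotting the dst-ip / dst_ip kind of duplicate by eye gets old once the mapping has hundreds of fields. Here is a Python sketch that walks a mapping file and groups field names that differ only in case, hyphens, underscores or spaces. It assumes the ES 5.x-style layout (index, then "mappings", then document type, then "properties"); the sample mapping is made up, and in practice you would json.load() my_mapping.json instead.

```python
import re
from collections import defaultdict

# Made-up stand-in for the contents of my_mapping.json.
SAMPLE_MAPPING = {
    "logstash-2017.06.30": {
        "mappings": {
            "fortigate": {
                "properties": {
                    "dst-ip": {"type": "text"},
                    "dst_ip": {"type": "text"},
                    "geoip": {"properties": {"location": {"type": "float"}}},
                }
            }
        }
    }
}


def field_names(node, prefix=""):
    """Yield dotted field names from a 'properties' block, recursing into objects."""
    for name, spec in node.get("properties", {}).items():
        yield prefix + name
        yield from field_names(spec, prefix + name + ".")


def all_field_names(mapping):
    """Collect field names across every index and document type in the mapping."""
    names = set()
    for index in mapping.values():
        for doc_type in index["mappings"].values():
            names.update(field_names(doc_type))
    return names


def near_duplicates(names):
    """Group names that collapse to the same key once separators are unified."""
    groups = defaultdict(list)
    for name in names:
        groups[re.sub(r"[-_\s]+", "_", name.lower())].append(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


print(near_duplicates(all_field_names(SAMPLE_MAPPING)))
```

Each group it reports is a candidate for a rename in the Logstash filters; the sketch only reports, it does not decide which spelling wins.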

And if you are only doing one category of log sources, then you skip to the end and upload the mapping into ES and restart LS and you're in production!

curl -XPUT 'http://localhost:9200/_template/logstash-*_template?pretty' -H 'Content-Type: application/json' -d @my_template.json
curl -XDELETE 'http://localhost:9200/logstash-*?pretty'
rcctl -d start logstash

The above pushes the template to ES, clears any existing indices, and then fires up logstash to feed it production docs.
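The hand-editing between my_mapping.json and my_template.json can be partly scripted. The Python sketch below merges the per-index mappings into one template body and applies type overrides for top-level fields. It assumes the ES 5.x template layout, where the index pattern lives under a "template" key (newer releases use "index_patterns" instead); the sample mapping and the override choices are invented for illustration.

```python
import copy
import json

# Made-up stand-in for the contents of my_mapping.json.
SAMPLE_MAPPING = {
    "logstash-2017.06.30": {
        "mappings": {
            "fortigate": {
                "properties": {
                    "dst_ip": {"type": "text"},
                    "bytes_sent": {"type": "text"},
                    "action": {"type": "text"},
                }
            }
        }
    }
}


def mapping_to_template(mapping, pattern="logstash-*", type_overrides=None):
    """Merge every index's mappings and retype the fields named in type_overrides."""
    merged = {}
    for index in mapping.values():
        for doc_type, body in index["mappings"].items():
            properties = merged.setdefault(doc_type, {"properties": {}})["properties"]
            properties.update(copy.deepcopy(body.get("properties", {})))
    for doc_type in merged.values():
        for field, es_type in (type_overrides or {}).items():
            if field in doc_type["properties"]:  # top-level fields only
                doc_type["properties"][field] = {"type": es_type}
    return {"template": pattern, "mappings": merged}


template = mapping_to_template(
    SAMPLE_MAPPING, type_overrides={"dst_ip": "ip", "bytes_sent": "long"}
)
print(json.dumps(template, indent=2))
```

Writing the result out with json.dump() gives a file in the shape the template upload expects; nested fields such as the geoip ones would still need editing by hand in this sketch.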

Conclusion

If you are like me, you have to repeat this for each category of logging source you deal with, then concatenate each of the sources into a single my_template.json file. I'm not there yet, still working on Windows Server (last of my 4 source categories). Also, the editing tools on this blog platform are deplorable- my Reddit post had better formatting than this blog post, sigh.

Thursday, June 29, 2017

Lessons Learned with Logstash - Part II

In Lessons Learned with Logstash (so far) I went into some of the problems and challenges of Logstash. Since then, I've discovered what I refer to as the 'chasm' between an out-of-the-box Elasticsearch solution, and a large-scale, full-blown enterprise-level implementation. TLDR; if you're going to use ES out-of-the-box, don't expect to use it like that 'at-scale' for very long.

The Chasm - A field mapping explosion!

The chasm between simple-ES and Hawking-complicated-ES is what is known in the documentation as a 'field mapping explosion'. In a sweet turn of irony, that which draws most newcomers to ES is exactly what will cut them down at a certain scale of logging without a major refactoring of their understanding of the stack.

side rant: I suspect this is why most blog articles related to syslogging with ES really only focus on a shallow treatment of Logstash... and never get to Elasticsearch itself- ES is complicated. Sadly this group doesn't just include personal bloggers like yours truly, but commercial vendors (traditional log handling application vendors) vying for a piece of the ES pie -I find their lackluster attempts to market their ES chops especially deplorable. If ES is being used by Netflix and Ebay and Wikipedia, then it stands to reason that the average toe-dipping blog article is wholly insufficient for, well, just about anything seriously related to ES (this article included!).

Back to business: Dynamic field mapping is a feature of ES to bring neophytes in, because it attempts to automatically create fields based on the data being fed to it, without any configuration or interaction by the administrator. Literally feed it a stream or store of information, and ES will get down to work creating all the fields for you, and then populating them with data. Sounds fantastic! What could possibly go wrong?

The Two Strikes Against ES

Unfortunately the key word in all of this isn't 'dynamic' or 'automatically', but rather the word 'attempt', because depending on the data being fed to it, it can either shine or crash and burn. Both are endings bright, but one isn't happy. To help explain, it's useful to know the two strikes against Elastic putting this feature into ES-

Those that rely the most on dynamic mapping are those most likely to suffer from it (newbies).
One of the best use-cases of ES creates an embarrassingly hot mess of an ELK stack when using dynamic field mapping (syslog).

I know this because I fell into both categories- a newbie using ES for syslog analysis.

How does it (not) work?

So what happens? At its core, ES suffers from separation anxiety- but in the opposite manner we're used to. Instead of being clingy it tries to separate lots of information, and when it's syslog data, much of it shouldn't be separated. Specifically, the hyphen signals to ES that whatever is on either side of it should be separated. In the world of syslog, that usually includes hostnames, and dates and times.

If it seems like a glaring issue, that's because it is. Your ES index will be littered with faulty fields, and those fields will hold data that should have landed somewhere else, in the fields you specifically needed it in. That's bad when you're using ES to find needles in your haystack, and some needles get permanently lost in those bogus fields.

Your system will start to churn on itself. At one point I thought I needed around 500 to 600 fields. I had 906. Then I had 3342, and that was effectively data corruption on a mass scale (the system takes in ~4 million events per day). It made searches against appropriately stored information undeniably slow.

Simply crafting detailed logstash rules with plenty of fields guarantees nothing. If ES sees some data as a field, it can wreck your grok-parse rules and leave you wondering why they don't work when they once did. Add to this the variability of Unicode amongst your data sources (Juniper and MS-Windows are UTF-16, Nutanix and Fortinet are UTF-8, for instance) and you can spend a great deal of time hunting down your parsing issues, all while munging your data into oblivion.

The Solution

The solution is to prepare ES ahead of time for exactly what it should expect, using manual field mappings. Via the API, you can turn off dynamic mapping and manually map every field you will be using. The added benefit of this work is that while defining the fields you set the type of data ES should expect per field. When tuned properly via appropriate data types, ES can index and search significantly faster than if it's left with the default field mapping definition. Speaking of which, Logstash will ship in fields that are of type string. This almost necessitates a static mapping exercise, as there are performance and capability penalties for incorrectly typing your data.
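As a rough idea of what such a manual mapping looks like, here is a Python sketch that builds the mapping body from a flat table of field names and types. The field list is hypothetical, so substitute your own. The "dynamic" setting accepts true, false or "strict": false quietly ignores unmapped fields, while "strict" rejects documents that carry them, which is the loud option this sketch defaults to.

```python
import json

# Hypothetical field list; names and types are examples only.
FIELD_TYPES = {
    "@timestamp": "date",
    "src_ip": "ip",
    "dst_ip": "ip",
    "dst_port": "integer",
    "bytes_sent": "long",
    "bytes_received": "long",
    "action": "keyword",
}


def static_mapping(field_types, dynamic="strict"):
    """Build a mapping body with dynamic mapping switched off and explicit types."""
    if dynamic not in (True, False, "strict"):
        raise ValueError("dynamic must be True, False or 'strict'")
    return {
        "dynamic": dynamic,
        "properties": {name: {"type": es_type} for name, es_type in field_types.items()},
    }


print(json.dumps(static_mapping(FIELD_TYPES), indent=2))
```

Starting with "strict" during the mapping exercise makes every stray field show up as an indexing error instead of silently growing the index; whether to relax it to false for production is a judgment call.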

Friends with Benefits

For instance, if IP addresses are properly typed as such, then you can do fancy things like search through CIDR subnets and perform IP space calculations. As another example, properly typing number data as numeric types (integer or floating-point, of various maximum sizes) allows ES to perform math functions on them, whereas leaving them as the default (type string) will never give that ability. A static field mapping fed to ES ahead of document ingestion will ensure that the right field typing is applied to the incoming data to avoid disaster, wasted CPU cycles, and losses of extended functionality.
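The difference between "an IP address stored as a string" and "an IP address stored as an IP address" is easy to see outside of ES too. This Python sketch uses the standard ipaddress module as an analogy; it is not how ES implements its ip type, and the address list is made up.

```python
import ipaddress

remote_ips = ["10.95.0.8", "10.9.200.1", "192.168.75.20", "174.136.85.77"]

# As strings, the closest thing to "everything in 10.95.0.0/16" is a prefix
# match, and a sloppy prefix drags in addresses from other networks.
as_strings = [ip for ip in remote_ips if ip.startswith("10.9")]

# As addresses, subnet membership is exact.
subnet = ipaddress.ip_network("10.95.0.0/16")
as_addresses = [ip for ip in remote_ips if ipaddress.ip_address(ip) in subnet]

print("string prefix match:", as_strings)
print("CIDR membership:    ", as_addresses)
print("addresses in subnet:", subnet.num_addresses)
```

The string version returns both 10.95.0.8 and 10.9.200.1, while the typed version returns only the address that actually sits inside the /16.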

How-to, Next Time

In the next article I'll go over the gory details of how to properly define the fields you intend to use. At a basic level, you use strict LS filter rules along with the dynamic field mapping capability to see what fields come across from LS to ES. It allows you to tweak your filter rules until only the fields you want are traversing the stack. From there you feed them back in as a template (along with changes to the field types) and voila! ES and LS suddenly seem to get along with each other.

SPECIAL BONUS TIP: in part one I talked about breaking up your logging devices by port and type. Along with that do not forget to conditionally filter each set of device types via their assigned type before parsing that section of rules. I forgot to do this and ended up having my Nutanix logs parsed by my Nutanix grokparse filters... and then by my Fortinet grokparse filters, leading to all sorts of chaos. Start your constituent logstash configuration files (the ones I described previously as being dedicated to a particular device/vendor/application) with this (using Nutanix as the example):

if [type] == "nutanix" {

{nested nutanix-filter_01}
.
.
.
{nested nutanix-filter_99}
}

Art - Mechlab WIP

Another in my series of "unfinished art projects that shall be finished soon"... here's mechlab!

Art - Quadville WIP shots

It's been a while since I've worked on Quadville, but I'm aiming to get back into it. Until then, here are some last screenshots from this work in progress. Enjoy!

Sunday, May 14, 2017

Lessons Learned with Logstash (so far)

So far my experiences with the Elasticsearch stack (ES) have been very positive. I've had it installed on OpenBSD 6.0 (amd64 on VirtualBox on Windows 10) and 6.1 (amd64 on 2012R2 Hyper-V) with vast version differences between the two. I've learned some things along the way, and thought I'd take a second to account for them, specifically as it relates to Logstash.

Use a grok tool to assist in filter building

A lot of forum/blog posts advocate ditching the tedious process of regex-building, opting instead for the use of the kv filter, and then moving on with your life. Perhaps that's a possibility for some, but in my case the kv filter (which stands for key/value for things like 'box=blah') simply wasn't going to work. Specifically, Windows event logs (translated to syslog via syslogagent) often will break any kind of logic from other event logs, and some defy any logic whatsoever.

So if regex filters are as important to you in Logstash as they are to me, you're going to be spending a lot of time working with them. My philosophy regarding the ES stack is that Logstash is on the front-lines of the battle, and can significantly ease Elasticsearch's work as well as the end-user's, so spending a proportionally large amount of time in Logstash versus ES itself or Kibana in my estimation is time well-spent.

Don't beat your head against a wall testing your grok regex's with live (or even test) logs in Logstash. I don't have any affiliation with the following site, but grok constructor really has been helpful to me. It's probably like a lot of similar sites, where multiple logs can be loaded for multiple parse tests on a single regex, and allow you to specify your list of grok patterns so you can actually test what you will use. Once I successfully testbuild a regex here, I literally copy-n-paste it into my Logstash configuration.

Break up your logging devices by port and type

You can easily and permanently limit any 'damage' done by regexes for some devices that end up catching logs from others by categorizing logging devices by which destination port they report to the Logstash server on. Syslog is 514 and Logstash seems to like 5000, so in my implementations I start with 5001 for Juniper, 5002 for Fortinet, and 5003 for Windows, and on.

This also helps with _grokparsefailure identification in Kibana because now you can easily filter by type and be sure you're only looking at what you really want.
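The port-to-type idea boils down to "classify first, then only run the parsers that belong to that class". Here is a Python sketch of that logic, not Logstash configuration: the port numbers are the ones listed above, while the two regular expressions are hypothetical stand-ins for real grok filters.

```python
import re

PORT_TYPES = {5001: "juniper", 5002: "fortinet", 5003: "windows"}

# Hypothetical per-type patterns standing in for real grok filters.
PATTERNS = {
    "juniper": re.compile(r"RT_FLOW_(?P<action>\w+)"),
    "fortinet": re.compile(r"action=(?P<action>\S+)"),
}


def process(port, message):
    """Tag the event with a type based on its port, then parse with that type's pattern only."""
    event = {"message": message, "type": PORT_TYPES.get(port, "unclassified"), "tags": []}
    pattern = PATTERNS.get(event["type"])
    if pattern is None:
        return event  # no parser registered for this type, pass it through untouched
    match = pattern.search(message)
    if match:
        event.update(match.groupdict())
    else:
        event["tags"].append("_grokparsefailure")
    return event


print(process(5002, 'user="testaccount01" action=tunnel-up'))
print(process(5001, 'user="testaccount01" action=tunnel-up'))
```

The second call shows the payoff: a Fortinet-style line arriving on the Juniper port fails only the Juniper pattern, and the failure carries a type you can filter on in Kibana.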

The port/type backstory

I stumbled into this practice by accident, grappling with what is likely Logstash's dirty little secret- that being problems with non-standard reporting devices. If you've scratched your head over a perfectly-constructed grok filter that works but still produces a _grokparsefailure tag, this is for you. It turns out that Logstash uses grok filtering internally on logs of type 'syslog', separately from anything you configure. For devices that don't "play nice" with their syslog reporting (read: Fortinet, Juniper, and more) Logstash will make its displeasure known by attaching the _grokparsefailure tag to what it ships to ES.

Once I realized that all these maddening _grokparsefailures were not my fault (a whole bunch more were my fault), I did three things-

Moved my error log file to a larger capacity portion of my filesystem (from /var/log/ to /home/)
Confirmed that every filter had a tag attached to it
Separated incoming devices by port and assigned a type.

This had a great outcome- as soon as I re-classified incoming logs as anything other than syslog, Logstash itself stopped fretting over whether the incoming log was of a 'proper' format or not. That instantly made my failure logs drop, which also slowed the strain on capacity for my filesystem. Lots of big wins here!

I did not find this on my own. Aside from numerous google hits for things related to _grokparsefailure and matching, there was this very helpful article by James Turnbull on kartar.net.

Break up your configuration file

I've found that there is a sense of order and sanity in breaking up your configuration file into multiple configuration files. If there's more than one cook messing with your stew, you can limit any nuclear damage they do in a single file, and you can read multiple files of a configuration easier than one giant file. I've broken mine into:

intro file with inputs and opening filter invocation
files by device or application
a general/misc/smaller configuration file for smaller stuff.
outro file with closing filter bracket and output stanza.

You'd think it would be easy to keep track of all those nested brackets, but after a couple thousand lines it can be a nightmare.
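A tiny brace counter takes some of the nightmare out of it. The Python sketch below walks the files in the order they would be stitched together and reports the nesting depth at the start and end of each one. The file names and contents are invented, it assumes the files are concatenated in name order, and it counts every brace it sees, so braces inside strings or comments are not treated specially.

```python
# Made-up intro / device / outro files, in the order they would be concatenated.
FILES = [
    ("00-intro.conf", 'input { udp { port => 5002 type => "fortinet" } }\nfilter {\n'),
    ("10-fortinet.conf", 'if [type] == "fortinet" {\n  kv { }\n}\n'),
    ("99-outro.conf", "}\noutput { elasticsearch { } }\n"),
]


def brace_report(files):
    """Return (name, depth_at_start, depth_at_end) for each file, in order."""
    depth = 0
    report = []
    for name, text in files:
        start = depth
        for lineno, line in enumerate(text.splitlines(), start=1):
            for ch in line:
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth < 0:
                        raise ValueError(f"{name}:{lineno}: closing brace with nothing open")
        report.append((name, start, depth))
    return report


for name, start, end in brace_report(FILES):
    print(f"{name:<18} starts at depth {start}, ends at depth {end}")
```

With this layout the intro file should end at depth 1 (inside the filter block), every device file should start and end at the same depth, and the outro should bring everything back to 0; any file that breaks the pattern is the one to open first.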

I'll put more here as I come up with it.