What is the Log Pipeline?

Kolide's Inventory and Checks features regularly poll the current state of your devices to provide up-to-date information. For users who wish to forward and store their device data outside of Kolide, we offer our Log Pipeline. 

Please note: This is an advanced feature that gives you direct access to Osquery, an open-source component of our agent that collects data from a device. If used incorrectly, the Log Pipeline can have negative performance ramifications for the devices enrolled in K2. Before using this feature, we highly recommend you review the Osquery documentation and practice writing performant, useful queries in our Live Query feature.

The Log Pipeline allows you to log changes as they happen on a device and track those changes across your fleet. Beyond the data Kolide already collects and stores in Inventory, you can also track things like:

  • Process Events 
  • File Changes via File Integrity Monitoring
  • Hardware Events
  • System Logs
  • ...and any other data Osquery is capable of collecting (see schema)

You can then easily configure where you want the logs to be sent. Currently we support two types of log destinations:

  • Amazon S3 Bucket
  • GCP Google Cloud Storage Bucket

These logs can be populated from one of several sources:

  • Inventory: Device Properties
  • Query Packs
  • File Integrity Monitoring (FIM)
  • Custom Osquery SQL Queries

Getting Started:

To get started, log into Kolide, click your Avatar, and then click on "Settings". 

On the left-hand sidebar menu, you will see a new section called APP-WIDE SETTINGS, with Log Pipeline as an item within it. Underneath Log Pipeline are several pages, each covered below.

You can configure additional osquery-specific settings by going to Osquery Settings (located under ADMIN SETTINGS). Here you can adjust customizable options for the underlying osquery agent.

There are also two osquery-specific features which can be configured: Decorators and File Integrity Monitoring, both covered later in this article.

Logging Inventory: Device Property Loggers

The easiest way to get started with the Log Pipeline is exporting Kolide's Inventory data. Without writing any SQL, you can enable logging for any item in the Device Property sidebar.

To enable logging for a Device Property: 

  1. Go to the Log Pipeline
  2. Under Device Property Loggers, click Manage Loggers
  3. Click the Enable toggle next to the item of interest

Kolide will now begin forwarding results of these inventory items to your configured log destinations.

Osquery Packs

Osquery Packs are collections of one or more queries that each run on their own recurring interval.

For example, you could create a Query Pack called Browser_Extensions which logs all known browser extensions. This pack would be populated by three separate queries, one each for:

  • SELECT * FROM chrome_extensions 
  • SELECT * FROM firefox_addons  
  • SELECT * FROM ie_extensions 

For each of these queries you could specify:

  • Interval - How often the query is run
  • Shard - The percentage of your fleet to run the query against
  • Log Type - The type of result you want to receive (snapshot vs. differential)
  • Platform - Which device platforms to run the query against (macOS, Linux, Windows)
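
To make these options concrete, here is a sketch of what a Browser_Extensions pack could look like in osquery's pack format (the interval, shard, and platform values are illustrative, not recommendations):

{
  "shard": 100,
  "queries": {
    "chrome": {
      "query": "SELECT * FROM chrome_extensions;",
      "interval": 3600,
      "snapshot": true
    },
    "firefox": {
      "query": "SELECT * FROM firefox_addons;",
      "interval": 3600,
      "snapshot": true
    },
    "ie": {
      "query": "SELECT * FROM ie_extensions;",
      "interval": 3600,
      "platform": "windows",
      "snapshot": true
    }
  }
}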

There are two methods you can use to add Query Packs:

  • New Empty Pack - You can write and save SQL queries in the UI
  • Import JSON Pack - You can import using the standard osquery pack format

New Empty Pack

The first way to create a new Query Pack is the manual method: writing the SQL queries and specifying their options within the Kolide UI.

  1. Navigate to Settings
  2. Click Log Pipeline
  3. Click Osquery Packs
  4. Click Add Pack button
  5. Select New Empty Pack from the dropdown 
  6. Add a name and optional description 
  7. Designate which platforms to run the queries on (which you can edit later) 
  8. Set a minimum osquery version
  9. Set Shard (In Percent), which restricts the query pack to a percentage (1-100) of target hosts.

You can read more about query packs in the osquery documentation.

Import JSON Pack

If you have existing Query Packs already written, you can add them by importing the pack JSON. To import a JSON Pack:

  1. Navigate to Settings
  2. Click Log Pipeline
  3. Click Osquery Packs
  4. Click Add Pack button
  5. Select Import JSON Pack from the dropdown 
  6. Add a name and optional description
  7. Paste valid Osquery Pack JSON into the text area. 
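
For reference, a minimal pack in the standard osquery pack format looks something like this (the table, interval, and description are illustrative):

{
  "platform": "darwin",
  "queries": {
    "listening_ports": {
      "query": "SELECT * FROM listening_ports;",
      "interval": 300,
      "description": "Open network ports on the device."
    }
  }
}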

You can read more about query packs in the osquery documentation.

Queries inside the packs

Once a pack is created, you can add the queries you'd like to run within it and edit those queries at any time after they are added.

Log Type:

  • Snapshot - Emits the complete result set of the query each time it runs.
  • Differential - Emits the rows that were added or removed since the previous run.
  • Differential with additions only - Emits a batch-format log when something changes, with only additions, not what used to be there.
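
For illustration, a single differential result emitted when a new row appears looks roughly like this (all values are invented; the key detail is the "action" field marking the row as an addition):

{
  "name": "pack/Browser_Extensions/chrome",
  "hostIdentifier": "alices-macbook",
  "unixTime": 1583265600,
  "action": "added",
  "columns": {
    "name": "uBlock Origin",
    "version": "1.24.4"
  }
}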

Log Destinations

To utilize the Log Pipeline, you will need to configure at least one destination to which Kolide can forward logs. There are two supported Log Destinations:

  • Amazon S3 Bucket
  • GCP Google Cloud Storage Bucket

Setting up an Amazon S3 Bucket

There are two ways to authenticate Kolide to an Amazon S3 Bucket. To start either method:

  1. Click the Add New Destination button on the Log Pipeline / Log Destinations page.
  2. Select AWS S3 Bucket from the dropdown.
  3. Select your preferred authentication method:

Authentication method 1:
Grant Kolide Access with STS (this is the more secure option):

  1. Provide a Display Name for your bucket. This will help you differentiate it from your other configured log destinations.
  2. Provide the AWS S3 Bucket Name for your desired bucket.
  3. Provide the AWS S3 Role ARN for a role that has permission to write to the bucket.

To learn how to create such a role, see the AWS guide Providing Access to AWS Accounts Owned by Third Parties.
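
As a sketch, the permissions policy attached to that role only needs to allow writes to the bucket; the bucket name below is a placeholder:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-log-bucket/*"
    }
  ]
}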

Authentication method 2:
Specify an IAM Access Key ID & Secret Access Key:

  1. Provide the IAM credentials and Bucket Name for your desired bucket.
  2. Choose whether to send Status Logs, Result Logs, or both.

Setting up a GCP Google Storage Bucket

To add a GCP Google Cloud Storage bucket:

  1. Click the Add New Destination button on the Log Pipeline / Log Destinations page. 
  2. Select GCP Bucket from the dropdown.
  3. Provide a Display Name for your bucket. This will help you differentiate it from your other configured log destinations.
  4. Provide your GCP Bucket Name and paste the contents of the corresponding GCP IAM JSON key file for your desired bucket.
  5. Choose whether to send Status Logs, Result Logs, or both.

Log Path and Formatting

As noted in the Add New Destination modal, logs are written to a custom path of your choosing. When constructing a path, you can choose from the following variables: 

  • {{device_id}} - The unique identifier for the Device sending the logs.
  • {{device_name}} - The display name of the device (usually its host name) or, if no device name is found, the string "NO DEVICE NAME".
  • {{device_serial}} - The device's hardware serial number or, if no serial is found, the string "NO DEVICE SERIAL".
  • {{request_id}} - A ULID associated with the HTTPS request made by the Osquery agent. (Note: it is possible for files to share the same ULID across queries)
  • {{random_ulid}} - A random ULID that is generated for each log written into the bucket. 
  • {{pack_name}} - The name of the query pack (RESULT LOGS ONLY).
  • {{query_name}} - The name of the query inside the query pack (RESULT LOGS ONLY).

See the examples below for how to use these variables to construct log paths.

Result Logs
kolide/results/{{pack_name}}/{{query_name}}/device-{{device_id}}/{{request_id}}.json

Status Logs
kolide/status/device-{{device_id}}/{{request_id}}.json

Log Formatting

Inside each object in the bucket you will find a single JSON object (or an array of JSON objects if viewing status logs). These logs will be in Osquery's snapshot format (for queries explicitly marked as “snapshot”) or its batched event-mode format. Status logs utilize GLOG’s JSON formatting.

Beyond the default information sent by Osquery, Kolide further “decorates” this data with the following additions:

  • kolide_decorations - Contains information about the device useful for correlating it to the Device and its owner in our API.
  • request_id - The ULID representing the HTTPS request made by the Osquery agent (matches the object name in the bucket)
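
Putting that together, a decorated result object could look roughly like this (the decoration keys shown here are illustrative, not an authoritative list):

{
  "name": "pack/Browser_Extensions/chrome",
  "action": "added",
  "columns": { "name": "uBlock Origin" },
  "kolide_decorations": {
    "device_id": "12345",
    "device_name": "alices-macbook"
  },
  "request_id": "01E4QDKB3Y5Z8G2J6M7N8P9Q0R"
}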

Live Log Viewer

To aid in debugging, the Live Log Viewer allows you to see logs as they are received in real time. Please note: these logs are not saved in Kolide, and the view is reset each time you visit the page. You can download the logs, but they will not necessarily be current. To continuously collect logs, you will need to set a log destination.

Osquery Settings

The Osquery Options page allows you to adjust the configuration of the osquery agent. We strongly recommend using the default values, but we provide access to these options for advanced users who need to fine-tune the behavior of the osquery agent on their devices.

If you accidentally make changes that you wish to revert, you can navigate away from the page without saving. If you accidentally save changes and wish to revert to the original configuration, you can click the button marked Reset to Defaults.

Decorators

Query pack results can be supplemented with additional data called decorators (e.g., the device's hostname). These decorators are added to each result in the log file and can be used to more easily identify a host. In this section, you can add your own decorators by providing a name (which labels the value added to each result), selecting which OS platforms you want to run the decorator on, designating a Run Type, and writing a SQL query.

The dropdown marked Run Type provides three options:

  • Always (default) - Run the decorator every time a log is emitted.
  • Load - Run the decorator once when the osquery agent starts up.
  • Interval - Run the decorator at a specified interval (in seconds).
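
As a minimal example, a decorator that labels every emitted result with the device's hostname could be written as:

-- Adds the device's hostname to each emitted result
SELECT hostname FROM system_info;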

File Integrity Monitoring (FIM)

File integrity monitoring (FIM) is available for Linux (using inotify), macOS (using FSEvents), and Windows (using NTFS journal events). The daemon configures a watchlist of files/directories from the osquery configuration. The actions (and hashes, when appropriate) on those selected paths populate the file_events table (macOS and Linux) or the ntfs_journal_events table (Windows).

Proper configuration of the FIM requires two pieces:

  1. Specification of paths you wish to monitor.
  2. Configuration of a query pack with a valid file_events or ntfs_journal_events query.

Setting up a FIM Category

To get started, let's first create a new FIM Category; this will allow us to specify the paths on disk that we wish to monitor. There are two ways you can specify paths:

  1. Using predetermined paths
  2. Using SQL to programmatically return paths

Using predetermined paths:

To use predetermined paths, select Provided List from the dropdown labeled “paths defined by” and add each path of interest on its own line, for example:

Paths can use wildcards rather than fixed, absolute locations. This helps us deal with directories that would be uniquely named on each device. For example:

 /Users/john/Library/%% 

 /Users/beth/Library/%%  

We can use fnmatch-style (filesystem globbing) patterns to represent the target paths. You may use standard wildcards (*) or SQL-style wildcards (%).

A single wildcard (%) will return only files and folders at that level of the file-system hierarchy.

A double wildcard (%%) will recursively search the current level and any underlying subdirectories.

Matching wildcard rules

  • % : Match all files and folders for one level.
  • %% : Match all files and folders recursively.
  • %abc : Match all within-level ending in "abc".
  • abc% : Match all within-level starting with "abc".

Matching examples

  • /Users/%/Library : Monitor for changes to every user's Library folder.
  • /Users/%/Library/ : Monitor for changes to files within each Library folder.
  • /Users/%/Library/% : Same, changes to files within each Library folder.
  • /Users/%/Library/%% : Monitor changes recursively within each Library.
  • /bin/%sh : Monitor the bin directory for changes ending in sh.

⚠️ Warning! ⚠️ Double wildcards (recursive searches) cannot be used mid-string:

For example the following path is not valid: /Users/%%/%.conf 

Using SQL to Programmatically return paths:

You may not know the location of your desired path ahead of time. In those cases, you can select SQL Query Output from the dropdown labeled “paths defined by” and add a valid SQL query that returns the paths you wish to monitor, for example:
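
A query like the following would watch every local user's SSH directory (a sketch; it assumes each row returned is treated as one path to monitor):

-- One watched path per row: each user's ~/.ssh directory, recursively
SELECT directory || '/.ssh/%%' FROM users;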

Excluding Paths:

In addition to choosing paths you wish to monitor, you may specify paths that you would like to exclude.

Configuring a Query Pack to retrieve results from the FIM

In order to retrieve these events, a Query Pack that queries the file_events table must first be configured. Query Packs are configured under Log Pipeline / Osquery Packs. An example Query Pack for retrieving FIM results could be:

{
  "queries": {
    "file_events": {
      "query": "SELECT * FROM file_events;",
      "interval": 600
    }
  }
}

You could paste the above JSON in as a new Query Pack via Import JSON Pack.

This pack would then return results for all of your configured FIM Categories.

As file changes happen in paths specified by FIM Categories, events will appear in the file_events table. During a file change event, the md5, sha1, and sha256 for the file will be calculated if possible. A sample event looks like this:

{
  "action":"ATTRIBUTES_MODIFIED",
  "category":"homes",
  "md5":"bf3c734e1e161d739d5bf436572c32bf",
  "sha1":"9773cf934440b7f121344c253a25ae6eac3e3182",
  "sha256":"d0d3bf53d6ae228122136f11414baabcdc3d52a7db9736dd256ad3ba",
  "target_path":"\/root\/.ssh\/authorized_keys",
  "time":"1429208712",
  "transaction_id":"0"
}

For more information on Windows File Integrity Monitoring, check out our recent blog post on the subject.

Compared to Kolide Fleet

For many users, Log Pipeline is a complete replacement for any on-prem instances of our open-source Osquery manager called Kolide Fleet. That being said, there are a few differences that you should be aware of before committing to moving over:

  • Log Pipeline allows you to target devices via their platform or using Osquery “Discovery Queries”. Log Pipeline does not yet support the creation of Device labels (devices grouped by an Osquery SQL query).
  • Log Pipeline allows you to upload Osquery packs in their JSON format but does not yet support the Fleet File Format.
  • Log Pipeline gives you a Fleet-like user interface for adding packs, queries, and other items but does not yet offer programmatic access to these endpoints via an API.

Planned Features

The Log Pipeline is in private beta, and we are excited to see how customers take advantage of it before we plan its future development. That being said, there are a number of features we plan to add in 2020 (please note that any speculative future work is always subject to change).

  • Programmatic access via our API.
  • Better visibility into the Log Pipeline in the Privacy Center.
  • Additional Log Destinations (integrations with log aggregation services, webhooks, etc.).

If this is something you want to see, let us know by reaching out via Intercom or emailing us at support@kolide.com.

A Note about Private Devices

Devices that have been marked “Private” do not emit logs in the Log Pipeline, and any custom Packs, Decorators, FIM Configurations, and Osquery Options will not apply to those devices.

In addition, any queries containing tables blacklisted in the Device Privacy Settings cannot be added to Packs, Discovery Queries, or any other location that accepts Osquery SQL.

Note: After a query is added to the Log Pipeline, it will not be automatically removed if one of its tables is subsequently banned.
