Files fetcher

Overview

Devo File Fetcher is an extension developed on top of osquery’s extensibility framework that allows the Devo EA Manager to carve file contents and upload them to Devo. As a result, endpoints are configured to scan an arbitrary number of files and folders, process them, and upload their contents automatically.

Three tables are added to the standard osquery schema:

  • devo_files_config: returns general configuration information from the fetchfiles extension.
  • devo_files_info: shows statistics about the files processed.
  • devo_files: provides access to the content of the files processed by the extension.

By default, the DevoFetchFilesPack is included in the Endpoint Agent solution, containing the minimum set of queries that implement the file-processing functionality. The pack makes use of the tables listed above.

Basic configuration

In general terms, the File Fetcher extension works by processing one or more paths defined in its configuration and uploading the contents of the files in those paths line by line. This configuration is defined in the Ansible role deam-packs, along with the rest of the settings for the EA Manager. To add a new configuration to the File Fetcher extension:

1. Edit the Ansible playbook file located in $HOME/devo-ea-deployer/playbooks/roles/deam-packs/files/devo-packs/options.yaml

2. Locate the devo_extensions and fetchfiles sections in that file. They should read as shown in the following snippet:

devo_extensions: 
    fetchfiles:
        watchdog:
            tag: box.devo_ea.files
            file_buffer_size: 131072 # 128K
            max_number_of_parts_per_file: 2000
            paths:
              - pattern: /var/log/syslog
              - pattern: /var/log/system.log
              - pattern: C:\Program Files (x86)\Apache Software Foundation\Tomcat*\logs\*
              - pattern: C:\Program Files\Apache Software Foundation\Tomcat*\logs\*


3. Add one pattern entry for each file, or set of files, that you want to ingest. As an example, the last pattern in the following snippet ingests the IIS logs of the default website into Devo:

devo_extensions: 
    fetchfiles:
        watchdog:
            tag: box.devo_ea.files
            file_buffer_size: 131072 # 128K
            max_number_of_parts_per_file: 2000
            paths:
              - pattern: /var/log/syslog
              - pattern: /var/log/system.log
              - pattern: C:\Program Files (x86)\Apache Software Foundation\Tomcat*\logs\*
              - pattern: C:\Program Files\Apache Software Foundation\Tomcat*\logs\*
              - pattern: C:\inetpub\logs\LogFiles\W3SVC1\* 


4. Save all changes to the options.yaml file

5. Run the deam-packs playbook to apply the changes by executing the following commands (make sure the path to the inventory file is correct):

$ cd $HOME/devo-ea-deployer
$ ansible-playbook -i inventories/<YOUR INVENTORY FILE NAME>.yaml playbooks/deam-packs.yaml


Once the playbook has run, File Fetcher automatically starts reading the data and sending it to Devo.

Data access in Devo

By default, the contents of all uploaded files are ingested into Devo under the box.devo_ea.files tag.

This destination data structure can be configured to point at any my.app.*.* tag.

If the data sent to Devo already has a technology and parser associated with it in Devo, the File Fetcher can be configured to use them. This feature is available from v1.1.0 onward.

Options

Filesfetcher supports two different levels of configuration:

  • Global: Sets the overall behavior of the extension.
  • Per pattern: Sets specific configurations for each source file path.

Global options

The following options are available as global settings of the extension:

  • config_refresh: specifies how often the agent checks the EAM for updates to the configuration of the File Fetcher extension. Can be expressed in seconds (s), minutes (m), or hours (h).
  • watchdog: configuration block specific to the capturing function.
  • watchdog.scan_each (duration): specifies how often all specified paths are re-scanned for changes (e.g., to detect new files).
  • watchdog.file_buffer_size (number): size, in bytes, of each processed chunk (the examples on this page use 131072 for 128K).
  • watchdog.max_number_of_parts_per_file (number): maximum number of processed events per chunk.
  • watchdog.tag (Devo tag): default destination in Devo for all ingested files. Can be overridden in the patterns options.
  • watchdog.allow_empty_paths (false | true): allows an empty paths section (i.e., paths: []).

The following example illustrates how these options are configured in the yaml file:

devo_extensions:
      fetchfiles:
        config_refresh: 30s
        watchdog:
          scan_each: 30s
          file_buffer_size: 102400 # 100k
          max_number_of_parts_per_file: 10000
          tag: my.app.ea.files
          allow_empty_paths: true
          paths: []
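
The interval values above (for example, config_refresh: 30s) combine a number with a unit suffix. Purely as an illustration of that notation, the following Python sketch converts such strings into time spans. The helper name parse_duration and the accepted unit set (ms, s, m, h) are assumptions made for this example; the parser used by the extension is not documented on this page and may accept other forms.

```python
import re
from datetime import timedelta

# Hypothetical helper for illustration; not part of the Files Fetcher extension.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours"}
_DURATION_RE = re.compile(r"^([+-]?\d+)(ms|s|m|h)$")


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '30s', '10m' or '-500ms' into a timedelta."""
    match = _DURATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


if __name__ == "__main__":
    for raw in ("30s", "10m", "1h"):
        print(raw, "->", parse_duration(raw).total_seconds(), "seconds")
```

A value without a recognized suffix (such as "15 minutes") raises ValueError in this sketch, which is a design choice of the example rather than documented agent behavior.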


Patterns options

The patterns section defines the paths to scan, along with the scanning options for each one. These options are described in the following list:

  • pattern (string): Specifies the set of files to scan using a subset of glob patterns. In addition, ‘**’ patterns are supported to process whole folders and their subfolders. Examples of valid pattern definitions are:
- pattern: /tmp/test/*
- pattern: /tmp/test/**/*.log
- pattern: /tmp/test/**/*
- pattern: /tmp/test/a/**
- pattern: /tmp/test/a/**/*
- pattern: /tmp/test/a/*/*
- pattern: /tmp/test/a1/b/{1,2}*.txt
  • tag: data structure in Devo where the content of the files matching the pattern will be uploaded.
  • content_separator (string): defines an event delimiter string. By default, events are processed as full line events.
  • file_processor (fixed | multiline): enables multiline event processing when set to multiline, in conjunction with the content_separator string. The default value is fixed (single-line events).
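
To make the pattern examples above more concrete, the following Python sketch translates a glob with *, ?, ** and {a,b} into a regular expression and tests paths against it. It follows conventional glob semantics, which is an assumption: this page does not specify exactly which glob subset the extension implements, so the function names (glob_to_regex, matches) and the matching rules are illustrative only. The sketch handles POSIX-style separators and does not cover Windows paths.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a glob into a compiled regex.

    Assumed semantics (conventional, not confirmed for the extension):
      **/    any number of directories, including none
      **     anything, including '/'
      *      anything except '/'
      ?      one character except '/'
      {a,b}  either alternative
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        elif pattern[i] == "{" and "}" in pattern[i:]:
            end = pattern.index("}", i)
            options = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(pattern: str, path: str) -> bool:
    """Return True when the whole path matches the glob pattern."""
    return glob_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(matches("/tmp/test/**/*.log", "/tmp/test/a/b/app.log"))
    print(matches("/tmp/test/*", "/tmp/test/a/app.log"))
```

Under these assumed rules, /tmp/test/* covers only files directly inside /tmp/test, while /tmp/test/**/*.log also reaches .log files in nested subfolders.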

The following example illustrates the usage of these options:

devo_extensions:
      fetchfiles:
        config_refresh: 10m
        watchdog:
          scan_each: 1m
          tag: my.app.ea.files
          paths:
            - pattern: C:\flog\logs\apache\**\error*log
              tag: my.app.ea.apache-error
            - pattern: C:\flog\logs\apache\common*log
              tag: my.app.ea.apache-common
              content_separator: "a"
            - pattern: C:\flog\logs\apache\combined*log
              tag: my.app.ea.apache-combined
            - pattern: C:\flog\logs\xml\notes_xml?.log
              content_separator: <note>
              file_processor: multiline
  • threshold_file_modification_time (duration): a negative duration that represents how long the File Fetcher waits before considering an event fully written. If the scanned file was modified more recently than now + threshold_file_modification_time, the last event is not sent; instead, its offset is recorded so that it is sent in the next scan iteration. For larger multiline events that take longer to write, increase the threshold to reduce the chance of truncating an event. The default value is -500ms. Valid examples are -500ms, -10s, and -5s. If the value is 0 or positive, every scan sends content up to the end of the file. This option is available from v1.1.0 onward.
  • payload_format (c:event): removes the JSON wrapper around each event so that events are sent to Devo as is. This makes it possible to use existing technologies in Devo that do not use JSON. As of v1.1.0, the only supported technologies are those that do not modify the tag or the payload. The only valid value for this parameter is c:event. To use it, follow the configuration snippet below. This option is available from v1.1.0 onward.

    

paths:
          - pattern: /var/log/httpd/access_log
            tag: web.apache.access-combined.pro.ltdemo.www1
            payload_format: c:event
          - pattern: C:\Users\win.user\Documents\file2
            tag: my.app.ea.file2
            payload_format: c:event

Example of an event sent to Devo when not using payload_format (c:event):

<14>2021-06-04T12:24:41+02:00 2020-EMEA-0091 web.apache.access-combined.pro.ltdemo.www1: {"action":"snapshot","calendarTime":"Fri Jun  4 10:24:40 2021 UTC","columns":{"__devoPayloadFormat":"","__devoTag":"web.apache.access-combined.pro.ltdemo.www1","event":"[Fri May 21 15:40:14 2021] [nemo:trace1-8] [pid 1776:tid 4272] [client 52.79.74.15:55395] We need to navigate the open-source SMTP feed!","extracted_ts":"1622802280","file_mod_unix_ts":"1622646093","file_name":"/var/log/apache/error.log","file_offset":"287719","file_size":"287719","partial_offset":"287719","remaining_bytes":"0"},"counter":0,"decorations":{"hostIp":"192.168.1.134","host_uuid":"59a38fcc-2820-11b2-a85c-cb6f4f3f7739","hostname":"2020-EMEA-0091","platform":"rhel","tls_hostname":"localhost:8080"},"epoch":0,"hostIdentifier":"2020-EMEA-0091","name":"pack/DevoFetchFilesPack/files_content","numerics":false,"unixTime":1622802280}
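
In the wrapped form shown above, the text after the tag is a JSON document, and the original file line travels in its columns.event field. As a sketch of how a consumer could recover that line, the Python below separates the syslog-style header from the JSON body. The function name extract_event is hypothetical, the sample is a shortened copy of the event above, and splitting on the first ": {" is a simplification that assumes the header contains no such sequence.

```python
import json

# Shortened, illustrative copy of the wrapped event shown above.
SAMPLE = (
    "<14>2021-06-04T12:24:41+02:00 2020-EMEA-0091 "
    "web.apache.access-combined.pro.ltdemo.www1: "
    '{"action":"snapshot","columns":{"__devoPayloadFormat":"",'
    '"event":"[Fri May 21 15:40:14 2021] [nemo:trace1-8] '
    "[pid 1776:tid 4272] [client 52.79.74.15:55395] "
    'We need to navigate the open-source SMTP feed!",'
    '"file_name":"/var/log/apache/error.log","file_offset":"287719"},'
    '"name":"pack/DevoFetchFilesPack/files_content"}'
)


def extract_event(message: str) -> dict:
    """Return the tag, the original file line and its source file."""
    # Everything before the first ': {' is the header; the rest is JSON.
    header, _, body = message.partition(": {")
    payload = json.loads("{" + body)
    columns = payload["columns"]
    return {
        "tag": header.rsplit(" ", 1)[-1],
        "event": columns["event"],
        "file_name": columns["file_name"],
    }


if __name__ == "__main__":
    print(extract_event(SAMPLE)["event"])
```

With payload_format: c:event this unwrapping is unnecessary, because the message body is already the original line.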


Example of an event sent to Devo using payload_format (c:event):

<14>2021-06-04T12:30:25+02:00 2020-EMEA-0091 web.apache.access-combined.pro.ltdemo.www1: [Fri May 21 15:40:14 2021] [nemo:trace1-8] [pid 1776:tid 4272] [client 52.79.74.15:55395] We need to navigate the open-source SMTP feed!
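
Returning to the threshold_file_modification_time option described earlier in this section, its decision rule can be sketched in a few lines of Python. This is an interpretation of the description on this page, not the extension's actual code: the function name hold_back_last_event is hypothetical, and the behavior exactly at the boundary (a modification time equal to now + threshold) is an assumption.

```python
from datetime import datetime, timedelta


def hold_back_last_event(
    file_mtime: datetime, now: datetime, threshold: timedelta
) -> bool:
    """Return True when the last event should wait for the next scan."""
    if threshold >= timedelta(0):
        # Zero or positive: every scan sends up to the end of the file.
        return False
    # A negative threshold makes now + threshold a point in the recent past.
    # A modification after that point suggests the file is still being written.
    return file_mtime > now + threshold


if __name__ == "__main__":
    now = datetime(2021, 6, 4, 12, 0, 0)
    default = timedelta(milliseconds=-500)
    # File touched 100 ms ago: the last event is held back.
    print(hold_back_last_event(now - timedelta(milliseconds=100), now, default))
    # File untouched for 2 s: everything is sent.
    print(hold_back_last_event(now - timedelta(seconds=2), now, default))
```

A larger negative threshold widens the window in which the last event is held back, which matches the advice to increase it for long multiline events.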

Multiline events using regular expressions


Files Fetcher enables the user to use regular expressions as delimiters for the events. This is a powerful tool to parse and interpret log files where the delimitation between events is not clear.

The regular expression used as a delimiter must follow the syntax of Go's regexp/syntax package (documented at pkg.go.dev), and it must match at the beginning of a line in the log file. A delimiter that does not start at the beginning of a line is ignored.

Example:

We want to ingest into Devo a log file from an application with the following structure:

2012-01-19 10:13:25,393 [http-8080-1] ERROR com.myservlet.servlet.Servlet2  - DEBUG: null
java.lang.NullPointerException
    at com.myservlet.servlet.Servlet2.doPost(Servlet2.java:140)
    at com.myservlet.servlet.Servlet2.doGet(Servlet2.java:292)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) ...
2012-01-19 10:13:35,393 [http-8080-1] ERROR com.myservlet.servlet.Servlet2  - DEBUG: null
java.lang.NullPointerException
    at com.myservlet.servlet.Servlet2.doPost(Servlet2.java:140)
    at com.myservlet.servlet.Servlet2.doGet(Servlet2.java:292)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) ...
2012-01-19 10:14:45,393 [http-8080-1] ERROR com.myservlet.servlet.Servlet2  - DEBUG: null
java.lang.NullPointerException
    at com.myservlet.servlet.Servlet2.doPost(Servlet2.java:140)
    at com.myservlet.servlet.Servlet2.doGet(Servlet2.java:292)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) ...


In the above example, there are three events that need to be parsed and sent to Devo. Looking at the log file, we can consider the event date as the delimiter between events. When there is a new date at the beginning of the line, it is considered that there is a new event.

Configuration in File Fetcher should be something like the following: 

 ...
  fetchfiles:
    config_refresh: 1m
    watchdog:
      tag: box.devo_ea.files
      file_buffer_size: 131072 # 128K
      max_number_of_parts_per_file: 2000
      paths:
        - pattern: /tmp/testlog/*.log
          content_separator: \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}
          file_processor: multiline


The regular expression in content_separator states that any text of the form “4 digits-2 digits-2 digits 2 digits:2 digits:2 digits,3 digits” (for example, 2012-01-19 10:13:25,393) marks the start of a new event and the end of the current one.
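
The effect of this separator can be reproduced offline. The sketch below uses Python's re module as a stand-in for Go's regexp/syntax; the pattern uses only constructs that both engines share. The function name split_events and the shortened log are illustrative, and the splitting logic is an approximation of the behavior described above, not the extension's implementation.

```python
import re
from typing import List

SEPARATOR = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"

# Shortened version of the sample log above.
LOG = """\
2012-01-19 10:13:25,393 [http-8080-1] ERROR com.myservlet.servlet.Servlet2  - DEBUG: null
java.lang.NullPointerException
    at com.myservlet.servlet.Servlet2.doPost(Servlet2.java:140)
2012-01-19 10:13:35,393 [http-8080-1] ERROR com.myservlet.servlet.Servlet2  - DEBUG: null
java.lang.NullPointerException
    at com.myservlet.servlet.Servlet2.doGet(Servlet2.java:292)
2012-01-19 10:14:45,393 [http-8080-1] ERROR com.myservlet.servlet.Servlet2  - DEBUG: null
java.lang.NullPointerException
"""


def split_events(text: str, separator: str) -> List[str]:
    """Split text into events that begin where the separator starts a line."""
    # (?m) lets ^ match at every line start; the lookahead keeps the
    # separator text inside the event instead of consuming it.
    starts = [m.start() for m in re.finditer(r"(?m)^(?=" + separator + ")", text)]
    bounds = starts + [len(text)]
    # Text before the first separator, if any, is not returned by this sketch.
    return [text[a:b].rstrip("\n") for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    for number, event in enumerate(split_events(LOG, SEPARATOR), start=1):
        print(f"--- event {number} ---")
        print(event)
```

Because the match is anchored to the start of a line, a timestamp that appears in the middle of a line does not open a new event.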

Update configuration of patterns↔tags

To avoid incoherent data in Devo, updating tags on the fly is not permitted. The File Fetcher checks that each pattern and its tag match those set during the initial configuration.

To reconfigure a pattern to a different tag:

  1. Remove the pattern that you want to reconfigure from the options.yaml file and deploy the deam-packs playbook (ansible-playbook -i inventories/your_inventory.yaml playbooks/deam-packs.yaml).
  2. Wait until the configuration is propagated to the agents. The default configuration refresh interval is 15 minutes and is set with the config_refresh option in options.yaml.
  3. Once the configuration is propagated, recreate the pattern in the options.yaml file, this time pointing to the new tag.
  4. Deploy the deam-packs playbook again (ansible-playbook -i inventories/your_inventory.yaml playbooks/deam-packs.yaml).
  5. The data is then ingested into the new tag, starting from the beginning of the file.

Performance considerations

Depending on how file fetching is configured, there may be an impact on the sizing of the UAM elements as well as on the data volumes ingested into Devo. A general recommendation is to introduce configurations one at a time, each with a clear and narrow specification of the files and contents to be uploaded.

Furthermore, consider combining the file fetcher functionality with automatic labeling of endpoints and their corresponding configuration profiles (e.g., scan Apache logs in the designated paths only if the endpoint is running an Apache webserver process).