...

Logs generated by most AWS services (CloudTrail, VPC Flow Logs, Elastic Load Balancer, etc.) can be exported to blob objects in S3. Many third-party services have adopted the same paradigm, so it has become a common pattern across many technologies. Devo Professional Services and Technical Acceleration teams maintain a base collector that leverages this S3 paradigm to collect logs. It can be customized for the different technology logs that customers store in S3.

...

  • Sending data to S3 (this guide uses CloudTrail as a data source service)

  • Setting up S3 event notifications to SQS

  • Enabling SQS and S3 access using a cross-account IAM role

  • Gathering information to be provided to Devo for collector setup
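When S3 event notifications are routed to SQS, each message body is a JSON document. Its Records list names the bucket and the URL-encoded object key of each new file. The collector's own parsing code is not shown in this guide. The following minimal Python sketch assumes the standard S3 notification layout and shows how the bucket and key could be pulled out of a message. The helper name objects_from_notification and the sample bucket and key are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def objects_from_notification(message_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification delivered to SQS.

    Hypothetical helper; assumes the standard layout
    {"Records": [{"s3": {"bucket": {"name": ...}, "object": {"key": ...}}}]}.
    """
    payload = json.loads(message_body)
    pairs = []
    # s3:TestEvent messages (sent when notifications are first configured) have no "Records".
    for record in payload.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded (for example, spaces become "+").
            pairs.append((bucket, unquote_plus(key)))
    return pairs


body = json.dumps({
    "Records": [{
        "eventSource": "aws:s3",
        "s3": {
            "bucket": {"name": "my-cloudtrail-bucket"},
            "object": {"key": "AWSLogs/123456789012/CloudTrail/us-east-1/2024/01/01/file+name.json.gz"},
        },
    }]
})
print(objects_from_notification(body))
```

S3 sends a one-off s3:TestEvent message when notifications are first configured. That message has no Records, so the sketch returns an empty list for it.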

...

  • Access to S3, SQS, IAM, and CloudTrail services

  • Permissions to send data to S3

  • Knowledge of log format/technology type being stored in S3

Create S3 bucket and set up data feed (CloudTrail example) 

Check this article for a setup configuration example.

Devo collector features

Feature

Details

Allow parallel downloading (multipod)

Allowed

Running environments

  • Collector Server

  • On-premise

Populated Devo events

Table

Flattening preprocessing

No

Data sources

Data source

Description

Collector service name

Devo table

Available from release

Any

In theory, any source you send to an SQS queue can be collected.

 

 

v1.0.0

CONFIG LOGS

 

aws_sqs_config

cloud.aws.configlogs.events

v1.0.0

AWS ELB

 

aws_sqs_elb

web.aws.elb.access

v1.0.0

AWS ALB

 

aws_sqs_alb

web.aws.alb.access

v1.0.0

CISCO UMBRELLA

 

aws_sqs_cisco_umbrella

sig.cisco.umbrella.dns

v1.0.0

CLOUDFLARE LOGPUSH

 

aws_sqs_cloudflare_logpush

cloud.cloudflare.logpush.http

v1.0.0

CLOUDFLARE AUDIT

 

aws_sqs_cloudflare_audit

cloud.aws.cloudflare.audit

v1.0.0

CLOUDTRAIL

 

aws_sqs_cloudtrail

cloud.aws.cloudtrail.*

v1.0.0

CLOUDTRAIL VIA KINESIS FIREHOSE

 

aws_sqs_cloudtrail_kinesis

cloud.aws.cloudtrail.*

v1.0.0

CLOUDWATCH

 

aws_sqs_cloudwatch

cloud.aws.cloudwatch.logs

v1.0.0

CLOUDWATCH VPC

 

aws_sqs_cloudwatch_vpc

cloud.aws.vpc.flow

v1.0.0

CONTROL TOWER

VPC Flow Logs, CloudTrail, CloudFront, and/or AWS Config logs

aws_sqs_control_tower

 

v1.0.0

FDR

 

aws_sqs_fdr

edr.crowdstrike.cannon

v1.0.0

GUARD DUTY

 

aws_sqs_guard_duty

cloud.aws.guardduty.findings

v1.0.0

GUARD DUTY VIA KINESIS FIREHOSE

 

aws_sqs_guard_duty_kinesis

cloud.aws.guardduty.findings

v1.0.0

IMPERVA INCAPSULA

 

aws_sqs_incapsula

cef0.imperva.incapsula

v1.0.0

LACEWORK

 

aws_sqs_lacework

monitor.lacework.

v1.0.0

PALO ALTO

 

aws_sqs_palo_alto

firewall.paloalto.[file-log_type]

v1.0.0

ROUTE 53

 

aws_sqs_route53

dns.aws.route53

v1.0.0

OS LOGS

 

aws_sqs_os

box.[file-log_type].[file-log_subtype].us

v1.0.0

SENTINEL ONE FUNNEL

 

aws_sqs_s1_funnel

edr.sentinelone.dv

v1.0.0

S3 ACCESS

 

aws_sqs_s3_access

web.aws.s3.access

v1.0.0

VPC LOGS

 

aws_sqs_vpc

cloud.aws.vpc.flow

v1.0.0

WAF LOGS

 

aws_sqs_waf

cloud.aws.waf.logs

v1.0.0

Options

See examples of common configurations here: General S3 Collector Configuration Examples and Recipes
Many configurable options are described in the README in the GitLab repository, and they are reproduced below. See the GitLab repository for specific examples in each subdirectory.

  • direct_mode --- true or false (default: false). Set to true if the logs are sent directly to the queue without going through S3.

  • file_field_definitions --- a dictionary mapping variable names (which you choose) to lists of parsing rules.
    Each parsing rule has an operator, plus the keys that go with that operator. Parsing rules are applied in the order they are listed in the configuration.

    • The "split" operator takes an "on" and an "element": the file name is split into pieces on the character or character sequence given by "on", and whatever is at the index given by "element" is extracted, as in the example.

    • The "replace" operator takes a "to_replace" and a "replace_with".

    • For example, if your filename were "server_logs/12409834/ff.gz", this configuration would store the log_type as "serverlogs":

Code Block
"file_field_definitions": 
{
	"log_type": [{"operator": "split", "on": "/", "element": 0}, {"operator": "replace", "to_replace": "_", "replace_with": ""}]
}
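The two parsing operators can be mirrored in a few lines of Python. This minimal sketch assumes rules are applied in order to the full file name, exactly as described above. apply_parsing_rules and extract_file_fields are hypothetical names, and the real collector may support further operators or handle an out-of-range element differently.

```python
def apply_parsing_rules(filename: str, rules: list[dict]) -> str:
    """Apply one variable's parsing rules to a file name, in the order listed."""
    value = filename
    for rule in rules:
        if rule["operator"] == "split":
            # Same as Python's str.split: split on "on", keep the piece at index "element".
            value = value.split(rule["on"])[rule["element"]]
        elif rule["operator"] == "replace":
            value = value.replace(rule["to_replace"], rule["replace_with"])
        else:
            raise ValueError(f"unsupported operator: {rule['operator']}")
    return value


def extract_file_fields(filename: str, definitions: dict[str, list[dict]]) -> dict[str, str]:
    """Build the file-level variables that routing_template references as [file-<name>]."""
    return {name: apply_parsing_rules(filename, rules) for name, rules in definitions.items()}


file_field_definitions = {
    "log_type": [
        {"operator": "split", "on": "/", "element": 0},
        {"operator": "replace", "to_replace": "_", "replace_with": ""},
    ],
}
print(extract_file_fields("server_logs/12409834/ff.gz", file_field_definitions))
# {'log_type': 'serverlogs'}
```

The same helper covers a single-split rule such as [{"operator": "split", "on": "/", "element": 1}], which extracts the second path segment of the file name.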
  • filename_filter_rules: a list of rules for filtering out entire files.

  • encoding -- takes one of the following strings: "gzip", "none", "parquet"

  • ack_messages -- whether or not to delete messages from the queue after processing; takes a boolean value. If not specified, the default is true. We recommend leaving it out of the config. If you see it there, check carefully whether it is on or off.

  • file_format -- takes a dictionary with the following keys

    • type -- a string specifying which processor to use

      • single_json_object_processor -- logs are stored in a single JSON object

        • single_json_object_processor config options: "key" (string: the key under which the list of logs is stored). See cloudtrail_collector for an example.

          • Code Block
            config: {"key": "log"}
            fileobj:  {..."log": {...}}
      • unseparated_json_processor -- logs are stored as JSON objects written to a text file with no separator

        • unseparated_json config options: "key" (string: where the log is stored), "include" (dict: maps the names of keys outside the inner part that should be included, optionally renaming them). If there is no key, that is, the whole JSON object is the desired log, set "flat": true. See aws_config_collector for an example.

          • Code Block
            fileobj:  {...}{...}{...}
      • text_file_processor -- logs are stored as text files, potentially with lines and fields separated with e.g. commas and newlines

        • text_file config options: includes options for how lines and records are separated (e.g. newline, tab, comma), good for csv style data.

      • line_split_processor -- logs are stored in a newline-separated file; faster than separated_json_processor

        • config options: "json": true or false. If json is set to true, the collector assumes the logs are newline-separated JSON and parses them, which enables record_field_mapping.

      • separated_json_processor -- logs are stored as many JSON objects joined by some kind of separator

        • config options: specify the separator, e.g. "separator": "||". If not set, the default is newline.

          • Code Block
            fileobj:  {...}||{...}||{...}
      • jamf_processor – special processor for JAMF logs

      • aws_access_logs_processor – special processor for AWS access logs

      • windows_security_processor – special processor for Windows Security logs

      • vpc_flow_processor – special processor for VPC Flow logs

      • json_line_arrays_processor -- processor for unseparated JSON objects spread over multiple lines of a single file

        • Code Block
          fileobj:  {...}{...}
          {...}{...}{...}
          {...}
      • dict_processor -- processor for logs that come as Python dictionary objects, i.e. in direct mode

    • config -- a dictionary of information the specified file_format processor needs

  • record_field_mapping -- a dictionary in which each key defines a variable that can be parsed out of each record (and referenced later, e.g. in filtering).
    For example, we may want to parse something and call it "type" by getting "type" from a certain key in the record (which may be multiple layers deep).

    Code Block
    {"type": {"keys": ["file", "type"], "operations": []}}

    keys is the list of keys to follow through the record to find the value. It handles nesting by essentially defining a path through the data. Suppose we have logs that look like this:

    Code Block
    {"file": {"type": {"log_type": 100}}}

    To get the log_type, list all the keys needed to walk through the JSON, in order:

    Code Block
    "keys": ["file", "type", "log_type"]

    In many cases you will only need one key, e.g. in flat JSON that isn't nested:

    Code Block
    {"log_type": 100, "other_info": "blah", ...}

    Here you would just specify "keys": ["log_type"]. A few operations (such as split and replace) are supported to further alter the parsed information. The record_field_mapping snippet above grabs whatever is located at log["file"]["type"] and names it "type". record_field_mapping defines variables by taking them from logs, and these variables can then be used for filtering. Let's say you have a log in JSON format like this, which will be sent to Devo:

    Code Block
    {"file": {"value": 0, "type": "security_log"}}

    Specifying "type" in record_field_mapping lets the collector extract that value, "security_log", and save it as type. Now let's say you want to change the tag dynamically based on that value. You could change the routing_template to something like my.app.datasource.[record-type]. For the log above, the record would be sent to my.app.datasource.security_log. Now let's say you want to filter out (not send) any records with the type security_log. You could write a line_filter_rule as follows:

    {"source": "record", "key": "type", "type": "match", "value": "security_log"}

    We specify the source as record because we want to use a variable from record_field_mapping. We specify the key as "type" because that is the name of the variable we defined. We specify the type as "match" because we want to filter out any record matching this rule. And we specify the value as security_log because we specifically do not want to send any records whose type equals "security_log".

    The split operation works the same as running Python's split function on a string.

    Let's say you have a filename "logs/account_id/folder_name/filename" and you want to save the account_id as a variable to use for tag routing or filtering.

    You could write a file_field_definition like this:

    "account_id": [{"operator": "split", "on": "/", "element": 1}]

    This stores a variable called account_id. It splits the entire filename into pieces wherever it finds a forward slash, then takes the element at index 1. In Python it would look like:

    Code Block
    filename.split("/")[1]
  • routing_template -- a string defining how to build the tag each message is sent to, e.g.
    "my.app.wow.[record-type].[file-log_type]" -- if the "type" extracted during record_field_mapping were "null" and the file's log_type were "serverlogs", the record would be sent to the tag "my.app.wow.null.serverlogs"

  • line_filter_rules -- a list of lists of rules for filtering out individual records so they do not get sent to Devo. A record is filtered out if it matches every rule in at least one of the inner lists.
    For example:

Code Block
"line_filter_rules": [
  [
    {"source": "record", "key": "type", "type": "doesnotmatch", "value": "ldap"}
  ],
  [
    {"source": "file", "key": "main-log_ornot", "type": "match", "value": "main-log"},
    {"source": "record", "key": "type", "type": "match", "value": "kube-apiserver-audit"}
  ]
]

This set of rules can be expressed in pseudocode as follows:
if record.type != "ldap" or (file.main-log_ornot == "main-log" and record.type == "kube-apiserver-audit"):
    do_not_send_record()
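The interplay of record_field_mapping, routing_template, and line_filter_rules described above can be sketched in Python. This is an illustrative sketch, not the collector's implementation. It assumes "match" and "doesnotmatch" compare values by exact string equality (the real collector may match differently), and it ignores the "operations" list. All function names are hypothetical. The OR-of-ANDs evaluation follows the pseudocode above.

```python
import re


def extract_record_field(record: dict, keys: list[str]):
    """Walk the "keys" path through a nested record; return None if any key is missing."""
    value = record
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def build_tag(template: str, file_vars: dict, record_vars: dict) -> str:
    """Replace [file-<name>] and [record-<name>] placeholders in a routing_template."""
    sources = {"file": file_vars, "record": record_vars}

    def substitute(match):
        # Missing variables render as "None" here; the collector's behavior may differ.
        return str(sources[match.group(1)].get(match.group(2)))

    return re.sub(r"\[(file|record)-([^\]]+)\]", substitute, template)


def rule_matches(rule: dict, file_vars: dict, record_vars: dict) -> bool:
    """Evaluate one rule, assuming exact equality for "match"/"doesnotmatch"."""
    variables = file_vars if rule["source"] == "file" else record_vars
    equal = variables.get(rule["key"]) == rule["value"]
    if rule["type"] == "match":
        return equal
    if rule["type"] == "doesnotmatch":
        return not equal
    raise ValueError(f"unsupported rule type: {rule['type']}")


def should_filter_out(line_filter_rules: list[list[dict]], file_vars: dict, record_vars: dict) -> bool:
    """Filter out the record if every rule in at least one inner list matches (OR of ANDs)."""
    return any(
        all(rule_matches(rule, file_vars, record_vars) for rule in group)
        for group in line_filter_rules
    )


record_field_mapping = {"type": {"keys": ["file", "type"], "operations": []}}
record = {"file": {"value": 0, "type": "security_log"}}
record_vars = {
    name: extract_record_field(record, spec["keys"])
    for name, spec in record_field_mapping.items()
}

print(build_tag("my.app.datasource.[record-type]", {}, record_vars))
# my.app.datasource.security_log
rules = [[{"source": "record", "key": "type", "type": "match", "value": "security_log"}]]
print(should_filter_out(rules, {}, record_vars))
# True
```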

(Internal) Notes + Debugging
Config can include "debug_mode": true to print out some useful information as logs come in.
For local testing it is useful to set "ack_messages" to false, so you can try processing without consuming messages from the queue. Be careful to remove this or set it to true when launching the collector. If it is not set, messages are acked by default.

If something seems wrong at launch, you can set the following in the collector parameters / job config:

"debug_mode": true,
"do_not_send": true,
"ack_messages": false

This prints out data as it is being processed, stops messages from being acked, and, at the last step, does not actually send the data. That way you can see whether something is breaking without sending the customer wrongly formatted or repeated data, and without consuming messages from the queue and losing data.
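The following minimal Python sketch makes the difference between the unseparated_json_processor and separated_json_processor file formats listed under Options concrete. It shows how each layout can be split into individual records. It is illustrative only: the collector's processors are not reproduced here, and the function names are hypothetical.

```python
import json


def parse_unseparated_json(text: str) -> list:
    """Split concatenated JSON objects such as '{...}{...}{...}' (no separator)."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip spaces/newlines ourselves.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)  # returns (object, index after it)
        records.append(obj)
    return records


def parse_separated_json(text: str, separator: str = "\n") -> list:
    """Split JSON objects joined by a separator such as '||' (newline by default)."""
    return [json.loads(chunk) for chunk in text.split(separator) if chunk.strip()]


print(parse_unseparated_json('{"a": 1}{"a": 2}\n{"a": 3}'))
print(parse_separated_json('{"a": 1}||{"a": 2}||{"a": 3}', separator="||"))
```

One caveat of the naive split: it would break a record whose string values contain the separator. The raw_decode approach avoids this because it reads one complete JSON value at a time. The unseparated parser also tolerates newlines between objects, which covers the multi-line layout shown for json_line_arrays_processor.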

...

On-premise collector

This data collector can be run on any machine that has the Docker service available, because it is executed as a Docker container. The following sections explain how to prepare the required setup for running the data collector.

Structure

Create the following directory structure to be used when running the collector:

Code Block
<any_directory>
└── devo-collectors/
    └── <product_name>/
        ├── certs/
        │   ├── chain.crt
        │   ├── <your_domain>.key
        │   └── <your_domain>.crt
        ├── state/
        └── config/ 
            └── config.yaml 
Note

Replace <product_name> with the proper value.

Devo credentials

In Devo, go to Administration → Credentials → X.509 Certificates, download the Certificate, Private key and Chain CA and save them in <product_name>/certs/. Learn more about security credentials in Devo here.

...

Note

Replace <product_name> with the proper value.

Editing the config.yaml file

...

Run the collector

Cloud collector

We use a piece of software called Collector Server to host and manage all our available collectors.

To enable the collector for a customer:

  1. In the Collector Server GUI, access the domain in which you want this instance to be created.

  2. Click Add Collector and find the one you wish to add.

  3. In the Version field, select the latest value.

  4. In the Collector Name field, set the value you prefer (this name must be unique inside the same Collector Server domain).

  5. In the sending method, select Direct Send. Direct Send configuration is optional for collectors that create Table events, but mandatory for those that create Lookups.

  6. In the Parameters section, establish the Collector Parameters as follows:

Editing the JSON configuration

Code Block
{
  "global_overrides": {
    "debug": false
  },
  "inputs": {
    "sqs_collector": {
      "id": "12351",
      "enabled": true,
      "credentials": {
        "aws_access_key_id": "",
        "aws_secret_access_key": "",
        "aws_base_account_role": "arn:aws:iam::837131528613:role/devo-xaccount-cs-role",
        "aws_cross_account_role": "",
        "aws_external_id": ""
      },
      "ack_messages": false,
      "direct_mode": false,
      "do_not_send": false,
      "compressed_events": false,
      "base_url": "https://us-west-1.queue.amazonaws.com/id/name-of-queue",
      "region": "us-west-1",
      "sqs_visibility_timeout": 240,
      "sqs_wait_timeout": 20,
      "sqs_max_messages": 1,
      "services": {
        "custom_service": {
          "file_field_definitions": {},
          "filename_filter_rules": [],
          "encoding": "gzip",
          "send_filtered_out_to_unknown": false,
          "file_format": {
            "type": "line_split_processor",
            "config": {
              "json": true
            }
          },
          "record_field_mapping": {
            "event_simpleName": {
              "keys": [
                "event_simpleName"
              ]
            }
          },
          "routing_template": "edr.crowdstrike.cannon",
          "line_filter_rules": [
            [
              {
                "source": "record",
                "key": "event_simpleName",
                "type": "match",
                "value": "EndOfProcess"
              }
            ],
            [
              {
                "source": "record",
                "key": "event_simpleName",
                "type": "match",
                "value": "DeliverLocalFXToCloud"
              }
            ]
          ]
        }
      }
    }
  }
}
Info

All defined service entities will be executed by the collector. If you do not want to run any of them, just remove the entity from the services object.
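Because region must match the region that appears in base_url, a quick consistency check can catch copy-and-paste mistakes before the collector is launched. The following minimal Python sketch handles two queue URL forms: the https://sqs.region.amazonaws.com/... form described for base_url in the parameter table below, and the older https://region.queue.amazonaws.com/... form used in the example above. The function names are hypothetical.

```python
from urllib.parse import urlparse


def region_from_queue_url(base_url: str) -> str:
    """Return the AWS region embedded in an SQS queue URL.

    Handles https://sqs.<region>.amazonaws.com/... and the older
    https://<region>.queue.amazonaws.com/... host formats only.
    """
    host = urlparse(base_url).hostname or ""
    labels = host.split(".")
    if len(labels) >= 3 and labels[0] == "sqs":
        return labels[1]
    if len(labels) >= 3 and labels[1] == "queue":
        return labels[0]
    raise ValueError(f"unrecognized SQS URL host: {host!r}")


def check_region(config: dict) -> None:
    """Raise ValueError if the configured region differs from the one in base_url."""
    url_region = region_from_queue_url(config["base_url"])
    if url_region != config["region"]:
        raise ValueError(
            f"region {config['region']!r} does not match base_url region {url_region!r}"
        )


check_region({"base_url": "https://us-west-1.queue.amazonaws.com/id/name-of-queue", "region": "us-west-1"})
print(region_from_queue_url("https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"))
```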

Note

Replace the placeholders with real-world values following the description table below.

Parameter

Data type

Type

Value range / Format

Details

debug_status

bool

Mandatory

false / true

If the value is true, the debug logging traces will be enabled when running the collector. If the value is false, only the info, warning and error logging levels will be printed.

short_unique_id

int

Mandatory

Minimum length: 1
Maximum length: 5

Use this param to give a unique ID to this input service.

Note

This parameter is used to build the persistence address. Do not use the same value for multiple collectors, as it could cause a collision.

enabled

bool

Mandatory

false / true

Use this param to enable or disable the given input logic when running the collector. If the value is true, the input will be run. If the value is false, it will be ignored.

base_url

str

Mandatory

 

Set this to the URL of your SQS queue, typically of the form https://sqs.region.amazonaws.com/account-number/queue-name.

aws_access_key_id

str

Mandatory/Optional

Any

Only needed if you are not using a cross-account role.

aws_secret_access_key

str

Mandatory/Optional

Any

Only needed if you are not using a cross-account role.

aws_base_account_role

str

Mandatory/Optional

Any

Only needed if you are using a cross-account role. This is Devo's cross-account role.

aws_cross_account_role

str

Mandatory/Optional

Any

Only needed if you are using a cross-account role. This is your cross-account role.

aws_external_id

str

Optional

Any

Optional extra security: an external ID required when assuming the cross-account role.

ack_messages

bool

Mandatory

false / true

Must be set to true for messages to be deleted from the queue after processing. Leave it false until testing is complete.

direct_mode

bool

Optional

false / true

Set to false for almost all scenarios.

This parameter should be removed if it is not used.

do_not_send

bool

Optional

false / true

Set to true to stop logs from being sent to Devo.

This parameter should be removed if it is not used.

sqs_visibility_timeout

int

Mandatory

Min: 120

Max: 43200 (the SQS maximum of 12 hours)

This parameter specifies how long, in seconds, a message is held by the collector. If it is not processed and deleted within that time, the message is put back in the queue and can be processed again.

Increase this value when the collector has to download and process large files. Otherwise, it defaults to 120. For CrowdStrike FDR, some messages can take 10-15 minutes to process; set the timeout accordingly to help reduce duplicates.

sqs_wait_timeout

int

Mandatory

Min: 20

Max: 20

This is the long-polling wait time: each poll waits up to this many seconds for a message. If no message is found, the collector reports "Long poll did not find any messages in queue. All data in the SQS queue has been successfully collected."

sqs_max_messages

int

Mandatory

Min: 1

Max: 6

Always set this to 1.

region

str

Mandatory

Example:

us-east-1

The region that appears in base_url.

compressed_events

bool

Mandatory

false / true

Only works with gzip compression. Keep it false unless you see the error described below.

If you see errors such as 'utf-8' codec can't decode byte 0xa9 in position 36561456: invalid start byte, the events may need to be decompressed (set this parameter to true).

encoding

str

Optional

 

The way the log files are encoded inside the S3 bucket.

Options, from most used to least used:

  • gzip

  • none

  • parquet

  • latin-1

  • Note

    • It also accepts other encoding strings, such as ascii or utf-16; the value is simply used to read the file.
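The SQS parameters in the table above correspond to arguments of the SQS ReceiveMessage and DeleteMessage API calls. The following minimal Python sketch shows that mapping for one polling cycle, assuming a boto3-style client (in practice boto3.client("sqs", region_name=...)). It is not the collector's code, and poll_once, FakeSQS, and the handle_message callback are hypothetical names. Acking a message here means deleting it. That is why setting ack_messages to false leaves messages in the queue, where they can be reprocessed once the visibility timeout expires.

```python
def poll_once(sqs_client, config: dict, handle_message) -> int:
    """Run one long-poll cycle and return the number of messages received.

    sqs_client: any object with boto3-style receive_message/delete_message methods.
    handle_message: hypothetical callback that processes one message body.
    """
    response = sqs_client.receive_message(
        QueueUrl=config["base_url"],
        MaxNumberOfMessages=config["sqs_max_messages"],
        WaitTimeSeconds=config["sqs_wait_timeout"],  # SQS caps long polling at 20 s
        VisibilityTimeout=config["sqs_visibility_timeout"],
    )
    messages = response.get("Messages", [])  # the key is absent when the queue is empty
    for message in messages:
        handle_message(message["Body"])
        if config.get("ack_messages", True):
            # Deleting is the "ack"; otherwise the message becomes visible again
            # after sqs_visibility_timeout seconds and is redelivered.
            sqs_client.delete_message(
                QueueUrl=config["base_url"], ReceiptHandle=message["ReceiptHandle"]
            )
    return len(messages)


class FakeSQS:
    """In-memory stand-in for an SQS client so the sketch runs offline.

    It ignores visibility timeouts: un-deleted messages are returned again immediately.
    """

    def __init__(self, bodies):
        self.pending = [{"Body": b, "ReceiptHandle": f"rh-{i}"} for i, b in enumerate(bodies)]
        self.deleted = []

    def receive_message(self, **kwargs):
        batch = self.pending[: kwargs["MaxNumberOfMessages"]]
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.deleted.append(ReceiptHandle)
        self.pending = [m for m in self.pending if m["ReceiptHandle"] != ReceiptHandle]


config = {
    "base_url": "https://us-west-1.queue.amazonaws.com/id/name-of-queue",
    "sqs_max_messages": 1,
    "sqs_wait_timeout": 20,
    "sqs_visibility_timeout": 240,
    "ack_messages": True,
}
queue = FakeSQS(["first", "second"])
while poll_once(queue, config, print):  # prints "first", then "second", then stops
    pass
```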

On-premise collector

This data collector can be run on any machine that has the Docker service available, because it is executed as a Docker container. The following sections explain how to prepare the required setup for running the data collector.

Structure

Create the following directory structure to be used when running the collector:

Code Block
<any_directory>
└── devo-collectors/
    └── <product_name>/
        ├── certs/
        │   ├── chain.crt
        │   ├── <your_domain>.key
        │   └── <your_domain>.crt
        ├── state/
        └── config/ 
            └── config.yaml 
Note

Replace <product_name> with the proper value.

Devo credentials

In Devo, go to Administration → Credentials → X.509 Certificates, download the Certificate, Private key and Chain CA and save them in <product_name>/certs/. Learn more about security credentials in Devo here.

Note

Replace <product_name> with the proper value.

Editing the config.yaml file

Code Block
globals:
  debug: <debug_status>
  id: <collector_id>
  name: <collector_name>
  persistence:
    type: filesystem
    config:
      directory_name: state
  multiprocessing: false
  queue_max_size_in_mb: 1024
  queue_max_size_in_messages: 1000
  queue_max_elapsed_time_in_sec: 60
  queue_wrap_max_size_in_messages: 100

outputs:
  devo_1:
    type: devo_platform
    config:
      address: <devo_address>
      port: 443
      type: SSL
      chain: <chain_filename>
      cert: <cert_filename>
      key: <key_filename>

inputs:
  sqs:
    id: 12345
    enabled: true
    credentials:
      aws_access_key_id: password
      aws_secret_access_key: secret-access-key
      aws_base_account_role: arn:aws:iam::837131528613:role/devo-xaccount-cs-role
      aws_cross_account_role: arn:aws:iam::{account-id}:role/{role-name}
      aws_external_id: extra_security_optional
    region: region
    base_url: https://sqs.{region}.amazonaws.com/{account-number}/{queue-name}
    sqs_visibility_timeout: 120
    sqs_wait_timeout: 20
    sqs_max_messages: 1
    ack_messages: false
    direct_mode: false
    do_not_send: false
    compressed_events: false
    services:
      custom_service:
        file_field_definitions: {}
        filename_filter_rules: []
        encoding: gzip
        ack_messages: false
        file_format:
          type: single_json_object_processor
          config:
            key: Records
        record_field_mapping: {}
        routing_template: my.app.source1.type1
        line_filter_rules: []
Info

All defined service entities will be executed by the collector. If you do not want to run any of them, just remove the entity from the services object.

Replace the placeholders with your required values following the description table below:

Parameter

Data type

Type

Value range

Details

debug_status

bool

Mandatory

false / true

If the value is true, the debug logging traces will be enabled when running the collector. If the value is false, only the info, warning and error logging levels will be printed.

collector_id

int

Mandatory

Minimum length: 1
Maximum length: 5

Use this param to give a unique ID to this collector.

collector_name

str

Mandatory

Minimum length: 1
Maximum length: 10

Use this param to give a valid name to this collector.

devo_address

str

Mandatory

collector-us.devo.io
collector-eu.devo.io

Use this param to identify the Devo Cloud where the events will be sent.

chain_filename

str

Mandatory

Minimum length: 4
Maximum length: 20

Use this param to identify the chain certificate file downloaded from your Devo domain. This file is usually named chain.crt.

cert_filename

str

Mandatory

Minimum length: 4
Maximum length: 20

Use this param to identify the certificate (.crt) file downloaded from your Devo domain.

key_filename

str

Mandatory

Minimum length: 4
Maximum length: 20

Use this param to identify the private key (.key) file downloaded from your Devo domain.

short_unique_id

int

Mandatory

Minimum length: 1
Maximum length: 5

Use this param to give a unique ID to this input service.

Note

This parameter is used to build the persistence address. Do not use the same value for multiple collectors, as it could cause a collision.

input_status

bool

Mandatory

false / true

Use this param to enable or disable the given input logic when running the collector. If the value is true, the input will be run. If the value is false, it will be ignored.

base_url

str

Mandatory

 

Set this to the URL of your SQS queue, typically of the form https://sqs.region.amazonaws.com/account-number/queue-name.

aws_access_key_id

str

Mandatory/Optional

Any

Only needed if you are not using a cross-account role.

aws_secret_access_key

str

Mandatory/Optional

Any

Only needed if you are not using a cross-account role.

aws_base_account_role

str

Mandatory/Optional

Any

Only needed if you are using a cross-account role. This is Devo's cross-account role.

aws_cross_account_role

str

Mandatory/Optional

Any

Only needed if you are using a cross-account role. This is your cross-account role.

aws_external_id

str

Optional

Any

Optional extra security: an external ID required when assuming the cross-account role.

ack_messages

bool

Mandatory

false / true

Must be set to true for messages to be deleted from the queue after processing. Leave it false until testing is complete.

direct_mode

bool

Optional

false / true

Set to false for almost all scenarios.

This parameter should be removed if it is not used.

do_not_send

bool

Optional

false / true

Set to true to stop logs from being sent to Devo.

This parameter should be removed if it is not used.

debug_md5

bool

Optional

false / true

Set to true to send the message MD5 to my.app.sqs.message_body. Only needed for deeper debugging of duplicates.

This parameter should be removed if it is not used.

sqs_visibility_timeout

int

Mandatory

Min: 120

Max: 43200 (the SQS maximum of 12 hours)

This parameter specifies how long, in seconds, a message is held by the collector. If it is not processed and deleted within that time, the message is put back in the queue and can be processed again.

Increase this value when the collector has to download and process large files. Otherwise, it defaults to 120. For CrowdStrike FDR, some messages can take 10-15 minutes to process; set the timeout accordingly to help reduce duplicates.

sqs_wait_timeout

int

Mandatory

Min: 20

Max: 20

This is the long-polling wait time: each poll waits up to this many seconds for a message. If no message is found, the collector reports "Long poll did not find any messages in queue. All data in the SQS queue has been successfully collected."

sqs_max_messages

int

Mandatory

Min: 1

Max: 6

Always set this to 1.

region

str

Mandatory

Example:

us-east-1

The region that appears in base_url.

compressed_events

bool

Mandatory

false / true

Only works with gzip compression. Keep it false unless you see the error described below.

If you see errors such as 'utf-8' codec can't decode byte 0xa9 in position 36561456: invalid start byte, the events may need to be decompressed (set this parameter to true).


encoding

str

Optional

 

The way the log files are encoded inside the S3 bucket.

Options, from most used to least used:

  • gzip

  • none

  • parquet

  • latin-1

  • Note

    • It also accepts other encoding strings, such as ascii or utf-16; the value is simply used to read the file.
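The utf-8 error quoted for compressed_events above is the kind of error you get when gzip-compressed bytes are decoded as text. Every gzip stream starts with the bytes 0x1f 0x8b, and 0x8b is not a valid first byte of a UTF-8 character. The following minimal Python sketch illustrates the failure and one way to handle it by sniffing the gzip magic bytes. In the collector this behavior is controlled explicitly by compressed_events, and decode_event is a hypothetical name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def decode_event(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes to text, gunzipping them first if they look gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)


compressed = gzip.compress(b'{"eventName": "ConsoleLogin"}')
try:
    compressed.decode("utf-8")  # decoding compressed bytes directly fails
except UnicodeDecodeError as err:
    print(err.reason)  # invalid start byte
print(decode_event(compressed))  # {"eventName": "ConsoleLogin"}
```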

Download the Docker image

The collector should be deployed as a Docker container. Download the Docker image of the collector as a .tgz file by clicking the link in the following table:

Collector Docker image

SHA-256 hash

collector-aws_sqs_if-docker-image-1.27.0

d4a462b75032731042a2ce3d82ca92e6a0ee1b8b099b5371ecaa7028bc843e4f4b75fb4481203b5a416eb9523ef97b5fa09a939f530265b0158f530777398d28
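Before loading the image, it is worth confirming that the downloaded .tgz file matches the published hash. A SHA-256 digest is 64 hexadecimal characters long. On most Linux systems, sha256sum <image_file>-<version>.tgz does this. The following is an equivalent minimal Python sketch; the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage; substitute the real file name and the hash from the table:
# print(verify_image("<image_file>-<version>.tgz", "<sha256-from-table>"))
```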

Use the following command to add the Docker image to the system:

Code Block
gunzip -c <image_file>-<version>.tgz | docker load
Note

Once the Docker image is imported, it will show the real name of the Docker image (including version info). Replace <image_file> and <version> with a proper value.

The Docker image can be deployed on the following services:

Docker

Execute the following command from the root directory <any_directory>/devo-collectors/<product_name>/:

Code Block
docker run \
--name collector-<product_name> \
--volume $PWD/certs:/devo-collector/certs \
--volume $PWD/config:/devo-collector/config \
--volume $PWD/state:/devo-collector/state \
--env CONFIG_FILE=config.yaml \
--rm \
--interactive \
--tty \
<image_name>:<version>
Note

Replace <product_name>, <image_name> and <version> with the proper values.

Docker Compose

The following Docker Compose file can be used to execute the Docker container. It must be created in the <any_directory>/devo-collectors/<product_name>/ directory.

Code Block
version: '3'
services:
  collector-<product_name>:
    image: <image_name>:${IMAGE_VERSION:-latest}
    container_name: collector-<product_name>
    volumes:
      - ./certs:/devo-collector/certs
      - ./config:/devo-collector/config
      - ./credentials:/devo-collector/credentials
      - ./state:/devo-collector/state
    environment:
      - CONFIG_FILE=${CONFIG_FILE:-config.yaml}

To run the container using docker-compose, execute the following command from the <any_directory>/devo-collectors/<product_name>/ directory:

Code Block
IMAGE_VERSION=<version> docker-compose up -d
Note

Replace <product_name>, <image_name> and <version> with the proper values.

Cloud collector

We use a piece of software called Collector Server to host and manage all our available collectors.

To enable the collector for a customer:

  1. In the Collector Server GUI, access the domain in which you want this instance to be created.

  2. Click Add Collector and find the one you wish to add.

  3. In the Version field, select the latest value.

  4. In the Collector Name field, set the value you prefer (this name must be unique inside the same Collector Server domain).

  5. In the sending method, select Direct Send. Direct Send configuration is optional for collectors that create Table events, but mandatory for those that create Lookups.

  6. In the Parameters section, establish the Collector Parameters as follows:

Editing the JSON configuration

Code Block
{
  "global_overrides": {
    "debug": false
  },
  "inputs": {
    "sqs_collector": {
      "id": "12351",
      "enabled": true,
      "credentials": {
        "aws_access_key_id": "",
        "aws_secret_access_key": "",
        "aws_base_account_role": "arn:aws:iam::837131528613:role/devo-xaccount-cs-role",
        "aws_cross_account_role": "",
        "aws_external_id": ""
      },
      "ack_messages": false,
      "direct_mode": false,
      "do_not_send": false,
      "compressed_events": false,
      "debug_md5": true,
      "base_url": "https://us-west-1.queue.amazonaws.com/id/name-of-queue",
      "region": "us-west-1",
      "sqs_visibility_timeout": 240,
      "sqs_wait_timeout": 20,
      "sqs_max_messages": 1,
      "services": {
        "custom_service": {
          "file_field_definitions": {},
          "filename_filter_rules": [],
          "encoding": "gzip",
          "send_filtered_out_to_unknown": false,
          "file_format": {
            "type": "line_split_processor",
            "config": {
              "json": true
            }
          },
          "record_field_mapping": {
            "event_simpleName": {
              "keys": [
                "event_simpleName"
              ]
            }
          },
          "routing_template": "edr.crowdstrike.cannon",
          "line_filter_rules": [
            [
              {
                "source": "record",
                "key": "event_simpleName",
                "type": "match",
                "value": "EndOfProcess"
              }
            ],
            [
              {
                "source": "record",
                "key": "event_simpleName",
                "type": "match",
                "value": "DeliverLocalFXToCloud"
              }
            ]
          ]
        }
      }
    }
  }
}
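Before deploying, it can help to check the SQS parameters against the limits in the parameter table that follows. The sketch below is a minimal Python illustration and is not part of the collector. The function name `validate_sqs_input` is hypothetical. It assumes that the table's short_unique_id corresponds to the `id` key in the example above. It checks only the documented limits and does not contact AWS.

```python
import re

# Limits taken from the parameter table. This hypothetical checker only
# mirrors the documentation; the collector may validate differently.
RANGES = {
    "sqs_visibility_timeout": (120, 43200),
    "sqs_wait_timeout": (20, 20),
    "sqs_max_messages": (1, 6),
}


def validate_sqs_input(config: dict) -> list:
    """Return human-readable problems found in inputs.sqs_collector (empty list if none)."""
    sqs = config["inputs"]["sqs_collector"]
    problems = []

    # short_unique_id (assumed to be the "id" key): 1 to 5 digits.
    if not re.fullmatch(r"\d{1,5}", str(sqs.get("id", ""))):
        problems.append("id must be 1 to 5 digits")

    for key, (low, high) in RANGES.items():
        value = sqs.get(key)
        if not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{key}={value!r} is outside {low}-{high}")

    # The region must be the one that appears in base_url.
    region = sqs.get("region", "")
    if not region or region not in sqs.get("base_url", ""):
        problems.append(f"region {region!r} does not appear in base_url")

    # Static keys are only needed when no cross-account role is used.
    creds = sqs.get("credentials", {})
    if creds.get("aws_access_key_id") and creds.get("aws_cross_account_role"):
        problems.append("access keys are only needed when not using a cross-account role")

    return problems


sample = {
    "inputs": {
        "sqs_collector": {
            "id": "12351",
            "credentials": {"aws_access_key_id": "", "aws_cross_account_role": ""},
            "base_url": "https://us-west-1.queue.amazonaws.com/id/name-of-queue",
            "region": "us-west-1",
            "sqs_visibility_timeout": 240,
            "sqs_wait_timeout": 20,
            "sqs_max_messages": 1,
        }
    }
}
print(validate_sqs_input(sample))  # → []
```

The region check is a simple substring test. It works for both the `sqs.<region>.amazonaws.com` and `<region>.queue.amazonaws.com` URL forms that appear on this page.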
Info

All defined service entities will be executed by the collector. If you do not want to run one of them, remove that entity from the services object.

Note

Replace the placeholders with real-world values, following the description table below.

Parameter

Data type

Type

Value range / Format

Details

debug_status

bool

Mandatory

false / true

If the value is true, the debug logging traces will be enabled when running the collector. If the value is false, only the info, warning and error logging levels will be printed. In the JSON example above, this setting appears as the debug key inside global_overrides.

short_unique_id

int

Mandatory

Minimum length: 1
Maximum length: 5

Use this parameter to give a unique ID to this input service. In the JSON example above, this is the id key.

Note

This parameter is used to build the persistence address. Do not use the same value for multiple collectors, as it could cause a collision.

enabled

bool

Mandatory

false / true

Use this param to enable or disable the given input logic when running the collector. If the value is true, the input will be run. If the value is false, it will be ignored.

base_url

str

Mandatory

 

The URL of the SQS queue, in the format https://sqs.<region>.amazonaws.com/<account-number>/<queue-name>.

aws_access_key_id

str

Mandatory/Optional

Any

Only needed if you are not using a cross-account role.

aws_secret_access_key

str

Mandatory/Optional

Any

Only needed if you are not using a cross-account role.

aws_base_account_role

str

Mandatory/Optional

Any

Only needed if using a cross-account role. This is Devo's cross-account role.

aws_cross_account_role

str

Mandatory/Optional

Any

Only needed if using a cross-account role. This is your cross-account role.

aws_external_id

str

Optional

Any

Optional external ID that adds an extra layer of security when the cross-account role is assumed.

ack_messages

bool

Mandatory

false / true

Must be set to true for the collector to delete messages from the queue. Leave it set to false until testing is complete.

direct_mode

bool

Optional

false / true

Set to false in almost all scenarios.

This parameter should be removed if it is not used.

do_not_send

bool

Optional

false / true

Set to true to prevent the logs from being sent to Devo.

This parameter should be removed if it is not used.

debug_md5

bool

Optional

false / true

Set to true to send the MD5 hash of each message to the my.app.sqs.message_body table. Only needed for additional debugging of duplicates.

This parameter should be removed if it is not used.

sqs_visibility_timeout

int

Mandatory

Min: 120

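Max: 43200 (12 hours, the SQS maximum)

Visibility timeout, in seconds, for messages received from the queue. The collector has to download and process large files, so set this value high enough for processing to finish. If processing takes longer than the timeout, the message becomes visible in the queue again and may be processed more than once. Defaults to 120 if not set.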

sqs_wait_timeout

int

Mandatory

Min: 20

Max: 20

Long-polling wait time, in seconds. Keep this at 20, which has handled most customer scenarios so far.

sqs_max_messages

int

Mandatory

Min: 1

Max: 6

Maximum number of messages retrieved per request. Always set this to 1.

region

str

Mandatory

Example:

us-east-1

The AWS region of the queue. It must match the region in base_url.

compressed_events

bool

Mandatory

false / true

Only works with GZIP compression. Set to false unless you see the error described below.

If you see errors such as 'utf-8' codec can't decode byte 0xa9 in position 36561456: invalid start byte, the events might need to be decompressed. In that case, set this parameter to true.
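An error like the one above is what Python raises when compressed bytes are decoded directly as UTF-8 text. The minimal sketch below reproduces that failure and then shows the usual fix, which is to decompress before decoding. It is an illustration only, not the collector's code. The `decode_object` helper is hypothetical, and the sketch assumes GZIP because the table says compressed_events only works with GZIP.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes (RFC 1952)


def decode_object(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode an object body as text, gunzipping it first when it looks like gzip."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)


payload = gzip.compress(b'{"eventName": "ConsoleLogin"}\n')

# Decoding the compressed bytes directly fails with the same kind of error.
try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # → invalid start byte

print(decode_object(payload))
```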

Verify data collection

Once the collector has been launched, it is important to check that ingestion is working properly. To do so, go to the collector’s logs console.

This service has the following components:

Component

Description

Setup

The setup module is in charge of authenticating the service and managing the token expiration when needed.

Puller

The puller module is in charge of pulling the data in an organized way and delivering the events via the SDK.

Setup output

A successful run has the following output messages for the setup module:

Code Block
2024-01-16T12:47:04.044    INFO OutputProcess::MainThread -> Process started
2024-01-16T12:47:04.044    INFO InputProcess::MainThread -> Process Started
2024-01-16T12:47:04.177    INFO InputProcess::MainThread -> InputThread(sqs_collector,12345) - Starting thread (execution_period=60s)
2024-01-16T12:47:04.177    INFO InputProcess::MainThread -> ServiceThread(sqs_collector,12345,aws_sqs_vpc,predefined) - Starting thread (execution_period=60s)
2024-01-16T12:47:04.177    INFO InputProcess::MainThread -> AWSsqsPullerSetup(unknown,sqs_collector#12345,aws_sqs_vpc#predefined) -> Starting thread
2024-01-16T12:47:04.177    INFO InputProcess::MainThread -> AWSsqsPuller(sqs_collector,12345,aws_sqs_vpc,predefined) - Starting thread
2024-01-16T12:47:04.178 WARNING InputProcess::AWSsqsPuller(sqs_collector,12345,aws_sqs_vpc,predefined) -> Waiting until setup will be executed
2024-01-16T12:47:04.191    INFO OutputProcess::MainThread -> ConsoleSender(standard_senders,console_sender_0) -> Starting thread
2024-01-16T12:47:04.191    INFO OutputProcess::MainThread -> ConsoleSenderManagerMonitor(standard_senders,console_1) -> Starting thread (every 300 seconds)
2024-01-16T12:47:04.191    INFO OutputProcess::MainThread -> ConsoleSenderManager(standard_senders,manager,console_1) -> Starting thread
2024-01-16T12:47:04.192    INFO OutputProcess::MainThread -> ConsoleSender(lookup_senders,console_sender_0) -> Starting thread
2024-01-16T12:47:04.192    INFO OutputProcess::ConsoleSenderManager(standard_senders,manager,console_1) -> [EMERGENCY PERSISTENCE SYSTEM] ConsoleSenderManager(standard_senders,manager,console_1) -> Nothing retrieved from the persistence.
2024-01-16T12:47:04.192    INFO OutputProcess::OutputStandardConsumer(standard_senders_consumer_0) -> [EMERGENCY PERSISTENCE SYSTEM] OutputStandardConsumer(standard_senders_consumer_0) -> Nothing retrieved from the persistence.
2024-01-16T12:47:04.192    INFO OutputProcess::MainThread -> ConsoleSenderManagerMonitor(lookup_senders,console_1) -> Starting thread (every 300 seconds)
2024-01-16T12:47:04.192    INFO OutputProcess::MainThread -> ConsoleSenderManager(lookup_senders,manager,console_1) -> Starting thread
2024-01-16T12:47:04.193    INFO OutputProcess::MainThread -> ConsoleSender(internal_senders,console_sender_0) -> Starting thread
2024-01-16T12:47:04.193    INFO OutputProcess::ConsoleSenderManager(lookup_senders,manager,console_1) -> [EMERGENCY PERSISTENCE SYSTEM] ConsoleSenderManager(lookup_senders,manager,console_1) -> Nothing retrieved from the persistence.
2024-01-16T12:47:04.193    INFO OutputProcess::MainThread -> ConsoleSenderManagerMonitor(internal_senders,console_1) -> Starting thread (every 300 seconds)
2024-01-16T12:47:04.193    INFO OutputProcess::MainThread -> ConsoleSenderManager(internal_senders,manager,console_1) -> Starting thread
2024-01-16T12:47:04.193    INFO OutputProcess::OutputLookupConsumer(lookup_senders_consumer_0) -> [EMERGENCY PERSISTENCE SYSTEM] OutputLookupConsumer(lookup_senders_consumer_0) -> Nothing retrieved from the persistence.
2024-01-16T12:47:05.795    INFO InputProcess::AWSsqsPuller(sqs_collector,12345,aws_sqs_vpc,predefined) -> Starting data collection every 5 seconds
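The setup output shows this order of events. The puller thread starts before setup has finished, logs Waiting until setup will be executed, and only starts data collection afterwards. The Python sketch below models that ordering with a `threading.Event`. The class and method names are hypothetical, and this is not the collector's implementation.

```python
import threading


class SetupThenPull:
    """Hypothetical model of a puller that must wait for its setup step."""

    def __init__(self):
        self.setup_done = threading.Event()
        self.log = []  # list.append is safe to call from several threads in CPython

    def setup(self):
        # Stand-in for the authentication and token handling done by the setup module.
        self.log.append("setup complete")
        self.setup_done.set()

    def pull(self, timeout: float = 5.0):
        # Whether this line is logged depends on thread scheduling.
        if not self.setup_done.is_set():
            self.log.append("Waiting until setup will be executed")
        # Block until setup() signals completion (or give up after `timeout`).
        if not self.setup_done.wait(timeout):
            raise TimeoutError("setup did not finish in time")
        self.log.append("Starting data collection")


runner = SetupThenPull()
puller = threading.Thread(target=runner.pull)
puller.start()  # the puller starts first, as in the log above
runner.setup()
puller.join()
print(runner.log)
```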

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only once, before the first run of the Pull action.

Code Block
2024-01-16T17:02:56.221036303Z 2024-01-16T17:02:56.220    INFO InputProcess::AWSsqsPuller(sqs_collector,12345,aws_sqs_cloudwatch_vpc,predefined) -> Acked message receiptHandle: /+qA+ymL2Vs8yb//++7YM2Ef8BCetrJ+/+////F1uwLOVfONfagI99vA=
2024-01-16T17:02:56.221386926Z 2024-01-16T17:02:56.221    INFO InputProcess::AWSsqsPuller(sqs_collector,12345,aws_sqs_cloudwatch_vpc,predefined) -> Data collection completed. Elapsed time: 2.413 seconds. Waiting for 2.587 second(s) until the next one
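In the puller output, the elapsed time (2.413 s) and the wait (2.587 s) add up to 5 seconds. That matches the interval in the setup output (Starting data collection every 5 seconds), which suggests a fixed-rate schedule.

The sketch below models one cycle under that assumption. It uses the real boto3 SQS client method names and parameters:
- `receive_message` with `MaxNumberOfMessages`, `WaitTimeSeconds` and `VisibilityTimeout`
- `delete_message` with `ReceiptHandle`

Because of this, a `boto3.client("sqs")` could be passed in. However, `pull_once` and the `FakeSQS` test double are hypothetical. Mapping ack_messages to `delete_message` is an inference from the "Acked message receiptHandle" log line.

```python
import time


def pull_once(sqs, queue_url, ack_messages, period=5.0,
              visibility_timeout=240, wait_timeout=20, max_messages=1):
    """Run one hypothetical collection cycle and return (bodies, seconds_to_wait)."""
    started = time.monotonic()
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,      # sqs_max_messages
        WaitTimeSeconds=wait_timeout,          # sqs_wait_timeout (long polling)
        VisibilityTimeout=visibility_timeout,  # sqs_visibility_timeout
    )
    bodies = []
    for message in response.get("Messages", []):
        bodies.append(message["Body"])  # downloading and parsing the S3 object would happen here
        if ack_messages:
            # On SQS, acknowledging means deleting; unacked messages reappear
            # once the visibility timeout expires.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    elapsed = time.monotonic() - started
    # Fixed-rate schedule: e.g. 5 s period - 2.413 s elapsed = 2.587 s wait.
    return bodies, max(0.0, period - elapsed)


class FakeSQS:
    """Test double exposing the two boto3 SQS client methods used above."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.deleted = []

    def receive_message(self, QueueUrl, MaxNumberOfMessages, WaitTimeSeconds, VisibilityTimeout):
        batch = self.messages[:MaxNumberOfMessages]
        self.messages = self.messages[MaxNumberOfMessages:]
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.deleted.append(ReceiptHandle)


# Hypothetical queue URL; with real AWS access, pass boto3.client("sqs", region_name=...) instead.
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
fake = FakeSQS([{"Body": "{}", "ReceiptHandle": "rh-1"}])
bodies, wait = pull_once(fake, queue_url, ack_messages=True)
print(bodies, fake.deleted)  # → ['{}'] ['rh-1']
```

Keeping ack_messages false during testing leaves messages in the queue. This is consistent with the parameter table's advice.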

Restart the persistence

This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:

  1. Delete the collector and re-create it with a new ID number.

The collector will detect this change and will restart the persistence using the parameters of the configuration file or the default configuration in case it has not been provided.

Note

This action clears the persistence, and the cleared data cannot be recovered in any way. Resetting persistence could result in duplicate or lost events.

Collector operations

This section is intended to explain how to proceed with specific operations of this collector.

Verify collector operations

The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services, and of validating the given configuration.

Events delivery and Devo ingestion

The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.

A successful run has the following output messages for the initializer module:

Code Block
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Sender: SyslogSender(standard_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Standard - Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 44 (elapsed 0.007 seconds)
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Sender: SyslogSender(internal_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Internal - Total number of messages sent: 1, messages sent since "2022-06-28 10:39:22.516313+00:00": 1 (elapsed 0.019 seconds)

Sender services

The Integrations Factory Collector SDK has three different sender services, depending on the type of event to deliver (internal, standard, and lookup). This collector uses the following sender services:

Sender Services

Description

internal_senders

In charge of delivering internal metrics to Devo such as logging traces or metrics.

standard_senders

In charge of delivering pulled events to Devo.

Sender statistics

Each service displays its own performance statistics that allow checking how many events have been delivered to Devo by type:

Logging trace

Description

Number of available senders: 1

Displays the number of concurrent senders available for the given Sender Service.

sender manager internal queue size: 0

Displays the items available in the internal sender queue.

This value helps detect bottlenecks and the need to increase the performance of data delivery to Devo, which can be done by increasing the number of concurrent senders.

Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 21 (elapsed 0.007 seconds)

Displays the number of events sent since the last checkpoint. From the given example, the following conclusions can be drawn:

  • 44 events were sent to Devo since the collector started.

  • The last checkpoint timestamp was 2022-06-28 10:39:22.511671+00:00.

  • 21 events were sent to Devo between the last UTC checkpoint and now.

  • Those 21 events required 0.007 seconds to be delivered.
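The same statistics can be read programmatically, for example to alert when deliveries stall. The following is a minimal sketch with a hypothetical `parse_sender_stats` helper. It assumes the trace format matches the example log above exactly.

```python
import re
from datetime import datetime
from typing import Optional

# Matches the "Total number of messages sent" statistics trace.
STATS_RE = re.compile(
    r"Total number of messages sent: (?P<total>\d+), "
    r'messages sent since "(?P<since>[^"]+)": (?P<recent>\d+) '
    r"\(elapsed (?P<elapsed>[\d.]+) seconds\)"
)


def parse_sender_stats(line: str) -> Optional[dict]:
    """Extract the counters from a sender statistics line, or return None."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {
        "total": int(match["total"]),  # events sent since the collector started
        "since": datetime.fromisoformat(match["since"]),  # last checkpoint (UTC)
        "recent": int(match["recent"]),  # events sent since that checkpoint
        "elapsed_s": float(match["elapsed"]),  # time needed to deliver them
    }


line = (
    "INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> "
    'Standard - Total number of messages sent: 44, messages sent since '
    '"2022-06-28 10:39:22.511671+00:00": 44 (elapsed 0.007 seconds)'
)
stats = parse_sender_stats(line)
print(stats["total"], stats["recent"], stats["elapsed_s"])  # → 44 44 0.007
```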

By default these traces will be shown every 10 minutes.

Check memory usage

To check the memory usage of this collector, look for the following log records, which are displayed every 5 minutes by default, always after the memory-freeing process runs.

  • The used memory is displayed per running process, and the sum of both values gives the total memory used by the collector.

  • The global pressure of the available memory is displayed in the global value.

  • All metrics (Global, RSS, VMS) include the value before and after freeing memory, shown as: previous -> after freeing memory.

Code Block
INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)
INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)

Differences between RSS and VMS memory usage:

  • RSS is the Resident Set Size, which is the actual physical memory the process is using.

  • VMS is the Virtual Memory Size, which is the virtual memory the process is using.
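Following the rule that the total memory used is the sum of the per-process values, a helper can add up the post-freeing RSS of the input and output processes from the [GC] lines. The helper `rss_after_gc` is hypothetical. Its regular expression assumes the exact format shown in the example log, and it sums RSS rather than VMS because RSS is the physical memory in use.

```python
import re

# Matches "[GC]" memory lines such as the example log records.
GC_RE = re.compile(
    r"(?P<process>\w+)::MainThread -> \[GC\] global: [\d.]+% -> [\d.]+%, "
    r"process: RSS\([\d.]+MiB -> (?P<rss_after>[\d.]+)MiB\), "
    r"VMS\([\d.]+MiB -> [\d.]+MiB\)"
)


def rss_after_gc(lines):
    """Return ({process: RSS MiB after freeing}, total RSS MiB) from [GC] lines."""
    per_process = {}
    for line in lines:
        match = GC_RE.search(line)
        if match:
            # Keep the latest reading for each process (InputProcess, OutputProcess).
            per_process[match["process"]] = float(match["rss_after"])
    return per_process, sum(per_process.values())


logs = [
    "INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)",
    "INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)",
]
per_process, total = rss_after_gc(logs)
print(per_process)
print(round(total, 2))  # → 62.49
```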

Enable/disable the logging debug mode

Sometimes it is necessary to activate the debug mode of the collector's logging. This debug mode increases the verbosity of the log and allows you to print execution traces that are very helpful in resolving incidents or detecting bottlenecks in heavy download processes.

  • To enable this option you just need to edit the configuration file and change the debug_status parameter from false to true and restart the collector.

  • To disable this option, you just need to update the configuration file and change the debug_status parameter from true to false and restart the collector.

For more information, visit the configuration and parameterization section corresponding to the chosen deployment mode.
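For deployments where the configuration is edited as a file, the toggle can be scripted. The prose here refers to debug_status, while the JSON example earlier on this page uses a debug key under global_overrides. This sketch assumes the latter, and `set_debug` is a hypothetical helper. A restart is still required after changing the value.

```python
import json


def set_debug(config_text: str, enabled: bool) -> str:
    """Return the configuration JSON with global_overrides.debug set to `enabled`."""
    config = json.loads(config_text)
    # Assumes the key layout of the example configuration ("global_overrides" -> "debug").
    config.setdefault("global_overrides", {})["debug"] = enabled
    return json.dumps(config, indent=2)


original = '{"global_overrides": {"debug": false}, "inputs": {}}'
print(set_debug(original, True))
```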

Change log

Release

Released on

Release type

Details

Recommendations

v1.7.0

BUG FIXES, FEATURES

Bug Fixes

  • Fixed control tower issue

  • Fixed bug with Falcon Data Replicator Large where logs were taking over an hour to finish

Features

  • Created custom tagging based on record field mapping

  • Created NLB logging service

  • Added INFO/DEBUG logging around each method so users can see size and timing.

Recommended Version

v1.6.4

BUG FIXES, IMPROVEMENTS

Features

  • Created custom tagging based on record field mapping

  • Added INFO/DEBUG logging around most methods so users can see size and timing.

Bug Fixes

  • Fixed Dependency Issue.

  • Fixed control tower issue

  • Fixed Falcon Data Replicator Large where logs were taking over an hour to finish.

Upgrade

v1.6.3

BUG FIXES

Bug Fixes

  • Fixed Log Operations Bug

  • Added backwards compatibility to control tower

  • Fixed Palo Alto Service for snappy decompression.

Upgrade

v1.6.2

BUG FIXES

Bug Fixes

  • Fixed a None type error that caused message processing to fail in fdr_large.

  • Added default region to initialization of sts client to prevent needing environment variables in the green cluster.

  • Fixed bug in control tower processor

Upgrade

v1.6.1

IMPROVEMENTS

Improvements

  • Created a new processor for extracting a message from a single log

Upgrade

v1.6.0

BUG FIXES, IMPROVEMENTS

Improvements

  • Upgraded DCSDK from 1.12.2 to 1.12.4

  • Removed Multithreading

  • Added a setup method

  • Removed Deduplication

  • Added debugging logging for using dynamic filenames to help with creating dynamic tags

Bug fixes

  • Fixed a bug where the message body was a string and caused a type error.

  • Fixed a bug where client was not refreshed in time before acknowledging a message.

Upgrade

v1.5.1

BUG FIXES

Bug fixes

  • Fixed dependency issue

Upgrade

v1.5.0

BUG FIXES, IMPROVEMENTS

Features

  • Removed debug_md5 and made it default for all dictionary logs

  • Created a new vpc flow processor

  • Added new sender for relay in house + TLS

  • Added persistence functionality for gzip sending buffer

  • Added Automatic activation of gzip sending

Improvements

  • Updated docker image to 1.3.0

  • Updated DCSDK from 1.11.1 to 1.12.2

  • Fixed high vulnerability in Docker Image

  • Upgraded DevoSDK dependency to version v5.4.0

  • Fixed error in persistence system

  • Applied changes to make DCSDK compatible with MacOS

  • Improved behaviour when persistence fails

  • Upgraded DevoSDK dependency

  • Fixed console log encoding

  • Restructured python classes

  • Improved behaviour with non-utf8 characters

  • Decreased default size value for internal queues (Redis limitation, from 1GiB to 256MiB)

  • New persistence format/structure (compression in some cases)

  • Removed dmesg execution (It was invalid for docker execution)

Upgrade

v1.4.0

BUG FIXES, IMPROVEMENTS, FEATURES

Features

  • Implemented pulling of events sent by EventBridge

  • Added more debugging information to events, such as the time the message was sent to the queue, the number of times it has been sent to the queue, the bucket, and the file name.

Bug fixes

  • Fixed an import dependency error

Improvements

  • Increased the default visibility timeout to 1 hour

Upgrade

v1.3.2

BUG FIXES

Bug fixes

  • Fixed the initialization of the client credentials that was missing the token.

Upgrade

v1.3.1

BUG FIXES

Bug fixes

  • Fixed index out of range error in aws_sqs_fdr_large service

Upgrade

v1.3.0

FEATURES

Features

  • Fixed logging message saying the message wasn’t acked even though it was

  • Added back support for configuring 1-6 messages in the config

  • Added multithreading for downloading messages in parallel

  • Updated the aws_sqs_fdr_large service with a faster downloading method using ijson.

Upgrade

v1.2.3

FEATURES

Features

  • Switched to orjson for better performance.

Upgrade

v1.2.2

FEATURES

Features

  • Changed the processors to handle the log with json dumps instead of str

Upgrade

v1.2.1

FEATURES

Features

  • Added file filtering to the Incapsula service

Upgrade

v1.2.0

IMPROVEMENTS, FEATURES

  • Updated to DCSDK 1.11.1

    • Added extra check for not valid message timestamps

    • Added extra check to improve the controlled stop

    • Changed default number for connection retries (now 7)

    • Fix for Devo connection retries

recommended

Upgrade

v1.1.3

IMPROVEMENTS, BUG FIXES, FEATURES

Bug fixes

  • Fixed bug in parquet log processing

  • Fixed the max number of messages and updated the message timeout in flight

  • Fixed the way access key and secret are used

Improvements

  • Updated to DCSDK 1.11.0

Features

  • Added feature to send md5 message to my.app table

  • Added RDS service to collector defs

Upgrade

v1.0.1

IMPROVEMENTS, BUG FIXES

Bug fixes

  • Fixed state file

  • Used the run method instead of pull to enable long polling.

  • Added different types of encoding (latin-1)

  • Updated collector defs to be objects instead of arrays, which was throwing off tagging and record field mapping.

Upgrade

v1.0.0

INITIAL RELEASE

Released with DCSDK 1.10.2

upgrade

Initial version