Content Comparison

Table of Contents

minLevel	2
maxLevel	2
outline	false
type	flat
separator	brackets
printable	false

Overview

Logs generated by most AWS services (CloudtrailCloudTrail, VPC Flows, Elastic Load Balancer, etc.) are exportable to a blob object in S3. Many other 3rd party services have also adopted this paradigm so it has become a common pattern used by many different technologies. Devo Professional Services and Technical Acceleration teams have a base-collector code that will leverage this S3 paradigm to collect logs and can be customized for different customer's different technology logs that may be stored into in S3.

This documentation will go through setting up your AWS infrastructure for our collector integration to work out of the box:

Sending data to S3 (this guide uses Cloudtrail CloudTrail as a data source service)
Setting up S3 event notifications to SQS
Enabling SQS and S3 access using a cross-account IAM role
Gathering information to be provided to Devo for collector setup

...

Access to S3, SQS, IAM, and CloudTrail services
Permissions to send data to S3
Knowledge of log format/technology type being stored into in S3

...

Create S3 bucket and

...

set up

...

data feed (CloudTrail example)

...

The following will be set up during this section:

S3 bucket for data storage
CloudTrail trail for data logging into an S3 bucket

Create an S3 bucket

Rw ui steps macro

Rw step

Navigate to AWS Management Console and select S3.

Image Removed

Rw step

Create a new bucket that you wish for these logs or skip to the next step if using an existing bucket. Default S3 bucket permissions should be fine.

Image Removed

Set up a CloudTrail trail to log events into an S3 bucket

...

Rw step

After the bucket has been created, we will need to set up a data feed into this S3 bucket via CloudTrail. Click CloudTrail.

...

Rw step

Create a new trail following these steps:

...

Click Create trail.

...

When setting up the trail on the screen, make sure to choose the S3 bucket you want CloudTrail to send data into accordingly. If you have an existing S3 bucket, choose that box and enter your S3 bucket name. Otherwise, create a new S3 bucket here.

...

A prefix is optional but highly recommended for easier set up of S3 event notifications to different SQS queues.

...

All other options on this page are optional, but default settings do work. Check with your infra team to figure out what they want to do.

...

On the next page, you choose the log events you wish for CloudTrail to capture. At the very least, we recommend Management events be enabled. Data events and Insight events are additional charges so check with your team about this. Data events can generate A LOT of data if your account has power users of S3. Please check with your AWS team to see if these are worthwhile to track.

...

Finish up and create the trail.

...

Creating an SQS queue and enabling S3 event notifications

SQS provides the following benefits from our perspective:

Built in retrying on the failure of processing a message
Dead letter queueing (if enabled when setting up SQS queue)
Allows for downstream outage without loss of the state of processing
Allows for parallelization of workers in event of very high volume data
Guaranteed at least once delivery (S3 and SQS guarantees)
Ability to have multiple S3 buckets send events to the same SQS queue and even those in other accounts via S3 event notifications to SNS to SQS in the target account

Info

Optional - Using event otifications with SNS

Sending S3 event notifications to SNS may be beneficial/required to some teams if they are using the bucket event notifications in multiple applications. This is fully supported as long as the original S3 event notification message gets passed through SNS transparently to SQS. You will not need to follow the steps to set up event notifications to a single SQS, but could follow the Amazon documentation here to setup the following:

A brief write-up of this architecture can be found in this AWS blog. Note this will also help if you have buckets in different regions/accounts and would like one centralized technology queue for all of your logging.

Create an SQS queue for a specific service events type (i.e. CloudTrail)

In this example, we will continue by setting up an SQS queue for our CloudTrail technology logs.

Rw ui steps macro

Rw step

Navigate to the SQS console.

Rw step

Click Create queue.

Rw step

Create a Standard queue, the default configuration is fine.

Rw step

In the Access policy section, select Advanced and copy and paste the following policy replacing where {{ }} occurs.

Code Block

{
 "Version": "2012-10-17",
 "Id": "example-ID",
 "Statement": [
  {
   "Sid": "example-statement-ID",
   "Effect": "Allow",
   "Principal": {
     "Service": "s3.amazonaws.com"
   },
   "Action": [
    "SQS:SendMessage"
   ],
   "Resource": "arn:aws:sqs:{{SQS queue region}}:{{Account ID #}}:{{Queue name you are currently creating}}",
   "Condition": {
      "ArnLike": { "aws:SourceArn": "arn:aws:s3:*:*:{{Bucket name with data here}}" },
      "StringEquals": { "aws:SourceAccount": "{{Account ID # of the bucket}}" }
   }
  }
 ]
}

Info
An example resource ARN should look like this: `arn:aws:sqs:us-east-1:0123456789:devo-example-sqs-queue`

Rw step

The rest of the default configuration is fine, but you can set up a dead letter queue and server-side encryption, which is transparent to our side.

Rw step

Create the queue.

Rw step

Copy the URL of your newly created queue and save it, as you will need to provide Devo with this.

Image Removed

Setup S3 event notifications

Rw ui steps macro

Rw step

Navigate back to your S3 bucket with data in it.

Rw step

Click the Properties tab of the bucket.

Image Removed

Rw step

Click the Events box under Advanced settings.

Rw step

Click Create event notification

Image Removed

Rw step

Set up the event notifications similar to the following:

The event notification name can follow whatever naming convention you need.
Type of event: All object create events
If you put in a Prefix for your technology types, set the same here
The suffix should be .json.gz

Note
In this example, we’re using .json.gz but note that the suffix must match the suffix of the objects in the S3 bucket you have configured. The suffix may vary from one AWS service to another.

Set SQS Queue as notifications destination
Select the SQS queue name of the queue you created earlier.

Rw step

Click the Save button after configuring this.

Rw step

CloudTrail trail logs should now be generating corresponding messages in the queue if all was properly configured.

Enabling SQS and S3 access using a cross-account IAM role

For allowing the Devo collector to pull in data from your AWS environment, we will need an IAM cross-account role in your account. You will have to provide this role’s ARN to Devo.

Create an IAM policy

This IAM policy will:

Allow the role to read messages off the SQS queue and acknowledge (delete) them off the queue after successfully processing the messages
Retrieve the S3 object referenced in the SQS message so that Devo can read and process the message into the system
Provide limited access only to specified resources (minimal permissions)

Follow the next steps to create the IAM policy:

...

Rw step

Navigate to the IAM console.

...

Rw step

Go to the Policies section.

...

Rw step

Create a policy.

Rw step

Choose the JSON method and enter in the following policy while replacing the items within {{}} (ARN’s for the S3 bucket -optionally including configured prefix- and the SQS queue setup are in the previous steps of this guide).

...

Check this article for a setup configuration example.

Devo collector features

Feature	Details
Allow parallel downloading (`multipod`)	`allowed`
Running environments	`collector server` `on-premise`
Populated Devo events	`table`
Flattening Preprocessing	`no`

Data sources

Data source	Description	Collector service name	Devo table	Available from release
Any	Theoretically any source you send to an SQS can be collected			`v1.0.0`
CONFIG LOGS		`aws_sqs_config`	`cloud.aws.configlogs.events`	`v1.0.0`
AWS ELB		`aws_sqs_elb`	`web.aws.elb.access`	`v1.0.0`
AWS ALB		`aws_sqs_alb`	`web.aws.alb.access`	`v1.0.0`
CISCO UMBRELLA		`aws_sqs_cisco_umbrella`	`sig.cisco.umbrella.dns`	`v1.0.0`
CLOUDFLARE LOGPUSH		`aws_sqs_cloudflare_logpush`	`cloud.cloudflare.logpush.http`	`v1.0.0`
CLOUDFLARE AUDIT		`aws_sqs_cloudflare_audit`	`cloud.aws.cloudflare.audit`	`v1.0.0`
CLOUDTRAIL		`aws_sqs_cloudtrail`	`cloud.aws.cloudtrail.*`	`v1.0.0`
CLOUDTRAIL VIA KINESIS FIREHOSE		`aws_sqs_cloudtrail_kinesis`	`cloud.aws.cloudtrail.*`	`v1.0.0`
CLOUDWATCH		`aws_sqs_cloudwatch`	`cloud.aws.cloudwatch.logs`	`v1.0.0`
CLOUDWATCH VPC		`aws_sqs_cloudwatch_vpc`	`cloud.aws.vpc.flow`	`v1.0.0`
CONTROL TOWER	VPC Flow Logs, Cloudtrail, Cloudfront, and/or AWS config logs	`aws_sqs_control_tower`		`v1.0.0`
FDR		`aws_sqs_fdr`	`edr.crowdstrike.cannon`	`v1.0.0`
GUARD DUTY		`aws_sqs_guard_duty`	`cloud.aws.guardduty.findings`	`v1.0.0`
GUARD DUTY VIA KINESIS FIREHOUSE		`aws_sqs_guard_duty_kinesis`	`cloud.aws.guardduty.findings`	`v1.0.0`
IMPERVA INCAPSULA		`aws_sqs_incapsula`	`cef0.imperva.incapsula`	`v1.0.0`
LACEWORK		`aws_sqs_lacework`	`monitor.lacework.`	`v1.0.0`
PALO ALTO		`aws_sqs_palo_alto`	`firewall.paloalto.[file-log_type]`	`v1.0.0`
ROUTE 53		`aws_sqs_route53`	`dns.aws.route53`	`v1.0.0`
OS LOGS		`aws_sqs_os`	`box.[file-log_type].[file-log_subtype].us`	`v1.0.0`
SENTINEL ONE FUNNEL		`aws_sqs_s1_funnel`	`edr.sentinelone.dv`	`v1.0.0`
S3 ACCESS		`aws_sqs_s3_access`	`web.aws.s3.access`	`v1.0.0`
VPC LOGS		`aws_sqs_vpc`	`cloud.aws.vpc.flow`	`v1.0.0`
WAF LOGS		`aws_sqs_waf`	`cloud.aws.waf.logs`	`v1.0.0`

Run the collector

Rw ui tabs macro

Rw tab

title	Cloud collector

We use a piece of software called Collector Server to host and manage all our available collectors.

To enable the collector for a customer:

In the Collector Server GUI, access the domain in which you want this instance to be created
Click Add Collector and find the one you wish to add.
In the Version field, select the latest value.
In the Collector Name field, set the value you prefer (this name must be unique inside the same Collector Server domain).
In the sending method select Direct Send. Direct Send configuration is optional for collectors that create Table events, but mandatory for those that create Lookups.
In the Parameters section, establish the Collector Parameters as follows below:

Editing the JSON configuration

Code Block

{
  "global_overrides": {
    "debug": false
  },
  "inputs": {
    "sqs_collector": {
      "Sidid": "VisualEditor012351",
      "Effectenabled": "Allow"true,
      "Actioncredentials": [{
        "s3:GetObject"aws_access_key_id": "",
        "sqs:DeleteMessageaws_secret_access_key": "",
        "sqs:GetQueueAttributes "aws_base_account_role": "arn:aws:iam::837131528613:role/devo-xaccount-cs-role",
        "sqs:ChangeMessageVisibility"aws_cross_account_role": "",
        "aws_external_id"sqs:ReceiveMessage "",
      },
 "sqs:GetQueueUrl"      ]"ack_messages": false,
      "Resourcedirect_mode": [false,
       "arn:aws:sqs:<<YOUR_SQS_REGION>>:<<ACCOUNT_NUMBER>>:<<QUEUE_NAME>>""do_not_send": false,
       "arn:aws:s3:::<<BUCKET_NAME>/<<OPTIONAL_PREFIX_SCOPE_LIMIT>>/*"
"compressed_events": false,
    ]    }
 ]
}

You can keep adding more resources if you have multiple SQS queues and S3 buckets that you would like Devo to pull and read from.

Info

If KMS encryption is active for the S3 bucket, the respective KMS key must be included as a resource within the IAM policy. Otherwise, the Devo collector will fail to pull events due to a permission error: "An error occurred (AccessDenied) when calling the GetObject operation: Access Denied".
The /* trailing in the S3 ARN denotes access to objects in the S3 Bucket. If missing, calls to the S3 API will result in a permission error and objects cannot be accessed by the collector.

Rw step

Give the policy a name with the naming convention that your account uses as necessary and an optional description.

Rw step

Click Create and note down the policy name you've created for the access method needed for the Devo collector's proper functioning.

Create a cross-account role

Cross-account roles let roles/users from other AWS accounts (in this case, the Devo collector server AWS Account) access to assume a role in your account. This sidesteps the need to exchange permanent credentials, as credentials are still stored separately in their respective accounts, and AWS themselves authenticates the identities. For more information, check this document.

Follow these steps to create the cross-account role:

Rw ui steps macro

Rw step

Click Roles in the IAM console, then select Create role.

Image Removed

Rw step

Create a role with the Another AWS account scope and use Account ID:837131528613

Rw step

Attach the policy you created in the previous steps (i.e.: devo-xaccount-cs-policy)

Image Removed

Rw step

Give this role a name (you will provide this to Devo)

Image Removed

Rw step

Go into the newly created role and click Trust relationships → Edit trust relationship.

Image Removed

Rw step

Change the existing policy document to the following, which will only allow for our collector server role to access the policy.

Code Block

{
  "Version": "2012-10-17",
  "Statement": [
    {"base_url": "https://us-west-1.queue.amazonaws.com/id/name-of-queue",
      "region": "us-west-1",
      "sqs_visibility_timeout": 240,
      "sqs_wait_timeout": 20,
      "sqs_max_messages": 1,
      "services": {
        "custom_service": {
          "file_field_definitions": {},
          "filename_filter_rules": [],
          "encoding": "gzip",
          "send_filtered_out_to_unknown": false,
          "file_format": {
            "type": "line_split_processor",
            "config": {
              "Effectjson": "Allow",true
      "Principal": {      }
  "AWS": "arn:aws:iam::837131528613:role/devo-xaccount-cs-role"       },
        "Action": "sts:AssumeRole",  "record_field_mapping": {
            "Conditionevent_simpleName": {"StringEquals": {"sts:ExternalId": {{YOUR_CONFIGURED_EXTERNALID}}
    }
  ]
}

Rw step

Click Update Trust Policy to finish.

Information to be provided to Devo

At the end of this configuration process, the following tidbits of information will have to be provided to Devo for the collector setup in order to complete the integration:

Technology type that we will be consuming, or log format (in case the collector is pulling data from an AWS service - i.e: this guide is using CloudTrail as an example-, just the service name must be provided)
SQS Queue URL
Cross-account role ARN (i.e.: arn:aws:iam::<YOUR-ACCOUNT-ID>:role/devo-xs-collector-role) and optionally, ExternalID (if used in cross account role trust policy)
externalID must be created without special characters such as # or ! (characters such as @ , . _ - + = are allowed).

...


              "keys": [
                "event_simpleName"
              ]
            }
          },
          "routing_template": "edr.crowdstrike.cannon",
          "line_filter_rules": [
            [
              {
                "source": "record",
                "key": "event_simpleName",
                "type": "match",
                "value": "EndOfProcess"
              }
            ],
            [
              {
                "source": "record",
                "key": "event_simpleName",
                "type": "match",
                "value": "DeliverLocalFXToCloud"
              }
            ]
          ]
        }
      }
    }
  }
}

Info
All defined service entities will be executed by the collector. If you do not want to run any of them, just remove the entity from the `services` object.

Note
Please replace the placeholders with real world values following the description table below

Parameter

Data type

Type

Value range / Format

Details

debug_status

bool

Mandatory

false / true

If the value is true, the debug logging traces will be enabled when running the collector. If the value is false, only the info, warning and error logging levels will be printed.

short_unique_id

int

Mandatory

Minimum length: 1
Maximum length: 5

Use this param to give an unique id to this input service.

Note
This parameter is used to build the persistence address, do not use the same value for multiple collectors. It could cause a collision.

enabled

bool

Mandatory

false / true

Use this param to enable or disable the given input logic when running the collector. If the value is true, the input will be run. If the value is false, it will be ignored.

base_url

str

Mandatory

By default, the base url is https://sqs.region.amazonaws.com/account-number/queue-name. This needs to be set to the url of sqs.

aws_access_key_id

str

Mandatory/Optional

Any

Only needed if not using cross account

aws_secret_access_key

str

Mandatory/Optional

Any

Only needed if not using cross account

aws_base_account_role

str

Mandatory/Optional

Any

Only needed if using cross account This is devos cross account role

aws_cross_account_role

str

Mandatory/Optional

Any

Only needed if using cross account This is your cross account role

aws_external_id

str

Optional

Any

Extra security you can set up

ack_messages

bool

Manatory

false / true

Needs to be set to true to delete messages from the queue. Leave false until testing complete

direct_mode

bool

Optional

false / true

Set to False for most all scenarios.

This parameter should be removed if it is not used.

do_not_send

bool

Optional

false / true

Set to True to not send the log to Devo.

This parameter should be removed if it is not used.

sqs_visibility_timeout

int

Mandatory

Min: 120

Max: 43200 (haven’t needed to test higher)

This parameter specifies how long the object will be held by the collector. If it is not processed and deleted within the allotted time in seconds. The message will be put back and can be processed again.

Set this parameter for timeouts between the queue and the collector, the collector has to download large files and process them. Otherwise defaults to 120. For Crowdstrike FDR some messages can take 10-15 minutes to process please set the timeout to help duplicate reduction.

sqs_wait_timeout

int

Mandatory

Min: 20

Max: 20

This is how long polling works. It will wait per poll the value of seconds listed. If no message is found, it will return Long poll did not find any messages in queue. All data in the SQS queue has been successfully collected.

sqs_max_messages

int

Mandatory

Min: 1

Max: 6

This is now 1 always and forever.

region

str

Mandatory

Example:

us-east-1

This is the region that is in the base url

compressed_events

bool

Mandatory

This needs to be true or False

Only works with GZIP compression should be false unless you see this below.

If you see any errors ‘utf-8' codec can't decode byte 0xa9 in position 36561456: invalid start byte it might be the events need to be decompressed

encoding

str

Optional

This parameter means the way the log files are encoded inside the s3 bucket.

Options from most used to least used.

gzip
none
parquet
latin-1
Note
- It can accept any other string like ascii or utf-16. It is just trying to read the file format.

Rw tab

title	On-premise collector

This data collector can be run in any machine that has the Docker service available because it should be executed as a docker container. The following sections explain how to prepare all the required setup for having the data collector running.

Structure

The following directory structure should be created for being used when running the collector:

Code Block

<any_directory>
└── devo-collectors/
    └── <product_name>/
        ├── certs/
        │   ├── chain.crt
        │   ├── <your_domain>.key
        │   └── <your_domain>.crt
        ├── state/
        └── config/ 
            └── config.yaml

Note
Replace `<product_name>` with the proper value.

Devo credentials

In Devo, go to Administration → Credentials → X.509 Certificates, download the Certificate, Private key and Chain CA and save them in <product_name>/certs/. Learn more about security credentials in Devo here.

Image Added

Note
Replace `<product_name>` with the proper value.

Editing the config.yaml file

Code Block

globals:
  debug: <debug_status>
  id: <collector_id>
  name: <collector_name>
  persistence:
    type: filesystem
    config:
      directory_name: state
  multiprocessing: false
  queue_max_size_in_mb: 1024
  queue_max_size_in_messages: 1000
  queue_max_elapsed_time_in_sec: 60
  queue_wrap_max_size_in_messages: 100

outputs:
  devo_1:
    type: devo_platform
    config:
      address: <devo_address>
      port: 443
      type: SSL
      chain: <chain_filename>
      cert: <cert_filename>
      key: <key_filename>

inputs:
  sqs:
    id: 12345
    enabled: true
    credentials:
      aws_access_key_id: password
      aws_secret_access_key: secret-access-key
      aws_base_account_role: arn:aws:iam::837131528613:role/devo-xaccount-cs-role
      aws_cross_account_role: arn:aws:iam::{account-id}:role/{role-name}
      aws_external_id: extra_security_optional
    region: region
    base_url: https://sqs.{region}.amazonaws.com/{account-number}/{queue-name}
    sqs_visibility_timeout: 120
    sqs_wait_timeout: 20
    sqs_max_messages: 4
    ack_messages: false
    direct_mode: false
    do_not_send: false
    compressed_events: false
    services:
      custom_service:
        file_field_definitions: {}
        filename_filter_rules: []
        encoding: gzip
        ack_messages: false
        file_format:
          type: single_json_object_processor
          config:
            key: Records
        record_field_mapping: {}
        routing_template: my.app.source1.type1
        line_filter_rules: []

Info
All defined service entities will be executed by the collector. If you do not want to run any of them, just remove the entity from the `services` object.

Replace the placeholders with your required values following the description table below:

Parameter

Data type

Type

Value range

Details

debug_status

bool

Mandatory

false / true

If the value is true, the debug logging traces will be enabled when running the collector. If the value is false, only the info, warning and error logging levels will be printed.

collector_id

int

Mandatory

Minimum length: 1
Maximum length: 5

Use this param to give an unique id to this collector.

collector_name

str

Mandatory

Minimum length: 1
Maximum length: 10

Use this param to give a valid name to this collector.

devo_address

str

Mandatory

collector-us.devo.io
collector-eu.devo.io

Use this param to identify the Devo Cloud where the events will be sent.

chain_filename

str

Mandatory

Minimum length: 4
Maximum length: 20

Use this param to identify the chain.cert file downloaded from your Devo domain. Usually this file's name is: chain.crt

cert_filename

str

Mandatory

Minimum length: 4
Maximum length: 20

Use this param to identify the file.cert downloaded from your Devo domain.

key_filename

str

Mandatory

Minimum length: 4
Maximum length: 20

Use this param to identify the file.key downloaded from your Devo domain.

short_unique_id

int

Mandatory

Minimum length: 1
Maximum length: 5

Use this param to give an unique id to this input service.

Note
This parameter is used to build the persistence address, do not use the same value for multiple collectors. It could cause a collision.

input_status

bool

Mandatory

false / true

Use this param to enable or disable the given input logic when running the collector. If the value is true, the input will be run. If the value is false, it will be ignored.

base_url

str

Mandatory

By default, the base url is https://sqs.region.amazonaws.com/account-number/queue-name. This needs to be set to the url of sqs.

aws_access_key_id

str

Mandatory/Optional

Any

Only needed if not using cross account

aws_secret_access_key

str

Mandatory/Optional

Any

Only needed if not using cross account

aws_base_account_role

str

Mandatory/Optional

Any

Only needed if using cross account This is devos cross account role

aws_cross_account_role

str

Mandatory/Optional

Any

Only needed if using cross account This is your cross account role

aws_external_id

str

Optional

Any

Extra security you can set up

ack_messages

bool

Manatory

false / true

Needs to be set to true to delete messages from the queue. Leave false until testing complete

direct_mode

bool

Optional

false / true

Set to False for most all scenarios.

This parameter should be removed if it is not used.

do_not_send

bool

Optional

false / true

Set to True to not send the log to Devo.

This parameter should be removed if it is not used.

sqs_visibility_timeout

int

Mandatory

Min: 120

Max: 43200 (haven’t needed to test higher)

This parameter specifies how long the object will be held by the collector. If it is not processed and deleted within the allotted time in seconds. The message will be put back and can be processed again.

Set this parameter for timeouts between the queue and the collector, the collector has to download large files and process them. Otherwise defaults to 120. For Crowdstrike FDR some messages can take 10-15 minutes to process please set the timeout to help duplicate reduction.

sqs_wait_timeout

int

Mandatory

Min: 20

Max: 20

This is how long polling works. It will wait per poll the value of seconds listed. If no message is found, it will return Long poll did not find any messages in queue. All data in the SQS queue has been successfully collected.

sqs_max_messages

int

Mandatory

Min: 1

Max: 6

This is now 1 always and forever.

region

str

Mandatory

Example:

us-east-1

This is the region that is in the base url

compressed_events

bool

Mandatory

This needs to be true or False

Only works with GZIP compression should be false unless you see this below.

If you see any errors ‘utf-8' codec can't decode byte 0xa9 in position 36561456: invalid start byte it might be the events need to be decompressed

encoding

str

Optional

This parameter means the way the log files are encoded inside the s3 bucket.

Options from most used to least used.

gzip
none
parquet
latin-1
Note
- It can accept any other string like ascii or utf-16. It is just trying to read the file format.

Download the Docker image

The collector should be deployed as a Docker container. Download the Docker image of the collector as a .tgz file by clicking the link in the following table:

Collector Docker image	SHA-256 hash
collector-aws_sqs_if-docker-image-1.7.0	`4b75fb4481203b5a416eb9523ef97b5fa09a939f530265b0158f530777398d28`

Use the following command to add the Docker image to the system:

Code Block
gunzip -c <image_file>-<version>.tgz \| docker load

Note
Once the Docker image is imported, it will show the real name of the Docker image (including version info). Replace `<image_file>` and `<version>` with a proper value.

The Docker image can be deployed on the following services:

Docker

Execute the following command on the root directory <any_directory>/devo-collectors/<product_name>/

Code Block

docker run 
--name collector-<product_name> 
--volume $PWD/certs:/devo-collector/certs 
--volume $PWD/config:/devo-collector/config 
--volume $PWD/state:/devo-collector/state 
--env CONFIG_FILE=config.yaml 
--rm 
--interactive 
--tty 
<image_name>:<version>

Note
Replace `<product_name>`, `<image_name>` and `<version>` with the proper values.

Docker Compose

The following Docker Compose file can be used to execute the Docker container. It must be created in the <any_directory>/devo-collectors/<product_name>/ directory.

Code Block

version: '3'
services:
  collector-<product_name>:
    image: <image_name>:${IMAGE_VERSION:-latest}
    container_name: collector-<product_name>
    volumes:
      - ./certs:/devo-collector/certs
      - ./config:/devo-collector/config
      - ./credentials:/devo-collector/credentials
      - ./state:/devo-collector/state
    environment:
      - CONFIG_FILE=${CONFIG_FILE:-config.yaml}

To run the container using docker-compose, execute the following command from the <any_directory>/devo-collectors/<product_name>/ directory:

Code Block
IMAGE_VERSION=<version> docker-compose up -d

Note
Replace `<product_name>`, `<image_name>` and `<version>` with the proper values.

Verify data collection

Once the collector has been launched, it is important to check if the ingestion is performed in a proper way. To do so, go to the collector’s logs console.

This service has the following components:

Component	Description
Setup	The setup module is in charge of authenticating the service and managing the token expiration when needed.
Puller	The setup module is in charge of pulling the data in a organized way and delivering the events via SDK.

Setup output

A successful run has the following output messages for the setup module:

Code Block

2024-01-16T12:47:04.044    INFO OutputProcess::MainThread -> Process started
2024-01-16T12:47:04.044    INFO InputProcess::MainThread -> Process Started
2024-01-16T12:47:04.177    INFO InputProcess::MainThread -> InputThread(sqs_collector,12345) - Starting thread (execution_period=60s)
2024-01-16T12:47:04.177    INFO InputProcess::MainThread -> ServiceThread(sqs_collector,12345,aws_sqs_vpc,predefined) - Starting thread (execution_period=60s)
2024-01-16T12:47:04.177    INFO InputProcess::MainThread -> AWSsqsPullerSetup(unknown,sqs_collector#12345,aws_sqs_vpc#predefined) -> Starting thread
2024-01-16T12:47:04.177    INFO InputProcess::MainThread -> AWSsqsPuller(sqs_collector,12345,aws_sqs_vpc,predefined) - Starting thread
2024-01-16T12:47:04.178 WARNING InputProcess::AWSsqsPuller(sqs_collector,12345,aws_sqs_vpc,predefined) -> Waiting until setup will be executed
2024-01-16T12:47:04.191    INFO OutputProcess::MainThread -> ConsoleSender(standard_senders,console_sender_0) -> Starting thread
2024-01-16T12:47:04.191    INFO OutputProcess::MainThread -> ConsoleSenderManagerMonitor(standard_senders,console_1) -> Starting thread (every 300 seconds)
2024-01-16T12:47:04.191    INFO OutputProcess::MainThread -> ConsoleSenderManager(standard_senders,manager,console_1) -> Starting thread
2024-01-16T12:47:04.192    INFO OutputProcess::MainThread -> ConsoleSender(lookup_senders,console_sender_0) -> Starting thread
2024-01-16T12:47:04.192    INFO OutputProcess::ConsoleSenderManager(standard_senders,manager,console_1) -> [EMERGENCY PERSISTENCE SYSTEM] ConsoleSenderManager(standard_senders,manager,console_1) -> Nothing retrieved from the persistence.
2024-01-16T12:47:04.192    INFO OutputProcess::OutputStandardConsumer(standard_senders_consumer_0) -> [EMERGENCY PERSISTENCE SYSTEM] OutputStandardConsumer(standard_senders_consumer_0) -> Nothing retrieved from the persistence.
2024-01-16T12:47:04.192    INFO OutputProcess::MainThread -> ConsoleSenderManagerMonitor(lookup_senders,console_1) -> Starting thread (every 300 seconds)
2024-01-16T12:47:04.192    INFO OutputProcess::MainThread -> ConsoleSenderManager(lookup_senders,manager,console_1) -> Starting thread
2024-01-16T12:47:04.193    INFO OutputProcess::MainThread -> ConsoleSender(internal_senders,console_sender_0) -> Starting thread
2024-01-16T12:47:04.193    INFO OutputProcess::ConsoleSenderManager(lookup_senders,manager,console_1) -> [EMERGENCY PERSISTENCE SYSTEM] ConsoleSenderManager(lookup_senders,manager,console_1) -> Nothing retrieved from the persistence.
2024-01-16T12:47:04.193    INFO OutputProcess::MainThread -> ConsoleSenderManagerMonitor(internal_senders,console_1) -> Starting thread (every 300 seconds)
2024-01-16T12:47:04.193    INFO OutputProcess::MainThread -> ConsoleSenderManager(internal_senders,manager,console_1) -> Starting thread
2024-01-16T12:47:04.193    INFO OutputProcess::OutputLookupConsumer(lookup_senders_consumer_0) -> [EMERGENCY PERSISTENCE SYSTEM] OutputLookupConsumer(lookup_senders_consumer_0) -> Nothing retrieved from the persistence.
2024-01-16T12:47:05.795    INFO InputProcess::AWSsqsPuller(sqs_collector,12345,aws_sqs_vpc,predefined) -> Starting data collection every 5 seconds

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

Code Block

I2024-01-16T17:02:56.221036303Z 2024-01-16T17:02:56.220    INFO InputProcess::AWSsqsPuller(sqs_collector,12345,aws_sqs_cloudwatch_vpc,predefined) -> Acked message receiptHandle: /+qA+ymL2Vs8yb//++7YM2Ef8BCetrJ+/+////F1uwLOVfONfagI99vA=
2024-01-16T17:02:56.221386926Z 2024-01-16T17:02:56.221    INFO InputProcess::AWSsqsPuller(sqs_collector,12345,aws_sqs_cloudwatch_vpc,predefined) -> Data collection completed. Elapsed time: 2.413 seconds. Waiting for 2.587 second(s) until the next one

Restart the persistence

This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:

Delete and Re-DO the collector with new ID number

The collector will detect this change and will restart the persistence using the parameters of the configuration file or the default configuration in case it has not been provided.

Note
Note that this action clears the persistence and cannot be recovered in any way. Resetting persistence could result in duplicate or lost events.

Collector operations

This section is intended to explain how to proceed with specific operations of this collector.

Verify collector operations

The initialization module is in charge of setup and running the input (pulling logic) and output (delivering logic) services and validating the given configuration.

Events delivery and Devo ingestion

The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.

A successful run has the following output messages for the initializer module:

Code Block

INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Sender: SyslogSender(standard_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Standard - Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 44 (elapsed 0.007 seconds)
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Sender: SyslogSender(internal_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Internal - Total number of messages sent: 1, messages sent since "2022-06-28 10:39:22.516313+00:00": 1 (elapsed 0.019 seconds)

Sender services

The Integrations Factory Collector SDK has 3 different senders services depending on the event type to delivery (internal, standard, and lookup). This collector uses the following Sender Services:

Sender Services	Description
`internal_senders`	In charge of delivering internal metrics to Devo such as logging traces or metrics.
`standard_senders`	In charge of delivering pulled events to Devo.

Sender statistics

Each service displays its own performance statistics that allow checking how many events have been delivered to Devo by type:

Logging trace

Description

Number of available senders: 1

Displays the number of concurrent senders available for the given Sender Service.

sender manager internal queue size: 0

Displays the items available in the internal sender queue.

This value helps detect bottlenecks and needs to increase the performance of data delivery to Devo. This last can be made by increasing the concurrent senders.

Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 21 (elapsed 0.007 seconds)

Displayes the number of events from the last time and following the given example, the following conclusions can be obtained:

44 events were sent to Devo since the collector started.
The last checkpoint timestamp was 2022-06-28 10:39:22.511671+00:00.
21 events where sent to Devo between the last UTC checkpoint and now.
Those 21 events required 0.007 seconds to be delivered.

By default these traces will be shown every 10 minutes.

Check memory usage

To check the memory usage of this collector, look for the following log records in the collector which are displayed every 5 minutes by default, always after running the memory free process.

The used memory is displayed by running processes and the sum of both values will give the total used memory for the collector.
The global pressure of the available memory is displayed in the global value.
All metrics (Global, RSS, VMS) include the value before freeing and after: previous -> after freeing memory

Code Block

INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)
INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)

Differences between RSS and VMS memory usage:

RSS is the Resident Set Size, which is the actual physical memory the process is using
VMS is the Virtual Memory Size which is the virtual memory that process is using

Enable/disable the logging debug mode

Sometimes it is necessary to activate the debug mode of the collector's logging. This debug mode increases the verbosity of the log and allows you to print execution traces that are very helpful in resolving incidents or detecting bottlenecks in heavy download processes.

To enable this option you just need to edit the configuration file and change the debug_status parameter from false to true and restart the collector.
To disable this option, you just need to update the configuration file and change the debug_status parameter from true to false and restart the collector.

For more information, visit the configuration and parameterization section corresponding to the chosen deployment mode.

Change log

Release

Released on

Release type

Details

Recommendations

v1.7.0

16 Oct 2024

Status

colour	Red
title	Bug Fixes

Status

colour	Blue
title	FEATURES

Bug Fixes

Fixed control tower issue
Fixed bug with Falcon Data Replicator Large where logs were taking over an hour to finish

Features

Created custom tagging off of record field mapping
Created NLB logging service
Added INFO/DEBUG logging around each method so users can see size and timing.

Recommended Version

v1.6.4

07 Oct 2024

Status

colour	Red
title	Bug Fixes

Status

colour	Green
title	Improvements

Features

Created custom tagging off of record field mapping
Added INF0/DEBUG logging around most methods so users can see size and timing.

Bug Fixes

Fixed Dependency Issue.
Fixed control tower issue
Fixed Falcon Data Replicator Large where logs were taking over an hour to finish.

Upgrade

v1.6.3

07 Oct 2024

Status

colour	Red
title	Bug Fixes

Bug Fixes

Fixed Log Operations Bug
Added Backwards compatibility to control tower
Fixed Palo Alto Service for snappy decompression.

Upgrade

v1.6.2

24 Sep 2024

Status

colour	Red
title	Bug Fixes

Bug Fixes

None type causing message processing to fail fdr_large, fixed.
Added default region to initialization of sts client to prevent needing environment variables in the green cluster.
Fixed bug in control tower processor

Upgrade

v1.6.1

03 Sep 2024

Status

colour	Green
title	IMPROVEMENTS

Improvements

Created new processor for extracting a message from singular log

Upgrade

v1.6.0

15 Jul 2024

Status

colour	Red
title	BUG FIXES

Status

colour	Green
title	IMPROVEMENTS

Improvements

Increased DCSDK to 1.12.2 to 1.12.4
Removed Multithreading
Added a setup method
Removed Deduplication
Added debugging logging for using dynamic filenames to help with creating dynamic tags

Bug fixes

Fixed a bug where the message body was a string and caused a type error.
Fixed a bug where client was not refreshed in time before acknowledging a message.

Upgrade

v1.5.1

11 Jul 2024

Status

colour	Red
title	BUG FIXES

Bug fixes

Fixed dependency issue

Upgrade

v1.5.0

10 Jul 2024

Status

colour	Red
title	BUG FIXES

Status

colour	Green
title	IMPROVEMENTS

Feature

Removed debug_md5 and made it default for all dictionary logs
Created a new vpc flow processor
Added new sender for relay in house + TLS
Added persistence functionality for gzip sending buffer
Added Automatic activation of gzip sending

Improvements

Updated docker image to 1.3.0
Updated DCDSK from 1.11.1 to 1.12.2
Fixed high vulnerability in Docker Image
Upgrade DevoSDK dependency to version v5.4.0
Fixed error in persistence system
Applied changes to make DCSDK compatible with MacOS
Added new sender for relay in house + TLS
Added persistence functionality for gzip sending buffer
Added Automatic activation of gzip sending
Improved behaviour when persistence fails
Upgraded DevoSDK dependency
Fixed console log encoding
Restructured python classes
Improved behaviour with non-utf8 characters
Decreased defaut size value for internal queues (Redis limitation, from 1GiB to 256MiB)
New persistence format/structure (compression in some cases)
Removed dmesg execution (It was invalid for docker execution)

Upgrade

v1.4.0

26 Jun 2024

Status

colour	Red
title	BUG FIXES

Status

colour	Green
title	IMPROVEMENTS

Status

colour	Blue
title	FEATURES

Features

Implemented use of pulling events sent by event bridge
Added more debugging information to be added to events such as: Time the message was sent to queue, times it has been sent to the queue, the bucket, and file name.

Bug fixes

Fixed an import dependency error

Improvements

Upped the visibility timeout to 1 hour by default

Upgrade

v1.3.2

17 Jun 2024

Status

colour	Red
title	BUG FIXES

Bug fixing

Fixed the initialization of the client credentials that was missing the token.

Upgrade

v1.3.1

14 Jun 2024

Status

colour	Red
title	BUG FIXES

Bug fixing

Fixed index out of range error in aws_sqs_fdr_large service

Upgrade

v1.3.0

10 Jun 2024

Status

colour	Blue
title	FEATURES

Features

Fixed logging message saying the message wasn’t acked event though it was
Added use of 1-6 messages back in config
Added multithreading for downloading messages in parallel
Updated the aws_sqs_fdr_large service with a faster downloading method using ijson.

Upgrade

v1.2.3

29 May 2024

Status

colour	Blue
title	FEATURES

Features

Updated to orjson for performance qualities.

Upgrade

v1.2.2

30 Apr 2024

Status

colour	Blue
title	FEATURES

Features

Changed processors in handling of the log from str to json dumps

Upgrade

v1.2.1

09 Apr 2024

Status

colour	Blue
title	FEATURES

Features

Added file filtering to the incapsula service

Upgrade

v1.2.0

22 Mar 2024

Status

colour	Green
title	IMPROVEMENTS

Status

colour	Blue
title	FEATURES

Updated to DCSDK 1.11.1
- Added extra check for not valid message timestamps
- Added extra check for improve the controlled stop
- Changed default number for connection retries (now 7)
- Fix for Devo connection retries

Upgrade

v1.1.3

23 Feb 2024

Status

colour	Green
title	IMPROVEMENTS

Status

colour	Red
title	BUG FIXES

Status

colour	Blue
title	FEATURES

Bug fixes

Fixed bug in parquet log processing
Fixed the max number of messages and updated the message timeout in flight
Fixed the way access key and secret are used

Improvements

Updated to DCSDK 1.11.0

Features

Added feature to send md5 message to my.app table
Added RDS service to collector defs

Upgrade

v1.0.1

29 Jan 2024

Status

colour	Green
title	IMPROVEMENTS

Status

colour	Red
title	BUG FIXES

Bug fixes

state file fixed

Improvements

using run method, instead of pull to enable long polling.
adding different types of encoding (latin-1)
update collector defs to be objects instead of arrays which was throwing off tagging, and record field mapping.

Upgrade

v1.0.0

19 Jan 2024

Status

colour	Green
title	INTIAL RELEASE

Released with DCSDK 1.10.2

Initial version

Version	Old Version 6	New Version 31
Changes made by	Borja Moro Moreno	Scott Bronkema
Saved on	Feb 27, 2024	Dec 09, 2024

Versions Compared

Key

Overview

Create S3 bucket and

set up

data feed (CloudTrail example)

Create an S3 bucket

Set up a CloudTrail trail to log events into an S3 bucket

Creating an SQS queue and enabling S3 event notifications

Create an SQS queue for a specific service events type (i.e. CloudTrail)

Setup S3 event notifications

Enabling SQS and S3 access using a cross-account IAM role

Create an IAM policy

Devo collector features

Data sources

Run the collector

Editing the JSON configuration

Create a cross-account role

Information to be provided to Devo

Structure

Devo credentials

Editing the config.yaml file

Download the Docker image

Docker

Docker Compose

Verify data collection

Setup output

Puller output

Restart the persistence

Collector operations

Verify collector operations

Events delivery and Devo ingestion

Sender services

Sender statistics

Check memory usage

Enable/disable the logging debug mode

Change log