AWS collector

Overview

Amazon Web Services (AWS) provides on-demand cloud computing platforms and APIs to individual companies. Each available AWS service generates information related to different aspects of its functionality. The available data types include service events, audit events, metrics, and logs.

You can use the AWS collector to retrieve data from the AWS APIs and send it to your Devo domain. Once the gathered information arrives at Devo, it will be processed and included in different tables in the associated Devo domain so users can analyze it.

To run this collector, there are some configurations detailed below that you need to consider:

Configuration

Details

Credentials

There are several available options to define credentials.

Service events

All the service events generated on AWS are managed by CloudWatch. However, Devo’s AWS collector offers two different services that collect CloudWatch events:

  • sqs-cloudwatch-consumer - This service is used to collect Security Hub events.

  • service-events-all - This service is used to collect events from the rest of the services on AWS.

Audit events

For the S3+SQS approach (setting types as audits_s3) some previous configuration is required.

Logs

Logs can be collected from different services. Depending on the type, some previous setups must be applied on AWS.

More information

Refer to the Vendor setup section to learn more about these configurations.

If you need to pull global service events created by CloudFront, IAM and AWS STS, US regions will need to be enabled within the collector. For more information, see Viewing CloudTrail events with the AWS CLI and Using update-trail.

Devo collector features

Feature

Details

Allow parallel downloading (multipod)

not allowed

Running environments

  • collector server

  • on-premise

Populated Devo events

table

Flattening preprocessing

no

Data sources

Data source

Description

API endpoint

Collector service name

Devo table

Available from release

Service events

The services available in AWS usually generate information about their internal behavior, such as "a virtual machine has been started", "a new file has been created in an S3 bucket", or "an AWS Lambda function has been invoked". These events can be triggered without any human interaction.

Service events are managed by the CloudWatch Events service (CWE). AWS has since introduced Amazon EventBridge, a newer service intended to replace CWE.

The findings detected by AWS Security Hub are also managed by CloudWatch Events (CWE).

ReceiveMessage

ReceiveMessage - Amazon Simple Queue Service

Generic events:

service-events-all

Security Hub events:

sqs-cloudwatch-consumer

Generic events:

  • If auto_event_type parameter in config file is not set or set to false: cloud.aws.cloudwatch.events

  • If auto_event_type parameter in config file is set to true: cloud.aws.cloudwatch.{event_type}

Security Hub events:

  • cloud.aws.securityhub.findings

-

Audit events

These events are more specific because they are triggered by human interaction, regardless of the method used: the API, the web console, or the CLI.

The audit events are managed by the CloudTrail service.

There are two ways to read Audit events:

  • API: using CloudTrail API. This way is slower, but it can retrieve data back in time.

  • S3+SQS: forwarding CloudTrail data to an S3 bucket and reading from there through an SQS queue. This way is much faster, but it can only retrieve events generated since the creation of the S3+SQS pipeline.

Via API:

LookupEvents

LookupEvents - AWS CloudTrail

Via S3+SQS:

ReceiveMessage

ReceiveMessage - Amazon Simple Queue Service

audit-events-all

  • If auto_event_type parameter in config file is not set or set to false: cloud.aws.cloudtrail.events

  • If auto_event_type parameter in config file is set to true: cloud.aws.cloudtrail.{event_type}

-

Metrics

By the standard definition, this kind of information is usually generated at the moment it is requested, because it is typically a query about the status of a service (everything inside AWS is considered a service).

AWS works slightly differently: it generates metrics at fixed intervals, such as 1 min, 5 min, 30 min, or 1 h, even if no one makes a request (it is also possible to get metrics at intervals of seconds, but this incurs extra costs).

The metrics are managed by the CloudWatch Metrics service (CWM).

ListMetrics

ListMetrics - Amazon CloudWatch

After listing the metrics, GetMetricData and GetMetricStatistics are also called.

GetMetricData - Amazon CloudWatch

GetMetricStatistics - Amazon CloudWatch

 

metrics-all

cloud.aws.cloudwatch.metrics

-

Logs

Logs can be defined as information with a non-fixed structure that is sent to one of the available “logging” services. These services are CloudWatch Logs and S3.

Some highly customizable services, such as AWS Lambda, or any application deployed inside an AWS virtual machine (EC2), can generate custom log information. This kind of information is managed by the CloudWatch Logs service (CWL) and also by the S3 service.

There are also some other services that can generate logs with a fixed structure, such as VPC Flow Logs or CloudFront Logs. These kinds of services require one special way of collecting their data.

DescribeLogStreams

DescribeLogStreams - Amazon CloudWatch Logs

Logs can be:

  • Managed by CloudWatch: This is a custom service that is activated by using the service custom_service and including the type logs in the types parameter in the config file.

  • Not managed by CloudWatch: Use the non-cloudwatch-logs service and include the required type (flowlogs for VPC Flow Logs and/or cloudfrontlogs for CloudFront Logs) in the types parameter in the config file.

 

  • Managed by CloudWatch: cloud.aws.cloudwatch.logs

  • Not managed by CloudWatch:

    • VPC Flow Logs:

      • If auto_event_type parameter in config file is not set or set to false: cloud.aws.vpc.unknown

      • If auto_event_type parameter in config file is set to true: cloud.aws.vpc.{event_type}

    • CloudFront Logs:

      • If auto_event_type parameter in config file is not set or set to false: cloud.aws.cloudfront.unknown

      • If auto_event_type parameter in config file is set to true: cloud.aws.cloudfront.{event_type}

-

AWS GuardDuty

AWS GuardDuty is a managed threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts, workloads, and data stored in Amazon S3.

Data sources: GuardDuty ingests and processes data from AWS CloudTrail logs, VPC Flow Logs, and DNS logs.

Findings: When a potential threat is detected, GuardDuty generates a finding. These findings provide details about the activity, including the affected resources, type of threat, and suggested remediation actions.

The collector uses the GuardDuty API to retrieve the findings of the GuardDuty service.

What is Amazon GuardDuty? - Amazon GuardDuty

aws-guardduty

  • cloud.aws.guardduty.findings

v1.10.0

Cisco Umbrella [Non-AWS service]

Cisco Umbrella is a cloud-driven Secure Internet Gateway (SIG) that leverages insights gained through the analysis of various logs, including DNS logs, IP logs, and Proxy logs, to provide a first line of defense.

DNS logs record all DNS queries that are made through the Cisco Umbrella DNS resolvers. These logs contain data about the DNS queries originating from your network, requested domain names and the IP address of the requester.

IP logs capture all IP-based communications that occur through the network. These logs store details such as the source and destination IP addresses, ports and protocols used.

Proxy logs are generated when users access web resources through the Cisco Umbrella intelligent proxy. They contain detailed information on the web traffic, including the URL accessed, the method of access (GET, POST, etc.), and the response status.

Via S3+SQS:

ReceiveMessage

ReceiveMessage - Amazon Simple Queue Service

cisco-umbrella

  • sig.cisco.umbrella.dns

  • sig.cisco.umbrella.ip

  • sig.cisco.umbrella.proxy

v1.6.0
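
The effect of the auto_event_type parameter on the destination table, as described in the table above, can be sketched as follows. This is only an illustration, not the collector's actual code: the function name resolve_table, the fallback to the default table when no event type is known, and the sample event type "signin" are all assumptions made for the example.

```python
from typing import Optional

# Default tables taken from the data sources table above.
DEFAULT_TABLES = {
    "service-events-all": "cloud.aws.cloudwatch.events",
    "audit-events-all": "cloud.aws.cloudtrail.events",
}


def resolve_table(service: str, event_type: Optional[str] = None,
                  auto_event_type: bool = False) -> str:
    """Return the Devo table an event from `service` would be sent to."""
    default_table = DEFAULT_TABLES[service]
    # Assumption: fall back to the default table when no event type is known.
    if not auto_event_type or not event_type:
        return default_table
    # Swap the last segment ("events") for the event type.
    prefix = default_table.rsplit(".", 1)[0]
    return f"{prefix}.{event_type}"


# "signin" is a hypothetical event type used only for illustration.
print(resolve_table("service-events-all"))
print(resolve_table("audit-events-all", "signin", auto_event_type=True))
```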

Vendor setup

There are some minimal requirements to set up this collector:

  • AWS console access: Credentials are required to access the AWS console.

  • Owner or Administrator permissions within the AWS console, or full access to configure AWS services.

Some manual actions are necessary in order to get all the required information or services and allow Devo to gather information from AWS. The following sections describe how to get the required AWS credentials and how to proceed with the different required setups depending on the gathered information type.

Credentials

Some collector services require IAM policies to be created before creating the IAM user that the AWS collector will use, so it is recommended to have these policies available beforehand. The following table details the policies that the AWS collector can use:

Source type

AWS Data Bus

Recommended policy name

Variant

 Additional info

Service events

CloudWatch Events

devo-cloudwatch-events

All resources

No new policy needs to be created because no permissions are required.

Audit events

CloudTrail API

devo-cloudtrail-api

All resources

{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": "cloudtrail:LookupEvents", "Resource": "*" } ] }

-

-

Specific resource

CloudTrail S3+SQS

devo-cloudtrail-s3

All resources

{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": "s3:GetObject", "Resource": "*" } ] }

 

- 

-

Specific S3 bucket

 

{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": "s3:GetObject", "Resource": [ "arn:aws:s3:::devo-cloudtrail-storage-bucket1/*", "arn:aws:s3:::devo-cloudtrail-storage-bucket2/*" ] } ] }

Metrics

CloudWatch Metrics

devo-cloudwatch-metrics

All resources

Specific resource

Logs

CloudWatch Logs

devo-cloudwatch-logs

All log groups

Specific log groups

Logs to S3 + SQS

devo-vpcflow-logs

All resources

Specific resource

Cisco Umbrella

Logs to S3 + SQS

devo-cisco-umbrella

Specific resource

AWS GuardDuty

Guardduty API

devo-guardduty-api

Specific resource
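
The "Specific S3 bucket" variant of the devo-cloudtrail-s3 policy shown in the table above follows a regular pattern, so it can be generated for any list of buckets. The sketch below uses only the Python standard library; the function name s3_read_policy is made up for this example, and the bucket names are the sample ones from the table.

```python
import json


def s3_read_policy(bucket_names):
    """Build a read-only policy document limited to the given S3 buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # One ARN per bucket, covering every object inside it.
                "Resource": [f"arn:aws:s3:::{name}/*" for name in bucket_names],
            }
        ],
    }


policy = s3_read_policy(
    ["devo-cloudtrail-storage-bucket1", "devo-cloudtrail-storage-bucket2"]
)
# The printed JSON can be pasted into the policy editor in the AWS console.
print(json.dumps(policy, indent=2))
```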

Depending on which source types are collected, one or more of the policies described above will be used. Once the required policies are created, each one must be associated with an IAM user. To create the user, visit the AWS Console and log in with a user account that has enough permissions to create and access AWS structures:

  1. Go to IAM → Users.

  2. Click the Add users button.

  3. Enter the required value in the User name field.

  4. Enable the Access key - Programmatic access checkbox.

  5. Click the Next: Tags button.

  6. Click the Next: Review button.

  7. Click the Create user button.

  8. The Access Key ID and Secret Key will be shown. Click the Download .csv button and save the file.

It is best practice to assume roles that are granted just the privileges required to perform an action. If the customer does not want the collector to use their own AWS user for these actions, because that user has far more privileges than required, they can use this option. Note that this option requires the use of AWS account credentials. To avoid sharing those credentials, check the Cross Account section below.

The customer must then attach the required policies to the role that is going to be assumed.

  1. Go to IAM → Roles.

  2. Click on Create role button.

  3. In the Trusted entity type, select AWS account and then select This account (123456789012).

  4. Add the required policies.

  5. Give a name to the role.

  6. Click on Create role.

You should also add authentication credentials to the configuration. Add the following fields to the configuration:

  • access_key: This is the Access Key ID provided by AWS during the user creation process.

  • access_secret: This is the Secret Access Key provided by AWS during the user creation process.

  • base_assume_role: This is the ARN of the role that is going to be assumed by the user authenticated with the parameters above, access_key and access_secret. This role has to be properly granted to allow the actions that the collector is going to perform.

These fields must be placed in the credentials section and are required to use this authentication method.

If you don't want to share your credentials with Devo, add the parameters described below to the credentials section of the configuration file instead of sharing access_key and access_secret. Follow these steps to enable this authentication method:

  1. Prepare the environment to allow Devo’s cloud collector server to assume roles cross-account.

  2. Add ARNs for each role into the configuration:

    • base_assume_role: This is the ARN of the role that is going to be assumed by the profile bound to the machine/instance where the collector is running. This role already exists in Devo's AWS account and its value must be: arn:aws:iam::837131528613:role/devo-xaccount-cs-role *

    • target_assume_role: This is the ARN of the role in the AWS account. This role allows the collector to have access to the resources specified in this role. To keep your data secure, please, use policies that grant just the necessary permissions.

    • assume_role_external_id: This is an optional parameter to add more security to this Cross Account operation. This value should be a string added to the request to assume the customer’s role.
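
As an illustration of the cross-account parameters listed above, the following sketch builds a credentials block and prints it as JSON. The field names and the base_assume_role value come from this section; the helper name cross_account_credentials, the target role ARN, and the external ID value are hypothetical placeholders, and the exact position of the block within your configuration file is not shown here.

```python
import json
from typing import Optional

# ARN of the role in Devo's AWS account, as given in this section.
DEVO_BASE_ROLE = "arn:aws:iam::837131528613:role/devo-xaccount-cs-role"


def cross_account_credentials(target_role_arn: str,
                              external_id: Optional[str] = None) -> dict:
    """Build the credentials fields for cross-account authentication."""
    credentials = {
        "base_assume_role": DEVO_BASE_ROLE,
        "target_assume_role": target_role_arn,
    }
    # The external ID is optional, so it is only added when provided.
    if external_id:
        credentials["assume_role_external_id"] = external_id
    return credentials


# Hypothetical values: replace them with the role created in your own account.
example = cross_account_credentials(
    "arn:aws:iam::123456789012:role/devo-collector-target-role",
    external_id="example-external-id",
)
print(json.dumps({"credentials": example}, indent=2))
```

Note that access_key and access_secret are deliberately absent from the block, which is the point of this authentication method.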

Service Events

CloudWatch manages all the service events that have been generated on AWS. However, Devo’s AWS collector offers two different services that collect CloudWatch events:

  1. sqs-cloudwatch-consumer: This service is used to collect Security Hub events.

  2. service-events-all: This service is used to collect events from the rest of the services on AWS.

If you want to create them manually, follow these steps:

  1. Go to Simple Queue Service and click on Create queue.

  2. In the Details section, choose the FIFO queue type and set the Name field to the value you prefer. It must end with the .fifo suffix.

  3. In the Configuration section, set the Message retention period field to 5 days. Make sure that the Content-based deduplication checkbox is marked.

  4. In the Access policy section, choose the Basic method and choose Only the queue owner for receiving and sending permissions.

  5. Optional step: create one tag with Key usedby and Value devo-collector.

  6. Click on Create queue.
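
For reference, the console steps above map onto the queue attributes accepted by the SQS CreateQueue API. The sketch below only builds the parameter dictionary with the standard library and does not call AWS; it is meant to be passed to boto3's create_queue call, which requires boto3 and valid credentials. The function name fifo_queue_params and the queue name are assumptions for this example, and the access policy step is not covered because the queue-owner-only setting is left at its default.

```python
def fifo_queue_params(name: str, retention_days: int = 5) -> dict:
    """Build CreateQueue parameters matching the console steps above."""
    # FIFO queue names must end with the .fifo suffix.
    if not name.endswith(".fifo"):
        name += ".fifo"
    return {
        "QueueName": name,
        "Attributes": {
            "FifoQueue": "true",
            "ContentBasedDeduplication": "true",
            # SQS expects the retention period in seconds, as a string.
            "MessageRetentionPeriod": str(retention_days * 24 * 60 * 60),
        },
        # Optional tag from step 5.
        "tags": {"usedby": "devo-collector"},
    }


params = fifo_queue_params("devo-service-events")  # hypothetical queue name
print(params)

# With boto3 installed and credentials configured, the queue could be created with:
#   import boto3
#   boto3.client("sqs").create_queue(**params)
```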

Steps to enable Audit Events

No actions are required in the CloudTrail service to retrieve this kind of information when the API approach is used (setting types as audits_api).

For the S3+SQS approach (setting types as audits_s3), some previous configuration is required: an S3+SQS pipeline must be created first.

Steps to enable Metrics

No actions are required in CloudWatch Metrics service for retrieving this kind of information.

Steps to enable Logs

Logs can be collected from different services. Depending on the type, some previous setups must be applied on AWS:

Steps to enable Cisco Umbrella logs

Action

Steps

SQS Standard queue creation

  1. Go to Simple Queue Service and click Create queue.

  2. In the Details section:

    1. Choose Standard queue type.

    2. Set the Name field value you prefer.

  3. In the Configuration section:

    1. Set the Message retention period field value to 5 Days.

    2. Leave the rest of the values in the Configuration section at their defaults.

  4. In the Access policy section:

    1. Choose the Advanced method.

    2. Replace "Principal": {"AWS":"<account_id>"} with "Principal": "*" (leave the rest of the JSON as is).

  5. (Not mandatory) Tags section:

    1. Create one tag with Key usedBy and Value devo-collector.

  6. Click on Create queue button.

S3 bucket creation/configuration

  1. Go to S3 and click on Create bucket button.

  2. Set the preferred value in Bucket name field.

  3. Choose any Region value.

  4. Click the Next button.

  5. (Not mandatory) Create one tag with Key usedBy and Value devo-collector.

  6. Leave the rest of the fields with their default values and click the Next button.

  7. Leave all values at their defaults and click the Next button.

  8. Click the Create bucket button.

  9. Mark the checkbox next to the previously created S3 bucket.

  10. In the popup box, click the Copy Bucket ARN button and save the content for use in the next steps.

  11. In S3 bucket list, click the previously created bucket name link.

  12. Click the Properties tab.

  13. Click the Events box.

  14. Click the + Add notification link.

  15. Set the preferred value in the Name field.

  16. Mark the All object create events checkbox.

  17. In the Send to field, select the SQS Queue as value.

  18. Select the previously created SQS queue in the SQS field.

Enable Logging to Your Own S3 Bucket

  1. Refer to vendor’s configuration steps: Enable Logging to Your Own S3 Bucket.
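
The event notification created in the S3 bucket steps above (all object create events sent to the SQS queue) corresponds to a notification configuration like the one sketched below. The code only builds the dictionary with the standard library; it is intended for boto3's put_bucket_notification_configuration call, which is not executed here. The function name, notification name, and queue ARN are hypothetical.

```python
def bucket_notification(queue_arn: str, name: str) -> dict:
    """Build an S3 notification configuration that targets an SQS queue."""
    return {
        "QueueConfigurations": [
            {
                "Id": name,
                "QueueArn": queue_arn,
                # Equivalent to the "All object create events" checkbox.
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


config = bucket_notification(
    "arn:aws:sqs:us-east-1:123456789012:devo-umbrella-queue",  # hypothetical ARN
    "devo-umbrella-notification",  # hypothetical notification name
)
print(config)

# With boto3 installed and credentials configured, it could be applied with:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="my-bucket", NotificationConfiguration=config)
```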

Minimum configuration required for basic pulling

Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.

Setting

Details

access_key

This is the Access Key ID of the AWS user. More info can be found in the section Using a user account and local policies.

access_secret

This is the Secret Access Key (a kind of password) of the AWS user. More info can be found in the section Using a user account and local policies.

base_assume_role

This allows assuming a role with limited privileges to access AWS services. More info can be found in the sections Assuming a role (self-account) and/or Assuming a role (cross-account).

target_assume_role

This allows assuming a role on another AWS account with limited privileges to access AWS services. More info can be found in the section Assuming a role (cross-account).

assume_role_external_id

This is an optional field that provides additional security to the assuming role operation on cross-accounts. More info can be found in the section Assuming a role (cross-account).

Accepted authentication methods

Depending on how you obtained your credentials, you will have to either fill in or delete the following properties in the JSON credentials configuration block.

Authentication method and the credential properties it uses:

  • Access Key / Access Secret: access_key (REQUIRED), access_secret (REQUIRED)

  • Assume role (self-account): access_key (REQUIRED), access_secret (REQUIRED), base_assume_role (REQUIRED)

  • Assume role (cross-account): base_assume_role (REQUIRED), target_assume_role (REQUIRED), assume_role_external_id (OPTIONAL)
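
The combinations above can be checked before deploying the collector. The following sketch infers the authentication method from the properties present in a credentials block and rejects any other combination. It is not part of the collector: the function name detect_auth_method and the sample values are assumptions for this example.

```python
KEY_FIELDS = {"access_key", "access_secret"}
CROSS_FIELDS = {"base_assume_role", "target_assume_role"}


def detect_auth_method(credentials: dict) -> str:
    """Return the authentication method implied by the filled-in properties."""
    # Empty or missing values count as "deleted" properties.
    present = {key for key, value in credentials.items() if value}
    if present == KEY_FIELDS:
        return "Access Key / Access Secret"
    if present == KEY_FIELDS | {"base_assume_role"}:
        return "Assume role (self-account)"
    # The external ID is optional for cross-account, so ignore it here.
    if present - {"assume_role_external_id"} == CROSS_FIELDS:
        return "Assume role (cross-account)"
    raise ValueError(f"Unsupported combination of properties: {sorted(present)}")


# Placeholder values for illustration only.
print(detect_auth_method({"access_key": "AKIA...", "access_secret": "..."}))
```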

Run the collector

Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector in your own machine using a Docker image (On-premise collector).

Collector services detail

This section is intended to explain how to proceed with specific actions for services.

Service events (all services)

This service could be considered a general AWS event puller. It reads events from all the AWS services, which are managed by CloudWatch.

Service events (Security Hub)

This service is used to read specifically Security Hub events, which need to be processed in a different way.

Audit events (via API)

This service reads CloudTrail audit events via API.

There are two ways to read CloudTrail events: via API or via S3+SQS.

  • API: It is slower, but can read past events.

  • S3+SQS: It is much faster, but can only read events since the creation of the queue.

This service makes use of the AWS API to get the data.

Audit events (via S3 + SQS)

This service reads CloudTrail audit events via the S3+SQS pipeline.

There are two ways to read CloudTrail events: via API or via S3+SQS.

  • API: It is slower, but can read past events.

  • S3+SQS: It is much faster, but can only read events since the creation of the queue.
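
To make the S3+SQS pipeline more concrete: each message received from the queue is an S3 event notification that names the bucket and object key of a newly created file, and the file is then downloaded from S3. The sketch below extracts those locations from a message body using only the standard library. It assumes the bucket notifies the SQS queue directly (no SNS topic in between), and the function name and sample message are made up for this example; it does not reproduce the collector's own code.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_message(body: str) -> list:
    """Return (bucket, key) pairs referenced by an S3 event notification."""
    payload = json.loads(body)
    objects = []
    # Test events sent by S3 have no "Records" entry and yield an empty list.
    for record in payload.get("Records", []):
        if "s3" not in record:
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical message body for illustration.
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "devo-cloudtrail-storage-bucket1"},
            "object": {"key": "AWSLogs/123456789012/CloudTrail/file+1.json.gz"},
        },
    }]
})
print(s3_objects_from_message(sample))
```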

Metrics (All metrics)

This service could be considered a general AWS metric puller. It reads metrics from all the AWS services that generate them. Those metrics are also managed by CloudWatch.

This service makes use of the AWS API to get the data.

AWS-GuardDuty (Via API)

This service reads GuardDuty findings via API. Because it relies on the GuardDuty APIs and their limits, it does not scale well and is intended only for low data volumes. For higher volumes, AWS_SQS_IF should be used instead.

The events are ingested into the table cloud.aws.guardduty.findings.

Non Cloudwatch Logs

This service reads logs from some AWS services whose logs are not managed by CloudWatch. These logs are stored in an S3 bucket and read through an SQS queue, so the service uses an S3+SQS pipeline.

The implemented services currently are:

  • VPC Flow Logs

  • CloudFront Logs

Custom Logs

This service reads logs from some AWS services whose logs are managed by CloudWatch. CloudWatch creates log groups to store the different log sources, so a custom puller is required in order to read from different log groups at the same time. This service makes use of the AWS API to get the data.

Cisco Umbrella (via S3+SQS)

This service reads logs from a Cisco Umbrella managed bucket via the S3+SQS pipeline. Cisco provides a way to deposit logging data into an S3 bucket.

Collector operations

This section is intended to explain how to proceed with the specific operations of this collector.

Change log

Release

Released on

Release type

Details

Recommendations

v1.11.0

Jan 6, 2025


IMPROVEMENT
BUG FIX

Improvements

  • Updated DCSDK base docker image to 1.3.1.

  • Added Unit tests and added user_guide

  • Upgraded Boto3 libraries from 1.34.97 to 1.35.92

  • Updated DCSDK from 1.11.1 to 1.13.1:

    • Added new sender for relay in house + TLS

    • Added persistence functionality for gzip sending buffer

    • Added Automatic activation of gzip sending

    • Improved behaviour when persistence fails

    • Upgraded DevoSDK dependency

    • Fixed console log encoding

    • Restructured python classes

    • Improved behaviour with non-utf8 characters

    • Decreased default size value for internal queues (Redis limitation, from 1GiB to 256MiB)

    • New persistence format/structure (compression in some cases)

    • Removed dmesg execution (It was invalid for docker execution)

    • Applied changes to make DCSDK compatible with MacOS

    • Upgrade DevoSDK dependency to version v5.4.0

    • Change internal queue management for protecting against OOMK

    • Extracted ModuleThread structure from PullerAbstract

    • Improve Controlled stop when both processes fails to instantiate

    • Improve Controlled stop when InputProcess is killed

    • Fixed a bug related to the loss of collector_name, collector_id and job_id

    • Fixed a bug related to queues and ValueError

    • Fixed an error related to a ValueError exception that was not handled properly

    • Fixed an error related to the loss of some values in internal messages

  • Bug Fix:

    • Changes in code to handle the guard-duty missing logs issue

 

v1.10.0

May 31, 2024

NEW FEATURE

Improvements:

  • Implemented GuardDuty service, added puller set-up and puller for it

Upgrade

v1.8.2

Mar 1, 2024

IMPROVEMENT

Improvements:

  • Upgraded DCSDK Docker base image to 1.2.0

Upgrade

v1.8.1

Feb 28, 2024

BUG FIX

Bug Fixes:

  • Fix a bug when dealing with events that have no lastEventTimestamp present in the log_stream

Upgrade

v1.8.0

Dec 7, 2023

IMPROVEMENT
NEW FEATURE

New Feature

  • Updated the method to retrieve all log group names when the log_group parameter is set to '/' in the config

Improvements

  • Upgraded DCSDK from 1.9.2 to 1.10.2

    • Ensure special characters are properly sent to the platform

    • Changed log level to some messages from info to debug

    • Changed some wrong log messages

    • Upgraded some internal dependencies

    • Changed queue passed to setup instance constructor

    • Ability to validate collector setup and exit without pulling any data

    • Ability to store in the persistence the messages that couldn't be sent after the collector stopped

    • Ability to send messages from the persistence when the collector starts and before the puller begins working

Upgrade

v1.7.1

Oct 25, 2023

bug fixes

  • Fixed the way the collector handles milliseconds as the strptime function has been updated since 2021

  • Fixed the missing parameter in a method call

Recommended version

v1.6.0

Sep 20, 2023

NEW FEATURE

New features:

  1. Added Cisco Umbrella new data source using SQS+S3

  2. Added is_aws_service optional parameter in collector_definitions.yaml.

  3. Added event_type_file_regex_patterns optional parameter to set a dict as: event_type -> regex_for_s3_file_key

Upgrade

v1.5.0

Aug 11, 2023

IMPROVEMENT

Improvements

  1. Upgraded [boto] libraries from 1.21.36 to 1.28.24

  2. Upgraded DCSDK from 1.3.0 to 1.9.1

Upgrade

v1.4.1

Aug 11, 2022

BUG FIX

Bug Fixes:

  • Fixed a bug that prevented the use of the Assumed Role authentication method.

  • Fixed a bug that prevented session renewal when using any of the Assume Authentication methods:

    • Assume Role

    • Cross Account

Upgrade

v1.4.0

Jan 22, 2022

NEW FEATURE
IMPROVEMENT
BUG FIX

New features:

  • CrossAccount authentication method is now available improving the way in which the credentials are shared when the collector is running in the Collector Service.

Improvements:

  • The audit-events-all service (type audits_api) has been enhanced to allow requesting events older than 500 days.

Bug Fixes:

  • Fixed a bug that raised a KeyError when the optional param event_type_processor_mapping was not defined running service-events-all service.

Upgrade