Overview

IBM Cloud Flow Logs for VPC enables the collection, storage, and presentation of information about the Internet Protocol (IP) traffic going to and from network interfaces within your Virtual Private Cloud (VPC).

The IBM Cloud Flow Logs for VPC collector collects flow logs from an IBM log collector instance and sends them to Devo.

Devo collector features

| Feature | Details |
| --- | --- |
| Allow parallel downloading (multipod) | not allowed |
| Running environments | collector server, on-premise |
| Populated Devo events | table |
| Flattening preprocessing | yes |
| Allowed source events obfuscation | yes |

Data sources

| Data source | Description | API endpoint | Collector service name | Devo table | Available from release |
| --- | --- | --- | --- | --- | --- |
| IBM Cloud | IBM Cloud flow logs for VPC | /v2/export | flow_log | cloud.ibm.vpc.flow_log | v1.0.0 |

For more information on how the events are parsed, visit our page.

Flattening preprocessing

Click here to see an example of a flow log object. Note, however, that the collector receives flow log objects one by one rather than grouped, because of pre-processing performed by IBM.

Pre-processing is performed on the raw flow logs before they are fetched from the Log Analysis instance. IBM Cloud sends flow log files to COS; each file contains all logs captured within a five-minute interval. When the COS trigger sends the flow logs to Log Analysis, each individual log line is sent along with metadata.

When Devo fetches the log lines from the Log Analysis instance, some metadata fields are fetched (e.g. the flow log file key and the account) and the prefix flow_log_ is added to properties that belong to the individual log line. Other properties, such as capture_start_time and capture_end_time, are common to all log lines contained in the original flow log file stored in COS.

See below for an example of a processed flow log:

{
    "account": "099547bf015144628q4ef599863c5123",
    "key": "ibm_vpc_flowlogs_v1/account=099547bf015144628q4ef599863c5123/region=us-south/vpc-id=crn%3Av1%3Abluemix%3Apublic%3Ais%3Aus-south%3Aa%2F099547bf015144628q4ef599863c5123%3A%3Avpc%3Ar006-a1f9875c-874b-472b-ab4a-8a669fe57be6/subnet-id=crn%3Av1%3Abluemix%3Apublic%3Ais%3Aus-south-1%3Aa%2F099547bf015144628q4ef599863c5123%3A%3Asubnet%3A0717-d622cc30-816e-4b94-935d-70ca4b8f9b8b/endpoint-type=vnics/instance-id=crn%3Av1%3Abluemix%3Apublic%3Ais%3Aus-south-1%3Aa%2F099547bf015144628q4ef599863c5123%3A%3Ainstance%3A0717_2abb8f82-07f8-4d09-aedb-4c02e61fb20d/vnic-id=0717-b36b4d70-3c92-446d-9d5e-f28778ab033f/record-type=egress/year=2023/month=10/day=11/hour=03/stream-id=20231011T035936Z/00000000.gz",
    "version": "0.0.1",
    "collector_crn": "crn:v1:bluemix:public:is:us-south:a/099547bf015144628q4ef599863c5123::flow-log-collector:r006-1b7a1d95-8a01-40ae-b011-9c6b0575a59f",
    "attached_endpoint_type": "vnic",
    "network_interface_id": "0717-b36b4d70-3c92-446d-9d5e-f28778ab033f",
    "instance_crn": "crn:v1:bluemix:public:is:us-south-1:a/099547bf015144628q4ef599863c5123::instance:0717_2abb8f82-07f8-4d09-aedb-4c02e61fb20d",
    "vpc_crn": "crn:v1:bluemix:public:is:us-south:a/099547bf015144628q4ef599863c5123::vpc:r006-a1f9875c-874b-472b-ab4a-8a669fe57be6",
    "capture_end_time": "2023-10-11T03:59:36Z",
    "capture_start_time": "2023-10-11T03:56:06Z",
    "state": "ok",
    "flow_log_start_time": "2023-10-11T03:56:26Z",
    "flow_log_end_time": "2023-10-11T03:59:26Z",
    "flow_log_direction": "O",
    "flow_log_action": "accepted",
    "flow_log_initiator_ip": "10.240.0.4",
    "flow_log_initiator_port": 68,
    "flow_log_target_ip": "10.240.0.1",
    "flow_log_target_port": 67,
    "flow_log_transport_protocol": 17,
    "flow_log_ether_type": "IPv4",
    "flow_log_was_initiated": true,
    "flow_log_was_terminated": false,
    "flow_log_bytes_from_initiator": 2050,
    "flow_log_packets_from_initiator": 6,
    "flow_log_bytes_from_target": 1956,
    "flow_log_packets_from_target": 6,
    "flow_log_cumulative_packets_from_initiator": 6,
    "flow_log_cumulative_packets_from_target": 6,
    "flow_log_cumulative_bytes_from_target": 1956,
    "flow_log_cumulative_bytes_from_initiator": 2050,
    "@devo_environment": "develop",
    "@devo_pulling_id": "1696997002225"
}
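
The flattening described in this section can be illustrated with a minimal Python sketch. It is not the code IBM or the collector runs. The raw file layout (a `flow_logs` list plus a `number_of_flow_logs` counter next to the shared capture properties) and the function name `flatten_flow_log_file` are assumptions made for illustration; only the flow_log_ prefix and the shared properties come from the description above.

```python
from typing import Any, Dict, Iterator

# Keys that describe the grouped file rather than a single log line.
# Both names are assumptions about the raw file layout.
GROUP_KEYS = ("flow_logs", "number_of_flow_logs")


def flatten_flow_log_file(
    flow_log_file: Dict[str, Any],
    key: str,
    account: str,
    prefix: str = "flow_log_",
) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per log line found in a grouped flow log file."""
    # Properties shared by every log line in the file (capture times, CRNs, ...).
    common = {k: v for k, v in flow_log_file.items() if k not in GROUP_KEYS}
    for line in flow_log_file.get("flow_logs", []):
        record: Dict[str, Any] = {"account": account, "key": key}
        record.update(common)
        # Properties of the individual log line receive the prefix.
        record.update({f"{prefix}{name}": value for name, value in line.items()})
        yield record


# Hypothetical, shortened input used only to exercise the function.
sample_file = {
    "version": "0.0.1",
    "capture_start_time": "2023-10-11T03:56:06Z",
    "capture_end_time": "2023-10-11T03:59:36Z",
    "state": "ok",
    "number_of_flow_logs": 2,
    "flow_logs": [
        {"direction": "O", "action": "accepted", "initiator_port": 68},
        {"direction": "I", "action": "rejected", "initiator_port": 443},
    ],
}

records = list(
    flatten_flow_log_file(sample_file, key="example/00000000.gz", account="example-account")
)
```

Each element of `records` carries the shared capture properties plus the prefixed per-line properties, which mirrors the shape of the processed flow log shown above.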

Minimum configuration required for basic pulling

The collector retrieves IBM Cloud flow logs for VPC from a Log Analysis instance. To achieve this, users must have previously set up logging for VPC to direct log objects to a COS Bucket. Additionally, a cloud function should be in place to read and insert these logs into the Log Analysis instance.

IBM Cloud offers a comprehensive guide on setting up logging and the cloud function object. This guide also introduces the IBM Cloud VPC flow logs project terraform solution, which encompasses the deployment of all necessary components and a demonstration of its functionality. For detailed documentation on deploying each component, please refer to the README file associated with the terraform solution.

In general, users should undertake the following steps (please keep in mind that the referenced terraform solution automates all component creation for the demo VPC configuration):

  1. Create a VPC flow log collector to channel flow log files from a specific VPC configuration to a COS Bucket.

  2. Set up a Log Analysis instance, enabling the collector to access flow logs.

  3. Establish a cloud function to fetch flow log files from the COS Bucket and transfer them to the Log Analysis instance. The source code for this cloud function is available in the aforementioned IBM Cloud VPC flow logs project terraform solution.

It's important to note that using the example terraform solution in its default configuration will establish and configure a sample VPC flow network, a COS Bucket for sample log storage, a Log Analysis instance, and a cloud function to read from the COS Bucket and forward logs to the Log Analysis instance. For deployment in production environments, users should adhere to their organization's guidelines regarding IBM Cloud component deployment, such as using infrastructure-as-code tools, consulting IBM professional services, and so on. Production environments can also have multiple COS buckets pushing logs to the same Log Analysis instance. 

For the purposes of the Devo collector, only one Log Analysis instance can be configured – if your organization has multiple Log Analysis instances for VPC flow logs, then you must configure multiple Devo collectors.

Once the Log Analysis instance is created, users will be able to fetch the necessary credentials:

| Setting | Details |
| --- | --- |
| service_key | The IBM Cloud Log Analysis instance service key. The service key can be found in the IBM Cloud console via: Observability > Logging > Select the Flow Log log collector instance > Open Dashboard > Settings > Organization > API Keys > Service Keys |
| base_url | The IBM Cloud Flow Logs for VPC log collector API base URL. Select from the following API endpoints. |

This minimum configuration refers exclusively to those specific parameters of this integration. There are more required parameters related to the generic behavior of the collector. Check the setting sections for details.
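
To show how the two settings fit together, the sketch below builds (but does not send) a request against the /v2/export endpoint. The query parameters from, to, and prefer are the ones visible in the puller log output further down this page. The authentication scheme (HTTP Basic with the service key as user name and an empty password), the example base URL, and the function name `build_export_request` are assumptions for illustration, so check the IBM Cloud documentation for your instance before relying on them.

```python
import base64
import urllib.parse
import urllib.request


def build_export_request(
    base_url: str, service_key: str, start_epoch: int, end_epoch: int
) -> urllib.request.Request:
    """Prepare a GET request for the export endpoint without sending it."""
    query = urllib.parse.urlencode(
        {"from": start_epoch, "to": end_epoch, "prefer": "head"}
    )
    url = f"{base_url.rstrip('/')}/v2/export?{query}"
    # Assumption: Basic auth, service key as user name, empty password.
    token = base64.b64encode(f"{service_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder values; replace them with your own base_url and service_key.
request = build_export_request(
    "https://logs.example.com/", "example-service-key", 1693485435, 1693488602
)
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual call. That step is left out here so the sketch stays free of network access.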

Accepted authentication methods

Authentication method

Service key

Service key

Required

Base URL

Required

Run the collector

Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector on your own machine using a Docker image (On-premise collector).

Collector services detail

This section is intended to explain how to proceed with specific actions for services.

flow_log

 Verify data collection

Internal process and deduplication method

All flow log records are fetched via the v2 Export API and filtered/ordered by their created timestamp. The collector continually pulls new events since the last recorded timestamp. A unique hash value is computed for each event and used for deduplication purposes to ensure events are not fetched multiple times in subsequent pulls.
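
A minimal sketch of hash-based deduplication across pulls follows. The collector's actual hash function and the fields it hashes are not documented on this page, so the choice of SHA-256 over a canonical JSON rendering, as well as the names `event_hash` and `drop_duplicates`, are assumptions for illustration only.

```python
import hashlib
import json
from typing import Any, Dict, List, Set


def event_hash(event: Dict[str, Any]) -> str:
    """Hash a canonical JSON rendering so key order does not affect the result."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_duplicates(
    events: List[Dict[str, Any]], seen_hashes: Set[str]
) -> List[Dict[str, Any]]:
    """Return only events not seen before and remember their hashes."""
    fresh = []
    for event in events:
        digest = event_hash(event)
        if digest in seen_hashes:
            continue  # already delivered in an earlier pull
        seen_hashes.add(digest)
        fresh.append(event)
    return fresh


# Two overlapping pulls: the second one returns event "b" again.
event_a = {"flow_log_start_time": "2023-10-11T03:56:26Z", "flow_log_target_port": 67}
event_b = {"flow_log_start_time": "2023-10-11T03:57:26Z", "flow_log_target_port": 67}
event_c = {"flow_log_start_time": "2023-10-11T03:58:26Z", "flow_log_target_port": 67}

seen: Set[str] = set()
sent_first = drop_duplicates([event_a, event_b], seen)
sent_second = drop_duplicates([event_b, event_c], seen)
```

In this sketch the set of seen hashes would have to be stored with the persisted timestamp so that it survives between pulls.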

Please note: the collector fetches logs from a Log Analysis instance. Log Analysis can house many different log types. When fetching logs, the collector will attempt to identify a `vpc_crn` property key in the log to determine if the log is a VPC flow log. If this key does not exist, the collector will skip that log. For the purposes of statistics tracking in the collector log output, non-VPC flow logs are not counted as events received or events filtered.
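
The filtering rule above can be sketched as follows. The function name `split_vpc_flow_logs` and the sample records are hypothetical; the only behavior taken from this page is that a log counts as a VPC flow log when it has a `vpc_crn` key, and that skipped logs do not contribute to the received or filtered statistics.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_vpc_flow_logs(logs: Iterable[Any]) -> Tuple[List[Dict[str, Any]], int]:
    """Return (VPC flow logs, number of skipped non-VPC logs)."""
    flow_logs: List[Dict[str, Any]] = []
    skipped = 0
    for log in logs:
        if isinstance(log, dict) and "vpc_crn" in log:
            flow_logs.append(log)
        else:
            skipped += 1  # not a VPC flow log, excluded from statistics
    return flow_logs, skipped


# Hypothetical mix of log types stored in the same Log Analysis instance.
mixed = [
    {"vpc_crn": "crn:v1:example::vpc:demo", "flow_log_action": "accepted"},
    {"message": "some other log type"},
    {"vpc_crn": "crn:v1:example::vpc:demo", "flow_log_action": "rejected"},
]

flow_logs, skipped = split_vpc_flow_logs(mixed)
events_received = len(flow_logs)  # only VPC flow logs count as received
```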

If your collector logs indicate that the collector is running successfully but processing 0 valid flow logs, please ensure that the base URL and service key you provided belong to a Log Analysis instance that contains valid flow logs for VPC. For example, if a user provides the base URL and service key of an IBM Cloud Activity Tracker instance, the collector will run successfully but never fetch valid flow logs for VPC.

Devo categorization and destination

All events of this service are ingested into the table cloud.ibm.vpc.flow_log.

Setup output

A successful run has the following output messages for the setup module:

2023-08-31T09:30:01.135    INFO InputProcess::MainThread -> EventPullerSetup(unknown,ibm_cloud_flow_log#10001,flow_log#predefined) -> Starting thread
2023-08-31T09:30:01.137    INFO InputProcess::EventPullerSetup(unknown,ibm_cloud_flow_log#10001,flow_log#predefined) -> Testing fetch from /v2/export.
2023-08-31T09:30:01.794    INFO InputProcess::EventPullerSetup(unknown,ibm_cloud_flow_log#10001,flow_log#predefined) -> Successfully tested fetch from /v2/export. Source is pullable.
2023-08-31T09:30:01.794    INFO InputProcess::EventPullerSetup(unknown,ibm_cloud_flow_log#10001,flow_log#predefined) -> Setup for module <EventPuller> has been successfully executed

Puller output

2023-08-31T09:30:02.142    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Running the persistence upgrade steps
2023-08-31T09:30:02.143    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Running the persistence corrections steps
2023-08-31T09:30:02.143    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Running the persistence corrections steps
2023-08-31T09:30:02.143    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> No changes were detected in the persistence
2023-08-31T09:30:02.144    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) Finalizing the execution of pre_pull()
2023-08-31T09:30:02.144    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Starting data collection every 60 seconds
2023-08-31T09:30:02.144    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Pull Started
2023-08-31T09:30:02.145    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Fetching all event logs via params={'from': 1693485435, 'to': 1693488602, 'prefer': 'head'}
2023-08-31T09:30:02.908    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Sending 183 event(s) to my.app.ibm.cloud.flow_log
2023-08-31T09:30:02.932    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> No more pagination_id values returned. Setting pull_completed to True.
2023-08-31T09:30:02.935    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Updating the persistence
2023-08-31T09:30:02.936    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1693488602141):Number of requests made: 1; Number of events received: 185; Number of duplicated events filtered out: 2; Number of events generated and sent: 183; Average of events per second: 231.194.
2023-08-31T09:30:02.936    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Statistics for this pull cycle (@devo_pulling_id=1693488602141):Number of requests made: 1; Number of events received: 185; Number of duplicated events filtered out: 2; Number of events generated and sent: 183; Average of events per second: 231.142.
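
When verifying data collection from log output, it can help to check that the statistics line is internally consistent: events received minus duplicates filtered out should equal events sent. The sketch below parses a statistics line like the one above. The regular expression is written against this sample only, and the helper name `parse_pull_statistics` is hypothetical.

```python
import re
from typing import Dict, Optional

STATS_PATTERN = re.compile(
    r"Number of events received: (?P<received>\d+); "
    r"Number of duplicated events filtered out: (?P<duplicates>\d+); "
    r"Number of events generated and sent: (?P<sent>\d+)"
)


def parse_pull_statistics(line: str) -> Optional[Dict[str, int]]:
    """Extract the event counters from a statistics log line, if present."""
    match = STATS_PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "Statistics for this pull cycle (@devo_pulling_id=1693488602141):"
    "Number of requests made: 1; Number of events received: 185; "
    "Number of duplicated events filtered out: 2; "
    "Number of events generated and sent: 183; "
    "Average of events per second: 231.142."
)

stats = parse_pull_statistics(line)
consistent = (
    stats is not None and stats["received"] - stats["duplicates"] == stats["sent"]
)
```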

After a successful collector’s execution (that is, no error logs found), you will see the following log message:

2023-08-31T09:30:02.936    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> The data is up to date!
2023-08-31T09:30:02.936    INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Data collection completed. Elapsed time: 0.795 seconds. Waiting for 59.205 second(s) until the next one

 Restart the persistence

This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:

  1. Edit the configuration file.

  2. Change the value of the start_time_in_utc parameter to a different one.

  3. Save the changes.

  4. Restart the collector.

The collector will detect this change and will restart the persistence using the parameters of the configuration file or the default configuration in case it has not been provided.
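
The change detection described above could look roughly like the following sketch. The layout of the persisted state (`start_time_in_utc`, `last_event_time`, `seen_hashes`), the timestamp format, and the function name `resolve_persistence` are all assumptions for illustration; the collector's real persistence format is not documented here.

```python
from typing import Any, Dict, Optional


def resolve_persistence(
    saved_state: Optional[Dict[str, Any]], configured_start_time: str
) -> Dict[str, Any]:
    """Reuse the saved state unless start_time_in_utc changed in the config."""
    if (
        saved_state is None
        or saved_state.get("start_time_in_utc") != configured_start_time
    ):
        # Configuration changed (or first run): start over from the configured time.
        return {
            "start_time_in_utc": configured_start_time,
            "last_event_time": configured_start_time,
            "seen_hashes": [],
        }
    return saved_state


# Hypothetical state left behind by a previous run.
saved = {
    "start_time_in_utc": "2023-08-01T00:00:00Z",
    "last_event_time": "2023-08-31T09:30:02Z",
    "seen_hashes": ["abc123"],
}

unchanged = resolve_persistence(saved, "2023-08-01T00:00:00Z")
reset = resolve_persistence(saved, "2023-07-01T00:00:00Z")
```

With the same start time the saved state is reused as-is. With a different one, pulling restarts from the newly configured time, which is why editing start_time_in_utc leads to re-ingestion.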

 Troubleshooting

This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.

| Error Type | Error Id | Error Message | Cause | Solution |
| --- | --- | --- | --- | --- |
| InitVariablesError | 1 | Invalid start_time_in_utc: {ini_start_str}. Must be in parseable datetime format. | The configured start_time_in_utc parameter is in a non-parseable format. | Update the start_time_in_utc value to have the recommended format as indicated in the guide. |
| InitVariablesError | 2 | Invalid start_time_in_utc: {ini_start_str}. Must be in the past. | The configured start_time_in_utc parameter is a future date. | Update the start_time_in_utc value to a past datetime. |
| SetupError | 101 | Failed to fetch OAuth token from {token_endpoint}. Exception: {e}. | The provided credentials, base URL, and/or token endpoint is incorrect. | Revisit the configuration steps and ensure that the correct values were specified in the config file. |
| SetupError | 102 | Failed to fetch data from {endpoint}. Source is not pullable. | The provided credentials, base URL, and/or token endpoint is incorrect. | Revisit the configuration steps and ensure that the correct values were specified in the config file. |
| ApiError | 401 | Error during API call to [API provider HTML error response here] | The server returned an HTTP 401 response. | Ensure that the provided credentials are correct and provide read access to the targeted data. |
| ApiError | 429 | Too many concurrent requests. | IBM Cloud is reporting that too many simultaneous requests are being made against the Log Analysis instance. | This error can happen when a user attempts to manually restart the collector frequently or otherwise query the Log Analysis instance while the collector is running. In practice, this error should naturally correct itself within 15 minutes of the original report so long as simultaneous query requests cease. If the collector continues to report this error after 15 minutes, please ensure that there is not another script or user also making API requests to the Log Analysis instance. Log Analysis concurrency limit is determined by your instance configuration and tier. |
| ApiError | 498 | Error during API call to [API provider HTML error response here] | The server returned an HTTP 500 response. | If the API returns a 500 but successfully completes subsequent runs then you may ignore this error. If the API repeatedly returns a 500 error, ensure the server is reachable and operational. |
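
The two InitVariablesError checks listed above (error ids 1 and 2) can be reproduced before deploying a configuration with a sketch like the one below. The timestamp format `%Y-%m-%dT%H:%M:%SZ` is an assumption (use the format the collector guide recommends), and the exception class and helper names are hypothetical stand-ins rather than the collector's own code.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed timestamp layout; adjust to the format recommended in the guide.
START_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


class InitVariablesError(Exception):
    """Hypothetical stand-in for the collector's own exception type."""

    def __init__(self, error_id: int, message: str) -> None:
        super().__init__(message)
        self.error_id = error_id


def parse_start_time(value: str, now: Optional[datetime] = None) -> datetime:
    """Return the parsed start time, or raise with error id 1 or 2."""
    now = now or datetime.now(timezone.utc)
    try:
        parsed = datetime.strptime(value, START_TIME_FORMAT).replace(
            tzinfo=timezone.utc
        )
    except (TypeError, ValueError):
        raise InitVariablesError(
            1,
            f"Invalid start_time_in_utc: {value}. Must be in parseable datetime format.",
        ) from None
    if parsed >= now:
        raise InitVariablesError(
            2, f"Invalid start_time_in_utc: {value}. Must be in the past."
        )
    return parsed


def error_id_for(value: str, now: datetime) -> Optional[int]:
    """Return the error id a value would trigger, or None if it is valid."""
    try:
        parse_start_time(value, now)
    except InitVariablesError as error:
        return error.error_id
    return None


# A fixed reference time keeps the example deterministic.
reference_now = datetime(2023, 8, 31, 9, 30, tzinfo=timezone.utc)
valid_id = error_id_for("2023-08-01T00:00:00Z", reference_now)
unparseable_id = error_id_for("next tuesday", reference_now)
future_id = error_id_for("2024-01-01T00:00:00Z", reference_now)
```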

Collector operations

This section is intended to explain how to proceed with specific operations of this collector.

 Verify collector operations

Initialization

The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services, and of validating the given configuration.

A successful run has the following output messages for the initializer module:

2023-01-10T15:22:57.146    INFO MainProcess::MainThread -> Loading configuration using the following files: {"full_config": "config.yaml", "job_config_loc": null, "collector_config_loc": null}
2023-01-10T15:22:57.146    INFO MainProcess::MainThread -> Using the default location for "job_config_loc" file: "/etc/devo/job/job_config.json"
2023-01-10T15:22:57.147    INFO MainProcess::MainThread -> "\etc\devo\job" does not exists
2023-01-10T15:22:57.147    INFO MainProcess::MainThread -> Using the default location for "collector_config_loc" file: "/etc/devo/collector/collector_config.json"
2023-01-10T15:22:57.148    INFO MainProcess::MainThread -> "\etc\devo\collector" does not exists
2023-01-10T15:22:57.148    INFO MainProcess::MainThread -> Results of validation of config files parameters: {"config": "config.yaml", "config_validated": True, "job_config_loc": "/etc/devo/job/job_config.json", "job_config_loc_default": True, "job_config_loc_validated": False, "collector_config_loc": "/etc/devo/collector/collector_config.json", "collector_config_loc_default": True, "collector_config_loc_validated": False}
2023-01-10T15:22:57.171 WARNING MainProcess::MainThread -> [WARNING] Illegal global setting has been ignored -> multiprocessing: False

Events delivery and Devo ingestion

The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.

A successful run has the following output messages for the event delivery module:

2023-01-10T15:23:00.788    INFO OutputProcess::MainThread -> DevoSender(standard_senders,devo_sender_0) -> Starting thread
2023-01-10T15:23:00.789    INFO OutputProcess::MainThread -> DevoSenderManagerMonitor(standard_senders,devo_1) -> Starting thread (every 300 seconds)
2023-01-10T15:23:00.790    INFO OutputProcess::MainThread -> DevoSenderManager(standard_senders,manager,devo_1) -> Starting thread
2023-01-10T15:23:00.842    INFO OutputProcess::MainThread -> global_status: {"output_process": {"process_id": 18804, "process_status": "running", "thread_counter": 21, "thread_names": ["MainThread", "pydevd.Writer", "pydevd.Reader", "pydevd.CommandThread", "pydevd.CheckAliveThread", "DevoSender(standard_senders,devo_sender_0)", "DevoSenderManagerMonitor(standard_senders,devo_1)", "DevoSenderManager(standard_senders,manager,devo_1)", "OutputStandardConsumer(standard_senders_consumer_0)", 

Sender services

The Integrations Factory Collector SDK has three different sender services, depending on the type of event to deliver (internal, standard, and lookup). This collector uses the following sender services:

| Sender services | Description |
| --- | --- |
| internal_senders | In charge of delivering internal metrics to Devo such as logging traces or metrics. |
| standard_senders | In charge of delivering pulled events to Devo. |

Sender statistics

Each service displays its own performance statistics that allow checking how many events have been delivered to Devo by type:

| Logging trace | Description |
| --- | --- |
| Number of available senders: 1 | Displays the number of concurrent senders available for the given Sender Service. |
| sender manager internal queue size: 0 | Displays the items available in the internal sender queue. |
| Standard - Total number of messages sent: 57, messages sent since "2023-01-10 16:09:16.116750+00:00": 0 (elapsed 0.000 seconds) | Displays the number of events sent since the last checkpoint. From the given example, the following conclusions can be drawn: 57 events were sent to Devo since the collector started; the last checkpoint timestamp was 2023-01-10 16:09:16.116750+00:00; 0 events were sent to Devo between the last UTC checkpoint and now; those 0 events required 0.000 seconds to be delivered. |
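
To read these values programmatically rather than by eye, a sender statistics trace can be parsed as in the sketch below. The regular expression is written against the sample trace in the table above and the helper name `parse_sender_statistics` is hypothetical, so treat it as a starting point rather than a stable contract.

```python
import re
from datetime import datetime
from typing import Any, Dict, Optional

SENDER_PATTERN = re.compile(
    r'Total number of messages sent: (?P<total>\d+), '
    r'messages sent since "(?P<checkpoint>[^"]+)": (?P<since>\d+) '
    r'\(elapsed (?P<elapsed>[\d.]+) seconds'
)


def parse_sender_statistics(line: str) -> Optional[Dict[str, Any]]:
    """Extract totals, checkpoint time, and elapsed seconds from a sender trace."""
    match = SENDER_PATTERN.search(line)
    if match is None:
        return None
    return {
        "total_sent": int(match.group("total")),
        "checkpoint": datetime.fromisoformat(match.group("checkpoint")),
        "sent_since_checkpoint": int(match.group("since")),
        "elapsed_seconds": float(match.group("elapsed")),
    }


trace = (
    'Standard - Total number of messages sent: 57, messages sent since '
    '"2023-01-10 16:09:16.116750+00:00": 0 (elapsed 0.000 seconds)'
)

sender_stats = parse_sender_statistics(trace)
```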

 Check memory usage

To check the memory usage of this collector, look for the following log records in the collector which are displayed every 5 minutes by default, always after running the memory-free process.

  • The used memory is displayed by running processes and the sum of both values will give the total used memory for the collector.

  • The global pressure of the available memory is displayed in the global value.

  • All metrics (Global, RSS, VMS) show the value before and after freeing memory, in the form before -> after.

INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)
INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)
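
As a worked example of the summing rule above, the following sketch parses the two [GC] records and adds up the resident memory (RSS) of both processes after freeing. The regular expression matches the sample lines shown here, and the helper name `parse_gc_lines` is hypothetical.

```python
import re
from typing import Dict, List

GC_PATTERN = re.compile(
    r"(?P<process>\w+)::MainThread -> \[GC\] "
    r"global: (?P<global_before>[\d.]+)% -> (?P<global_after>[\d.]+)%, "
    r"process: RSS\((?P<rss_before>[\d.]+)MiB -> (?P<rss_after>[\d.]+)MiB\), "
    r"VMS\((?P<vms_before>[\d.]+)MiB -> (?P<vms_after>[\d.]+)MiB\)"
)


def parse_gc_lines(lines: List[str]) -> Dict[str, Dict[str, float]]:
    """Map each process name to its memory metrics (MiB, or % for global)."""
    usage: Dict[str, Dict[str, float]] = {}
    for line in lines:
        match = GC_PATTERN.search(line)
        if match:
            fields = match.groupdict()
            process = fields.pop("process")
            usage[process] = {name: float(value) for name, value in fields.items()}
    return usage


log_lines = [
    "INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, "
    "process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)",
    "INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, "
    "process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)",
]

usage = parse_gc_lines(log_lines)
# Total resident memory of the collector after freeing, in MiB.
total_rss_after = round(sum(p["rss_after"] for p in usage.values()), 2)
```

For the two sample records, the total is 34.08 MiB + 28.41 MiB = 62.49 MiB.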

Change log for v1.x.x

| Release | Released on | Release type | Details | Recommendations |
| --- | --- | --- | --- | --- |
| v1.0.0 | | INITIAL RELEASE | Features: Flow log: Enable the collection, storage, and presentation of information about the Internet Protocol (IP) traffic going to and from network interfaces within your Virtual Private Cloud (VPC). Released with Devo Collector SDK v1.10.0 | Recommended version |
