IBM Cloud Flow Logs for VPC enables the collection, storage, and presentation of information about the Internet Protocol (IP) traffic going to and from network interfaces within your Virtual Private Cloud (VPC).
The IBM Cloud Flow Logs for VPC collector retrieves flow logs from an IBM Cloud Log Analysis instance and sends them to Devo.
Devo collector features
Allow parallel downloading (multipod): Not allowed
Running environments: Collector server, On-premise
Populated Devo events: Table
Flattening preprocessing: Yes
Allowed source events obfuscation: Yes
Data sources
Data source: IBM Cloud
Description: IBM Cloud flow logs for VPC
API endpoint: /v2/export
Collector service name: flow_log
Devo table: cloud.ibm.vpc.flow_log
Available from release: v1.0.0
For more information on how the events are parsed, visit our page.
Flattening preprocessing
Click here to see an example of a flow log object. Note, however, that the collector receives flow log objects one by one rather than grouped, due to pre-processing performed by IBM.
Pre-processing is performed on raw flow logs fetched from the Log Analysis instance. IBM Cloud sends flow log files to COS; these files represent all logs captured within five-minute intervals. When the COS trigger sends the flow logs to Log Analysis, each individual log line is sent along with metadata.
When Devo fetches the log lines from the Log Analysis instance, some metadata fields are fetched (e.g. the flow log file key and the account), and the prefix flow_log_ is added to properties that belong to the individual log line. Other properties, such as capture_start_time and capture_end_time, are common properties for all log lines contained within the original flow log file stored in COS.
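As a rough illustration of this flattening step, the sketch below reshapes a single raw log line and its file-level metadata into the form described above. The only documented elements are the flow_log_ prefix and the shared capture_start_time/capture_end_time properties; the remaining field names and values are hypothetical.

```python
# Illustrative sketch only; field names other than the documented flow_log_
# prefix, capture_start_time, and capture_end_time are hypothetical.

# Metadata common to every line in the original flow log file stored in COS.
file_metadata = {
    "account": "0123456789",
    "key": "example-flow-log-file-key",
    "capture_start_time": "2023-08-31T09:20:00Z",
    "capture_end_time": "2023-08-31T09:25:00Z",
}

# One individual flow log line extracted from that file.
log_line = {
    "initiator_ip": "10.0.0.4",
    "target_ip": "10.0.0.5",
    "bytes_from_initiator": 1024,
}

# Each line becomes its own event: file-level metadata is kept as-is and
# per-line properties receive the flow_log_ prefix.
flattened_event = {
    **file_metadata,
    **{f"flow_log_{key}": value for key, value in log_line.items()},
}

print(flattened_event)
```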
The collector retrieves IBM Cloud flow logs for VPC from a Log Analysis instance. To achieve this, users must have previously set up logging for VPC to direct log objects to a COS Bucket. Additionally, a cloud function should be in place to read and insert these logs into the Log Analysis instance.
IBM Cloud offers a comprehensive guide on setting up logging and the cloud function object. This guide also introduces the IBM Cloud VPC flow logs project terraform solution, which encompasses the deployment of all necessary components and a demonstration of its functionality. For detailed documentation on deploying each component, please refer to the README file associated with the terraform solution.
In general, users should undertake the following steps (please keep in mind that the referenced terraform solution automates all component creation for the demo VPC configuration):
Establish a cloud function to fetch flow log files from the COS Bucket and transfer them to the Log Analysis instance. The source code for this cloud function is available in the aforementioned IBM Cloud VPC flow logs project terraform solution.
It's important to note that using the example terraform solution in its default configuration will establish and configure a sample VPC flow network, a COS Bucket for sample log storage, a Log Analysis instance, and a cloud function to read from the COS Bucket and forward logs to the Log Analysis instance. For deployment in production environments, users should adhere to their organization's guidelines regarding IBM Cloud component deployment, such as using infrastructure-as-code tools, consulting IBM professional services, and so on. Production environments can also have multiple COS buckets pushing logs to the same Log Analysis instance.
For the purposes of the Devo collector, only one Log Analysis instance can be configured per collector. If your organization has multiple Log Analysis instances for VPC flow logs, you must configure multiple Devo collectors.
Once the Log Analysis instance is created, users will be able to fetch the necessary credentials:
service_key: The IBM Cloud Log Analysis instance service key. It can be found in the IBM Cloud console via: Observability > Logging > Select the Flow Log log collector instance > Open Dashboard > Settings > Organization > API Keys > Service Keys.
base_url: The IBM Cloud Flow Logs for VPC log collector API base URL. Select from the API endpoints available for your Log Analysis instance.
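Before deploying the collector, you may want to confirm that the service key and base URL can reach the export API. The snippet below is a minimal sanity-check sketch, not part of the collector: it assumes the service key is accepted as the HTTP basic-auth username (as in IBM's Log Analysis export examples) and uses a placeholder us-south base URL that you should replace with your instance's regional endpoint.

```python
# Minimal sanity check for the service_key and base_url settings.
# Assumptions: the service key works as the HTTP basic-auth username and
# the base URL below is replaced with your instance's regional endpoint.
import time

import requests

BASE_URL = "https://api.us-south.logging.cloud.ibm.com"  # replace with your region
SERVICE_KEY = "<your-service-key>"

now = int(time.time())
response = requests.get(
    f"{BASE_URL}/v2/export",
    params={"from": now - 300, "to": now},  # last five minutes of logs
    auth=(SERVICE_KEY, ""),
    timeout=30,
)

# 200 indicates the credentials and base URL are valid; 401 suggests a bad key.
print(response.status_code)
print(response.text[:500])
```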
This minimum configuration refers exclusively to those specific parameters of this integration. There are more required parameters related to the generic behavior of the collector. Check the setting sections for details.
Accepted authentication methods
Authentication method: Service key
Service key: Required
Base URL: Required
Run the collector
Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector in your own machine using a Docker image (On-premise collector).
Collector services detail
This section is intended to explain how to proceed with specific actions for services.
flow_log
Verify data collection
Internal process and deduplication method
All flow log records are fetched via the v2 Export API and filtered/ordered by their created timestamp. The collector continually pulls new events since the last recorded timestamp. A unique hash value is computed for each event and used for deduplication purposes to ensure events are not fetched multiple times in subsequent pulls.
Please note: the collector fetches logs from a Log Analysis instance. Log Analysis can house many different log types. When fetching logs, the collector will attempt to identify a `vpc_crn` property key in the log to determine if the log is a VPC flow log. If this key does not exist, the collector will skip that log. For the purposes of statistics tracking in the collector log output, non-VPC flow logs are not counted as events received or events filtered.
If your collector logs indicate that the collector is running successfully but processing 0 valid flow logs, ensure that the base URL and service key you provided correspond to a Log Analysis instance that contains valid VPC flow logs. For example, if you provide the base URL and service key of an IBM Cloud Activity Tracker instance, the collector will run successfully but will never fetch valid VPC flow logs.
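The following sketch summarizes this filtering and deduplication logic in simplified form. It is illustrative only and does not reflect the collector's actual implementation; in particular, the hashing scheme shown here is an assumption.

```python
# Simplified illustration of the vpc_crn filter and hash-based deduplication
# described above; the actual collector logic and hash scheme may differ.
import hashlib
import json

seen_hashes: set[str] = set()  # hashes persisted from previous pull cycles


def select_new_flow_logs(lines: list[dict]) -> list[dict]:
    """Return only previously unseen VPC flow logs from a batch of log lines."""
    new_events = []
    for line in lines:
        # Lines without a vpc_crn property are not VPC flow logs and are
        # skipped (they are not counted as received or filtered events).
        if "vpc_crn" not in line:
            continue
        # A deterministic hash of the event identifies duplicates returned
        # in overlapping time windows.
        digest = hashlib.sha256(
            json.dumps(line, sort_keys=True).encode()
        ).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        new_events.append(line)
    return new_events
```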
Devo categorization and destination
All events of this service are ingested into the table cloud.ibm.vpc.flow_log
Setup output
A successful run has the following output messages for the setup module:
2023-08-31T09:30:01.135 INFO InputProcess::MainThread -> EventPullerSetup(unknown,ibm_cloud_flow_log#10001,flow_log#predefined) -> Starting thread
2023-08-31T09:30:01.137 INFO InputProcess::EventPullerSetup(unknown,ibm_cloud_flow_log#10001,flow_log#predefined) -> Testing fetch from /v2/export.
2023-08-31T09:30:01.794 INFO InputProcess::EventPullerSetup(unknown,ibm_cloud_flow_log#10001,flow_log#predefined) -> Successfully tested fetch from /v2/export. Source is pullable.
2023-08-31T09:30:01.794 INFO InputProcess::EventPullerSetup(unknown,ibm_cloud_flow_log#10001,flow_log#predefined) -> Setup for module <EventPuller> has been successfully executed
Puller output
2023-08-31T09:30:02.142 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Running the persistence upgrade steps
2023-08-31T09:30:02.143 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Running the persistence corrections steps
2023-08-31T09:30:02.143 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Running the persistence corrections steps
2023-08-31T09:30:02.143 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> No changes were detected in the persistence
2023-08-31T09:30:02.144 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) Finalizing the execution of pre_pull()
2023-08-31T09:30:02.144 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Starting data collection every 60 seconds
2023-08-31T09:30:02.144 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Pull Started
2023-08-31T09:30:02.145 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Fetching all event logs via params={'from': 1693485435, 'to': 1693488602, 'prefer': 'head'}
2023-08-31T09:30:02.908 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Sending 183 event(s) to my.app.ibm.cloud.flow_log
2023-08-31T09:30:02.932 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> No more pagination_id values returned. Setting pull_completed to True.
2023-08-31T09:30:02.935 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Updating the persistence
2023-08-31T09:30:02.936 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1693488602141):Number of requests made: 1; Number of events received: 185; Number of duplicated events filtered out: 2; Number of events generated and sent: 183; Average of events per second: 231.194.
2023-08-31T09:30:02.936 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Statistics for this pull cycle (@devo_pulling_id=1693488602141):Number of requests made: 1; Number of events received: 185; Number of duplicated events filtered out: 2; Number of events generated and sent: 183; Average of events per second: 231.142.
After a successful collector’s execution (that is, no error logs found), you will see the following log message:
2023-08-31T09:30:02.936 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> The data is up to date!
2023-08-31T09:30:02.936 INFO InputProcess::EventPuller(ibm_cloud_flow_log,10001,flow_log,predefined) -> Data collection completed. Elapsed time: 0.795 seconds. Waiting for 59.205 second(s) until the next one
Restart the persistence
This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:
Edit the configuration file.
Change the value of the start_time_in_utc parameter to a different one.
Save the changes.
Restart the collector.
The collector will detect this change and will restart the persistence using the parameters of the configuration file or the default configuration in case it has not been provided.
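Before restarting, it can help to confirm that the new start_time_in_utc value is both parseable and in the past, since invalid values raise the InitVariablesError entries listed in the troubleshooting table below. The following sketch assumes an ISO 8601 value; check your configuration guide for the formats the collector actually accepts.

```python
# Quick check that a candidate start_time_in_utc value is parseable and in
# the past. ISO 8601 is assumed here; the collector may accept other formats.
from datetime import datetime, timezone

candidate = "2023-08-01T00:00:00Z"

parsed = datetime.fromisoformat(candidate.replace("Z", "+00:00"))
if parsed >= datetime.now(timezone.utc):
    raise ValueError("start_time_in_utc must be in the past")
print(f"OK: collection would restart from {parsed.isoformat()}")
```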
Troubleshooting
This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.
Error type: InitVariablesError
Error ID: 1
Error message: Invalid start_time_in_utc: {ini_start_str}. Must be in parseable datetime format.
Cause: The configured start_time_in_utc parameter is in a non-parseable format.
Solution: Update the start_time_in_utc value to the recommended format as indicated in the guide.

Error type: InitVariablesError
Error ID: 2
Error message: Invalid start_time_in_utc: {ini_start_str}. Must be in the past.
Cause: The configured start_time_in_utc parameter is a future date.
Solution: Update the start_time_in_utc value to a past datetime.

Error type: SetupError
Error ID: 101
Error message: Failed to fetch OAuth token from {token_endpoint}. Exception: {e}.
Cause: The provided credentials, base URL, and/or token endpoint are incorrect.
Solution: Revisit the configuration steps and ensure that the correct values were specified in the config file.

Error type: SetupError
Error ID: 102
Error message: Failed to fetch data from {endpoint}. Source is not pullable.
Cause: The provided credentials, base URL, and/or token endpoint are incorrect.
Solution: Revisit the configuration steps and ensure that the correct values were specified in the config file.

Error type: ApiError
Error ID: 401
Error message: Error during API call to [API provider HTML error response here]
Cause: The server returned an HTTP 401 response.
Solution: Ensure that the provided credentials are correct and provide read access to the targeted data.

Error type: ApiError
Error ID: 429
Error message: Too many concurrent requests.
Cause: IBM Cloud is reporting that too many simultaneous requests are being made against the Log Analysis instance.
Solution: This error can happen when a user frequently restarts the collector manually or otherwise queries the Log Analysis instance while the collector is running. In practice, the error should correct itself within 15 minutes of the original report as long as simultaneous query requests cease. If the collector continues to report this error after 15 minutes, ensure that there is not another script or user also making API requests to the Log Analysis instance. The Log Analysis concurrency limit is determined by your instance configuration and tier.

Error type: ApiError
Error ID: 498
Error message: Error during API call to [API provider HTML error response here]
Cause: The server returned an HTTP 500 response.
Solution: If the API returns a 500 but successfully completes subsequent runs, you may ignore this error. If the API repeatedly returns a 500 error, ensure the server is reachable and operational.
Collector operations
This section is intended to explain how to proceed with specific operations of this collector.
Verify collector operations
Initialization
The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services, as well as validating the given configuration.
A successful run has the following output messages for the initializer module:
2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Loading configuration using the following files: {"full_config": "config.yaml", "job_config_loc": null, "collector_config_loc": null}
2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Using the default location for "job_config_loc" file: "/etc/devo/job/job_config.json"
2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> "\etc\devo\job" does not exists
2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> Using the default location for "collector_config_loc" file: "/etc/devo/collector/collector_config.json"
2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> "\etc\devo\collector" does not exists
2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> Results of validation of config files parameters: {"config": "config.yaml", "config_validated": True, "job_config_loc": "/etc/devo/job/job_config.json", "job_config_loc_default": True, "job_config_loc_validated": False, "collector_config_loc": "/etc/devo/collector/collector_config.json", "collector_config_loc_default": True, "collector_config_loc_validated": False}
2023-01-10T15:22:57.171 WARNING MainProcess::MainThread -> [WARNING] Illegal global setting has been ignored -> multiprocessing: False
Events delivery and Devo ingestion
The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.
A successful run has the following output messages for the event delivery module:
The Integrations Factory Collector SDK has 3 different sender services, depending on the event type to deliver (internal, standard, and lookup). This collector uses the following sender services:
internal_senders: In charge of delivering internal metrics to Devo, such as logging traces or metrics.
standard_senders: In charge of delivering pulled events to Devo.
Sender statistics
Each service displays its own performance statistics that allow checking how many events have been delivered to Devo by type:
Logging trace: Number of available senders: 1
Description: Displays the number of concurrent senders available for the given sender service.

Logging trace: sender manager internal queue size: 0
Description: Displays the items available in the internal sender queue.

Logging trace: Standard - Total number of messages sent: 57, messages sent since "2023-01-10 16:09:16.116750+00:00": 0 (elapsed 0.000 seconds)
Description: Displays the number of events sent since the last checkpoint. In this example, 57 events were sent to Devo since the collector started, the last checkpoint timestamp was 2023-01-10 16:09:16.116750+00:00, and 0 events were sent between that checkpoint and now (elapsed 0.000 seconds).
Check memory usage
To check the memory usage of this collector, look for the following log records in the collector output. They are displayed every 5 minutes by default, always after the memory-freeing process runs.
The used memory is displayed per running process, and the sum of both values gives the total memory used by the collector.
The global pressure of the available memory is displayed in the global value.
All metrics (Global, RSS, VMS) show the values before and after freeing memory, in the format previous -> after.