Purpose
The Microsoft Azure collector retrieves data from Azure cloud computing services. Common uses include:
Detect malicious Microsoft Entra ID (formerly Azure Active Directory) authentication
Detect malicious role, policy, and group changes impacting cloud infrastructure
Correlate risky users identified by Entra ID with data you have in Devo
Detect malicious Application Gateway traffic
Detect failures and measure costs of virtual machines
...
Features | Details
---|---
Allow parallel downloading (multipod) |
Running environments | on-premise
Populated Devo events |
Flattening pre-processing |
Allowed source events obfuscation |

Note |
---|
The vm_metrics service cannot work in multipod mode. If you want to use the event_hubs service in multipod mode, do not include the vm_metrics service in the same collector. |
Data source description
...
Data source
...
Description
...
API endpoint
...
Collector service name
...
Devo table
...
VM Metrics
...
Using the Microsoft Azure API, the collector obtains metrics about the deployed virtual machines and gathers them in Devo, making them easier to query and analyze in the Devo platform and Activeboards.
...
Azure Compute Management Client SDK and Azure Monitor Management Client SDK
...
vm_metrics
...
cloud.azure.vm.metrics_simple
...
Event Hubs
...
Several Microsoft Azure services can send execution information to an Event Hub service (see next section).
...
Azure Event Hubs SDK
...
event_hubs
and event_hubs_autodiscover
...
<auto_tag_description>
Info |
---|
Valid for all cloud.azure tables by setting the output option to stream to Event Hub. |
Event hubs: Auto-categorization of Microsoft Azure service messages
Many of the available Microsoft Azure services can generate execution information that is sent to an Event Hub service. This data can be categorized as events or metrics. Events, in turn, can be of different subtypes: audits, status, logs, etc.
All such data is gathered by Devo's Microsoft Azure collector and sent to our platform, where the message auto-categorization functionality routes the messages to the relevant Devo tables automatically.
Although Event Hubs is the service used for centralizing Azure services' data, it also generates information that can be sent to itself. Learn more in this article.
Note |
---|
If the amount of egress data exceeds the throughput-unit limits set by Azure (2 MB/s or 4,096 events per second per unit), Devo cannot continue reliable ingestion of the data. You can monitor ingress/egress throughput on the Event Hub namespace in the Azure Portal and, based on trends or alerts, add another Event Hub to resolve this. To prevent this from happening in the first place, follow the scalability guidance provided by Microsoft in their technical documentation. |
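As a rough sizing aid (not part of the collector), the number of throughput units a stream needs can be estimated from the standard per-unit limits, assuming 1 MB/s or 1,000 events/s ingress per unit:

```python
import math

def required_tus(mb_per_s: float, events_per_s: float) -> int:
    """Estimate Event Hubs Throughput Units (TUs) needed for an ingress
    stream, using the standard limits of 1 MB/s or 1,000 events/s per TU."""
    by_size = mb_per_s / 1.0          # TUs needed for the byte rate
    by_count = events_per_s / 1000.0  # TUs needed for the event rate
    return max(1, math.ceil(max(by_size, by_count)))

print(required_tus(3.5, 2500))  # 4 (the byte rate dominates here)
```

If the estimate regularly exceeds the namespace's provisioned throughput units, split the stream across additional event hubs as suggested above.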
Vendor setup
The Microsoft Azure collector centralizes the data with an Event Hub using the Azure SDK. To use it, you need to configure the resources in the Azure Portal and set the right permissions to access the information.
Event Hub events
Expand | ||
---|---|---|
| ||
If you want to use Azure Blob Storage for checkpointing purposes, you need to create a storage account to store the checkpoints. If you do not wish to use Azure Blob Storage (i.e. you will use Devo local persistence), you can skip the Blob Storage configuration steps.
Connection string
Role assignment
Alternatively, users can grant the necessary permissions to the registered application to access the storage account without using a connection string. Repeat steps 1-2 from the Connection String section to create the Storage Account.
|
Expand | ||
---|---|---|
| ||
Connection string
Users can either obtain a connection string or use role assignments to allow the collector to access the Event Hub.
Role assignment
Alternatively, users can grant the necessary permissions to the registered application to access the Event Hub without using a connection string. Repeat all steps except the last one from the previous section to create the Event Hub.
|
Event Hub Auto-Discover
Expand | ||
---|---|---|
| ||
About the feature
To configure access to event hubs for the auto-discovery feature, you need to grant the necessary permissions to the registered application to access the Event Hub without using a connection string.
Role assignment (Namespace)
Repeat the steps from the Event Hubs Role Assignment section, except that the necessary role is the Azure Event Hubs Data Owner role assigned at the namespace level. This allows the collector to enumerate the event hubs in the namespace and create consumer groups if necessary. |
Minimum configuration for basic pulling
Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.
Info |
---|
This minimum configuration refers exclusively to those specific parameters of this integration. There are more required parameters related to the generic behavior of the collector. Check setting sections for details. |
...
Setting
...
Details
...
tenant_id
...
The Azure application tenant ID.
...
client_id
...
The Azure application client ID.
...
client_secret
...
The Azure application client secret.
...
subscription_id
...
The Azure application subscription ID.
Info |
---|
For Azure Event Hub, the event hub name and the connection string (and optionally a consumer group) are enough. No credentials are required. |
Accepted authentication methods
Authentication method | Tenant ID | Client ID | Client secret | Subscription ID
---|---|---|---|---
OAuth2 | Required | Required | Required | Required
Run the collector
Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector in your own machine using a Docker image (On-premise collector).
...
Rw tab | ||
---|---|---|
|
We use a piece of software called Collector Server to host and manage all our available collectors.
To enable the collector for a customer:
In the Collector Server GUI, access the domain in which you want this instance to be created
Click Add Collector and find the one you wish to add.
In the Version field, select the latest value.
In the Collector Name field, set the value you prefer (this name must be unique inside the same Collector Server domain).
In the sending method, select Direct Send. Direct Send configuration is optional for collectors that create Table events, but mandatory for those that create Lookups.
In the Parameters section, establish the Collector Parameters as shown below:
Editing the JSON configuration
Code Block |
---|
{
"global_overrides": {
"debug": false
},
"inputs": {
"azure": {
"id": "<short_unique_id>",
"enabled": true,
"credentials": {
"subscription_id": "<subscription_id_value>",
"client_id": "<client_id_value>",
"client_secret": "<client_secret_value>",
"tenant_id": "<tenant_id_value>"
},
"environment": "<environment_value>",
"services": {
"vm_metrics": {
"request_period_in_seconds": "<request_period_in_seconds_value>",
"start_time_in_utc": "<start_time_in_utc_value>",
"include_resource_id_patterns": [
"<include_resource_id_patterns_values>"
],
"exclude_resource_id_patterns": [
"<exclude_resource_id_patterns_values>"
]
}
}
},
"azure_event_hub": {
"id": "<short_unique_id>",
"enabled": true,
"credentials": {
"subscription_id": "<subscription_id_value>",
"client_id": "<client_id_value>",
"client_secret": "<client_secret_value>",
"tenant_id": "<tenant_id_value>"
},
"environment": "<environment_value>",
"services": {
"event_hubs": {
"override_pull_report_frequency_seconds": "<override_pull_report_frequency_seconds_value>",
"override_consumer_client_ttl_seconds": "<override_consumer_client_ttl_seconds_value>",
"queues": {
"<queue_name_value>": {
"namespace": "<namespace_value>",
"event_hub_name": "<event_hub_name_value>",
"event_hub_connection_string": "<event_hub_connection_string_value>",
"consumer_group": "<consumer_group_value>",
"blob_storage_connection_string": "<blob_storage_connection_string_value>",
"blob_storage_container_name": "<blob_storage_container_name_value>",
"blob_storage_account_name": "<blob_storage_account_name_value>",
"compatibility_version": "<compatibility_version_value>",
"duplicated_messages_mechanism": "<duplicated_messages_mechanism_value>",
"override_starting_position": "<override_starting_position_value>",
"override_tag": "<override_tag_value>",
"extend_tag": "<extend_tag_value>",
"client_thread_limit": "<client_thread_limit_value>",
"uamqp_transport": "<uamqp_transport_value>",
"partition_ids": ["<partition_id>"]
}
}
},
"event_hubs_auto_discover": {
"resource_group": "<resource_group_value>",
"namespace": "<namespace_value>",
"blob_storage_account_name": "<blob_storage_account_name_value>",
"blob_storage_connection_string": "<blob_storage_connection_string_value>",
"consumer_group": "<consumer_group_value>",
"duplicated_messages_mechanism": "<duplicated_messages_mechanism_value>",
"override_pull_report_frequency_seconds": "<override_pull_report_frequency_seconds_value>",
"override_consumer_client_ttl_seconds": "<override_consumer_client_ttl_seconds_value>",
"override_starting_position": "<override_starting_position_value>",
"override_blob_storage_container_prefix": "<override_blob_storage_container_prefix_value>",
"client_thread_limit": "<client_thread_limit_value>",
"uamqp_transport": "<uamqp_transport_value>"
}
}
}
}
} |
The following table outlines the parameters available for configuring the collector. Each parameter is categorized by its necessity (mandatory or optional), data type, acceptable values or formats, and a brief description.
...
Parameter
...
Data type
...
Requirement
...
Value range / Format
...
Description
...
short_unique_id
...
str
...
Mandatory
...
Min length: 1, Max length: 5
...
Short, unique ID for input service, used in persistence addressing. Avoid duplicates to prevent collisions.
...
tenant_id_value
...
str
...
Mandatory
...
Min length: 1
...
Tenant ID for Azure authentication.
...
client_id_value
...
str
...
Mandatory
...
Min length: 1
...
Client ID for Azure authentication.
...
client_secret_value
...
str
...
Mandatory
...
Min length: 1
...
Client secret for Azure authentication.
...
subscription_id_value
...
str
...
Mandatory
...
Min length: 1
...
Azure subscription ID.
...
environment_value
...
str
...
Optional
...
Min length: 1
...
Differentiates environments (e.g., dev, prod). Remove if unused.
...
request_period_in_seconds_value
...
int
...
Optional
...
Min: 60
...
Custom period in seconds between data pulls, overriding default (300s).
...
start_time_in_utc_value
...
str
...
Optional
...
UTC datetime format: %Y-%m-%dT%H-%M-%SZ
...
Custom start date for data retrieval, for historical data download. Remove if unused.
...
include_resource_id_patterns_values
...
[str]
...
Optional
...
Glob patterns e.g., ["*VM-GROUP-1*"]
...
Includes resources matching patterns. Remove if unused.
...
exclude_resource_id_patterns_values
...
[str]
...
Optional
...
Glob patterns e.g., ["*VM-GROUP-1*"]
...
Excludes resources matching patterns. Remove if unused.
...
queue_name_value
...
str
...
Mandatory
...
Min length: 1
...
Name for the queue, appears in related logs.
...
event_hub_name_value
...
str
...
Mandatory
...
Min length: 1
...
Name of the Event Hub to pull events from.
...
event_hub_connection_string_value
...
str
...
Mandatory
...
Min length: 1
...
Connection string for the Event Hub.
...
consumer_group_value
...
str
...
Optional
...
Min length: 1, Default: $Default
...
Consumer group for the Event Hub. Defaults to $Default
.
...
events_use_autocategory_value
...
bool
...
Optional
...
Default: true
...
Enables auto-tagging of events. From version 2.4 onwards, auto-tagging is always enabled, so this value is always true.
...
blob_storage_connection_string_value
...
str
...
Optional
...
Min length: 1
...
Connection string for blob storage, optional for Azure Blob Storage checkpointing.
...
blob_storage_container_name_value
...
str
...
Optional
...
Min length: 1
...
Blob storage container name, required if using Azure Blob Storage checkpointing.
...
blob_storage_account_name_value
...
str
...
Optional
...
Min length: 1
...
Blob storage account name, alternative to using connection string for checkpointing.
...
compatibility_version_value
...
str
...
Optional
...
Version strings
...
Compatibility version for event processing.
...
duplicated_messages_mechanism_value
...
str
...
Optional
...
One of: "local"
, "global"
, "none"
...
Deduplication mechanism for messages: local, global, or none.
...
override_starting_position_value
...
str
...
Optional
...
One of: "-1"
, "@latest"
, "[UTC datetime value]"
...
Starting position for event fetching: from the beginning of the available data (-1), from the latest data (@latest), or from a specific datetime (%Y-%m-%dT%H-%M-%SZ format).
...
override_tag_value
...
str
...
Optional
...
Tag-friendly string
...
Optional tag to override the default tagging mechanism. See Event Hubs Tagging Configuration.
...
extend_tag_value
...
str
...
Optional
...
Object that can include any of the following properties: default_tag, tag_map, jmespath_refs
...
Advanced feature. Allows users to add/update various properties of the tag. If the user utilized override_tag
and configured a simple tag string, this parameter will have no effect. If supplied, default_tag
overrides the default tag, jmespath_refs
adds/updates jmespath substitution values, and tag_map
will add/update various tag paths to the pre-existing tag map. See Event Hubs Tagging Configuration.
...
override_pull_report_frequency_seconds_value
...
int
...
Optional
...
Default: 60
...
Frequency in seconds for reporting pull statistics in logs.
...
override_consumer_client_ttl_seconds_value
...
int
...
Optional
...
Default varies by service
...
Time-to-live in seconds for consumer clients, after which the collector restarts the pull cycle.
...
resource_group_value
...
str
...
Mandatory
...
Min length: 1
...
Azure resource group for event hub discovery.
...
namespace_value
...
str
...
Mandatory
...
Min length: 1
...
Namespace within Azure for event hub discovery.
...
override_blob_storage_container_prefix_value
...
str
...
Optional
...
Min length: 3, Max length: 10; Default: devo-
...
Prefix for blob storage containers created by auto-discovery service. Remove if unused.
...
uamqp_transport_value
...
bool
...
Optional
...
Default: false
...
Allows users to override/force the Event Hub SDK to use the legacy uAMQP transport mechanism (true) instead of the default/current PyAMQP mechanism (false).
...
<partition_ids>
...
str
...
Optional
...
List of partition numbers, e.g. ["1","3","5","7"]
...
Defines which partitions this instance of the collector will connect to. It overrides client_thread_limit_value.
...
client_thread_limit_value
...
int
...
Optional
...
Min value: 1
...
Advanced feature - most users should use partition_ids
instead to explicitly define which partitions the collector instance will query. Number of consumer threads that the collector will create. By default, the collector creates as many threads as there are partitions in the event hub.
Info |
---|
Parameters marked as "Mandatory" are required for the collector's configuration. Optional parameters can be omitted or removed if not used, but they provide additional customization and control over the collector's behavior. |
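Note that the datetime parameters above (start_time_in_utc, override_starting_position) use a non-standard format with hyphens inside the time portion. A quick way to produce a valid value:

```python
from datetime import datetime, timezone

# The collector expects %Y-%m-%dT%H-%M-%SZ, i.e. hyphens (not colons)
# between the hour, minute, and second fields.
start = datetime(2024, 5, 1, 13, 30, 0, tzinfo=timezone.utc)
value = start.strftime("%Y-%m-%dT%H-%M-%SZ")
print(value)  # 2024-05-01T13-30-00Z
```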
Note |
---|
Local deduplication means that duplicates are removed from the data received by the current collector instance. Global means that duplicates are searched across all the instances of the collector. None means that duplicates are not removed. See more details in the section Internal Process and Deduplication Method within the Event Hubs section of the Collector Services Detail. If you deploy one collector, use local. If you deploy several instances of the collector, use global. |
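To illustrate the difference between the modes: a "local" mechanism only needs state inside the current collector instance, which the following sketch models with an in-memory set (the collector's actual implementation and hashing scheme are assumptions):

```python
import hashlib

class LocalDeduplicator:
    """Illustrative 'local' mode: drop events already seen by this
    collector instance. A 'global' mechanism would need state shared
    across all instances; 'none' forwards every event unchanged."""
    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, raw_event: bytes) -> bool:
        digest = hashlib.sha256(raw_event).hexdigest()
        if digest in self._seen:
            return False        # duplicate: drop
        self._seen.add(digest)
        return True             # first sighting: forward

dedup = LocalDeduplicator()
print(dedup.accept(b'{"id": 1}'))  # True
print(dedup.accept(b'{"id": 1}'))  # False (duplicate dropped)
```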
Note |
---|
|
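The include_resource_id_patterns / exclude_resource_id_patterns settings in the tables above take glob patterns. A sketch of how such filtering could behave (the collector's exact precedence between include and exclude is an assumption):

```python
from fnmatch import fnmatch

def keep_resource(resource_id: str,
                  include_patterns=None,
                  exclude_patterns=None) -> bool:
    # A resource is kept when it matches at least one include pattern
    # (or no include list is configured) and matches no exclude pattern.
    if include_patterns and not any(fnmatch(resource_id, p) for p in include_patterns):
        return False
    if exclude_patterns and any(fnmatch(resource_id, p) for p in exclude_patterns):
        return False
    return True

print(keep_resource("/subscriptions/s1/VM-GROUP-1/vm01", ["*VM-GROUP-1*"]))  # True
print(keep_resource("/subscriptions/s1/VM-GROUP-2/vm01", ["*VM-GROUP-1*"]))  # False
```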
Rw tab | ||
---|---|---|
|
This data collector can be run on any machine that has the Docker service available, because it is executed as a Docker container. The following sections explain how to prepare the required setup to get the data collector running.
Structure
The following directory structure will be required as part of the setup procedure (it can be created under any directory):
Code Block |
---|
<any_directory>
└── devo-collectors/
└── azure/
├── certs/
│ ├── chain.crt
│ ├── <your_domain>.key
│ └── <your_domain>.crt
├── state/
└── config/
└── config-azure.yaml |
Devo credentials
In Devo, go to Administration → Credentials → X.509 Certificates, download the Certificate, Private key and Chain CA and save them in <any_directory>/devo-collectors/azure/certs
. Learn more about security credentials in Devo here.
...
Editing the config.yaml file
In the config-azure.yaml file, replace the <app_id>
, <active_directory_id>
, <subscription_id>
and <secret>
values with the ones you obtained in the previous steps. In the <short_unique_identifier>
placeholder, enter a value of your choice.
Code Block | ||
---|---|---|
| ||
globals:
debug: false
id: <collector_id_value>
name: <collector_name_value>
persistence:
type: filesystem
config:
directory_name: state
outputs:
devo_1:
type: devo_platform
config:
address: <devo_address>
port: 443
type: SSL
chain: <chain_filename>
cert: <cert_filename>
key: <key_filename>
inputs:
azure:
id: <short_unique_id>
enabled: true
credentials:
subscription_id: <subscription_id_value>
client_id: <client_id_value>
client_secret: <client_secret_value>
tenant_id: <tenant_id_value>
environment: <environment_value>
services:
vm_metrics:
request_period_in_seconds: <request_period_in_seconds_value>
start_time_in_utc: <start_time_in_utc_value>
include_resource_id_patterns: [<include_resource_id_patterns_values>]
exclude_resource_id_patterns: [<exclude_resource_id_patterns_values>]
azure_event_hub:
id: <short_unique_id>
enabled: true
credentials:
subscription_id: <subscription_id_value>
client_id: <client_id_value>
client_secret: <client_secret_value>
tenant_id: <tenant_id_value>
environment: <environment_value>
services:
event_hubs:
override_pull_report_frequency_seconds: <override_pull_report_frequency_seconds_value>
override_consumer_client_ttl_seconds: <override_consumer_client_ttl_seconds_value>
queues:
<queue_name_value>:
namespace: <namespace_value>
event_hub_name: <event_hub_name_value>
event_hub_connection_string: <event_hub_connection_string_value>
consumer_group: <consumer_group_value>
events_use_auto_category: <events_use_auto_category_value>
blob_storage_connection_string: <blob_storage_connection_string_value>
blob_storage_container_name: <blob_storage_container_name_value>
blob_storage_account_name: <blob_storage_account_name_value>
compatibility_version: <compatibility_version_value>
duplicated_messages_mechanism: <duplicated_messages_mechanism_value>
override_starting_position: <override_starting_position_value>
override_tag: <override_tag_value>
client_thread_limit: <client_thread_limit_value>
uamqp_transport: <uamqp_transport_value>
partition_ids: [<partition_id>]
event_hubs_auto_discover:
resource_group: <resource_group_value>
namespace: <namespace_value>
blob_storage_account_name: <blob_storage_account_name_value>
blob_storage_connection_string: <blob_storage_connection_string_value>
consumer_group: <consumer_group_value>
events_use_auto_category: <events_use_auto_category_value>
duplicated_messages_mechanism: <duplicated_messages_mechanism_value>
override_pull_report_frequency_seconds: <override_pull_report_frequency_seconds_value>
override_consumer_client_ttl_seconds: <override_consumer_client_ttl_seconds_value>
override_starting_position: <override_starting_position_value>
override_blob_storage_container_prefix: <override_blob_storage_container_prefix_value>
client_thread_limit: <client_thread_limit_value>
uamqp_transport: <uamqp_transport_value> |
...
Parameter
...
Data type
...
Requirement
...
Value range / Format
...
Description
...
collector_id_value
...
str
...
Mandatory
...
Min length: 1, Max length: 5
...
Unique identifier for the collector.
...
collector_name_value
...
str
...
Mandatory
...
Min length: 1, Max length: 10
...
Name assigned to the collector.
...
devo_address
...
str
...
Mandatory
...
One of: collector-us.devo.io
, collector-eu.devo.io
...
Devo Cloud destination for events.
...
chain_filename
...
str
...
Mandatory
...
Min length: 4, Max length: 20
...
Filename of the chain.crt
file from your Devo domain.
...
cert_filename
...
str
...
Mandatory
...
Min length: 4, Max length: 20
...
Filename of the file.cert
from your Devo domain.
...
key_filename
...
str
...
Mandatory
...
Min length: 4, Max length: 20
...
Filename of the file.key
from your Devo domain.
...
short_unique_id
...
str
...
Mandatory
...
Min length: 1, Max length: 5
...
Short, unique ID for input service, used in persistence addressing. Avoid duplicates to prevent collisions.
...
tenant_id_value
...
str
...
Mandatory
...
Min length: 1
...
Tenant ID for Azure authentication.
...
client_id_value
...
str
...
Mandatory
...
Min length: 1
...
Client ID for Azure authentication.
...
client_secret_value
...
str
...
Mandatory
...
Min length: 1
...
Client secret for Azure authentication.
...
subscription_id_value
...
str
...
Mandatory
...
Min length: 1
...
Azure subscription ID.
...
environment_value
...
str
...
Optional
...
Min length: 1
...
Differentiates environments (e.g., dev, prod). Remove if unused.
...
request_period_in_seconds_value
...
int
...
Optional
...
Min: 60
...
Custom period in seconds between data pulls, overriding default (300s).
...
start_time_in_utc_value
...
str
...
Optional
...
UTC datetime format: %Y-%m-%dT%H-%M-%SZ
...
Custom start date for data retrieval, for historical data download. Remove if unused.
...
include_resource_id_patterns_values
...
[str]
...
Optional
...
Glob patterns e.g., ["*VM-GROUP-1*"]
...
Includes resources matching patterns. Remove if unused.
...
exclude_resource_id_patterns_values
...
[str]
...
Optional
...
Glob patterns e.g., ["*VM-GROUP-1*"]
...
Excludes resources matching patterns. Remove if unused.
...
queue_name_value
...
str
...
Mandatory
...
Min length: 1
...
Name for the queue, appears in related logs.
...
event_hub_name_value
...
str
...
Mandatory
...
Min length: 1
...
Name of the Event Hub to pull events from.
...
event_hub_connection_string_value
...
str
...
Mandatory
...
Min length: 1
...
Connection string for the Event Hub.
...
consumer_group_value
...
str
...
Optional
...
Min length: 1, Default: $Default
...
Consumer group for the Event Hub. Defaults to $Default
.
...
events_use_autocategory_value
...
bool
...
Optional
...
Default: false
...
Enables/disables auto-tagging of events. From version 2.4 onwards, auto-tagging is always enabled and this value is ignored.
...
blob_storage_connection_string_value
...
str
...
Optional
...
Min length: 1
...
Connection string for blob storage, optional for Azure Blob Storage checkpointing.
...
blob_storage_container_name_value
...
str
...
Optional
...
Min length: 1
...
Blob storage container name, required if using Azure Blob Storage checkpointing.
...
blob_storage_account_name_value
...
str
...
Optional
...
Min length: 1
...
Blob storage account name, alternative to using connection string for checkpointing.
...
compatibility_version_value
...
str
...
Optional
...
Version strings
...
Compatibility version for event processing.
...
duplicated_messages_mechanism_value
...
str
...
Optional
...
One of: "local"
, "global"
, "none"
...
Deduplication mechanism for messages: local, global, or none (see note below).
...
override_starting_position_value
...
str
...
Optional
...
One of: "-1"
, "@latest"
, "[UTC datetime value]"
...
Starting position for event fetching: from the beginning of the available data (-1), from the latest data (@latest), or from a specific datetime (%Y-%m-%dT%H-%M-%SZ format).
...
override_tag_value
...
str
...
Optional
...
Tag-friendly string
...
Optional tag to override the default tagging mechanism. See Event Hubs Tagging Configuration.
...
extend_tag_value
...
str
...
Optional
...
Object that can include any of the following properties: default_tag
, tag_map
, jmespath_refs
.
...
Advanced feature. Allows users to add/update various properties of the tag. If the user utilized override_tag
and configured a simple tag string, this parameter will have no effect. If supplied, default_tag
overrides the default tag, jmespath_refs
add/update jmespath substitution values, and tag_map
will add/update various tag paths to the pre-existing tag map. See Event Hubs Tagging Configuration.
...
override_pull_report_frequency_seconds_value
...
int
...
Optional
...
Default: 60
...
Frequency in seconds for reporting pull statistics in logs.
...
override_consumer_client_ttl_seconds_value
...
int
...
Optional
...
Default varies by service
...
Time-to-live in seconds for consumer clients, after which the collector restarts the pull cycle.
...
resource_group_value
...
str
...
Mandatory
...
Min length: 1
...
Azure resource group for event hub discovery.
...
namespace_value
...
str
...
Mandatory
...
Min length: 1
...
Namespace within Azure for event hub discovery.
...
override_blob_storage_container_prefix_value
...
str
...
Optional
...
Min length: 3, Max length: 10; Default: devo-
...
Prefix for blob storage containers created by auto-discovery service. Remove if unused.
...
uamqp_transport_value
...
bool
...
Optional
...
Default: false
...
Allows users to override/force the Event Hub SDK to use the legacy uAMQP transport mechanism (true) instead of the default/current PyAMQP mechanism (false).
...
<partition_ids>
...
str
...
Optional
...
List of partition numbers, e.g. ["1","3","5","7"]
...
Defines which partitions this instance of the collector will connect to. It overrides client_thread_limit_value.
...
client_thread_limit_value
...
int
...
Optional
...
Min value: 1
...
Advanced feature - most users should use partition_ids
instead to explicitly define which partitions the collector instance will query. Number of consumer threads that the collector will create. By default, the collector creates as many threads as there are partitions in the event hub.
Info |
---|
Parameters marked as "Mandatory" are required for the collector's configuration. Optional parameters can be omitted or removed if not used, but they provide additional customization and control over the collector's behavior. |
Note |
---|
Local deduplication means that duplicates are removed from the data received by the current collector instance. Global means that duplicates are searched across all the instances of the collector. None means that duplicates are not removed. See more details in the section Internal Process and Deduplication Method. If you deploy one collector, use local. If you deploy several instances of the collector, use global. |
Note |
---|
|
Download the Docker image
The collector should be deployed as a Docker container. Download the Docker image of the collector as a .tgz file by clicking the link in the following table:
Collector Docker image | SHA-256 hash |
---|---|
|
Use the following command to add the Docker image to the system:
Code Block |
---|
gunzip -c collector-azure-docker-image-<version>.tgz | docker load |
Info |
---|
Once the Docker image is imported, it will show the real name of the Docker image (including version info). |
The Docker image can be deployed on the following services:
...
Execute the following command on the root directory <any_directory>/devo-collectors/azure/
Code Block |
---|
docker run \
--name collector-azure \
--volume $PWD/certs:/devo-collector/certs \
--volume $PWD/config:/devo-collector/config \
--volume $PWD/state:/devo-collector/state \
--env CONFIG_FILE=config-azure.yaml \
--rm -it docker.devo.internal/collector/azure:<version> |
Note |
---|
Replace |
...
The following Docker Compose file can be used to execute the Docker container. It must be created in the <any_directory>/devo-collectors/azure/
directory.
Code Block | ||
---|---|---|
| ||
version: '3'
services:
collector-azure:
image: docker.devo.internal/collector/azure:${IMAGE_VERSION:-latest}
container_name: collector-azure
volumes:
- ./certs:/devo-collector/certs
- ./config:/devo-collector/config
- ./state:/devo-collector/state
environment:
- CONFIG_FILE=${CONFIG_FILE:-config-azure.yaml} |
To run the container using docker-compose, execute the following command from the <any_directory>/devo-collectors/azure/
directory:
Code Block |
---|
IMAGE_VERSION=<version> docker-compose up -d |
Note |
---|
Replace |
Collector services detail
This section explains how to perform specific actions for each service.
Expand | ||
---|---|---|
| ||
Internal process and deduplication method
All VM metrics data are pulled with a configured time grain value.
Devo categorization and destination
All events of this service are ingested into the table cloud.azure.vm.metrics_simple.
Restart the persistence
This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. If you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:
The collector will detect this change and will restart the persistence using the parameters of the configuration file or the default configuration in case it has not been provided. |
...
title | Event Hubs (event_hubs) |
---|
General principles
Understanding the following principles of Azure Event Hubs is crucial:
Consumer Groups: A single event hub can have multiple consumer groups, each representing a separate view of the event stream.
Checkpointing: The SDK supports checkpoint mechanisms to balance the load among consumers for the same event hub and consumer group. Supported mechanisms include:
Azure Blob Storage Checkpoint: Recommended to use one container per consumer group per event hub.
Partition Restrictions: Azure Event Hubs limits the number of partitions based on the event hub tier. For quotas and limits, refer to the official documentation.
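As a sketch of the "one container per consumer group per event hub" recommendation, a hypothetical naming helper could derive container names like this (this is not the collector's actual scheme; see override_blob_storage_container_prefix for the real prefix setting):

```python
import re

def checkpoint_container_name(event_hub: str,
                              consumer_group: str,
                              prefix: str = "devo-") -> str:
    """Build one blob container name per (event hub, consumer group)
    pair. Azure container names allow only lowercase letters, digits,
    and hyphens, up to 63 characters."""
    name = f"{prefix}{event_hub}-{consumer_group}".lower()
    name = re.sub(r"[^a-z0-9-]", "-", name)  # e.g. '$default' -> '-default'
    return name[:63]

print(checkpoint_container_name("insights-logs", "$Default"))
# devo-insights-logs--default
```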
Configuration options
Devo supports various configurations to cater to different Azure setups.
Event Hubs Tagging Configuration
Event Hubs supports multiple tagging parameters and formats to categorize and manage event data efficiently. Below are the configuration options for overriding, auto-categorizing, and extending tags.
The default configuration of the tag mapping can be found in this article.
Override tag
Note |
---|
Advanced setting. Please consult Devo support before using an advanced tag map. |
To customize the default tag behavior, users can configure the override_tag parameter within the Event Hub queue configuration. This parameter allows either a simple tag string or a more advanced tag mapping structure to be applied to all records.
The advanced tag map structure follows this format:
default_tag: A fallback tag applied to all records not matched by any tag_map entry.
tag_map: A list of tag entries, each containing a tag value and a JMESPath expression to match specific records.
jmespath_refs: Reference variables that can be used within JMESPath expressions in the tag_map. These act as reusable values within the tag map's matching logic.
Code Block |
---|
override_tag:
default_tag: "tag_value"
tag_map:
- tag: "tag_value"
jmespath: "[?condition]"
- tag: "tag_value"
jmespath: "[?condition]"
...
jmespath_refs:
jmespath_ref_1: "{jmespath_expression_1}"
jmespath_ref_2: "{jmespath_expression_2}"
... |
Code Block |
---|
"override_tag": {
"default_tag": "tag_value",
"tag_map": [
{
"tag": "tag_value",
"jmespath": "[?condition]"
},
{
"tag": "tag_value",
"jmespath": "[?condition]"
}
.......
],
"jmespath_refs": {
"jmespath_ref_1": "{jmespath_expression_1}",
"jmespath_ref_2": "{jmespath_expression_2}"
}
........
} |
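The resolution order can be pictured with a small sketch in which plain Python predicates stand in for the JMESPath expressions (the first-match-wins behavior shown here is an assumption about the collector's internals):

```python
def resolve_tag(event: dict, tag_map: list, default_tag: str) -> str:
    # Walk the tag map in order; the first entry whose predicate
    # matches the event supplies the tag, otherwise fall back.
    for entry in tag_map:
        if entry["match"](event):
            return entry["tag"]
    return default_tag

tag_map = [
    {"tag": "my.app.sql", "match": lambda e: e.get("category") == "sql"},
    {"tag": "my.app.eh.storage", "match": lambda e: e.get("category") == "storage"},
]
print(resolve_tag({"category": "sql"}, tag_map, "my.app.eh"))    # my.app.sql
print(resolve_tag({"category": "other"}, tag_map, "my.app.eh"))  # my.app.eh
```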
Auto-Category Tagging
Note |
---|
From version 2.4 onwards, Auto Category is always enabled. |
Auto-category automatically appends pre-defined tags to the default tag (or the override_tag
, if specified), enabling Azure events to be mapped dynamically to the appropriate Devo tag.
The system attempts to extract both the resource ID and the event category from the Azure event. If an event does not match any preconfigured tag mappings, it will be categorized under the following format: cloud.azure.{resource_id}.{category}.{queue_name}
.
Auto-category tags are evaluated before the default or override tags.
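The fallback tag format above can be illustrated with a short sketch. This is a hypothetical reimplementation for clarity only: the helper name `fallback_tag` and the resource-ID parsing are assumptions, not the collector's actual code.

```python
# Hypothetical sketch of the auto-category fallback tag described above.
# The helper name and the resource-ID parsing are illustrative assumptions,
# not the collector's implementation.

def fallback_tag(azure_event: dict, queue_name: str) -> str:
    """Build cloud.azure.{resource_id}.{category}.{queue_name} for events
    that match no preconfigured tag mapping."""
    # Azure resource IDs look like:
    # /SUBSCRIPTIONS/<sub>/RESOURCEGROUPS/<rg>/PROVIDERS/MICROSOFT.SQL/SERVERS/<name>
    resource_id = azure_event.get("resourceId", "unknown").rstrip("/").split("/")[-1].lower()
    category = azure_event.get("category", "unknown").lower()
    return f"cloud.azure.{resource_id}.{category}.{queue_name}"

event = {
    "resourceId": "/SUBSCRIPTIONS/abc/RESOURCEGROUPS/rg1/PROVIDERS/MICROSOFT.SQL/SERVERS/MYDB",
    "category": "SQLSecurityAuditEvents",
}
print(fallback_tag(event, "queue_a"))
# cloud.azure.mydb.sqlsecurityauditevents.queue_a
```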
Extend tag
Users can further customize tags by using the extend_tag
parameter in the Event Hub queue configuration. This feature allows for the extension or updating of various tag properties. If override_tag
is being used, the extend_tag
will modify it; otherwise, it will extend the default tag.
The extend_tag
parameter offers the following options:
default_tag
: Replaces the existing default tag.
jmespath_refs
: Adds or updates JMESPath substitution values.
tag_map
: Adds or updates entries in the existing tag map. If an extend_tag entry matches an existing tag or JMESPath expression, that entry is replaced; otherwise, the new entry is appended.
Here is an example of extend_tag
configuration:
Note |
---|
Please note that the actual internal tag structure is not displayed in this guide as it is subject to change. |
Code Block |
---|
extend_tag:
default_tag: "new_tag"
tag_map:
- tag: "my.app.sql"
jmespath: "[?category=='sql']"
- tag: "my.app.eh.storage"
jmespath: "[?category=='storage']"
...
jmespath_refs:
jmespath_ref_1: "{jmespath_expression_1}"
jmespath_ref_2: "{jmespath_expression_2}"
... |
Code Block |
---|
"extend_tag": {
"default_tag": "new_tag",
"tag_map": [
{
"tag": "my.app.sql",
"jmespath": "[?category=='sql']"
},
{
"tag": "my.app.eh.storage",
"jmespath": "[?category=='storage']"
}
........
],
"jmespath_refs": {
"jmespath_ref_1": "{jmespath_expression_1}",
"jmespath_ref_2": "{jmespath_expression_2}"
........
}
} |
If the original, internal tag structure looks like this:
Code Block |
---|
tag:
default_tag: "my.app.eh"
tag_map:
- tag: "my.app.eh.authentication"
jmespath: "[?category=='auth']"
- tag: "my.app.eh.sql"
jmespath: "[?category=='sql']" |
Code Block |
---|
"tag": {
"default_tag": "my.app.eh",
"tag_map": [
{
"tag": "my.app.eh.authentication",
"jmespath": "[?category=='auth']"
},
{
"tag": "my.app.eh.sql",
"jmespath": "[?category=='sql']"
}
]
} |
And the extend_tag
configuration is applied, the resultant tag will be:
Code Block |
---|
tag:
default_tag: "new_tag"
tag_map:
- tag: "my.app.eh.sql"
jmespath: "[?category=='sql']"
- tag: "my.app.eh.storage"
jmespath: "[?category=='storage']"
- tag: "my.app.eh.authentication"
jmespath: "[?category=='auth']"
jmespath_refs:
jmespath_ref_1: "{jmespath_expression_1}"
jmespath_ref_2: "{jmespath_expression_2}" |
Code Block |
---|
"tag": {
"default_tag": "new_tag",
"tag_map": [
{
"tag": "my.app.eh.sql",
"jmespath": "[?category=='sql']"
},
{
"tag": "my.app.eh.storage",
"jmespath": "[?category=='storage']"
},
{
"tag": "my.app.eh.authentication",
"jmespath": "[?category=='auth']"
}
],
"jmespath_refs": {
"jmespath_ref_1": "{jmespath_expression_1}",
"jmespath_ref_2": "{jmespath_expression_2}"
}
} |
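The merge behavior shown above can be sketched in Python. This is an illustrative reimplementation of the documented semantics only; the function name `apply_extend_tag` and the exact entry ordering are assumptions, not the collector's actual code.

```python
# Illustrative sketch of the extend_tag merge semantics described above.
# Function name and entry ordering are assumptions, not the collector's code.

def apply_extend_tag(base: dict, ext: dict) -> dict:
    merged = {
        # default_tag: replaced if the extension provides one
        "default_tag": ext.get("default_tag", base.get("default_tag")),
        "tag_map": [dict(e) for e in base.get("tag_map", [])],
        # jmespath_refs: added or updated
        "jmespath_refs": {**base.get("jmespath_refs", {}), **ext.get("jmespath_refs", {})},
    }
    for entry in ext.get("tag_map", []):
        for i, existing in enumerate(merged["tag_map"]):
            # An entry matching an existing tag or JMESPath expression replaces it
            if existing["tag"] == entry["tag"] or existing["jmespath"] == entry["jmespath"]:
                merged["tag_map"][i] = dict(entry)
                break
        else:
            # Otherwise the new entry is appended
            merged["tag_map"].append(dict(entry))
    return merged

base = {"default_tag": "my.app.eh",
        "tag_map": [{"tag": "my.app.eh.authentication", "jmespath": "[?category=='auth']"},
                    {"tag": "my.app.eh.sql", "jmespath": "[?category=='sql']"}]}
ext = {"default_tag": "new_tag",
       "tag_map": [{"tag": "my.app.eh.sql", "jmespath": "[?category=='sql']"},
                   {"tag": "my.app.eh.storage", "jmespath": "[?category=='storage']"}]}
result = apply_extend_tag(base, ext)
print(result["default_tag"])  # new_tag
```

Running this against the example tag structures reproduces the documented result: the default tag is replaced, the `sql` entry is kept, and the `storage` entry is appended alongside the untouched `authentication` entry.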
Event Hubs authentication configuration
Event Hubs authentication can be performed via a connection string or client credentials (by assigning the Azure Event Hubs Data Receiver
role). When both are configured, the connection string configuration takes precedence.
...
Required parameters
...
Connection string configuration
...
event_hub_connection_string
event_hub_name
Yaml
Code Block |
---|
inputs:
azure_event_hub:
id: 100001
enabled: true
services:
event_hubs:
queues:
queue_a:
event_hub_name: event_hub_value
event_hub_connection_string: event_hub_connection_string_value |
Json
Code Block |
---|
"inputs": {
"azure_event_hub": {
"id": 100001,
"enabled": true,
"services": {
"event_hubs": {
"queues": {
"queue_a": {
"event_hub_name": "event_hub_value",
"event_hub_connection_string": "event_hub_connection_string_value"
}
}
}
}
}
} |
...
Client credentials configuration
...
event_hub_name
namespace
Credentials.client_id
Credentials.client_secret
Credentials.tenant_id
Yaml
Code Block |
---|
inputs:
azure_event_hub:
id: 100001
enabled: true
credentials:
client_id: client_id_value
client_secret: client_secret_value
tenant_id: tenant_id_value
services:
event_hubs:
queues:
queue_a:
namespace: namespace_value
event_hub_name: event_hub_name_value |
Code Block |
---|
"inputs": {
"azure_event_hub": {
"id": 100001,
"enabled": true,
"credentials": {
"client_id": "client_id_value",
"client_secret": "client_secret_value",
"tenant_id": "tenant_id_value"
},
"services": {
"event_hubs": {
"queues": {
"queue_a": {
"namespace": "namespace_value",
"event_hub_name": "event_hub_name_value"
}
}
}
}
}
} |
Azure Blob storage checkpoint configuration
Optional and configurable via connection strings or client credentials. If all possible parameters are present, the collector will favor the connection string configuration.
...
Required parameters
...
Connection string configuration
...
blob_storage_connection_string
blob_storage_container_name
Yaml
Code Block |
---|
inputs:
azure_event_hub:
id: 100001
enabled: true
services:
event_hubs:
queues:
queue_a:
event_hub_name: event_hub_value
event_hub_connection_string: event_hub_connection_string_value
blob_storage_connection_string: blob_storage_connection_string_value
blob_storage_container_name: blob_storage_container_name_value |
Json
Code Block |
---|
"inputs": {
"azure_event_hub": {
"id": 100001,
"enabled": true,
"services": {
"event_hubs": {
"queues": {
"queue_a": {
"event_hub_name": "event_hub_value",
"event_hub_connection_string": "event_hub_connection_string_value",
"blob_storage_connection_string": "blob_storage_connection_string_value",
"blob_storage_container_name": "blob_storage_container_name_value"
}
}
}
}
}
} |
...
Client credentials configuration
...
blob_storage_account_name
blob_storage_container_name
Credentials.client_id
Credentials.client_secret
Credentials.tenant_id
Yaml
Code Block |
---|
inputs:
azure_event_hub:
id: 100001
enabled: true
credentials:
client_id: client_id_value
client_secret: client_secret_value
tenant_id: tenant_id_value
services:
event_hubs:
queues:
queue_a:
event_hub_name: event_hub_value
event_hub_connection_string: event_hub_connection_string_value
blob_storage_account_name: blob_storage_account_name_value
blob_storage_container_name: blob_storage_container_name_value |
Json
Code Block |
---|
"inputs": {
"azure_event_hub": {
"id": 100001,
"enabled": true,
"credentials": {
"client_id": "client_id_value",
"client_secret": "client_secret_value",
"tenant_id": "tenant_id_value"
},
"services": {
"event_hubs": {
"queues": {
"queue_a": {
"event_hub_name": "event_hub_value",
"event_hub_connection_string": "event_hub_connection_string_value",
"blob_storage_account_name": "blob_storage_account_name_value",
"blob_storage_container_name": "blob_storage_container_name_value"
}
}
}
}
}
} |
Workflow overview
Queue Iteration: Iterate over the configured queues.
Event Hub Details: Retrieve event hub details, including the partition count.
Client Creation: For each queue, create Event Hub consumer clients.
If the user configured a client_thread_limit, clients are created for each event hub partition up to the specified limit. In this case, the consumer clients are not explicitly assigned partitions; load balancing and partition assignment are performed dynamically by the Event Hub SDK using the checkpoints.
If the user did not configure a client_thread_limit, the collector creates a consumer client for each partition and explicitly assigns the respective partition ID to it.
Event Fetching: Consumers start fetching events. Load balancing and event processing occur throughout the fetching loop.
Event Processing: Events are fetched in batches. Records are extracted from event batches, deduplicated, tagged, and sent to Devo.
Checkpointing: After processing an event batch, checkpoints are updated so that the events are not fetched again.
Configuration considerations
...
Multi-pod mode
...
While multi-pod mode is supported and represents the highest throughput possible for the collector, it requires the user to configure the collector in a specific manner to ensure that the collector operates efficiently and does not send duplicate events to Devo (see below). In most cases, multi-pod mode is unnecessary.
High Throughput: Multi-pod mode allows potentially the highest throughput.
Multi-pod mode is recommended for scenarios in which the user has more partitions than can be supported on a single collector instance.
Consumer Client Thread Limit: The user should specify a client_thread_limit to ensure that the collector uses load balancing instead of explicitly assigning partition IDs to the consumer clients.
In load-balancing mode, having fewer consumer clients than partitions is allowed but less efficient, as some consumer clients will fetch events from multiple partitions.
In load-balancing mode, having more consumer clients than partitions is allowed but less efficient, as some consumer clients will not be assigned any partitions.
The most efficient design ensures there are as many consumer clients as there are partitions, distributed among the pods. The easiest way to achieve this is to set the client_thread_limit to 1 and create as many pods as there are partitions.
Azure Blob Storage Checkpointing: Required for multi-pod mode.
Warning: Running in multi-pod with local checkpointing will result in duplicate events being sent to Devo because the load balancing operation will have no visibility of the other pods' checkpoints.
...
Standard mode
...
Both checkpointing options are supported. In standard mode, the collector will automatically create one consumer client thread per partition per event hub.
If the event hubs you wish to fetch data from have more partitions than can be supported on a single instance (for example, 100 event hubs with 32 partitions each would require the collector to create 3,200 consumer clients), you should create multiple collector instances and configure each one to fetch from a subset of the desired event hubs.
Internal process and deduplication method
The collector uses the event_hubs
service to pull events from the Azure Event Hubs. Each queue in the event_hubs
service represents an event hub that is polled for events.
...
Collector deduplication mechanisms
...
Events are deduplicated using the duplicated_messages_mechanism
parameter. There are two methods available:
Local Deduplication: Ensures that subsequent duplicate events from the same event hub are not sent to Devo. This method operates individually within each consumer client.
Global Deduplication: Utilizes a shared cache across all event hub consumers for a given collector. As events are ingested into Devo, the collector checks if the event has already been consumed by another event hub consumer. The event will not be sent to Devo if it has already been consumed. The global cache
tracks the last 1000 events for each consumer client.
If the global
deduplication method is selected, the collector will automatically employ the local deduplication method as well.
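The local and global deduplication mechanisms can be sketched with a bounded hash cache. This is a minimal illustrative sketch under stated assumptions (hashing the serialized record, a 1000-entry cache); it is not the collector's actual implementation.

```python
# Illustrative sketch of the local/global deduplication described above.
# The class name, hashing scheme, and cache layout are assumptions, not
# the collector's actual code.
import hashlib
import json
from collections import deque

class DedupCache:
    """Bounded cache remembering the last 1000 record hashes. In 'local'
    mode each consumer client has its own instance; in 'global' mode all
    consumers for a collector additionally share one instance."""

    def __init__(self, max_size: int = 1000):
        self._order = deque(maxlen=max_size)  # evicts the oldest hash first
        self._seen = set()

    def is_duplicate(self, record: dict) -> bool:
        h = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        if h in self._seen:
            return True
        if len(self._order) == self._order.maxlen:
            self._seen.discard(self._order[0])  # drop the hash about to be evicted
        self._order.append(h)
        self._seen.add(h)
        return False

local = DedupCache()

record = {"id": 1, "category": "auth"}
print(local.is_duplicate(record))  # False - first time seen
print(local.is_duplicate(record))  # True - duplicate suppressed
```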
...
Checkpointing mechanisms
...
The collector offers two distinct methods for checkpointing, each designed to prevent the re-fetching of events from Azure Event Hubs. These mechanisms ensure efficient event processing by maintaining a record of the last processed event in each partition.
...
Local Persistence Checkpointing
Overview: By default, the collector employs local persistence checkpointing. This method keeps track of the last event offset within each partition of an Event Hub, ensuring events are processed once without duplication.
How It Works: As the collector consumes messages from an Event Hub, it records the offset of the last processed event locally. On subsequent pulls from the Event Hub, the collector resumes processing from the next event after the last recorded offset, effectively skipping previously processed events.
Use Case: Ideal for single-instance deployments where all partitions of an Event Hub are managed by a single collector instance.
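The offset-tracking behavior described above can be sketched as a small per-partition store. This is an illustrative sketch only: the class name, file format, and path are assumptions, not the collector's persistence implementation.

```python
# Minimal sketch of local persistence checkpointing (assumed behavior,
# not the collector's implementation): remember the last processed
# offset per event hub partition and resume after it.
import json
from pathlib import Path

class LocalCheckpointStore:
    def __init__(self, path: str):
        self._path = Path(path)
        self._offsets = json.loads(self._path.read_text()) if self._path.exists() else {}

    def last_offset(self, event_hub: str, partition_id: str):
        # None means no checkpoint yet -> start from the beginning
        return self._offsets.get(f"{event_hub}/{partition_id}")

    def update(self, event_hub: str, partition_id: str, offset: int):
        self._offsets[f"{event_hub}/{partition_id}"] = offset
        self._path.write_text(json.dumps(self._offsets))  # persist after each batch

store = LocalCheckpointStore("/tmp/eh_checkpoints.json")
store.update("hub-a", "0", 4132)
print(store.last_offset("hub-a", "0"))  # 4132
```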
...
Azure Blob Storage Checkpointing
Overview: As an alternative to local persistence, the collector can be configured to use Azure Blob Storage for checkpointing. This approach leverages Azure's cloud storage to maintain event processing state.
Configuration:
Option 1: Specify both an Azure Blob Storage account and container name. This method requires the collector to have appropriate access permissions to the specified Blob Storage account.
Option 2: Provide an Azure Blob Storage connection string and container name. This method is straightforward and recommended if you have the connection string readily available.
Benefits:
Multi-pod Support: Enables the collector to operate in a distributed environment, such as Kubernetes, where multiple instances (pods) of the collector can run concurrently. Checkpointing data stored in Azure Blob Storage ensures that each instance has access to the current state of event processing, facilitating efficient load balancing and event partition management.
Durability: Utilizes Azure Blob Storage's durability and availability features to safeguard checkpointing data against data loss or corruption.
Use Case: Recommended for environments requiring multi-pod deployment or when a user prefers to centralize checkpointing within their Azure infrastructure.
...
title | Event Hubs Auto Discover (event_hubs_auto_discover) |
---|
General principles
Refer to Event Hubs - General Principles for general principles.
Configuration options
Devo supports only one authentication method for this service. Connection strings are not supported.
Event Hubs Auto Discover authentication configuration
Event Hubs authentication can be performed via a connection string or client credentials (by assigning the Azure Event Hubs Data Receiver
role).
When both are configured, the connection string configuration takes precedence.
...
Required parameters
...
Connection string configuration
...
event_hub_connection_string
event_hub_name
Yaml
Code Block |
---|
inputs:
azure_event_hub:
id: 100001
enabled: true
services:
event_hubs:
queues:
queue_a:
event_hub_name: event_hub_value
event_hub_connection_string: event_hub_connection_string_value |
Json
Code Block |
---|
"inputs": {
"azure_event_hub": {
"id": 100001,
"enabled": true,
"services": {
"event_hubs": {
"queues": {
"queue_a": {
"event_hub_name": "event_hub_value",
"event_hub_connection_string": "event_hub_connection_string_value"
}
}
}
}
}
} |
...
Client credentials configuration
...
event_hub_name
namespace
Credentials.client_id
Credentials.client_secret
Credentials.tenant_id
Yaml
Code Block |
---|
inputs:
azure_event_hub:
id: 100001
enabled: true
credentials:
client_id: client_id_value
client_secret: client_secret_value
tenant_id: tenant_id_value
services:
event_hubs:
queues:
queue_a:
namespace: namespace_value
event_hub_name: event_hub_name_value |
Json
Code Block |
---|
"inputs": {
"azure_event_hub": {
"id": 100001,
"enabled": true,
"credentials": {
"client_id": "client_id_value",
"client_secret": "client_secret_value",
"tenant_id": "tenant_id_value"
},
"services": {
"event_hubs": {
"queues": {
"queue_a": {
"namespace": "namespace_value",
"event_hub_name": "event_hub_name_value"
}
}
}
}
}
} |
Azure Blob storage checkpoint configuration
Optional and configurable via connection strings or client credentials.
If all possible parameters are present, the collector will favor the connection string configuration.
...
Required parameters
...
Connection string configuration
...
blob_storage_connection_string
blob_storage_container_name
Yaml
Code Block |
---|
inputs:
azure_event_hub:
id: 100001
enabled: true
services:
event_hubs:
queues:
queue_a:
event_hub_name: event_hub_value
event_hub_connection_string: event_hub_connection_string_value
blob_storage_connection_string: blob_storage_connection_string_value
blob_storage_container_name: blob_storage_container_name_value |
Json
Code Block |
---|
"inputs": {
"azure_event_hub": {
"id": 100001,
"enabled": true,
"services": {
"event_hubs": {
"queues": {
"queue_a": {
"event_hub_name": "event_hub_value",
"event_hub_connection_string": "event_hub_connection_string_value",
"blob_storage_connection_string": "blob_storage_connection_string_value",
"blob_storage_container_name": "blob_storage_container_name_value"
}
}
}
}
}
} |
...
Client credentials configuration
...
blob_storage_account_name
blob_storage_container_name
Credentials.client_id
Credentials.client_secret
Credentials.tenant_id
Yaml
Code Block |
---|
inputs:
azure_event_hub:
id: 100001
enabled: true
credentials:
client_id: client_id_value
client_secret: client_secret_value
tenant_id: tenant_id_value
services:
event_hubs:
queues:
queue_a:
event_hub_name: event_hub_value
event_hub_connection_string: event_hub_connection_string_value
blob_storage_account_name: blob_storage_account_name_value
blob_storage_container_name: blob_storage_container_name_value |
Json
Code Block |
---|
"inputs": {
"azure_event_hub": {
"id": 100001,
"enabled": true,
"credentials": {
"client_id": "client_id_value",
"client_secret": "client_secret_value",
"tenant_id": "tenant_id_value"
},
"services": {
"event_hubs": {
"queues": {
"queue_a": {
"event_hub_name": "event_hub_value",
"event_hub_connection_string": "event_hub_connection_string_value",
"blob_storage_account_name": "blob_storage_account_name_value",
"blob_storage_container_name": "blob_storage_container_name_value"
}
}
}
}
}
} |
Internal process and deduplication method
The collector uses the event_hubs_auto_discover
service to dynamically query a given resource group and namespace for all available event hubs.
All deduplication methods and checkpointing methods listed in the event_hubs
service apply; however, there are some additional considerations one should make when configuring the event_hubs_auto_discover
service.
The event_hubs_auto_discover
service will effectively restart all event hub consumers after one hour (this time can be overridden via the override_consumer_client_ttl_seconds_value
parameter). On restart, the collector will re-discover all available event hubs and begin pulling data again. Any event hubs created between the last run and the current run will be discovered and pulled from.
Due to the nature of this service, if a user has configured Azure Blob Storage checkpointing, the collector will attempt to create containers in the configured Azure Blob Storage account. If the configured credentials do not have write access to the storage account, an error will be logged indicating that the user must grant write access to the credentials.
...
Checkpointing
...
The collector supports two forms of checkpointing.
...
Local persistence checkpointing
...
By default, the collector will utilize local persistence checkpointing to ensure that events are not fetched multiple times from a given partition in a given event hub. The collector will store the last event offset as messages are consumed.
...
Azure Blob Storage checkpointing
...
Optionally, users can specify an Azure Blob Storage account or an Azure Blob Storage connection string to use Azure Blob Storage checkpointing. This allows the collector to run in multi-pod mode and all checkpointing data is stored within the Azure Storage account.
Unlike the event_hubs
service, the event_hubs_auto_discover
service will create containers for the discovered event hubs in the configured Azure Blob
Storage account. The containers are prefixed with devo-
(though this value can be overridden in the configuration) and a hash calculated from the resource group, namespace, event hub name, and consumer group. This hash is used to ensure that the container name is unique and does not conflict with other container names and is within the character limit for Azure container names.
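The naming scheme described above can be sketched as follows. This is a hypothetical illustration: the hash algorithm (SHA-256 here) and the truncation length are assumptions; only the `devo-` prefix, the hash inputs, and the Azure container name length constraint come from the text above.

```python
# Illustrative sketch of the checkpoint container naming described above.
# The hash algorithm and truncation are assumptions; the real collector
# may differ. Azure container names must be lowercase and at most 63 chars.
import hashlib

def checkpoint_container_name(resource_group: str, namespace: str,
                              event_hub: str, consumer_group: str,
                              prefix: str = "devo-") -> str:
    # Hash the identifying tuple so the name is unique and length-bounded
    key = f"{resource_group}/{namespace}/{event_hub}/{consumer_group}"
    digest = hashlib.sha256(key.encode()).hexdigest()
    return (prefix + digest)[:63].lower()

print(checkpoint_container_name("rg1", "ns1", "hub-a", "$Default"))
```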
...
title | Troubleshooting |
---|
Common logic
This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.
...
Error type
...
Error ID
...
Error message
...
Cause
...
Solution
...
InitVariablesError
...
1
...
Invalid start_time_in_utc: {ini_start_str}
. Must be in parseable datetime format.
...
The configured start_time_in_utc
parameter is a non-parseable format.
...
Update the start_time_in_utc
value to have the recommended format as indicated in the guide.
...
InitVariablesError
...
2
...
Invalid start_time_in_utc: {ini_start_str}
. Must be in the past.
...
The configured start_time_in_utc
parameter is a future date.
...
Update the start_time_in_utc
value to a past datetime.
...
PullError
...
350
...
Could not match tag to record and no default tag provided: {record}
...
Advanced tagging configured but no default tag provided and record did not match any of tag pathways
...
Provide default tag in advanced tag mapping object
...
ApiError
...
401
...
An error occurred while trying to authenticate with the Azure API. Exception: {e}
...
The collector is unable to authenticate with the Azure API.
...
Check the credentials and ensure that the collector has the necessary permissions to access the Azure API.
...
ApiError
...
410
...
An error occurred while trying to check if container '{container_name}'
exists. Ensure that the blob storage account name or connection string is correct. Exception: {e}
...
The collector was unable to locate the specified blob storage container name.
...
Ensure the container exists and the credentials have READ access to the container
...
ApiError
...
411
...
An error occurred while trying to check if container '{container_name}'
exists. Ensure that the application has necessary permissions to access the containers. Exception: {e}
...
The collector was unable to access the specified blob storage container name.
...
Ensure the container exists and the credentials have READ access to the container
...
ApiError
...
412
...
An error occurred while trying to create container '{container_name}'
. Ensure that the application has necessary permissions to create containers. Exception: {e}
...
The collector was unable to create the container for the auto discover service and the user indicated to use Azure Blob Storage checkpointing.
...
Ensure the credentials have WRITE access to the container storage account.
...
ApiError
...
420
...
An error occurred while trying to get consumer group '{consumer_group_name}'
. Exception: {e}
...
The collector was unable to access the specified consumer group name.
...
Ensure the consumer group exists and the credentials have READ access to the consumer group
...
ApiError
...
421
...
An error occurred while trying to create consumer group '{consumer_group_name}'
. Ensure that the application has necessary permissions to create consumer groups. Exception: {e}
...
The collector was unable to create the consumer group for the auto discover service.
...
Ensure the credentials have WRITE access to the event hub namespace or use the $Default
consumer group.
Typical issues
CBS token error - This issue usually happens when the connection string includes the event hub namespace name instead of the event hub name. The two values are usually different and easy to mix up. You can find an explanation here.
Delayed events - You can use the @devo_event_enqueued_time value in the cloud.azure table to check the time at which the events were queued in Azure. Delayed events can be caused by Event Hub itself (high enqueued time) or by a lack of processing capacity in the collector. In the latter case, it is necessary to add more collector instances or to create a collector for each partition.
Duplicated events - Adjust the value of the duplicated_messages_mechanism_value config parameter according to your deployment. If you are running several instances, change the value to local. See Internal process and deduplication method for more details.
Metadata decorators useful for troubleshooting
The collector adds some metadata to the events that can be useful for diagnosing issues. This metadata can be found in the cloud.azure
table:
devo_record_idx: identifier for the event, composed of the event sequence number and an ordinal number
devo_record_hash_id: hash value of the whole record. If two records have the same value, they are exactly equal.
devo_event_offset: offset value for the record in the EventHub queue
devo_event_enqueued_time: time at which the event was enqueued in the EventHub
devo_event_sequence_number: sequence number in the EventHub
devo_eh_partition_id: source partition for the event
devo_eh_consumer_group: source consumer group
devo_eh_fully_qualified_namespace: source namespace for the event
devo_pulling_id: epoch timestamp corresponding to the time the event was sent to Devo by the collector. It should be close to the eventdate of the event.
More details about the meaning of this metadata can be found on the Microsoft EventHub webpage.
Delayed events
For instance, this metadata can be used to find the cause of delayed events. You can use the timestamp
, the devo_event_enqueued_time
, and the eventdate
values in the cloud.azure
table to check the creation time of the events, when the events are queued in Azure EventHub, and when the events are received in Devo.
Delayed events can be caused by EventHub itself. In this case, there is a big time difference between the enqueued time and the creation date of the event; the devo_event_enqueued_time - timestamp
value is large. This delay can be caused by license limits or the type of event source.
Otherwise, if there is a delay and that difference is small, a possible cause is a lack of processing capacity in the collector. In this case, the value of eventdate - devo_event_enqueued_time
is large. It may be necessary to add more collector instances or to create a collector for each partition. See Configuration Considerations about the multi-pod mode.
Collector operations
This section is intended to explain how to proceed with specific operations of this collector.
...
title | Verify collector operations |
---|
Initialization
The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services, and of validating the given configuration.
A successful run has the following output messages for the initializer module:
Code Block |
---|
2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Loading configuration using the following files: {"full_config": "config-test-local.yaml", "job_config_loc": null, "collector_config_loc": null}
2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Using the default location for "job_config_loc" file: "/etc/devo/job/job_config.json"
2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> "\etc\devo\job" does not exists
2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> Using the default location for "collector_config_loc" file: "/etc/devo/collector/collector_config.json"
2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> "\etc\devo\collector" does not exists
2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> Results of validation of config files parameters: {"config": "C:\git\collectors2\devo-collector-<name>\config\config.yaml", "config_validated": True, "job_config_loc": "/etc/devo/job/job_config.json", "job_config_loc_default": True, "job_config_loc_validated": False, "collector_config_loc": "/etc/devo/collector/collector_config.json", "collector_config_loc_default": True, "collector_config_loc_validated": False}
2023-01-10T15:22:57.171 WARNING MainProcess::MainThread -> [WARNING] Illegal global setting has been ignored -> multiprocessing: False |
Events delivery and Devo ingestion
The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.
A successful run has the following output messages for the event delivery module:
Code Block |
---|
2023-01-10T15:23:00.788 INFO OutputProcess::MainThread -> DevoSender(standard_senders,devo_sender_0) -> Starting thread
2023-01-10T15:23:00.789 INFO OutputProcess::MainThread -> DevoSenderManagerMonitor(standard_senders,devo_1) -> Starting thread (every 300 seconds)
2023-01-10T15:23:00.790 INFO OutputProcess::MainThread -> DevoSenderManager(standard_senders,manager,devo_1) -> Starting thread
2023-01-10T15:23:00.842 INFO OutputProcess::MainThread -> global_status: {"output_process": {"process_id": 18804, "process_status": "running", "thread_counter": 21, "thread_names": ["MainThread", "pydevd.Writer", "pydevd.Reader", "pydevd.CommandThread", "pydevd.CheckAliveThread", "DevoSender(standard_senders,devo_sender_0)", "DevoSenderManagerMonitor(standard_senders,devo_1)", "DevoSenderManager(standard_senders,manager,devo_1)", "OutputStandardConsumer(standard_senders_consumer_0)", |
Sender services
The Integrations Factory Collector SDK has 3 different sender services depending on the event type to deliver (internal
, standard
, and lookup
). This collector uses the following Sender Services:
...
Sender services
...
Description
...
internal_senders
...
In charge of delivering internal metrics to Devo such as logging traces or metrics.
...
standard_senders
...
In charge of delivering pulled events to Devo.
Sender statistics
Each service displays its own performance statistics that allow checking how many events have been delivered to Devo by type:
...
Logging trace
...
Description
...
Number of available senders: 1
...
Displays the number of concurrent senders available for the given Sender Service.
...
sender manager internal queue size: 0
...
Displays the items available in the internal sender queue.
Info |
---|
This value helps detect bottlenecks and the need to increase the performance of data delivery to Devo, which can be achieved by increasing the number of concurrent senders. |
...
Total number of messages sent: 44, messages sent since "2022-06-
28 10:39:22.511671+00:00": 21 (elapsed 0.007 seconds)
...
Displays the number of events sent since the collector started and since the last checkpoint. Following the given example, the following conclusions can be drawn:
44 events were sent to Devo since the collector started.
The last checkpoint timestamp was
2022-06-28 10:39:22.511671+00:00
21 events were sent to Devo between the last UTC checkpoint and now.
Those 21 events required
0.007 seconds
to be delivered.
Info |
---|
By default these traces will be shown every 10 minutes. |
Expand | ||
---|---|---|
| ||
To check the memory usage of this collector, look for the following log records in the collector which are displayed every 5 minutes by default, always after running the memory-free process.
|
Change log
Release
Released on
Release type
Recommendations
v2.4.0
Status | ||||
---|---|---|---|---|
|
Recommend
Expand | ||
---|---|---|
| ||
Improvements
|
v2.2.0
Status | ||||
---|---|---|---|---|
|
Status | ||||
---|---|---|---|---|
|
Upgrade
Expand | ||
---|---|---|
| ||
Feature
Improvements
|
v2.0.0
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Improvements
|
v1.9.0
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Improvements
|
v1.8.0
Status | ||||
---|---|---|---|---|
|
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Improvements
Bug fixing
|
v1.7.1
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Bug fixing
|
v1.7.0
Status | ||||
---|---|---|---|---|
|
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Improvements
Bug fixing
|
v1.6.0
Status | ||||
---|---|---|---|---|
|
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Improvements
Bug fixing
|
v1.5.0
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Bug fixing
|
v1.4.1
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Improvements
|
v1.4.0
Status | ||||
---|---|---|---|---|
|
Update
Expand | ||
---|---|---|
| ||
Improvements New events types are accepted for the service
|
v1.3.2
Status | ||||
---|---|---|---|---|
|
Update
title | Details |
---|
Bug fixing
A configuration bug has been fixed to enable the autocategorization of the following events
RiskyUsers