- Created by Laszlo Frazer , last modified on Feb 07, 2025
Purpose
The Microsoft Azure collector gets data from Azure cloud computing services. Common uses are:
Detect malicious Active Directory authentication
Detect malicious role, policy, and group changes
Correlate risky users identified by Entra ID with data you have in Devo
Detect malicious Application Gateway traffic
Detect failures and measure costs of virtual machines
Run It
The Azure Collector has two services:
VM Metrics, for Virtual Machines
Event Hub, for everything else in Azure
These services should be enabled in separate collector instances.
Devo collector features
Features | Details |
---|---|
Allow parallel downloading ( |
The vm_metrics service cannot work in multipod mode. If you want to use the event_hubs service in multipod mode, you must not include the vm_metrics service in the same collector. |
Running environments |
|
Populated Devo events |
|
Flattening pre-processing |
|
Allowed source events obfuscation |
|
Data source description
Data source | Description | API endpoint | Collector service name | Devo table |
---|---|---|---|---|
VM Metrics | Using the Microsoft Azure API, the collector obtains metrics about the deployed Virtual Machines and gathers them in Devo, making them easier to query and analyze in the Devo platform and Activeboards. | Azure Compute Management Client SDK and Azure Monitor Management Client SDK |
|
|
Event Hubs | Several Microsoft Azure services can send execution information to an Event Hub service (see next section). | Azure Event Hubs SDK |
|
Valid for all cloud.azure tables by setting the output option to stream to Event Hub. |
Event hubs: Auto-categorization of Microsoft Azure service messages
Many of the available Microsoft Azure services can generate some type of execution information to be sent to an Event Hub service. This data can be categorized as events or metrics. Events, in turn, can be of different subtypes: audits, status, logs, etc.
All such data is gathered by Devo’s Microsoft Azure collector and sent to our platform, where the message auto-categorization functionality automatically routes messages to the relevant Devo tables.
Although EventHub is the service used for centralizing Azure services' data, it also generates information that can be sent to itself. Learn more in this article.
If the amount of egress data exceeds the Throughput Unit limits set by Azure (2 MB/s or 4,096 events per second), Devo cannot continue reliable ingestion of data. You can monitor ingress/egress throughput in the Event Hub Namespace in the Azure Portal and, based on trends or alerts, add another Event Hub to resolve this. To prevent this from happening in the first place, follow the scalability guidance provided by Microsoft in their technical documentation.
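As a rough sizing aid, the egress limits above can be turned into a minimum Throughput Unit (TU) count. The helper below is an illustrative sketch, not part of the collector; only the per-unit limits (2 MB/s or 4,096 events/s egress) come from Azure's documented quotas, and the workload figures are hypothetical.

```python
# Sketch: estimate how many Throughput Units are needed so that egress
# stays within the per-TU limits stated by Azure (2 MB/s or 4,096
# events/s egress per unit).
import math

EGRESS_MB_PER_SEC_PER_TU = 2
EGRESS_EVENTS_PER_SEC_PER_TU = 4096

def required_throughput_units(egress_mb_per_sec: float,
                              egress_events_per_sec: float) -> int:
    """Return the minimum TU count that satisfies both egress limits."""
    by_volume = math.ceil(egress_mb_per_sec / EGRESS_MB_PER_SEC_PER_TU)
    by_events = math.ceil(egress_events_per_sec / EGRESS_EVENTS_PER_SEC_PER_TU)
    return max(by_volume, by_events, 1)

# Example: 5 MB/s and 10,000 events/s of egress
print(required_throughput_units(5, 10_000))  # 3
```

Throughput Units can be adjusted later in the Azure Portal, so this only needs to be a starting estimate.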
Vendor setup
The Microsoft Azure collector centralizes the data with an Event Hub using the Azure SDK. To use it, you need to configure the resources in the Azure Portal and set the right permissions to access the information.
Event Hub events
If you want to use Azure Blob Storage for checkpointing purposes, you need to create a storage account to store the checkpoints. If you do not wish to use Azure Blob storage (i.e. you will use Devo local persistence), you can skip the Blob Storage configuration steps.
Connection string
From the left portal menu, select Storage accounts to display a list of your storage accounts. If the portal menu isn't visible, select the menu button to toggle it on.

On the Storage accounts page, select Create.

After the storage account is created, select it from the list of storage accounts, click on Access keys in the left menu, and copy the connection string.

Role assignment
Alternatively, users can grant the necessary permissions to the registered application to access the Event Hub without using the RootManageSharedAccessKey. Roles can be assigned in a variety of ways (e.g. inherited from the subscription group), but the following steps show how to assign the necessary roles directly to the Storage Account.
Repeat steps 1-2 from the Connection String section to create the Storage Account.
In the Storage Account, click Access control (IAM) in the left menu, click + Add, and click Add role assignment.
Search for either the Storage Blob Data Contributor or Storage Blob Data Owner role, select it, and click Next.
Click + Select members, search for the previously created App registration, select it, and click Next.
Click Review + Assign.
Connection string
Users can either obtain a connection string or use Role Assignments to allow the collector to access the Event Hub.
In your Azure account, search for the Event Hubs service and click on it.

Create an Event Hub resource per region (repeat the steps below for each region):
Click Add.

Fill in the mandatory fields, keeping in mind that the Event Hub must be in the same region as the resources you are going to monitor (you only need one per region). The Throughput Units option refers to the ingress/egress limit in MB/s (each unit allows 1 MB/s or 1,000 events/second ingress, and 2 MB/s or 4,096 events/second egress). Adjust it according to the data volume (this can be modified later).

The previous steps create an Event Hub namespace; now go to Event Hubs, search for the created one, and click on it.

Now click the + Event Hub button and create a new resource. You only need to fill in the Name and Partition Count fields (the Partition Count divides the data into different partitions to make it easier to read large volumes of data). Write down the Event Hub name to be used later in the configuration file.


Once the Event Hub is created in the namespace, click it and select Consumer Group in the left menu. Note that a dedicated Consumer Group for Devo needs to be created if the existing consumer groups are already in use.

Here you will see the Event Hub consumer groups, which are used by the collector (or other applications) to read data from the Event Hub. Write down the consumer group name that you will use later in the configuration file.

Now, in the Event Hub Namespace, click Shared access policies, search for the default policy named RootManageSharedAccessKey, and click it.

Copy and write down the primary (or secondary) connection string to be used later in the configuration file.
Role assignment
Alternatively, users can grant the necessary permissions to the registered application to access the Event Hub without using the RootManageSharedAccessKey. Roles can be assigned in a variety of ways (e.g. inherited from the subscription group), but the following steps show how to assign the necessary roles directly to the Event Hub Namespace.
Repeat all steps except the last one from the previous section to create the Event Hub.
In the Event Hub Namespace, click Access control (IAM) in the left menu, click + Add, and click Add role assignment.

Search for either the Azure Event Hubs Data Receiver or Azure Event Hubs Data Owner role, select it, and click Next.

Click + Select members, search for the previously created App registration, select it, and click Next.

Click Review + Assign.

Now, search the Monitor service and click on it.

Click the Diagnostic Settings option in the left area.
A list of the deployed resources will be shown. Search for the resources that you want to monitor, select them, and click Add diagnostic setting.

Type a name for the rule and check the required category details (logs will be sent to the cloud.azure.eh.events table, and metrics will be sent to the cloud.azure.eh.metrics table).

Check Stream to an Event Hub, and select the corresponding Event hub namespace, Event hub name, and Event hub policy name.

Click Save to finish the process.

Event Hub Auto-Discover
About the feature
To configure access to event hubs for the auto-discovery feature, you need to grant the necessary permissions to the registered application to access the Event Hub without using the RootManageSharedAccessKey. The auto-discovery feature enumerates a namespace and resource group for all available event hubs and optionally creates consumer groups (if the configuration specifies a consumer group other than $Default and that consumer group does not exist when the collector connects to the event hub) and Azure Blob Storage containers for checkpointing purposes (if the user specifies a storage account and container in the configuration file).
Role assignment (Namespace)
Repeat the steps from the Event Hubs Role Assignment section, except that the necessary role is the Azure Event Hubs Namespace Data Owner role. This allows the collector to enumerate the event hubs in the namespace and create consumer groups if necessary.
Minimum configuration for basic pulling
Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.
This minimum configuration refers exclusively to those specific parameters of this integration. There are more required parameters related to the generic behavior of the collector. Check setting sections for details.
Setting | Details |
---|---|
| The Azure application tenant ID. |
| The Azure application client ID. |
| The Azure application client secret. |
| The Azure application subscription ID. |
For Azure Event Hub, the event hub name and the connection string (and optionally the consumer group) are enough. No credentials are required.
Accepted authentication methods
Authentication method | Tenant ID | Client ID | Client secret | Subscription ID |
---|---|---|---|---|
OAuth2 | REQUIRED | REQUIRED | REQUIRED | REQUIRED |
Run the collector
Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector in your own machine using a Docker image (On-premise collector).
Collector services detail
This section is intended to explain how to proceed with specific actions for services.
Internal process and deduplication method
All VM metrics data are pulled with a time grain value of PT1M (1 minute). The collector polls for all available VM resource IDs and then pulls the metrics for each resource ID. Checkpoints are persisted to ensure that duplicate data is not sent to Devo.
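The deduplication idea can be sketched as follows. This is a simplified illustration, not the collector's actual implementation; the checkpoint structure and function names are hypothetical.

```python
# Sketch (hypothetical names): keep a per-resource checkpoint of the last
# metric timestamp sent to Devo, so a poll that returns overlapping PT1M
# datapoints does not produce duplicates.
from datetime import datetime, timezone

checkpoints: dict[str, datetime] = {}  # resource_id -> last sent timestamp

def filter_new_datapoints(resource_id, datapoints):
    """Drop datapoints at or before the stored checkpoint, then advance it."""
    last = checkpoints.get(resource_id)
    fresh = [dp for dp in datapoints if last is None or dp["timestamp"] > last]
    if fresh:
        checkpoints[resource_id] = max(dp["timestamp"] for dp in fresh)
    return fresh

ts = lambda m: datetime(2025, 2, 7, 12, m, tzinfo=timezone.utc)
first = filter_new_datapoints("vm-1", [{"timestamp": ts(0)}, {"timestamp": ts(1)}])
second = filter_new_datapoints("vm-1", [{"timestamp": ts(1)}, {"timestamp": ts(2)}])
print(len(first), len(second))  # 2 1
```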
Devo categorization and destination
All events of this service are ingested into the table cloud.azure.vm.metrics_simple.
Restart the persistence
This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:
Edit the configuration file.
Change the value of the start_time_in_utc parameter to a different one.
Save the changes.
Restart the collector.
The collector will detect this change and will restart the persistence using the parameters of the configuration file or the default configuration in case it has not been provided.
General principles
Understanding the following principles of Azure Event Hubs is crucial:
Consumer Groups: A single event hub can have multiple consumer groups, each representing a separate view of the event stream.
Checkpointing: The SDK supports checkpoint mechanisms to balance the load among consumers for the same event hub and consumer group. Supported mechanisms include:
Azure Blob Storage Checkpoint: Recommended to use one container per consumer group per event hub.
Partition Restrictions: Azure Event Hubs limits the number of partitions based on the event hub tier. For quotas and limits, refer to the official documentation.
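The "one container per consumer group per event hub" recommendation can be honored with a simple naming rule. The helper below is a hypothetical convention, not something the collector requires; it only enforces Azure's container-name character rules (lowercase letters, digits, and hyphens, up to 63 characters).

```python
# Illustrative helper: derive one Azure Blob container name per
# (event hub, consumer group) pair. The naming convention itself is an
# assumption; only the character restrictions come from Azure.
import re

def checkpoint_container_name(event_hub: str, consumer_group: str) -> str:
    raw = f"{event_hub}-{consumer_group}".lower()
    # Replace anything outside [a-z0-9-] and cap at 63 characters.
    return re.sub(r"[^a-z0-9-]", "-", raw)[:63]

print(checkpoint_container_name("MyHub", "$Default"))  # myhub--default
```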
Configuration options
Devo supports various configurations to cater to different Azure setups.
Event Hubs Tagging Configuration
Event Hubs supports multiple tagging parameters and formats to categorize and manage event data efficiently. Below are the configuration options for overriding, auto-categorizing, and extending tags.
The default configuration of the tag mapping can be found in this article.
Override tag
Advanced setting. Please consult Devo support before using the advanced tag map.
To customize the default tag behavior, users can configure the override_tag parameter within the Event Hub queue configuration. This parameter allows either a simple tag string or a more advanced tag mapping structure to be applied to all records.
The advanced tag map structure follows this format:
default_tag: A fallback tag applied to all records not matched by any tag_map entry.
tag_map: A list of tag entries, each containing a tag value and a JMESPath expression to match specific records.
jmespath_refs: Reference variables that can be used within JMESPath expressions in the tag_map. These act as reusable values within the tag map's matching logic.
override_tag:
  default_tag: "tag_value"
  tag_map:
    - tag: "tag_value"
      jmespath: "[?condition]"
    - tag: "tag_value"
      jmespath: "[?condition]"
  jmespath_refs:
    jmespath_ref_1: "{jmespath_expression_1}"
    jmespath_ref_2: "{jmespath_expression_2}"
"override_tag": {
  "default_tag": "tag_value",
  "tag_map": [
    { "tag": "tag_value", "jmespath": "[?condition]" },
    { "tag": "tag_value", "jmespath": "[?condition]" }
  ],
  "jmespath_refs": {
    "jmespath_ref_1": "{jmespath_expression_1}",
    "jmespath_ref_2": "{jmespath_expression_2}"
  }
}
Auto-Category Tagging
From version 2.4 onwards, Auto Category is always enabled.
Auto-category automatically appends pre-defined tags to the default tag (or the override_tag, if specified), enabling Azure events to be mapped dynamically to the appropriate Devo tag.
The system attempts to extract both the resource ID and the event category from the Azure event. If an event does not match any preconfigured tag mappings, it is categorized under the following format: cloud.azure.{resource_id}.{category}.{queue_name}.
Auto-category tags are evaluated before the default or override tags.
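The fallback format can be illustrated with a tiny helper. This is hypothetical; the real extraction of the resource ID and category from the Azure event payload happens inside the collector.

```python
# Sketch of the documented fallback tag format for events that match no
# preconfigured mapping: cloud.azure.{resource_id}.{category}.{queue_name}
def fallback_tag(resource_id: str, category: str, queue_name: str) -> str:
    return f"cloud.azure.{resource_id}.{category}.{queue_name}"

print(fallback_tag("vm", "administrative", "queue_a"))
# cloud.azure.vm.administrative.queue_a
```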
Extend tag
Users can further customize tags by using the extend_tag parameter in the Event Hub queue configuration. This feature allows for the extension or updating of various tag properties. If override_tag is being used, extend_tag modifies it; otherwise, it extends the default tag.
The extend_tag parameter offers the following options:
default_tag: Replaces the existing default tag.
jmespath_refs: Adds or updates JMESPath substitution values.
tag_map: Adds or updates entries in the existing tag map. If an extend_tag entry matches an existing tag or JMESPath expression, that entry is replaced; otherwise, the new entry is appended.
Here is an example of extend_tag configuration:
Please note that the actual internal tag structure is not displayed in this guide as it is subject to change.
extend_tag:
  default_tag: "new_tag"
  tag_map:
    - tag: "my.app.sql"
      jmespath: "[?category=='sql']"
    - tag: "my.app.eh.storage"
      jmespath: "[?category=='storage']"
  jmespath_refs:
    jmespath_ref_1: "{jmespath_expression_1}"
    jmespath_ref_2: "{jmespath_expression_2}"
"extend_tag": {
  "default_tag": "new_tag",
  "tag_map": [
    { "tag": "my.app.sql", "jmespath": "[?category=='sql']" },
    { "tag": "my.app.eh.storage", "jmespath": "[?category=='storage']" }
  ],
  "jmespath_refs": {
    "jmespath_ref_1": "{jmespath_expression_1}",
    "jmespath_ref_2": "{jmespath_expression_2}"
  }
}
If the original, internal tag structure looks like this:
tag:
  default_tag: "my.app.eh"
  tag_map:
    - tag: "my.app.eh.authentication"
      jmespath: "[?category=='auth']"
    - tag: "my.app.eh.sql"
      jmespath: "[?category=='sql']"
"tag": {
  "default_tag": "my.app.eh",
  "tag_map": [
    { "tag": "my.app.eh.authentication", "jmespath": "[?category=='auth']" },
    { "tag": "my.app.eh.sql", "jmespath": "[?category=='sql']" }
  ]
}
And the extend_tag configuration is applied, the resultant tag will be:
tag:
  default_tag: "new_tag"
  tag_map:
    - tag: "my.app.eh.sql"
      jmespath: "[?category=='sql']"
    - tag: "my.app.eh.storage"
      jmespath: "[?category=='storage']"
    - tag: "my.app.eh.authentication"
      jmespath: "[?category=='auth']"
  jmespath_refs:
    jmespath_ref_1: "{jmespath_expression_1}"
    jmespath_ref_2: "{jmespath_expression_2}"
"tag": {
  "default_tag": "new_tag",
  "tag_map": [
    { "tag": "my.app.eh.sql", "jmespath": "[?category=='sql']" },
    { "tag": "my.app.eh.storage", "jmespath": "[?category=='storage']" },
    { "tag": "my.app.eh.authentication", "jmespath": "[?category=='auth']" }
  ],
  "jmespath_refs": {
    "jmespath_ref_1": "{jmespath_expression_1}",
    "jmespath_ref_2": "{jmespath_expression_2}"
  }
}
Event Hubs authentication configuration
Event Hubs authentication can be via connection strings or client credentials (assigning the Azure Event Hubs Data Receiver role). Preference is given to the connection string configuration when both are available.
| Required parameters |
---|---|
Connection string configuration |
Yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    services:
      event_hubs:
        queues:
          queue_a:
            event_hub_name: event_hub_value
            event_hub_connection_string: event_hub_connection_string_value
Json
"inputs": { "azure_event_hub": { "id": 100001, "enabled": true, "services": { "event_hubs": { "queues": { "queue_a": { "event_hub_name": "event_hub_value", "event_hub_connection_string": "event_hub_connection_string_value" } } } } } }
Client credentials configuration |
Yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    credentials:
      client_id: client_id_value
      client_secret: client_secret_value
      tenant_id: tenant_id_value
    services:
      event_hubs:
        queues:
          queue_a:
            namespace: namespace_value
            event_hub_name: event_hub_name_value
Json
"inputs": { "azure_event_hub": { "id": 100001, "enabled": true, "credentials": { "client_id": "client_id_value", "client_secret": "client_secret_value", "tenant_id": "tenant_id_value" }, "services": { "event_hubs": { "queues": { "queue_a": { "namespace": "namespace_value", "event_hub_name": "event_hub_name_value" } } } } } }
Azure Blob storage checkpoint configuration
Optional and configurable via connection strings or client credentials. If all possible parameters are present, the collector will favor the connection string configuration.
| Required parameters |
---|---|
Connection string configuration |
Yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    services:
      event_hubs:
        queues:
          queue_a:
            event_hub_name: event_hub_value
            event_hub_connection_string: event_hub_connection_string_value
            blob_storage_connection_string: blob_storage_connection_string_value
            blob_storage_container_name: blob_storage_container_name_value
Json
"inputs": { "azure_event_hub": { "id": 100001, "enabled": true, "services": { "event_hubs": { "queues": { "queue_a": { "event_hub_name": "event_hub_value", "event_hub_connection_string": "event_hub_connection_string_value", "blob_storage_connection_string": "blob_storage_connection_string_value", "blob_storage_container_name": "blob_storage_container_name_value" } } } } } }
Client credentials configuration |
Yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    credentials:
      client_id: client_id_value
      client_secret: client_secret_value
      tenant_id: tenant_id_value
    services:
      event_hubs:
        queues:
          queue_a:
            event_hub_name: event_hub_value
            event_hub_connection_string: event_hub_connection_string_value
            blob_storage_account_name: blob_storage_account_name_value
            blob_storage_container_name: blob_storage_container_name_value
Json
"inputs": { "azure_event_hub": { "id": 100001, "enabled": true, "credentials": { "client_id": "client_id_value", "client_secret": "client_secret_value", "tenant_id": "tenant_id_value" }, "services": { "event_hubs": { "queues": { "queue_a": { "event_hub_name": "event_hub_value", "event_hub_connection_string": "event_hub_connection_string_value", "blob_storage_account_name": "blob_storage_account_name_value", "blob_storage_container_name": "blob_storage_container_name_value" } } } } } }
Workflow overview
Queue Iteration: Iterate over the configured queues.
Event Hub Details: Retrieve details, including partition count.
Client Creation: For each queue, create Event Hub consumer clients.
If the user configured a client_thread_limit, clients will be created for each event hub partition up to the specified limit. In this case, the consumer clients are not explicitly assigned partitions; load balancing and partition assignment are performed dynamically by the Event Hub SDK using the checkpoints.
If the user did not configure a client_thread_limit, the collector creates a consumer client for each partition and explicitly assigns the respective partition ID to the consumer client.
Event Fetching: Enable consumers to start fetching events.
Load balancing and event processing occurs throughout the fetching loop.
Event Processing: Events are fetched in batches. Records are extracted from event batches, deduplicated, tagged, and sent to Devo.
Checkpointing: After processing an event batch, checkpoints are updated so that the events will not be fetched again.
Configuration considerations
Multi-pod mode | While multi-pod mode is supported and represents the highest throughput possible for the collector, it requires the user to configure the collector in a specific manner to ensure that the collector operates efficiently and does not send duplicate events to Devo (see below). In most cases, multi-pod mode is unnecessary.
|
---|---|
Standard mode |
|
Internal process and deduplication method
The collector uses the event_hubs service to pull events from Azure Event Hubs. Each queue in the event_hubs service represents an event hub that is polled for events.
Collector deduplication mechanisms | Events are deduplicated using the
If the | |
---|---|---|
Checkpointing mechanisms | The collector offers two distinct methods for checkpointing, each designed to prevent the re-fetching of events from Azure Event Hubs. These mechanisms ensure efficient event processing by maintaining a record of the last processed event in each partition. | Local Persistence Checkpointing
|
Azure Blob Storage Checkpointing
|
General principles
Refer to Event Hubs - General Principles for general principles.
Configuration options
Devo supports only one authentication method for this service. Connection strings are not supported.
Event Hubs Auto Discover authentication configuration
Event Hubs authentication can be via connection strings or client credentials (assigning the Azure Event Hubs Data Receiver role). Preference is given to the connection string configuration when both are available.
| Required parameters |
---|---|
Connection string configuration |
Yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    services:
      event_hubs:
        queues:
          queue_a:
            event_hub_name: event_hub_value
            event_hub_connection_string: event_hub_connection_string_value
Json
"inputs": { "azure_event_hub": { "id": 100001, "enabled": true, "services": { "event_hubs": { "queues": { "queue_a": { "event_hub_name": "event_hub_value", "event_hub_connection_string": "event_hub_connection_string_value" } } } } } }
Client credentials configuration |
Yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    credentials:
      client_id: client_id_value
      client_secret: client_secret_value
      tenant_id: tenant_id_value
    services:
      event_hubs:
        queues:
          queue_a:
            namespace: namespace_value
            event_hub_name: event_hub_name_value
Json
"inputs": { "azure_event_hub": { "id": 100001, "enabled": true, "credentials": { "client_id": "client_id_value", "client_secret": "client_secret_value", "tenant_id": "tenant_id_value" }, "services": { "event_hubs": { "queues": { "queue_a": { "namespace": "namespace_value", "event_hub_name": "event_hub_name_value" } } } } } }
Azure Blob storage checkpoint configuration
Optional and configurable via connection strings or client credentials.
If all possible parameters are present, the collector will favor the connection string configuration.
| Required parameters |
---|---|
Connection string configuration |
Yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    services:
      event_hubs:
        queues:
          queue_a:
            event_hub_name: event_hub_value
            event_hub_connection_string: event_hub_connection_string_value
            blob_storage_connection_string: blob_storage_connection_string_value
            blob_storage_container_name: blob_storage_container_name_value
Json
"inputs": { "azure_event_hub": { "id": 100001, "enabled": true, "services": { "event_hubs": { "queues": { "queue_a": { "event_hub_name": "event_hub_value", "event_hub_connection_string": "event_hub_connection_string_value", "blob_storage_connection_string": "blob_storage_connection_string_value", "blob_storage_container_name": "blob_storage_container_name_value" } } } } } }
Client credentials configuration |
Yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    credentials:
      client_id: client_id_value
      client_secret: client_secret_value
      tenant_id: tenant_id_value
    services:
      event_hubs:
        queues:
          queue_a:
            event_hub_name: event_hub_value
            event_hub_connection_string: event_hub_connection_string_value
            blob_storage_account_name: blob_storage_account_name_value
            blob_storage_container_name: blob_storage_container_name_value
Json
"inputs": { "azure_event_hub": { "id": 100001, "enabled": true, "credentials": { "client_id": "client_id_value", "client_secret": "client_secret_value", "tenant_id": "tenant_id_value" }, "services": { "event_hubs": { "queues": { "queue_a": { "event_hub_name": "event_hub_value", "event_hub_connection_string": "event_hub_connection_string_value", "blob_storage_account_name": "blob_storage_account_name_value", "blob_storage_container_name": "blob_storage_container_name_value" } } } } } }
Internal process and deduplication method
The collector uses the event_hubs_auto_discover service to dynamically query a given resource group and namespace for all available event hubs.
All deduplication and checkpointing methods listed for the event_hubs service apply; however, there are some additional considerations when configuring the event_hubs_auto_discover service.
The event_hubs_auto_discover service effectively restarts all event hub consumers after one hour (this time can be overridden via the override_consumer_client_ttl_seconds_value parameter). On restart, the collector re-discovers all available event hubs and begins pulling data again. Any event hubs created between the last run and the current run will be discovered and pulled from.
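The rediscovery cycle can be illustrated as follows. This is a toy model; the real service re-creates consumer clients against live Azure namespaces, and the enumeration function here is a stand-in.

```python
# Toy illustration of the auto-discover cycle: after the consumer TTL
# elapses, the collector re-enumerates the namespace so newly created
# event hubs are picked up.
def discover_cycle(list_event_hubs, known: set[str]) -> set[str]:
    """Return the event hubs that appeared since the previous cycle."""
    current = set(list_event_hubs())
    new_hubs = current - known
    known |= current                      # remember everything seen so far
    return new_hubs

known: set[str] = set()
discover_cycle(lambda: ["hub-a"], known)
print(sorted(discover_cycle(lambda: ["hub-a", "hub-b"], known)))  # ['hub-b']
```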
Due to the nature of this service, if a user has configured Azure Blob Storage checkpointing, the collector will attempt to create containers in the configured Azure Blob Storage account. If the configured credentials do not have write access to the storage account, an error will be written to the logs indicating that the user must grant write access to the credentials.
Checkpointing | The collector supports two forms of checkpointing. |
---|---|
Local persistence checkpointing | By default, the collector will utilize local persistence checkpointing to ensure that events are not fetched multiple times from a given partition in a given event hub. The collector will store the last event offset as messages are consumed. |
Azure Blob Storage checkpointing | Optionally, users can specify an Azure Blob Storage account or an Azure Blob Storage connection string to use Azure Blob Storage checkpointing. This allows the collector to run in multi-pod mode and all checkpointing data is stored within the Azure Storage account. Unlike the |
Common logic
This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.
Error type | Error ID | Error message | Cause | Solution |
---|---|---|---|---|
| 1 | Invalid | The configured | Update the |
| 2 | Invalid | The configured | Update the |
| 350 | Could not match tag to record and no default tag provided: | Advanced tagging configured but no default tag provided and record did not match any of tag pathways | Provide default tag in advanced tag mapping object |
| 401 | An error occurred while trying to authenticate with the Azure API. Exception: | The collector is unable to authenticate with the Azure API. | Check the credentials and ensure that the collector has the necessary permissions to access the Azure API. |
| 410 | An error occurred while trying to check if container | The collector was unable to locate the specified blob storage container name. | Ensure the container exists and the credentials have READ access to the container |
| 411 | An error occurred while trying to check if container | The collector was unable to access the specified blob storage container name. | Ensure the container exists and the credentials have READ access to the container |
| 412 | An error occurred while trying to create container | The collector was unable to create the container for the auto discover service and the user indicated to use Azure Blob Storage checkpointing. | Ensure the credentials have WRITE access to the container storage account. |
| 420 | An error occurred while trying to get consumer group | The collector was unable to access the specified consumer group name. | Ensure the consumer group exists and the credentials have READ access to the consumer group |
| 421 | An error occurred while trying to create consumer group | The collector was unable to create the consumer group for the auto discover service. | Ensure the credentials have WRITE access to the event hub namespace or use the |
Typical issues
CBS token error - This issue usually happens when the connection string includes the event hub namespace name instead of the event hub name. The two values are different and easy to mix up. You can find an explanation here.
Delayed events - You can use the @devo_event_enqueued_time value in the cloud.azure table to check the time at which events were queued in Azure. Delayed events can be caused by Event Hub itself (high enqueued time) or by a lack of processing capacity in the collector. In the latter case, it is necessary to add more collector instances or to create a collector for each partition.
Duplicated events - Adjust the value of the config parameter duplicated_messages_mechanism_value according to your deployment. If you are running several instances, change the value to local. See Internal process and deduplication method for more details.
Metadata decorators useful for troubleshooting
The collector adds some metadata to the events that can be useful for issue diagnosis. This metadata can be found in the cloud.azure table:
devo_record_idx: identifier for the event, composed of the event sequence number and an ordinal number
devo_record_hash_id: hash value of the whole record. If two records have the same value, they are exactly equal.
devo_event_offset: offset value for the record in the Event Hub queue
devo_event_enqueued_time: time at which the event was enqueued in the Event Hub
devo_event_sequence_number: sequence number in the Event Hub
devo_eh_partition_id: source partition for the event
devo_eh_consumer_group: source consumer group
devo_eh_fully_qualified_namespace: source namespace for the event
devo_pulling_id: epoch timestamp corresponding to the time the event was sent to Devo by the collector. It should be close to the eventdate of the event.
More details about the meaning of this metadata can be found in the Microsoft Event Hubs documentation.
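As an illustration of how a whole-record hash such as devo_record_hash_id behaves (the collector's actual hashing scheme is not documented here, so this is hypothetical), equal records always produce equal hashes regardless of field order:

```python
# Hypothetical whole-record hash: identical records produce identical
# hashes, so equal hash values identify exact duplicates.
import hashlib, json

def record_hash(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True)  # stable field order
    return hashlib.sha256(canonical.encode()).hexdigest()

a = record_hash({"category": "auth", "offset": 42})
b = record_hash({"offset": 42, "category": "auth"})
print(a == b)  # True
```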
Delayed events
For instance, this metadata can be used to find the cause of delayed events. You can use the timestamp
, the devo_event_enqueued_time
, and the eventdate
values in the cloud.azure
table to check when the events were created, when they were queued in Azure Event Hub, and when they were received in Devo.
Delayed events can be caused by Event Hub itself. In that case, there is a large time difference between the enqueued time and the creation date of the event; the devo_event_enqueued_time - timestamp
value is large. This delay can be caused by license limits or the type of event source.
Otherwise, if there is a delay but that difference is small, a possible cause is a lack of processing capacity in the collector. In this case, the value of eventdate - devo_event_enqueued_time
is large. It may be necessary to add more collector instances or to create a collector for each partition. See Configuration Considerations about the multi-pod mode.
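The two delay signatures described above can be sketched as a small helper. The five-minute threshold is an illustrative assumption, not a value documented by the collector:

```python
from datetime import datetime, timedelta

def classify_delay(timestamp: datetime, enqueued_time: datetime,
                   eventdate: datetime,
                   threshold: timedelta = timedelta(minutes=5)) -> str:
    """Classify where a delayed event spent its time.

    timestamp     -> creation time of the event in Azure
    enqueued_time -> devo_event_enqueued_time (queued in Event Hub)
    eventdate     -> time the event was received in Devo
    """
    queue_delay = enqueued_time - timestamp        # large => Event Hub side
    processing_delay = eventdate - enqueued_time   # large => collector side
    if queue_delay > threshold:
        return "event-hub-delay"        # license limits or event source type
    if processing_delay > threshold:
        return "collector-capacity-delay"  # consider more instances/partitions
    return "on-time"

t0 = datetime(2023, 1, 10, 15, 0, 0)
print(classify_delay(t0, t0 + timedelta(minutes=1), t0 + timedelta(minutes=20)))
# -> collector-capacity-delay
```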
Collector operations
This section is intended to explain how to proceed with specific operations of this collector.
Initialization
The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services, and of validating the given configuration.
A successful run has the following output messages for the initializer module:
2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Loading configuration using the following files: {"full_config": "config-test-local.yaml", "job_config_loc": null, "collector_config_loc": null} 2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Using the default location for "job_config_loc" file: "/etc/devo/job/job_config.json" 2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> "\etc\devo\job" does not exists 2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> Using the default location for "collector_config_loc" file: "/etc/devo/collector/collector_config.json" 2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> "\etc\devo\collector" does not exists 2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> Results of validation of config files parameters: {"config": "C:\git\collectors2\devo-collector-<name>\config\config.yaml", "config_validated": True, "job_config_loc": "/etc/devo/job/job_config.json", "job_config_loc_default": True, "job_config_loc_validated": False, "collector_config_loc": "/etc/devo/collector/collector_config.json", "collector_config_loc_default": True, "collector_config_loc_validated": False} 2023-01-10T15:22:57.171 WARNING MainProcess::MainThread -> [WARNING] Illegal global setting has been ignored -> multiprocessing: False
Events delivery and Devo ingestion
The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.
A successful run has the following output messages for the event delivery module:
2023-01-10T15:23:00.788 INFO OutputProcess::MainThread -> DevoSender(standard_senders,devo_sender_0) -> Starting thread 2023-01-10T15:23:00.789 INFO OutputProcess::MainThread -> DevoSenderManagerMonitor(standard_senders,devo_1) -> Starting thread (every 300 seconds) 2023-01-10T15:23:00.790 INFO OutputProcess::MainThread -> DevoSenderManager(standard_senders,manager,devo_1) -> Starting thread 2023-01-10T15:23:00.842 INFO OutputProcess::MainThread -> global_status: {"output_process": {"process_id": 18804, "process_status": "running", "thread_counter": 21, "thread_names": ["MainThread", "pydevd.Writer", "pydevd.Reader", "pydevd.CommandThread", "pydevd.CheckAliveThread", "DevoSender(standard_senders,devo_sender_0)", "DevoSenderManagerMonitor(standard_senders,devo_1)", "DevoSenderManager(standard_senders,manager,devo_1)", "OutputStandardConsumer(standard_senders_consumer_0)",
Sender services
The Integrations Factory Collector SDK has three different sender services depending on the event type to deliver (internal
, standard
, and lookup
). This collector uses the following sender services:
Sender services | Description |
---|---|
| In charge of delivering internal metrics to Devo, such as logging traces or metrics. |
| In charge of delivering pulled events to Devo. |
Sender statistics
Each service displays its own performance statistics, allowing you to check how many events have been delivered to Devo by type:
Logging trace | Description |
---|---|
| Displays the number of concurrent senders available for the given Sender Service. |
| Displays the items available in the internal sender queue. This value helps detect bottlenecks and whether the performance of data delivery to Devo needs to be increased, which can be done by adding concurrent senders. |
| Displays the number of events delivered since the last trace; following the given example, these conclusions can be drawn:
By default, these traces are shown every 10 minutes. |
To check the memory usage of this collector, look for the following log records, which are displayed every 5 minutes by default, always after the memory-freeing process runs.
The used memory is displayed per running process, and the sum of both values gives the total memory used by the collector.
The global pressure on the available memory is displayed in the global
value. All metrics (global, RSS, VMS) include the value before and after freeing memory:
previous -> after freeing memory
INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB) INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)
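As a troubleshooting aid, the [GC] trace format shown above can be parsed to track memory use over time. This is a minimal sketch; the regular expression simply matches the trace layout reproduced in this document:

```python
import re

# Matches: [GC] global: A% -> B%, process: RSS(CMiB -> DMiB), VMS(EMiB -> FMiB)
GC_PATTERN = re.compile(
    r"\[GC\] global: (?P<g_before>[\d.]+)% -> (?P<g_after>[\d.]+)%, "
    r"process: RSS\((?P<rss_before>[\d.]+)MiB -> (?P<rss_after>[\d.]+)MiB\), "
    r"VMS\((?P<vms_before>[\d.]+)MiB -> (?P<vms_after>[\d.]+)MiB\)"
)

def parse_gc_trace(line: str) -> dict:
    """Extract before/after memory figures from a [GC] log record."""
    m = GC_PATTERN.search(line)
    if m is None:
        raise ValueError("not a [GC] trace")
    return {k: float(v) for k, v in m.groupdict().items()}

line = ("INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, "
        "process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)")
stats = parse_gc_trace(line)
print(stats["rss_before"] - stats["rss_after"])  # MiB of RSS freed
```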
Change log
Release | Released on | Release type | Recommendations |
---|---|---|---|
|
| IMPROVEMENTS |
|
Details Improvements
| |||
| NEW FEATURE |
| |
Details Feature
Improvements
| |||
|
| IMPROVEMENTS |
|
Details Improvements
| |||
| IMPROVEMENTS |
| |
Details Improvements
| |||
| IMPROVEMENTS |
| |
Details Improvements
Bug fixing
| |||
| BUG FIXING |
| |
Details Bug fixing
| |||
| IMPROVEMENTS |
| |
Details Improvements
Bug fixing
| |||
| IMPROVEMENTS |
| |
Details Improvements
Bug fixing
| |||
| BUG FIXING |
| |
Details Bug fixing
| |||
| IMPROVEMENTS |
| |
Details Improvements
| |||
| IMPROVEMENTS |
| |
Details Improvements New events types are accepted for the service
| |||
| BUG FIXING |
| |
Details Bug fixing A configuration bug has been fixed to enable the autocategorization of the following events
|