Purpose
The Microsoft Azure collector gets data from Azure cloud computing services. Common uses are:
- Detect malicious Entra ID authentication
- Detect malicious role, policy, and group changes impacting cloud infrastructure
- Correlate risky users identified by Entra ID with data you have in Devo
- Detect malicious Application Gateway traffic
- Detect failures and measure costs of virtual machines
Run It
The Azure Collector has two services:
- VM Metrics, for Virtual Machines
- Event Hub, for everything else in Azure
These services should be enabled in separate collector instances.
Devo collector features
Features | Details
---|---
Allow parallel downloading (multi-pod) | The `vm_metrics` service cannot work in multi-pod mode. If you want to use the `event_hubs` service in multi-pod mode, you must not include a `vm_metrics` service in the same collector.
Running environments | Collector server, on-premise
Populated Devo events |
Flattening pre-processing |
Allowed source events obfuscation |
Data source description
Data source | Description | API endpoint | Collector service name | Devo table
---|---|---|---|---
VM Metrics | Uses the Microsoft Azure API to obtain metrics about deployed Virtual Machines and gathers them in Devo, making them easier to query and analyze in the Devo platform and Activeboards. | Azure Compute Management Client SDK and Azure Monitor Management Client SDK | `vm_metrics` | `cloud.azure.vm.metrics_simple`
Event Hubs | Several Microsoft Azure services can generate execution information to be sent to an Event Hub service (see next section). | Azure Event Hubs SDK | `event_hubs` and `event_hubs_autodiscover` | `<auto_tag_description>`
Event hubs: Auto-categorization of Microsoft Azure service messages
Many of the available Microsoft Azure services can generate execution information to be sent to an Event Hub service. This data can be categorized as events or metrics. The events, in turn, can be of different subtypes: audits, status, logs, etc.
All such data is gathered by Devo's Microsoft Azure collector and sent to our platform, where the message auto-categorization functionality automatically routes messages to the relevant Devo tables.
Although EventHub is the service used for centralizing Azure services' data, it also generates information that can be sent to itself. Learn more in this article.
Vendor setup
The Microsoft Azure collector centralizes the data with an Event Hub using the Azure SDK. To use it, you need to configure the resources in the Azure Portal and set the right permissions to access the information.
Event Hub events
Event Hub Auto-Discover
Configuring auto-discovery
About the feature
To configure access to event hubs for the auto-discovery feature, you need to grant the registered application the necessary permissions to access the Event Hub without using the `RootManageSharedAccessKey`. The auto-discovery feature enumerates all available event hubs in a namespace and resource group, and can optionally create consumer groups (if the configuration specifies a consumer group other than `$Default` and that consumer group does not exist when the collector connects to the event hub) and Azure Blob Storage containers for checkpointing (if the configuration specifies a storage account and container).
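As an illustration only, an auto-discovery configuration might take a shape like the following, mirroring the `event_hubs` examples later on this page. The service name `event_hubs_autodiscover` appears in the data source table above, but the parameter names shown here are assumptions to illustrate the idea, not a verified reference; check them against your collector version.

```yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    credentials:
      client_id: client_id_value
      client_secret: client_secret_value
      tenant_id: tenant_id_value
    services:
      event_hubs_autodiscover:
        # Hypothetical parameters: enumerate all event hubs in this
        # namespace/resource group, creating the consumer group if missing.
        namespace: namespace_value
        resource_group: resource_group_value
        consumer_group: devo_consumer_group
        # Optional, hypothetical: blob storage for checkpointing.
        storage_account: storage_account_value
        storage_container: storage_container_value
```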
Role assignment (Namespace)
Repeat the steps from the Event Hubs Role Assignment section, except that the necessary role is the Azure Event Hubs Namespace Data Owner role. This allows the collector to enumerate the event hubs in the namespace and create consumer groups if necessary.
Minimum configuration for basic pulling
Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.
Setting | Details
---|---
`tenant_id` | The Azure application tenant ID.
`client_id` | The Azure application client ID.
`client_secret` | The Azure application client secret.
`subscription_id` | The Azure application subscription ID.
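As a sketch, these settings map onto the collector configuration roughly as follows. The placement of `subscription_id` inside the credentials block is an assumption made for illustration (the `event_hubs` credential examples later on this page omit it); verify against your collector version.

```yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    credentials:
      tenant_id: tenant_id_value
      client_id: client_id_value
      client_secret: client_secret_value
      subscription_id: subscription_id_value  # assumption: placement not confirmed
```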
Accepted authentication methods
Authentication method | Tenant ID | Client ID | Client secret | Subscription ID
---|---|---|---|---
OAuth2 | REQUIRED | REQUIRED | REQUIRED | REQUIRED
Run the collector
Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector in your own machine using a Docker image (On-premise collector).
Collector services detail
This section explains how to proceed with specific actions for each service.
Event Hubs (event_hubs)
General principles
Understanding the following principles of Azure Event Hubs is crucial:

- Consumer groups: a single event hub can have multiple consumer groups, each representing a separate view of the event stream.
- Checkpointing: the SDK supports checkpointing mechanisms to balance the load among consumers for the same event hub and consumer group. Supported mechanisms are local persistence and Azure Blob Storage (see Checkpointing mechanisms below).
- Partition restrictions: Azure Event Hubs limits the number of partitions based on the event hub tier. For quotas and limits, refer to the official documentation.
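To make the checkpointing principle concrete, here is a minimal, library-free sketch. It is not the collector's actual code; it only shows the idea of remembering the last processed offset per (event hub, consumer group, partition) and resuming from the next event.

```python
class LocalCheckpointStore:
    """Illustrative per-partition offset checkpointing (not collector code)."""

    def __init__(self):
        # (event_hub, consumer_group, partition_id) -> last processed offset
        self._offsets = {}

    def update(self, event_hub, consumer_group, partition_id, offset):
        self._offsets[(event_hub, consumer_group, partition_id)] = offset

    def last_offset(self, event_hub, consumer_group, partition_id):
        # Last checkpointed offset, or None to read from the start.
        return self._offsets.get((event_hub, consumer_group, partition_id))


def consume(events, store, event_hub, consumer_group, partition_id):
    """Process only events newer than the checkpoint, then advance it.

    `events` is a list of (offset, payload) pairs, as delivered in order
    by a single partition.
    """
    last = store.last_offset(event_hub, consumer_group, partition_id)
    processed = []
    for offset, payload in events:
        if last is not None and offset <= last:
            continue  # already processed on a previous run
        processed.append(payload)
        store.update(event_hub, consumer_group, partition_id, offset)
    return processed
```

A first run processes every event; a second run over the same events processes none, because the checkpoint already points at the last offset.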
Configuration options
Devo supports various configurations to cater to different Azure setups.
Event Hubs authentication configuration
Event Hubs authentication can use either connection strings or client credentials (assigning the Azure Event Hubs Data Receiver role). When both are available, the connection string configuration takes precedence.
Connection string configuration

Yaml

```yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    services:
      event_hubs:
        queues:
          queue_a:
            event_hub_name: event_hub_value
            event_hub_connection_string: event_hub_connection_string_value
```

Json

```json
"inputs": {
  "azure_event_hub": {
    "id": 100001,
    "enabled": true,
    "services": {
      "event_hubs": {
        "queues": {
          "queue_a": {
            "event_hub_name": "event_hub_value",
            "event_hub_connection_string": "event_hub_connection_string_value"
          }
        }
      }
    }
  }
}
```
Client credentials configuration

Yaml

```yaml
inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    credentials:
      client_id: client_id_value
      client_secret: client_secret_value
      tenant_id: tenant_id_value
    services:
      event_hubs:
        queues:
          queue_a:
            namespace: namespace_value
            event_hub_name: event_hub_name_value
```

Json

```json
"inputs": {
  "azure_event_hub": {
    "id": 100001,
    "enabled": true,
    "credentials": {
      "client_id": "client_id_value",
      "client_secret": "client_secret_value",
      "tenant_id": "tenant_id_value"
    },
    "services": {
      "event_hubs": {
        "queues": {
          "queue_a": {
            "namespace": "namespace_value",
            "event_hub_name": "event_hub_name_value"
          }
        }
      }
    }
  }
}
```
---|
Configuration considerations

Multi-pod mode

While multi-pod mode is supported and offers the highest possible throughput for the collector, it requires specific configuration to ensure that the collector operates efficiently and does not send duplicate events to Devo. In most cases, multi-pod mode is unnecessary.

- High throughput: multi-pod mode allows potentially the highest throughput.
- Consumer client thread limit: specify a `client_thread_limit` to ensure that the collector uses load balancing instead of explicitly assigning partition IDs to the consumer clients. In load-balancing mode, having fewer consumer clients than partitions is allowed but less efficient, as some consumer clients fetch events from multiple partitions. Having more consumer clients than partitions is also allowed but less efficient, as some consumer clients are not assigned any partitions. The most efficient design has as many consumer clients as there are partitions, distributed among the pods; the easiest way to achieve this is to set `client_thread_limit` to 1 and create as many pods as there are partitions.
- Azure Blob Storage checkpointing: required for multi-pod mode.
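The sizing guidance above can be sketched numerically. This hypothetical helper (not part of the collector) shows how partitions spread across consumer clients under load balancing, and why one client per partition is the most efficient arrangement:

```python
def partitions_per_client(partitions, clients):
    """How many partitions each consumer client owns under load balancing.

    Returns a list with one entry per client; clients beyond the partition
    count end up idle (own 0 partitions). Illustrative only.
    """
    base, extra = divmod(partitions, clients)
    return [base + 1] * extra + [base] * (clients - extra)
```

For a 32-partition event hub, `client_thread_limit: 1` with 32 pods yields 32 clients owning exactly one partition each; 8 clients would each have to fetch from 4 partitions, and 40 clients would leave 8 of them idle.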
Standard mode

Both checkpointing options are supported. In standard mode, the collector automatically creates one consumer client thread per partition per event hub. If the event hubs you want to fetch data from have more partitions than a single instance can support (for example, 100 event hubs with 32 partitions each would make the collector attempt to create 3,200 consumer clients), create multiple collector instances and configure each one to fetch from a subset of the desired event hubs.
Internal process and deduplication method
The collector uses the `event_hubs` service to pull events from Azure Event Hubs. Each queue in the `event_hubs` service represents an event hub that is polled for events.
Collector deduplication mechanisms

Events are deduplicated using the `duplicated_messages_mechanism` parameter. Two methods are available:

- Local deduplication: ensures that subsequent duplicate events from the same event hub are not sent to Devo. This method operates individually within each consumer client.
- Global deduplication: uses a shared cache across all event hub consumers for a given collector. As events are ingested into Devo, the collector checks whether the event has already been consumed by another event hub consumer; if so, the event is not sent to Devo. The global cache tracks the last 1000 events for each consumer client.

If the global deduplication method is selected, the collector automatically employs the local deduplication method as well.
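The local and global mechanisms described above can be sketched as follows. This is a pedagogical model, not the collector's implementation; the bounded caches mirror the documented "last 1000 events" behavior.

```python
from collections import deque


class Deduplicator:
    """Illustrative local vs. global event deduplication (not collector code).

    Each consumer client keeps a bounded cache of recently seen event keys.
    In "global" mode, clients also consult a cache shared across all
    consumers of the collector before emitting an event.
    """

    def __init__(self, mechanism="local", cache_size=1000, shared=None):
        self.mechanism = mechanism
        self.local_seen = deque(maxlen=cache_size)
        self.shared = shared if shared is not None else deque(maxlen=cache_size)

    def should_send(self, event):
        key = hash(event)
        if key in self.local_seen:
            return False  # duplicate within this consumer client
        if self.mechanism == "global" and key in self.shared:
            return False  # duplicate seen by another consumer client
        self.local_seen.append(key)
        if self.mechanism == "global":
            self.shared.append(key)
        return True
```

With a shared cache, a second client declines an event the first client already sent; in local mode, each client only suppresses its own duplicates.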
Checkpointing mechanisms

The collector offers two distinct checkpointing methods, each designed to prevent the re-fetching of events from Azure Event Hubs. These mechanisms ensure efficient event processing by maintaining a record of the last processed event in each partition.

Local persistence checkpointing

- Overview: by default, the collector employs local persistence checkpointing. This method keeps track of the last event offset within each partition of an event hub, ensuring events are processed once without duplication.
- How it works: as the collector consumes messages from an event hub, it records the offset of the last processed event locally. On subsequent pulls, the collector resumes processing from the next event after the last recorded offset, effectively skipping previously processed events.
- Use case: ideal for single-instance deployments where all partitions of an event hub are managed by a single collector instance.
Azure Blob Storage checkpointing

- Overview: as an alternative to local persistence, the collector can be configured to use Azure Blob Storage for checkpointing. This approach leverages Azure's cloud storage to maintain event processing state.
- Configuration: option 1 is to specify both an Azure Blob Storage account and container name; this requires the collector to have appropriate access permissions to the specified storage account. Option 2 is to provide an Azure Blob Storage connection string and container name; this is straightforward and recommended if you have the connection string readily available.
- Benefits: multi-pod support, enabling the collector to operate in a distributed environment such as Kubernetes, where multiple collector instances (pods) run concurrently; checkpointing data stored in Azure Blob Storage gives each instance access to the current state of event processing, facilitating efficient load balancing and event partition management. Durability: Azure Blob Storage's durability and availability features safeguard checkpointing data against loss or corruption.
- Use case: recommended for environments requiring multi-pod deployment, or when you prefer to centralize checkpointing within your Azure infrastructure.
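For illustration, the two configuration options might look like the following within a queue definition. This is a sketch only: the parameter names `storage_account`, `storage_container`, and `storage_connection_string` are assumptions chosen to show the shape of each option, and should be verified against your collector version.

```yaml
# Option 1 (hypothetical parameter names): storage account + container.
queues:
  queue_a:
    event_hub_name: event_hub_value
    event_hub_connection_string: event_hub_connection_string_value
    storage_account: storage_account_value
    storage_container: storage_container_value

# Option 2 (hypothetical parameter names): connection string + container.
queues:
  queue_b:
    event_hub_name: event_hub_value
    event_hub_connection_string: event_hub_connection_string_value
    storage_connection_string: storage_connection_string_value
    storage_container: storage_container_value
```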