Microsoft 365 Management API collector

Overview

The Office 365 Management APIs provide a single extensibility platform for all Office 365 customers' and partners' management tasks, including service communications, security, compliance, reporting, and auditing.

Devo collector features

Feature

Details

Allow parallel downloading (multipod)

not allowed

Running environments

  • collector server

  • on-premise

Populated Devo events

table

Flattening preprocessing

no

Allowed source events obfuscation

yes

Data sources

Data source

API endpoint

Collector service name

Devo table

Active Directory

Audit.AzureActiveDirectory

azure_active_directory

cloud.office365.management.azure_active_directory

SharePoint

Audit.SharePoint

sharepoint

cloud.office365.management.sharepoint and cloud.office365.management.onedrive

Exchange

Audit.Exchange

exchange

cloud.office365.management.exchange

General Audit

Audit.General

general_audit

cloud.office365.management.*

DLP

DLP.All

dlp

Any table listed above

URI Retry

This service is mandatory for retrying any URI that failed from any service.

-

uri_retry

Any table listed above.

For more information on how the events are parsed, visit our page.

Minimum configuration required for basic pulling

Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.

This minimum configuration refers exclusively to the parameters specific to this integration. There are more required parameters related to the generic behavior of the collector. Check the settings sections for details.

Setting

Details

tenant_id

The Azure application tenant ID

client_id

The Azure application client ID

client_secret

The Azure application client secret
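
These three values are the only integration-specific settings needed for basic pulling. As a rough illustration (not the collector's own validation code), the following Python sketch checks that a set of credentials contains a non-empty value for each of them. The flat dictionary shape and the function name missing_settings are assumptions; the real configuration file nests these settings alongside the generic collector parameters.

```python
from typing import Mapping

# The three integration-specific settings listed in the table above.
REQUIRED_SETTINGS = ("tenant_id", "client_id", "client_secret")


def missing_settings(credentials: Mapping[str, object]) -> list[str]:
    """Return the required settings that are absent or empty.

    Assumes the credentials were already loaded into a flat mapping;
    this shape is hypothetical and used only for illustration.
    """
    missing = []
    for key in REQUIRED_SETTINGS:
        value = credentials.get(key)
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing


if __name__ == "__main__":
    example = {"tenant_id": "my-tenant-id", "client_id": "my-client-id", "client_secret": ""}
    print(missing_settings(example))  # ['client_secret']
```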

Accepted authentication methods

Authentication method

Tenant ID

Client ID

Client Secret

OAuth2

REQUIRED

REQUIRED

REQUIRED

Vendor setup

Getting credentials

To log in to the Azure subscription, you need the Active Directory ID (tenant ID), the Application ID (client ID, which identifies the service principal), and the client secret (the service principal "password"). To get them, follow these steps:

  1. Begin by creating and registering your application within Azure AD. Give it a name of your choice to identify it, such as devo-integration. The Redirect URI field may be left blank. Make note of the application's Client ID as well as the Tenant ID. Learn more here.

  2. Move to the API Permissions section on the left menu, then click Add a permission in the main pane. Find the Office 365 Management APIs section and click on it.

  3. Then click Application permissions, and enable the appropriate permissions, at least the two under ActivityFeed. Click Add permissions.

  4. Once you have added the permissions, you need to grant admin consent to the application. You should see a message confirming admin consent for the requested permissions. Learn more here.

    • The permissions that need to be set are as follows:

      • Read activity data from your organization.

      • Read service health information from your organization.

      • Read DLP policy events including detected sensitive data (only if pulling DLP.All from Management Activity).

  5. In the left-hand menu, go to Certificates & secrets and click New client secret to generate a new key (shown in the application as the client secret value). Copy and record it for later use. Learn more here.
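
With the tenant ID, client ID, and client secret in hand, the application authenticates through the OAuth2 client credentials flow. The Python sketch below builds that token request against the Microsoft identity platform v2.0 endpoint, requesting the https://manage.office.com/.default scope so that the application permissions granted above apply. It only illustrates the flow; the function names are hypothetical, and the collector's own token handling may differ (for example, government clouds use different login and resource hosts).

```python
import json
import urllib.parse
import urllib.request

# Default commercial-cloud endpoints. Government clouds (GCC High, DoD)
# use different hosts.
LOGIN_HOST = "https://login.microsoftonline.com"
RESOURCE = "https://manage.office.com"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build an OAuth2 client-credentials token request for the Management API."""
    url = f"{LOGIN_HOST}/{urllib.parse.quote(tenant_id)}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" requests the application permissions granted with admin consent.
        "scope": f"{RESOURCE}/.default",
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Send the token request and return the access token (performs network I/O)."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

The returned token is then sent as an "Authorization: Bearer <token>" header on requests to the Management API.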

Subscriptions and tables

The collector pulls many different tables. If you do not need some of them, remove the services that produce them from the configuration.

Service

Tables

Azure Active Directory Service

  • cloud.office365.management.azureactivedirectory

Azure SharePoint Service

  • cloud.office365.management.sharepoint

  • cloud.office365.management.onedrive

Azure Exchange Service

  • cloud.office365.management.exchange

Azure General Audit Service

Note: these are all the possible tables, but your subscription might not include all of them. If you are not receiving one of these tables, contact Microsoft.

  • cloud.office365.management.aip

  • cloud.office365.management.airinvestigation

  • cloud.office365.management.cca

  • cloud.office365.management.compliance

  • cloud.office365.management.compliancemanager

  • cloud.office365.management.copilot

  • cloud.office365.management.corereporting

  • cloud.office365.management.crm

  • cloud.office365.management.dlpsensitiveinformationtype

  • cloud.office365.management.endpoint

  • cloud.office365.management.mcas

  • cloud.office365.management.microsoftdefenderforidentity

  • cloud.office365.management.microsoftflow

  • cloud.office365.management.microsoftforms

  • cloud.office365.management.microsoftstream

  • cloud.office365.management.microsoftteams

  • cloud.office365.management.microsofttodo

  • cloud.office365.management.mip

  • cloud.office365.management.manalytics

  • cloud.office365.management.officeapps

  • cloud.office365.management.onedriveforbusiness

  • cloud.office365.management.planner

  • cloud.office365.management.powerapps

  • cloud.office365.management.powerbi

  • cloud.office365.management.powerplatformadmin

  • cloud.office365.management.publicendpoint

  • cloud.office365.management.quarantine

  • cloud.office365.management.rdl

  • cloud.office365.management.se

  • cloud.office365.management.securitycompliancecenter

  • cloud.office365.management.skypeforbusiness

  • cloud.office365.management.threatintelligence

  • cloud.office365.management.workplaceanalytics

  • cloud.office365.management.yammer

Azure DLP Service

  • cloud.office365.management.*
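
Each service above pulls one Office 365 Management Activity API content type, so removing a service from the configuration stops that content type from being pulled. The Python sketch below maps the service names from the data sources table to their content types and builds Microsoft's subscriptions/start URL for each configured service. The mapping dictionary and function name are illustrative assumptions, not the collector's internal code; the URL shape follows the Management Activity API, with the default commercial-cloud base URL.

```python
import urllib.parse

BASE_URL = "https://manage.office.com"

# Service names from the data sources table, mapped to the Management
# Activity API content types they pull (illustrative mapping).
SERVICE_CONTENT_TYPES = {
    "azure_active_directory": "Audit.AzureActiveDirectory",
    "sharepoint": "Audit.SharePoint",
    "exchange": "Audit.Exchange",
    "general_audit": "Audit.General",
    "dlp": "DLP.All",
}


def subscription_start_urls(tenant_id: str, services: list[str]) -> dict[str, str]:
    """Return the subscriptions/start URL for each configured service."""
    urls = {}
    for service in services:
        content_type = SERVICE_CONTENT_TYPES.get(service)
        if content_type is None:
            continue  # e.g. uri_retry has no content type of its own
        query = urllib.parse.urlencode({"contentType": content_type})
        urls[service] = f"{BASE_URL}/api/v1.0/{tenant_id}/activity/feed/subscriptions/start?{query}"
    return urls


if __name__ == "__main__":
    for service, url in subscription_start_urls("my-tenant-id", ["exchange", "dlp", "uri_retry"]).items():
        print(service, url)
```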

Run the collector

Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector on your own machine using a Docker image (On-premise collector).

This section is intended to explain how to proceed with specific actions for services.

Internal process and deduplication method

The collector pulls content URIs from the subscription and checks them against the URIs it has already processed. It then pulls the events from each new URI, checks each event's ID against previously seen IDs, and sends only the events that have not been seen before.
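
The two-level check described above can be sketched in Python as follows. This is a simplification under stated assumptions: the seen URIs and IDs are kept in in-memory sets (the collector persists its state), fetch_events is a hypothetical callback that downloads the records of one content URI, and the event ID is read from the Id field of the Office 365 audit record common schema.

```python
from typing import Callable, Iterable


def pull_new_events(
    content_uris: Iterable[str],
    seen_uris: set[str],
    seen_event_ids: set[str],
    fetch_events: Callable[[str], list[dict]],
) -> list[dict]:
    """Return only events from unprocessed URIs whose IDs were not seen before."""
    new_events = []
    for uri in content_uris:
        if uri in seen_uris:
            continue  # this content URI was already processed
        for event in fetch_events(uri):
            event_id = event.get("Id")
            if event_id is not None:
                if event_id in seen_event_ids:
                    continue  # duplicate event ID, skip it
                seen_event_ids.add(event_id)
            new_events.append(event)
        seen_uris.add(uri)  # mark the URI as processed only after reading it
    return new_events
```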

Content blob deduplication

The collector fetches content blobs from the subscription and extracts the records from each one. The creation date of each content blob is used to determine whether it is new: the collector only fetches content blobs created since the last successful record extraction.
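
A minimal Python sketch of this creation-date check, assuming each content blob is described by the contentCreated timestamp returned by the Management Activity API content listing (for example "2024-04-02T12:30:00.000Z") and that the checkpoint is a timezone-aware UTC datetime. The function names are hypothetical, and the strict "created after the checkpoint" comparison is an assumption; any overlap would still be caught by the event deduplication described below.

```python
from datetime import datetime, timezone


def parse_created(value: str) -> datetime:
    """Parse a contentCreated timestamp, with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized contentCreated value: {value!r}")


def select_new_blobs(blobs: list[dict], last_extracted: datetime) -> list[dict]:
    """Keep blobs created after the checkpoint, oldest first."""
    new_blobs = [b for b in blobs if parse_created(b["contentCreated"]) > last_extracted]
    new_blobs.sort(key=lambda b: parse_created(b["contentCreated"]))
    return new_blobs
```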

Event deduplication

Overview

The Office 365 activity and reporting APIs can return many duplicate events for various services, across both consecutive and non-contiguous content blobs. To ensure that only unique events are sent to Devo, the collector uses a Bloom filter for efficient deduplication. A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is part of a set. It never misses an element that was added, but it can occasionally report an element as present when it was not, so a very small fraction of unique events may be treated as duplicates.

How it works
  1. Hash Calculation:

    • Each record's content, excluding the parent content blob metadata, is hashed, so even minor content changes produce different hashes.

  2. Bloom Filter:

    • Stores hashes in a compact bit array.

    • New records' hashes are checked against the filter to detect duplicates.

    • Unique records are added to the filter; duplicates are skipped.

  3. Configurable Size:

    • Configurable via the override_bloom_filter_size parameter. Default value: 10 million records (~12 MB).
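
The default size can be checked with the standard Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits with k = (m/n) ln 2 hash functions. Assuming a 1% false positive rate (the document does not state the rate), 10 million records need about 95.9 million bits, roughly 12 MB, which matches the stated default. The sketch also illustrates the hashing step; the excluded metadata key name _blob_metadata is hypothetical, since the collector's internal record layout is not documented here.

```python
import hashlib
import json
import math

# Hypothetical key under which parent content blob metadata is attached.
METADATA_KEYS = frozenset({"_blob_metadata"})


def bloom_filter_size(n_items: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) for an optimally sized Bloom filter."""
    bits = math.ceil(-n_items * math.log(false_positive_rate) / math.log(2) ** 2)
    hash_count = max(1, round(bits / n_items * math.log(2)))
    return bits, hash_count


def record_hash(record: dict, excluded_keys: frozenset = METADATA_KEYS) -> str:
    """Hash a record's content, ignoring blob metadata; key order does not matter."""
    content = {key: value for key, value in record.items() if key not in excluded_keys}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


bits, k = bloom_filter_size(10_000_000, 0.01)
print(f"{bits / 8 / 1_000_000:.1f} MB, {k} hash functions")  # 12.0 MB, 7 hash functions
```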

Bloom filter buffer
  1. Buffering recent hashes:

    • Maintains a buffer (deque) of recent hashes to ensure recent events are deduplicated even after filter resets.

  2. Resetting the Bloom filter:

    • On reaching capacity, the Bloom filter is reset and reseeded with recent hashes from the buffer.

  3. Configurable buffer size:

    • Configurable via the override_bloom_filter_buffer_size parameter. Default value: 100,000 records (~1 MB).
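
The filter-plus-buffer behavior described above can be sketched as a small Python class. This is a minimal, self-contained illustration, not the collector's implementation: the class name is hypothetical, positions are derived by double hashing over SHA-256, capacity plays the role of override_bloom_filter_size, and buffer_size plays the role of override_bloom_filter_buffer_size. When the filter is full, it is cleared and reseeded from the deque of recent hashes, so recent duplicates are still caught after a reset.

```python
import hashlib
from collections import deque


class BufferedBloomFilter:
    """Bloom filter that resets when full and is reseeded from recent hashes."""

    def __init__(self, capacity: int, bits: int, hash_count: int, buffer_size: int) -> None:
        self.capacity = capacity  # items stored before a reset
        self.bits = bits
        self.hash_count = hash_count
        self.bit_array = bytearray((bits + 7) // 8)
        self.count = 0
        self.recent: deque[str] = deque(maxlen=buffer_size)

    def _positions(self, item: str):
        # Double hashing: derive hash_count bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.bits

    def _contains(self, item: str) -> bool:
        return all(self.bit_array[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

    def _insert(self, item: str) -> None:
        for p in self._positions(item):
            self.bit_array[p >> 3] |= 1 << (p & 7)
        self.count += 1

    def _reset(self) -> None:
        # Clear all bits, then re-add the buffered recent hashes.
        self.bit_array = bytearray(len(self.bit_array))
        self.count = 0
        for item in self.recent:
            self._insert(item)

    def add_if_new(self, item: str) -> bool:
        """Return True if the item is (probably) new, False if it looks like a duplicate."""
        if self._contains(item):
            return False
        if self.count >= self.capacity:
            self._reset()
        self._insert(item)
        self.recent.append(item)
        return True
```

The buffer should be much smaller than the filter capacity; otherwise, reseeding would leave the new filter almost full right after a reset.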

Devo categorization and destination

All services check the workload of each event when pulling it and append the workload to the Devo tag.
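
A minimal Python sketch of this tagging rule, assuming the table suffix is the lowercased Workload field of each audit record with non-alphanumeric characters removed. This assumption matches most of the table names listed in the Subscriptions and tables section (for example, MicrosoftTeams becomes microsoftteams), but the collector's exact rule is not documented here, and the "unknown" fallback is hypothetical.

```python
import re

TABLE_PREFIX = "cloud.office365.management"


def table_for_event(event: dict, default_suffix: str = "unknown") -> str:
    """Build the Devo table name from the event's Workload field."""
    workload = str(event.get("Workload") or default_suffix)
    # Lowercase and drop anything that is not a letter or digit.
    suffix = re.sub(r"[^a-z0-9]", "", workload.lower()) or default_suffix
    return f"{TABLE_PREFIX}.{suffix}"
```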

Setup/Puller output

2024-04-02T12:36:10.881042712Z 2024-04-02T12:36:10.880 INFO InputProcess::MainThread -> InputThread(office365,45635) - Starting thread (execution_period=300s)
2024-04-02T12:36:10.900848871Z 2024-04-02T12:36:10.900 INFO InputProcess::MainThread -> ServiceThread(office365,45635,dlp,predefined) - Starting thread (execution_period=300s)
2024-04-02T12:36:10.901635871Z 2024-04-02T12:36:10.901 INFO InputProcess::MainThread -> ManagementPullerSetup(o365-collector,office365#45635,dlp#predefined) -> Starting thread
2024-04-02T12:36:10.902970384Z 2024-04-02T12:36:10.902 INFO InputProcess::MainThread -> ManagementPuller(office365,45635,dlp,predefined) - Starting thread
2024-04-02T12:36:10.903841384Z 2024-04-02T12:36:10.903 WARNING InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Waiting until setup will be executed
2024-04-02T12:36:10.910390935Z 2024-04-02T12:36:10.909 WARNING InputProcess::ManagementPullerSetup(o365-collector,office365#45635,dlp#predefined) -> The token/header/authentication has not been created yet
2024-04-02T12:36:10.912045728Z 2024-04-02T12:36:10.911 INFO InputProcess::ManagementPullerSetup(o365-collector,office365#45635,dlp#predefined) -> using base url: https://manage.office.com
2024-04-02T12:36:11.221983503Z 2024-04-02T12:36:11.221 INFO InputProcess::ManagementPullerSetup(o365-collector,office365#45635,dlp#predefined) -> Setup for module <ManagementPuller> has been successfully executed
2024-04-02T12:36:11.906707525Z 2024-04-02T12:36:11.905 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> ManagementPuller(office365,45635,dlp,predefined) Starting the execution of pre_pull()
2024-04-02T12:36:11.907795456Z 2024-04-02T12:36:11.906 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Reading persisted data
2024-04-02T12:36:11.910462424Z 2024-04-02T12:36:11.909 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Data retrieved from the persistence: {'@persistence_version': 1, 'start_time_in_utc': None, 'last_event_time_in_utc': '2024-04-02 12:35:07'}
2024-04-02T12:36:11.911358075Z 2024-04-02T12:36:11.910 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Start time not found in config, using 2024-04-02 12:35:11
2024-04-02T12:36:11.912847398Z 2024-04-02T12:36:11.911 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Running the persistence upgrade steps
2024-04-02T12:36:11.915154717Z 2024-04-02T12:36:11.913 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Running the persistence corrections steps
2024-04-02T12:36:11.916748235Z 2024-04-02T12:36:11.915 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Running the persistence corrections steps
2024-04-02T12:36:11.918276116Z 2024-04-02T12:36:11.917 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> No changes were detected in the persistence
2024-04-02T12:36:11.919248467Z 2024-04-02T12:36:11.918 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> ManagementPuller(office365,45635,dlp,predefined) Finalizing the execution of pre_pull()
2024-04-02T12:36:11.920446419Z 2024-04-02T12:36:11.919 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Starting data collection every 60 seconds
2024-04-02T12:36:11.924162570Z 2024-04-02T12:36:11.923 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Pull Started
2024-04-02T12:36:12.045307400Z 2024-04-02T12:36:12.044 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Found 1 removed 0
2024-04-02T12:36:12.221770395Z 2024-04-02T12:36:12.221 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1712061371905):Number of requests made: 1; Number of events received: 30; Number of duplicated events filtered out: 0; Number of events generated and sent: 30; Average of events per second: 101.027.
2024-04-02T12:36:12.222243522Z 2024-04-02T12:36:12.222 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Statistics for this pull cycle (@devo_pulling_id=1712061371905):Number of requests made: 1; Number of events received: 30; Number of duplicated events filtered out: 0; Number of events generated and sent: 30; Average of events per second: 100.751.
2024-04-02T12:36:12.222631040Z 2024-04-02T12:36:12.222 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> The data is up to date!
2024-04-02T12:36:12.223216005Z 2024-04-02T12:36:12.223 INFO InputProcess::ManagementPuller(office365,45635,dlp,predefined) -> Data collection completed. Elapsed time: 0.318 seconds. Waiting for 59.682 second(s) until the next one

This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. If you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:

  1. Edit the configuration file.

  2. Change the value of the start_time_in_utc parameter to a different one.

  3. Save the changes.

  4. Restart the collector.

The collector will detect this change and restart the persistence using the parameters of the configuration file, or the default configuration if they have not been provided.
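
The change detection can be pictured with the following Python sketch. The persisted keys (@persistence_version, start_time_in_utc, last_event_time_in_utc) are taken from the collector log shown earlier; the comparison logic, the function name, and the way the default start time is supplied are assumptions for illustration only.

```python
from typing import Optional

PERSISTENCE_VERSION = 1


def resolve_persistence(
    config_start: Optional[str],
    persisted: Optional[dict],
    default_start: str,
) -> dict:
    """Keep the stored state unless start_time_in_utc changed; otherwise reset it."""
    if persisted is not None and persisted.get("start_time_in_utc") == config_start:
        return persisted  # no change detected, continue from the stored position
    # New or changed start time: begin again from the configured value,
    # or from the default start time if none was configured.
    return {
        "@persistence_version": PERSISTENCE_VERSION,
        "start_time_in_utc": config_start,
        "last_event_time_in_utc": config_start or default_start,
    }
```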

This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.

Error type

Error ID

Error message

Cause

Solution

InitVariablesError

1

Invalid start_time_in_utc: {ini_start_str}. Must be in parseable datetime format.

The configured start_time_in_utc parameter has a non-parseable format.

Update the start_time_in_utc value to have the recommended format as indicated in the guide.

ApiError

400

HTTP ERROR 400: Bad request: The server could not understand the request. {e}

The collector is unable to authenticate with the API.

Check the credentials and ensure that the collector has the necessary permissions to access the Management API.

ApiError

401

HTTP ERROR 401: Unauthorized: Authentication is required and has failed or has not been provided. {e}

The collector is unable to authenticate with the API.

Check the credentials and ensure that the collector has the necessary permissions to access the Management API.

ApiError

402

HTTP ERROR 429: Too Many Requests: The user has sent too many requests in a given amount of time.

Too many requests

Adjust the request limits.

ApiError

402

HTTP ERROR 500: Server Error: An error occurred on the server. {e}

The server encountered an unexpected condition or configuration problem that prevented it from fulfilling the request.

 

ApiError

403

Request Error: Received status code {status_code} {response.json()}

Unhandled status.

Read the stack trace and status code to determine the cause.
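
HTTP 429 and 500 responses are usually transient, whereas 400 and 401 point to credential or permission problems that retrying will not fix. The following Python sketch shows one common way to handle the transient cases: exponential backoff with jitter that honors a numeric Retry-After header when present. The retryable status set, delay limits, and function names are assumptions; the collector's own retry policy is not documented here.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Optional

RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, retry_after: Optional[str], base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt (attempt counts from 0)."""
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    # Exponential backoff with jitter between 50% and 100% of the delay.
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)


def get_with_retries(request: urllib.request.Request, max_attempts: int = 5, sleep=time.sleep) -> bytes:
    """Retry transient HTTP errors; re-raise everything else immediately."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            if error.code not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, error.headers.get("Retry-After")))
    raise RuntimeError("max_attempts must be at least 1")
```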

Collector operations

This section is intended to explain how to proceed with specific operations of this collector.

Initialization

The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services and validating the given configuration.

A successful run has the following output messages for the initializer module:

2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Loading configuration using the following files: {"full_config": "config-test-local.yaml", "job_config_loc": null, "collector_config_loc": null}
2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Using the default location for "job_config_loc" file: "/etc/devo/job/job_config.json"
2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> "\etc\devo\job" does not exists
2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> Using the default location for "collector_config_loc" file: "/etc/devo/collector/collector_config.json"
2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> "\etc\devo\collector" does not exists
2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> Results of validation of config files parameters: {"config": "C:\git\collectors2\devo-collector-<name>\config\config.yaml", "config_validated": True, "job_config_loc": "/etc/devo/job/job_config.json", "job_config_loc_default": True, "job_config_loc_validated": False, "collector_config_loc": "/etc/devo/collector/collector_config.json", "collector_config_loc_default": True, "collector_config_loc_validated": False}
2023-01-10T15:22:57.171 WARNING MainProcess::MainThread -> [WARNING] Illegal global setting has been ignored -> multiprocessing: False

Events delivery and Devo ingestion

The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.

A successful run has the following output messages for the event delivery module:

2023-01-10T15:23:00.788 INFO OutputProcess::MainThread -> DevoSender(standard_senders,devo_sender_0) -> Starting thread
2023-01-10T15:23:00.789 INFO OutputProcess::MainThread -> DevoSenderManagerMonitor(standard_senders,devo_1) -> Starting thread (every 300 seconds)
2023-01-10T15:23:00.790 INFO OutputProcess::MainThread -> DevoSenderManager(standard_senders,manager,devo_1) -> Starting thread
2023-01-10T15:23:00.842 INFO OutputProcess::MainThread -> global_status: {"output_process": {"process_id": 18804, "process_status": "running", "thread_counter": 21, "thread_names": ["MainThread", "pydevd.Writer", "pydevd.Reader", "pydevd.CommandThread", "pydevd.CheckAliveThread", "DevoSender(standard_senders,devo_sender_0)", "DevoSenderManagerMonitor(standard_senders,devo_1)", "DevoSenderManager(standard_senders,manager,devo_1)", "OutputStandardConsumer(standard_senders_consumer_0)",

Sender services

The Integrations Factory Collector SDK has three different sender services depending on the type of event to deliver (internal, standard, and lookup). This collector uses the following sender services:

Sender services

Description

internal_senders

In charge of delivering internal metrics to Devo such as logging traces or metrics.

standard_senders

In charge of delivering pulled events to Devo.

Sender statistics

Each service displays its own performance statistics that allow checking how many events have been delivered to Devo by type:

Logging trace

Description

Number of available senders: 1

Displays the number of concurrent senders available for the given Sender Service.

sender manager internal queue size: 0

Displays the items available in the internal sender queue.

Total number of messages sent: 57, messages sent since "2023-01-10 16:09:16.116750+00:00": 0 (elapsed 0.000 seconds)

Displays the number of events sent since the collector started and since the last checkpoint. From the given example, the following conclusions can be drawn:

  • 57 events were sent to Devo since the collector started.

  • The last checkpoint timestamp was 2023-01-10 16:09:16.116750+00:00.

  • 0 events were sent to Devo between the last UTC checkpoint and now.

  • Those 0 events required 0.000 seconds to be delivered.
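
If you want to track these figures automatically, the statistics line can be parsed with a regular expression. The Python sketch below assumes the exact wording of the example trace shown above; the pattern and function name are illustrative, and other SDK versions may word the trace differently.

```python
import re

STATS_PATTERN = re.compile(
    r'Total number of messages sent: (?P<total>\d+), '
    r'messages sent since "(?P<since>[^"]+)": (?P<recent>\d+) '
    r'\(elapsed (?P<elapsed>[\d.]+) seconds\)'
)


def parse_sender_stats(line: str) -> dict:
    """Extract the sender statistics from one logging trace."""
    match = STATS_PATTERN.search(line)
    if match is None:
        raise ValueError("not a sender statistics line")
    return {
        "total": int(match["total"]),          # events sent since the collector started
        "since": match["since"],               # last checkpoint timestamp (UTC)
        "recent": int(match["recent"]),        # events sent since that checkpoint
        "elapsed_seconds": float(match["elapsed"]),
    }
```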

To check the memory usage of this collector, look for the following log records, which are displayed every 5 minutes by default, always after the memory-freeing process runs.

  • The used memory is displayed per running process, and the sum of both values gives the total memory used by the collector.

  • The global pressure of the available memory is displayed in the global value.

  • All metrics (Global, RSS, VMS) show the value before and after freeing memory, in the form previous -> after.

Change log

Release

Released on

Release type

Details

Recommendations

v2.4.0

Aug 26, 2024

IMPROVEMENTS

Improvements

  • Updated DCSDK to 1.12.4

Recommended version

v2.3.3

Jul 9, 2024

BUG FIXING

Bug fixes

  • Fixed issue with override parameters

Upgrade

v2.3.2

Jul 11, 2024

BUG FIXING

Bug fixes

  • Fix Subscription Detection

  • Performs case-insensitive detection for pre-existing content type descriptions ("Sharepoint" vs "SharePoint").

  • Fixed issue with the sender method.

Upgrade

v2.3.0

Jul 10, 2024

IMPROVEMENTS

Improvements

  • Updated DCSDK from 1.11.1 to 1.12.2

    • Fixed high vulnerability in Docker Image

    • Upgrade DevoSDK dependency to version v5.4.0

    • Fixed error in persistence system

    • Applied changes to make DCSDK compatible with MacOS

    • Added new sender for relay in house + TLS

    • Added persistence functionality for gzip sending buffer

    • Added Automatic activation of gzip sending

    • Improved behaviour when persistence fails

    • Upgraded DevoSDK dependency

    • Fixed console log encoding

    • Restructured python classes

    • Improved behaviour with non-utf8 characters

    • Decreased default size value for internal queues (Redis limitation, from 1GiB to 256MiB)

    • New persistence format/structure (compression in some cases)

    • Removed dmesg execution (It was invalid for docker execution)

Bug fixes

  • Fix FQDN Logging

    • Utilizes standard Python urllib method to extract hostnames from request responses for logging; addresses a multiprocess caching issue and increases performance.

Upgrade

v2.2.0

Jun 20, 2024

IMPROVEMENTS

Improvements

Added override URLs for GCC, DoD, and other government endpoints.

Fixed bug with subscription enabling

Upgrade

v2.1.0

Jun 17, 2024

IMPROVEMENTS

Improvements

Improved deduplication

Upgrade

v2.0.0

Jun 10, 2024

BUG FIXING

Bug fixing

Fixed rate limiting. The way the token is handled has been updated.

Upgrade

v1.0.0

Apr 29, 2024

INITIAL RELEASE



-

-