
Google Workspace Logs in BigQuery collector

Overview

Google Workspace allows users to export logs to BigQuery to gain insights into reporting activity and usage logs.

Devo collector features

  • Allow parallel downloading (multipod): not allowed

  • Running environments: collector server, on-premise

  • Populated Devo events: table

  • Flattening preprocessing: no

  • Allowed source events obfuscation: yes

Data sources

  • Data source: Activity Records

  • Description: Activity records include activity data for various Google Workspace reports (e.g., "Accounts", "Admin", "Gmail", etc.).

  • API endpoint: BigQuery search (partition query)

  • Collector service name: activity_records

  • Devo table: Various (see the service description mapping below)

Vendor setup

Supported editions for this feature include Frontline Standard; Enterprise Standard and Plus; Education Standard and Plus; and Enterprise Essentials Plus. Compare your edition.

  1. Set up a BigQuery project for reporting logs. Before you set up BigQuery logs in the Google Admin console, establish a BigQuery project for your reporting logs in the Google Cloud console (see "Set up a BigQuery project for reporting logs" in the Google Workspace Admin Help).

  2. Set up BigQuery logs in the Google Admin console. Logs are written to the dataset starting the following day, and the necessary service account permissions are configured automatically.

  3. Create Google service account credentials for BigQuery access, to be used by the collector.

Minimum configuration required for basic pulling

Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.

This minimum configuration refers exclusively to the parameters specific to this integration. There are more required parameters related to the generic behavior of the collector; check the settings sections for details.

  • service_account_info: the GCP service account credentials (JSON key) of an account with BigQuery read/query access (see the sketch below).
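For reference, the value of service_account_info is the standard GCP service account JSON key. The following is a minimal sketch (not part of the collector) that verifies such a key can run BigQuery queries; it assumes the google-auth and google-cloud-bigquery client libraries, and the key file path is a hypothetical placeholder.

import json

from google.cloud import bigquery
from google.oauth2 import service_account

# Hypothetical path to the downloaded service account JSON key.
with open("service_account_key.json") as key_file:
    service_account_info = json.load(key_file)

credentials = service_account.Credentials.from_service_account_info(
    service_account_info,
    scopes=["https://www.googleapis.com/auth/bigquery.readonly"],
)
client = bigquery.Client(
    project=service_account_info["project_id"], credentials=credentials
)

# A trivial query confirms that the account has BigQuery read/query access.
print(list(client.query("SELECT 1 AS ok").result()))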

Accepted authentication methods

  • Service Account Credentials: Customer ID (required), Client ID (required), Client secret (required)

Run the collector

Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector on your own machine using a Docker image (On-premise collector).

Collector services detail

This section is intended to explain how to proceed with specific actions for services.

Activity records (activity_records)

Internal process and deduplication method

All activity records are fetched continuously, based on the time_usec datetime key, via BigQuery queries. The collector uses the table partitions to perform efficient queries. The collector stores the last time_usec value and the associated record IDs to ensure that no duplicate records are ingested into the Devo table.
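As an illustration of this strategy (not the collector's actual implementation), the following sketch combines a time_usec-bounded query with an exclusion list of already ingested record IDs. The project, dataset, and table names are hypothetical placeholders, and the record ID is sketched here as a hash of the row, mirroring the persisted values shown in the puller output further down.

import hashlib
import json

from google.cloud import bigquery


def fetch_activity_data(client, last_time_usec, last_ids):
    """Yield new activity rows, skipping rows already ingested in the previous cycle."""
    query = """
        SELECT *
        FROM `my-project.my_workspace_dataset.activity`  -- hypothetical export table
        WHERE time_usec >= @last_time_usec               -- microsecond timestamp key
        ORDER BY time_usec ASC
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("last_time_usec", "INT64", last_time_usec)
        ]
    )
    for row in client.query(query, job_config=job_config).result():
        record = dict(row)
        # Stable ID per record, comparable to the hashes stored in "last_ids".
        record_id = hashlib.sha256(
            json.dumps(record, sort_keys=True, default=str).encode()
        ).hexdigest()
        if record_id in last_ids:
            continue  # duplicate from the previous pull window
        yield record_id, record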

Devo categorization and destination

Events are sent to Devo based on the record_type, as sketched after this list.

  • gmail record type events are sent to the cloud.gcp.bigquery.gmail table

  • All other record types are sent to my.app.gsuite_activity.{record_type}
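A minimal sketch of this routing rule (the function name is ours; the table names are taken from the bullets above):

def devo_table_for(record_type: str) -> str:
    """Return the destination Devo table for a given record_type."""
    if record_type == "gmail":
        return "cloud.gcp.bigquery.gmail"
    return f"my.app.gsuite_activity.{record_type}"

# Example: devo_table_for("admin") returns "my.app.gsuite_activity.admin"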

Setup output

2024-05-10T00:49:40.248 INFO InputProcess::MainThread -> BigQueryPullerSetup(unknown,google_workspace_logs_in_bigquery#100000,activity_records#predefined) -> Starting thread
2024-05-10T00:49:40.249 INFO InputProcess::BigQueryPullerSetup(unknown,google_workspace_logs_in_bigquery#100000,activity_records#predefined) -> Using service account info to authenticate
2024-05-10T00:49:40.262 INFO OutputProcess::MainThread -> DevoSender(standard_senders,devo_sender_0) -> Starting thread
2024-05-10T00:49:40.654 INFO InputProcess::BigQueryPullerSetup(unknown,google_workspace_logs_in_bigquery#100000,activity_records#predefined) -> Setup for module <BigQueryPuller> has been successfully executed

Puller output

2024-05-10T00:49:41.324 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) Starting the execution of pre_pull()
2024-05-10T00:49:41.326 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Reading persisted data
2024-05-10T00:49:41.328 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Data retrieved from the persistence: {'@persistence_version': 1, 'start_time_in_utc': '2024-04-20T00:00:00.000000Z', 'last_event_time_in_utc': '2024-04-22T08:20:00.000000Z', 'last_ids': ['1991b279fb251ee850b64cc04f722b914b6504563a3dad18f40c09c4d1f8c0e5']}
2024-05-10T00:49:41.330 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Running the persistence upgrade steps
2024-05-10T00:49:41.331 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Running the persistence corrections steps
2024-05-10T00:49:41.331 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Running the persistence corrections steps
2024-05-10T00:49:41.333 WARNING InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Some changes have been detected and the persistence needs to be updated. Previous content: {'@persistence_version': 1, 'start_time_in_utc': '2024-04-20T00:00:00.000000Z', 'last_event_time_in_utc': '2024-04-22T08:20:00.000000Z', 'last_ids': ['1991b279fb251ee850b64cc04f722b914b6504563a3dad18f40c09c4d1f8c0e5']}. New content: {'@persistence_version': 1, 'start_time_in_utc': None, 'last_event_time_in_utc': '2024-05-10T04:49:36.324054Z', 'last_ids': []}
2024-05-10T00:49:41.338 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Updating the persistence
2024-05-10T00:49:41.340 WARNING InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Persistence has been updated successfully
2024-05-10T00:49:41.340 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) Finalizing the execution of pre_pull()
2024-05-10T00:49:41.340 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Starting data collection every 300 seconds
2024-05-10T00:49:41.341 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Pull Started
2024-05-10T00:49:41.342 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Fetching records via fetch_activity_data occurring between 2024-05-10T04:49:36.324054Z and 2024-05-10T04:49:36.324054Z
2024-05-10T00:49:43.406 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Pull completed up to the current time.
2024-05-10T00:49:43.410 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Updating the persistence
2024-05-10T00:49:43.411 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1715316581324):Number of requests made: 1; Number of events received: 0; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
2024-05-10T00:49:43.411 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Statistics for this pull cycle (@devo_pulling_id=1715316581324):Number of requests made: 1; Number of events received: 0; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
2024-05-10T00:49:43.411 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> The data is up to date!
2024-05-10T00:49:43.411 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Data collection completed. Elapsed time: 2.087 seconds. Waiting for 297.913 second(s) until the next one

Restart the persistence

This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:

  1. Edit the configuration file.

  2. Change the value of the start_time_in_utc parameter to a different one.

  3. Save the changes.

  4. Restart the collector.

The collector will detect this change and restart the persistence using the parameters of the configuration file, or the default configuration if none has been provided.
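As a rough sketch of that detection logic (assumed behaviour, not the collector's actual code), the persisted state shown in the puller output is rebuilt whenever the configured start_time_in_utc no longer matches the stored one:

def maybe_reset_persistence(configured_start_time_utc, persisted_state):
    """Rebuild the persisted state when the configured start time has changed."""
    if persisted_state.get("start_time_in_utc") != configured_start_time_utc:
        return {
            "@persistence_version": 1,
            "start_time_in_utc": configured_start_time_utc,
            "last_event_time_in_utc": configured_start_time_utc,
            "last_ids": [],
        }
    return persisted_state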

Troubleshooting

This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.

  • InitVariablesError (error ID 1)

    Error message: Invalid start_time_in_utc: {ini_start_str}. Must be in parseable datetime format.

    Cause: the configured start_time_in_utc parameter is in a non-parseable format.

    Solution: update the start_time_in_utc value to the recommended format indicated in this guide.

  • InitVariablesError (error ID 2)

    Error message: Invalid start_time_in_utc: {ini_start_str}. Must be in the past.

    Cause: the configured start_time_in_utc parameter is a future date.

    Solution: update the start_time_in_utc value to a past datetime.

  • ApiError (error ID 401)

    Error message: An error occurred while trying to authenticate with the Azure API. Exception: {e}

    Cause: the collector is unable to authenticate with the Azure API.

    Solution: check the credentials and ensure that the collector has the necessary permissions to access the Azure API.

  • ApiError (error ID 410)

    Error message: An error occurred while trying to check if container {container_name} exists. Ensure that the blob storage account name or connection string is correct. Exception: {e}

    Cause: the collector was unable to locate the specified blob storage container name.

    Solution: ensure the container exists and the credentials have READ access to the container.

  • ApiError (error ID 411)

    Error message: An error occurred while trying to check if container {container_name} exists. Ensure that the application has necessary permissions to access the containers. Exception: {e}

    Cause: the collector was unable to access the specified blob storage container name.

    Solution: ensure the container exists and the credentials have READ access to the container.

  • ApiError (error ID 412)

    Error message: An error occurred while trying to create container {container_name}. Ensure that the application has necessary permissions to create containers. Exception: {e}

    Cause: the collector was unable to create the container for the auto-discover service when Azure Blob Storage checkpointing was requested.

    Solution: ensure the credentials have WRITE access to the container storage account.

  • ApiError (error ID 420)

    Error message: An error occurred while trying to get consumer group {consumer_group_name}. Exception: {e}

    Cause: the collector was unable to access the specified consumer group name.

    Solution: ensure the consumer group exists and the credentials have READ access to the consumer group.

  • ApiError (error ID 421)

    Error message: An error occurred while trying to create consumer group {consumer_group_name}. Ensure that the application has necessary permissions to create consumer groups. Exception: {e}

    Cause: the collector was unable to create the consumer group for the auto-discover service.

    Solution: ensure the credentials have WRITE access to the event hub namespace, or use the $Default consumer group.

Collector operations

This section is intended to explain how to proceed with specific operations of this collector.

Initialization

The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services, as well as validating the given configuration.

A successful run has the following output messages for the initializer module:

2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Loading configuration using the following files: {"full_config": "config-test-local.yaml", "job_config_loc": null, "collector_config_loc": null}
2023-01-10T15:22:57.146 INFO MainProcess::MainThread -> Using the default location for "job_config_loc" file: "/etc/devo/job/job_config.json"
2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> "\etc\devo\job" does not exists
2023-01-10T15:22:57.147 INFO MainProcess::MainThread -> Using the default location for "collector_config_loc" file: "/etc/devo/collector/collector_config.json"
2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> "\etc\devo\collector" does not exists
2023-01-10T15:22:57.148 INFO MainProcess::MainThread -> Results of validation of config files parameters: {"config": "C:\git\collectors2\devo-collector-<name>\config\config.yaml", "config_validated": True, "job_config_loc": "/etc/devo/job/job_config.json", "job_config_loc_default": True, "job_config_loc_validated": False, "collector_config_loc": "/etc/devo/collector/collector_config.json", "collector_config_loc_default": True, "collector_config_loc_validated": False}
2023-01-10T15:22:57.171 WARNING MainProcess::MainThread -> [WARNING] Illegal global setting has been ignored -> multiprocessing: False

Events delivery and Devo ingestion

The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.

A successful run has the following output messages for the event delivery module:

By default, these information traces will be displayed every 10 minutes.

Sender services

The Integrations Factory Collector SDK has three different sender services, depending on the event type to deliver (internal, standard, and lookup). This collector uses the following sender services:

  • internal_senders: in charge of delivering internal metrics to Devo, such as logging traces or metrics.

  • standard_senders: in charge of delivering pulled events to Devo.

Sender statistics

Each service displays its own performance statistics that allow checking how many events have been delivered to Devo by type:

  • Number of available senders: 1

    Displays the number of concurrent senders available for the given sender service.

  • sender manager internal queue size: 0

    Displays the number of items waiting in the internal sender queue. This value helps detect bottlenecks and the need to increase the performance of data delivery to Devo, which can be done by increasing the number of concurrent senders.

  • Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 21 (elapsed 0.007 seconds)

    Displays the number of events sent since the last checkpoint. Following the given example, the following conclusions can be drawn:

      • 44 events were sent to Devo since the collector started.

      • The last checkpoint timestamp was 2022-06-28 10:39:22.511671+00:00.

      • 21 events were sent to Devo between the last UTC checkpoint and now.

      • Those 21 events required 0.007 seconds to be delivered.

To check the memory usage of this collector, look for the following log records in the collector output. They are displayed every 5 minutes by default, always after the memory-freeing process has run.

  • The used memory is displayed per running process; the sum of both values gives the total memory used by the collector.

  • The global pressure on the available memory is displayed in the Global value.

  • All metrics (Global, RSS, VMS) show the value before freeing memory -> the value after freeing memory.

Change log

  • v1.0.0 (released on Jun 6, 2024)

    Release type: NEW COLLECTOR

    Details: New collector

    Recommendations: -