Google Workspace Logs in BigQuery collector
Overview
Google Workspace allows users to export reporting and usage logs to BigQuery to gain insights into that activity. This collector queries the exported logs in BigQuery and ingests them into Devo.
Devo collector features
Feature | Details |
---|---|
Allow parallel downloading (multipod) | |
Running environments | |
Populated Devo events | |
Flattening preprocessing | |
Allowed source events obfuscation | |
Data sources
Data source | Description | API endpoint | Collector service name | Devo table |
---|---|---|---|---|
Activity Records | Activity records include activity data for the various Google Workspace reports (e.g. "Accounts", "Admin", "Gmail", etc.) | BigQuery search (partition query) | activity_records | Various (see the service description for the mapping) |
Vendor setup
Supported editions for this feature include Frontline Standard, Enterprise Standard and Plus, Education Standard and Plus, and Enterprise Essentials Plus. Compare your edition.
1. Set up a BigQuery project for reporting logs (Google Workspace Admin Help). Before you set up BigQuery logs in the Google Admin console, establish a BigQuery project for your reporting logs in the Google Cloud console.
2. Set up BigQuery logs in the Google Admin console. Logs are written to the dataset starting the following day, and the necessary service account permissions are configured automatically.
3. Create Google service account credentials with BigQuery access for the collector to use (a quick verification sketch follows this list).
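After creating the key, a snippet like the one below can confirm that the service account can read the reporting dataset. It is a minimal sketch that assumes the google-cloud-bigquery Python client is installed; the key file path and dataset name are placeholders, not values taken from this collector.

```python
# Minimal sketch: verify that a service account JSON key can read the BigQuery
# dataset that receives the Google Workspace logs. File and dataset names are
# placeholders.
from google.cloud import bigquery
from google.oauth2 import service_account

KEY_FILE = "workspace-logs-key.json"  # hypothetical path to the downloaded key
DATASET = "gsuite_logs"               # hypothetical reporting dataset name

credentials = service_account.Credentials.from_service_account_file(KEY_FILE)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# Listing the tables confirms read access; the Workspace export normally
# creates daily-partitioned "activity" and "usage" tables in this dataset.
for table in client.list_tables(DATASET):
    print(table.table_id)
```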
Minimum configuration required for basic pulling
Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.
This minimum configuration refers exclusively to the parameters that are specific to this integration. There are more required parameters related to the generic behavior of the collector; check the setting sections for details.
Setting | Details |
---|---|
| The GCP service account (with BigQuery read/query access) credentials info |
Accepted authentication methods
Authentication method | Customer ID | Client ID | Client secret |
---|---|---|---|
Service Account Credentials | REQUIRED | REQUIRED | REQUIRED |
Run the collector
Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector on your own machine using a Docker image (On-premise collector).
Collector services detail
This section is intended to explain how to proceed with specific actions for services.
Activity records (activity_records)
Internal process and deduplication method
All activity records are fetched continually based on the time_usec date-time key via BigQuery queries. The collector uses the table partitions to perform efficient queries, and it stores the last time_usec value together with the associated record IDs to ensure that no duplicate records are ingested into the Devo table.
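As an illustration of this approach (not the collector's actual implementation), a partition-pruned query keyed on time_usec could look like the sketch below; the project, dataset, and table names, and the fetch_since helper, are placeholders.

```python
# Illustrative sketch of an incremental, partition-aware pull keyed on time_usec.
# Project/dataset/table names are placeholders; this is not the collector's code.
from google.cloud import bigquery

client = bigquery.Client()  # assumes service-account or default credentials

SQL = """
SELECT *
FROM `my-project.gsuite_logs.activity`
WHERE _PARTITIONTIME >= TIMESTAMP(@partition_start)  -- prune daily partitions first
  AND time_usec > @last_time_usec                    -- then filter on the event key
ORDER BY time_usec
LIMIT 10000
"""

def fetch_since(last_time_usec: int, partition_start: str) -> list:
    """Return rows newer than the persisted time_usec watermark.

    Rows sharing the boundary time_usec would additionally be checked
    against the persisted record IDs to avoid duplicates.
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("partition_start", "STRING", partition_start),
            bigquery.ScalarQueryParameter("last_time_usec", "INT64", last_time_usec),
        ]
    )
    return list(client.query(SQL, job_config=job_config).result())
```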
Devo categorization and destination
Events are sent to Devo based on the record_type value (a small routing sketch follows the list):

- gmail record type events are sent to the cloud.gcp.bigquery.gmail table.
- All other record types are sent to my.app.gsuite_activity.{record_type}.
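A minimal sketch of that routing rule (the devo_table_for function name is illustrative, not part of the collector):

```python
# Illustrative only: resolve the Devo destination table for a record_type,
# following the routing rule described above.
def devo_table_for(record_type: str) -> str:
    if record_type == "gmail":
        return "cloud.gcp.bigquery.gmail"
    return f"my.app.gsuite_activity.{record_type}"

# Example: devo_table_for("admin") -> "my.app.gsuite_activity.admin"
```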
Setup output

```
2024-05-10T00:49:40.248 INFO InputProcess::MainThread -> BigQueryPullerSetup(unknown,google_workspace_logs_in_bigquery#100000,activity_records#predefined) -> Starting thread
2024-05-10T00:49:40.249 INFO InputProcess::BigQueryPullerSetup(unknown,google_workspace_logs_in_bigquery#100000,activity_records#predefined) -> Using service account info to authenticate
2024-05-10T00:49:40.262 INFO OutputProcess::MainThread -> DevoSender(standard_senders,devo_sender_0) -> Starting thread
2024-05-10T00:49:40.654 INFO InputProcess::BigQueryPullerSetup(unknown,google_workspace_logs_in_bigquery#100000,activity_records#predefined) -> Setup for module <BigQueryPuller> has been successfully executed
```

Puller output

```
2024-05-10T00:49:41.324 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) Starting the execution of pre_pull()
2024-05-10T00:49:41.326 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Reading persisted data
2024-05-10T00:49:41.328 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Data retrieved from the persistence: {'@persistence_version': 1, 'start_time_in_utc': '2024-04-20T00:00:00.000000Z', 'last_event_time_in_utc': '2024-04-22T08:20:00.000000Z', 'last_ids': ['1991b279fb251ee850b64cc04f722b914b6504563a3dad18f40c09c4d1f8c0e5']}
2024-05-10T00:49:41.330 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Running the persistence upgrade steps
2024-05-10T00:49:41.331 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Running the persistence corrections steps
2024-05-10T00:49:41.331 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Running the persistence corrections steps
2024-05-10T00:49:41.333 WARNING InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Some changes have been detected and the persistence needs to be updated. Previous content: {'@persistence_version': 1, 'start_time_in_utc': '2024-04-20T00:00:00.000000Z', 'last_event_time_in_utc': '2024-04-22T08:20:00.000000Z', 'last_ids': ['1991b279fb251ee850b64cc04f722b914b6504563a3dad18f40c09c4d1f8c0e5']}. New content: {'@persistence_version': 1, 'start_time_in_utc': None, 'last_event_time_in_utc': '2024-05-10T04:49:36.324054Z', 'last_ids': []}
2024-05-10T00:49:41.338 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Updating the persistence
2024-05-10T00:49:41.340 WARNING InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Persistence has been updated successfully
2024-05-10T00:49:41.340 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) Finalizing the execution of pre_pull()
2024-05-10T00:49:41.340 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Starting data collection every 300 seconds
2024-05-10T00:49:41.341 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Pull Started
2024-05-10T00:49:41.342 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Fetching records via fetch_activity_data occurring between 2024-05-10T04:49:36.324054Z and 2024-05-10T04:49:36.324054Z
2024-05-10T00:49:43.406 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Pull completed up to the current time.
2024-05-10T00:49:43.410 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Updating the persistence
2024-05-10T00:49:43.411 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1715316581324):Number of requests made: 1; Number of events received: 0; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
2024-05-10T00:49:43.411 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Statistics for this pull cycle (@devo_pulling_id=1715316581324):Number of requests made: 1; Number of events received: 0; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
2024-05-10T00:49:43.411 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> The data is up to date!
2024-05-10T00:49:43.411 INFO InputProcess::BigQueryPuller(google_workspace_logs_in_bigquery,100000,activity_records,predefined) -> Data collection completed. Elapsed time: 2.087 seconds. Waiting for 297.913 second(s) until the next one
```

Restart the persistence
This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:
1. Edit the configuration file.
2. Change the value of the start_time_in_utc parameter to a different one.
3. Save the changes.
4. Restart the collector.
The collector will detect the change and restart the persistence using the parameters in the configuration file, or the default configuration if none has been provided.
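Conceptually, the check behaves like the sketch below. This is illustrative pseudologic only (the maybe_reset_persistence function and the exact reset values are assumptions, not the collector's source): the configured start_time_in_utc is compared with the value saved in the persistence, and a mismatch discards the stored state.

```python
# Illustrative sketch only (not the collector's source): reset the persisted
# state when the configured start date no longer matches the stored one.
def maybe_reset_persistence(configured_start: str, persisted: dict) -> dict:
    if persisted.get("start_time_in_utc") != configured_start:
        # Assumed reset shape, modeled on the persistence fields shown in the
        # puller output above.
        return {
            "@persistence_version": 1,
            "start_time_in_utc": configured_start,
            "last_event_time_in_utc": configured_start,
            "last_ids": [],
        }
    return persisted
```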
Troubleshooting
This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.
Error type | Error ID | Error message | Cause | Solution |
---|---|---|---|---|
| 1 | Invalid | The configured | Update the |
| 2 | Invalid | The configured | Update the |
Collector operations
This section is intended to explain how to proceed with specific operations of this collector.
Change log
Release | Released on | Release type | Details | Recommendations |
---|---|---|---|---|
| Jun 6, 2024 | NEW COLLECTOR | New collector | |