
Overview

Snowflake’s Data Cloud is powered by an advanced data platform provided as Software-as-a-Service (SaaS). Snowflake enables data storage, processing and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings.

Configuration requirements

To run this collector, you need to take into account the configurations detailed below.

  • Username and password: You need a username and password to set up this collector.

  • Account identifier: You need the account identifier, which you can find in your Snowflake instance.

  • Warehouse: The warehouse selected to compute and generate data.

Refer to the Vendor setup section to learn more about these configurations.

Devo collector features

  • Allow parallel downloading (multipod): not allowed

  • Running environments: collector server, on-premise

  • Populated Devo events: table

  • Flattening preprocessing: no

Data sources

Access History

  • Description: Provides the user access history in Snowflake.
  • API endpoint: SELECT * FROM access_history
  • Collector service name: access_history
  • Minimum role needed: GOVERNANCEVIEWER
  • Devo table: db.snowflake.history.access
  • Available from release: v1.0.0

Login History

  • Description: Queries login attempts by Snowflake users within the last 365 days (1 year).
  • API endpoint: SELECT * FROM login_history
  • Collector service name: login_history
  • Minimum role needed: SECURITYVIEWER
  • Devo table: db.snowflake.history.login
  • Available from release: v1.0.0

Sessions

  • Description: Provides information on sessions, including the authentication method to Snowflake and the Snowflake login event.
  • API endpoint: SELECT * FROM sessions
  • Collector service name: sessions
  • Minimum role needed: SECURITYVIEWER
  • Devo table: db.snowflake.history.session
  • Available from release: v1.0.0

Custom SQL

  • Description: Lets you run a custom SQL query.
  • API endpoint: {custom_query}
  • Collector service name: custom_service
  • Minimum role needed: Any role that can access the schema (see https://docs.snowflake.com/en/sql-reference/snowflake-db-roles)
  • Devo table: my.app.{custom_level_1}.{custom_level_2}
  • Available from release: v1.0.0
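As an illustration of how the Custom SQL service maps onto a Devo table, the fragment below is a hypothetical sketch; the key names are assumptions rather than the collector's documented configuration schema. With custom_level_1 set to snowflake and custom_level_2 set to query, events from the custom query would be ingested into my.app.snowflake.query:

custom_service:                            # hypothetical service entry
  query: "SELECT * FROM query_history"     # stands in for {custom_query}
  custom_level_1: snowflake                # first tag level (assumed key name)
  custom_level_2: query                    # second tag level (assumed key name)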

For more information on how the events are parsed, visit our page.

Vendor setup

There are some requirements to set up the collector: you need to get a username, password, and account identifier from Snowflake.

  1. Use the username and password that you set when creating your Snowflake account.

  2. Get your account identifier from your instance URL. For example: https://<account_identifier>.snowflakecomputing.com
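You can optionally verify these values before deploying the collector with a minimal sketch that uses the Snowflake Connector for Python (the same driver the collector itself uses). All values below are placeholders:

# Minimal connectivity check (sketch). Requires: pip install snowflake-connector-python
import snowflake.connector

conn = snowflake.connector.connect(
    user="<username>",               # from step 1
    password="<password>",           # from step 1
    account="<account_identifier>",  # from step 2
    warehouse="<warehouse>",         # the warehouse the collector will use
)
try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_ACCOUNT(), CURRENT_WAREHOUSE()")
    print(cur.fetchone())            # prints the account and warehouse if the login worked
finally:
    conn.close()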

Minimum configuration required for basic pulling

Although this collector supports advanced configuration, the fields required to retrieve data with the basic configuration are defined below.

This minimum configuration refers exclusively to the parameters that are specific to this integration. There are more required parameters related to the generic behavior of the collector; check the settings sections for details.

  • username: The username used to authenticate to the service.

  • password: The password used to authenticate to the service.

  • account_identifier: The account identifier is needed to connect to Snowflake using the connector package available for Python. It can be found in your instance URL, for example https://<account_identifier>.snowflakecomputing.com/api. If you cannot find the account identifier, click the account icon at the bottom left and copy the URL.

  • warehouse: The warehouse selected to compute and generate data.
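For reference, a minimal sketch of these settings in the collector configuration file could look as follows. Only the four key names come from the table above; the exact nesting inside config.yaml depends on your deployment, so treat the layout as illustrative:

username: <username>
password: <password>
account_identifier: <account_identifier>
warehouse: <warehouse>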

See the Accepted authentication methods section to verify what settings are required based on the desired authentication method.

Accepted authentication methods

This collector supports a single authentication method: username/password (including the account identifier and warehouse). The username, password, account identifier, and warehouse settings are all REQUIRED.

Run the collector

Once the data source is configured, you can send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector on your own machine using a Docker image (On-premise collector).

Collector services detail

This section is intended to explain how to proceed with specific actions for services.

Access history

Access History in Snowflake refers to when a user query reads column data and when a SQL statement performs a data write operation, such as INSERT, UPDATE, or DELETE, from a source data object to a target data object. The user access history can be found by querying the Account Usage ACCESS_HISTORY view.

Each row in the ACCESS_HISTORY view contains a single record per SQL statement. The record describes the columns the query accessed directly and indirectly (for example, the underlying tables that the data for the query comes from). These records facilitate regulatory compliance auditing and provide insights into popular and frequently accessed tables and columns, since there is a direct link between the user (that is, the query operator), the query, the table or view, the column, and the data.
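As an illustration, the sketch below reproduces the kind of time-bounded query the puller logs show for this service, using the Snowflake Connector for Python. It assumes an open connection conn like the one from the Vendor setup example; the start timestamp and role are placeholders:

# Sketch: time-bounded read over the Account Usage ACCESS_HISTORY view.
cur = conn.cursor()
cur.execute("USE ROLE ACCOUNTADMIN")               # or a lower role with access, per the data sources table
cur.execute("USE SCHEMA SNOWFLAKE.ACCOUNT_USAGE")
cur.execute(
    "SELECT * FROM access_history WHERE QUERY_START_TIME >= %s "
    "ORDER BY QUERY_START_TIME ASC",
    ("2023-01-10 00:00:00",),                      # placeholder start timestamp
)
for row in cur.fetchmany(5):                       # peek at the first records
    print(row)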

Devo categorization and destination

All events of this service are ingested into the table db.snowflake.history.access

Verify data collection

Once the collector has been launched, it is important to check whether the ingestion is being performed properly. To do so, go to the collector’s logs console.

This service has the following components:

  • Setup: The setup module is in charge of authenticating to the service and managing token expiration when needed.

  • Puller: The puller module is in charge of pulling the data in an organized way and delivering the events via the SDK.

Setup output

A successful run has the following output messages for the setup module:

INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,access_history#predefined,all) -> Starting the execution of setup()
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,access_history#predefined,all) -> Establishing Snowflake connection using provided username/password/account_identifier. 
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,access_history#predefined,all) -> Snowflake Connector for Python Version: 2.9.0, Python Version: 3.9.12, Platform: Linux-5.14.0-1055-oem-x86_64-with-glibc2.31
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,access_history#predefined,all) -> This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
INFO 	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,access_history#predefined,all) -> Setting use_openssl_only mode to False
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,access_history#predefined,all) -> Setup for module <SnowflakeDataPuller> has been successfully executed

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) Starting the execution of pre_pull()
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Reading persisted data
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Data retrieved from the persistence: {'last_polled_timestamp': '2023-01-11T15:18:51.644000Z', 'retrieving_ts': '2022-12-29T12:54:33.224Z', 'historic_date_utc': '2023-01-10T00:00:00.001000Z', 'ids_with_same_timestamp': ['xxx'], '@persistence_version': 1}
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Running the persistence upgrade steps
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Running the persistence corrections steps
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Running the persistence corrections steps
WARNING	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Some changes have been detected and the persistence needs to be updated. Previous content: {'last_polled_timestamp': '2023-01-11T15:18:51.644000Z', 'retrieving_ts': '2022-12-29T12:54:33.224Z', 'historic_date_utc': '2023-01-10T00:00:00.001000Z', 'ids_with_same_timestamp': ['01a99496-0000-dffe-0000-cb3100010052'], '@persistence_version': 1}. New content: {'last_polled_timestamp': '2023-01-10T00:00:00.000000Z', 'retrieving_ts': '2022-12-29T12:54:33.224Z', 'historic_date_utc': '2023-01-10T00:00:00.000000Z', 'ids_with_same_timestamp': [], '@persistence_version': 1}
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Updating the persistence
WARNING	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Persistence has been updated successfully
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) Finalizing the execution of pre_pull()
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Starting data collection every 600 seconds
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Pull Started
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [USE ROLE ACCOUNTADMIN]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [USE SNOWFLAKE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [USE SCHEMA ACCOUNT_USAGE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [USE WAREHOUSE COMPUTE_WH]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [alter session set timezone = 'UTC']
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [SELECT * FROM access_history WHERE QUERY_START_TIME >= ? ORDER BY QUERY_START_TI...]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 7
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Removing the duplicate detections if present...
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673453142259):Number of requests made: 1; Number of events received: 7; Number of duplicated events filtered out: 0; Number of events generated and sent: 7; Average of events per second: 1.573.
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [USE ROLE ACCOUNTADMIN]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [USE SNOWFLAKE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [USE SCHEMA ACCOUNT_USAGE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [USE WAREHOUSE COMPUTE_WH]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [alter session set timezone = 'UTC']
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query: [SELECT * FROM access_history WHERE QUERY_START_TIME >= ? ORDER BY QUERY_START_TI...]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Removing the duplicate detections if present...
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673453142259):Number of requests made: 1; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> The data is up to date!
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Data collection completed. Elapsed time: 6.277 seconds. Waiting for 593.723 second(s) until the next one

After a successful collector execution (that is, when no error logs are found), you will see the following log message:

INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,access_history,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673453142259):Number of requests made: 1; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.
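For example, a hedged sketch of such a search; the exact way the injected @devo_pulling_id key is exposed as a queryable field depends on the table parser, so treat the field name and quoting below as assumptions:

from db.snowflake.history.access
where devo_pulling_id = "1673453142259"
select *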

Note that a Partial Statistics Report will be displayed after each page is downloaded when pagination is required to pull all available events. The complete report is the one without the Partial reference.

(Partial) Statistics for this pull cycle (@devo_pulling_id=1656602793.044179) so far: Number of requests made: 2; Number of events received: 45; Number of duplicated events filtered out: 0; Number of events generated and sent: 40.

Restart the persistence

This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:

  1. Edit the configuration file.

  2. Change the value of the historic_date_utc parameter to a different one (see the sketch below).

  3. Save the changes.

  4. Restart the collector.

The collector will detect this change and restart the persistence using the parameters of the configuration file, or the default configuration if they have not been provided.

Note that this action clears the persistence, which cannot be recovered in any way. Resetting the persistence could result in duplicate or lost events.
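As referenced in step 2, a minimal sketch of the change in the configuration file (assuming historic_date_utc sits at the service level of config.yaml; the exact nesting depends on your deployment):

historic_date_utc: 2023-01-10T00:00:00.000Z    # before
historic_date_utc: 2023-01-15T00:00:00.000Z    # after: any different value triggers the reset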

Login history

The Login_history family of table functions can be used to query login attempts by Snowflake users along various dimensions. This service covers Login_history, which returns login events within a specified time range.
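As an illustration (assuming the open conn connection from the Vendor setup example), the sketch below reproduces the service's time-bounded query and narrows it to failed logins, a common verification step:

# Sketch: failed login attempts from the Account Usage LOGIN_HISTORY view.
cur = conn.cursor()
cur.execute("USE SCHEMA SNOWFLAKE.ACCOUNT_USAGE")
cur.execute(
    "SELECT EVENT_TIMESTAMP, USER_NAME, CLIENT_IP, ERROR_MESSAGE "
    "FROM login_history "
    "WHERE EVENT_TIMESTAMP >= %s AND IS_SUCCESS = 'NO' "
    "ORDER BY EVENT_TIMESTAMP ASC",
    ("2023-01-10 00:00:00",),      # placeholder start timestamp
)
print(cur.fetchall())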

Devo categorization and destination

All events of this service are ingested into the table db.snowflake.history.login

Verify data collection

Once the collector has been launched, it is important to check whether the ingestion is being performed properly. To do so, go to the collector’s logs console.

This service has the following components:

  • Setup: The setup module is in charge of authenticating to the service and managing token expiration when needed.

  • Puller: The puller module is in charge of pulling the data in an organized way and delivering the events via the SDK.

Setup output

A successful run has the following output messages for the setup module:

INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,login_history#predefined,all) -> Starting the execution of setup()
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,login_history#predefined,all) -> Establishing Snowflake connection using provided username/password/account_identifier. 
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,login_history#predefined,all) -> Snowflake Connector for Python Version: 2.9.0, Python Version: 3.9.12, Platform: Linux-5.14.0-1055-oem-x86_64-with-glibc2.31
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,login_history#predefined,all) -> This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
INFO 	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,login_history#predefined,all) -> Setting use_openssl_only mode to False
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,login_history#predefined,all) -> Setup for module <SnowflakeDataPuller> has been successfully executed

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) Starting the execution of pre_pull()
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Reading persisted data
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Data retrieved from the persistence: None
WARNING	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Persistence will be overridden due to the 
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Running the persistence upgrade steps
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Running the persistence corrections steps
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Running the persistence corrections steps
WARNING	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Some changes have been detected and the persistence needs to be updated. Previous content: None. New content: {'last_polled_timestamp': '2023-01-10T00:00:00.000Z', 'retrieving_ts': '2023-01-11T16:15:03.919Z', 'historic_date_utc': '2023-01-10T00:00:00.000000Z', 'ids_with_same_timestamp': []}
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Updating the persistence
WARNING	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Persistence has been updated successfully
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) Finalizing the execution of pre_pull()
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Starting data collection every 600 seconds
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Pull Started
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [USE ROLE ACCOUNTADMIN]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [USE SNOWFLAKE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [USE SCHEMA ACCOUNT_USAGE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [USE WAREHOUSE COMPUTE_WH]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [alter session set timezone = 'UTC']
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [SELECT * FROM login_history WHERE EVENT_TIMESTAMP >= ? ORDER BY EVENT_TIMESTAMP ...]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 5
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Removing the duplicate detections if present...
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673453703919):Number of requests made: 1; Number of events received: 5; Number of duplicated events filtered out: 0; Number of events generated and sent: 5; Average of events per second: 0.682.
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [USE ROLE ACCOUNTADMIN]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [USE SNOWFLAKE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [USE SCHEMA ACCOUNT_USAGE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [USE WAREHOUSE COMPUTE_WH]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [alter session set timezone = 'UTC']
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query: [SELECT * FROM login_history WHERE EVENT_TIMESTAMP >= ? ORDER BY EVENT_TIMESTAMP ...]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Removing the duplicate detections if present...
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673453703919):Number of requests made: 1; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> The data is up to date!
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Data collection completed. Elapsed time: 9.370 seconds. Waiting for 590.630 second(s) until the next one

After a successful collector execution (that is, when no error logs are found), you will see the following log message:

INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,login_history,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673453703919):Number of requests made: 1; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after each page is downloaded when pagination is required to pull all available events. The complete report is the one without the Partial reference.

(Partial) Statistics for this pull cycle (@devo_pulling_id=1656602793.044179) so far: Number of requests made: 2; Number of events received: 45; Number of duplicated events filtered out: 0; Number of events generated and sent: 40.

Restart the persistence

This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:

  1. Edit the configuration file.

  2. Change the value of the historic_date_utc parameter to a different one.

  3. Save the changes.

  4. Restart the collector.

The collector will detect this change and restart the persistence using the parameters of the configuration file, or the default configuration if they have not been provided.

Note that this action clears the persistence, which cannot be recovered in any way. Resetting the persistence could result in duplicate or lost events.

Sessions

This service provides information on sessions, including the authentication method to Snowflake and the Snowflake login event. Snowflake returns one row for each session created over the last year.
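As an illustration (again assuming the open conn connection from the Vendor setup example), the sketch below reproduces the service's query over the SESSIONS view and summarizes sessions per authentication method:

# Sketch: sessions created since a given date, grouped by authentication method.
cur = conn.cursor()
cur.execute("USE SCHEMA SNOWFLAKE.ACCOUNT_USAGE")
cur.execute(
    "SELECT AUTHENTICATION_METHOD, COUNT(*) FROM sessions "
    "WHERE CREATED_ON >= %s GROUP BY AUTHENTICATION_METHOD",
    ("2023-01-10 00:00:00",),      # placeholder start timestamp
)
for method, total in cur.fetchall():
    print(method, total)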

Devo categorization and destination

All events of this service are ingested into the table db.snowflake.history.session

Verify data collection

Once the collector has been launched, it is important to check whether the ingestion is being performed properly. To do so, go to the collector’s logs console.

This service has the following components:

  • Setup: The setup module is in charge of authenticating to the service and managing token expiration when needed.

  • Puller: The puller module is in charge of pulling the data in an organized way and delivering the events via the SDK.

Setup output

A successful run has the following output messages for the setup module:

INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Starting the execution of setup()
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Establishing Snowflake connection using provided username/password/account_identifier. 
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Snowflake Connector for Python Version: 2.9.0, Python Version: 3.9.12, Platform: Linux-5.14.0-1055-oem-x86_64-with-glibc2.31
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
INFO 	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Setting use_openssl_only mode to False
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Setup for module <SnowflakeDataPuller> has been successfully executed

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) Starting the execution of pre_pull()
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Reading persisted data
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Data retrieved from the persistence: {'last_polled_timestamp': '2022-12-29T09:10:15.879000Z', 'retrieving_ts': '2023-01-11T16:22:32.051Z', 'historic_date_utc': '2022-12-12T12:00:00.000000Z', 'ids_with_same_timestamp': [xxx], '@persistence_version': 1}
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Running the persistence upgrade steps
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Running the persistence corrections steps
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Running the persistence corrections steps
WARNING	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Some changes have been detected and the persistence needs to be updated. Previous content: {'last_polled_timestamp': '2022-12-29T09:10:15.879000Z', 'retrieving_ts': '2023-01-11T16:22:32.051Z', 'historic_date_utc': '2022-12-12T12:00:00.000000Z', 'ids_with_same_timestamp': [223411313901770], '@persistence_version': 1}. New content: {'last_polled_timestamp': '2023-01-10T00:00:00.000000Z', 'retrieving_ts': '2023-01-11T16:22:32.051Z', 'historic_date_utc': '2023-01-10T00:00:00.000000Z', 'ids_with_same_timestamp': [], '@persistence_version': 1}
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Updating the persistence
WARNING	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Persistence has been updated successfully
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) Finalizing the execution of pre_pull()
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Starting data collection every 600 seconds
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Pull Started
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [USE ROLE ACCOUNTADMIN]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [USE SNOWFLAKE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [USE SCHEMA ACCOUNT_USAGE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [USE WAREHOUSE COMPUTE_WH]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [alter session set timezone = 'UTC']
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [SELECT * FROM sessions WHERE CREATED_ON >= ? ORDER BY CREATED_ON ASC LIMIT ?]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 5
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Removing the duplicate detections if present...
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673454176963):Number of requests made: 1; Number of events received: 5; Number of duplicated events filtered out: 0; Number of events generated and sent: 5; Average of events per second: 3.070.
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [USE ROLE ACCOUNTADMIN]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [USE SNOWFLAKE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [USE SCHEMA ACCOUNT_USAGE]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [USE WAREHOUSE COMPUTE_WH]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [alter session set timezone = 'UTC']
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query: [SELECT * FROM sessions WHERE CREATED_ON >= ? ORDER BY CREATED_ON ASC LIMIT ?]
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> query execution done
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Number of results in first chunk: 1
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Removing the duplicate detections if present...
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673454176963):Number of requests made: 1; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> The data is up to date!
INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Data collection completed. Elapsed time: 3.226 seconds. Waiting for 596.774 second(s) until the next one

After a successful collector execution (that is, when no error logs are found), you will see the following log message:

INFO	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Statistics for this pull cycle (@devo_pulling_id=1673454176963):Number of requests made: 1; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after each page is downloaded when pagination is required to pull all available events. The complete report is the one without the Partial reference.

(Partial) Statistics for this pull cycle (@devo_pulling_id=1656602793.044179) so far: Number of requests made: 2; Number of events received: 45; Number of duplicated events filtered out: 0; Number of events generated and sent: 40.

Restart the persistence

This collector uses persistent storage to download events in an orderly fashion and avoid duplicates. In case you want to re-ingest historical data or recreate the persistence, you can restart the persistence of this collector by following these steps:

  1. Edit the configuration file.

  2. Change the value of the historic_date_utc parameter to a different one.

  3. Save the changes.

  4. Restart the collector.

The collector will detect this change and restart the persistence using the parameters of the configuration file, or the default configuration if they have not been provided.

Note that this action clears the persistence, which cannot be recovered in any way. Resetting the persistence could result in duplicate or lost events.

Collector operations

This section is intended to explain how to proceed with the specific operations of this collector.

Verify collector operations

Initialization

The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services, as well as validating the given configuration.

A successful run has the following output messages for the initializer module:

INFO	MainProcess::MainThread -> Added "/path/to/collector" directory to the Python path
INFO	MainProcess::MainThread -> Added "/path/to/collector/config_internal" directory to the Python path
INFO	MainProcess::MainThread -> Added "/path/to/collector/schemas" directory to the Python path
INFO	MainProcess::MainThread -> Production mode: False, execute only setup and exit: False, Python version: "3.9.12 (main, Mar 31 2022, 09:38:33) [GCC 9.4.0]", current dir: "/path/to/devo/mylab/collectors/devo-collector-snowflake", exists "config" dir: True, exists "config_internal" dir: True, exists "certs" dir: True, exists "schemas" dir: True, exists "credentials" dir: True
INFO	MainProcess::MainThread -> Loading configuration using the following files: {"full_config": "config.yaml", "job_config_loc": null, "collector_config_loc": null}
INFO	MainProcess::MainThread -> Using the default location for "job_config_loc" file: "/etc/devo/job/job_config.json"
INFO	MainProcess::MainThread -> "/etc/devo/job" does not exists
INFO	MainProcess::MainThread -> Using the default location for "collector_config_loc" file: "/etc/devo/collector/collector_config.json"
INFO	MainProcess::MainThread -> "/etc/devo/collector" does not exists
INFO	MainProcess::MainThread -> Results of validation of config files parameters: {"config": "/path/to/collector/config/config.yaml", "config_validated": True, "job_config_loc": "/etc/devo/job/job_config.json", "job_config_loc_default": True, "job_config_loc_validated": False, "collector_config_loc": "/etc/devo/collector/collector_config.json", "collector_config_loc_default": True, "collector_config_loc_validated": False}
INFO	MainProcess::MainThread -> Build time: "UNKNOWN", OS: "Linux-5.14.0-1055-oem-x86_64-with-glibc2.31", collector(name:version): "example_collector:1.0.0", owner: "integrations_factory@devo.com", started at: "2023-01-11T16:22:54.880093Z"
INFO	MainProcess::MainThread -> Initialized all object from "MainProcess" process
INFO	MainProcess::MainThread -> OutputProcess - Starting thread (executing_period=120s)
INFO	MainProcess::MainThread -> InputProcess - Starting thread (executing_period=120s)
INFO	OutputProcess::MainThread -> Process started
INFO	MainProcess::MainThread -> Started all object from "MainProcess" process
INFO	InputProcess::MainThread -> Process Started
INFO	InputProcess::MainThread -> There is not defined any submodule, using the default one with value "all"
INFO	MainProcess::CollectorThread -> global_status: {"main_process": {"process_id": 44536, "process_status": "sleeping", "thread_counter": 6, "thread_names": ["MainThread", "QueueFeederThread", "OutputControllerThread", "InputControllerThread", "CollectorThread"], "memory_info": {"rss": "118.93MiB", "vms": "845.70MiB", "shared": "46.71MiB", "text": "2.19MiB", "lib": "0.00B", "data": "346.25MiB", "dirty": "0.00B"}, "CollectorThread": {"running_flag": true, "status": "RUNNING(10)", "shutdown_timestamp": "None", "message_queues": {"standard": {"name": "standard_queue_multiprocessing", "max_size_in_messages": 10000, "max_size_in_mb": 1024, "max_wrap_size_in_items": 100, "current_size": 0, "put_lock": "<Lock(owner=None)>", "input_lock": "<multiprocessing.synchronize.Event object at 0x7fbe480e2d30>"}, "lookup": {"name": "lookup_queue_multiprocessing", "max_size_in_messages": 10000, "max_size_in_mb": 1024, "max_wrap_size_in_items": 100, "current_size": 0, "put_lock": "<Lock(owner=None)>", "input_lock": "<multiprocessing.synchronize.Event object at 0x7fbe48082c70>"}, "internal": {"name": "internal_queue_multiprocessing", "max_size_in_messages": 10000, "max_size_in_mb": 1024, "max_wrap_size_in_items": 100, "current_size": 2, "put_lock": "<Lock(owner=SomeOtherProcess)>", "input_lock": "<multiprocessing.synchronize.Event object at 0x7fbe4808a5b0>"}}, "controllers": {"InputControllerThread": {"running_flag": true, "status": "RUNNING(10)", "underlying_process": {"process_id": 44552, "process_status": "running", "num_threads": 2, "memory_info": {"rss": "77.35MiB", "vms": "717.70MiB", "shared": "5.14MiB", "text": "2.19MiB", "lib": "0.00B", "data": "345.99MiB", "dirty": "0.00B"}, "exit_code": null}}, "OutputControllerThread": {"running_flag": true, "status": "RUNNING(10)", "underlying_process": {"process_id": 44550, "process_status": "running", "num_threads": 1, "memory_info": {"rss": "76.80MiB", "vms": "653.89MiB", "shared": "4.54MiB", "text": "2.19MiB", "lib": "0.00B", "data": "346.05MiB", "dirty": "0.00B"}, "exit_code": null}}}}}}
INFO	InputProcess::MainThread -> SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) Starting the execution of init_variables()
INFO	InputProcess::MainThread -> Validating settings from collector definitions
INFO	InputProcess::MainThread -> Validating settings from user configuration
INFO	InputProcess::MainThread -> Populating collector_variables store
INFO	InputProcess::MainThread -> SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) Finalizing the execution of init_variables()
INFO	InputProcess::MainThread -> InputThread(snowflake,abc123) - Starting thread (execution_period=60s)
INFO	InputProcess::MainThread -> ServiceThread(snowflake,abc123,sessions,predefined) - Starting thread (execution_period=60s)
INFO	InputProcess::MainThread -> SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Starting thread
INFO	InputProcess::MainThread -> SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) - Starting thread
WARNING	InputProcess::SnowflakeDataPuller(snowflake,abc123,sessions,predefined,all) -> Waiting until setup will be executed
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Starting the execution of setup()
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Establishing Snowflake connection using provided username/password/account_identifier. 
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Snowflake Connector for Python Version: 2.9.0, Python Version: 3.9.12, Platform: Linux-5.14.0-1055-oem-x86_64-with-glibc2.31
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
INFO	InputProcess::SnowflakeDataPullerSetup(example_collector,snowflake#abc123,sessions#predefined,all) -> Setting use_openssl_only mode to False
INFO	OutputProcess::MainThread -> DevoSender(standard_senders,devo_sender_0) -> Starting thread
INFO	OutputProcess::MainThread -> DevoSenderManagerMonitor(standard_senders,devo_eu_1) -> Starting thread (every 300 seconds)
INFO	OutputProcess::MainThread -> DevoSenderManager(standard_senders,manager,devo_eu_1) -> Starting thread
INFO	OutputProcess::MainThread -> DevoSender(lookup_senders,devo_sender_0) -> Starting thread
INFO	OutputProcess::MainThread -> DevoSenderManagerMonitor(lookup_senders,devo_eu_1) -> Starting thread (every 300 seconds)
INFO	OutputProcess::MainThread -> DevoSenderManager(lookup_senders,manager,devo_eu_1) -> Starting thread

Events delivery and Devo ingestion

The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.

A successful run has the following output messages for the event delivery module:

INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Sender: SyslogSender(standard_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Standard - Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 44 (elapsed 0.007 seconds)
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Sender: SyslogSender(internal_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Internal - Total number of messages sent: 1, messages sent since "2022-06-28 10:39:22.516313+00:00": 1 (elapsed 0.019 seconds)

By default, these information traces will be displayed every 10 minutes.

Sender services

The Integrations Factory Collector SDK has three different sender services, depending on the type of event to deliver (internal, standard, and lookup). This collector uses the following sender services:

  • internal_senders: In charge of delivering internal metrics to Devo, such as logging traces or metrics.

  • standard_senders: In charge of delivering pulled events to Devo.

Sender statistics

Each service displays its own performance statistics so that you can check how many events have been delivered to Devo by type:

  • Number of available senders: 1. This trace displays the number of concurrent senders available for the given sender service.

  • sender manager internal queue size: 0. This trace displays the number of items in the internal sender queue. This value helps detect bottlenecks and the need to increase the number of concurrent senders to improve the performance of the data delivery to Devo.

  • Standard - Total number of messages sent: 57, messages sent since "2023-01-10 16:09:16.116750+00:00": 0 (elapsed 0.000 seconds). This trace displays the event counters; following the given example, the following conclusions can be drawn:

      • 57 events were sent to Devo since the collector started.
      • The last checkpoint timestamp was 2023-01-10 16:09:16.116750+00:00.
      • 0 events were sent to Devo between the last UTC checkpoint and now.
      • Delivering those events required 0.000 seconds.

By default, these traces will be shown every 10 minutes.

Check memory usage

To check the memory usage of this collector, look for the following log records in the collector output, which are displayed every 5 minutes by default, always after the memory-freeing process has run:

  • The used memory is displayed per running process; the sum of both values gives the total memory used by the collector.

  • The global pressure of the available memory is displayed in the global value.

  • All metrics (global, RSS, VMS) show the value before freeing memory -> the value after freeing memory.

INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)
INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)

Differences between RSS and VMS memory usage:

  • RSS is the Resident Set Size: the actual physical memory the process is using.

  • VMS is the Virtual Memory Size: the virtual memory the process is using.
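For reference, the RSS and VMS figures in these traces correspond to the values exposed by psutil; the sketch below reads them for the current process:

# Sketch: reading RSS and VMS for the current process. Requires: pip install psutil
import psutil

mem = psutil.Process().memory_info()
print(f"RSS: {mem.rss / 1024 ** 2:.2f}MiB")   # resident (physical) memory
print(f"VMS: {mem.vms / 1024 ** 2:.2f}MiB")   # virtual memory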

Enable/disable the logging debug mode

Sometimes it is necessary to activate the debug mode of the collector's logging. This debug mode increases the verbosity of the logs and allows you to print execution traces that are very helpful when resolving incidents or detecting bottlenecks in heavy download processes.

  • To enable this option, edit the configuration file, change the debug_status parameter from false to true, and restart the collector.

  • To disable this option, edit the configuration file, change the debug_status parameter from true to false, and restart the collector.
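A minimal sketch of the change (assuming debug_status sits at the top level of the configuration file; the exact location depends on your config.yaml):

debug_status: true    # set back to false to disable the debug mode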

For more information, visit the configuration and parameterization section corresponding to the chosen deployment mode.

Troubleshooting

This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.

InitVariablesError (error ID 0)

  • Error message: The internal config did not pass the format validation. Contact with Support.
  • Cause: There is an error in the internal configuration.
  • Solution: Contact Devo Support.

InitVariablesError (error ID 1)

  • Error message: The user config did not pass the format validation. Check error traces for details and visit our documentation.
  • Cause: There is an error in the user configuration.
  • Solution: Read the error message carefully and follow the documentation.

InitVariablesError (error ID 11)

  • Error message: Required setting, historic_date_utc not of expected type: str
  • Cause: historic_date_utc is not of type str.
  • Solution: Set historic_date_utc as a string following this format: 2023-01-01T00:00:00.000Z

InitVariablesError (error ID 12)

  • Error message: Invalid value is provided for the historic_date. Provide the date in %Y-%m-%d %H:%M:%S format. Error message: {error_message}
  • Cause: historic_date_utc doesn't match the expected format.
  • Solution: Set historic_date_utc as a string following this format: 2023-01-01T00:00:00.000Z

InitVariablesError (error ID 13)

  • Error message: Time format for historic date must be %Y-%m-%dT%H:%M:%S.%fZ. e.g. 2022-02-15T14:32:33.043Z
  • Cause: historic_date_utc doesn't match the expected format.
  • Solution: Set historic_date_utc as a string following this format: 2023-01-01T00:00:00.000Z

InitVariablesError (error ID 14)

  • Error message: historic datetime cannot be greater than the present UTC time
  • Cause: historic_date_utc represents a future date, and that's not allowed.
  • Solution: Set historic_date_utc to a past date.

SetupError (error ID 100)

  • Error message: The remote data is not pullable with the given credentials. Check the error traces for details.
  • Cause: There is a problem with the credentials.
  • Solution: Check the error traces to find the possible cause. It could be that the credentials are no longer valid, the token has expired, permissions are lacking, etc.

PullError (error ID 300)

  • Error message: Invalid username/password, please re-enter username and password.
  • Cause: Credentials stopped working during the pulling.
  • Solution: Set valid credentials.

PullError (error ID 301)

  • Error message: Connection is closed.
  • Cause: The connection to Snowflake's cloud database was closed for some reason during the pulling.
  • Solution: Re-run the collector and check if there is any other error. If there are other errors, follow the error messages. If the error persists, contact Devo Support.

PullError (error ID 302)

  • Error message: Unexpected error occured while connecting to Snowflake: {error_message}
  • Cause: There was unexpected behavior during the pulling and the connection to Snowflake's cloud database was lost.
  • Solution: Re-run the collector and check if there is any other error. If there are other errors, follow the error messages. If the error persists, contact Devo Support.

PullError (error ID 303)

  • Error message: Invalid username/password, please re-enter username and password.
  • Cause: Credentials failed at the connection stage.
  • Solution: Set valid credentials.

PullError (error ID 304)

  • Error message: Connection is closed.
  • Cause: The connection to Snowflake's cloud database was closed for some reason at the connection stage.
  • Solution: Re-run the collector and check if there is any other error. If there are other errors, follow the error messages. If the error persists, contact Devo Support.

PullError (error ID 305)

  • Error message: Unexpected error occured while connecting to Snowflake: {error_message}
  • Cause: There was unexpected behavior at the connection stage and the connection to Snowflake's cloud database was lost.
  • Solution: Re-run the collector and check if there is any other error. If there are other errors, follow the error messages. If the error persists, contact Devo Support.

PullError (error ID 306)

  • Error message: Invalid username/password, please re-enter username and password.
  • Cause: Credentials failed for the Custom SQL service.
  • Solution: Set valid credentials.

PullError (error ID 307)

  • Error message: Connection is closed.
  • Cause: The connection to Snowflake's cloud database was closed for some reason for the Custom SQL service.
  • Solution: Re-run the collector and check if there is any other error. If there are other errors, follow the error messages. If the error persists, contact Devo Support.

PullError (error ID 308)

  • Error message: Unexpected error occured while connecting to Snowflake: {error_message}
  • Cause: There was unexpected behavior for the Custom SQL service and the connection to Snowflake's cloud database was lost.
  • Solution: Re-run the collector and check if there is any other error. If there are other errors, follow the error messages. If the error persists, contact Devo Support.

Change log

v1.4.1 (IMPROVEMENTS, BUG FIXING)

Improvements:

  • Updated DCSDK to 1.12.4

Bug fixes:

  • Fixed the state file being copied and storing whole events; it now stores the MD5 hash of each event.

Recommendation: Recommended version

v1.4.0 (IMPROVEMENTS)

Improvements:

  • Updated DCSDK to 1.12.2
  • Updated the Docker image to 1.3.0
  • Added the Snowflake account ID to each log for easier sorting

Recommendation: Update

v1.3.1 (BUG FIXING)

Bug fixes:

  • Fixed an internal dependency

Recommendation: Update

v1.3.0 (BUG FIXING, IMPROVEMENTS)

Bug fixes:

  • Removed a query limit that caused only one record to be returned
  • Updated Python dependencies to remove a scaling error
  • Fixed the puller constantly restarting and never being marked as completed

Improvements:

  • Added connection closing
  • Added the ability to use a lower role than ACCOUNTADMIN

Recommendation: Update

v1.2.0 (BUG FIXING, IMPROVEMENTS)

Bug fixes:

  • Fixed the custom SQL bug in the persistence by adding JSON schema validation and changing the parsing logic

Improvements:

  • Updated DCSDK from 1.10.2 to 1.11.1

Recommendation: Update

v1.1.0 (BUG FIXING, IMPROVEMENTS)

Bug fixes:

  • Fixed the custom SQL bug in the persistence by adding JSON schema validation

Improvements:

  • Updated DCSDK from 1.6.1 to 1.10.2

Recommendation: Update

v1.0.1 (NEW FEATURE, BUG FIXING)

Improvements:

  • Upgraded DCSDK to v1.6.1
  • A new key called @devo_environment will be added to the event (only for JSON events)
  • The obfuscation service can now be configured from the user config and the module definition
  • The obfuscation service can now obfuscate items inside arrays

Bug fixes:

  • Privileged roles are no longer needed
  • All DatabaseError exceptions are now handled
  • Fixed a couple of data types to avoid errors
  • Statistics are now shown correctly
  • Renamed some error messages and numbers

Recommendation: Update

v1.0.0 (NEW FEATURE)

New features:

  • Access History: Returns when a user query reads column data and when a SQL statement performs a data write operation.
  • Login History: Returns login events (successful or not) within a specified time range.
  • Session History: Returns data about successful authentications, including the username, method used, application used, etc.
  • Custom SQL: This service allows performing custom queries.
