Spidersilk collector

Configuration requirements

To run this collector, you need the configurations detailed below.

| Configuration | Details |
| --- | --- |
| SpiderSilk Account | You must have an active SpiderSilk account. |
| API Key | Required for authentication. |

Refer to the Vendor setup section to learn more about these configurations.

Overview

The Spidersilk Collector is a specialized data collection tool designed to gather intelligence on threats, assets, and dark web activity. It scans the internet, aggregating critical security insights from various sources to enhance cybersecurity operations.

By leveraging advanced crawling techniques, the collector provides valuable monitoring of potential threats, compromised assets, and underground discussions in the dark web. This enables security teams to detect and mitigate risks before they escalate.

Devo collector features

| Feature | Details |
| --- | --- |
| Allow parallel downloading (multipod) | Not allowed |
| Running environments | Collector server, on-premise |
| Populated Devo events | Table |
| Flattening preprocessing | No |

Data sources

This collector extracts data from three services available from Spidersilk:

  • Assets: Retrieves information about discovered assets.

  • Threats: Collects threat intelligence reports.

  • Darkweb: Gathers intelligence from dark web sources.

| Data Source | Description | API Endpoint | Collector Service Name | Devo Table | Available from release |
| --- | --- | --- | --- | --- | --- |
| Assets | Retrieves information about discovered assets | /client/v1/assets, /client/v1/assets/{id} | assets | asm.api.spidersilk.asset | v1.0.0 |
| Threats | Collects threat intelligence reports | /client/v1/threats, /client/v1/threats/{uuid} | threats | asm.api.spidersilk.threat | v1.0.0 |
| Darkweb | Gathers intelligence from dark web sources | /client/v1/darkweb, /client/v1/darkweb/{uuid} | darkweb | asm.api.spidersilk.darkweb | v1.0.0 |

For more information on how the events are parsed, visit our page.

Flattening preprocessing

No flattening preprocessing is applied to the source events before sending the data to Devo. The data is ingested in its original structure as retrieved from the Spidersilk API.

Vendor setup

To configure the Spidersilk Collector, follow these steps to obtain the necessary API key and ensure a secure setup.

Obtaining the API Key

  1. Register on Spidersilk: Visit the Spidersilk website and create an account by providing the required information.

  2. Access the Dashboard: After registration, log in to your account to access the dashboard.

  3. Generate an API Key: Within the dashboard, navigate to the API section. Locate the option to generate a new API Key and follow the provided instructions.

  4. Secure Your API Key: Once generated, store your API key securely. Do not share it publicly or include it in publicly accessible code repositories.
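Once you have the key, requests must carry it in a header (see the Accepted authentication methods section). The sketch below shows the idea; the header name `X-API-Key` and the host are assumptions for illustration, not confirmed by Spidersilk documentation.

```python
import urllib.request

# Placeholder host for illustration only; use the base URL from your
# Spidersilk account. The "X-API-Key" header name is an assumption:
# check the vendor's API documentation for the exact header.
BASE_URL = "https://api.spidersilk.example"

def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the API key header."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        headers={"X-API-Key": api_key},
        method="GET",
    )

# Example: a request for the assets listing endpoint (built, not sent).
request = build_request("/client/v1/assets", "YOUR_API_KEY")
```

In line with the recommendation above, read the key from an environment variable or a secrets manager rather than hard-coding it in source control.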

Security Recommendations

  • Treat Your API Key Like a Password: The security of your application is directly linked to the protection of your API Key. Avoid sharing it with unauthorized individuals or transmitting it via insecure channels.

  • Regularly Rotate Your API Key: Periodically regenerate your API key to minimize potential security risks.

  • Restrict Permissions: Assign only the necessary permissions to your API key, adhering to the principle of least privilege.

Connectivity Requirements

  • Network Configuration: Ensure that your system allows outbound traffic over SSL/TLS on TCP port 443, as the Spidersilk API operates over HTTPS.

  • Avoid IP-Based Restrictions: It's advisable not to enforce firewall rules based on specific IP addresses for outbound traffic to Spidersilk services, as IP addresses may change to maintain service availability.
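A quick way to verify the outbound requirement is a plain TCP connection test on port 443. A minimal sketch (the hostname you test against depends on your Spidersilk base URL):

```python
import socket

def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds.

    This is only a firewall/egress preflight check; it does not
    validate TLS certificates or the API itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```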

By following these steps and recommendations, you can securely set up and utilize the Spidersilk Collector.

Minimum configuration required for basic pulling

Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.

This minimum configuration refers exclusively to the parameters specific to this integration. There are more required parameters related to the generic behavior of the collector; check the settings sections for details.

| Setting | Details |
| --- | --- |
| api_key | The API key is required to authenticate requests. |

See the Accepted authentication methods section to verify what settings are required based on the desired authentication method.
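Putting the minimum settings together, a configuration fragment could look like the sketch below. This is only illustrative: the surrounding layout of a collector configuration file varies by deployment, and the sort_field value shown is an assumption. Only api_key, base_url, and sort_field are parameters this page actually names (the latter two appear in the Common Errors tables as module_globals entries).

```json
{
  "module_globals": {
    "base_url": "https://api.spidersilk.example",
    "sort_field": "created_at",
    "api_key": "<your-api-key>"
  }
}
```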

Accepted authentication methods

The Spidersilk API supports authentication via API key, which must be included in the request headers.

| Authentication Method | Details |
| --- | --- |
| api_key | The API key is required to authenticate requests. This is the only authentication method supported for accessing the Assets, Threats, and Darkweb data sources. |

Run the collector

Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector in your own machine using a Docker image (On-premise collector).

Collector services detail

This section is intended to explain how to proceed with specific actions for services.

Threats service

This is a snapshot-based service, meaning that every pull captures all events as they exist at that moment in the system—like taking a picture of it. Because of this, duplicate elements will appear. If updates are infrequent, consider increasing the request_period_in_seconds value to reduce unnecessary data collection.
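Because each snapshot pull re-delivers existing elements, duplicate filtering matters downstream; the collector's statistics report a "duplicated events filtered out" count. Conceptually, that filtering can be sketched with a fingerprint set, as below. This is an illustrative pattern, not the collector's actual code.

```python
import hashlib
import json

def event_key(event: dict) -> str:
    """Stable fingerprint for an event (illustrative choice of key)."""
    payload = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def filter_new_events(events, seen: set):
    """Yield only events not observed in previous snapshot pulls."""
    for event in events:
        key = event_key(event)
        if key not in seen:
            seen.add(key)
            yield event
```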

Once the collector has been launched, it is important to check that ingestion is working properly. To do so, go to the collector’s logs console.

This service has the following components:

| Component | Description |
| --- | --- |
| Setup | The setup module is in charge of authenticating the service and managing token expiration when needed. |
| Puller | The puller module is in charge of pulling the data in an organized way and delivering the events via the SDK. |

Setup output

A successful run has the following output messages for the setup module:

```
InputProcess::MainThread -> SpidersilkPullerSetup(spidersilk#1234566,threats#predefined) -> Starting thread
InputProcess::SpidersilkPullerSetup(spidersilk#1234566,threats#predefined) -> Check for <SpidersilkPuller> module has started
InputProcess::MainThread -> SpidersilkPuller(spidersilk#1234566,threats#predefined) - Starting thread
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Waiting until setup will be executed
InputProcess::MainThread -> InputMetricsThread -> Started thread for updating metrics values (update_period=10.0)
InputProcess::SpidersilkPullerSetup(spidersilk#1234566,threats#predefined) -> The token/header/authentication has not been created yet
```

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

```
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> SpidersilkPuller(spidersilk#1234566,threats#predefined) Starting the execution of pre_pull()
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Reading persisted data
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Data retrieved from the persistence: None
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Persistence will be overridden due to the retrieved state is empty
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Running the persistence upgrade steps
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Running the persistence corrections steps
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Running the persistence corrections steps
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> The persistence version value is <ZERO>, so no persistence will be allocated
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> SpidersilkPuller(spidersilk#1234566,threats#predefined) Finalizing the execution of pre_pull()
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Starting data collection every 60 seconds
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Pull Started
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 1; Number of events received: 100; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 2; Number of events received: 200; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 3; Number of events received: 300; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 4; Number of events received: 400; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 5; Number of events received: 500; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 6; Number of events received: 600; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 7; Number of events received: 700; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 8; Number of events received: 800; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 9; Number of events received: 894; Number of duplicated events filtered out: 0; Number of events generated and sent: 894; Average of events per second: 16.965.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 9; Number of events received: 894; Number of duplicated events filtered out: 0; Number of events generated and sent: 894; Average of events per second: 16.965.
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> The data is up to date!
```

After a successful collector’s execution (that is, no error logs found), you will see the following log message:

```
InputProcess::SpidersilkPuller(spidersilk#1234566,threats#predefined) -> Statistics for this pull cycle (@devo_pulling_id=1742379164420):Number of requests made: 9; Number of events received: 894; Number of duplicated events filtered out: 0; Number of events generated and sent: 894; Average of events per second: 16.965.
```

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.
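For example, if you export a batch of ingested events, the same grouping by pull action can be reproduced locally. A sketch, assuming the events are plain dicts:

```python
from collections import defaultdict

def group_by_pulling_id(events):
    """Group events by their @devo_pulling_id value."""
    groups = defaultdict(list)
    for event in events:
        groups[event.get("@devo_pulling_id")].append(event)
    return dict(groups)
```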

This collector does not implement persistence because it has snapshot services.

This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.

Common Errors

| Error Type | Error Id | Error Message | Cause | Solution |
| --- | --- | --- | --- | --- |
| InitVariablesError | 10 | Missing base_url variable in module_globals | The base_url parameter is not set in the configuration file. | Ensure base_url is correctly defined in module_globals. |
| InitVariablesError | 11 | base_url must be a string | The base_url parameter is not in the correct format. | Verify that base_url is a string in module_globals. |
| InitVariablesError | 12 | Missing sort_field variable in module_globals | The sort_field parameter is missing from the configuration. | Add sort_field to module_globals in the configuration file. |
| InitVariablesError | 13 | sort_field must be a string | The sort_field parameter is not in the correct format. | Ensure sort_field is a string in module_globals. |
| ApiError | 400 | Bad request: The server could not understand the request | Invalid request format or missing required parameters. | Check the API documentation and adjust the request. |
| ApiError | 401 | Unauthorized: Authentication failed or missing | The API key is invalid or not provided. | Verify the API key and ensure it is included in the request. |
| ApiError | 429 | Too Many Requests: Rate limit exceeded | Too many requests were sent in a short time frame. | Implement retry logic with backoff timing. |
| ApiError | 500 | Server Error: An error occurred on the server | The API service is experiencing internal issues. | Retry the request after some time or contact support. |

If you continue to experience issues, consult the official documentation or contact support for further assistance.
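For the 429 case above, "retry logic with backoff timing" can be sketched as follows. This is an illustrative pattern, not part of the collector; `send` stands for any zero-argument callable (hypothetical) that raises on a retryable response:

```python
import time

def request_with_backoff(send, max_retries: int = 5, base_delay: float = 1.0):
    """Call `send`, retrying with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            return send()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

In production you would catch only retryable errors (429/5xx) and honor a Retry-After header when the server provides one.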

Assets service

This is a snapshot-based service, meaning that every pull captures all events as they exist at that moment in the system—like taking a picture of it. Because of this, duplicate elements will appear. If updates are infrequent, consider increasing the request_period_in_seconds value to reduce unnecessary data collection.

Once the collector has been launched, it is important to check that ingestion is working properly. To do so, go to the collector’s logs console.

This service has the following components:

| Component | Description |
| --- | --- |
| Setup | The setup module is in charge of authenticating the service and managing token expiration when needed. |
| Puller | The puller module is in charge of pulling the data in an organized way and delivering the events via the SDK. |

Setup output

A successful run has the following output messages for the setup module:

```
InputProcess::MainThread -> SpidersilkPullerSetup(spidersilk#1234566,assets#predefined) -> Starting thread
InputProcess::SpidersilkPullerSetup(spidersilk#1234566,assets#predefined) -> Check for <SpidersilkPuller> module has started
InputProcess::MainThread -> SpidersilkPuller(spidersilk#1234566,assets#predefined) - Starting thread
InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Waiting until setup will be executed
InputProcess::MainThread -> InputMetricsThread -> Started thread for updating metrics values (update_period=10.0)
InputProcess::SpidersilkPullerSetup(spidersilk#1234566,assets#predefined) -> The token/header/authentication has not been created yet
```

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

```
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> SpidersilkPuller(spidersilk#1234566,assets#predefined) Starting the execution of pre_pull()
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Reading persisted data
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Data retrieved from the persistence: None
WARNING InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Persistence will be overridden due to the retrieved state is empty
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Running the persistence upgrade steps
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Running the persistence corrections steps
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Running the persistence corrections steps
WARNING InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> The persistence version value is <ZERO>, so no persistence will be allocated
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> SpidersilkPuller(spidersilk#1234566,assets#predefined) Finalizing the execution of pre_pull()
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Starting data collection every 60 seconds
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Pull Started
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 1; Number of events received: 100; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 2; Number of events received: 200; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 3; Number of events received: 300; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 4; Number of events received: 400; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 5; Number of events received: 500; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 6; Number of events received: 600; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 7; Number of events received: 700; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 8; Number of events received: 800; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 9; Number of events received: 900; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 10; Number of events received: 950; Number of duplicated events filtered out: 0; Number of events generated and sent: 950; Average of events per second: 16.874.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 10; Number of events received: 950; Number of duplicated events filtered out: 0; Number of events generated and sent: 950; Average of events per second: 16.874.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> The data is up to date!
```

After a successful collector’s execution (that is, no error logs found), you will see the following log message:

```
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,assets#predefined) -> Statistics for this pull cycle (@devo_pulling_id=1742379852241):Number of requests made: 10; Number of events received: 950; Number of duplicated events filtered out: 0; Number of events generated and sent: 950; Average of events per second: 16.874.
```

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

This collector does not implement persistence because it has snapshot services.

This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.

Common Errors

| Error Type | Error Id | Error Message | Cause | Solution |
| --- | --- | --- | --- | --- |
| InitVariablesError | 10 | Missing base_url variable in module_globals | The base_url parameter is not set in the configuration file. | Ensure base_url is correctly defined in module_globals. |
| InitVariablesError | 11 | base_url must be a string | The base_url parameter is not in the correct format. | Verify that base_url is a string in module_globals. |
| InitVariablesError | 12 | Missing sort_field variable in module_globals | The sort_field parameter is missing from the configuration. | Add sort_field to module_globals in the configuration file. |
| InitVariablesError | 13 | sort_field must be a string | The sort_field parameter is not in the correct format. | Ensure sort_field is a string in module_globals. |
| ApiError | 400 | Bad request: The server could not understand the request | Invalid request format or missing required parameters. | Check the API documentation and adjust the request. |
| ApiError | 401 | Unauthorized: Authentication failed or missing | The API key is invalid or not provided. | Verify the API key and ensure it is included in the request. |
| ApiError | 429 | Too Many Requests: Rate limit exceeded | Too many requests were sent in a short time frame. | Implement retry logic with backoff timing. |
| ApiError | 500 | Server Error: An error occurred on the server | The API service is experiencing internal issues. | Retry the request after some time or contact support. |

If you continue to experience issues, consult the official documentation or contact support for further assistance.

Darkweb service

This is a snapshot-based service, meaning that every pull captures all events as they exist at that moment in the system—like taking a picture of it. Because of this, duplicate elements will appear. If updates are infrequent, consider increasing the request_period_in_seconds value to reduce unnecessary data collection.

Once the collector has been launched, it is important to check that ingestion is working properly. To do so, go to the collector’s logs console.

This service has the following components:

| Component | Description |
| --- | --- |
| Setup | The setup module is in charge of authenticating the service and managing token expiration when needed. |
| Puller | The puller module is in charge of pulling the data in an organized way and delivering the events via the SDK. |

Setup output

A successful run has the following output messages for the setup module:

```
INFO InputProcess::MainThread -> ServiceThread(spidersilk,1234566,darkweb,predefined) - Starting thread (execution_period=60s)
INFO InputProcess::MainThread -> SpidersilkPullerSetup(spidersilk#1234566,darkweb#predefined) -> Starting thread
INFO InputProcess::SpidersilkPullerSetup(spidersilk#1234566,darkweb#predefined) -> Check for <SpidersilkPuller> module has started
INFO InputProcess::MainThread -> SpidersilkPuller(spidersilk#1234566,darkweb#predefined) - Starting thread
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Waiting until setup will be executed
INFO InputProcess::MainThread -> InputMetricsThread -> Started thread for updating metrics values (update_period=10.0)
WARNING InputProcess::SpidersilkPullerSetup(spidersilk#1234566,darkweb#predefined) -> The token/header/authentication has not been created yet
```

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

```
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> SpidersilkPuller(spidersilk#1234566,darkweb#predefined) Starting the execution of pre_pull()
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Reading persisted data
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Data retrieved from the persistence: None
WARNING InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Persistence will be overridden due to the retrieved state is empty
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Running the persistence upgrade steps
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Running the persistence corrections steps
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Running the persistence corrections steps
WARNING InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> The persistence version value is <ZERO>, so no persistence will be allocated
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> SpidersilkPuller(spidersilk#1234566,darkweb#predefined) Finalizing the execution of pre_pull()
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Starting data collection every 60 seconds
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Pull Started
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> (Partial) Statistics for this pull cycle (@devo_pulling_id=1742380191896):Number of requests made: 1; Number of events received: 14; Number of duplicated events filtered out: 0; Number of events generated and sent: 14; Average of events per second: 17.717.
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Statistics for this pull cycle (@devo_pulling_id=1742380191896):Number of requests made: 1; Number of events received: 14; Number of duplicated events filtered out: 0; Number of events generated and sent: 14; Average of
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined)
```

After a successful collector’s execution (that is, no error logs found), you will see the following log message:

```
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined) -> Statistics for this pull cycle (@devo_pulling_id=1742380191896):Number of requests made: 1; Number of events received: 14; Number of duplicated events filtered out: 0; Number of events generated and sent: 14; Average of
INFO InputProcess::SpidersilkPuller(spidersilk#1234566,darkweb#predefined)
```

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

This collector does not implement persistence because it has snapshot services.

This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the most common errors.

Common Errors

| Error Type | Error Id | Error Message | Cause | Solution |
| --- | --- | --- | --- | --- |
| InitVariablesError | 10 | Missing base_url variable in module_globals | The base_url parameter is not set in the configuration file. | Ensure base_url is correctly defined in module_globals. |
| InitVariablesError | 11 | base_url must be a string | The base_url parameter is not in the correct format. | Verify that base_url is a string in module_globals. |
| InitVariablesError | 12 | Missing sort_field variable in module_globals | The sort_field parameter is missing from the configuration. | Add sort_field to module_globals in the configuration file. |
| InitVariablesError | 13 | sort_field must be a string | The sort_field parameter is not in the correct format. | Ensure sort_field is a string in module_globals. |
| ApiError | 400 | Bad request: The server could not understand the request | Invalid request format or missing required parameters. | Check the API documentation and adjust the request. |
| ApiError | 401 | Unauthorized: Authentication failed or missing | The API key is invalid or not provided. | Verify the API key and ensure it is included in the request. |
| ApiError | 429 | Too Many Requests: Rate limit exceeded | Too many requests were sent in a short time frame. | Implement retry logic with backoff timing. |
| ApiError | 500 | Server Error: An error occurred on the server | The API service is experiencing internal issues. | Retry the request after some time or contact support. |

If you continue to experience issues, consult the official documentation or contact support for further assistance.

Collector operations

This section is intended to explain how to proceed with specific operations of this collector.

Initialization

The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services, and validating the given configuration.

A successful run has the following output messages for the initializer module:

```
INFO MainThread -> (CollectorMultithreadingQueue) standard_queue_multithreading -> max_size_in_messages: 10000, max_size_in_mb: 1024, max_wrap_size_in_items: 100
WARNING MainThread -> [INTERNAL LOGIC] DevoSender::_validate_kwargs_for_method__init__ -> The <address> does not appear to be an IP address and cannot be verified: collector-us.devo.io
WARNING MainThread -> [OUTPUT] OutputLookupSenders -> <threshold_for_using_gzip_in_transport_layer> setting has been modified from 1.1 to 1.0 due to this configuration increases the Lookup sender performance.
WARNING MainThread -> [INTERNAL LOGIC] DevoSender::_validate_kwargs_for_method__init__ -> The <address> does not appear to be an IP address and cannot be verified: collector-us.devo.io
INFO MainThread -> [OUTPUT] OutputMultithreadingController(threatquotient_collector) -> Starting thread
INFO MainThread -> [OUTPUT] DevoSender(standard_senders,devo_sender_0) -> Starting thread
INFO MainThread -> [OUTPUT] DevoSenderManagerMonitor(standard_senders,devo_1) -> Starting thread (every 600 seconds)
INFO MainThread -> [OUTPUT] DevoSenderManager(standard_senders,manager,devo_1)(devo_1) -> Starting thread
INFO MainThread -> [OUTPUT] DevoSender(lookup_senders,devo_sender_0) -> Starting thread
INFO MainThread -> [OUTPUT] DevoSenderManagerMonitor(lookup_senders,devo_1) -> Starting thread (every 600 seconds)
INFO MainThread -> [OUTPUT] DevoSenderManager(lookup_senders,manager,devo_1)(devo_1) -> Starting thread
INFO MainThread -> InitVariables Started
INFO MainThread -> start_time_value initialized
INFO MainThread -> verify_host_ssl_cert initialized
INFO MainThread -> event_fetch_limit_in_items initialized
INFO MainThread -> InitVariables Terminated
INFO MainThread -> [INPUT] InputMultithreadingController(threatquotient_collector) - Starting thread (executing_period=300s)
INFO MainThread -> [INPUT] InputThread(threatquotient_collector,threatquotient_data_puller#111) - Starting thread (execution_period=600s)
INFO MainThread -> [INPUT] ServiceThread(threatquotient_collector,threatquotient_data_puller#111,events#predefined) - Starting thread (execution_period=600s)
INFO MainThread -> [SETUP] ThreatQuotientDataPullerSetup(threatquotient_collector,threatquotient_data_puller#111,events#predefined) - Starting thread
INFO MainThread -> [INPUT] ThreatQuotientDataPuller(threatquotient_collector,threatquotient_data_puller#111,events#predefined) - Starting thread
```

Events delivery and Devo ingestion

The event delivery module receives the events from the internal queues, where the pullers inject them, and delivers them to Devo using the selected compatible delivery method.
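The flow just described (pullers inject events into internal queues, sender threads drain them toward the output) can be sketched in a simplified, illustrative form; the names below are illustrative, not the collector SDK's real classes:

```python
import queue
import threading

# Simplified sketch: pullers inject events into a bounded internal queue
# and a sender thread drains it toward the delivery side.
events = queue.Queue(maxsize=10000)   # internal queue (bounded)
delivered = []                        # stand-in for the Devo ingestion side

def sender():
    while True:
        event = events.get()
        if event is None:             # sentinel: shut the sender down
            break
        delivered.append(event)       # stand-in for the real delivery call

thread = threading.Thread(target=sender)
thread.start()
for n in range(3):                    # a puller enqueuing three events
    events.put(f"event-{n}")
events.put(None)                      # signal end of stream
thread.join()
print(delivered)                      # ['event-0', 'event-1', 'event-2']
```

A bounded queue like this is what the `max_size_in_messages`/`max_size_in_mb` limits in the startup trace above refer to: pullers block (or back off) when the delivery side cannot keep up.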

A successful run has the following output messages for the event delivery module:

INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Sender: SyslogSender(standard_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Standard - Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 44 (elapsed 0.007 seconds)
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Sender: SyslogSender(internal_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Internal - Total number of messages sent: 1, messages sent since "2022-06-28 10:39:22.516313+00:00": 1 (elapsed 0.019 seconds)

By default, these information traces will be displayed every 10 minutes.

Sender services

The Integrations Factory Collector SDK has three different sender services, depending on the event type to deliver (internal, standard, and lookup). This collector uses the following sender services:

Sender services

Description

internal_senders

In charge of delivering internal events to Devo, such as logging traces or metrics.

standard_senders

In charge of delivering pulled events to Devo.

Sender statistics

Each sender service displays its own performance statistics, which let you check how many events have been delivered to Devo by type:

Logging trace

Description

Number of available senders: 1

Displays the number of concurrent senders available for the given Sender Service.

sender manager internal queue size: 0

Displays the items available in the internal sender queue.

This value helps detect bottlenecks and the need to increase the data delivery performance to Devo, which can be achieved by increasing the number of concurrent senders.

Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 21 (elapsed 0.007 seconds)

Displays the number of events sent since the last statistics trace. Following the given example, these conclusions can be drawn:

  • 44 events were sent to Devo since the collector started.

  • The last checkpoint timestamp was 2022-06-28 10:39:22.511671+00:00.

  • 21 events were sent to Devo between the last UTC checkpoint and now.

  • Those 21 events required 0.007 seconds to be delivered.
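As a quick sanity check on such a trace, the implied delivery throughput can be derived from the last two figures. The helper below is hypothetical, not part of the collector:

```python
# Hypothetical helper: derive delivery throughput from the sender
# statistics shown in the logging trace above.
def delivery_rate(events_sent: int, elapsed_seconds: float) -> float:
    """Events delivered to Devo per second for the last interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return events_sent / elapsed_seconds

# From the example trace: 21 events in 0.007 seconds.
print(delivery_rate(21, 0.007))
```

For the example above, this works out to roughly 3,000 events per second, which is why a persistently non-zero internal queue size, rather than the elapsed time itself, is the better bottleneck signal.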

By default, these traces will be displayed every 10 minutes.

To check the memory usage of this collector, look for the following log records, which are displayed every 5 minutes by default, always after the memory-freeing process runs.

  • The used memory is displayed per running process (input and output); the sum of both values gives the total memory used by the collector.

  • The global pressure on the available memory is displayed in the global value.

  • All metrics (global, RSS, VMS) show the value before and after freeing memory, in the format before -> after.

INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)
INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)
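When monitoring these records programmatically, the [GC] line format shown above can be parsed with a small sketch like the following. The regular expression mirrors the sample trace in this document; it is not an official SDK utility:

```python
import re

# Hypothetical parser for the [GC] memory traces shown above; the regex
# mirrors the log format in this document, not an official API.
GC_PATTERN = re.compile(
    r"\[GC\] global: (?P<g_before>[\d.]+)% -> (?P<g_after>[\d.]+)%, "
    r"process: RSS\((?P<rss_before>[\d.]+)MiB -> (?P<rss_after>[\d.]+)MiB\), "
    r"VMS\((?P<vms_before>[\d.]+)MiB -> (?P<vms_after>[\d.]+)MiB\)"
)

def parse_gc_trace(line: str) -> dict:
    """Return before/after memory figures (MiB and %) from a [GC] record."""
    match = GC_PATTERN.search(line)
    if match is None:
        raise ValueError("not a [GC] memory trace")
    return {key: float(value) for key, value in match.groupdict().items()}

sample = ("INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, "
          "process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)")
print(parse_gc_trace(sample)["rss_after"])  # 34.08
```

Summing the RSS figures of both processes (input and output) gives the total physical memory used by the collector at that moment.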

Differences between RSS and VMS memory usage:

  • RSS is the Resident Set Size: the actual physical memory the process is currently using.

  • VMS is the Virtual Memory Size: the total virtual memory the process has allocated, which can exceed RSS.
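To see both figures for a live process, a minimal Linux-only sketch can read them from the kernel's /proc/self/status (this is generic OS introspection, not the collector's own instrumentation):

```python
# Illustrative, Linux-only sketch: read the kernel's view of this
# process's RSS (VmRSS) and VMS (VmSize) from /proc/self/status.
def memory_snapshot() -> dict:
    fields = {}
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(("VmRSS", "VmSize")):
                key, value = line.split(":")
                fields[key] = int(value.split()[0])  # value in KiB
    return fields

snapshot = memory_snapshot()
print(f"RSS: {snapshot['VmRSS']} KiB, VMS: {snapshot['VmSize']} KiB")
```

Since virtual memory includes everything mapped but not necessarily resident, VmSize is always at least as large as VmRSS, matching the relation between the VMS and RSS columns in the [GC] traces.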

Sometimes it is necessary to activate the debug mode of the collector's logging. This debug mode increases the verbosity of the log and allows you to print execution traces that are very helpful in resolving incidents or detecting bottlenecks in heavy download processes.

  • To enable this option, edit the configuration file, change the debug_status parameter from false to true, and restart the collector.

  • To disable this option, change the debug_status parameter back from true to false and restart the collector.
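As a hedged sketch of the change described above: only the debug_status key and its false/true values come from this document; its placement under a globals section is an assumption, and the exact file structure depends on the chosen deployment mode.

```yaml
# Hypothetical fragment; only debug_status is taken from this document.
globals:
  debug_status: true   # set back to false to disable verbose logging
```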

For more information, visit the configuration and parameterization section corresponding to the chosen deployment mode.

Change log for v1.x.x

Release

Released on

Release type

Details

Recommendations

v1.0.0

Feb 26, 2025

NEW FEATURE

New features:

  • Initial release of the Spidersilk collector.

  • Introduces a snapshot-based data retrieval approach.

  • Supports Threats, Assets, and Darkweb data collection.

Recommended version

Related content