Overview

GitHub is a version control platform that lets you track changes to your codebase, flag bugs and issues for follow-up, and manage your product's build process. It simplifies collaboration: team members can work on files and merge their changes into the project's master branch. The GitHub API provides data about a hosted code repository, ranging from commits to pull requests and comments.

The Devo GitHub collector enables customers to retrieve data from the GitHub API into Devo to query, correlate, analyze, and visualize it, enabling Enterprise IT and Cybersecurity teams to make the most impactful decisions at petabyte scale.

Configuration requirements

To run this collector, you need to take into account the configuration requirements detailed below.

Configuration

Details

Token

You’ll need to create an access token to authenticate the collector on the GitHub server.

Configuration

Refer to the Vendor setup section to know more about these configurations.

Devo collector features

Feature

Details

Allow parallel downloading (multipod)

not allowed

Running environments

  • collector server

  • on-premise

Populated Devo events

standard

Data sources

Data Source

Description

GitHub API endpoint

Collector service name

Type

Devo table

Available from release

Collaborators

Information about collaborators.

/repos/{owner}/{repo}/collaborators

Repositories - GitHub Docs

  • metadata:read

collaborators

repository

vcs.github.repository.collaborators

v1.0.0

Commits

Commits made in the repository

/repos/{owner}/{repo}/commits

Repositories - GitHub Docs

  • contents:read

commits

repository

vcs.github.repository.commits

v1.0.0

Forks

Forks created in the repository

/repos/{owner}/{repo}/forks

Repositories - GitHub Docs

  • metadata:read

forks

repository

vcs.github.repository.forks

v1.0.0

Events

Information about the different events such as resource creations or deletions.

/repos/{owner}/{repo}/events

Events - GitHub Docs

  • metadata:read

events

repository

vcs.github.repository.events

v1.0.0

Issue comments

Comments made in every issue.

/repos/{owner}/{repo}/comments

Issue comments - GitHub Docs

  • issues:read or

  • pull_requests:read

issue_comments

repository

vcs.github.repository.issue_comments

v1.0.0

Subscribers

Information about the different users subscribed to one repository.

/repos/{owner}/{repo}/subscribers

Watching - GitHub Docs

  • metadata:read

subscribers

repository

vcs.github.repository.subscribers

v1.0.0

Pull requests

Pull requests made in the repository.

/repos/{owner}/{repo}/pulls

/repos/{owner}/{repo}/pulls/{pull_number}/commits

Pulls - GitHub Docs

  • pull_requests:read

pull_requests

repository

vcs.github.repository.pull_requests

vcs.github.repository.pull_request_commits

v1.0.0

Subscriptions

Repositories you are subscribed to.

/repos/{owner}/{repo}/subscription

Activity - GitHub Docs

  • metadata:read

subscriptions

repository

vcs.github.repository.subscriptions

v1.0.0

Releases

Information about releases made in the repository.

/repos/{owner}/{repo}/releases

Repositories - GitHub Docs

  • contents:read

releases

repository

vcs.github.repository.releases

v1.0.0

Stargazers

Information about users who star repositories, marking them as favorites.

/repos/{owner}/{repo}/stargazers

Starring - GitHub Docs

  • metadata:read

stargazers

repository

vcs.github.repository.stargazers

v1.0.0

Audit

Organization audit events.

/orgs/{org}/audit-log

Organizations - GitHub Docs

audit

organization

vcs.github.organization.audit

v1.0.0

SSO Authorizations

Single sign-on authorization.

/orgs/{org}/credential-authorizations

Organizations - GitHub Docs

  • organization_administration:read

sso_authorizations

organization

vcs.github.organization.sso_authorizations

v1.0.0

Webhooks

Webhooks created in the organization.

/orgs/{org}/hooks

Organizations - GitHub Docs

admin:org_hook

webhooks

organization

vcs.github.organization.webhooks

v1.0.0

Dependabot Alerts

GitHub sends Dependabot alerts when it detects that your repository uses a vulnerable dependency or malware.

/repos/{owner}/{repo}/dependabot/alerts

Dependabot alerts - GitHub Docs

  • vulnerability_alerts:read

dependabot_alerts

repository

vcs.github.repository.dependabot_alerts

v2.0.0

Dependabot Secrets

Lists all secrets available in an organization without revealing their encrypted values.

/orgs/{org}/dependabot/secrets

Dependabot secrets - GitHub Docs

  • admin:org

dependabot

organization

vcs.github.organization.dependabot

v2.0.0

Actions

GitHub Actions for a repository.

/repos/{owner}/{repo}/actions/runs

Workflow runs - GitHub Docs

  • actions:read

actions

repository

vcs.github.repository.actions

v2.0.0

CodeScan

Code scanning is a feature that you use to analyze the code in a GitHub repository to find security vulnerabilities and coding errors.

/repos/{owner}/{repo}/code-scanning/alerts

Code Scanning - GitHub Docs

  • security_events:read

codescan

repository

vcs.github.repository.codescan

v2.0.0

Enterprise Audit

 

Enterprise audit events.

/enterprises/{enterprise}/audit-log

REST API endpoints for organizations - GitHub Docs

  • admin:enterprise

  • read:audit_log

  • read:enterprise

enterprise_audit

enterprise

vcs.github.enterprise.audit

v2.0.0
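To illustrate how the endpoint templates in the table above resolve to concrete request URLs, here is a minimal Python sketch. The `ENDPOINTS` mapping copies a few rows from the table. The `build_url` helper and the example owner, repository, and organization names are hypothetical and are not part of the collector.

```python
# Endpoint templates copied from the data sources table (subset only).
ENDPOINTS = {
    "commits": "/repos/{owner}/{repo}/commits",
    "actions": "/repos/{owner}/{repo}/actions/runs",
    "audit": "/orgs/{org}/audit-log",
    "enterprise_audit": "/enterprises/{enterprise}/audit-log",
}


def build_url(service, base_url="https://api.github.com/", **params):
    """Fill in the path placeholders for a service and prepend the API base URL.

    `build_url` is a hypothetical helper used for illustration only.
    """
    template = ENDPOINTS[service]
    return base_url.rstrip("/") + template.format(**params)


# Example with made-up names:
print(build_url("commits", owner="acme", repo="web"))
print(build_url("audit", org="acme"))
```

Repository services need both an owner and a repository name, while organization and enterprise services need only the organization or enterprise name.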

GitHub Documentation

Refer to the GitHub documentation to know more about its repositories.

For more information on how the events are parsed, visit our page.

Vendor setup

Personal access token authentication

To retrieve the data, we need to create an access token to authenticate the collector on the GitHub server.

If you want to use the Enterprise Audit service, make sure that the read:audit_log and read:enterprise scopes are present.
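Before configuring the collector, it can help to check that a token is accepted by the API. The following is a minimal sketch, not the collector's own code: it builds an authenticated request with Python's standard library. The `build_request` helper and the token value are hypothetical placeholders.

```python
import urllib.request


def build_request(token, path, base_url="https://api.github.com/"):
    """Build (but do not send) an authenticated GitHub REST API request.

    `build_request` is a hypothetical helper used for illustration only.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder token; replace it with your own personal access token.
req = build_request("ghp_example_token", "/user")

# Sending the request requires network access and a valid token:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

A 200 response from `/user` indicates that the token is valid. It does not confirm that the token carries every scope listed in the data sources table.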

GitHub App Installation authentication

If you are using SAML authentication in your account, you’ll need to authorize your token after generating it. Check how to do it in this article.

Authorization with SAML

What is SAML authorization?

SAML (Security Assertion Markup Language) is a markup language for security assertions that provides a standardized way to tell external applications and services that a user is who they claim to be. SAML supports single sign-on (SSO), which allows you to authenticate a user once and then communicate that authentication to multiple applications.

Authorizing a personal access token

To use SAML, you need to authorize the token for personal use. There are two ways:

  • Authorize existing token

  • Create a new token and authorize it.

GitHub documentation

Refer to the GitHub documentation to know how to do it.

Minimum configuration required for basic pulling

Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.

This minimum configuration refers exclusively to the parameters specific to this integration. There are more required parameters related to the generic behavior of the collector. Check the settings sections for details.

Setting

Details

token

Set here the access token you created in the GitHub console.

username

Set here your GitHub username.

private_key_path

Set here the path to the .pem file that stores your private key

private_key_base64

Set here the private key file encoded in base64

app_id

Set here the id of your installed app

organization

Use this parameter to define the name of the organization that owns the repository

See the Accepted authentication methods section to verify what settings are required based on the desired authentication method.

Accepted authentication methods

Authentication method

URL

Token

Username

Private key

App ID

Organization

Personal Access Token

OPTIONAL
(default is https://api.github.com/)

REQUIRED

REQUIRED

NOT REQUIRED

NOT REQUIRED

REQUIRED

GitHub App installation

OPTIONAL
(default is https://api.github.com/)

NOT REQUIRED

NOT REQUIRED

REQUIRED

REQUIRED

REQUIRED
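The table above can be read as a checklist of required settings per authentication method. The sketch below encodes it in Python under two assumptions that the table does not state: the method identifiers are made up for illustration, and either `private_key_path` or `private_key_base64` is taken to satisfy the "Private key" column.

```python
# Required settings per authentication method, transcribed from the table above.
# Each tuple lists alternatives; at least one of them must be set.
# The method identifiers are hypothetical labels, not collector settings.
REQUIRED = {
    "personal_access_token": [("token",), ("username",), ("organization",)],
    "github_app_installation": [
        ("private_key_path", "private_key_base64"),  # assumption: either form works
        ("app_id",),
        ("organization",),
    ],
}


def missing_settings(method, settings):
    """Return the required settings that are absent or empty for a method."""
    missing = []
    for alternatives in REQUIRED[method]:
        if not any(settings.get(name) for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing


print(missing_settings("personal_access_token", {"token": "t", "username": "u"}))
```

The URL setting is omitted from the check because it is optional for both methods.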

Run the collector

Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector on your own machine using a Docker image (On-premise collector).

API limitations

By default, the rate limit defined by GitHub is 5,000 requests per hour per authenticated user. This limit can change depending on the type of account: GitHub Enterprise Cloud accounts may have higher limits, up to 15,000 requests per hour.
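GitHub reports the remaining request budget in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers. As a rough sketch of how a client could respect the limit, the hypothetical helper below computes how long to pause before the next request. It assumes the headers arrive as a dictionary with lower-case keys, and it does not describe how the collector itself handles rate limiting.

```python
def seconds_to_wait(headers, now):
    """Return how many seconds to pause before the next API request.

    `headers` is assumed to be a dict of response headers with lower-case
    keys; `now` is the current time in epoch seconds.
    """
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0  # budget left, no need to wait
    # x-ratelimit-reset is the epoch second at which the budget is restored.
    reset_at = int(headers.get("x-ratelimit-reset", str(int(now))))
    return max(0, reset_at - int(now))


# Example with made-up header values:
print(seconds_to_wait({"x-ratelimit-remaining": "0", "x-ratelimit-reset": "1700000060"}, now=1700000000))
```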

Create multiple accounts

Select carefully which services need to be monitored, because all requests made with the same account count against the same rate limit. GitHub allows the creation of one Personal Access Token per account, so to spread the load you can create multiple accounts and use one account per monitored service. Those accounts must belong to the organization from which the data will be pulled. GitHub allows only one free account, so the additional accounts must be paid accounts.

Collector services detail

This section is intended to explain how to proceed with specific actions for services.

 Common sections for Services

Verify data collections

Once the collector has been launched, it is important to check that ingestion is working properly. To do so, go to the collector’s logs console.

This service has the following components:

Component

Description

Setup

The setup module is in charge of authenticating the service and managing the token expiration when needed.

Puller

The puller module is in charge of pulling the data in an organized way and delivering the events via SDK.

Setup output

A successful run has the following output messages for the setup module:

INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> The token/header/authentication is defined
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> The token/header/authentication is valid
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> The user whatever-user belongs to whatever-company
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> Finalizing the execution of setup()
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> Setup for module <GithubDataPullerActions> has been successfully executed

 Action Service

This service lists all workflow runs for a repository in GitHub. All events of this service are ingested into the table vcs.github.repository.actions.

Verify data collections

Puller Output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Reading persisted data
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> No changes have been made in saved state. Returning saved state: {'pulling_date_from_config': '1640995200.0', 'last_pulled_date': '1641104263.0', 'ids': [1645452345]}
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> GithubDataPullerActions(github,444,actions,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> The collector will start pulling data since 2022-01-02T06:17:43Z
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Total number of repositories: 2
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Tag: vcs.github.api.repository.actions
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 3; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.
...
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 3; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.

After a successful collector execution (that is, no error logs were found), you should be able to see the following log message:

INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 317; Number of duplicated events filtered out: 11; Number of events generated and sent: 306; Average of events per second: 23.234.

The @devo_pulling_id value is injected into each event to allow grouping all events ingested by the same pull action. You can use it to get the exact events downloaded on that Pull action in Loxcope.

Note that a Partial Statistics Report will be displayed after downloading a page when pagination is required to pull all available events. Look for the report without the Partial reference.

(Partial) Statistics for this pull cycle Number of requests made: 2; Number of events received: 45; Number of duplicated events filtered out: 0; Number of events generated and sent: 40; Average of events per second: 23.234.
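When reviewing many log lines, it can be convenient to extract the statistics programmatically. The following is a minimal sketch that parses the statistics message format shown in the samples above. The `parse_stats` helper is hypothetical, and the regular expression assumes the message layout stays exactly as shown here.

```python
import re

# Pattern for the "Statistics for this pull cycle" message shown above.
STATS_RE = re.compile(
    r"Number of requests made: (?P<requests>\d+); "
    r"Number of events received: (?P<received>\d+); "
    r"Number of duplicated events filtered out: (?P<duplicates>\d+); "
    r"Number of events generated and sent: (?P<sent>\d+); "
    r"Average of events per second: (?P<rate>\d+(?:\.\d+)?)"
)


def parse_stats(line):
    """Return the counters of a statistics log line, or None if it is not one."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    stats = {k: int(v) for k, v in match.groupdict().items() if k != "rate"}
    stats["rate"] = float(match.group("rate"))
    stats["partial"] = "(Partial)" in line  # partial reports are per page
    return stats


line = (
    "INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> "
    "Statistics for this pull cycle Number of requests made: 30; "
    "Number of events received: 317; Number of duplicated events filtered out: 11; "
    "Number of events generated and sent: 306; Average of events per second: 23.234."
)
stats = parse_stats(line)
print(stats)
```

In the final report above, the events sent equal the events received minus the duplicates filtered out (317 - 11 = 306), which is a quick sanity check for a pull cycle.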

Restart the persistence

This service makes use of persistence. To restart the persistence, the since parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
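The persisted state in the puller output above stores its dates as epoch seconds, while the "start pulling data since" message prints a UTC timestamp. As a sketch for reading the saved state when deciding on a new start date, the hypothetical helper below converts one form into the other. The format expected by the since parameter itself is not shown here, so check the settings section before reusing this output.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_text):
    """Convert an epoch-seconds string from the saved state to a UTC timestamp."""
    moment = datetime.fromtimestamp(float(epoch_text), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# Values copied from the saved state in the sample puller output.
state = {"pulling_date_from_config": "1640995200.0", "last_pulled_date": "1641104263.0"}
for name, value in state.items():
    print(name, epoch_to_utc(value))
```

The converted `last_pulled_date` matches the timestamp in the sample "The collector will start pulling data since" log line.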

 Audit Service

This service gets the audit log (a sequence of activities) for an organization in GitHub. This service starts collecting 90 days back from the moment the persistence is reset. All events of this service are ingested into table vcs.github.organization.audit.

This service generates a large number of events, so the collector needs considerable time and many API requests to catch up with it.
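To get a feel for how long catching up can take, the sketch below estimates the number of requests and a lower bound on the elapsed time. The defaults are assumptions taken from this page: 30 events per request, as in the sample output of this section, and the default rate limit of 5,000 requests per hour. The `estimate_catch_up` helper is hypothetical.

```python
import math


def estimate_catch_up(total_events, events_per_request=30, requests_per_hour=5000):
    """Estimate the requests needed and the minimum hours to pull a backlog.

    The result is a lower bound: it assumes the whole rate limit budget is
    available to this single service.
    """
    requests = math.ceil(total_events / events_per_request)
    hours = requests / requests_per_hour
    return requests, hours


# 300,000 events, the figure used in the sample final statistics of this section.
print(estimate_catch_up(300000))
```

Other services running under the same account share the same budget, so the real duration is normally longer.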

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Reading persisted data
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> GithubDataPullerAudit(github,444,audit,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Tag: vcs.github.api.organization.audit
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 1; Number of events received: 30; Number of duplicated events filtered out: 0; Number of events generated and sent: 30; Average of events per second: 14.773.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 2; Number of events received: 60; Number of duplicated events filtered out: 0; Number of events generated and sent: 60; Average of events per second: 14.709.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 3; Number of events received: 90; Number of duplicated events filtered out: 0; Number of events generated and sent: 90; Average of events per second: 14.685.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 4; Number of events received: 120; Number of duplicated events filtered out: 0; Number of events generated and sent: 120; Average of events per second: 14.865.
...

After a successful collector execution (that is, no error logs were found), you should be able to see the following log message. However, it takes a long time to reach the end of this service, as it generates a large number of events and starts pulling 90 days back:

INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Statistics for this pull cycle Number of requests made: 10000; Number of events received: 300000; Number of duplicated events filtered out: 0; Number of events generated and sent: 300000; Average of events per second: 14.865.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.
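If you export events from Devo for offline analysis, the same grouping can be reproduced in a few lines of Python. This is only a sketch: it assumes each exported event is a dictionary that carries the `@devo_pulling_id` field, and both the `group_by_pull` helper and the sample values are hypothetical.

```python
from collections import defaultdict


def group_by_pull(events, key="@devo_pulling_id"):
    """Group event dictionaries by the pull action that ingested them."""
    groups = defaultdict(list)
    for event in events:
        groups[event.get(key)].append(event)
    return dict(groups)


# Made-up events for illustration.
events = [
    {"@devo_pulling_id": "1641104263", "action": "repo.create"},
    {"@devo_pulling_id": "1641104263", "action": "repo.destroy"},
    {"@devo_pulling_id": "1641104323", "action": "team.add_member"},
]
groups = group_by_pull(events)
print({pull_id: len(items) for pull_id, items in groups.items()})
```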

Note that a Partial Statistics Report will be displayed after downloading a page when the pagination is required to pull all available events. Look for the report without the Partial reference.

(Partial) Statistics for this pull cycle Number of requests made: 4; Number of events received: 120; Number of duplicated events filtered out: 0; Number of events generated and sent: 120; Average of events per second: 14.865.

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed in the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

 Codescan Service

Code scanning is a feature that you can use to analyze the code in a GitHub repository to find security vulnerabilities and coding errors. This service returns the codescan results for each repository in case it is enabled.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> State saved: {'old_persistence_reset_date': '26-Oct', 'codescan': {}}
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> GithubDataPullerCodescan(github,444,codescan,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Get Codescan function called
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-1 -> 43
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-1 -> 9443322
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-2 -> 367
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-2 -> 4567887
....

After successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Statistics for this pull cycle Number of requests made: 120; Number of events received: 932; Number of duplicated events filtered out: 0; Number of events generated and sent: 932; Average of events per second: 23.593.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after downloading a set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300
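The GitHub REST API signals further pages through the `Link` response header, and the absence of a `rel="next"` entry means the last page has been reached. The sketch below shows one way to detect that condition. It is an illustration only, since this page does not describe how the collector implements pagination, and the `next_page_url` helper is hypothetical.

```python
import re


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None on the last page.

    Splitting on commas is a simplification that assumes the URLs themselves
    contain no commas.
    """
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "next":
            return match.group(1)
    return None


# Made-up header value in the format GitHub uses.
header = (
    '<https://api.github.com/repositories/1/code-scanning/alerts?page=2>; rel="next", '
    '<https://api.github.com/repositories/1/code-scanning/alerts?page=5>; rel="last"'
)
print(next_page_url(header))
```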

Restart the persistence

This service makes use of persistence. To restart the persistence, the since parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

 Collaborators Service

This service gets a list of collaborators for each repository in GitHub.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> State saved: {'old_persistence_reset_date': '26-Oct', 'codescan': {}}
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> GithubDataPullerCodescan(github,444,codescan,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Get Codescan function called
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-1 -> 43
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-1 -> 9443322
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-2 -> 367
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-2 -> 4567887
....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Statistics for this pull cycle Number of requests made: 120; Number of events received: 932; Number of duplicated events filtered out: 0; Number of events generated and sent: 932; Average of events per second: 23.593.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after downloading a set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the since parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

 Commits Service

This service gets a list of commits for each repository in GitHub.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> State saved: {'old_persistence_reset_date': '26-Oct', 'codescan': {}}
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> GithubDataPullerCodescan(github,444,codescan,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Get Codescan function called
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-1 -> 43
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-1 -> 9443322
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-2 -> 367
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-2 -> 4567887
....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Statistics for this pull cycle Number of requests made: 120; Number of events received: 932; Number of duplicated events filtered out: 0; Number of events generated and sent: 932; Average of events per second: 23.593.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after downloading a set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the since parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

 Dependabot

This service lists all secrets available in an organization without revealing their encrypted values. All events of this service are ingested into table vcs.github.organization.dependabot.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> State saved: {'old_persistence_reset_date': 'test1', 'dependabot': {}}
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> GithubDataPullerDependabot(github,444,dependabot,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> No more pages have been detected ahead for org my-organization
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> New items found for org my-organization -> 4
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Persistence saved for org my-organization -> 46463737382
....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Statistics for this pull cycle Number of requests made: 1; Number of events received: 4; Number of duplicated events filtered out: 0; Number of events generated and sent: 4; Average of events per second: 5.743.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after downloading a set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

 Dependabot Alerts

This service returns the Dependabot Alerts for each repository. GitHub generates an alert when a repository uses a vulnerable dependency or malware. All events of this service are ingested into table vcs.github.repository.dependabot_alerts.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> State saved: {'old_persistence_reset_date': '26-Oct', 'dependabot_alerts': {}}
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Get Dependabot Alerts function called
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> New items found for repo repo-1 -> 12
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Persistence saved for repo-1 -> 94445
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> New items found for repo repo-2 -> 3
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Persistence saved for repo-2 -> 45556
....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 122; Number of duplicated events filtered out: 0; Number of events generated and sent: 122; Average of events per second: 13.63.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded, when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300
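The per-repository lines in the puller output ("New items found for repo ... -> N") can be tallied to see which repositories contribute events. This is a sketch that assumes the wording of the sample lines above; items_per_repo is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumes the "New items found for repo <name> -> <count>" wording
# shown in the sample puller output.
NEW_ITEMS = re.compile(r"New items found for repo (\S+) -> (\d+)")


def items_per_repo(log_lines):
    """Sum the reported new items for each repository name."""
    totals = Counter()
    for line in log_lines:
        match = NEW_ITEMS.search(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals


lines = [
    "INFO ... -> No more pages have been detected ahead for repo repo-1",
    "INFO ... -> New items found for repo repo-1 -> 12",
    "INFO ... -> Persistence saved for repo-1 -> 94445",
    "INFO ... -> New items found for repo repo-2 -> 3",
]
for repo, count in items_per_repo(lines).items():
    print(repo, count)
```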

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

 Events

This service returns the events for each repository. All events of this service are ingested into table vcs.github.repository.events.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> State saved: {'old_persistence_reset_date': 'prueba-2', 'events': {}}
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> GithubDataPullerEvents(github,444,events,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Get Events function called
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Partial statistics: Pages retrieved 20, items buffered 600
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> New items found for repo repo-1 -> 740
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Persistence saved for repo-1 -> 456789
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> New items found for repo repo-2 -> 356
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Persistence saved for repo-2 -> 5678567
....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Statistics for this pull cycle Number of requests made: 89; Number of events received: 1562; Number of duplicated events filtered out: 0; Number of events generated and sent: 1562; Average of events per second: 79.13.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded, when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

 Forks

This service returns the forks for each repository. All events of this service are ingested into table vcs.github.repository.forks.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> State saved: {'old_persistence_reset_date': 'prueba-3', 'forks': {}}
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> GithubDataPullerForks(github,444,forks,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Get Forks function called
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> New items found for repo repo-1 -> 3
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Persistence saved for repo-1 -> 9623469344
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> New items found for repo repo-2 -> 2
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Persistence saved for repo-2 -> 5678234564
....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 56; Number of duplicated events filtered out: 0; Number of events generated and sent: 56; Average of events per second: 12.128.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded, when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

 Issue Comments Service

This service returns the issue comments made in each GitHub repository. All events of this service are ingested into table vcs.github.repository.issue_comments.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Reading persisted data
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> No changes have been made in saved state. Returning saved state: {'pulling_date_from_config': '1640995200.0', 'last_pulled_date': '1641852073.0', 'ids': [1009384492]}
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> The collector will start pulling data since 2022-01-10T22:01:13Z
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Total number of repositories: 2
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Tag: vcs.github.api.repository.issue_comments
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 2; Number of events received: 60; Number of duplicated events filtered out: 60; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 4; Number of events received: 120; Number of duplicated events filtered out: 105; Number of events generated and sent: 15; Average of events per second: 1.307.
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 6; Number of events received: 180; Number of duplicated events filtered out: 160; Number of events generated and sent: 20; Average of events per second: 1.294.
...

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Statistics for this pull cycle Number of requests made: 6; Number of events received: 180; Number of duplicated events filtered out: 160; Number of events generated and sent: 20; Average of events per second: 1.294.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report is displayed after each page is downloaded, when pagination is required to pull all available events. Look for the report without the Partial reference.

(Partial) Statistics for this pull cycle Number of requests made: 6; Number of events received: 180; Number of duplicated events filtered out: 160; Number of events generated and sent: 20; Average of events per second: 1.294.
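In the sample reports for this service, the number of events sent equals the number received minus the duplicates filtered out (for example, 180 - 160 = 20). Assuming that relationship holds for every report, a log check could be sketched as follows; check_report is a hypothetical name.

```python
import re

# Assumes the counter wording shown in the sample statistics messages.
COUNTERS = re.compile(
    r"Number of events received: (\d+); "
    r"Number of duplicated events filtered out: (\d+); "
    r"Number of events generated and sent: (\d+)"
)


def check_report(line):
    """Return (received, duplicated, sent, consistent), or None if no counters."""
    match = COUNTERS.search(line)
    if match is None:
        return None
    received, duplicated, sent = map(int, match.groups())
    # Consistent when every received event was either filtered or sent.
    return received, duplicated, sent, received - duplicated == sent


report = (
    "(Partial) Statistics for this pull cycle Number of requests made: 4; "
    "Number of events received: 120; Number of duplicated events filtered "
    "out: 105; Number of events generated and sent: 15; Average of events "
    "per second: 1.307."
)
print(check_report(report))
```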

Restart the persistence

This service makes use of persistence. To restart the persistence, the since parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
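The saved state in the puller output stores pulling_date_from_config and last_pulled_date as epoch seconds, while the log reports the start date as a UTC timestamp. A small conversion sketch makes it easier to compare the two when verifying a reset; the function name epoch_to_iso is an assumption for illustration only.

```python
from datetime import datetime, timezone


def epoch_to_iso(epoch_text):
    """Convert an epoch-seconds string such as '1641852073.0' to UTC ISO 8601."""
    moment = datetime.fromtimestamp(float(epoch_text), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# Values taken from the saved state in the sample output.
print(epoch_to_iso("1640995200.0"))
print(epoch_to_iso("1641852073.0"))
```

The second value corresponds to the date reported in the sample message "The collector will start pulling data since 2022-01-10T22:01:13Z".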

 Pull Request Service

This service returns the pull requests for each repository, and the associated commits for each pull request. All events of this service are ingested into table vcs.github.repository.pull_requests and vcs.github.repository.pull_requests.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> State saved: {'old_persistence_reset_date': '27-Oct-2022', 'pull_requests': {}, 'pull_request_commits': {}}
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 1
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> bdb806bd6218552c7b3b6507803e48694b5591b7
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 1
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> ccc00f42d56d59bcb375a327f163bb8b737f376d
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 17
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> 8f1af254f6fffad8718c7c68a7f67778bc6c6b3f
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 1
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> a0d5d609978be33b7b7c37b2ea2a400d6102ccda
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 8
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> e99dfc7b7075250ca685106bab18f31f7e62f3c5
...

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Statistics for this pull cycle Number of requests made: 172; Number of events received: 1333; Number of duplicated events filtered out: 0; Number of events generated and sent: 1333; Average of events per second: 15.183.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded, when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This is a free text field. It is recommended to use a reference to the day the persistence is being reset. For further details, go to the settings section.

 Releases Service

This service returns the releases for each repository. All events of this service are ingested into table vcs.github.repository.releases.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> State saved: {'old_persistence_reset_date': '27-Oct', 'releases': {}}
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> GithubDataPullerReleases(github,444,releases,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Get Releases function called
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> New items found for repo repo-1 -> 0
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> New items found for repo repo-2 -> 0
....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 0; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.0.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded, when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This is a free text field. It is recommended to use a reference to the day the persistence is being reset. For further details, go to the settings section.

 SSO Authorization Service

This service returns the Single Sign-On (SSO) authorizations for all organizations. All events of this service are ingested into table vcs.github.organizations.sso_authorizations.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> State saved: {'old_persistence_reset_date': '2022-10-16T12:00:00Z', 'sso_authorizations': {}}
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> No more pages have been detected ahead for org my-organization
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> New items found for org my-organization -> 58
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Persistence saved for org my-organization -> 40487930
.....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Statistics for this pull cycle Number of requests made: 2; Number of events received: 58; Number of duplicated events filtered out: 0; Number of events generated and sent: 58; Average of events per second: 80.759.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded, when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This is a free text field. It is recommended to use a reference to the day the persistence is being reset. For further details, go to the settings section.

 Stargazers

This service returns information about the users who star each repository, marking it as a favorite. All events of this service are ingested into table vcs.github.repository.stargazers.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> State saved: {'old_persistence_reset_date': '2022-11-01T12:34:21Z', 'stargazers': {}}
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> GithubDataPullerStargazers(github,444,stargazers,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> New items found for repo repo-1 -> 9
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> Persistence saved for repo-1 -> 1966093
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> New items found for repo repo-2 -> 0
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> No more pages have been detected ahead for repo repo-3
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> New items found for repo repo-3 -> 0
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> No more pages have been detected ahead for repo repo-4
INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> New items found for repo repo-4 -> 0
...

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerStargazers(github,444,stargazers,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 33; Number of duplicated events filtered out: 0; Number of events generated and sent: 33; Average of events per second: 3.049.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded, when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This is a free text field. It is recommended to use a reference to the day the persistence is being reset. For further details, go to the settings section.

 Subscribers

This service returns information about the users subscribed to each repository. All events of this service are ingested into table vcs.github.repository.subscribers.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> State saved: {'old_persistence_reset_date': '2022-10-27', 'subscribers': {}}
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> GithubDataPullerSubscribers(github,444,subscribers,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> New items found for repo repo-1 -> 60
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Persistence saved for repo-1 -> 1234236
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> New items found for repo repo-2 -> 54
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Persistence saved for repo-2 -> 2342343
...
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> No more pages have been detected ahead for repo repo-N
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> New items found for repo repo-N -> 1
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Persistence saved for repo-N -> 3453457
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Statistics for this pull cycle Number of requests made: 10; Number of events received: 178; Number of duplicated events filtered out: 0; Number of events generated and sent: 178; Average of events per second: 11.194.
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> The data is up to date!
INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Data collection completed. Elapsed time: 15.904 seconds. Waiting for 44.096 second(s) until the next one

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerSubscribers(github,444,subscribers,predefined,all) -> Statistics for this pull cycle Number of requests made: 10; Number of events received: 178; Number of duplicated events filtered out: 0; Number of events generated and sent: 178; Average of events per second: 11.194.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after downloading each set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300
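To check this trace automatically, the final statistics line can be parsed from the collector log. The following is a minimal sketch, assuming the trace format matches the sample above exactly; the function name parse_pull_statistics and the field names are hypothetical and not part of the collector.

```python
import re
from typing import Optional

# Matches only the final report. Partial reports carry a "(Partial)" prefix
# between "->" and "Statistics", so they do not match this pattern.
STATS_RE = re.compile(
    r"-> Statistics for this pull cycle "
    r"Number of requests made: (?P<requests>\d+); "
    r"Number of events received: (?P<received>\d+); "
    r"Number of duplicated events filtered out: (?P<duplicates>\d+); "
    r"Number of events generated and sent: (?P<sent>\d+); "
    r"Average of events per second: (?P<rate>\d+\.\d+)"
)


def parse_pull_statistics(line: str) -> Optional[dict]:
    """Return the counters of a final statistics trace, or None if absent."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    # All counters are integers except the events-per-second average.
    stats = {
        key: int(value)
        for key, value in match.groupdict().items()
        if key != "rate"
    }
    stats["rate"] = float(match["rate"])
    return stats
```

A None result for every line of a pull cycle suggests that the cycle has not finished yet or that only partial reports have been written so far.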

Restart the persistence

This service makes use of persistence. To restart the persistence, change the persistence_reset_date parameter in the user configuration. This is a free text field. It is recommended to use a reference to the day the persistence is being reset. For further details, go to the settings section.
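The idea behind this parameter can be illustrated with a short sketch. It compares the configured value with the old_persistence_reset_date key that appears in the "State saved" trace above and starts from an empty state when they differ. The function name and the reset logic are assumptions made for illustration only; the collector's actual implementation may differ.

```python
from datetime import date


def reset_if_requested(configured_value: str, state: dict, service: str) -> dict:
    """Return a fresh state when the configured value differs from the stored one."""
    if state.get("old_persistence_reset_date") == configured_value:
        return state  # unchanged value: keep the existing checkpoints
    # Changed value: drop the per-service checkpoints and remember the new value.
    return {"old_persistence_reset_date": configured_value, service: {}}


# One way to follow the recommendation of referencing the reset day.
suggested_value = date.today().isoformat()
```

Because the field is free text, any new value that differs from the stored one would trigger the reset in this sketch; a date is simply easy to audit later.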

 Subscription

This service returns information about the repositories that the user (the one used for pulling) is subscribed to. All events of this service are ingested into table vcs.github.repository.subscriptions.

Depending on the kind of user you use for pulling data, it may not make sense to check whether that user is subscribed to the organization's repositories.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> State saved: {'old_persistence_reset_date': '2022-10-27', 'subscriptions': {}}
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> Pull Started
WARNING InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) ->  404 Did not found any watchers/subscriptions for the repository
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> New items found for repo repo-1 -> 0
WARNING InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) ->  404 Did not found any watchers/subscriptions for the repository
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> New items found for repo repo-2 -> 0
...
WARNING InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) ->  404 Did not found any watchers/subscriptions for the repository
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> No more pages have been detected ahead for repo repo-N
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> New items found for repo repo-N -> 0
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> Statistics for this pull cycle Number of requests made: 5; Number of events received: 0; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> The data is up to date!
INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> Data collection completed. Elapsed time: 8.960 seconds. Waiting for 51.040 second(s) until the next one

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerSubscriptions(github,444,subscriptions,predefined,all) -> Statistics for this pull cycle Number of requests made: 5; Number of events received: 0; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.000.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after downloading each set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, change the persistence_reset_date parameter in the user configuration. This is a free text field. It is recommended to use a reference to the day the persistence is being reset. For further details, go to the settings section.

 Webhooks

List of webhooks created by the organization. All events of this service are ingested into table vcs.github.organizations.webhooks.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> State saved: {'old_persistence_reset_date': '27-October', 'webhooks': {}}
INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> GithubDataPullerWebhooks(github,444,webhooks,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> No more pages have been detected ahead for org my-organization
INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> New items found for org my-organization -> 1
INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> Persistence saved for org my-organization -> 324567455
.....

After the successful execution of the collector, you should be able to see the following log message:

INFO InputProcess::GithubDataPullerWebhooks(github,444,webhooks,predefined,all) -> Statistics for this pull cycle Number of requests made: 1; Number of events received: 1; Number of duplicated events filtered out: 0; Number of events generated and sent: 1; Average of events per second: 5.679.

The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.

Note that a Partial Statistics Report will be displayed after downloading each set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.

Partial statistics: Pages retrieved 10, items buffered 300

Restart the persistence

This service makes use of persistence. To restart the persistence, change the persistence_reset_date parameter in the user configuration. This is a free text field. It is recommended to use a reference to the day the persistence is being reset. For further details, go to the settings section.

Enterprise audit service

Description

This service gets the audit log (a sequence of activities) for an enterprise in GitHub.

This service generates a huge number of events, so getting up to date takes a long time and many API requests. Use the since parameter to set a recent start date.

Devo categorization and destination

All events of this service are ingested into the table vcs.github.enterprise.audit.

Verify data collection

Puller output

A successful initial run has the following output messages for the puller module:

Note that the PrePull action is executed only one time before the first run of the Pull action.

INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Reading persisted data
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> GithubDataPullerAudit(github,444,audit,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Tag: vcs.github.api.organization.audit
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 1; Number of events received: 30; Number of duplicated events filtered out: 0; Number of events generated and sent: 30; Average of events per second: 14.773.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 2; Number of events received: 60; Number of duplicated events filtered out: 0; Number of events generated and sent: 60; Average of events per second: 14.709.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 3; Number of events received: 90; Number of duplicated events filtered out: 0; Number of events generated and sent: 90; Average of events per second: 14.685.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 4; Number of events received: 120; Number of duplicated events filtered out: 0; Number of events generated and sent: 120; Average of events per second: 14.865.
...

After a successful collector execution (that is, no error logs were found), you should be able to see the following log message. However, it takes a long time to reach the end of this service, as it generates a huge number of events and starts pulling from 90 days back:

INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Statistics for this pull cycle Number of requests made: 10000; Number of events received: 300000; Number of duplicated events filtered out: 0; Number of events generated and sent: 300000; Average of events per second: 14.865.

The @devo_pulling_id value is injected into each event to allow grouping all events ingested by the same pull action. You can use it to get the exact events downloaded on that Pull action in the Data Search area of Devo.

Note that a Partial Statistics Report will be displayed after downloading each page when pagination is required to pull all available events. Look for the report without the Partial reference.

(Partial) Statistics for this pull cycle Number of requests made: 4; Number of events received: 120; Number of duplicated events filtered out: 0; Number of events generated and sent: 120; Average of events per second: 14.865.
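The sample figures in this section give a rough idea of how long the initial catch-up can take. The sketch below is a back-of-the-envelope estimate, assuming the page size (30 events per request) and the throughput (about 14.865 events per second) seen in the sample traces stay constant; real values depend on your enterprise and on API rate limits. The function names are hypothetical.

```python
import math


def requests_needed(total_events: int, events_per_request: int) -> int:
    """Number of API requests required to page through total_events."""
    return math.ceil(total_events / events_per_request)


def estimate_catch_up_seconds(total_events: int, events_per_second: float) -> float:
    """Time needed to ingest total_events at a constant throughput."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return total_events / events_per_second


# Figures taken from the sample final report in this section.
requests = requests_needed(300000, 30)                      # 10000 requests
hours = estimate_catch_up_seconds(300000, 14.865) / 3600    # about 5.6 hours
```

With these sample numbers the first full cycle takes several hours, which is why a recent since value is advisable.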

Restart the persistence

This service makes use of persistence. To restart the persistence, the since parameter must be changed in the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.

Collector operations

This section is intended to explain how to proceed with specific operations of this collector.

 Verify collector operations

Initialization

The initialization module is in charge of setting up and running the input (pulling logic) and output (delivering logic) services and validating the given configuration.

A successful run has the following output messages for the initializer module:

INFO MainProcess::MainThread -> Loading configuration using the following files: {"full_config": "config.yaml", "job_config_loc": null, "collector_config_loc": null}
INFO MainProcess::MainThread -> Using the default location for "job_config_loc" file: "/etc/devo/job/job_config.json"
INFO MainProcess::MainThread -> "/etc/devo/job" does not exists
INFO MainProcess::MainThread -> Using the default location for "collector_config_loc" file: "/etc/devo/collector/collector_config.json"
INFO MainProcess::MainThread -> "/etc/devo/collector" does not exists
INFO MainProcess::MainThread -> Results of validation of config files parameters: {"config": "/path/to/config.yaml", "config_validated": True, "job_config_loc": "/etc/devo/job/job_config.json", "job_config_loc_default": True, "job_config_loc_validated": False, "collector_config_loc": "/etc/devo/collector/collector_config.json", "collector_config_loc_default": True, "collector_config_loc_validated": False}
INFO MainProcess::MainThread -> {"build_time": "UNKNOWN", "os_info": "Linux-5.14.0-1054-oem-x86_64-with-glibc2.31", "collector_name": "example_collector", "collector_version": "2.0.0", "collector_owner": "integrations_factory@devo.com", "started_at": "2022-10-26T15:24:51.619878Z"}
INFO MainProcess::MainThread -> [OUTPUT] OutputMultiprocessingController::__init__ Configuration -> {'devo_1': {'type': 'devo_platform', 'config': {'address': 'devo_address', 'port': 443, 'type': 'SSL', 'chain': 'chain_file', 'cert': 'cert_file', 'key': 'key_file', 'concurrent_connections': 1, 'period_sender_stats_in_seconds': 300, 'activate_final_queue': False, 'threshold_for_using_gzip_in_transport_layer': 1.1, 'compression_level': 6, 'compression_buffer_in_bytes': 51200, 'generate_metrics': False}}}
INFO MainProcess::MainThread -> OutputProcess - Starting thread (executing_period=60s)
INFO MainProcess::MainThread -> InputProcess - Starting thread (executing_period=60s)
INFO OutputProcess::MainThread -> Process started
INFO InputProcess::MainThread -> Process Started
INFO InputProcess::MainThread -> There is not defined any submodule, using the default one with value "none"
INFO OutputProcess::MainThread -> [INTERNAL LOGIC] DevoSender::_validate_kwargs_for_method__init__ -> The <address> does not appear to be an IP address and cannot be verified: devo_address
INFO InputProcess::MainThread -> GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) Starting the execution of init_variables()
INFO InputProcess::MainThread -> Validating settings from collector definitions
INFO InputProcess::MainThread -> Validating settings from user configuration
INFO OutputProcess::MainThread -> [INTERNAL LOGIC] DevoSender::_validate_kwargs_for_method__init__ -> The <address> does not appear to be an IP address and cannot be verified: devo_address
INFO InputProcess::MainThread -> Populating collector_variables store
INFO InputProcess::MainThread -> Created new rate_limiter 1 seconds, 1 calls
INFO InputProcess::MainThread -> Initialization of api_base_url has started.
INFO InputProcess::MainThread -> Base url is not provided in the config.yaml. Considering the base url specified in the collector definitions
INFO InputProcess::MainThread -> api_base_url has been initialized
INFO InputProcess::MainThread -> GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) Finalizing the execution of init_variables()
INFO InputProcess::MainThread -> InputThread(github,444) - Starting thread (execution_period=600s)
INFO InputProcess::MainThread -> ServiceThread(github,444,issue_comments,predefined) - Starting thread (execution_period=600s)
INFO InputProcess::MainThread -> GithubApiserverBasePullerSetup(example_collector,github#444,issue_comments#predefined,all) -> Starting thread
INFO InputProcess::MainThread -> GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) - Starting thread
WARNING InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Waiting until setup will be executed
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,issue_comments#predefined,all) -> Starting the execution of setup()
INFO OutputProcess::MainThread -> [INTERNAL LOGIC] DevoSender::_validate_kwargs_for_method__init__ -> The <address> does not appear to be an IP address and cannot be verified: devo_address
INFO OutputProcess::MainThread -> DevoSender(standard_senders,devo_sender_0) -> Starting thread
INFO OutputProcess::MainThread -> DevoSenderManagerMonitor(standard_senders,devo_1) -> Starting thread (every 300 seconds)
INFO OutputProcess::MainThread -> DevoSenderManager(standard_senders,manager,devo_1) -> Starting thread
INFO OutputProcess::MainThread -> DevoSender(lookup_senders,devo_sender_0) -> Starting thread
INFO OutputProcess::MainThread -> DevoSenderManagerMonitor(lookup_senders,devo_1) -> Starting thread (every 300 seconds)
INFO OutputProcess::MainThread -> DevoSenderManager(lookup_senders,manager,devo_1) -> Starting thread
INFO OutputProcess::MainThread -> DevoSender(internal_senders,devo_sender_0) -> Starting thread
INFO OutputProcess::MainThread -> DevoSenderManagerMonitor(internal_senders,devo_1) -> Starting thread (every 300 seconds)
INFO OutputProcess::MainThread -> DevoSenderManager(internal_senders,manager,devo_1) -> Starting thread
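When reviewing a long startup log, it can help to split each trace into its parts and count the levels. The following is a minimal sketch, assuming the "LEVEL Process::Thread -> message" layout of the samples above, optionally preceded by a timestamp. The level names DEBUG and CRITICAL are assumed from standard Python logging and do not appear in the samples; the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

LOG_RE = re.compile(
    r"^(?:(?P<timestamp>\S+)\s+)?"                      # optional timestamp
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+"
    r"(?P<process>\w+)::(?P<thread>.+?) -> "            # process and thread
    r"(?P<message>.*)$"
)


def parse_trace(line: str) -> Optional[dict]:
    """Split one collector trace into its fields, or return None."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def count_levels(lines: Iterable[str]) -> Counter:
    """Count traces per log level, ignoring lines that do not parse."""
    parsed = (parse_trace(line) for line in lines)
    return Counter(item["level"] for item in parsed if item)
```

During a healthy initialization like the one above, the count for ERROR is expected to be zero, while a few WARNING traces (for example, waiting for setup) are normal.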

Events delivery and Devo ingestion

The event delivery module is in charge of receiving the events from the internal queues where all events are injected by the pullers and delivering them using the selected compatible delivery method.

A successful run has the following output messages for the event delivery module:

INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Sender: SyslogSender(standard_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(standard_senders,sidecar_0) -> Standard - Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 44 (elapsed 0.007 seconds)
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Number of available senders: 1, sender manager internal queue size: 0
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> enqueued_elapsed_times_in_seconds_stats: {}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Sender: SyslogSender(internal_senders,syslog_sender_0), status: {"internal_queue_size": 0, "is_connection_open": True}
INFO OutputProcess::SyslogSenderManagerMonitor(internal_senders,sidecar_0) -> Internal - Total number of messages sent: 1, messages sent since "2022-06-28 10:39:22.516313+00:00": 1 (elapsed 0.019 seconds)

By default, these information traces will be displayed every 10 minutes.

Sender services

The Integrations Factory Collector SDK has three different sender services, depending on the type of event to deliver (internal, standard, and lookup). This collector uses the following sender services:

| Sender services | Description |
|---|---|
| internal_senders | In charge of delivering internal metrics to Devo such as logging traces or metrics. |
| standard_senders | In charge of delivering pulled events to Devo. |

Sender statistics

Each service displays its own performance statistics that allow checking how many events have been delivered to Devo by type:

Logging trace

Description

Number of available senders: 1

Displays the number of concurrent senders available for the given Sender Service.

sender manager internal queue size: 0

Displays the items available in the internal sender queue.

This value helps detect bottlenecks and the need to increase the performance of data delivery to Devo, which can be done by increasing the number of concurrent senders.

Total number of messages sent: 44, messages sent since "2022-06-28 10:39:22.511671+00:00": 21 (elapsed 0.007 seconds)

Displays the number of events sent since the last checkpoint. From the given example, the following conclusions can be drawn:

  • 44 events were sent to Devo since the collector started.

  • The last checkpoint timestamp was 2022-06-28 10:39:22.511671+00:00.

  • 21 events were sent to Devo between the last UTC checkpoint and now.

  • Those 21 events required 0.007 seconds to be delivered.

By default these traces will be shown every 10 minutes.
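The fields described above can also be extracted programmatically, for example to chart delivery over time. This is a minimal sketch that assumes the trace keeps the exact wording of the example; the function name parse_sender_stats and the returned keys are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

SENDER_RE = re.compile(
    r"Total number of messages sent: (?P<total>\d+), "
    r'messages sent since "(?P<since>[^"]+)": (?P<recent>\d+) '
    r"\(elapsed (?P<elapsed>\d+\.\d+) seconds\)"
)


def parse_sender_stats(line: str) -> Optional[dict]:
    """Extract totals, checkpoint and elapsed time from a sender trace."""
    match = SENDER_RE.search(line)
    if match is None:
        return None
    return {
        "total": int(match["total"]),                       # since collector start
        "since": datetime.fromisoformat(match["since"]),    # last checkpoint
        "recent": int(match["recent"]),                     # since the checkpoint
        "elapsed_seconds": float(match["elapsed"]),
    }
```

Comparing the recent counter across consecutive traces, together with the internal queue size, gives a simple view of whether delivery keeps up with pulling.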

 Check memory usage

To check the memory usage of this collector, look for the following log records in the collector which are displayed every 5 minutes by default, always after running the memory-free process.

  • The used memory is displayed per running process; the sum of both values gives the total memory used by the collector.

  • The global pressure of the available memory is displayed in the global value.

  • All metrics (Global, RSS, VMS) show the value before and after freeing memory, in the form previous -> after.

INFO InputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(34.50MiB -> 34.08MiB), VMS(410.52MiB -> 410.02MiB)
INFO OutputProcess::MainThread -> [GC] global: 20.4% -> 20.4%, process: RSS(28.41MiB -> 28.41MiB), VMS(705.28MiB -> 705.28MiB)

Differences between RSS and VMS memory usage:

  • RSS is the Resident Set Size, which is the actual physical memory the process is using.

  • VMS is the Virtual Memory Size, which is the virtual memory the process is using.
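As a worked example of summing the per-process values, the sketch below parses the two [GC] traces shown above and adds the RSS values measured after freeing memory. It assumes the trace layout of the samples; the function names are hypothetical.

```python
import re
from typing import Iterable, Optional

GC_RE = re.compile(
    r"(?P<process>\w+)::\S+ -> \[GC\] "
    r"global: (?P<global_before>[\d.]+)% -> (?P<global_after>[\d.]+)%, "
    r"process: RSS\((?P<rss_before>[\d.]+)MiB -> (?P<rss_after>[\d.]+)MiB\), "
    r"VMS\((?P<vms_before>[\d.]+)MiB -> (?P<vms_after>[\d.]+)MiB\)"
)


def parse_gc_trace(line: str) -> Optional[dict]:
    """Return the memory figures of one [GC] trace as floats (MiB or %)."""
    match = GC_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    process = fields.pop("process")
    result = {key: float(value) for key, value in fields.items()}
    result["process"] = process
    return result


def total_rss_after(lines: Iterable[str]) -> float:
    """Sum the RSS after freeing memory across all parsed processes."""
    parsed = (parse_gc_trace(line) for line in lines)
    return sum(item["rss_after"] for item in parsed if item)
```

For the two sample traces, the total is 34.08 MiB + 28.41 MiB = 62.49 MiB of physical memory for the whole collector.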

 Enable/disable the logging debug mode

Sometimes it is necessary to activate the debug mode of the collector's logging. This debug mode increases the verbosity of the log and allows you to print execution traces that are very helpful in resolving incidents or detecting bottlenecks in heavy download processes.

  • To enable this option you just need to edit the configuration file and change the debug_status parameter from false to true and restart the collector.

  • To disable this option, you just need to update the configuration file and change the debug_status parameter from true to false and restart the collector.

For more information, visit the configuration and parameterization section corresponding to the chosen deployment mode.

 Troubleshooting

Restarting the collector with an old configuration after upgrading from version 1.x to 2.x

The format of the configuration file has changed from version 1.x to version 2.x. It is not possible to directly use an old config file from 1.x for the deployment of a 2.x collector. If you try to start a new collector using an old config file, you will see an error like this in the log file:

2022-10-26T15:02:22.850 ERROR InputProcess::MainThread -> InputThread(github,123) - No service definition found for service name: repository
2022-10-26T15:02:22.850 ERROR InputProcess::MainThread -> InputThread(github,123) - No service definition found for service name: organization

You should create a new config file, using the old version 1.x file as a base to fill in the corresponding values in the new 2.x format.
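To spot this situation quickly, the log can be scanned for the error shown above. The following sketch collects the service names reported as undefined, assuming the message wording of the sample traces; the function name is hypothetical.

```python
import re
from typing import Iterable, List

MISSING_SERVICE_RE = re.compile(
    r"No service definition found for service name: (?P<name>\S+)"
)


def undefined_services(lines: Iterable[str]) -> List[str]:
    """Return the service names reported as undefined, in order, without repeats."""
    names: List[str] = []
    for line in lines:
        match = MISSING_SERVICE_RE.search(line)
        if match and match["name"] not in names:
            names.append(match["name"])
    return names
```

If names such as repository or organization show up, as in the sample traces, the collector is most likely running with a 1.x configuration file.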

Runtime errors

This collector has different security layers that detect both an invalid configuration and abnormal operation. This table will help you detect and resolve the common errors for the current services.

| Error type | Error ID | Error message | Cause | Solution |
|---|---|---|---|---|
| InitVariablesError | 0 | The internal config did not pass the format validation. Contact with Support. | The internal configuration of the collector did not match the required JSON Schema to pass validation. | This is an internal issue. Contact the Devo Support team. |
| InitVariablesError | 1 | The user config did not pass the format validation. Check error traces for details and visit our documentation. | The user configuration of the collector did not match the required JSON Schema to pass validation. | Check the error traces to see the field that does not match the format validation and make the necessary change. |
| InitVariablesError | 7 | api_base_url not of expected type: str | override_api_url is not of type string. | Change override_api_url to be of type string. The default API URL could also work; to use it, just remove override_api_url from the user configuration. |
| InitVariablesError | 8 | api_base_url must match regex: {api_url_regex} | override_api_url does not match the RegEx. | Make override_api_url match the following RegEx: https:\/\/([a-z0-9]+[.]{1})+[a-z]+[\/]{1} The default API URL could also work; to use it, just remove override_api_url from the user configuration. |
| SetupError | 100 | The remote data is not pullable with the given credentials. Check the error traces for details. | The personal access token is not set in the user configuration. | Follow the steps described in this document to get a valid Personal Access Token from GitHub and use that token in the user configuration file. |
| SetupError | 101 | The token/header/authentication was refreshed but is still expired. Check the error traces for details. | The personal access token in the user configuration is not valid. | Follow the steps described in this document to get a valid Personal Access Token from GitHub and use that token in the user configuration file. |
| SetupError | 102 | Regenerate the expired/invalid token from Developer settings in your Github account. | The personal access token in the user configuration is not valid. | Follow the steps described in this document to get a valid Personal Access Token from GitHub and use that token in the user configuration file. |
| SetupError | 201 | Error, org {org} is not valid for user {username} | The username set in the user configuration is not valid. | Follow the steps described in this document to get a valid Personal Access Token from GitHub and use that username in the user configuration file. |
| RequestError | 600 | Error in the connection, API response code {status_code}: {error_message} for {endpoint} | There was an unexpected runtime error. | Read the message to identify the error. A 5xx code means the error is on the API server side; wait for it to be resolved. Otherwise, contact the Devo Support team. |
| RequestError | 601 | Neither "x-ratelimit-reset" nor "retry-after" headers found in the response headers. This is an unexpected response. | There was an unexpected runtime error thrown by the API, related to the rate limiter. | This error is not handled and it should be. Please contact the Devo Support team. |
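The regular expression listed for error 8 can be tried locally before deploying a configuration. The sketch below is an illustration that assumes the pattern is applied from the start of the string (re.match); how the collector applies it internally is not stated here. The function name is hypothetical and the sample URLs are only examples.

```python
import re

# Pattern copied from the solution for error ID 8.
API_URL_RE = re.compile(r"https:\/\/([a-z0-9]+[.]{1})+[a-z]+[\/]{1}")


def looks_like_valid_api_url(url: object) -> bool:
    """Mimic errors 7 and 8: the value must be a string matching the pattern."""
    if not isinstance(url, str):
        return False  # would correspond to error ID 7
    return API_URL_RE.match(url) is not None  # False corresponds to error ID 8


candidates = [
    "https://api.github.com/",       # lowercase host with trailing slash
    "https://api.github.com",        # missing trailing slash
    "http://api.github.com/",        # not https
    "https://my-ghe.example.com/",   # hyphen in the host name
]
results = {url: looks_like_valid_api_url(url) for url in candidates}
```

As written, the pattern requires https, lowercase letters and digits in each host label, and a slash after the host, so only the first candidate passes in this sketch.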

Change log

Release

Released on

Release type

Details

Recommendations

v2.3.0

NEW FEATURE
IMPROVEMENT
BUG FIXING

New features

  • Added new Enterprise Audit service

Bug fix

  • Fix missing parentheses

Improvements

  • Upgraded DCSDK from 1.10.0 to 1.11.1:

    • Introduced pyproject.toml

    • Added requirements-dev.txt

    • Fixed error in pyproject.toml related to project scripts endpoint

    • Updated DevoSDK to v5.1.9

    • Fixed a bug related to development on macOS

    • Added an extra validation and fix when the DCSDK receives a wrong timestamp format

    • Added an optional config property for using the Syslog timestamp format in a strict way

    • Updated DevoSDK to v5.1.10

    • Fix for SyslogSender related to UTF-8

    • Enhanced troubleshooting: traces have been standardized and some new traces have been introduced.

    • Introduced a mechanism to detect "Out of Memory killer" situations.

    • Changed default number for connection retries (now 7)

    • Fix for Devo connection retries

    • Added extra check for not valid message timestamps

Recommended version

v2.1.0

IMPROVEMENT

  • Upgraded DCSDK from 1.4.4 to 1.9.1:

    • Store lookup instances into DevoSender to avoid creation of new instances for the same lookup

    • Ensure service_config is a dict into templates

    • Ensure special characters are properly sent to the platform

    • Changed log level to some messages from info to debug

    • Changed some wrong log messages

    • Upgraded some internal dependencies

    • Changed queue passed to setup instance constructor

    • Ability to validate collector setup and exit without pulling any data

    • Ability to store in the persistence the messages that couldn't be sent after the collector stopped

    • Ability to send messages from the persistence when the collector starts and before the puller begins working

    • Ensure special characters are properly sent to the platform

    • Added a lock to enhance sender object

    • Added new class attrs to the setstate and getstate queue methods

    • Fix sending attribute value to the setstate and getstate queue methods

    • Added log traces when queues are full and have to wait

    • Added log traces of queues time waiting every minute in debug mode

    • Added method to calculate queue size in bytes

    • Block incoming events in queues when there is no space left

    • Send telemetry events to Devo platform

    • Upgraded internal Python dependency Redis to v4.5.4

    • Upgraded internal Python dependency DevoSDK to v5.1.3

    • Fixed obfuscation not working when messages are sent from templates

    • New method to figure out if a puller thread is stopping

    • Upgraded internal Python dependency DevoSDK to v5.0.6

    • Improved logging on messages/bytes sent to Devo platform

    • Fixed wrong bytes size calculation for queues

    • New functionality to count bytes sent to Devo Platform (shown in console log)

    • Upgraded internal Python dependency DevoSDK to v5.0.4

    • Fixed bug in persistence management process, related to persistence reset

    • Aligned source code typing with Python 3.9.x

    • Inject environment property from user config

    • Obfuscation service can now be configured from user config and module definition

    • Obfuscation service can now obfuscate items inside arrays

Recommended version

v2.0.0

NEW FEATURE
IMPROVEMENT
BUG FIXING
VULNS

New features

  • Actions data source: Lists all GitHub Actions “workflow runs” for a repository.

  • CodeScan data source: List alerts after analyzing the code in a repository to find security vulnerabilities and coding errors.

  • Dependabot Alerts data source: List alerts created due to detecting vulnerable dependencies or malware.

  • Dependabot Secrets data source: List all secrets available in an organization without revealing their encrypted values.

  • A new Rate Limiter service has been added, providing finer granularity.

  • A feature has been added to initialize the persistence of each service individually through the configuration file.

Improvements

  • The pulling logic for all services has been improved, reducing the risk of duplicates.

  • Improved error management for connection issues.

  • Upgraded the underlying Devo Collector SDK from v1.1.3 to v1.4.1.

  • Upgraded the underlying DevoSDK package to v3.6.4 and its dependencies. This upgrade increases the resilience of the collector when the connection with Devo or the Syslog server is lost: the collector can reconnect in some scenarios without running the auto reboot service.

  • Support for stopping the collector when a GRACEFULL_SHUTDOWN system signal is received.

  • Re-enabled the logging to Devo.collector.out for Input threads.

  • Added detection of some system signals to trigger a controlled stop.

  • Added log traces for knowing system memory usage and execution environment status. Added more details in logs.

  • Added a new template functionality to ease collector development (not used by this collector).

  • Refactored source code structure.

  • The Docker container exits with the proper error code.

  • Reduced the likelihood of hitting a DevoSDK bug in which the "sender" is null.

  • When an exception is raised by the Collector Setup, the collector retries after 5 seconds. For consecutive exceptions, the waiting time is multiplied by 5 until it reaches 1800 seconds, which is the maximum waiting time allowed. No maximum number of retries is applied.

  • When an exception is raised by the Collector Pull method, the collector retries after 5 seconds. For consecutive exceptions, the waiting time is multiplied by 5 until it reaches 1800 seconds, which is the maximum waiting time allowed. No maximum number of retries is applied.

  • When an exception is raised by the Collector pre-pull method, the collector retries after 30 seconds. No maximum number of retries is applied.
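
The retry schedule described for the Collector Setup and Pull methods can be illustrated with a short sketch. This is not the collector's actual code: the function name `retry_waits` and the constant names are hypothetical, and the sketch assumes the wait is clamped to exactly 1800 seconds once multiplying by 5 would exceed it.

```python
from itertools import islice
from typing import Iterator

# Values taken from the release notes; the names themselves are hypothetical.
INITIAL_WAIT_SECONDS = 5
BACKOFF_FACTOR = 5
MAX_WAIT_SECONDS = 1800


def retry_waits(initial: int = INITIAL_WAIT_SECONDS,
                factor: int = BACKOFF_FACTOR,
                maximum: int = MAX_WAIT_SECONDS) -> Iterator[int]:
    """Yield the wait (in seconds) before each consecutive retry."""
    wait = initial
    while True:  # no maximum number of retries is applied
        yield min(wait, maximum)
        # Assumption: the wait is clamped to the maximum once it is exceeded.
        wait = min(wait * factor, maximum)


# First six waits after consecutive exceptions.
print(list(islice(retry_waits(), 6)))  # [5, 25, 125, 625, 1800, 1800]
```

Under these assumptions the maximum wait is reached on the fifth consecutive exception. The pre-pull method, by contrast, is described as using a fixed 30-second wait with no growth.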

Bug fixing

  • Fixed pagination and persistence bugs when pulling thousands of target repositories and events.

  • Fixed a bug in the Webhook data source that prevented complete downloading.

  • Fixed bugs related to ingestion outages.

Vulnerabilities mitigation

  • CVE-2022-1664

  • CVE-2021-33574

  • CVE-2022-23218

  • CVE-2022-23219

  • CVE-2019-8457

  • CVE-2022-1586

  • CVE-2022-1292

  • CVE-2022-2068

  • CVE-2022-1304

  • CVE-2022-1271

  • CVE-2021-3999

  • CVE-2021-33560

  • CVE-2022-29460

  • CVE-2022-29458

  • CVE-2022-0778

  • CVE-2022-2097

  • CVE-2020-16156

  • CVE-2018-2503

Recommended version
(breaking release)
