GitHub collector
Configuration requirements
To run this collector, take the configuration requirements below into account.
Configuration | Details |
---|---|
Token | You’ll need to create an access token to authenticate the collector on the GitHub server. |
Configuration
Refer to the Vendor setup section to learn more about these configurations.
Overview
GitHub is a version control platform that allows you to track changes to your codebase, flag bugs and issues for follow-up, and manage your product's build process. It simplifies the process of working with other people and makes it easy to collaborate on projects. Team members can work on files and easily merge their changes with the master branch of the project. GitHub API provides data about a hosted code repository, ranging from commits to pull requests and comments.
The Devo GitHub collector enables customers to retrieve data from the GitHub API into Devo to query, correlate, analyze, and visualize it, enabling Enterprise IT and Cybersecurity teams to make the most impactful decisions at the petabyte scale.
Devo collector features
Feature | Details |
---|---|
Allow parallel downloading ( |
|
Running environments |
|
Populated Devo events |
|
Data sources
Data Source | Description | API endpoint | Collector service name | Devo table | Available from release |
---|---|---|---|---|---|
Collaborators | Information about collaborators. | | | | |
Commits | Commits made in the repository. | | | | |
Forks | Forks created in the repository. | | | | |
Events | Information about the different events, such as resource creations or deletions. | | | | |
Issue comments | Comments made on every issue. | | | | |
Subscribers | Information about the users subscribed to a repository. | | | | |
Pull requests | Pull requests made in the repository. | | | | |
Subscriptions | Repositories you are subscribed to. | | | | |
Releases | Information about releases made in the repository. | | | | |
Stargazers | Information about the users who star repositories, marking them as favorites. | | | | |
Audit | Organization audit events. | | | | |
SSO Authorizations | Single sign-on authorizations. | | | | |
Webhooks | Webhooks created in the organization. | | | | |
Dependabot Alerts | GitHub sends Dependabot alerts when it detects that your repository uses a vulnerable dependency or malware. | | | | |
Dependabot Secrets | Lists all secrets available in an organization without revealing their encrypted values. | | | | |
Actions | GitHub Actions for a repository. | | | | |
CodeScan | Code scanning is a feature that you use to analyze the code in a GitHub repository to find security vulnerabilities and coding errors. | | | | |
GitHub Documentation
Refer to the GitHub documentation to know more about its repositories.
For more information on how the events are parsed, visit our page.
Vendor setup
To retrieve the data, we need to create an access token to authenticate the collector on the GitHub server.
SAML Authentication
If you are using SAML authentication in your account, you’ll need to authorize your token after generating it. Check how to do it in this article.
What is SAML authorization?
SAML authorization has been supported by the collector since v1.2.0. SAML (Security Assertion Markup Language) is a markup language for security assertions that provides a standardized way to tell external applications and services that a user is who they claim to be. SAML uses single sign-on (SSO) technology and allows you to authenticate a user once and then communicate that authentication to multiple applications.
Authorizing a personal access token
To use SAML, you need to authorize the token for personal use. There are two ways:
Authorize an existing token.
Create a new token and authorize it.
GitHub documentation
Refer to the GitHub documentation to know how to do it.
Minimum configuration required for basic pulling
Although this collector supports advanced configuration, the fields required to retrieve data with basic configuration are defined below.
This minimum configuration refers exclusively to those specific parameters of this integration. There are more required parameters related to the generic behavior of the collector. Check setting sections for details.
Setting | Details |
---|---|
Token | Set up requires your access token created in the GitHub console. |
Username | Set up requires your username. |
Organization | Use this parameter to define the name of the organization that owns the repository. |
See the Accepted authentication methods section to verify what settings are required based on the desired authentication method.
Accepted authentication methods
Authentication Method | URL | Token | Username | Organization |
---|---|---|---|---|
Personal Access Token | REQUIRED | REQUIRED | REQUIRED | REQUIRED |
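As an illustration of what these settings are used for, the sketch below builds (without sending) an authenticated request to the GitHub REST API using a personal access token. It is not the collector's code: the base URL, the function name `build_org_request`, and the sample token are assumptions made for the example, and the base URL will differ for self-hosted GitHub servers.

```python
import urllib.request

# Assumption: public GitHub REST API; replace with your own server URL if needed.
GITHUB_API_URL = "https://api.github.com"


def build_org_request(token: str, organization: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for an organization."""
    return urllib.request.Request(
        f"{GITHUB_API_URL}/orgs/{organization}",
        headers={
            # The personal access token is sent as a bearer token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder values, for illustration only.
request = build_org_request("ghp_example_token", "whatever-company")
print(request.full_url)
```

Sending this request with `urllib.request.urlopen(request)` is a quick way to check that a token is valid (and SAML-authorized, where applicable) before handing it to the collector.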
Run the collector
Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector on your own machine using a Docker image (On-premise collector).
API Limitations
By default, GitHub applies a rate limit of 5,000 requests per hour per authenticated user. This limit depends on the type of account: GitHub Enterprise Cloud accounts may have higher limits, up to 15,000 requests per hour.
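To make the limit concrete, the following sketch shows two small calculations: how many requests fit into one pull cycle on average, and how long to pause once the quota is exhausted. It assumes the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers that the GitHub REST API returns; the function names are hypothetical and this is not how the collector itself is implemented.

```python
def requests_per_cycle(hourly_limit: int, interval_seconds: int) -> int:
    """Average number of requests available in one pull cycle."""
    return hourly_limit * interval_seconds // 3600


def seconds_until_reset(headers: dict, now: float) -> float:
    """Return how long to pause before the next request (0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))  # epoch seconds (UTC)
    if remaining > 0:
        return 0.0
    return max(0.0, reset_at - now)


# With the default limit and a 60-second pull interval:
print(requests_per_cycle(5000, 60))
# Quota exhausted, reset five minutes from "now":
print(seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1641000000"},
    now=1640999700.0,
))
```

All services that share a token also share this budget, which is why the number of enabled services matters.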
Create multiple accounts
Select carefully the services that need to be monitored, since every service consumes requests from the same rate limit. Because the limit applies per authenticated user, one option is to create multiple accounts, each with its own Personal Access Token, and use one account per service to be monitored. Those accounts must belong to the organization from which the data will be pulled. GitHub only allows one free account per user, so any additional accounts must be paid accounts.
Collector services detail
This section is intended to explain how to proceed with specific actions for services.
Verify data collections
Once the collector has been launched, it is important to check that ingestion is working correctly. To do so, go to the collector’s logs console.
This service has the following components:
Component | Description |
---|---|
Setup | The setup module is in charge of authenticating the service and managing the token expiration when needed. |
Puller | The puller module is in charge of pulling the data in an organized way and delivering the events via SDK. |
Setup output
A successful run has the following output messages for the setup module:
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> The token/header/authentication is defined
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> The token/header/authentication is valid
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> The user whatever-user belongs to whatever-company
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> Finalizing the execution of setup()
INFO InputProcess::GithubApiserverBasePullerSetup(example_collector,github#444,actions#predefined,all) -> Setup for module <GithubDataPullerActions> has been successfully executed
This service lists all workflows that run for a repository in GitHub. All events of this service are ingested into the table vcs.github.repository.actions.
Verify data collections
Puller Output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only one time before the first run of the Pull action.
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Reading persisted data
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> No changes have been made in saved state. Returning saved state: {'pulling_date_from_config': '1640995200.0', 'last_pulled_date': '1641104263.0', 'ids': [1645452345]}
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> GithubDataPullerActions(github,444,actions,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> The collector will start pulling data since 2022-01-02T06:17:43Z
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Total number of repositories: 2
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Tag: vcs.github.api.repository.actions
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 3; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.
...
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 3; Number of events received: 1; Number of duplicated events filtered out: 1; Number of events generated and sent: 0; Average of events per second: 0.000.
After a successful execution of the collector (that is, no error logs were found), you should be able to see the following log message:
INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 317; Number of duplicated events filtered out: 11; Number of events generated and sent: 306; Average of events per second: 23.234.
The @devo_pulling_id value is injected into each event to allow grouping all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Loxcope.
Note that a Partial Statistics Report will be displayed after downloading a page when pagination is required to pull all available events. Look for the report without the Partial reference.
(Partial) Statistics for this pull cycle Number of requests made: 2; Number of events received: 45; Number of duplicated events filtered out: 0; Number of events generated and sent: 40; Average of events per second: 23.234.
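When checking many log lines, it can help to extract the counters automatically and to tell partial reports from final ones. The following is a minimal sketch based only on the message format shown in the samples above; the regular expression and the function name `parse_statistics` are illustrative and not part of the collector.

```python
import re

# Matches both the "(Partial) Statistics ..." and the final "Statistics ..." lines.
STATS_PATTERN = re.compile(
    r"(?P<partial>\(Partial\) )?Statistics for this pull cycle "
    r"Number of requests made: (?P<requests>\d+); "
    r"Number of events received: (?P<received>\d+); "
    r"Number of duplicated events filtered out: (?P<duplicated>\d+); "
    r"Number of events generated and sent: (?P<sent>\d+); "
    r"Average of events per second: (?P<rate>\d+(?:\.\d+)?)\."
)


def parse_statistics(line: str):
    """Return the counters of a statistics log line, or None if it is another message."""
    match = STATS_PATTERN.search(line)
    if match is None:
        return None
    return {
        "partial": match.group("partial") is not None,
        "requests": int(match.group("requests")),
        "received": int(match.group("received")),
        "duplicated": int(match.group("duplicated")),
        "sent": int(match.group("sent")),
        "events_per_second": float(match.group("rate")),
    }


final_line = (
    "INFO InputProcess::GithubDataPullerActions(github,444,actions,predefined,all) -> "
    "Statistics for this pull cycle Number of requests made: 30; "
    "Number of events received: 317; Number of duplicated events filtered out: 11; "
    "Number of events generated and sent: 306; Average of events per second: 23.234."
)
stats = parse_statistics(final_line)
print(stats)
```

In the final report the counters are expected to add up: events received minus duplicated events filtered out equals events generated and sent (317 - 11 = 306 in the sample).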
Restart the persistence
This service makes use of persistence. To restart the persistence, the since parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
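The sketch below illustrates one way such a reset check could work, using the state keys that appear in the sample log (pulling_date_from_config and last_pulled_date). It is an assumption about the general mechanism, not the collector's actual implementation; the function names are hypothetical.

```python
from datetime import datetime, timezone


def resolve_start(saved_state, since_from_config: float):
    """Return (epoch seconds to pull from, whether the persistence was reset)."""
    if saved_state is None:
        return since_from_config, True
    if float(saved_state["pulling_date_from_config"]) != since_from_config:
        # The configured date changed: discard the saved position and start over.
        return since_from_config, True
    # Unchanged configuration: resume from the last pulled date.
    return float(saved_state["last_pulled_date"]), False


def to_iso(epoch_seconds: float) -> str:
    """Format epoch seconds as a UTC timestamp like the ones shown in the logs."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# State taken from the sample log above.
state = {
    "pulling_date_from_config": "1640995200.0",
    "last_pulled_date": "1641104263.0",
    "ids": [1645452345],
}
start, was_reset = resolve_start(state, 1640995200.0)
print(to_iso(start), was_reset)
```

With the unchanged configuration the start date resolves to 2022-01-02T06:17:43Z, matching the "The collector will start pulling data since" message in the sample output; passing a different configured date returns that date and flags a reset.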
This service gets the audit log (a sequence of activities) for an organization in GitHub. This service starts collecting 90 days back from the moment the persistence is reset. All events of this service are ingested into table vcs.github.organization.audit.
This service generates a huge number of events, so it takes a lot of time and API requests for the collector to catch up.
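A rough back-of-the-envelope estimate shows why. The sketch below assumes 30 events per request (the page size seen in the sample audit logs in this section) and the rate limits described under API Limitations; the function name is hypothetical and the result is a lower bound, since it ignores processing time and other services sharing the same token.

```python
import math


def minimum_hours_to_catch_up(total_events: int, events_per_request: int, requests_per_hour: int):
    """Return (requests needed, minimum hours imposed by the rate limit)."""
    requests_needed = math.ceil(total_events / events_per_request)
    return requests_needed, requests_needed / requests_per_hour


# 300,000 audit events at 30 events per request, default limit of 5,000 requests/hour.
print(minimum_hours_to_catch_up(300_000, 30, 5_000))
# Same backlog with a 15,000 requests/hour limit.
print(minimum_hours_to_catch_up(300_000, 30, 15_000))
```

Under the default limit, a backlog of 300,000 events needs 10,000 requests and therefore at least two hours of the full request budget.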
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only one time before the first run of the Pull action.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Reading persisted data
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> GithubDataPullerAudit(github,444,audit,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Tag: vcs.github.api.organization.audit
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 1; Number of events received: 30; Number of duplicated events filtered out: 0; Number of events generated and sent: 30; Average of events per second: 14.773.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 2; Number of events received: 60; Number of duplicated events filtered out: 0; Number of events generated and sent: 60; Average of events per second: 14.709.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 3; Number of events received: 90; Number of duplicated events filtered out: 0; Number of events generated and sent: 90; Average of events per second: 14.685.
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 4; Number of events received: 120; Number of duplicated events filtered out: 0; Number of events generated and sent: 120; Average of events per second: 14.865.
...
After a successful execution of the collector (that is, no error logs were found), you should be able to see the following log message. However, it takes a long time to reach the end of this service, as it generates a huge number of events and starts pulling from 90 days back:
INFO InputProcess::GithubDataPullerAudit(github,444,audit,predefined,all) -> Statistics for this pull cycle Number of requests made: 10000; Number of events received: 300000; Number of duplicated events filtered out: 0; Number of events generated and sent: 300000; Average of events per second: 14.865.
The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.
Note that a Partial Statistics Report will be displayed after downloading a page when pagination is required to pull all available events. Look for the report without the Partial reference.
(Partial) Statistics for this pull cycle Number of requests made: 4; Number of events received: 120; Number of duplicated events filtered out: 0; Number of events generated and sent: 120; Average of events per second: 14.865.
Restart the persistence
This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
Code scanning is a feature that you can use to analyze the code in a GitHub repository to find security vulnerabilities and coding errors. This service returns the codescan results for each repository in case it is enabled.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only one time before the first run of the Pull action.
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> State saved: {'old_persistence_reset_date': '26-Oct', 'codescan': {}}
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> GithubDataPullerCodescan(github,444,codescan,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Get Codescan function called
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-1 -> 43
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-1 -> 9443322
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-2 -> 367
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-2 -> 4567887
....
After successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Statistics for this pull cycle Number of requests made: 120; Number of events received: 932; Number of duplicated events filtered out: 0; Number of events generated and sent: 932; Average of events per second: 23.593.
The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.
Note that a Partial Statistics Report will be displayed after downloading a set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300
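The reporting behavior described above can be pictured with a small sketch: pages are buffered as they arrive and a partial report is emitted every 10 pages. This is an illustrative model only; the function `collect_pages`, its parameters, and the fake pages of 30 items each are assumptions, not the collector's code.

```python
from typing import Callable, Iterable, List


def collect_pages(
    pages: Iterable[list],
    report_every: int = 10,
    log: Callable[[str], None] = print,
) -> List[dict]:
    """Buffer the items of every page, logging a partial report every N pages."""
    buffered: List[dict] = []
    pages_retrieved = 0
    for page in pages:
        buffered.extend(page)
        pages_retrieved += 1
        if pages_retrieved % report_every == 0:
            log(
                f"Partial statistics: Pages retrieved {pages_retrieved}, "
                f"items buffered {len(buffered)}"
            )
    return buffered


# 20 fake pages of 30 items each stand in for paginated API responses.
fake_pages = [[{"id": page * 30 + i} for i in range(30)] for page in range(20)]
messages: List[str] = []
items = collect_pages(fake_pages, log=messages.append)
print("\n".join(messages))
```

A pull that needs fewer than 10 pages produces no partial report at all, so its absence in the logs does not indicate a problem.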
Restart the persistence
This service makes use of persistence. To restart the persistence, the since parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
This service gets a list of collaborators for each repository in GitHub.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only one time before the first run of the Pull action.
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> State saved: {'old_persistence_reset_date': '26-Oct', 'codescan': {}}
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> GithubDataPullerCodescan(github,444,codescan,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Get Codescan function called
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-1 -> 43
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-1 -> 9443322
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> New items found for repo repo-2 -> 367
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Persistence saved for repo-2 -> 4567887
....
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerCodescan(github,444,codescan,predefined,all) -> Statistics for this pull cycle Number of requests made: 120; Number of events received: 932; Number of duplicated events filtered out: 0; Number of events generated and sent: 932; Average of events per second: 23.593.
The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.
Note that a Partial Statistics Report will be displayed after downloading a set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300
Restart the persistence
This service makes use of persistence. To restart the persistence, the since parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
This service lists all secrets available in an organization without revealing their encrypted values. All events of this service are ingested into table vcs.github.repository.dependabot.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only one time before the first run of the Pull action.
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> State saved: {'old_persistence_reset_date': 'test1', 'dependabot': {}}
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> GithubDataPullerDependabot(github,444,dependabot,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> No more pages have been detected ahead for org my-organization
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> New items found for org my-organization -> 4
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Persistence saved for org my-organization -> 46463737382
....
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerDependabot(github,444,dependabot,predefined,all) -> Statistics for this pull cycle Number of requests made: 1; Number of events received: 4; Number of duplicated events filtered out: 0; Number of events generated and sent: 4; Average of events per second: 5.743.
The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.
Note that a Partial Statistics Report will be displayed after downloading a set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300
Restart the persistence
This service makes use of persistence. To restart the persistence, the persistence_reset_date parameter must be changed from the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
This service returns the Dependabot Alerts for each repository. GitHub generates an alert when a repository uses a vulnerable dependency or malware. All events of this service are ingested into table vcs.github.repository.dependabot_alerts.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only one time before the first run of the Pull action.
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> State saved: {'old_persistence_reset_date': '26-Oct', 'dependabot_alerts': {}}
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Get Dependabot Alerts function called
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> New items found for repo repo-1 -> 12
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Persistence saved for repo-1 -> 94445
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> New items found for repo repo-2 -> 3
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Persistence saved for repo-2 -> 45556
....
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerDependabotAlerts(github,444,dependabot_alerts,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 122; Number of duplicated events filtered out: 0; Number of events generated and sent: 122; Average of events per second: 13.63.
The value @devo_pulling_id is injected in each event to group all events ingested by the same pull action. You can use it to get the exact events downloaded in that Pull action in Devo’s search window.
Note that a Partial Statistics Report will be displayed after downloading a set of 10 pages when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300
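To check a run programmatically, the final statistics line can be parsed into its counters. The following is a minimal Python sketch, not part of the collector: the function name parse_pull_statistics and the regular expression are assumptions derived only from the sample log lines in this document. It ignores Partial reports and returns the counters from the final report.

```python
import re

# Assumed pattern, derived from the sample log lines in this document;
# it is not an official specification of the collector's log format.
_STATS_RE = re.compile(
    r"Statistics for this pull cycle "
    r"Number of requests made: (?P<requests>\d+); "
    r"Number of events received: (?P<received>\d+); "
    r"Number of duplicated events filtered out: (?P<duplicates>\d+); "
    r"Number of events generated and sent: (?P<sent>\d+); "
    r"Average of events per second: (?P<rate>\d+(?:\.\d+)?)"
)


def parse_pull_statistics(line):
    """Return the counters of a final statistics line, or None for any other line."""
    if "Partial" in line:
        # Partial reports are intermediate; only the final report is parsed.
        return None
    match = _STATS_RE.search(line)
    if match is None:
        return None
    stats = {key: int(match.group(key))
             for key in ("requests", "received", "duplicates", "sent")}
    stats["rate"] = float(match.group("rate"))
    return stats
```

In the sample lines shown in this document, the number of events received equals the number of duplicates filtered out plus the number of events sent, so comparing those three counters is one quick sanity check on a parsed line.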
Restart the persistence
This service makes use of persistence. To restart the persistence, change the persistence_reset_date parameter in the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
This service returns the events for each repository. All events of this service are ingested into the table vcs.github.repository.events.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only once, before the first run of the Pull action.
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> State saved: {'old_persistence_reset_date': 'prueba-2', 'events': {}}
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> GithubDataPullerEvents(github,444,events,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Get Events function called
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Partial statistics: Pages retrieved 20, items buffered 600
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> New items found for repo repo-1 -> 740
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Persistence saved for repo-1 -> 456789
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Partial statistics: Pages retrieved 10, items buffered 300
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> New items found for repo repo-2 -> 356
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Persistence saved for repo-2 -> 5678567
....
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerEvents(github,444,events,predefined,all) -> Statistics for this pull cycle Number of requests made: 89; Number of events received: 1562; Number of duplicated events filtered out: 0; Number of events generated and sent: 1562; Average of events per second: 79.13.
The value @devo_pulling_id is injected into each event to group all events ingested by the same pull action. You can use it in Devo’s search window to retrieve the exact events downloaded by that Pull action.
Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300
Restart the persistence
This service makes use of persistence. To restart the persistence, change the persistence_reset_date parameter in the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
This service returns the forks for each repository. All events of this service are ingested into the table vcs.github.repository.forks.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only once, before the first run of the Pull action.
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> State saved: {'old_persistence_reset_date': 'prueba-3', 'forks': {}}
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> GithubDataPullerForks(github,444,forks,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Get Forks function called
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> New items found for repo repo-1 -> 3
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Persistence saved for repo-1 -> 9623469344
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> New items found for repo repo-2 -> 2
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Persistence saved for repo-2 -> 5678234564
....
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerForks(github,444,forks,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 56; Number of duplicated events filtered out: 0; Number of events generated and sent: 56; Average of events per second: 12.128.
The value @devo_pulling_id is injected into each event to group all events ingested by the same pull action. You can use it in Devo’s search window to retrieve the exact events downloaded by that Pull action.
Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300
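When a pull covers several repositories, it can help to tally the per-repository counts reported in the puller output. The sketch below is a hedged Python illustration: count_new_items is a hypothetical helper, and its pattern is an assumption based only on the "New items found for repo ... -> ..." sample lines shown in this document.

```python
import re

# Assumed pattern, based on the "New items found for repo <name> -> <count>"
# sample lines; the real log format may vary between collector releases.
_NEW_ITEMS_RE = re.compile(
    r"New items found for repo (?P<repo>\S+) -> (?P<count>\d+)"
)


def count_new_items(log_lines):
    """Sum the reported new-item counts per repository name."""
    totals = {}
    for line in log_lines:
        match = _NEW_ITEMS_RE.search(line)
        if match:
            repo = match.group("repo")
            totals[repo] = totals.get(repo, 0) + int(match.group("count"))
    return totals
```

Lines that do not report new items, such as "Persistence saved" or "Pull Started", are skipped, so the whole puller output can be passed in unfiltered.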
Restart the persistence
This service makes use of persistence. To restart the persistence, change the persistence_reset_date parameter in the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
This service gets the list of issue comments made in a GitHub repository. All events of this service are ingested into the table vcs.github.repository.issue_comments.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only once, before the first run of the Pull action.
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Reading persisted data
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> No changes have been made in saved state. Returning saved state: {'pulling_date_from_config': '1640995200.0', 'last_pulled_date': '1641852073.0', 'ids': [1009384492]}
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> The collector will start pulling data since 2022-01-10T22:01:13Z
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Total number of repositories: 2
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Tag: vcs.github.api.repository.issue_comments
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 2; Number of events received: 60; Number of duplicated events filtered out: 60; Number of events generated and sent: 0; Average of events per second: 0.000.
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 4; Number of events received: 120; Number of duplicated events filtered out: 105; Number of events generated and sent: 15; Average of events per second: 1.307.
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> (Partial) Statistics for this pull cycle Number of requests made: 6; Number of events received: 180; Number of duplicated events filtered out: 160; Number of events generated and sent: 20; Average of events per second: 1.294.
...
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerIssueComments(github,444,issue_comments,predefined,all) -> Statistics for this pull cycle Number of requests made: 6; Number of events received: 180; Number of duplicated events filtered out: 160; Number of events generated and sent: 20; Average of events per second: 1.294.
The value @devo_pulling_id is injected into each event to group all events ingested by the same pull action. You can use it in Devo’s search window to retrieve the exact events downloaded by that Pull action.
Note that a Partial Statistics Report is displayed after each page is downloaded when pagination is required to pull all available events. Look for the report without the Partial reference.
(Partial) Statistics for this pull cycle Number of requests made: 6; Number of events received: 180; Number of duplicated events filtered out: 160; Number of events generated and sent: 20; Average of events per second: 1.294.
Restart the persistence
This service makes use of persistence. To restart the persistence, change the since parameter in the user configuration. This field indicates the date from which to start pulling data. For further details, go to the settings section.
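The saved state in the sample output for this service stores its dates as numeric strings ('pulling_date_from_config' and 'last_pulled_date'). Assuming these are Unix epoch seconds in UTC, which is an interpretation of the sample log rather than documented behavior, the following Python sketch converts them to the timestamp format used in the "The collector will start pulling data since" line. The helper name epoch_to_iso_utc is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_iso_utc(epoch_seconds):
    """Convert epoch seconds (number or numeric string) to an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(float(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# Values taken from the sample saved state shown in the puller output.
print(epoch_to_iso_utc("1641852073.0"))  # 2022-01-10T22:01:13Z
print(epoch_to_iso_utc("1640995200.0"))  # 2022-01-01T00:00:00Z
```

The first value matches the start date reported in the sample log, which suggests that the pull resumes from last_pulled_date rather than from the configured date once a state has been saved.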
This service returns the pull requests for each repository, and the associated commits for each pull request. All events of this service are ingested into the tables vcs.github.repository.pull_requests and vcs.github.repository.pull_requests.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only once, before the first run of the Pull action.
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> State saved: {'old_persistence_reset_date': '27-Oct-2022', 'pull_requests': {}, 'pull_request_commits': {}}
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 1
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> bdb806bd6218552c7b3b6507803e48694b5591b7
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 1
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> ccc00f42d56d59bcb375a327f163bb8b737f376d
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 17
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> 8f1af254f6fffad8718c7c68a7f67778bc6c6b3f
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 1
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> a0d5d609978be33b7b7c37b2ea2a400d6102ccda
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> New items found for Commits PR commits -> 8
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Persistence saved for Commits PR commits -> e99dfc7b7075250ca685106bab18f31f7e62f3c5
...
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerPullRequests(github,444,pull_requests,predefined,all) -> Statistics for this pull cycle Number of requests made: 172; Number of events received: 1333; Number of duplicated events filtered out: 0; Number of events generated and sent: 1333; Average of events per second: 15.183.
The value @devo_pulling_id is injected into each event to group all events ingested by the same pull action. You can use it in Devo’s search window to retrieve the exact events downloaded by that Pull action.
Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300
Restart the persistence
This service makes use of persistence. To restart the persistence, change the persistence_reset_date parameter in the user configuration. This is a free-text field, and it is recommended to use a reference to the day on which the persistence is being reset. For further details, go to the settings section.
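The sample output for this service shows the saved state carrying an old_persistence_reset_date key. One plausible reading, sketched below in Python, is that the stored value is compared with the configured persistence_reset_date and the state is reinitialized when the two differ. This is only an illustration of that idea: needs_persistence_reset and initial_state are hypothetical helpers, and the collector's actual logic may differ.

```python
def needs_persistence_reset(saved_state, configured_reset_date):
    """Return True when no state exists or the configured free-text value changed."""
    if not saved_state:
        return True
    return saved_state.get("old_persistence_reset_date") != configured_reset_date


def initial_state(configured_reset_date, service_keys):
    """Build an empty state shaped like the 'State saved' entry in the sample log."""
    state = {"old_persistence_reset_date": configured_reset_date}
    for key in service_keys:
        state[key] = {}
    return state
```

Because the comparison is on plain text, any change to the value would trigger a reset under this reading, which fits the recommendation to use a reference to the day of the reset.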
This service returns the releases for each repository. All events of this service are ingested into the table vcs.github.repository.releases.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only once, before the first run of the Pull action.
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> State saved: {'old_persistence_reset_date': '27-Oct', 'releases': {}}
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> GithubDataPullerReleases(github,444,releases,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Starting data collection every 300 seconds
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Get Releases function called
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> No more pages have been detected ahead for repo repo-1
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> New items found for repo repo-1 -> 0
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> No more pages have been detected ahead for repo repo-2
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> New items found for repo repo-2 -> 0
....
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerReleases(github,444,releases,predefined,all) -> Statistics for this pull cycle Number of requests made: 30; Number of events received: 0; Number of duplicated events filtered out: 0; Number of events generated and sent: 0; Average of events per second: 0.0.
The value @devo_pulling_id is injected into each event to group all events ingested by the same pull action. You can use it in Devo’s search window to retrieve the exact events downloaded by that Pull action.
Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300
Restart the persistence
This service makes use of persistence. To restart the persistence, change the persistence_reset_date parameter in the user configuration. This is a free-text field, and it is recommended to use a reference to the day on which the persistence is being reset. For further details, go to the settings section.
This service returns the single sign-on (SSO) authorizations for all organizations. All events of this service are ingested into the table vcs.github.organizations.sso_authorizations.
Verify data collection
Puller output
A successful initial run has the following output messages for the puller module:
Note that the PrePull action is executed only once, before the first run of the Pull action.
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Starting the execution of pre_pull()
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Reading persisted data
WARNING InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Persistence not found, persistence will be initialized
WARNING InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> State saved: {'old_persistence_reset_date': '2022-10-16T12:00:00Z', 'sso_authorizations': {}}
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) Finalizing the execution of pre_pull()
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Starting data collection every 60 seconds
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Pull Started
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> No more pages have been detected ahead for org my-organization
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> New items found for org my-organization -> 58
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Persistence saved for org my-organization -> 40487930
.....
After the successful execution of the collector, you should be able to see the following log message:
INFO InputProcess::GithubDataPullerSSOAuthorizations(github,444,sso_authorizations,predefined,all) -> Statistics for this pull cycle Number of requests made: 2; Number of events received: 58; Number of duplicated events filtered out: 0; Number of events generated and sent: 58; Average of events per second: 80.759.
The value @devo_pulling_id is injected into each event to group all events ingested by the same pull action. You can use it in Devo’s search window to retrieve the exact events downloaded by that Pull action.
Note that a Partial Statistics Report is displayed after each set of 10 pages is downloaded when pagination is required to pull all available events. Look for the report without the Partial reference.
Partial statistics: Pages retrieved 10, items buffered 300