
Troubleshooting for Endpoint Agent v7.5.0

This document is intended for people in charge of Devo Endpoint Agent deployment or administration. It covers ways of troubleshooting Devo EA and describes common error cases that have been identified in the field.

Deployment

This section describes typical trouble scenarios and troubleshooting guidelines for deployment.

Controlled error messages during deployment

Some deployment tasks run commands or operations that can end in an error. These tasks check services or configurations and, depending on the result, either apply customized changes or skip those configurations. The text ...ignoring is displayed right after the error message and helps to identify this kind of controlled error.

For example, the task that checks whether the firewalld service is running:


This error appears when the firewalld service is not present or is stopped. It can be safely ignored and does not affect the deployment. As a rule, any message tagged ...ignoring can be safely ignored and has no effect on the deployment sequence.

Timeout when waiting for 127.0.0.1:8080

The deployment process cannot connect to the interface that Fleet starts on port 8080, or the connection took longer than 60 seconds. Typically, something went wrong in the deployment sequence and the Fleet instance could not boot up. Check the Endpoint Agent Manager section and see whether the logs give any hint.

The most common root cause for this error message is that the certificates used to send data from the EA Manager to Devo have not been properly configured. Make sure that all the steps in the guide have been followed. In a default installation process, the domain-certs folder should look like the following screenshot:


Make sure that the certificate files use the same names as in the screenshot above.

Also, make sure that the collector configured in the entrypoint is correct, as explained in the deployment guide.
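
To reproduce the wait outside the deployment, the check can be sketched as a small polling loop. This is a minimal sketch, not the deployer's actual implementation: the function name wait_for_port and the one-second retry interval are assumptions, while the address 127.0.0.1, port 8080 and the 60-second limit are the values quoted above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("127.0.0.1", 8080) on the manager host returns False when nothing starts listening within the limit, which is the point at which the manager logs are worth inspecting.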

Proxy is needed to access Internet

Configure the following settings:

Set the http_proxy and https_proxy environment variables correctly.

Enable the proxy for the Docker environment by editing the ~/.docker/config.json file, as described in https://docs.docker.com/network/proxy/.
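
As an illustration of the Docker client setting, the following sketch adds a default proxy entry to an existing config.json structure without discarding its other keys. The helper names and the proxy URL in the usage note are hypothetical; the proxies/default layout with httpProxy, httpsProxy and noProxy follows the Docker page linked above.

```python
import json


def with_docker_proxy(config: dict, http_proxy: str, https_proxy: str, no_proxy: str = "") -> dict:
    """Return a copy of a Docker client config with default proxy settings added."""
    updated = dict(config)
    proxies = dict(updated.get("proxies", {}))
    default = dict(proxies.get("default", {}))
    default["httpProxy"] = http_proxy
    default["httpsProxy"] = https_proxy
    if no_proxy:
        # Comma-separated hosts that must bypass the proxy.
        default["noProxy"] = no_proxy
    proxies["default"] = default
    updated["proxies"] = proxies
    return updated


def render(config: dict) -> str:
    """Serialize the config in the JSON form expected in ~/.docker/config.json."""
    return json.dumps(config, indent=2)
```

For example, render(with_docker_proxy({}, "http://proxy.example.com:3128", "http://proxy.example.com:3128")) produces text that could be saved as ~/.docker/config.json, where proxy.example.com stands in for your own proxy.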

Shared connection closed

You may see the following error during the deployment process. It usually happens at the beginning and prevents the process from continuing:

TASK [duam-internal-services : Set hostname with name in inventory] ************
fatal: [devo-ua-manager]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 10.239.74.38 closed.\r\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}


The reason for this issue is that Ansible is not able to use SSH to perform the deployment. Ansible connects via SSH to every server included in the inventory to perform the installation. The solution is to fix the environment so that Ansible can make proper use of SSH.

If the deployment is being done on a single server, the following workaround has proven to work:

  1. Edit your inventory file.
  2. Add ansible_connection: local in the all->vars section.
  3. Save your inventory file.
  4. Run the playbook again.

Be aware that this workaround tells Ansible not to use SSH, so it is only valid for local deployments.

Endpoint Agent client


This section describes typical trouble scenarios and troubleshooting guidelines for the client (OSQuery + extensions).

Effective configuration applied in a running and connected agent.

You can get the effective value of each configuration option in an agent with the following SQL query:

SELECT * FROM osquery_flags


A convenient way to run this query is the devo-ea-manager web UI, because in this scenario the agent is connected to DEAM.

Packs loaded in a running and connected agent.

You can get the packs loaded for each agent with the following SQL query:

SELECT * FROM osquery_packs


A convenient way to run this query is the devo-ea-manager web UI, because in this scenario the agent is connected to DEAM.

Scheduled queries in a running and connected agent.

You can get the scheduled queries currently loaded for each agent with the following SQL query:

SELECT * FROM osquery_schedule


A convenient way to run this query is the devo-ea-manager web UI, because in this scenario the agent is connected to DEAM.

Temporarily set log level to debug in the agent.

You can temporarily set the agent logger level to debug and display messages on stdout by following the steps below.

Enable this configuration only for a short period of time and use it as a last resort to identify issues, because this mode can severely affect agent performance and significantly increases the amount of data ingested into Devo.

Windows platform

All commands below must be run as administrator in a PowerShell console.

  • Ensure the osqueryd service is stopped:
Stop-Service -Force -Name "osqueryd"
  • Run osqueryd with debug log level (assuming the devo-ea agent was installed following the default installation):
cd "c:\Program Files\osquery"
.\osqueryd\osqueryd.exe --flagfile "osquery.flags" --verbose
  • Alternatively, to save stdout to a file, run the following commands instead of the previous ones:
cd "c:\Program Files\osquery"
.\osqueryd\osqueryd.exe --flagfile "osquery.flags" --verbose 2>&1 | Tee "$Env:HOMEPATH\Desktop\devo-ea-agent-verbose.out"
  • Press Ctrl + C in the PowerShell console to stop the debug process when you have finished your tests.
  • Start the service:
Start-Service -Name "osqueryd"


Linux platform

The following commands assume the systemd init system. Adapt the service start and stop commands for other init systems.

  • Ensure osqueryd service is stopped:
sudo systemctl stop osqueryd
  • Run osqueryd with Debug log-level (assuming devo-ea agent was installed following default installation):
sudo osqueryd --flagfile "/etc/osquery/osquery.flags" --verbose
  • Alternatively, to save stdout to a file, run the following command instead of the previous one:
sudo osqueryd --flagfile "/etc/osquery/osquery.flags" --verbose 2>&1 | tee "$HOME/devo-ea-agent-verbose.out"
  • Press Ctrl + C in that console to stop the debug process when you have finished your tests.
  • Set the right owner if you saved the output to a file:
sudo chown $(id -u):$(id -g) "$HOME/devo-ea-agent-verbose.out"
  • Start service:
sudo systemctl start osqueryd


macOS platform

  • Ensure osqueryd service is stopped:
sudo osqueryctl stop
  • Run osqueryd with Debug log-level (assuming devo-ea agent was installed following default installation):
sudo osqueryd --flagfile /private/var/osquery/osquery.flags --verbose
  • Alternatively, to save stdout to a file, run the following command instead of the previous one:
sudo osqueryd --flagfile /private/var/osquery/osquery.flags --verbose 2>&1 | tee "$HOME/devo-ea-agent-verbose.out"
  • Press Ctrl + C in that console to stop the debug process when you have finished your tests.
  • Set the right owner if you saved the output to a file:
sudo chown $(id -u):$(id -g) "$HOME/devo-ea-agent-verbose.out"
  • Start service:
sudo osqueryctl start

My agent host is not showing up in the Manager Interface!

  1. Make sure you can reach the DEAM manager from your client:
    a. Linux: sudo telnet devo-ea-manager 8080
    b. Windows: open a PowerShell console in administrator mode and run: Test-NetConnection -ComputerName devo-ea-manager -InformationLevel Detailed -Port 8080
  2. Make sure there is no firewall or antivirus interfering with the connection.
    a. Windows: some installed antivirus products have caused issues in the past. It may be necessary to create an outbound rule in both the antivirus and Windows Firewall to allow communication with the manager. After the rule is created, repeat the connectivity test from step 1. To check the firewall status: netsh firewall show state
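
Where neither telnet nor Test-NetConnection is available, the reachability test from step 1 can be approximated with a short script that also reports why the connection failed. This is a minimal sketch: the function name and the result labels are assumptions, and the interpretation of each label is a rule of thumb rather than a guarantee.

```python
import socket


def diagnose_connection(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a TCP connection and return a short label describing the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # The host name could not be resolved.
        return "dns-failure"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all; packets are often being dropped by a firewall.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, diagnose_connection("devo-ea-manager", 8080) returning "timeout" on a Windows client would point toward the firewall or antivirus rules described in step 2, while "dns-failure" suggests the manager host name is not resolvable from the client.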

Endpoint Agent Manager

This section describes typical trouble scenarios and troubleshooting guidelines for the manager (Fleet).

How to check DEAM logs

  • systemctl status devo-ea-manager to check status of the process.
  • journalctl -u devo-ea-manager to check manager logs.
  • journalctl -fu devo-ea-manager to check real time logs.

DEAM certificates were not properly generated or uploaded

If you see error messages similar to the following in the DEAM logs (journalctl -u devo-ea-manager):

...
Dec 02 05:52:40 devo-ea-manager fleet[1517]: {"terminated":"open /etc/devo-ea-manager/certs/devo-ea-manager.key: no such file or directory","ts":"2020-12-02T10:52:40.484570752Z"}
...
...
Dec 02 05:53:52 devo-ea-manager fleet[1538]: {"terminated":"open /etc/devo-ea-manager/certs/devo-ea-manager.crt: no such file or directory","ts":"2020-12-02T10:53:52.453080543Z"}
...


If you provided custom certificates, ensure that they are placed in the provided-deam-certs folder under the devo-ea-deployer path.

Then run the ansible command again (follow the steps described in the deployment guide).

If you are delegating the creation of self-signed certificates to devo-ea-deployer, run the ansible command again and pay attention to messages marked with the service-certificates tag. This can help you identify the root cause.

Domain certificates were not properly configured


If you see error messages similar to the following in the DEAM logs (journalctl -u devo-ea-manager):

...
Dec 02 06:01:54 devo-ea-manager fleet[1660]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:01:54.919850542Z"}
Dec 02 06:01:54 devo-ea-manager fleet[1660]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: open /etc/devo-ea-manager/devo-certs/domain.key: no such file or directory
Dec 02 06:01:54 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 02 06:04:25 devo-ea-manager fleet[1749]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:04:25.923778139Z"}
Dec 02 06:04:25 devo-ea-manager fleet[1749]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: open /etc/devo-ea-manager/devo-certs/domain.crt: no such file or directory
Dec 02 06:04:25 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 02 06:05:21 devo-ea-manager fleet[1847]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:05:21.667553321Z"}
Dec 02 06:05:21 devo-ea-manager fleet[1847]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not read certificate "/etc/devo-ea-manager/devo-certs/chain.crt": open /etc/devo-ea-manager/devo-certs/chain.crt: no such file or directory
Dec 02 06:05:21 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 02 06:07:32 devo-ea-manager fleet[2005]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:07:32.912871438Z"}
Dec 02 06:07:32 devo-ea-manager fleet[2005]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: tls: failed to parse private key
Dec 02 06:07:32 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...


...
Dec 02 06:09:01 devo-ea-manager fleet[2083]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:09:01.417074837Z"}
Dec 02 06:09:01 devo-ea-manager fleet[2083]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: asn1: structure error: tags don't match (16 vs {class:2 tag:24 length:1122 isCompound:false}) {optional:false explicit:false application:false private:false defaultValue:<nil> tag:<nil> stringType:0 timeType:0 set:false omitEmpty:false} certificate @4
Dec 02 06:09:01 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 02 06:09:59 devo-ea-manager fleet[2131]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:09:59.176293468Z"}
Dec 02 06:09:59 devo-ea-manager fleet[2131]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): x509: certificate signed by unknown authority
Dec 02 06:09:59 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 09 16:14:15 devo-ea-manager fleet[16683]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-09T15:14:15.95249458Z"}
Dec 09 16:14:15 devo-ea-manager fleet[16683]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: tls: found a certificate rather than a key in the PEM for the private key
Dec 09 16:14:15 devo-ea-manager systemd[1]: devo-ea-manager.service: main process exited, code=exited, status=1/FAILURE
...


The most likely cause is that the domain certificates were not properly configured. Ensure that you deploy them following the instructions in the deployment guide.
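
To narrow down which file is at fault, the error fragments in the log samples above can be mapped to a probable cause. The sketch below is an interpretation, not an official error catalogue: the matched fragments are copied from the samples, while the function name and the hint wording are assumptions.

```python
import re
from typing import Optional

# Matches the file path in "open <path>: no such file or directory".
_MISSING = re.compile(r"open (\S+): no such file or directory")

# (log fragment, probable cause); the causes are hedged interpretations.
_HINTS = [
    ("found a certificate rather than a key",
     "private key file holds a certificate instead of a key"),
    ("failed to parse private key",
     "private key file is not a valid PEM private key"),
    ("asn1: structure error",
     "certificate file is malformed or truncated"),
    ("certificate signed by unknown authority",
     "chain file probably does not match the Devo endpoint in use"),
]


def explain_cert_error(line: str) -> Optional[str]:
    """Return a probable cause for a certificate error line, or None."""
    missing = _MISSING.search(line)
    if missing:
        return f"missing file: {missing.group(1)}"
    for fragment, hint in _HINTS:
        if fragment in line:
            return hint
    return None
```

Feeding the output of journalctl -u devo-ea-manager through explain_cert_error line by line gives a quick summary of which of domain.key, domain.crt or chain.crt to look at first.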

One point to check is that the certificates are owned by the ansible user configured in the inventories, and not by root.

Then run the ansible command again (follow steps described in the deployment guide).
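
Before rerunning the deployment, the presence and ownership of the certificate files can be verified with a short script. This is a sketch under assumptions: the file names domain.key, domain.crt and chain.crt are taken from the paths in the log samples above, and the function name, the directory argument and the expected user id are placeholders to adapt to your environment.

```python
import os
from typing import List

# File names as they appear in the DEAM log paths.
EXPECTED_FILES = ("domain.key", "domain.crt", "chain.crt")


def check_cert_dir(cert_dir: str, expected_uid: int) -> List[str]:
    """List missing certificate files and files not owned by expected_uid."""
    problems = []
    for name in EXPECTED_FILES:
        path = os.path.join(cert_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{path}: missing")
            continue
        owner = os.stat(path).st_uid
        if owner != expected_uid:
            problems.append(f"{path}: owned by uid {owner}, expected {expected_uid}")
    return problems
```

An empty list means all three files exist and belong to the expected user. Passing the numeric id of the ansible user as expected_uid makes files owned by root (uid 0) show up as problems.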

MySQL issues


Error 500 after login / "mysql":"could not connect to db: dial tcp [::1]:3306: connect: connection refused"

If you get a 500 error right after logging in, and a screen similar to the following, you are likely running into MySQL issues. Verify that you have connectivity with MySQL.


The EA Manager may also show some of the following errors:

Jun 28 15:37:14 devo-ea-manager fleet[710]: {"component":"service","err":"find user: dial tcp [::1]:3306: connect: connection refused","level":"info","method":"Login","took":"787.149µs","ts":"2021-06-28T15:37:14.012392917Z","user":"admin"}


Jun 28 15:35:44 devo-ea-manager fleet[710]: {"component":"service","err":"SSOSettings getting app config: selecting app config: dial tcp [::1]:3306: connect: connection refused","level":"info","method":"SSOSettings","took":"729.832µs","ts":"2021-06-28T15:35:44.770049722Z"}


Jun 28 15:35:33 devo-ea-manager fleet[710]: {"component":"http","err":"authentication error: finding host","ts":"2021-06-28T15:35:33.429262653Z"}
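
Log lines like the ones above carry a JSON payload after the journal prefix, so the MySQL symptom can be detected programmatically. The following is a minimal sketch: both function names are assumptions, and it relies only on the line layout and the "err" field visible in the samples.

```python
import json
from typing import Optional


def parse_fleet_line(line: str) -> Optional[dict]:
    """Extract the JSON payload that follows the journal prefix, if any."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def is_mysql_refused(line: str) -> bool:
    """True when the line reports a refused connection to port 3306."""
    record = parse_fleet_line(line)
    if record is None:
        return False
    return ":3306: connect: connection refused" in str(record.get("err", ""))
```

Counting the lines for which is_mysql_refused returns True in the output of journalctl -u devo-ea-manager shows whether the 500 errors coincide with MySQL being unreachable.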


If you are using the dockerized version of the internal services (MySQL and Redis), check that the containers are up and running on the EA Manager server with: sudo docker ps -a.

Execute the following to restart the internal services:

cd /srv/deam-internal-services
/usr/local/bin/docker-compose down
/usr/local/bin/docker-compose up -d mysql redis
systemctl restart devo-ea-manager


REDIS issues

REDIS is not in the critical path unless the labeling feature in the EA Manager is in use. REDIS issues will surface with the following symptoms:

  • Cannot run Live Queries.
  • Labeling does not work.
  • Errors in EA Manager logs.

If you see a similar error when accessing the Queries menu in the Web UI, it's likely your REDIS instance is not available:

 

Send events through a Relay

Assuming that the relay in-house IP is 192.168.43.147, you should configure deam_relay_entrypoint: tcp://192.168.43.147:13000
Example snippet of an inventory file based on that configuration:

all:
  vars:
    ...
    deam_relay_entrypoint: tcp://192.168.43.147:13000
    deam_devo_key: ""
    deam_devo_cert: ""
    deam_devo_chain: ""
    ...


Modify listen port of Devo EA package repository

Ensure that port 8081 is available and is not in use by another service in the client infrastructure. To override the port, add the following parameter to the inventory file:

all:
  vars:
    ...
    dea_ap_repo_port: <port>
    ...


Events are not ingested in the right Devo domain

Ensure that the right Devo domain certificates were configured. You can inspect the domain.crt certificate with the following command:

openssl x509 -in /etc/devo-ea-manager/devo-certs/domain.crt -text -noout


Then look for a line similar to:

Subject: C = SP, ST = Madrid, L = Madrid, O = LogTrust, CN = XXXX 


CN value should be the domain name.

In the same output, you can look for a line similar to:

Issuer: C = ES, ST = Madrid, L = Madrid, O = LogTrust, OU = LogTrust AWS USA Clients, CN = userAWSUSACA


The CN = userAWSxxxxx value indicates which site issued the certificates. In this example it is US, which corresponds to https://us.devo.com
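
When several certificates have to be checked, the Subject and Issuer lines can be split into fields instead of being read by eye. This is a minimal sketch: the function name is an assumption, and it expects the comma-separated form shown above, so values that themselves contain commas are not handled.

```python
from typing import Dict


def parse_dn(line: str) -> Dict[str, str]:
    """Split an openssl 'Subject:' or 'Issuer:' line into its fields."""
    # Drop the 'Subject' or 'Issuer' label before the first colon.
    _, _, dn = line.partition(":")
    fields = {}
    for part in dn.split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

With the Issuer line from the example, parse_dn(...)["CN"] yields userAWSUSACA, and the same call on the Subject line gives the domain name to compare against the Devo domain you expect.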

Missing --devo_relay/KOLIDE_DEVO_RELAY setting in deam-fleet configuration 

When the Devo relay property is not set, the process does not start and you will see traces similar to the following in the DEAM logs:

panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/fleetdm/fleet/server/logging.devoConnection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1267020, 0xc000282b10, ...)
        /usr/src/app/server/logging/devo.go:316 +0x7a4
github.com/fleetdm/fleet/server/logging.NewDevoLogWriter(0x0, 0x0, 0xff4e06, 0xb, 0x10002d2, 0x15, 0x0, 0x0, 0x0, 0x0, ...)
        /usr/src/app/server/logging/devo.go:73 +0xb8
github.com/fleetdm/fleet/server/logging.New(0xfed8ff, 0x3, 0xff7ec3, 0xe, 0xfefd9d, 0x6, 0xfefd9d, 0x6, 0xfefd9d, 0x6, ...)
        /usr/src/app/server/logging/logging.go:152 +0x3db
github.com/fleetdm/fleet/server/service.NewService(0x129cfa0, 0xc0004aa000, 0x12795a0, 0xc00020a060, 0x1267020, 0xc000282b10, 0xfed8ff, 0x3, 0xff7ec3, 0xe, ...)
        /usr/src/app/server/service/service.go:27 +0x83
main.createServeCmd.func1(0xc00049ef00, 0xc000205b00, 0x0, 0x4)
        /usr/src/app/cmd/fleet/serve.go:184 +0x836
github.com/spf13/cobra.(*Command).execute(0xc00049ef00, 0xc000205a40, 0x4, 0x4, 0xc00049ef00, 0xc000205a40)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0xc00049e780, 0xc0005dff58, 0x1, 0x1)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846 +0x2ea
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794
main.main()
        /usr/src/app/cmd/fleet/main.go:29 +0x1cd


The devo-ea-deployer installation procedure fills this value from the deam_relay_entrypoint variable (set to tcp://us.elb.relay.logtrust.net:443 by default). The value is assigned to the KOLIDE_DEVO_RELAY environment variable, which is configured in the /etc/devo-ea-manager/devo-ea-manager file by default.
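
A relay value can be sanity-checked before it reaches the manager. The sketch below only verifies that the value has the scheme://host:port shape of the two examples in this document (tcp://192.168.43.147:13000 and tcp://us.elb.relay.logtrust.net:443). The function name is hypothetical, and whether the manager accepts other forms is not covered here.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_relay_entrypoint(value: str) -> Tuple[str, str, int]:
    """Return (scheme, host, port) or raise ValueError for a malformed value."""
    parts = urlsplit(value.strip())
    try:
        port = parts.port
    except ValueError as exc:
        # urlsplit raises when the port is not a number in the valid range.
        raise ValueError(f"invalid port in relay entrypoint {value!r}") from exc
    if not parts.scheme or not parts.hostname or port is None:
        raise ValueError(f"relay entrypoint {value!r} is not of the form scheme://host:port")
    return parts.scheme, parts.hostname, port
```

An empty value, which corresponds to the unset property described above, raises ValueError instead of letting the problem surface later as a panic at startup.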

Sending Windows Events for testing, using the command line

Windows Events are generated by the system according to internal conditions that system administrators do not control directly. For testing purposes it is sometimes useful to send certain events on demand, for instance to check that events of some type are being received. This is possible from the command line using the Windows utility eventcreate. For example:

eventcreate /t ERROR /id 100 /l application /d "test event"

A complete description of the tool and more examples can be found in Microsoft's eventcreate documentation.