
Troubleshooting for Endpoint Agent

This document is intended for the people in charge of deploying or administering the Devo Endpoint Agent (EA). It describes ways of troubleshooting Devo EA and includes information about common error cases that have been identified in the field.

Deployment

This section describes typical trouble scenarios and troubleshooting guidelines for the deployment.

Controlled error messages during deployment

Some deployment tasks run commands or operations that can end with an error by design. These tasks check services or configurations and then either apply customized changes or skip those configurations. The text ...ignoring, displayed right after the error message, identifies these controlled error tasks.

For example, the following task checks whether the firewalld service is running:


This error appears when the firewalld service is not installed or is stopped. It can be safely ignored and does not affect the deployment. As a rule, any message tagged with ...ignoring can be safely ignored and has no effect on the deployment sequence.
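When a deployment does stop, it can help to separate the controlled errors from the real ones. The following is a minimal sketch, assuming that the ansible-playbook output was saved to a file (for example with tee) and that, as described above, each controlled error is followed by a line starting with ...ignoring. The function name real_failures is hypothetical.

```shell
# Hypothetical helper: print the "fatal:"/"failed:" lines of an ansible-playbook
# log that are NOT followed by "...ignoring", i.e. the errors that really matter.
# Usage: real_failures deploy.log   (or pipe the log on stdin)
real_failures() {
  awk '
    /^(fatal|failed):/ {
      if (pending != "") print pending   # previous error was never ignored
      pending = $0
      next
    }
    /^\.\.\.ignoring/ { pending = "" }   # the pending error was a controlled one
    END { if (pending != "") print pending }
  ' "$@"
}
```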

Timeout when waiting for 127.0.0.1:8080

The deployment process could not connect to the interface that Fleet starts on port 8080, or the connection took longer than 60 seconds. Typically, a problem in the deployment sequence prevented the Fleet instance from starting. Check the Endpoint Agent Manager section and see whether the logs give any hint as to what the problem may be.

The most common root cause for this error message is that the certificates used to send data from the EA Manager to Devo have not been properly configured. Make sure that all the steps in the deployment guide have been followed. In a default installation, the domain-certs folder should look like the following screenshot:


Make sure that the certificates use the same file names as in the screenshot above.

Also, make sure that the collector configured in the entrypoint is correct as explained in the deployment guide.

A proxy is needed to access the Internet

Configure the following settings:

Set the http_proxy and https_proxy environment variables correctly.

Enable the proxy for the Docker environment, as described in https://docs.docker.com/network/proxy/, by editing the file ~/.docker/config.json.
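As a sketch of both settings, assuming a hypothetical proxy at http://proxy.example.com:3128: the variables are exported in the current shell, and a helper (hypothetical name docker_proxy_json) prints a proxies block in the format documented on the Docker page above, so that it can be merged into ~/.docker/config.json without overwriting other keys the file may already contain.

```shell
# Hypothetical proxy address: replace it with your organization's proxy.
PROXY_URL="http://proxy.example.com:3128"

# 1. Environment variables read by curl, pip and other tools.
export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"
# Keep local traffic (for example 127.0.0.1:8080) away from the proxy.
export no_proxy="localhost,127.0.0.1"

# 2. Print the Docker client proxy settings as JSON, to be merged by hand
#    into ~/.docker/config.json.
docker_proxy_json() {
  printf '{\n  "proxies": {\n    "default": {\n      "httpProxy": "%s",\n      "httpsProxy": "%s",\n      "noProxy": "%s"\n    }\n  }\n}\n' \
    "$1" "$2" "$3"
}

docker_proxy_json "$http_proxy" "$https_proxy" "$no_proxy"
```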

Shared connection closed

If you see the following error during deployment (it usually appears at the beginning of the deployment process and prevents it from continuing):

TASK [duam-internal-services : Set hostname with name in inventory] ************
fatal: [devo-ua-manager]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 10.239.74.38 closed.\r\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}


This issue occurs because Ansible cannot use SSH to perform the deployment. Ansible connects via SSH to every server included in the inventory to perform the installation, so the solution is to fix the environment so that Ansible can use SSH properly.

If the deployment is being done on a single server, the following workaround has been proven to work:

  1. Edit your inventory file.
  2. Add ansible_connection: local in the all->vars section.
  3. Save your inventory file.
  4. Run the playbook again.

Be aware that this setting tells Ansible not to use SSH, so it is only valid for local deployments.

Unable to find any of pip3/pip to use. pip needs to be installed (can appear with either pip or pip3)

The rest of this section assumes that the issue is related to pip3, but the procedure is the same for pip.

If, after installing the requirements by running:

ansible-galaxy install -r requirements.txt

you still get the following error when launching the devo-endpoint-agent playbook:

TASK [geerlingguy.pip : Ensure pip_install_packages are installed.] ***************************************************************************
failed: [devo-ea-manager] (item={'name': 'pip', 'state': 'latest'}) => {"ansible_loop_var": "item", "changed": false, "item": {"name": "pip", "state": "latest"}, "msg": "Unable to find any of pip3 to use.  pip needs to be installed."}
failed: [devo-ea-manager] (item={'name': 'docker'}) => {"ansible_loop_var": "item", "changed": false, "item": {"name": "docker"}, "msg": "Unable to find any of pip3 to use.  pip needs to be installed."}
failed: [devo-ea-manager] (item={'name': 'docker-compose'}) => {"ansible_loop_var": "item", "changed": false, "item": {"name": "docker-compose"}, "msg": "Unable to find any of pip3 to use.  pip needs to be installed."}


Because Ansible uses sudo to execute the tasks, check whether pip3 is in the root user's $PATH using which:

This is an example; paths may differ depending on the host.


user@server:~/ sudo -s
root@server:/# which pip3
/usr/bin/pip3
root@server:/# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/sbin:/bin


Since /usr/bin is not included in $PATH, it must be added:

root@server:/# export PATH=$PATH:$(dirname "$(which pip3)")
root@server:/# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/sbin:/bin:/usr/bin
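Running the export line more than once appends the same directory again each time. A minimal idempotent variant, as a sketch (the function name add_to_path is hypothetical; the change only lasts for the current shell):

```shell
# Append a directory to PATH for the current shell, only if it is not
# already listed (avoids duplicates when the fix is run more than once).
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                        # already present: nothing to do
    *) PATH="${PATH:+$PATH:}$1" ;;
  esac
  export PATH
}

# Example, in the root shell opened with "sudo -s". /usr/bin is the pip3
# directory from the output above; check yours with "which pip3".
# add_to_path /usr/bin
```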


And then launch the playbook again:

ansible-playbook -i ... devo-endpoint-agent.yaml


If the error persists, it may be due to the sudoers secure_path setting, which limits the value of the root user's $PATH environment variable.

To check it, run the following:

user@server:~/ sudo grep secure_path /etc/sudoers | grep "$(dirname "$(which pip3)")"


If nothing is returned, the pip3 directory must be added to secure_path in the sudoers file.

First, make a backup copy of the sudoers file:

user@server:~/ sudo cp /etc/sudoers /etc/sudoers_bck


Then modify the original file and check the result:

user@server:~/ sudo which pip3
/usr/bin/pip3
user@server:~/ sudo sed -i '/.*secure_path.*/ s/"$/:\/usr\/bin"/' /etc/sudoers
user@server:~/ sudo grep secure_path /etc/sudoers | grep "$(dirname "$(which pip3)")"
Defaults        secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


And then launch the playbook again:

ansible-playbook -i ... devo-endpoint-agent.yaml
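Editing /etc/sudoers in place with sed leaves no chance to catch a syntax error, and a broken sudoers file can lock you out of sudo. A more cautious sketch, assuming GNU sed and a root shell: it edits a copy of the file, only appends the directory if it is not already in secure_path, and leaves validation to visudo -c before the copy is installed. The function name append_secure_path is hypothetical.

```shell
# Append a directory to the secure_path entry of a sudoers-format file,
# unless it is already listed. Handles quoted (secure_path="...") and
# unquoted (secure_path = ...) values. Requires GNU sed (sed -i).
append_secure_path() {
  local dir="$1" file="$2"
  # Normalise the value to ":a:b:c:" so that only an exact directory matches
  # (/usr/bin must not match /usr/bin2).
  if grep -E '^[[:space:]]*Defaults[[:space:]]+secure_path' "$file" \
      | tr -d '"' \
      | sed -e 's/^[^=]*=[[:space:]]*/:/' -e 's/[[:space:]]*$/:/' \
      | grep -qF -- ":$dir:"; then
    return 0                            # already listed: leave the file as is
  fi
  # Quoted value: insert before the closing quote. Unquoted: append at the end.
  sed -i -E \
    -e '/^[[:space:]]*Defaults[[:space:]]+secure_path.*"[[:space:]]*$/ s|"[[:space:]]*$|:'"$dir"'"|' \
    -e '/^[[:space:]]*Defaults[[:space:]]+secure_path[^"]*$/ s|[[:space:]]*$|:'"$dir"'|' \
    "$file"
}

# Example, in a root shell: edit a copy, let visudo validate it, then install it.
#   cp /etc/sudoers /tmp/sudoers.new
#   append_secure_path /usr/bin /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new && cp /tmp/sudoers.new /etc/sudoers
```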

Unsupported nginxinc dependency roles


If you see one or both of the following errors:

TASK [nginxinc.nginx : (Amazon Linux/CentOS/RHEL) Configure NGINX repository] *************************************************************************************************************************************
fatal: [devo-ea-manager]: FAILED! => {"changed": false, "msg": "Unsupported parameters for (yum_repository) module: module_hotfixes Supported parameters include: async, attributes, backup, bandwidth, baseurl, content, cost, delimiter, deltarpm_metadata_percentage, deltarpm_percentage, description, directory_mode, enabled, enablegroups, exclude, failovermethod, file, follow, force, gpgcakey, gpgcheck, gpgkey, group, http_caching, include, includepkgs, ip_resolve, keepalive, keepcache, metadata_expire, metadata_expire_filter, metalink, mirrorlist, mirrorlist_expire, mode, name, owner, params, password, priority, protect, proxy, proxy_password, proxy_username, regexp, remote_src, repo_gpgcheck, reposdir, retries, s3_enabled, selevel, serole, setype, seuser, skip_if_unavailable, src, ssl_check_cert_permissions, sslcacert, sslclientcert, sslclientkey, sslverify, state, throttle, timeout, ui_repoid_vars, unsafe_writes, username"}


or

TASK [nginxinc.nginx_config : Ensure NGINX HTTP directory exists] *************************************************************************************************************************************************
fatal: [devo-ea-manager]: FAILED! => {"msg": "Invalid data passed to 'loop', it requires a list, got this instead: {'agents': {'template_file': 'http/default.conf.j2', 'conf_file_name': 'agents.conf', 'conf_file_location': '/etc/nginx/conf.d/', 'servers': {'agents': {'listen': {'listen_localhost': {'ip': '0.0.0.0', 'port': 8081, 'ssl': True}}, 'server_name': 'devo-ea-manager', 'http_error_pages': {}, 'error_page': '/var/devo-ea-manager/agents/www', 'access_log': [{'name': 'main', 'location': '/var/log/nginx/agents-access.log'}], 'error_log': {'location': '/var/log/nginx/agents-error.log', 'level': 'warn'}, 'root': '/var/devo-ea-manager/agents/www', 'autoindex': True, 'auth_basic': 'Devo Endpoint agent repo', 'auth_basic_user_file': '/etc/nginx/agents_password', 'client_max_body_size': '100m', 'ssl': {'cert': '/etc/nginx/certs/dea-agents-repo.crt', 'key': '/etc/nginx/certs/dea-agents-repo.key'}}}}}. Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup."}


Usually these errors are related to the version of nginxinc dependency roles. 

Follow these steps to fix them:

  • Edit the playbooks/roles/requirements.txt file (with vi, for example) and pin the versions of the nginxinc.nginx and/or nginxinc.nginx_config roles. The nginxinc.nginx version must be 0.21.0 and the nginxinc.nginx_config version must be 0.3.3:


- src: nginxinc.nginx
  version: 0.21.0
- src: nginxinc.nginx_config
  version: 0.3.3
  • Uninstall current versions of these roles using ansible-galaxy:
ansible-galaxy remove nginxinc.nginx
ansible-galaxy remove nginxinc.nginx_config
  • Reinstall the role dependencies using ansible-galaxy with the -f (force) option:
ansible-galaxy install -r playbooks/roles/requirements.yaml -f
  • Continue the deployment/configuration process from the step that runs the ansible-playbook command.
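Before running the playbook again, it can be worth confirming that the pinned versions are the ones actually installed. A minimal sketch, assuming that ansible-galaxy role list prints one installed role per line in the form "- name, version"; the function name check_role_versions is hypothetical.

```shell
# Compare installed nginxinc role versions with the pinned ones.
# Reads "ansible-galaxy role list" output on stdin; exits 1 on any mismatch.
# Usage: ansible-galaxy role list 2>/dev/null | check_role_versions
check_role_versions() {
  awk '
    BEGIN {
      want["nginxinc.nginx"] = "0.21.0"
      want["nginxinc.nginx_config"] = "0.3.3"
    }
    $1 == "-" {
      name = $2
      sub(/,$/, "", name)              # "nginxinc.nginx," -> "nginxinc.nginx"
      if (name in want) seen[name] = $3
    }
    END {
      rc = 0
      for (n in want) {
        got = (n in seen) ? seen[n] : "none"
        if (got != want[n]) {
          printf "%s: expected %s, found %s\n", n, want[n], got
          rc = 1
        }
      }
      exit rc
    }'
}
```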

Cannot start NGINX (address already in use)


If the Ansible playbook fails because NGINX cannot start, with errors like the following:


 

These errors occur because another service is already listening on port 80. The EA Manager is intended to run on its own and not to share resources with other products. Although NGINX only listens on port 8081 by default, during the installation process it uses port 80 until port 8081 is configured.

To solve the issue, stop the service that is listening on port 80 and run the playbook again.
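To find out whether port 80 is taken before rerunning the playbook, ss (from iproute2) can list the listening TCP sockets. A minimal sketch, assuming ss is available on the EA Manager host; the helper name port_in_use is hypothetical. The same check applies to port 8081, used by the agents repository.

```shell
# Succeeds if a listening TCP socket uses the given port.
# Reads "ss -ltn" output (with its header line) on stdin; the 4th column is
# "Local Address:Port", e.g. 0.0.0.0:80, [::]:80 or *:80.
port_in_use() {
  awk -v port="$1" '
    NR > 1 { n = split($4, a, ":"); if (a[n] == port) found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Usage on the EA Manager host:
#   ss -ltn | port_in_use 80 && echo "port 80 is busy"
#   sudo ss -ltnp 'sport = :80'     # shows which process owns the port
```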

NGINX deployment management

If you are experiencing issues when deploying NGINX with Ansible, the EA Deployer gives you the option to skip the installation and perform it manually. NGINX is decoupled from the EA Manager's normal service; it is only used to provide a repository from which you can download the generated agents. This option is only available from version 1.2.1 onwards.

Include the following variable in your inventory to disable the NGINX deployment. If you use this variable, you must deploy the NGINX server manually:

all:
  vars:
      dea_ap_deploy_nginx_software_base: false


Setting dea_ap_deploy_nginx_software_base to false completely disables the NGINX software deployment. However, the NGINX HTTP server configuration (Ansible role nginxinc.nginx_config) still runs and configures your service appropriately.

Deployment pre-check of https://pkg.osquery.io/ URL fails


(Only in 1.2.1) If you see the following error in the pre-checks : Checking url access with curl Ansible task while the devo-ea-deployer.yaml playbook is running:


TASK [pre-checks : Checking url access with curl] *****************************************************************************************************************************************************************
failed: [devo-ea-manager] (item={'url': 'https://pkg.osquery.io/', 'options': '-I', 'expected_pattern': 'HTTP/.*200'}) => {"ansible_loop_var": "item", "changed": true, "cmd": "echo -n 'https://pkg.osquery.io/ ...' >> /tmp/dead/pre-checks/checks.log\ncurl -sS -o /tmp/dead/pre-checks/out.std --max-time 15 -I 'https://pkg.osquery.io/'\nrc=$?\nif [ \"${rc}\" != 0 ]\nthen\n  echo \"curl ERROR\" >> /tmp/dead/pre-checks/checks.log\n  exit $rc\nfi\nif [ \"HTTP/.*200\" != \"\" ]\nthen\n  if grep -E 'HTTP/.*200' \"/tmp/dead/pre-checks/out.std\" > /dev/null\n  then\n    echo \"OK\"\n    echo \"OK\" >> /tmp/dead/pre-checks/checks.log\nelse\n    echo 'Pattern HTTP/.*200 not found when load https://pkg.osquery.io/'\n    echo 'Pattern HTTP/.*200 not found' >> /tmp/dead/pre-checks/checks.log\n    exit 1\n  fi\nfi\n", "delta": "0:00:00.293876", "end": "2022-02-21 14:17:52.835791", "item": {"expected_pattern": "HTTP/.*200", "options": "-I", "url": "https://pkg.osquery.io/"}, "msg": "non-zero return code", "rc": 1, "start": "2022-02-21 14:17:52.541915", "stderr": "", "stderr_lines": [], "stdout": "Pattern HTTP/.*200 not found when load https://pkg.osquery.io/", "stdout_lines": ["Pattern HTTP/.*200 not found when load https://pkg.osquery.io/"]}
changed: [devo-ea-manager] => (item={'url': 'https://nginx.org/', 'options': '-I', 'expected_pattern': 'HTTP/.*200'})

PLAY RECAP ********************************************************************************************************************************************************************************************************
devo-ea-manager            : ok=32   changed=11   unreachable=0    failed=1    skipped=11   rescued=0    ignored=0


The URL that fails the check is 'url': 'https://pkg.osquery.io/'.

The reason is that the pkg.osquery.io service was migrated and now replies with a redirection (HTTP 302) instead of the usual HTTP 200. The new configuration allows redirections when performing the pre-check.

You can update the check parameters with the following command (assuming that the working directory is the one where devo-ea-deployer-1.2.1.tgz was extracted):

[ "$(md5sum playbooks/roles/pre-checks/defaults/main.yml | cut -f 1 -d' ')" == "3da20fb33ea5a9258b82488edf8aec36" ] \
  && sed -i '71s/$/L/' playbooks/roles/pre-checks/defaults/main.yml \
  || echo "ERROR md5 code returned is not matching with the expected value, or error opening file"


If no error is returned, you can run the devo-ea-manager.yaml playbook with ansible-playbook again.
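To confirm by hand that the URL is reachable once redirects are followed, the pre-check logic shown in the log above can be reproduced as follows. This is a sketch, not the role's actual task; it assumes curl is installed on the EA Manager host, and the function names are hypothetical.

```shell
# True if any status line in the headers read on stdin matches the pattern
# the pre-check expects.
has_200_status() {
  grep -qE 'HTTP/.*200'
}

# Fetch only the headers (-I), follow redirects (-L) and apply the same test.
# With -L, curl prints the headers of every response (302 first, then 200),
# so the pattern is found once the redirect is followed.
check_url() {
  curl -sS -I -L --max-time 15 "$1" | has_200_status
}

# Usage, from the EA Manager host:
#   check_url 'https://pkg.osquery.io/' && echo OK || echo 'Pattern HTTP/.*200 not found'
```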

Endpoint Agent client


This section describes typical trouble scenarios and troubleshooting guidelines for the client (OSQuery + extensions).

Effective configuration applied in a running and connected agent

You can get the effective value of each configuration option in an agent with the following SQL query:

SELECT * FROM osquery_flags


Because the agent is connected to DEAM in this scenario, a convenient way to run this query is from the devo-ea-manager web UI.

Query packs loaded in a running and connected agent

You can get the query packs loaded in each agent with the following SQL query:

SELECT * FROM osquery_packs


Because the agent is connected to DEAM in this scenario, a convenient way to run this query is from the devo-ea-manager web UI.

Scheduled queries in a running and connected agent

You can get the scheduled queries currently loaded in each agent with the following SQL query:

SELECT * FROM osquery_schedule


Because the agent is connected to DEAM in this scenario, a convenient way to run this query is from the devo-ea-manager web UI.

Temporarily set log level to debug in an agent

You can temporarily set the agent logger to debug level and display its messages on stdout by following the steps below.

Enable this mode only for a short period of time and only as a last resort to identify issues, because it can severely affect agent performance and significantly increases the amount of data ingested into Devo.

Windows platform

Run all the commands below as administrator in a PowerShell console.

  • Ensure the osqueryd service is stopped:
Stop-Service -Force -Name "osqueryd"
  • Run osqueryd with debug log level (assuming that the Devo EA agent was installed with the default installation):
cd "c:\Program Files\osquery"
.\osqueryd\osqueryd.exe --flagfile "osquery.flags" --verbose
  • Alternatively, to save stdout to a file, run the following commands instead of the previous ones:
cd "c:\Program Files\osquery"
.\osqueryd\osqueryd.exe --flagfile "osquery.flags" --verbose 2>&1 | Tee "$Env:HOMEPATH\Desktop\devo-ea-agent-verbose.out"
  • When you finish your tests, press Ctrl+C in the PowerShell console to stop the debug process.
  • Start the service:
Start-Service -Name "osqueryd"


Linux platform

The following commands assume the systemd init system. Adapt the service start/stop commands for other init systems.

  • Ensure the osqueryd service is stopped:
sudo systemctl stop osqueryd
  • Run osqueryd with debug log level (assuming that the Devo EA agent was installed with the default installation):
sudo osqueryd --flagfile "/etc/osquery/osquery.flags" --verbose
  • Alternatively, to save stdout to a file, run the following command instead of the previous one:
sudo osqueryd --flagfile "/etc/osquery/osquery.flags" --verbose 2>&1 | tee "$HOME/devo-ea-agent-verbose.out"
  • When you finish your tests, press Ctrl+C in the console to stop the debug process.
  • If you saved the output to a file, set its owner back to your user:
sudo chown $(id -u):$(id -g) "$HOME/devo-ea-agent-verbose.out"
  • Start the service:
sudo systemctl start osqueryd


macOS platform

  • Ensure the osqueryd service is stopped:
sudo osqueryctl stop
  • Run osqueryd with debug log level (assuming that the Devo EA agent was installed with the default installation):
sudo osqueryd --flagfile /private/var/osquery/osquery.flags --verbose
  • Alternatively, to save stdout to a file, run the following command instead of the previous one:
sudo osqueryd --flagfile /private/var/osquery/osquery.flags --verbose 2>&1 | tee "$HOME/devo-ea-agent-verbose.out"
  • When you finish your tests, press Ctrl+C in the console to stop the debug process.
  • If you saved the output to a file, set its owner back to your user:
sudo chown $(id -u):$(id -g) "$HOME/devo-ea-agent-verbose.out"
  • Start the service:
sudo osqueryctl start

My agent host is not showing up in the Manager interface

  1. Make sure you can reach the DEAM manager from your client:
    a. Linux: telnet devo-ea-manager 8080
    b. Windows: Open a PowerShell console in admin mode and run: Test-NetConnection -ComputerName devo-ea-manager -InformationLevel Detailed -Port 8080
  2. Make sure there is no firewall or antivirus interfering with the connection.
    a. Windows: Some installed antivirus products have caused this issue in the past. It may be necessary to create an outbound rule in both the antivirus and the Windows firewall to allow communication with the manager. After the rule is created, repeat the test in step 1. To check the firewall status, run: netsh firewall show state
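On Linux clients where telnet is not installed, bash itself can open the TCP connection. A minimal sketch, assuming bash was built with /dev/tcp support (as in common distributions) and that GNU coreutils timeout is available; the function name can_connect is hypothetical.

```shell
# Succeeds if a TCP connection to host:port can be opened within 5 seconds,
# using bash's /dev/tcp pseudo-device (no telnet or netcat needed).
can_connect() {
  local host="$1" port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage, with the manager port used in the steps above:
#   can_connect devo-ea-manager 8080 && echo reachable || echo "not reachable"
```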

Will not autoload extension with unsafe directory permissions

Osquery refuses to load an extension from the filesystem if the file's permissions allow it to be written or modified by accounts that lack the required privileges. The EA installation script should take care of this, but in case of error, make sure that the extension files are owned by the root account.
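On Linux and macOS, a quick way to spot offending files is find. This is a heuristic sketch that approximates, but does not reproduce, osquery's own permission check: it lists entries under a path that are not owned by root or that are writable by group or others. The function name and the example path are hypothetical; use the extensions path of your installation.

```shell
# List files/directories under $1 that are not owned by root or that are
# writable by group or others (candidates for the "unsafe permissions" error).
unsafe_extension_paths() {
  find "$1" \( ! -user root -o -perm -g+w -o -perm -o+w \) -print
}

# Usage (example path):
#   unsafe_extension_paths /path/to/extensions
# Fix any reported entry with:
#   sudo chown root <entry> && sudo chmod go-w <entry>
```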

On Windows, because of permission inheritance, just changing the owner of a file is not sufficient. You must also change the owner of the parent directory, remove all inherited DACLs, and disable inheritance. The EA installation script should take care of the permissions, but in case of issues, the following commands set permissions that satisfy osquery:

icacls "C:\Program Files\osquery\osqueryd" /setowner Administrators /t
icacls "C:\Program Files\osquery\osqueryd" /grant Administrators:f /t
icacls "C:\Program Files\osquery\osqueryd" /inheritance:r /t
icacls "C:\Program Files\osquery\osqueryd" /inheritance:d /t


Make sure that the group "Administrators" exists. On localized Windows servers, the name of the group is in the localized language; replace "Administrators" with the localized name. To check the local groups on the Windows server, use the Get-LocalGroup cmdlet in a PowerShell window.

Make sure that your organization is not including extra permissions globally to the files in "C:\Program Files\osquery\osqueryd". Write/modify permissions should only be given to privileged accounts like “Administrators” or “System”. 

Endpoint Agent Manager

This section describes typical trouble scenarios and troubleshooting guidelines for the manager (Fleet).

How to check DEAM logs

  • systemctl status devo-ea-manager to check status of the process.
  • journalctl -u devo-ea-manager to check manager logs.
  • journalctl -fu devo-ea-manager to check real time logs.

DEAM certificates were not properly generated or uploaded

If you see error messages similar to the following in the DEAM logs (journalctl -u devo-ea-manager):

...
Dec 02 05:52:40 devo-ea-manager fleet[1517]: {"terminated":"open /etc/devo-ea-manager/certs/devo-ea-manager.key: no such file or directory","ts":"2020-12-02T10:52:40.484570752Z"}
...
...
Dec 02 05:53:52 devo-ea-manager fleet[1538]: {"terminated":"open /etc/devo-ea-manager/certs/devo-ea-manager.crt: no such file or directory","ts":"2020-12-02T10:53:52.453080543Z"}
...


If you provided your own certificates, ensure that they are placed in the provided-deam-certs folder under the devo-ea-deployer path.

Then run the ansible command again (follow steps described in the deployment guide).

If you are delegating the creation of self-signed certificates to devo-ea-deployer, run the ansible command again and pay attention to the messages marked with the service-certificates tag. They can help you identify the root cause.

Domain certificates were not properly configured


If you see error messages similar to the following in the DEAM logs (journalctl -u devo-ea-manager):

..
Dec 02 06:01:54 devo-ea-manager fleet[1660]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:01:54.919850542Z"}
Dec 02 06:01:54 devo-ea-manager fleet[1660]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: open /etc/devo-ea-manager/devo-certs/domain.key: no such file or directory
Dec 02 06:01:54 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 02 06:04:25 devo-ea-manager fleet[1749]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:04:25.923778139Z"}
Dec 02 06:04:25 devo-ea-manager fleet[1749]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: open /etc/devo-ea-manager/devo-certs/domain.crt: no such file or directory
Dec 02 06:04:25 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 02 06:05:21 devo-ea-manager fleet[1847]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:05:21.667553321Z"}
Dec 02 06:05:21 devo-ea-manager fleet[1847]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not read certificate "/etc/devo-ea-manager/devo-certs/chain.crt": open /etc/devo-ea-manager/devo-certs/chain.crt: no such file or directory
Dec 02 06:05:21 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 02 06:07:32 devo-ea-manager fleet[2005]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:07:32.912871438Z"}
Dec 02 06:07:32 devo-ea-manager fleet[2005]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: tls: failed to parse private key
Dec 02 06:07:32 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...


...
Dec 02 06:09:01 devo-ea-manager fleet[2083]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:09:01.417074837Z"}
Dec 02 06:09:01 devo-ea-manager fleet[2083]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: asn1: structure error: tags don't match (16 vs {class:2 tag:24 length:1122 isCompound:false}) {optional:false explicit:false application:false private:false defaultValue:<nil> tag:<nil> stringType:0 timeType:0 set:false omitEmpty:false} certificate @4
Dec 02 06:09:01 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 02 06:09:59 devo-ea-manager fleet[2131]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-02T11:09:59.176293468Z"}
Dec 02 06:09:59 devo-ea-manager fleet[2131]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): x509: certificate signed by unknown authority
Dec 02 06:09:59 devo-ea-manager systemd[1]: devo-ea-manager.service: Main process exited, code=exited, status=1/FAILURE
...
...
Dec 09 16:14:15 devo-ea-manager fleet[16683]: {"certPath":"/etc/devo-ea-manager/devo-certs/domain.crt","chainPath":"/etc/devo-ea-manager/devo-certs/chain.crt","component":"Devo-logging","keyPath":"/etc/devo-ea-manager/devo-certs/domain.key","level":"info","msg":"Setup Devo connection using TLS","ts":"2020-12-09T15:14:15.95249458Z"}
Dec 09 16:14:15 devo-ea-manager fleet[16683]: Error initializing service: initializing osquery logging: create devo status logger: create Devo client (TLS): could not load keypair /etc/devo-ea-manager/devo-certs/domain.crt:/etc/devo-ea-manager/devo-certs/domain.key: tls: found a certificate rather than a key in the PEM for the private key
Dec 09 16:14:15 devo-ea-manager systemd[1]: devo-ea-manager.service: main process exited, code=exited, status=1/FAILURE
...


The most likely cause is that the domain certificates were not properly configured. Ensure that you deploy them following instructions described in the deployment guide.

Also check that the certificates are owned by the Ansible user configured in the inventories, and not by root.

Then run the ansible command again (follow steps described in the deployment guide).
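Most of the errors above (missing file, unparsable key, certificate in place of the key, unknown authority) can be detected before restarting the service. The following is a sketch, assuming OpenSSL is installed and the file names shown in the logs; the function name check_devo_certs is hypothetical, and the last check assumes that chain.crt contains the complete CA chain, which may not hold for every certificate bundle.

```shell
# Pre-flight check of the Devo domain certificates used by the manager.
# Prints "OK" or the first problem found (and returns 1).
check_devo_certs() {
  local dir="${1:-/etc/devo-ea-manager/devo-certs}" f
  for f in domain.crt domain.key chain.crt; do
    [ -r "$dir/$f" ] || { echo "missing or unreadable: $dir/$f"; return 1; }
  done
  # Catches "failed to parse private key" and a certificate stored as the key.
  openssl pkey -in "$dir/domain.key" -noout 2>/dev/null \
    || { echo "domain.key is not a valid private key"; return 1; }
  openssl x509 -in "$dir/domain.crt" -noout 2>/dev/null \
    || { echo "domain.crt is not a valid certificate"; return 1; }
  # The public key in the certificate must match the private key.
  [ "$(openssl x509 -in "$dir/domain.crt" -noout -pubkey)" = \
    "$(openssl pkey -in "$dir/domain.key" -pubout)" ] \
    || { echo "domain.crt and domain.key do not match"; return 1; }
  # "certificate signed by unknown authority": domain.crt must chain to chain.crt.
  openssl verify -CAfile "$dir/chain.crt" "$dir/domain.crt" >/dev/null 2>&1 \
    || { echo "domain.crt does not verify against chain.crt"; return 1; }
  echo "OK"
}
```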

MySQL issues


Error 500 after login / "mysql":"could not connect to db: dial tcp [::1]:3306: connect: connection refused

If you get a 500 error right after logging in, and a screen similar to the following, you are likely running into MySQL issues. Verify that you have connectivity with MySQL.


The EA Manager may also show some of the following errors:

Jun 28 15:37:14 devo-ea-manager fleet[710]: {"component":"service","err":"find user: dial tcp [::1]:3306: connect: connection refused","level":"info","method":"Login","took":"787.149µs","ts":"2021-06-28T15:37:14.012392917Z","user":"admin"}


Jun 28 15:35:44 devo-ea-manager fleet[710]: {"component":"service","err":"SSOSettings getting app config: selecting app config: dial tcp [::1]:3306: connect: connection refused","level":"info","method":"SSOSettings","took":"729.832µs","ts":"2021-06-28T15:35:44.770049722Z"}


Jun 28 15:35:33 devo-ea-manager fleet[710]: {"component":"http","err":"authentication error: finding host","ts":"2021-06-28T15:35:33.429262653Z"}


If you use the dockerized version of the internal services (MySQL and Redis), check that the containers are up and running on the EA Manager server with: sudo docker ps -a.

Execute the following to restart the internal services:

cd /srv/deam-internal-services
/usr/local/bin/docker-compose down
/usr/local/bin/docker-compose up -d mysql redis
systemctl restart devo-ea-manager


REDIS issues

REDIS is not in the critical path unless the labeling feature in the EA Manager is in use. REDIS issues will surface with the following symptoms:

  • Cannot run Live Queries.
  • Labeling does not work.
  • Errors in EA Manager logs.

If you see a similar error when accessing the Queries menu in the Web UI, it's likely your REDIS instance is not available:

 

Send events through a Relay

Assuming that the in-house relay IP is 192.168.43.147, you should configure deam_relay_entrypoint: tcp://192.168.43.147:13000.

Example snippet of an inventory file based on that configuration:

all:
  vars:
    ...
    deam_relay_entrypoint: tcp://192.168.43.147:13000
    deam_devo_key: ""
    deam_devo_cert: ""
    deam_devo_chain: ""
    ...


Modify listen port of Devo EA package repository

Ensure that port 8081 is available and not in use by another service in your infrastructure. To use a different port, add the following parameter to the inventory file:

all:
  vars:
    ...
    dea_ap_repo_port: <port>
    ...


Events are not ingested in the right Devo domain

Ensure that the right Devo domain certificates were configured. You can inspect the domain.crt certificate with the following command:

openssl x509 -in /etc/devo-ea-manager/devo-certs/domain.crt -text -noout


Then look for a line similar to:

Subject: C = SP, ST = Madrid, L = Madrid, O = LogTrust, CN = XXXX 


The CN value should be the Devo domain name.

In the same output, you can look for a line similar to:

Issuer: C = ES, ST = Madrid, L = Madrid, O = LogTrust, OU = LogTrust AWS USA Clients, CN = userAWSUSACA


The issuer CN (userAWSxxxxx) indicates which Devo site created the certificates: US in the example, which matches https://us.devo.com.
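The two checks above can be scripted. A minimal sketch, assuming OpenSSL 1.1 or later: it prints the CN of the subject (expected to be the Devo domain name) or of the issuer (which identifies the Devo site). The function name cert_cn is hypothetical.

```shell
# Print the CN of a certificate's subject or issuer.
# $1: certificate file, $2: "subject" or "issuer".
cert_cn() {
  # RFC2253 output looks like "subject=CN=xxx,O=LogTrust,L=Madrid,..."
  openssl x509 -in "$1" -noout "-$2" -nameopt RFC2253 \
    | sed 's/^[a-z]*= *//' \
    | tr ',' '\n' \
    | sed -n 's/^CN=//p'
}

# Usage:
#   cert_cn /etc/devo-ea-manager/devo-certs/domain.crt subject   # Devo domain
#   cert_cn /etc/devo-ea-manager/devo-certs/domain.crt issuer    # e.g. userAWSUSACA
```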

Missing --devo_relay/KOLIDE_DEVO_RELAY setting in deam-fleet configuration 

When the Devo relay property is not set, the process does not start and you will see traces similar to the following in the DEAM logs:

panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/fleetdm/fleet/server/logging.devoConnection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1267020, 0xc000282b10, ...)
        /usr/src/app/server/logging/devo.go:316 +0x7a4
github.com/fleetdm/fleet/server/logging.NewDevoLogWriter(0x0, 0x0, 0xff4e06, 0xb, 0x10002d2, 0x15, 0x0, 0x0, 0x0, 0x0, ...)
        /usr/src/app/server/logging/devo.go:73 +0xb8
github.com/fleetdm/fleet/server/logging.New(0xfed8ff, 0x3, 0xff7ec3, 0xe, 0xfefd9d, 0x6, 0xfefd9d, 0x6, 0xfefd9d, 0x6, ...)
        /usr/src/app/server/logging/logging.go:152 +0x3db
github.com/fleetdm/fleet/server/service.NewService(0x129cfa0, 0xc0004aa000, 0x12795a0, 0xc00020a060, 0x1267020, 0xc000282b10, 0xfed8ff, 0x3, 0xff7ec3, 0xe, ...)
        /usr/src/app/server/service/service.go:27 +0x83
main.createServeCmd.func1(0xc00049ef00, 0xc000205b00, 0x0, 0x4)
        /usr/src/app/cmd/fleet/serve.go:184 +0x836
github.com/spf13/cobra.(*Command).execute(0xc00049ef00, 0xc000205a40, 0x4, 0x4, 0xc00049ef00, 0xc000205a40)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0xc00049e780, 0xc0005dff58, 0x1, 0x1)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846 +0x2ea
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794
main.main()
        /usr/src/app/cmd/fleet/main.go:29 +0x1cd


The devo-ea-deployer installation procedure fills this value from the deam_relay_entrypoint variable (set to tcp://us.elb.relay.logtrust.net:443 by default). The value is assigned to the KOLIDE_DEVO_RELAY environment variable, which is configured by default in the /etc/devo-ea-manager/devo-ea-manager file.
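The panic above happens at startup when no relay value is available. A minimal sketch to check the configured value, assuming that /etc/devo-ea-manager/devo-ea-manager contains plain KEY=VALUE lines (optionally quoted) and that the relay uses the tcp://host:port form shown in this document; both function names are hypothetical.

```shell
# Print the KOLIDE_DEVO_RELAY value from the manager environment file
# (empty output means the variable is missing). Surrounding quotes are removed.
configured_relay() {
  sed -n 's/^[[:space:]]*\(export[[:space:]]\{1,\}\)\{0,1\}KOLIDE_DEVO_RELAY=//p' \
    "${1:-/etc/devo-ea-manager/devo-ea-manager}" | tr -d "\"'"
}

# True if the value looks like tcp://<host>:<port>.
valid_relay() {
  printf '%s\n' "$1" | grep -Eq '^tcp://[^:/[:space:]]+:[0-9]{1,5}$'
}

# Usage:
#   relay=$(configured_relay)
#   valid_relay "$relay" && echo "relay: $relay" || echo "missing or malformed relay: '$relay'"
```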

Sending Windows Events for testing, using the command line

Windows events are generated by the system according to internal conditions that are not directly controlled by system administrators. For testing purposes, it is sometimes useful to send certain events on demand, for instance to check that events of a given type are being received. This is possible from the command line with the Windows utility eventcreate. For example:

eventcreate /t ERROR /id 100 /l application /d "test event"

A complete description of the tool and more examples can be found here.