Troubleshooting SQS collectors


Gathering information for troubleshooting

To troubleshoot a complex SQS collector, first gather the information listed in the authorization instructions.

It is also helpful to collect:

  • A log sample from the S3 bucket.

  • These JSON objects:

    • SQS Access Policy

    • IAM Policy

    • Trust Policy

    • Collector Parameters

  • A sample of the logs delivered to Devo, if any.

  • Any error messages from the devo.collectors.out table in Devo, which can be located with this query:

    from devo.collectors.out where toktains(collector_image,"aws_sqs_if"), weaktoktains(msg,"error") or weaktoktains(msg,"exception")

  • The contents of any events in devo_deadletter_queue, which may be informative. If that dead-letter queue is not enabled, add it.

Common issues

Codec can’t decode byte

'utf-8' codec can't decode byte ... in position ...: invalid start byte

This error may occur if the SQS queue contains compressed messages. If all the messages are compressed, then change the collector’s compressed_events parameter to true. If the queue contains a mixture of compressed and uncompressed messages, send those messages to separate queues.
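As a rough way to check a message body by hand, the sketch below tests for the gzip magic number before decoding. It assumes the compressed messages are gzip streams, which the text above does not state; the function names are hypothetical and are not part of the collector. A gzip stream starts with the bytes 0x1f 0x8b, and 0x8b is not a valid UTF-8 start byte, which is consistent with the error shown above.

```python
import gzip

# Assumption: compressed messages are gzip streams (not confirmed by this page).
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(body: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return body[:2] == GZIP_MAGIC


def decode_body(body: bytes) -> str:
    """Decompress gzip payloads if needed, then decode the result as UTF-8."""
    if looks_gzipped(body):
        body = gzip.decompress(body)
    return body.decode("utf-8")
```

Running looks_gzipped over a sample of message bodies gives a quick indication of whether the queue holds only compressed messages, only uncompressed messages, or a mixture.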

Not authorized

… is not authorized to perform: sqs:receivemessage on resource ... because no resource-based policy allows the sqs:receivemessage action …

The cross-account role may be wrong. The cross-account role must exist in your AWS account, and it is not the same as the base account role provided by Devo. Alternatively, the SQS access policy may be incorrect.
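For comparison against your own SQS access policy, the following is a minimal sketch of a resource-based policy that allows one role to read from one queue. The ARNs in the usage note are placeholders, the function name is hypothetical, and the action list is an assumption: only sqs:ReceiveMessage appears in the error above, while sqs:DeleteMessage and sqs:GetQueueAttributes are guesses at what acknowledging messages and reporting queue depth would need. The authorization instructions remain the authoritative source.

```python
import json


def build_sqs_access_policy(queue_arn: str, cross_account_role_arn: str) -> dict:
    """Build a resource-based SQS policy that lets one role read from one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCollectorRole",  # hypothetical statement ID
                "Effect": "Allow",
                # The principal is the cross-account role in your AWS account,
                # not the base account role provided by Devo.
                "Principal": {"AWS": cross_account_role_arn},
                # Assumed action list; only sqs:ReceiveMessage is named in the error.
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            }
        ],
    }
```

For example, json.dumps(build_sqs_access_policy("arn:aws:sqs:us-east-1:111122223333:example-queue", "arn:aws:iam::111122223333:role/example-cross-account-role"), indent=2) prints a policy document whose Principal and Resource can be compared with the values in the live queue policy.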

Duplicate events

Ensure that ack_messages in the collector parameters is not false.

"ack_messages": true

If the queue itself contains duplicate messages, the duplicates will also appear in Devo.

Acked message

Acked message receiptHandle

There was a message in the queue. The collector has processed it successfully and informed the queue that the message can be removed.

Messages in the queue

If AWS placed messages in the queue before the collector started, the collector may log the number of messages in the queue. AWS adds messages to the queue to make them available to Devo, and a correctly working collector removes them as it processes them. Typically, the number of messages in the queue is less than 10.

Check the trend of the number of messages in the queue.

from devo.collectors.out where toktains(msg,"Number of messages in the queue"), toktains(collector_image,"aws_sqs_if") select int(split(split(msg,"queue: ",1)," ",0)) as backlog group every 1m by collector_name, job_id select min(backlog) as messages_remaining

If the number of messages remaining is small or decreasing, no action is required. If the number of messages is increasing:

  1. Verify that ack_messages is not false in the collector parameters.

  2. In the cloud collector app, increase the target pods. This will allocate more hardware resources to processing messages.

  3. If you are unable to increase the target pods due to the message

Only target pods between 0 and … are permitted

then please visit the support site for assistance.
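The trend check described above can also be sketched offline, for example on log lines exported from devo.collectors.out. This is a minimal illustration: the message format is inferred from the query above (a number following "queue: "), the function names are hypothetical, and comparing the first and last samples is a deliberately crude definition of "increasing".

```python
import re
from typing import List, Optional

# Inferred from the query above: the count follows the text "queue: ".
BACKLOG_RE = re.compile(r"Number of messages in the queue: (\d+)")


def parse_backlog(msg: str) -> Optional[int]:
    """Extract the queue depth from a collector log message, if present."""
    match = BACKLOG_RE.search(msg)
    return int(match.group(1)) if match else None


def backlog_is_growing(backlogs: List[int]) -> bool:
    """Crude trend check: is the latest sample larger than the earliest?"""
    return len(backlogs) >= 2 and backlogs[-1] > backlogs[0]
```

Feeding the per-minute minimum values from the query into backlog_is_growing, in time order, gives a yes or no answer on whether the steps above are worth following.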

Long poll did not find any messages in queue

Long poll did not find any messages in queue. All data in the SQS queue has been successfully collected.

All messages have been obtained from the SQS queue and acknowledged. This message is expected when the collector is working correctly. It will also appear if no messages were ever placed in the queue.

An error occurred (AccessDenied) when calling the GetObject operation

This may indicate that the Resource section of the IAM policy does not include the contents of the S3 bucket. The resource ARN should end with /* to grant access to the files within scope.
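A quick way to test an IAM policy document for this problem is sketched below. It assumes the policy has been loaded as a Python dict (for example with json.load), and the helper names are hypothetical. The check follows the rule above literally: at least one resource on a statement that allows s3:GetObject should end with /* (a bare * is also accepted).

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def get_object_resources(policy: dict) -> list:
    """Collect resources from Allow statements whose actions match s3:GetObject."""
    resources = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        # Action names are case-insensitive and may contain * wildcards.
        if any(fnmatchcase("s3:getobject", a.lower()) for a in actions):
            resources.extend(_as_list(stmt.get("Resource", [])))
    return resources


def covers_bucket_objects(policy: dict) -> bool:
    """True if any matching resource ends with /* (or is a bare *)."""
    return any(r == "*" or r.endswith("/*") for r in get_object_resources(policy))
```

A policy whose only resource is a bucket ARN such as arn:aws:s3:::example-log-bucket (a placeholder name) makes covers_bucket_objects return False, which matches the AccessDenied symptom described above.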

… is not authorized to perform: kms:Decrypt on resource …

This may indicate that the S3 bucket uses KMS encryption but the KMS key is not included as a resource in the IAM policy. Add any required KMS key to the IAM policy or remove server-side encryption.

An error occurred (AccessDenied) when calling the AssumeRole operation

An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws-us-gov:sts::210253767148:assumed-role/cloud-collector-prod-gc-eks-nodes/ … is not authorized to perform: sts:AssumeRole on resource: …

If you are using GovCloud (devogov.us), then you must set the collector partition by adding the base account role to the credentials section of the collector parameters.

Wrong base account role

The base account role is provided by Devo. It is specific to your AWS partition, such as aws or aws-us-gov. If you are not using devogov.us, remove the base account role from the configuration and the default will work.
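To see which partition a role belongs to, read the second colon-separated field of its ARN. The sketch below is a small helper for comparing the partitions of two ARNs, such as the base account role and the cross-account role; the function names and the ARNs in the usage note are placeholders, not values provided by Devo.

```python
def arn_partition(arn: str) -> str:
    """Return the partition field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not a well-formed ARN: {arn!r}")
    return parts[1]


def same_partition(first_arn: str, second_arn: str) -> bool:
    """True if both ARNs belong to the same AWS partition."""
    return arn_partition(first_arn) == arn_partition(second_arn)
```

For example, same_partition("arn:aws-us-gov:iam::111122223333:role/example-base-role", "arn:aws:iam::111122223333:role/example-cross-account-role") is False, which is the kind of mismatch this section describes.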

Service name is incorrect

This error is logged if the configured service name does not match an existing service.

Change the service to a predefined service such as aws_sqs_cloudtrail.
