Flow queues 90% full

Hi,

I keep seeing the message that the UDP Server to Flow Decoder queues are 90% full, even hours after my 2 flow collector instances (running in Docker) have started. I only see this message, not any throttler message, and it shouldn’t be due to low resources: my CPU utilization is lower than 10% (usually under 5%), and I have over 600GB of free RAM available.

I also see that udp_server_packet_queue_util is 1 for both collectors.

The EF_PROCESSOR_POOL_SIZE is currently set to 16, but I’m not doing any high latency enrichment tasks other than IP geoloc enrichment. I’ve tried setting it to 32, but it didn’t help.

I’m only getting a total of about 1.5K-2K records per second (around 800-1K for each collector), but my input is definitely much higher. I should be getting tens of thousands of records per second.

Is there anything I can tune to improve the performance? I’m sending the data to Elasticsearch (v8.8.0), and so far, the only relevant settings I can see are EF_OUTPUT_ELASTICSEARCH_BATCH_DEADLINE and EF_OUTPUT_ELASTICSEARCH_BATCH_MAX_BYTES.

Thank you.

What version of ElastiFlow is installed and what license are you running with?

When you say the CPU utilization is low, are you referring to the host machine that the containers are running on? I would run …

docker inspect <container id>

… and look for CPU and Memory usage information. You may also want to run …

docker stats <container id>

… to check the resource usage inside the container. If you need to allocate more CPUs to the container you would use docker update.
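
When reading the docker stats output, keep in mind that its CPU % column counts 100% as one fully used core, so values above 100% mean the container is using more than one core. A minimal sketch of that arithmetic follows; the helper name cpu_pct_to_cores is made up for illustration and is not part of Docker:

```shell
# Hypothetical helper (not a Docker command): convert a CPU % value as
# shown by docker stats, where 100% equals one fully used core, into an
# approximate number of cores in use.
cpu_pct_to_cores() {
  # Strip a trailing "%" if present, then divide by 100.
  awk -v pct="${1%\%}" 'BEGIN { printf "%.1f\n", pct / 100 }'
}

cpu_pct_to_cores "200%"   # prints 2.0
cpu_pct_to_cores "700%"   # prints 7.0
```

The overall %Cpu line at the top of top is averaged across every core on the host, so a container can be using several cores while the host-wide figure stays in the low single digits.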

Note that if you are expecting 10s of thousands of records per second you will need a license that allows that volume, otherwise you will see the throttler messages.

Regards,
Dexter

I’m using ElastiFlow v7.0.2, and my license should be able to support much more than 2K records/second. In fact, I have not seen any throttler messages, so I don’t think it’s a license issue.

I ran the inspect command, and I only see 0 for all the CPU-related values, e.g. CpuShares, CpuQuota, CpuCount, CpuPercent. In fact all the values in that “block” are either 0 or null.

Also, the output of docker stats looks very similar to that of top, when I ran it on the host itself. The %CPU fluctuates between 200+ and 300+ most of the time, sometimes getting up to 700+. The %MEM is 0.1. The overall %Cpu (i.e. the line at the top of the top output) is usually 1-2%.

ETA: I’m also seeing messages that suggest connection issues to ES on both collectors (I had not seen these messages while ElastiFlow was running over the past week). I’m not sure whether they are related to the slow ingest into ES:

"logger":"flowcoll.elastisearch_output[default].http_connection_manager.connection_worker[8]","caller":"httpretry/httpretry.go:70","msg":"retrying request","attemptCount":747,"address":"https://<ES URL>:443/<path>/_bulk"

Then there was a bunch of

"logger":"flowcoll.elastisearch_output[default].http_connection_manager.connection_worker[8]","caller":"httpoutput/conn_worker.go:79","msg":"no alive connections available"

Quickly followed by "msg":"healthcheck success; connection is available", and then no more connection-related messages.

Throughout the time when there seemed to be connection issues, ElastiFlow was still writing 1-2K records/sec into ES.

I’m not a Docker expert, but if docker stats is reporting 200% to 700%, then the container is fully using multiple cores, which would seem to explain the 90% full messages. Since ‘inspect’ reported CpuCount as 0/null, the container can use as much CPU as the host will allow. Maybe try specifically allowing 8 CPU cores?

docker update --cpus="8.0" <container_id>
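
To confirm that the limit took effect, one option is to read HostConfig.NanoCpus from docker inspect. As far as I know, --cpus is stored there in billionths of a CPU, and 0 means no limit is set. A rough sketch, where describe_cpu_limit is a made-up helper name rather than anything Docker provides:

```shell
# Hypothetical helper (not a Docker command): interpret the value of
# HostConfig.NanoCpus, which holds the --cpus limit in units of 1e-9 CPUs.
describe_cpu_limit() {
  if [ "$1" -eq 0 ]; then
    # 0 means no explicit limit; the container may use whatever the host allows.
    echo "no CPU limit set"
  else
    awk -v n="$1" 'BEGIN { printf "%.1f CPUs\n", n / 1e9 }'
  fi
}

# Usage against a real container (requires Docker):
#   describe_cpu_limit "$(docker inspect --format '{{.HostConfig.NanoCpus}}' <container_id>)"

describe_cpu_limit 0            # prints: no CPU limit set
describe_cpu_limit 8000000000   # prints: 8.0 CPUs
```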

I also want to mention that a ‘Basic’ license only allows 4K records per second. If you are getting 2K on each collector, 4K total, then you are technically at the license limit.
