ElastiFlow over 2000 "CACHE HITS" a second

Before even setting up sFlow/NetFlow/IPFIX, I see the following after turning on debug logging for flowcoll. Faster than my terminal can scroll, I get entries like these:
version 7.2.2
2024-09-19T23:35:34.255-0700 debug netif_enricher enrichnetif/memstore.go:75 in-memory store fetching attributes - 127.0.0.1 ifIndex 538
2024-09-19T23:35:34.255-0700 debug enrichnetif/enrichnetif.go:94 CACHE HIT for 127.0.0.1 ifIndex 538
2024-09-19T23:35:34.255-0700 debug netif_enricher enrichnetif/memstore.go:75 in-memory store fetching attributes - 127.0.0.1 ifIndex 0
2024-09-19T23:35:34.255-0700 debug enrichnetif/enrichnetif.go:94 CACHE HIT for 127.0.0.1 ifIndex 0
2024-09-19T23:35:34.255-0700 debug ipaddr_enricher.memstore enrichipaddr/memstore.go:73 fetching attributes {"ip": "127.0.0.1"}
2024-09-19T23:35:34.255-0700 debug enrichipaddr/enrichipaddr.go:118 CACHE HIT {"ip": "127.0.0.1"}
2024-09-19T23:35:34.255-0700 debug ipaddr_enricher.memstore enrichipaddr/memstore.go:73 fetching attributes {"ip": "81.171.1.37"}
2024-09-19T23:35:34.255-0700 debug enrichipaddr/enrichipaddr.go:118 CACHE HIT {"ip": "81.171.1.37"}
2024-09-19T23:35:34.255-0700 debug ipaddr_enricher.memstore enrichipaddr/memstore.go:73 fetching attributes {"ip": "196.11.80.226"}
2024-09-19T23:35:34.255-0700 debug enrichipaddr/enrichipaddr.go:118 CACHE HIT {"ip": "196.11.80.226"}
2024-09-19T23:35:34.255-0700 debug netif_enricher enrichnetif/memstore.go:75 in-memory store fetching attributes - 127.0.0.1 ifIndex 538
2024-09-19T23:35:34.255-0700 debug enrichnetif/enrichnetif.go:94 CACHE HIT for 127.0.0.1 ifIndex 538
2024-09-19T23:35:34.255-0700 debug netif_enricher enrichnetif/memstore.go:75 in-memory store fetching attributes - 127.0.0.1 ifIndex 0
2024-09-19T23:35:34.255-0700 debug enrichnetif/enrichnetif.go:94 CACHE HIT for 127.0.0.1 ifIndex 0
2024-09-19T23:35:34.255-0700 debug ipaddr_enricher.memstore enrichipaddr/memstore.go:73 fetching attributes {"ip": "127.0.0.1"}
2024-09-19T23:35:34.255-0700 debug enrichipaddr/enrichipaddr.go:118 CACHE HIT {"ip": "127.0.0.1"}
2024-09-19T23:35:34.255-0700 debug ipaddr_enricher.memstore enrichipaddr/memstore.go:73 fetching attributes {"ip": "81.171.1.37"}
2024-09-19T23:35:34.255-0700 debug enrichipaddr/enrichipaddr.go:118 CACHE HIT {"ip": "81.171.1.37"}

I don’t think that is unusual or necessarily an issue. The ‘debug’ setting is quite verbose, and we do check the in-memory cache frequently. The collector checks the cache before doing any polling or external lookups for enrichment.

Most of the lookups seem to be for localhost in this sample. You say this is before setting up any netflow, so I’d be curious what might be sending data to the collector.
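If you want to see what is actually hitting the collector, a quick way (assuming it is listening on the default flow port of 9995; adjust if you changed the UDP port) is to look at the collector’s UDP sockets and capture a few packets:

sudo ss -unap | grep flowcoll
sudo tcpdump -ni any udp port 9995 -c 20

That should show whether something on the box itself (which would explain the 127.0.0.1 entries) is sending flow records.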

I rarely run logging in debug mode, even when troubleshooting. I do usually set the following, just to get an idea from the logs of how many records are being processed:

EF_OUTPUT_MONITOR_ENABLE: "true"
EF_OUTPUT_MONITOR_INTERVAL: 300

This writes a “decoding rate” (records per second) entry to the log every 5 minutes. You can see the same information in the UI, but I find it helpful when scrolling through the logs to get a sense of how busy the collector is and whether there are bursts.
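Those options go in the collector’s configuration, for example in flowcoll.yml if that is how your install is configured (/etc/elastiflow/flowcoll.yml is a common location, but your path may differ), followed by a restart:

EF_OUTPUT_MONITOR_ENABLE: "true"
EF_OUTPUT_MONITOR_INTERVAL: 300   # seconds, so one rate entry every 5 minutes

sudo systemctl restart flowcoll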

sudo grep -i "127.0.0.1" flowcoll-2024-09-20T07-10-25.904.log | wc -l
returns: 113412 lines in an example log file.
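To break down which addresses show up most, something like this works against the same log (the pattern simply matches the "ip" fields in the debug output above):

sudo grep -oE '"ip": "[0-9.]+"' flowcoll-2024-09-20T07-10-25.904.log | sort | uniq -c | sort -rn | head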

I see this every 5min:
"logger":"flowcoll","caller":"flowprocessor/metrics.go:118","msg":"flow processor to output writer is 90% full. This is normal when the collector is starting. If it persists for hours, it may indicate that you are at your license threshold or your system is under-resourced."}

In addition, systemctl stop flowcoll takes minutes to terminate the process, spamming the following in the logs:
2024-09-19T23:57:18.815-0700 debug unified_flowcoll/shutdown.go:49 packetStream len - 512
2024-09-19T23:57:18.815-0700 debug unified_flowcoll/shutdown.go:49 packetStream len - 512
2024-09-19T23:57:18.815-0700 debug unified_flowcoll/shutdown.go:49 packetStream len - 512
2024-09-19T23:57:18.815-0700 debug unified_flowcoll/shutdown.go:49 packetStream len - 512
sudo grep -i "unified_flowcoll/shutdown.go" flowcoll-2024-09-20T06-57-18.877.log | wc -l
returns: 1165084 lines

4 vCPU / 16 GB RAM

I would definitely change the logging level from ‘debug’ to ‘info.’
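If your install is configured through flowcoll.yml, that should be a one-line change plus a restart; the key below is from memory, so double-check the exact name against the configuration reference for your version:

EF_LOGGER_LEVEL: "info"

sudo systemctl restart flowcoll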

There is a KB article regarding the “90% full” messages that has a lot of good information about them.

“Minutes” to terminate certainly seems unusual to me, but I have seen it take 60 seconds or so on a similarly sized system receiving nearly 10,000 records/second.

My recommendation would be to enable the ‘MONITOR’ options shown above to get an idea of how many records per second are being processed; then we can make a better recommendation on proper sizing.

Note that a basic/community license limits you to 4000 flows per second. There is a note in the docs on “lab” server sizing that may be helpful. In virtualized environments, remember that 1 physical core (with its hyperthread) presents as 2 vCPUs, so you should double the indicated core count when allocating vCPUs. For example, if the sizing guide calls for 4 cores, allocate 8 vCPUs.

Thanks for asking … hope this information has been helpful!

Thanks for the assistance @dxturner. It just seemed odd that, having installed and configured ElastiFlow/Elasticsearch only yesterday and not yet configured IPFIX or sFlow on any of my devices, I was seeing so many records per second, especially from the loopback. What I found was that the following was enabled:

EF_INPUT_FLOW_BENCHMARK_ENABLE: "true"
EF_INPUT_FLOW_BENCHMARK_PACKET_FILEPATH: /etc/elastiflow/benchmark/flow/packets.txt

After commenting out those two options, the spamming in the logs stopped, and I no longer see weird 100GB/s throughput to hosts outside of the US in the Kibana dashboards.
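Concretely, the fix was just commenting those two lines in the collector config (flowcoll.yml, or wherever your config lives) and restarting the service:

# EF_INPUT_FLOW_BENCHMARK_ENABLE: "true"
# EF_INPUT_FLOW_BENCHMARK_PACKET_FILEPATH: /etc/elastiflow/benchmark/flow/packets.txt

sudo systemctl restart flowcoll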

Hope this helps if anyone in the future runs across this. I feel like a dope :hushed:


Ah, yes! I should have asked about that! Thanks very much for the update. It is very helpful.
