AWS S3 Bucket Flowcoll.yml Example

tallguy86 · February 22, 2025, 12:09am

What is the correct path Flowcoll/Go is looking for when it's doing an API call to list buckets?

I have the following in my flowcoll.yml file:

#AWS_ACCESS_KEY_ID: ""
AWS_REGION: "us-east-2"
#AWS_SECRET_ACCESS_KEY: ""
#EF_AWS_VPC_FLOW_LOG_FIREHOSE_S3_LOG_FORMAT: ${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}
EF_AWS_VPC_FLOW_LOG_S3_BUCKET: "s3://ost-networkou-flowlog-s3/AWSLogs/"
EF_AWS_VPC_FLOW_LOG_S3_ENABLE: "true"
EF_AWS_VPC_FLOW_LOG_S3_POOL_SIZE: 4
EF_AWS_VPC_FLOW_LOG_S3_PREFIX: AWSLogs
#EF_AWS_VPC_FLOW_LOG_S3_TLS_CA_CERT_FILEPATH: ""
EF_AWS_VPC_FLOW_LOG_S3_TLS_ENABLE: "false"
EF_AWS_VPC_FLOW_LOG_S3_TLS_MIN_VERSION: 1.2
EF_AWS_VPC_FLOW_LOG_S3_TLS_SKIP_VERIFICATION: "true"

I am running an EC2 instance of Amazon Linux 2 with the AWS CLI installed. I can list the bucket and its objects, but I don't understand why Flowcoll is giving this error:

[ec2-user@ip-172-31-33-7 ~]$ aws s3 ls s3://ost-networkou-flowlog-s3/AWSLogs/058264086903/vpcflowlogs/us-east-2/
                           PRE 2025/

{"level":"error","ts":"2025-02-21T22:54:05.019Z","logger":"flowcoll","caller":"S3/s3.go:168","msg":"AWS Flow Logs: s3 list objects failure","code":"aws-vpc-flow-logs/list-objects-failure","reason":"InvalidBucketName: The specified bucket is not valid.\n\tstatus code: 400, request id: HFYGCBK74G4Q43CP, host id: W8wRsEC882o2eyy76ZoNN70pZ5ElbiaPpE7hfJoAj9ddZw8EPpmj4+iz8yJg9cR1rQYt1Lq1YhH8LPZ6dnOaYUxFDVQ1nr0p","stacktrace":"github.com/elastiflow/flowcoll/pkg/inputs/awsflowlogs/S3.(*FlowLogsS3).fetchS3Objects\n\t/tmp/collectors/pkg/inputs/awsflowlogs/S3/s3.go:168\ngithub.com/elastiflow/flowcoll/pkg/inputs/awsflowlogs/S3.(*FlowLogsS3).fetchS3ObjectsOnInterval\n\t/tmp/collectors/pkg/inputs/awsflowlogs/S3/s3.go:150"}

ontech · February 22, 2025, 2:47am

Hello tallguy86. I believe you’ll want to use a bucket name without the protocol, that is, without the s3:// prefix, as you’ll see here: General purpose bucket naming rules - Amazon Simple Storage Service
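
For anyone starting from an `aws s3 ls`-style URI, here is a minimal Python sketch of splitting it into the two values the config above uses, EF_AWS_VPC_FLOW_LOG_S3_BUCKET and EF_AWS_VPC_FLOW_LOG_S3_PREFIX. The helper name `split_s3_uri` is hypothetical and not part of ElastiFlow; it only illustrates that the bucket setting should hold the bare bucket name.

```python
from urllib.parse import urlparse


def split_s3_uri(value):
    """Split 's3://bucket/prefix/' (or 'bucket/prefix') into (bucket, prefix).

    Hypothetical helper for illustration only; not part of ElastiFlow.
    """
    value = value.strip()
    if value.startswith("s3://"):
        parsed = urlparse(value)
        bucket, path = parsed.netloc, parsed.path
    else:
        # No scheme: everything before the first slash is the bucket name.
        bucket, _, path = value.partition("/")
    return bucket, path.strip("/")


bucket, prefix = split_s3_uri("s3://ost-networkou-flowlog-s3/AWSLogs/")
print(f'EF_AWS_VPC_FLOW_LOG_S3_BUCKET: "{bucket}"')
print(f"EF_AWS_VPC_FLOW_LOG_S3_PREFIX: {prefix}")
```

The bucket value printed here contains neither the `s3://` scheme nor any path component, which is the form suggested in the reply above.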

tallguy86 · February 22, 2025, 4:33pm

Thank you for the help. After changing the line to:

EF_AWS_VPC_FLOW_LOG_S3_BUCKET: "ost-networkou-flowlog-s3"

It now works. I also found out that the AWS IAM role permissions needed “s3:PutObject”, so I added that and it then started picking up the files. But now I am getting the messages below, and I am not sure whether this is because it's trying to catch up on ingesting the logs or if it's something else. Any ideas?

{"level":"warn","ts":"2025-02-22T16:19:40.054Z","caller":"throttle/restricted_throttle.go:102","msg":"[throttler]: start burst"}
{"level":"warn","ts":"2025-02-22T16:20:01.056Z","caller":"throttle/restricted_throttle.go:108","msg":"[throttler]: stop burst"}
{"level":"warn","ts":"2025-02-22T16:20:04.684Z","caller":"throttle/restricted_throttle.go:114","msg":"[throttler]: start recovery"}

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::ost-networkou-flowlog-s3/*",
                "arn:aws:s3:::ost-networkou-flowlog-s3"
            ]
        }
    ]
}

Edit:

It seems that AWS EC2 does throttle API requests. Is there a setting to throttle requests on the ElastiFlow side?

dxturner · February 22, 2025, 6:52pm

The “throttler” messages you are seeing in flowcoll.log are related to exceeding license limits. Here is an article with some helpful information.

tallguy86 · February 24, 2025, 5:22am

Update:

Deleted all objects for AWSLogs and ElastiFlowProcessed.

Uncommented the line EF_PROCESSOR_POOL_SIZE and changed it to 8, since the basic license is 4.
Started ElastiFlow. The entries below are the latest and only logs, and no more log entries are being written, so it seems to have caught up after I deleted the S3 bucket objects and started the ElastiFlow service again.

{"level":"warn","ts":"2025-02-24T04:57:18.469Z","caller":"throttle/restricted_throttle.go:102","msg":"[throttler]: start burst"}
{"level":"warn","ts":"2025-02-24T04:57:38.793Z","caller":"throttle/restricted_throttle.go:108","msg":"[throttler]: stop burst"}
{"level":"warn","ts":"2025-02-24T04:57:38.794Z","caller":"throttle/restricted_throttle.go:114","msg":"[throttler]: start recovery"}
{"level":"error","ts":"2025-02-24T05:11:17.281Z","logger":"ipaddr_enricher.maxmind_geoip","caller":"inclexcl/inclexcl.go:201","msg":"yaml: control characters are not allowed","stacktrace":"github.com/elastiflow/go-enrich-ipaddr/inclexcl.(*InclExcl).run.func1\n\t/root/go/pkg/mod/github.com/elastiflow/go-enrich-ipaddr@v1.1.1/inclexcl/inclexcl.go:201"}
{"level":"info","ts":"2025-02-24T05:11:19.101Z","logger":"flowcoll.metrics_provider","caller":"metrics/provider.go:179","msg":"gathering metrics"}

How can I be exceeding the license if the files are archives of less than 5 KB each? See the attached screenshot of the S3 bucket. If you open one of these files, it's only 130 lines of TransitGateway logs.

Edit:

Just noticed Elasticsearch is running out of memory and the OOM killer is killing the service. I will increase the vCPU count from 2 to 4 and the RAM from 8 to 16 GB, and adjust the Java memory limits.

Started elasticsearch.service - Elasticsearch.
emd[1]: elasticsearch.service: A process of this unit has been killed by the OOM killer.
emd[1]: elasticsearch.service: Failed with result 'oom-kill'.
emd[1]: elasticsearch.service: Unit process 1879 (controller) remains running after unit stopped.
emd[1]: elasticsearch.service: Consumed 3min 52.915s CPU time.
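
As a rough aid for the "adjust Java memory limits" step, here is a Python sketch of a common rule of thumb: give the Elasticsearch heap no more than half of the machine's RAM. The function names and the 26 GB cap are assumptions for illustration. On a host that also runs Flowcoll and Kibana, as this one appears to, a smaller heap may be the safer choice.

```python
def suggest_heap_gb(total_ram_gb, cap_gb=26):
    """Rule-of-thumb heap size: at most half of RAM, capped, and at least 1 GB.

    cap_gb is an assumed conservative ceiling, not a value from this thread.
    """
    return max(1, min(total_ram_gb // 2, cap_gb))


def jvm_options(total_ram_gb):
    """Render matching minimum and maximum heap flags, one per line."""
    heap = suggest_heap_gb(total_ram_gb)
    return f"-Xms{heap}g\n-Xmx{heap}g\n"


# 16 is the RAM size (in GB) the instance is being raised to in this post.
print(jvm_options(16), end="")
```

Such flags typically go in a custom file under Elasticsearch's jvm.options.d directory. The exact path depends on how Elasticsearch was installed, so treat that location as an assumption to verify.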

Last edit for tonight:

Increased vCPU from 2 to 4 and RAM from 8 to 16 GB. Elasticsearch now reports that the cluster health is green:

[2025-02-24T05:39:54,215][INFO ][o.e.c.r.a.AllocationService] [aws-net-ost-elastiflow01] current.health="GREEN" message="Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.ds-.logs-deprecation.elasticsearch-default-2025.02.21-000001][0]]])." previous.health="RED" reason="shards started [[.ds-.logs-deprecation.elasticsearch-default-2025.02.21-000001][0]]"

I do not see any data in Analytics > Discover or any Data Streams in Stack Management.

dxturner · February 24, 2025, 5:56pm

The Data Streams may have ‘hidden’ names so check that button in Stack Management if you haven’t.

You might want to set the ‘OUTPUT_MONITOR’ values so you can see the decoding rate in the logs.

It also looks like there is an issue with the /etc/elastiflow/maxmind/incl_excl.yml file. The error message indicates there is a control character so perhaps an artifact from cutting/pasting or an editor issue?
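
One way to check for stray control characters is a small Python scan like the sketch below. It is a simplification: it flags C0 control characters other than tab, line feed, and carriage return, plus DEL, which is narrower than the full set of characters a YAML parser rejects. The function name is hypothetical, and the path is the one mentioned above.

```python
import re
from pathlib import Path

# C0 control characters except tab (\x09), LF (\x0a) and CR (\x0d), plus DEL.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def find_control_chars(text):
    """Return (line_number, column, codepoint) for each control character."""
    hits = []
    # Split on "\n" only; str.splitlines() would also split on some of the
    # control characters this scan is trying to find.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in CONTROL_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, hex(ord(match.group()))))
    return hits


if __name__ == "__main__":
    path = Path("/etc/elastiflow/maxmind/incl_excl.yml")
    if path.exists():
        raw = path.read_bytes().decode("utf-8", errors="replace")
        for line_no, col, code in find_control_chars(raw):
            print(f"line {line_no}, column {col}: control character {code}")
```

If the scan prints nothing, the cause is probably something this simplified check does not cover, such as the file's encoding.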

Thanks for the updates. Let us know if this is helpful.

Regards,
Dexter

tallguy86 · February 24, 2025, 6:31pm

Thanks for your quick updates and help with this!

dxturner:

The Data Streams may have ‘hidden’ names so check that button in Stack Management if you haven’t.

I changed the view from 1 to 2, and the “Hidden Data Streams” is now checked. I clicked refresh and see nothing for ElastiFlow.

You might want to set the ‘OUTPUT_MONITOR’ values so you can see the decoding rate in the logs.

I set this after enabling “Hidden Data Streams”. After restarting the Flowcoll service, I see the following metrics every 60 seconds:

EF_OUTPUT_MONITOR_ENABLE: "true"
EF_OUTPUT_MONITOR_INTERVAL: 60

{"level":"info","ts":"2025-02-24T18:23:39.684Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 29 records/second"}
{"level":"info","ts":"2025-02-24T18:24:39.684Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 7 records/second"}
{"level":"info","ts":"2025-02-24T18:25:39.684Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 9 records/second"}
{"level":"info","ts":"2025-02-24T18:26:39.685Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 3 records/second"}
{"level":"info","ts":"2025-02-24T18:27:39.685Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 5 records/second"}
{"level":"info","ts":"2025-02-24T18:28:39.684Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 3 records/second"}
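
To summarize monitor lines like these, here is a small Python sketch that pulls the decoding rate out of each JSON log entry and averages it. It assumes the message format shown in this post ("decoding rate: N records/second"); the function name is hypothetical, and the sample holds the first three entries above.

```python
import json
import re

RATE = re.compile(r"decoding rate: (\d+) records/second")


def decoding_rates(log_lines):
    """Extract integer decoding rates from JSON-formatted flowcoll log lines."""
    rates = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        match = RATE.search(str(entry.get("msg", "")))
        if match:
            rates.append(int(match.group(1)))
    return rates


sample = [
    '{"level":"info","ts":"2025-02-24T18:23:39.684Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 29 records/second"}',
    '{"level":"info","ts":"2025-02-24T18:24:39.684Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 7 records/second"}',
    '{"level":"info","ts":"2025-02-24T18:25:39.684Z","logger":"flowcoll.monitor_pool","caller":"monitor/pool.go:53","msg":"Monitor Output: decoding rate: 9 records/second"}',
]
rates = decoding_rates(sample)
print(rates, sum(rates) / len(rates))
```

A non-zero rate only confirms that records are being decoded; it says nothing about whether an output is enabled to receive them.
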

It also looks like there is an issue with the /etc/elastiflow/maxmind/incl_excl.yml file. The error message indicates there is a control character so perhaps an artifact from cutting/pasting or an editor issue?

The YAML file at /etc/elastiflow/maxmind/incl_excl.yml was the default one and contained only commented lines with example text. I removed the YAML file from that path and no longer see the error in the logs.

[2025-02-24T18:19:40,319][INFO ][o.e.c.r.a.AllocationService] [aws-net-ost-elastiflow01] current.health="GREEN" message="Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.ds-.logs-deprecation.elasticsearch-default-2025.02.21-000001][0]]])." previous.health="RED" reason="shards started [[.ds-.logs-deprecation.elasticsearch-default-2025.02.21-000001][0]]"

I am still not seeing any Data Streams in Index Management after confirming the flow logs are being decoded above.

dxturner · February 24, 2025, 7:47pm

What are the Elasticsearch settings in your flowcoll.yml file?

dxturner@mg-lab:~$ grep EF_OUTPUT_ELASTICSEARCH /etc/elastiflow/flowcoll.yml | grep -v ^#
EF_OUTPUT_ELASTICSEARCH_ADDRESSES: 127.0.0.1:9200
EF_OUTPUT_ELASTICSEARCH_ECS_ENABLE: "false"
EF_OUTPUT_ELASTICSEARCH_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_INDEX_SUFFIX: "mgr"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_OVERWRITE: "false"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_REPLICAS: 0
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_SHARDS: 1
EF_OUTPUT_ELASTICSEARCH_PASSWORD: < deleted >
EF_OUTPUT_ELASTICSEARCH_TIMESTAMP_SOURCE: collect
EF_OUTPUT_ELASTICSEARCH_TLS_CA_CERT_FILEPATH: ""
EF_OUTPUT_ELASTICSEARCH_TLS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_TLS_SKIP_VERIFICATION: "true"
EF_OUTPUT_ELASTICSEARCH_TSDS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_USERNAME: elastic

tallguy86 · February 24, 2025, 8:50pm

grep EF_OUTPUT_ELASTICSEARCH /etc/elastiflow/flowcoll.yml | grep -v ^#

EF_OUTPUT_ELASTICSEARCH_ADDRESSES: 127.0.0.1:9200
EF_OUTPUT_ELASTICSEARCH_ECS_ENABLE: "false"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_REPLICAS: 0
EF_OUTPUT_ELASTICSEARCH_PASSWORD: 'password omitted'
EF_OUTPUT_ELASTICSEARCH_TLS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_TLS_SKIP_VERIFICATION: "true"
EF_OUTPUT_ELASTICSEARCH_TSDS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_USERNAME: elastic

dxturner · February 24, 2025, 10:17pm

And what do you see if you run:

GET _cat/indices?v&s=index

From the Management > Dev Tools > Console?

tallguy86 · February 25, 2025, 12:43am

I see the following output on the right.

health status index                                                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
green  open   .internal.alerts-default.alerts-default-000001                     H1Gg9xXqQ0idcRt5GDEqTA   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-ml.anomaly-detection-health.alerts-default-000001 2cCylaDqTFW28imon93CRQ   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-ml.anomaly-detection.alerts-default-000001        jAJpLw6kRq-cV0jJoNEE-g   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.apm.alerts-default-000001           HL1zgO4MQAqNZ9H7svmfrA   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.logs.alerts-default-000001          tOkim7dJSVClnFlN2X6dvA   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.metrics.alerts-default-000001       GGL6V4TmTjq5fj3ckeLQXQ   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.slo.alerts-default-000001           IExYi9cLTvOCL-VyXrN3Nw   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.threshold.alerts-default-000001     fZeiLOryRp24KUvDKJoHXQ   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.uptime.alerts-default-000001        1lEGf5ctTOCjbB1oSZMB2g   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-security.alerts-default-000001                    IerukUOsRmKm9i6DPvBfIg   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-stack.alerts-default-000001                       zWhu0MEMQUuSQy-T_L-MFA   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-transform.health.alerts-default-000001            _TY41lF4QbWM4FjRTgxAHA   1   0          0

dxturner · February 25, 2025, 12:26pm

This looks like none of the elastiflow indices are being created. I also notice in your config that you do not have
EF_OUTPUT_ELASTICSEARCH_ENABLE: "true"
which is what ‘turns on’ output to Elasticsearch, or
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_ENABLE: "true"
which pushes the index templates to Elasticsearch on startup.

Set those, and when starting up, check that you are connecting to Elasticsearch (look for ‘healthcheck’ messages) and that index templates are being sent (look for ‘index template’). Hope this helps.
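
The check described above can be sketched in Python as follows. It assumes flowcoll.yml exposes these settings as flat `KEY: value` lines, as in the grep output earlier in the thread, and it ignores inline comments. The function names are hypothetical, and the sample is the config posted by tallguy86.

```python
import re

# The two settings named in this reply, with the value that enables them.
REQUIRED = {
    "EF_OUTPUT_ELASTICSEARCH_ENABLE": "true",
    "EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_ENABLE": "true",
}

KEY_VALUE = re.compile(r"^\s*([A-Z0-9_]+)\s*:\s*(.*?)\s*$")


def active_settings(text):
    """Collect uncommented KEY: value pairs, stripping surrounding quotes."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out settings are not active
        match = KEY_VALUE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip("\"'")
    return settings


def missing_settings(text):
    """Return the required keys that are absent or not set to the wanted value."""
    active = active_settings(text)
    return [k for k, want in REQUIRED.items() if active.get(k, "").lower() != want]


SAMPLE = """EF_OUTPUT_ELASTICSEARCH_ADDRESSES: 127.0.0.1:9200
EF_OUTPUT_ELASTICSEARCH_ECS_ENABLE: "false"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_REPLICAS: 0
EF_OUTPUT_ELASTICSEARCH_PASSWORD: 'password omitted'
EF_OUTPUT_ELASTICSEARCH_TLS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_TLS_SKIP_VERIFICATION: "true"
EF_OUTPUT_ELASTICSEARCH_TSDS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_USERNAME: elastic
"""
for key in missing_settings(SAMPLE):
    print(f"missing or disabled: {key}")
```

Run against the posted config, this reports both settings as missing, which matches the diagnosis in this reply.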

tallguy86 · February 25, 2025, 2:44pm

Wow, who would have thought that EF_OUTPUT_ELASTICSEARCH_ENABLE: "true" would not be turned on by default out of the box? It does make sense, though, as you have multiple output options.

Well once those lines were uncommented and flowcoll restarted, the lights came on like Johnny 5. I am seeing indexes, data streams, and Log Analytics now, and flow records in the Flows tab on ElastiFlow, so everything seems to be good now.

I can't thank you enough for your help with this and for sticking with it!

dxturner · February 25, 2025, 7:39pm

Glad it’s working! I was wondering if you could do me a favor?

Could you navigate to the Analytics > Discover menu, choose the elastiflow-flow-codex-* data source (could be ‘ecs’ instead of ‘codex’), select one record to expand (any record), and then in the details window choose ‘JSON’ and the copy icon.

If you could paste the details of that record in an email to me ( dexter at elastiflow dot com) it would be a great help. I’m trying to validate some changes and only have simulated records to work with. Having a “real” record would help me improve the scenarios I’m testing.

Thanks,
Dexter

system · March 27, 2025, 7:39pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
