AWS S3 Bucket Flowcoll.yml Example

What is the correct path format that flowcoll (Go) expects when it makes an API call to list buckets?

I have the following in my flowcoll.yml file:

#AWS_ACCESS_KEY_ID: ""
AWS_REGION: "us-east-2"
#AWS_SECRET_ACCESS_KEY: ""
#EF_AWS_VPC_FLOW_LOG_FIREHOSE_S3_LOG_FORMAT: ${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}
EF_AWS_VPC_FLOW_LOG_S3_BUCKET: "s3://ost-networkou-flowlog-s3/AWSLogs/"
EF_AWS_VPC_FLOW_LOG_S3_ENABLE: "true"
EF_AWS_VPC_FLOW_LOG_S3_POOL_SIZE: 4
EF_AWS_VPC_FLOW_LOG_S3_PREFIX: AWSLogs
#EF_AWS_VPC_FLOW_LOG_S3_TLS_CA_CERT_FILEPATH: ""
EF_AWS_VPC_FLOW_LOG_S3_TLS_ENABLE: "false"
EF_AWS_VPC_FLOW_LOG_S3_TLS_MIN_VERSION: 1.2
EF_AWS_VPC_FLOW_LOG_S3_TLS_SKIP_VERIFICATION: "true"

I am running an Amazon Linux 2 EC2 instance with the AWS CLI installed, and I can list the bucket and its objects, so I don't understand why flowcoll is giving this error:

[ec2-user@ip-172-31-33-7 ~]$ aws s3 ls s3://ost-networkou-flowlog-s3/AWSLogs/058264086903/vpcflowlogs/us-east-2/
                           PRE 2025/
{"level":"error","ts":"2025-02-21T22:54:05.019Z","logger":"flowcoll","caller":"S3/s3.go:168","msg":"AWS Flow Logs: s3 list objects failure","code":"aws-vpc-flow-logs/list-objects-failure","reason":"InvalidBucketName: The specified bucket is not valid.\n\tstatus code: 400, request id: HFYGCBK74G4Q43CP, host id: W8wRsEC882o2eyy76ZoNN70pZ5ElbiaPpE7hfJoAj9ddZw8EPpmj4+iz8yJg9cR1rQYt1Lq1YhH8LPZ6dnOaYUxFDVQ1nr0p","stacktrace":"github.com/elastiflow/flowcoll/pkg/inputs/awsflowlogs/S3.(*FlowLogsS3).fetchS3Objects\n\t/tmp/collectors/pkg/inputs/awsflowlogs/S3/s3.go:168\ngithub.com/elastiflow/flowcoll/pkg/inputs/awsflowlogs/S3.(*FlowLogsS3).fetchS3ObjectsOnInterval\n\t/tmp/collectors/pkg/inputs/awsflowlogs/S3/s3.go:150"}

Hello tallguy86. I believe you’ll want to use a bucket name without the protocol, i.e. without the s3:// prefix, as described here: General purpose bucket naming rules - Amazon Simple Storage Service
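Judging by the InvalidBucketName error, flowcoll appears to pass EF_AWS_VPC_FLOW_LOG_S3_BUCKET to S3 unchanged as the bucket name, so the folder path belongs in EF_AWS_VPC_FLOW_LOG_S3_PREFIX instead. The Go sketch below shows that split for a pasted S3 URI. It assumes a flat "s3://bucket/prefix/" form, and `splitS3URI` is a hypothetical helper for illustration, not part of flowcoll:

```go
package main

import (
	"fmt"
	"strings"
)

// splitS3URI is a hypothetical helper (not part of flowcoll) that turns a
// value such as "s3://bucket/some/prefix/" into a bare bucket name and a
// key prefix, which is the shape the bucket and prefix settings expect.
func splitS3URI(uri string) (bucket, prefix string) {
	// Drop the scheme if present; a valid bucket name never contains "://".
	rest := strings.TrimPrefix(uri, "s3://")
	// Everything before the first "/" is the bucket; the remainder is the prefix.
	bucket, prefix, _ = strings.Cut(rest, "/")
	// Remove a trailing slash so the prefix matches the "AWSLogs" style.
	prefix = strings.TrimSuffix(prefix, "/")
	return bucket, prefix
}

func main() {
	bucket, prefix := splitS3URI("s3://ost-networkou-flowlog-s3/AWSLogs/")
	fmt.Printf("EF_AWS_VPC_FLOW_LOG_S3_BUCKET: %q\n", bucket)
	fmt.Printf("EF_AWS_VPC_FLOW_LOG_S3_PREFIX: %s\n", prefix)
}
```

Run against the value from the original config, this prints the bucket as "ost-networkou-flowlog-s3" and the prefix as AWSLogs, which lines up with the fix that ended up working.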


Thank you for the help. After changing the line to:

EF_AWS_VPC_FLOW_LOG_S3_BUCKET: "ost-networkou-flowlog-s3"

It now works. I also found out that the AWS IAM role needed the “s3:PutObject” permission, so I added it, and flowcoll then started picking up the files. But now I am getting the messages below, and I am not sure whether this is because it's trying to catch up on ingesting the logs or something else. Any ideas?

{"level":"warn","ts":"2025-02-22T16:19:40.054Z","caller":"throttle/restricted_throttle.go:102","msg":"[throttler]: start burst"}
{"level":"warn","ts":"2025-02-22T16:20:01.056Z","caller":"throttle/restricted_throttle.go:108","msg":"[throttler]: stop burst"}
{"level":"warn","ts":"2025-02-22T16:20:04.684Z","caller":"throttle/restricted_throttle.go:114","msg":"[throttler]: start recovery"}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::ost-networkou-flowlog-s3/*",
                "arn:aws:s3:::ost-networkou-flowlog-s3"
            ]
        }
    ]
}

Edit:

It seems that AWS EC2 does throttle API requests. Is there a setting to throttle requests on the ElastiFlow side?

The “throttler” messages you are seeing in flowcoll.log are related to exceeding license limits. Here is an article with some helpful information.

Update:

  1. Deleted all objects under AWSLogs and ElastiFlowProcessed.
  2. Uncommented EF_PROCESSOR_POOL_SIZE and changed it to 8, since the basic license is 4.
  3. Started ElastiFlow. The entries below are the latest (and only) log entries, and nothing further has been written, so it seems to have caught up after I cleared the S3 bucket and restarted the ElastiFlow service.
{"level":"warn","ts":"2025-02-24T04:57:18.469Z","caller":"throttle/restricted_throttle.go:102","msg":"[throttler]: start burst"}
{"level":"warn","ts":"2025-02-24T04:57:38.793Z","caller":"throttle/restricted_throttle.go:108","msg":"[throttler]: stop burst"}
{"level":"warn","ts":"2025-02-24T04:57:38.794Z","caller":"throttle/restricted_throttle.go:114","msg":"[throttler]: start recovery"}
{"level":"error","ts":"2025-02-24T05:11:17.281Z","logger":"ipaddr_enricher.maxmind_geoip","caller":"inclexcl/inclexcl.go:201","msg":"yaml: control characters are not allowed","stacktrace":"github.com/elastiflow/go-enrich-ipaddr/inclexcl.(*InclExcl).run.func1\n\t/root/go/pkg/mod/github.com/elastiflow/go-enrich-ipaddr@v1.1.1/inclexcl/inclexcl.go:201"}
{"level":"info","ts":"2025-02-24T05:11:19.101Z","logger":"flowcoll.metrics_provider","caller":"metrics/provider.go:179","msg":"gathering metrics"}

  4. How can I be exceeding the license if the files are archives of less than 5 KB each? See the attached screenshot of the S3 bucket. If you open one of these files, it's only 130 lines of Transit Gateway logs.

Edit:

I just noticed that Elasticsearch is running out of memory and the OOM killer is killing the service. I will increase the vCPUs from 2 to 4 and the RAM from 8 GB to 16 GB, and adjust the Java memory limits.

Started elasticsearch.service - Elasticsearch.
systemd[1]: elasticsearch.service: A process of this unit has been killed by the OOM killer.
systemd[1]: elasticsearch.service: Failed with result 'oom-kill'.
systemd[1]: elasticsearch.service: Unit process 1879 (controller) remains running after unit stopped.
systemd[1]: elasticsearch.service: Consumed 3min 52.915s CPU time.
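For the Java memory limits, Elastic's general guidance is to set the minimum and maximum heap (-Xms/-Xmx) to the same value, no more than 50% of RAM, and below the compressed-object-pointer threshold of roughly 32 GB (31 GB is a common safe cap). Recent Elasticsearch versions size the heap automatically by default, so this only matters if you override it. The Go sketch below applies that arithmetic, assuming those two rules are the only constraints; `heapGB` is a hypothetical helper:

```go
package main

import "fmt"

// heapGB is a hypothetical helper: half of the machine's RAM, capped at 31 GB
// so the JVM can keep using compressed ordinary object pointers, and never
// below 1 GB.
func heapGB(ramGB int) int {
	h := ramGB / 2
	if h > 31 {
		h = 31
	}
	if h < 1 {
		h = 1
	}
	return h
}

func main() {
	for _, ram := range []int{8, 16} {
		h := heapGB(ram)
		// Xms and Xmx are kept equal so the heap is not resized at runtime.
		fmt.Printf("RAM %d GB -> -Xms%dg -Xmx%dg\n", ram, h, h)
	}
}
```

For the 8 GB and 16 GB instances this yields 4g and 8g heaps respectively. On a package install, such flags typically go in a file under /etc/elasticsearch/jvm.options.d/. Because other services may share this instance, treat half of RAM as an upper bound rather than a target.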

Last edit for tonight:

I increased the vCPUs from 2 to 4 and the RAM from 8 GB to 16 GB, and Elasticsearch now reports the cluster health as green:

[2025-02-24T05:39:54,215][INFO ][o.e.c.r.a.AllocationService] [aws-net-ost-elastiflow01] current.health="GREEN" message="Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.ds-.logs-deprecation.elasticsearch-default-2025.02.21-000001][0]]])." previous.health="RED" reason="shards started [[.ds-.logs-deprecation.elasticsearch-default-2025.02.21-000001][0]]"

However, I do not see any data in Analytics > Discover, or any data streams in Stack Management.

The data streams may be hidden, so enable the option to show hidden data streams in Stack Management if you haven’t already.

You might want to set the ‘OUTPUT_MONITOR’ values so you can see the decoding rate in the logs.

It also looks like there is an issue with the /etc/elastiflow/maxmind/incl_excl.yml file. The error message indicates that it contains a control character, perhaps an artifact of copying and pasting, or an editor issue.
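To locate the offending character, a file scan helps. The Go sketch below approximates the YAML 1.2 printable-character rule (C0 controls other than tab, LF and CR, plus DEL and C1 controls other than NEL, are not allowed) and reports each hit with its line and column. It is an assumption that this matches exactly what the YAML library inside flowcoll rejects, and `findControlChars` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"os"
	"unicode/utf8"
)

// Hit records where a disallowed character was found.
type Hit struct {
	Line, Col int
	Rune      rune
}

// isDisallowed approximates the YAML 1.2 rule: C0 controls other than tab,
// LF and CR, plus DEL and C1 controls other than NEL (U+0085), are not printable.
func isDisallowed(r rune) bool {
	switch {
	case r == '\t' || r == '\n' || r == '\r':
		return false
	case r < 0x20, r == 0x7F:
		return true
	case r >= 0x80 && r <= 0x9F && r != 0x85:
		return true
	}
	return false
}

// findControlChars scans YAML text and reports every disallowed character
// with its 1-based line and column (columns are counted in runes).
func findControlChars(data []byte) []Hit {
	var hits []Hit
	line, col := 1, 0
	for len(data) > 0 {
		r, size := utf8.DecodeRune(data)
		data = data[size:]
		col++
		if isDisallowed(r) {
			hits = append(hits, Hit{line, col, r})
		}
		if r == '\n' {
			line, col = line+1, 0
		}
	}
	return hits
}

func main() {
	// Default to the file named in the error; another path can be passed as an argument.
	path := "/etc/elastiflow/maxmind/incl_excl.yml"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, h := range findControlChars(data) {
		fmt.Printf("%s:%d:%d: control character U+%04X\n", path, h.Line, h.Col, h.Rune)
	}
}
```

Deleting and retyping the reported characters (or re-saving the file as plain UTF-8 from a plain-text editor) should then clear the error.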

Thanks for the updates. Let us know if this is helpful.

Regards,
Dexter

Thanks for your quick updates and help with this!

[2025-02-24T18:19:40,319][INFO ][o.e.c.r.a.AllocationService] [aws-net-ost-elastiflow01] current.health="GREEN" message="Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.ds-.logs-deprecation.elasticsearch-default-2025.02.21-000001][0]]])." previous.health="RED" reason="shards started [[.ds-.logs-deprecation.elasticsearch-default-2025.02.21-000001][0]]"

I am still not seeing any data streams in Index Management, even though I confirmed above that the flow logs are being decoded.

What are the Elasticsearch settings in your flowcoll.yml file?

dxturner@mg-lab:~$ grep EF_OUTPUT_ELASTICSEARCH /etc/elastiflow/flowcoll.yml | grep -v ^#
EF_OUTPUT_ELASTICSEARCH_ADDRESSES: 127.0.0.1:9200
EF_OUTPUT_ELASTICSEARCH_ECS_ENABLE: "false"
EF_OUTPUT_ELASTICSEARCH_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_INDEX_SUFFIX: "mgr"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_OVERWRITE: "false"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_REPLICAS: 0
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_SHARDS: 1
EF_OUTPUT_ELASTICSEARCH_PASSWORD: < deleted >
EF_OUTPUT_ELASTICSEARCH_TIMESTAMP_SOURCE: collect
EF_OUTPUT_ELASTICSEARCH_TLS_CA_CERT_FILEPATH: ""
EF_OUTPUT_ELASTICSEARCH_TLS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_TLS_SKIP_VERIFICATION: "true"
EF_OUTPUT_ELASTICSEARCH_TSDS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_USERNAME: elastic

grep EF_OUTPUT_ELASTICSEARCH /etc/elastiflow/flowcoll.yml | grep -v ^#

EF_OUTPUT_ELASTICSEARCH_ADDRESSES: 127.0.0.1:9200
EF_OUTPUT_ELASTICSEARCH_ECS_ENABLE: "false"
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_REPLICAS: 0
EF_OUTPUT_ELASTICSEARCH_PASSWORD: 'password omitted'
EF_OUTPUT_ELASTICSEARCH_TLS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_TLS_SKIP_VERIFICATION: "true"
EF_OUTPUT_ELASTICSEARCH_TSDS_ENABLE: "true"
EF_OUTPUT_ELASTICSEARCH_USERNAME: elastic

And what do you see if you run:

GET _cat/indices?v&s=index

From the Management > Dev Tools > Console?
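When reading that output, the key question is whether any index name (including data-stream backing indices) contains "elastiflow". The Go sketch below runs that check on pasted `_cat/indices?v` text. It assumes the default column order shown in the header (health, status, index, ...), and `elastiflowIndices` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// elastiflowIndices is a hypothetical helper that scans `_cat/indices?v`
// text and returns the names of indices that contain "elastiflow".
// It assumes the default column order: health, status, index, ...
func elastiflowIndices(catOutput string) []string {
	var names []string
	for _, line := range strings.Split(catOutput, "\n") {
		fields := strings.Fields(line)
		// Skip blank lines, truncated rows and the header row.
		if len(fields) < 3 || fields[2] == "index" {
			continue
		}
		if strings.Contains(fields[2], "elastiflow") {
			names = append(names, fields[2])
		}
	}
	return names
}

func main() {
	sample := `health status index                                          uuid                   pri rep
green  open   .internal.alerts-default.alerts-default-000001 H1Gg9xXqQ0idcRt5GDEqTA   1   0`
	found := elastiflowIndices(sample)
	if len(found) == 0 {
		fmt.Println("no elastiflow indices found")
	} else {
		fmt.Println(strings.Join(found, "\n"))
	}
}
```

An empty result, as with the sample row above, means the collector has not written anything to Elasticsearch yet.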

I see the following output on the right.

health status index                                                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
green  open   .internal.alerts-default.alerts-default-000001                     H1Gg9xXqQ0idcRt5GDEqTA   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-ml.anomaly-detection-health.alerts-default-000001 2cCylaDqTFW28imon93CRQ   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-ml.anomaly-detection.alerts-default-000001        jAJpLw6kRq-cV0jJoNEE-g   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.apm.alerts-default-000001           HL1zgO4MQAqNZ9H7svmfrA   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.logs.alerts-default-000001          tOkim7dJSVClnFlN2X6dvA   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.metrics.alerts-default-000001       GGL6V4TmTjq5fj3ckeLQXQ   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.slo.alerts-default-000001           IExYi9cLTvOCL-VyXrN3Nw   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.threshold.alerts-default-000001     fZeiLOryRp24KUvDKJoHXQ   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-observability.uptime.alerts-default-000001        1lEGf5ctTOCjbB1oSZMB2g   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-security.alerts-default-000001                    IerukUOsRmKm9i6DPvBfIg   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-stack.alerts-default-000001                       zWhu0MEMQUuSQy-T_L-MFA   1   0          0            0       249b           249b         249b
green  open   .internal.alerts-transform.health.alerts-default-000001            _TY41lF4QbWM4FjRTgxAHA   1   0          0   

It looks like none of the elastiflow indices are being created. I also notice that your config does not have
EF_OUTPUT_ELASTICSEARCH_ENABLE: "true"
which is what ‘turns on’ output to Elasticsearch, or
EF_OUTPUT_ELASTICSEARCH_INDEX_TEMPLATE_ENABLE: "true"
which pushes the index templates to Elasticsearch on startup.

Set those, and when starting up, check that you are connecting to Elasticsearch (look for ‘healthcheck’ messages) and that index templates are being sent (look for ‘index template’). Hope this helps.

Wow, who would have thought that EF_OUTPUT_ELASTICSEARCH_ENABLE: "true" would not be set out of the box? It does make sense, though, since you have multiple output options.

Well, once those lines were uncommented and flowcoll was restarted, the lights came on like Johnny 5. I am now seeing indices, data streams, and Log Analytics, plus flow records in the Flows tab in ElastiFlow, so everything seems to be good.

I can't thank you enough for your help with this and for sticking with it!


Glad it’s working! I was wondering if you could do me a favor?

Could you navigate to the Analytics > Discover menu, choose the elastiflow-flow-codex-* data source (it could be ‘ecs’ instead of ‘codex’), expand any one record, and then in the details window choose ‘JSON’ and click the copy icon?

If you could paste the details of that record in an email to me ( dexter at elastiflow dot com) it would be a great help. I’m trying to validate some changes and only have simulated records to work with. Having a “real” record would help me improve the scenarios I’m testing.

Thanks,
Dexter