
Save files based on processing time and not event time #44

Open
KhaoticMind opened this issue Nov 24, 2022 · 0 comments

KhaoticMind commented Nov 24, 2022

Currently the output plugin saves temporary files (the ones that will be sent to ADX) based on the @timestamp field of the events.

When working in a large environment we may have cases where devices aren't fully time-synced and send events close together in time but with very different timestamp values. In this case the plugin ends up creating many small files to send to ADX, which can increase the load on the cluster (many small files being ingested) and increase the cost of the service (many small files trigger many write operations that add up to the total bill).

We were facing this issue in our environment, and by customizing the filter step in Logstash as shown below we forced all events to be written to the file based on the processing time instead of the event time. This helped us cut write-operation costs by 95% (yes, ninety-five percent!).

   filter {
      # Preserve the original event time in a separate field
      mutate {
         copy => { "@timestamp" => "event_timestamp" }
      }
      # Capture the time Logstash processed the event
      ruby {
         code => "event.set('logstash_processed_at', Time.now());"
      }
      # Overwrite @timestamp with the processing time so the output
      # plugin buckets files by processing time, not event time
      mutate {
         copy => { "logstash_processed_at" => "@timestamp" }
      }
      # Drop the temporary helper field
      mutate {
         remove_field => ["logstash_processed_at"]
      }
   }

Before the change we saw many small files (a few kilobytes each) being ingested every minute; now we have just one 100-200 MB file per minute.

While we know that we need to fix the clocks of the devices sending the data, this issue can also happen because of buffering and send delays (network disconnects and the like).
@avneraa and @ag-ramachandran are aware of our situation.
Unfortunately, we didn't have the time to change the plugin code and contribute a PR.
