
Conversation

@nehaljwani

This feature enables users to specify a dead size for log files.
As soon as the size of a log file exceeds the given size, lsf will
stop reading logs from the file. It will resume reading only when
the log file is rotated or renamed.

Changes:

  • config.go:
    Rename deadtime to deadTimeDur to explicitly indicate its usage
    Introduce DeadSize (string) and deadSizeVal (uint64)
  • harvestor.go:
    Apply check for dead size. Stop harvesting once exceeded.
  • logstash-forwarder.go:
    Introduce a global map and mutex for the dead-file db.
  • prospector.go:
    Resume the harvester on dead files only under certain conditions.
  • bytefmt.go:
    Add function to convert a size string to bytes (sketched below)
  • bytefmt_test.go:
    Add unit test cases for the new conversion function
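
For reference, a minimal sketch of what the size-string conversion could look like; the PR's actual bytefmt.go is not shown in this thread, so the function name, the accepted suffixes, and the 1024-based multipliers are all assumptions:

```go
package bytefmt

import (
	"fmt"
	"strconv"
	"strings"
)

// ToBytes converts a human-readable size string such as "5GB" or
// "512K" into a byte count. Bare numbers with no suffix are rejected
// in this sketch; that is a design assumption, not necessarily what
// the PR does.
func ToBytes(s string) (uint64, error) {
	s = strings.TrimSpace(strings.ToUpper(s))

	// Longer suffixes must be checked before their one-letter forms
	// ("KB" before "K" and "B") so the match is unambiguous.
	multipliers := []struct {
		suffix string
		factor uint64
	}{
		{"TB", 1 << 40}, {"GB", 1 << 30}, {"MB", 1 << 20}, {"KB", 1 << 10},
		{"T", 1 << 40}, {"G", 1 << 30}, {"M", 1 << 20}, {"K", 1 << 10},
		{"B", 1},
	}

	for _, m := range multipliers {
		if strings.HasSuffix(s, m.suffix) {
			n, err := strconv.ParseFloat(strings.TrimSpace(strings.TrimSuffix(s, m.suffix)), 64)
			if err != nil {
				return 0, fmt.Errorf("invalid size %q: %v", s, err)
			}
			return uint64(n * float64(m.factor)), nil
		}
	}
	return 0, fmt.Errorf("unrecognized size suffix in %q", s)
}
```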

@jordansissel
Contributor

I haven't looked at the code yet because you haven't signed the CLA.

As for the idea, I don't think logstash-forwarder should stop until some external force rotates lsf's own logfile. If rotation is needed, lsf should do that itself.

@nehaljwani
Author

Oh. @jordansissel, I think I put a confusing message in the commit: s/writing/reading. The 'dead size' parameter lets the user set a threshold on the maximum amount of data to be forwarded for a given log file. For example, if my application writes logs to /var/log/myapp.log and lsf is configured to ship its contents somewhere, then lsf should stop harvesting the file as soon as it crosses the 'dead size' for that file, but should resume from offset 0 when the file is rotated.
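
To make the intended behavior concrete, here is a minimal sketch of the harvester-side check, assuming hypothetical type, field, and method names; the PR's actual harvestor.go is not shown in this thread and may differ:

```go
package main

import "fmt"

// Harvester is a stand-in for lsf's per-file harvester state; the
// field names below are assumptions based on the PR description.
type Harvester struct {
	path        string
	deadSizeVal uint64 // byte threshold parsed from the DeadSize config string; 0 = no limit
}

// pastDeadSize reports whether harvesting should stop: once the read
// offset crosses the configured threshold, the file is treated as dead
// until it is rotated or renamed, at which point the prospector starts
// a fresh harvester from offset 0.
func (h *Harvester) pastDeadSize(offset uint64) bool {
	return h.deadSizeVal != 0 && offset >= h.deadSizeVal
}

func main() {
	h := &Harvester{path: "/var/log/myapp.log", deadSizeVal: 5 << 30} // 5GB
	fmt.Println(h.pastDeadSize(6 << 30)) // true: past the threshold, stop harvesting
}
```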

@jordansissel
Contributor

What is the goal of stopping reading? What consequence does the length of the file being read have?

@nehaljwani
Author

Suppose I have configured my webapp to log at levels INFO and WARN and expect the log file to grow to about 5GB in one day. But due to some edge case, the webapp starts throwing exceptions and keeps doing so every 15 seconds, generating a very large log file. If the content of the log has been much the same for the past, say, 1GB, it isn't worth continuing to ship the same content to Elasticsearch: I end up paying for more network bandwidth and storing redundant data. If 'dead size' is available, I can set a max threshold of about 7.5GB. In the worst case, if ES doesn't help me debug the error, or the logs are in fact legit because of high traffic on my webapp, I can always look at the raw log file and increase the threshold the very next day.
This feature becomes more helpful when I have different webapps running on different servers, all forwarding to a single ES cluster. In such a scenario, if one of them hogs bandwidth, I lose the legit logs generated by the other webapps.
It's rare, but another scenario is that the webapp logs too much unnecessary stuff, or is too verbose, and this somehow escaped notice during code review.

@jordansissel
Contributor

Hmm.. What about instead of a "file length" stopping point, there's some kind of quota based on bytes-over-time, not file offset?

@nehaljwani
Author

Since we run logrotate on the log files every hour, that didn't occur to me for our specific use case. But yes, a quota based on bytes-over-time makes more sense. In that case, the values for 'dead size' would be something like '10GB/24h' or '500MB/1m'. I'll try to implement this and send another pull request. What do you say?
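
As a rough illustration of what parsing such a quota string could look like, here is a sketch in Go. Note that this format was only proposed in the thread and never implemented: Quota and parseQuota are hypothetical names, and ToBytes is assumed to be the size parser sketched earlier, living in the same package.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Quota is a hypothetical bytes-over-time limit such as "10GB/24h".
type Quota struct {
	Bytes  uint64        // maximum bytes to forward per window
	Window time.Duration // the window the byte budget applies to
}

// parseQuota splits "SIZE/DURATION" and delegates to ToBytes (the
// size parser sketched earlier) and time.ParseDuration.
func parseQuota(s string) (Quota, error) {
	parts := strings.SplitN(s, "/", 2)
	if len(parts) != 2 {
		return Quota{}, fmt.Errorf("quota %q must look like SIZE/DURATION", s)
	}
	b, err := ToBytes(parts[0])
	if err != nil {
		return Quota{}, err
	}
	d, err := time.ParseDuration(parts[1]) // accepts "24h", "1m", ...
	if err != nil {
		return Quota{}, err
	}
	return Quota{Bytes: b, Window: d}, nil
}

func main() {
	q, err := parseQuota("10GB/24h")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes per %s\n", q.Bytes, q.Window)
}
```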

@nehaljwani force-pushed the dead-size branch 4 times, most recently from c075549 to 0a10a8b on July 8, 2015 at 17:10
This feature enables users to specify a dead size for log files.
As soon as the size of a log file exceeds the given size, lsf will
stop harvesting logs from that file. It will resume harvesting only
when the log file is rotated or renamed.

Changes:
* config.go:
  Rename deadtime to deadTimeDur to explicitly indicate its usage
  Introduce DeadSize (string) and deadSizeVal (uint64)
* harvestor.go:
  Apply check for dead size. Stop harvesting once exceeded.
* logstash-forwarder.go:
  Introduce a global map and mutex for the dead-file db.
* prospector.go:
  Resume the harvester on dead files only under certain conditions.
* bytefmt.go:
  Add function to convert a size string to bytes
* bytefmt_test.go:
  Add unit test cases for the new conversion function