
Out of memory Exception #45

Open
aleksandre opened this issue May 29, 2018 · 15 comments

@aleksandre

aleksandre commented May 29, 2018

Hi,

I use the log4stash appender in async mode. When the Elasticsearch server is unavailable for too long, I eventually get an OutOfMemoryException.

Is there an existing mechanism to limit the number of log entries the appender keeps in memory?

If there is no existing mechanism, I would propose adding a new parameter, BulkListLimit, that would prevent the exception: ElasticSearchAppender would stop adding new entries once that limit is reached.

To make the solution backward compatible, the parameter would be optional and set to no limit by default.
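For illustration, the proposed parameter could be set in the appender configuration like any other property. Note that BulkListLimit is hypothetical at this point (it is what this issue proposes, not an existing setting), and 50000 is an arbitrary example value:

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <!-- Hypothetical, proposed parameter: cap on buffered log entries -->
        <BulkListLimit>50000</BulkListLimit>

        <!-- The rest of your configuration.... -->
    </appender>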

What do you think?

@urielha
Owner

urielha commented May 30, 2018

Hi @aleksandre,

First, thanks for the issue and the pull request - #46 (Added BulkListLimit parameter).

Unfortunately, it seems to me that this is not the main cause of the OutOfMemoryException.

I want to verify the real problem first.
As you can see in ElasticSearchAppender.cs:209, the bulk is replaced by a brand new one on indexing. Therefore, in my opinion, there shouldn't be any endlessly accumulating bulk.
In addition, there is already a BulkSize mechanism that makes sure to send (and clear) any bulk that exceeds a specific size.

On the other hand, I suspect your exception occurs because log4stash opens requests endlessly without ever closing them (which is also strange, but makes more sense in my opinion).
Note: the relevant code is at ElasticClient.cs:110.

Can you please check these things for me:

  • Post the stack trace of your OutOfMemoryException.
  • Try setting ElasticSearchTimeout to a smaller number (less than 10000).
  • Also try reducing the BulkSize (to less than 5000).
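For reference, both settings are plain appender properties, so a configuration within the suggested ranges might look like the sketch below. The exact values are illustrative starting points, not recommendations:

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <ElasticSearchTimeout>5000</ElasticSearchTimeout>
        <BulkSize>1000</BulkSize>

        <!-- The rest of your configuration.... -->
    </appender>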

I recall several problems with this async mechanism, which is why I suspect this part.

I hope to change it to something more stable (not using those Begin.../End... calls). I will open an issue for that so I can get help.

@aleksandre
Author

Hi @urielha,

Thanks for your fast response. As you asked, here is the stack trace:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Text.StringBuilder.ToString()
   at log4stash.WebElasticClient.PrepareBulk(IEnumerable`1 bulk) in d:\uriel\Programing\C#\log4stash\src\log4stash\ElasticClient\ElasticClient.cs:line 148
   at log4stash.WebElasticClient.PrepareRequest(IEnumerable`1 bulk) in d:\uriel\Programing\C#\log4stash\src\log4stash\ElasticClient\ElasticClient.cs:line 130
   at log4stash.WebElasticClient.IndexBulkAsync(IEnumerable`1 bulk) in d:\uriel\Programing\C#\log4stash\src\log4stash\ElasticClient\ElasticClient.cs:line 109
   at log4stash.ElasticSearchAppender.DoIndexNow() in d:\uriel\Programing\C#\log4stash\src\log4stash\ElasticSearchAppender.cs:line 216

The problem seems to happen in the StringBuilder, even when my system is not using 100% of its RAM.

I used the following parameters to reproduce the problem:

  • ElasticSearchTimeOut = 1000
  • BulkSize = 100

@urielha
Owner

urielha commented May 30, 2018

Emm..

What kind of objects do you usually log? Does it make sense that the object you are trying to log is really huge (in terms of the object represented as a JSON string)?
If the answer is yes, I suppose this has something to do with lots of strings being allocated in the Large Object Heap (you can verify this with tools like PerfMon).

If the answer is no, I suppose we are back to the bad Begin/End async stuff (issue #47).

Anyway,
can you try logging non-async and see if this still occurs, or does your app really need the async feature?
One more thing I thought of - it's a long shot - try setting the SerializeObjects property to false, like so:

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <SerializeObjects>false</SerializeObjects>

        <!-- The rest of your configuration.... -->
    </appender>

@pravinkarthy

Hi @aleksandre,

I'm trying to use log4stash asynchronously as well. Did your PR changes solve the issue?
Also, I would like to know if you are using the "log4net.async" NuGet package for asynchronous logging with log4stash.

Thanks & Regards.

@urielha
Owner

urielha commented Jul 27, 2018

@pravinkarthy, have you encountered this problem too?

@pravinkarthy

Hi, I have not encountered any problems so far. I just wanted to know: if we do not set IndexAsync to false in the appender configuration, will log4stash then operate in async mode? I'm asking because the default constructor sets the IndexAsync property to true.

@urielha
Owner

urielha commented Jul 31, 2018

@pravinkarthy you are right about IndexAsync.
I'm thinking of changing this default behavior, but I'm afraid of breaking users on old versions.
For now, I recommend always specifying the IndexAsync parameter in your configuration.
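For example, pinning the value explicitly rather than relying on the default might look like this (set it to false for synchronous indexing):

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <IndexAsync>true</IndexAsync>

        <!-- The rest of your configuration.... -->
    </appender>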

About the out of memory exception:

I ran some tests and found that the main cause is the bulk size: events accumulate faster than they are cleaned once the connection has failed.

@aleksandre - what is your configuration for BulkSize? Please try reducing it.
Furthermore, I'm working on limiting the number of open connections, but it seems to have a really minor effect compared to reducing the BulkSize.

@aleksandre
Author

aleksandre commented Jul 31, 2018

Hi @urielha and @pravinkarthy ,

I only got an OutOfMemoryException while doing performance testing on the appender. With a bulk size of 1000, I had to be very aggressive with the tests to trigger the error.

I have been using version 2.1.0 in production for about a month and it has been very stable so far. I have a few applications logging over 1 million requests per day.

In my case, the requests are evenly distributed over the day and the volume is low, so memory is not a problem. If your workload is bigger and comes in spikes, you may need to lower the bulk size and reduce the bulk idle timeout, as urielha said.

Otherwise, you would need a new parameter to put a hard limit on the appender cache size.

Here is my config:
Bulksize = 1000
BulkIdleTimeout = 10000
IndexAsync = True

@aleksandre
Author

@pravinkarthy

Yes, the PR solved the issue, but we are not using it because our real life workload does not cause the error.

@pravinkarthy

Thanks a lot for the info @urielha and @aleksandre!

@urielha
Owner

urielha commented Aug 8, 2018

FYI,
I'm planning to add a new configuration property (related commit) so you can limit the bulk size and drop events when the limit is reached.

If you choose to set this to true, choose the BulkSize and BulkIdleTimeout parameters wisely so events won't be dropped when there is no error on the server.

For example, if you create 100 events per second on a normal day and server communication takes 0.1 sec per bulk operation, then with BulkIdleTimeout = 1000 (1 sec) I would set BulkSize to at least 110, since roughly 100 events/sec × 1.1 sec ≈ 110 events can accumulate per send cycle.

@nerumo
Contributor

nerumo commented Oct 13, 2020

If you choose to set this to true - choose wisely the BulkSize and BulkIdleTimeout parameters so it won't drop events when you don't have any error on the server.

I agree that in the current implementation the value of BulkSize is critical. Since dropping isn't necessarily bound to BulkSize, wouldn't it make sense to use a separate "DropLimit"? E.g. I want to set BulkSize to 100, but if more than 20,000 events are queued, I want to start dropping.
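To illustrate the idea, such a split might look like this in configuration. DropLimit is a hypothetical parameter that does not exist in log4stash; the values mirror the numbers in this suggestion:

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <BulkSize>100</BulkSize>
        <!-- Hypothetical: start dropping events once 20,000 are queued -->
        <DropLimit>20000</DropLimit>
    </appender>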

@urielha
Owner

urielha commented Oct 25, 2020

Mmm, I want to make sure I understand:
you want the bulk size sent to Elasticsearch to be 100, but the internal accumulating limit to be 20,000?

I think it will be a little challenging to implement but it makes sense.

@nerumo
Contributor

nerumo commented Oct 26, 2020

I looked into the implementation and I agree, it's a bigger change. One option I see is to extend LogBulkSet to keep every "not yet fully sent" bulk and expose an accumulated count of all bulks that aren't fully sent. After each send, a "commit" would happen on the LogBulkSet to finally remove the messages from the bulk. Would you agree with that solution? If so, I'd create an initial merge request.

@urielha
Owner

urielha commented Oct 26, 2020

I'm not sure I fully understood your approach.
Would each send take one bulk from the set per BulkIdleTimeout?
It seems like a reasonable solution, but keep in mind the set could be edited by multiple threads at a time.

One more thing: with a set there is no order, so it could send only newer bulks and "starve" the old ones. Maybe a list/queue is more suitable? (Actually, ConcurrentQueue seems like the most suitable one.)

If I'm wrong please elaborate.

Finally, I would be happy if you could create a commit for that, thanks! :)
