
Out of memory Exception #45

Open
aleksandre opened this issue May 29, 2018 · 15 comments

@aleksandre

aleksandre commented May 29, 2018

Hi,

I use the log4stash appender in async mode. When the Elasticsearch server is unavailable for too long, I eventually get an OutOfMemoryException.

Is there an existing mechanism to limit the number of log entries the appender keeps in memory?

If there is no existing mechanism, I would propose adding a new parameter, BulkListLimit, that would prevent the exception: ElasticSearchAppender would stop adding new entries once that limit is reached.

To make the solution backward compatible, the parameter would be optional and set to no limit by default.
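For illustration, the proposed parameter could be set in the appender configuration like any other property. Note that BulkListLimit is hypothetical at this point (it is what this issue proposes, not an existing setting), and 50000 is an arbitrary example value:

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <!-- Hypothetical, proposed parameter: cap on buffered log entries -->
        <BulkListLimit>50000</BulkListLimit>

        <!-- The rest of your configuration.... -->
    </appender>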

What do you think?

@urielha
Owner

urielha commented May 30, 2018

Hi @aleksandre,

First, thanks for the issue and the pull request - #46 (Added BulkListLimit parameter).

Unfortunately, it seems to me that this is not the main cause of the OutOfMemoryException.

I want to verify the real problem first.
As you can see in ElasticSearchAppender.cs:209, the bulk is replaced by a brand new one on indexing. Therefore, in my opinion, there shouldn't be any endlessly accumulating bulk.
In addition, there is already a BulkSize mechanism that makes sure to send (and clear) any bulk that exceeds a specific size.

On the other hand, I suspect your exception occurs because log4stash opens requests endlessly without ever closing them (which is also strange, but makes more sense in my opinion).
Note: the relevant code is at ElasticClient.cs:110.

Can you please check these things for me:

  • Post the stack trace of your OutOfMemoryException.
  • Try setting ElasticSearchTimeout to a smaller number (less than 10000).
  • Also try reducing the BulkSize (to less than 5000).
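For reference, both settings are plain appender properties, so a configuration within the suggested ranges might look like the sketch below. The exact values are illustrative starting points, not recommendations:

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <ElasticSearchTimeout>5000</ElasticSearchTimeout>
        <BulkSize>1000</BulkSize>

        <!-- The rest of your configuration.... -->
    </appender>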

I recall several problems with this async mechanism, which is why I suspect this part.

I hope to change it to something more stable (not using those Begin.../End... calls). I will open an issue for that so I can get help.

@aleksandre
Author

Hi @urielha,

Thanks for your fast response. As you asked, here is the stack trace:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Text.StringBuilder.ToString()
   at log4stash.WebElasticClient.PrepareBulk(IEnumerable`1 bulk) in d:\uriel\Programing\C#\log4stash\src\log4stash\ElasticClient\ElasticClient.cs:line 148
   at log4stash.WebElasticClient.PrepareRequest(IEnumerable`1 bulk) in d:\uriel\Programing\C#\log4stash\src\log4stash\ElasticClient\ElasticClient.cs:line 130
   at log4stash.WebElasticClient.IndexBulkAsync(IEnumerable`1 bulk) in d:\uriel\Programing\C#\log4stash\src\log4stash\ElasticClient\ElasticClient.cs:line 109
   at log4stash.ElasticSearchAppender.DoIndexNow() in d:\uriel\Programing\C#\log4stash\src\log4stash\ElasticSearchAppender.cs:line 216

The problem seems to happen in the StringBuilder, even when my system is not using 100% of its RAM.

I used the following parameters to reproduce the problem:

  • ElasticSearchTimeOut = 1000
  • BulkSize = 100

@urielha
Owner

urielha commented May 30, 2018

Emm..

What kind of objects do you usually log? Does it make sense that the object you are trying to log is really huge (in terms of the object represented as a JSON string)?
If the answer is yes, I suppose this has something to do with lots of strings being allocated in the Large Object Heap (you can verify this with tools like PerfMon).

If the answer is no, I suppose we are back to the bad Begin/End async stuff (issue #47).

Anyway,
can you try logging non-async and see if this still occurs, or does your app really need the async feature?
One more thing I thought of - it's a long shot - try setting the SerializeObjects property to false, like so:

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <SerializeObjects>false</SerializeObjects>

        <!-- The rest of your configuration.... -->
    </appender>

@pravinkarthy

Hi @aleksandre,

I'm trying to use log4stash asynchronously as well. Did your PR changes solve the issue?
Also, I would like to know if you are using the "log4net.async" NuGet package for asynchronous logging with log4stash.

Thanks & Regards.

@urielha
Owner

urielha commented Jul 27, 2018

@pravinkarthy, have you encountered this problem too?

@pravinkarthy

Hi, I have not encountered any problems so far. I just wanted to know: if we do not set IndexAsync to false in the appender configuration, will log4stash then operate in async mode? I'm asking because the default constructor sets the IndexAsync property to true.

@urielha
Owner

urielha commented Jul 31, 2018

@pravinkarthy you are right about IndexAsync.
I'm thinking of changing this default behavior, but I'm afraid of breaking users on old versions.
For now, I recommend always specifying the IndexAsync parameter in your configuration.
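For example, pinning the value explicitly rather than relying on the default might look like this (set it to false for synchronous indexing):

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <IndexAsync>true</IndexAsync>

        <!-- The rest of your configuration.... -->
    </appender>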

About the out of memory exception:

I ran some tests and found that the main cause is the bulk size: events accumulate faster than they are cleaned once the connection has failed.

@aleksandre - what is your configuration for BulkSize? Please try reducing it.
Furthermore, I'm working on limiting the number of open connections, but it seems to have a really minor effect compared to reducing the BulkSize.

@aleksandre
Author

aleksandre commented Jul 31, 2018

Hi @urielha and @pravinkarthy ,

I only got an OutOfMemoryException while doing performance testing on the appender. With a bulk size of 1000, I had to be very aggressive with the tests to trigger the error.

I have been using version 2.1.0 in production for about a month and it has been very stable so far. I have a few applications logging over 1 million requests per day.

In my case, the requests are evenly distributed over the day and the volume is low, so memory is not a problem. If your workload is bigger and comes in spikes, you may need to lower the bulk size and reduce the bulk idle timeout, as urielha said.

Otherwise, you would need a new parameter to put a hard limit on the appender cache size.

Here is my config:
Bulksize = 1000
BulkIdleTimeout = 10000
IndexAsync = True

@aleksandre
Author

@pravinkarthy

Yes, the PR solved the issue, but we are not using it because our real life workload does not cause the error.

@pravinkarthy

Thanks a lot for the info @urielha and @aleksandre!

@urielha
Owner

urielha commented Aug 8, 2018

FYI,
I'm planning to add a new configuration property (related commit) so you can limit the bulk size and drop events when the limit is reached.

If you choose to set this to true, choose the BulkSize and BulkIdleTimeout parameters wisely so events won't be dropped when there is no error on the server.

For example, if you create 100 events per second on a normal day and server communication takes 0.1 sec per bulk operation, then with BulkIdleTimeout = 1000 (1 sec) I would set BulkSize to at least 110, since roughly 100 events/sec × 1.1 sec ≈ 110 events can accumulate per send cycle.

@nerumo
Contributor

nerumo commented Oct 13, 2020

If you choose to set this to true - choose wisely the BulkSize and BulkIdleTimeout parameters so it won't drop events when you don't have any error on the server.

I agree that in the current implementation the value of BulkSize is critical. Since dropping isn't necessarily bound to BulkSize, wouldn't it make sense to use a separate "DropLimit"? E.g. I want to set BulkSize to 100, but if more than 20,000 events are queued, I want to start dropping.
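To illustrate the idea, such a split might look like this in configuration. DropLimit is a hypothetical parameter that does not exist in log4stash; the values mirror the numbers in this suggestion:

    <appender name="ElasticSearchAppender" type="log4stash.ElasticSearchAppender, log4stash">
        <BulkSize>100</BulkSize>
        <!-- Hypothetical: start dropping events once 20,000 are queued -->
        <DropLimit>20000</DropLimit>
    </appender>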

@urielha
Owner

urielha commented Oct 25, 2020

Mmm, I want to make sure I understand:
you want the bulk size sent to Elasticsearch to be 100, but the internal accumulating limit to be 20,000?

I think it will be a little challenging to implement but it makes sense.

@nerumo
Contributor

nerumo commented Oct 26, 2020

I looked into the implementation and I agree, it's a bigger change. One option I see is to extend LogBulkSet to keep every "not yet fully sent" bulk and expose an accumulated count of all bulks that aren't fully sent. After each send, a "commit" would happen on the LogBulkSet to finally remove the messages from the bulk. Would you agree with that solution? If so, I'd create an initial merge request.

@urielha
Owner

urielha commented Oct 26, 2020

I'm not sure I fully understood your approach.
Would each send take one bulk from the set per BulkIdleTimeout?
It seems like a reasonable solution, but keep in mind the set could be edited by multiple threads at a time.

One more thing: with a set there is no order, so it could send only newer bulks and "starve" the old ones. Maybe a list/queue is more suitable? (Actually, ConcurrentQueue seems like the most suitable one.)

If I'm wrong please elaborate.

Finally, I would be happy if you could create a commit for that, thanks! :)
