Really high CPU load over time #1356

dotarmin · 2020-12-18T07:24:05Z

Expected behaviour

Be able to play clips, both long and short, without having to worry about the CPU load.

Current behaviour

When playing shorter clips using v2.3.0 LTS (and also v2.2.0), the CPU load climbs to 90-92% over time and stays there. I have attached some screenshots showing what it looks like. For longer clips we do not see this behaviour.

Shorter clips = around 20 seconds
Longer clips = hours

I think it is related to the number of commands sent rather than to the actual file length, but that is just a theory.

v2.3.0 LTS does not crash when this happens
v2.2.0 does crash when this happens
v2.0.7 works

Commands used (from the automation system)

LOAD
PLAY

LOAD
PLAY

Environment

Server version: v2.3.0 LTS
Operating system: Windows 7 x64
8 DeckLink channels (fill only) configured, but only 2 actively used

Screenshots


TondaKrist · 2021-01-08T14:41:00Z

We are experiencing this too. After some time, the CasparCG 2.3 LTS process gets stuck at 99% CPU and then fails, even after STOPping all layers and then playing on only one.

ronag · 2021-01-08T15:20:47Z

I have seen this too.

ronag · 2021-01-08T15:21:04Z

Does anyone have reliable repro steps?

Julusian · 2021-01-08T15:22:32Z

@scriptorian is able to reproduce this and is having a look into the cause

hummelstrand · 2021-01-08T15:26:48Z

Seems like it can be reproduced by issuing multiple LOAD and PLAY commands over time.

TondaKrist · 2021-01-08T15:30:09Z

Reproducible after multiple PLAY and LOADBG commands over time, as @hummelstrand mentioned, even on a single layer. I will prepare a command log to reproduce it.

scriptorian · 2021-01-20T12:06:06Z

As mentioned I have managed to reproduce this with a test script that repeatedly LOADs a clip onto a channel/layer (using the ffmpeg producer). No PLAY is required to provoke the fault. For testing I have made the script loop every 200ms and this makes the problem apparent in a reasonable amount of time. The first symptom is the process working set increasing linearly, then after a few minutes the CPU load starts increasing too.
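A stress loop like the one described above can be sketched in a few lines of Python. This is a minimal sketch, not scriptorian's actual script: it assumes CasparCG's AMCP text protocol over TCP on the usual port 5250, with one CRLF-terminated command per line and a one-line status reply for LOAD/PLAY. The channel-layer `1-10`, the clip name `AMB`, and the helper names (`amcp_command`, `quote_clip`, `stress_load`, `run_against_server`) are hypothetical placeholders.

```python
import socket
import time
from typing import Callable


def amcp_command(verb: str, *args: str) -> bytes:
    """Build one AMCP command line; AMCP commands end with CRLF."""
    return (" ".join((verb,) + args) + "\r\n").encode("utf-8")


def quote_clip(clip: str) -> str:
    """Wrap clip names containing spaces in double quotes."""
    return f'"{clip}"' if " " in clip else clip


def stress_load(send: Callable[[bytes], None], clip: str, layer: str = "1-10",
                iterations: int = 1000, interval_s: float = 0.2,
                with_play: bool = False) -> int:
    """Repeatedly LOAD (and optionally PLAY) a clip; returns commands sent."""
    sent = 0
    for _ in range(iterations):
        send(amcp_command("LOAD", layer, quote_clip(clip)))
        sent += 1
        if with_play:
            send(amcp_command("PLAY", layer))
            sent += 1
        time.sleep(interval_s)  # 0.2 s mirrors the 200ms loop described above
    return sent


def run_against_server(host: str = "127.0.0.1", port: int = 5250,
                       clip: str = "AMB", iterations: int = 3000) -> None:
    """Drive a live server; 5250 is assumed to be the AMCP port in casparcg.config."""
    with socket.create_connection((host, port)) as sock:
        reader = sock.makefile("rb")

        def send(cmd: bytes) -> None:
            sock.sendall(cmd)
            reader.readline()  # consume the one-line status reply, e.g. "202 LOAD OK"

        stress_load(send, clip=clip, iterations=iterations)
```

Calling `run_against_server()` while watching the server process in Task Manager or Performance Monitor should show whether the working set grows on a given build. Because `stress_load` takes the `send` function as a parameter, the loop can also be exercised without a server by passing something like `list.append`.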

I have analysed the application using various tools and confirmed that it is working well and not leaking any threads or objects on the heap (with the exception of one rare bug that I have addressed - not relevant to this problem) which is great news but frustrating in terms of finding the problem. I recently tried running Windows Performance Analyzer and finally found a clue. By comparing CPU usage early and late in a run it was apparent that an increasing amount of time was spent in the TBB library and with cleaning up thread local storage. With some very simple (and not production ready!) hacking I removed the TBB thread parallel optimisations in the ffmpeg producer and the memory and CPU growth problem disappeared.

I don't believe there is anything wrong with the CasparCG code that uses this library so my next step will be to get an updated version of the TBB library and try again with that. The release notes mention some bugfixes that may be relevant. Intel have now wrapped it into their new oneAPI product and installing that failed for me just now. If anyone here has experience of this library (@ronag?) I'd be grateful for any pointers for how you cooked it / downloaded it last time.

ronag · 2021-01-20T13:26:29Z

Try skipping the custom tbb stuff and use the regular ffmpeg thread pool?

scriptorian · 2021-01-20T14:42:56Z

Thanks @ronag. If you are referring to the override of AVFilterGraph::execute, which currently uses TBB as the custom multithreading implementation, then yes, I have turned this off. The real difference for this problem, though, comes from the tbb::parallel_invoke and tbb::parallel_for_each calls in av_producer and av_util. Removing these stops the problem; removing just one of them halves the rate of growth!

ronag · 2021-01-20T16:06:13Z

For now just remove the tbb stuff. We can follow up with another PR with an updated tbb version later.

ronag · 2021-01-20T16:06:39Z

I don't know how to update tbb at the moment since intel wrapped it into oneAPI.

ronag · 2021-01-20T16:08:07Z

on windows you can also try https://docs.microsoft.com/en-us/cpp/parallel/concrt/how-to-write-a-parallel-for-loop?view=msvc-160

ronag · 2021-01-20T16:08:20Z

Do we know if this problem occurs on Linux?

scriptorian · 2021-01-20T16:22:32Z

Thanks for the suggestions. I've got hold of the latest TBB now and I think the best approach is to push through with trying that. If the problem has gone away then there are no code changes (any TBB interface changes notwithstanding) and Linux should continue to work, hopefully without any problems. Any other approach would require a fair amount of code changes with potentially surprising impacts on performance, and that seems like something to avoid if possible.

TondaKrist · 2021-01-25T12:25:32Z

Sorry, is it something we can fix via some TBB tweaking in Windows, or not?

scriptorian · 2021-01-25T12:30:24Z

I have now downloaded and built with the latest TBB library from the Intel oneAPI product. There were some API changes but dealing with these was straightforward and should be safe.
The good news is that this completely fixed the growing CPU and memory problems. I have left my test script running for a good long time and everything stayed very steady.

TondaKrist · 2021-01-25T12:35:47Z

Awesome, will it be included in future builds of CasparCG? Or could you please provide your build for long-term testing?

scriptorian · 2021-01-25T12:39:29Z

We are just discussing how to progress with testing this change and whether to make a beta version. Does anyone here have any thoughts? I'll update this thread when we have a plan!

hummelstrand · 2021-01-25T19:47:51Z

Please beta test and report any issues here!
https://github.com/CasparCG/server/releases/tag/v2.3.2-lts-beta

dimitry-ishenko · 2021-01-25T21:14:31Z

Is this something to worry about on Linux? (Running NRK version).

scriptorian · 2021-01-26T08:59:38Z

It's not clear whether the TBB bug also exists in the Linux version. The TBB release notes include some mentions of fixing relevant bugs in the Windows version so there is reasonable hope that this problem won't affect Linux.
The updated TBB library is available for Linux so it should be straightforward to make an updated build if problems appear.

hummelstrand · 2021-01-26T09:38:08Z

> Is this something to worry about on Linux? (Running NRK version).

The latest NRK version of CasparCG Server is v2.1, so it is not affected by this bug which seems to have been introduced in v2.2.

dimitry-ishenko · 2021-01-26T15:26:30Z

OK I get it. Thank you @scriptorian and @hummelstrand

martastain · 2021-01-29T10:32:35Z

Just FYI: It seems there is no problem with increasing CPU load on 2.3.2 beta on Windows 10 (yellow lines). There is just a slight memory usage increase over time but from my experience, it will eventually drop.

Green lines belong to a custom 2.3.0 build running on Debian. Both servers use LOADBG/AUTO to play mixed (Linux) and XDCAM HD (Windows) playlists.
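For readers unfamiliar with the LOADBG/AUTO workflow mentioned above, here is a minimal Python sketch of the command sequence an automation system might send for a simple playlist. It assumes the AMCP forms `PLAY <channel-layer> <clip>` and `LOADBG <channel-layer> <clip> AUTO`; the layer `1-10`, the clip names, and the helper `loadbg_auto_schedule` are hypothetical, and this is not how martastain's servers are actually driven.

```python
from typing import List


def quote_clip(clip: str) -> str:
    """Wrap clip names containing spaces in double quotes."""
    return f'"{clip}"' if " " in clip else clip


def loadbg_auto_schedule(layer: str, playlist: List[str]) -> List[str]:
    """Return AMCP commands for a playlist driven by LOADBG ... AUTO.

    The first clip is played directly. Each following clip is cued in the
    background with AUTO so the server switches to it when the foreground
    clip ends. A layer holds only one background item, so a real automation
    system sends each LOADBG after the previous clip has gone to air.
    """
    if not playlist:
        return []
    commands = [f"PLAY {layer} {quote_clip(playlist[0])}"]
    for clip in playlist[1:]:
        commands.append(f"LOADBG {layer} {quote_clip(clip)} AUTO")
    return commands
```

Every item after the first generates one more LOADBG, so a playlist of short clips produces a steady stream of background loads, which matches the "many commands over time" pattern reported earlier in the thread.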

TondaKrist · 2021-01-29T10:41:05Z

I can confirm that this build fixes the CPU usage leak on Windows (both Intel and AMD machines, currently running 24/7 for 5 days).
Thanks guys, awesome job on the investigation and fix.

Unfortunately I have experienced a memory leak on the GPU when HTML template GPU acceleration is enabled. I will start a new thread for that.

ronag · 2021-01-29T11:02:41Z

> Unfortunately I have experienced a memory leak on the GPU when HTML template GPU acceleration is enabled.

I have also encountered this.

dotarmin · 2021-01-29T12:04:16Z

~~@TondaKrist or @ronag, can you please create an issue for this if not already done? Thanks~~

Never mind, already done, thanks!

sendust · 2021-02-01T03:56:45Z

This is off-topic, but the beta version v2.3.2-lts-beta also has audio issues on systems that use 1001-based (NTSC-style) frame rates.
#1326 already has a solution to the audio issue, and I hope users running NTSC can take part in this test.
Thanks!

dotarmin added the type/bug label Dec 27, 2020

Julusian closed this as completed Mar 6, 2023

Julusian mentioned this issue Mar 6, 2023 in Memory leak on loading and playing clips #1214 (closed)
