CPU usage not fully reaching 100% with large vcpu values #52

WMordy · 2024-12-16T16:11:51Z

While using the tool on a virtual machine to gain more computing power I noticed that not all the available CPU is used. The tool is using full CPU on 3 different local machines but they are just personal laptops with minimal setup.

The VM was an an AWS c6a.24xlarge EC2 with 96 vCPU and 192Go of RAM.
CPU usage was around ~21%
i lunched a second process and CPU usage got into ~46%
third process made it until ~65%
I tried a forth process but it was reaching 100% and impacting other processes op/sec.

The operations per seconds reached ~2.100.000 operations/second and this is impressive.
This is more of a experience than an issue since the tool managed to decrypt my 3 documents with 8 digits all capitals ( made some code changes since i didn't know how to pass that as a custom query ) in less than 18 hours.

Thanks guys for the tool.

Kitt3120 · 2024-12-17T14:38:12Z

Same here

Edit:

There is an argument for it, it's just easy to miss it:

pdfrip/crates/cli-interface/src/arguments.rs

Line 63 in 461fe2f

pub number_of_threads: usize,

So add -n <NUMBER_OF_THREADS> or --number-of-threads <NUMBER_OF_THREADS> before specifying your attack method.

WMordy · 2024-12-17T15:01:54Z

I used it actually i set it to 20 if i set it to any larger number the op/sec go bellow the 2M mark. I don't know if the tool reached more than this barrier before ?

Kitt3120 · 2024-12-17T15:05:20Z

I used it actually i set it to 20 if i set it to any larger number the op/sec go bellow the 2M mark. I don't know if the tool reached more than this barrier before ?

You shouldn't exceed the number of threads your CPU has. Mine has 32, and I can see performance improvements until using 32 threads. However, I am unable to reach even 500k/sec. You're reaching 2M/sec with 20 threads?

Edit: Just saw you have 96 vCPU. I think you're reaching some kind of bottleneck, yes. I haven't seen the code, but a guess of mine is that there's a mutex somewhere. So you're hitting a bottleneck that is there to synchronize stuff between threads.

WMordy · 2024-12-17T15:16:48Z

I guess that this is some kind of bottleneck I'm not sure if that's specific to EC2 instances or is it some code issue since I'm not familiar with rust. Using the tool on EC2 spot instances can be a great solution if the tool can use 100% CPU there too.

Pommaq · 2024-12-27T12:07:17Z

If nothing major has changed from when I wrote those parts then this tool is basically composed of 1 producer and multiple consumers. The worker threads are consumers, whilst a singular thread is a producer of passwords.
The idea is the workers get fed a password, attempt to crack the PDF with it, if successful it reports it back and the process ends.
Most likely the singular producer simply can't keep up with that many cores. So an improvement would likely be to allow multiple producers.

IIRC having multiple producers caused issues back when I wrote the code, mostly related to how the source of "candidate passwords" is basically just an iterator, which is sequential in nature. There are room for some potentially easy gains in this project however. IIRC the producer basically grabs 1 candidate before sending it through a pipe towards the workers. This likely involves a mutex somewhere along the line for each call, so perhaps having the producer prepare a "chunk" of candidates and sending them all at once yields better performance. But It'd probably need to be prototyped and tested to confirm that would actually be faster

Pommaq · 2024-12-27T12:31:31Z

Digging into the code in the "engine" crate then my memory was correct. I don't see any "quick and dirty" way to permit sending multiple passwords in each call without risking 1 worker picking up all of them, leaving the others "empty" until the next batch comes around. But perhaps that would be alright since adding some thoughts and prayers might mean they each get their own "batch" before the first worker gobbled up all of it's candidates. This requires our source of password candidates to be fast enough though.

So basically a quick change would be (in the "engine" crate) to change the messaging pipeline to be a Vec<Vec<u8>> (i.e. a list of passwords instead of just 1 password per message). Then have the engine consume X candidate passwords from it's source before sending a message containing them towards the workers. Assuming we set the "size" of each message to 30 passwords I do see some consequences:

Basically assuming we got more than 2 worker threads and our set of "candidates" is fewer than 30 that would basically mean only one worker will ever get to attempt to crack the PDF. i.e. if we want any reasonable performance then the size of our set of candidates should optimally be something like

# Number of workers consuming messages
WORKER_THREADS = 96
CANDIDATES: set[bytes] = {b"password", b"passwor1", ... }

# Maximum number of messages per "message" sent to each worker
MESSAGE_SIZE: int = len(CANDIDATES) // WORKER_THREADS

Sadly automatically calculating MESSAGE_SIZE is probably not a wise idea since there are situations where we would have to simply read and count all of them in order to know, e.g. if our source is a file where each line is a password. That'd likely be super slow assuming the file is large (and they tend to be in these applications...). But perhaps we can make it an argument the user can utilize for tuning.

EDIT: It seems we already count the number of lines when given a file, looking at the implementation of "LineProducer".

Another option is doing what I think @VMordy did and split the password source (a file?) into multiple parts, and run multiple instances of PDFRip in parallel, since it "emulates" having multiple producers. I don't see how we can do this universally for each source of password candidates however

WMordy · 2024-12-27T14:17:27Z

For parallel processing I tweaked in the source code to edit the password generator in the default_query so I made two binaries each one start the generation from a direction and when started in parallel the ETA is logically the half. I'm not familiar with rust but I'm used to tackle such bottlenecks with a partitioned channels pub/sub style where the producer publish passwords and consumers try to do the work. I can see that in this case that the password producer is the main issue but we can work something up by dividing the pool of passwords by first partitioning it by first characters and then let each producer generate a password based on that first character. I'm just sharing my thoughts with you and I want to thank you for this great tool that you have made.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CPU usage not fully reaching 100% with large vcpu values #52

CPU usage not fully reaching 100% with large vcpu values #52

WMordy commented Dec 16, 2024

Kitt3120 commented Dec 17, 2024 •

edited

Loading

WMordy commented Dec 17, 2024

Kitt3120 commented Dec 17, 2024 •

edited

Loading

WMordy commented Dec 17, 2024

Pommaq commented Dec 27, 2024

Pommaq commented Dec 27, 2024 •

edited

Loading

WMordy commented Dec 27, 2024

CPU usage not fully reaching 100% with large vcpu values #52

CPU usage not fully reaching 100% with large vcpu values #52

Comments

WMordy commented Dec 16, 2024

Kitt3120 commented Dec 17, 2024 • edited Loading

WMordy commented Dec 17, 2024

Kitt3120 commented Dec 17, 2024 • edited Loading

WMordy commented Dec 17, 2024

Pommaq commented Dec 27, 2024

Pommaq commented Dec 27, 2024 • edited Loading

WMordy commented Dec 27, 2024

Kitt3120 commented Dec 17, 2024 •

edited

Loading

Kitt3120 commented Dec 17, 2024 •

edited

Loading

Pommaq commented Dec 27, 2024 •

edited

Loading