I have opened a PR to fix the issue: #44
This fixes an issue that can cause the DAWG to allocate several gigabytes of memory on load when the input file is corrupted.
Unfortunately, the original dawgdic library appears to have long since ceased to be maintained, so I am opening a PR against the Python wrapper instead.
To replicate the issue, use the following script:
```python
from dawg import BytesDAWG as dawg

with open('corrupt.dawg', 'w') as f:
    f.write('corrupt!')

try:
    d = dawg().load('corrupt.dawg')
except Exception:
    print('failed, as expected')
```
If you run this under gtime (GNU time) on macOS, you will see somewhere between 4 and 6 GB of RAM being used:
```
$ gtime -v python load_corrupted_dawg
failed, as expected
Command being timed: "python load_corrupted_dawg"
User time (seconds): 2.94
System time (seconds): 3.00
Percent of CPU this job got: 84%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.04
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6256108 <------------------------ SIX GIGABYTES
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 258
Minor (reclaiming a frame) page faults: 1890722
Voluntary context switches: 338
Involuntary context switches: 15360
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
After this PR, this number is greatly reduced:

```
Maximum resident set size (kbytes): 35152
```
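For context, this is the standard defense for length-prefixed binary formats: validate any size field read from the file against the actual file size before allocating a buffer for it. Here is a minimal sketch of that pattern; the function name and header layout are hypothetical, not dawgdic's actual format:

```python
import io
import struct

def read_units(f, file_size):
    """Read a length-prefixed block, rejecting impossible lengths.

    Hypothetical sketch: a 4-byte little-endian unit count followed
    by 4 bytes per unit. A corrupt header can declare billions of
    units, so we check the declared size against what the file can
    actually hold before allocating anything.
    """
    (num_units,) = struct.unpack("<I", f.read(4))
    needed = num_units * 4
    remaining = file_size - f.tell()
    if needed > remaining:
        # Bail out before the allocation, not after.
        raise ValueError("corrupt header: declared size exceeds file size")
    return f.read(needed)
```

Without the check, a loader that trusts `num_units` will happily try to reserve space for whatever count the corrupted bytes happen to decode to, which is exactly the multi-gigabyte allocation seen above.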