Pathspider running out of memory #242

Open
nstudach opened this issue Sep 13, 2018 · 3 comments

Comments

@nstudach
Contributor

nstudach commented Sep 13, 2018

Pathspider fails to finish its run, and it also does not terminate after throwing the out-of-memory error.

I am running Pathspider on a DigitalOcean VM with 2 GB of RAM and Debian 9.
Pathspider is invoked from a Python script using subprocess.call().
Running the example input in examples/webtest.ndjson works flawlessly.
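
For reference, the invocation might look roughly like the sketch below. The actual run_pathspider.py is not reproduced in this issue, so the function names, the optional timeout and the stderr file handling are assumptions; only the command-line arguments are taken from the ps output shown later in this report.

```python
import subprocess


def build_command(input_path, output_path, workers=80, interface="eth0"):
    """Build the pspdr argument list (flags copied from the ps output in this issue)."""
    return [
        "pspdr", "measure",
        "-i", interface,
        "--input", input_path,
        "--output", output_path,
        "-w", str(workers),
        "ecn",
    ]


def run_measurement(input_path, output_path, stderr_path, timeout=None):
    """Run pspdr and return its exit code; its stderr is written to stderr_path.

    timeout (in seconds) is a hypothetical safeguard, not something the
    original script is known to use: subprocess.call() kills the child
    process and raises subprocess.TimeoutExpired once it is exceeded.
    """
    with open(stderr_path, "w") as err:
        return subprocess.call(
            build_command(input_path, output_path),
            stderr=err,
            timeout=timeout,
        )
```

A call such as run_measurement("input.ndjson", "Gm8h6-fra1-ecn", "stderr1.txt") would correspond to the run described here.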

The allocated memory starts at around 200 MB and slowly increases over time.
There have been similar problems with os.fork(): when a new process is created, it may be allocated the same amount of memory as the parent process in case it needs it.
See: https://stackoverflow.com/questions/1367373/python-subprocess-popen-oserror-errno-12-cannot-allocate-memory

The number of open file descriptors always stays around 730. I checked this because exceeding the file descriptor limit can also throw the same error.
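
Memory use and the file descriptor count of a running process can be sampled on Linux with a small helper like the one below. This is only a sketch and not necessarily how the figures above were obtained; the function name is made up, and it assumes a Linux /proc filesystem and permission to read the target process's entries.

```python
import os


def sample_process(pid):
    """Return (rss_kib, open_fds) for a process, read from Linux /proc."""
    rss_kib = None
    with open("/proc/{}/status".format(pid)) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   1766620 kB".
                rss_kib = int(line.split()[1])
                break
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir("/proc/{}/fd".format(pid)))
    return rss_kib, open_fds
```

Calling sample_process() periodically with the PID of the pspdr process would show whether the resident memory keeps growing while the descriptor count stays flat.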

stderr is:

WARNING: Failed to execute tcpdump. Check it is installed and in the PATH
INFO:pathspider:activating spider...
INFO:pathspider:starting pathspider
INFO:pathspider:Creating observer
INFO:pathspider:opening output file Gm8h6-fra1-ecn
ERROR:pathspider:exception occurred. terminating.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pathspider-2.0.0.dev0-py3.5.egg/pathspider/base.py", line 403, in exception_wrapper
    target(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pathspider-2.0.0.dev0-py3.5.egg/pathspider/sync.py", line 46, in configurator
    self.configurations[config](self)
  File "/usr/local/lib/python3.5/dist-packages/pathspider-2.0.0.dev0-py3.5.egg/pathspider/plugins/ecn.py", line 44, in config_ecn
    stderr=subprocess.DEVNULL)
  File "/usr/lib/python3.5/subprocess.py", line 266, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.5/subprocess.py", line 247, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1221, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
INFO:pathspider:terminating pathspider
INFO:observer:processed 6038714 packets (24943 dropped, 0 short, 3535 non-ip) into 1053009 flows (0 ignored)

The relevant files on the machine are:

root@ps-fra1-ecn:~# ls -l

-rw-r--r-- 1 root root       773 Sep 12 21:01 config.json
-rw-r--r-- 1 root root 252236255 Sep 13 15:31 Gm8h6-fra1-ecn
-rw-r--r-- 1 root root 191778315 Sep 12 21:01 input.ndjson
-rw-r--r-- 1 root root      2877 Sep 12 21:00 install_script.py
drwxr-xr-x 9 root root      4096 Sep 12 21:01 pathspider
drwxr-xr-x 6 root root      4096 Sep 12 21:01 python-libtrace
-rw-r--r-- 1 root root      2432 Sep 12 21:00 run_pathspider.py
-rw-r--r-- 1 root root      1432 Sep 13 15:30 stderr1.txt

and the system memory is:

root@ps-fra1-ecn:~# free -h

              total        used        free      shared  buff/cache   available
Mem:           2.0G        1.8G         75M        6.9M        113M         55M
Swap:            0B          0B          0B

Also, pathspider does not terminate after the error occurs:

root@ps-fra1-ecn:~# ps aux | grep pspdr

root      3356  1.9 86.0 3126620 1766620 ?     Sl   Sep12  24:58 /usr/bin/python3 /usr/local/bin/pspdr measure -i eth0 --input input.ndjson --output Gm8h6-fra1-ecn -w 80 ecn
root      3389  0.8  2.0 432556 42500 ?        Sl   Sep12  11:12 /usr/bin/python3 /usr/local/bin/pspdr measure -i eth0 --input input.ndjson --output Gm8h6-fra1-ecn -w 80 ecn
root     21682  0.0  0.0  12720   896 pts/0    S+   18:27   0:00 grep pspdr
@nstudach
Contributor Author

After I tried to access the output file (Gm8h6-fra1-ecn), pathspider terminated.

@irl
Member

irl commented Sep 24, 2018

I guess the easy answer here is "add more memory", which is probably not what you want to hear.

Is anything actually calling os.fork()?

@nstudach
Contributor Author

The current workaround is to split the input file into smaller chunks. I split it into 4 pieces and also removed some information, resulting in an input file size of about 13 MB instead of 200 MB. I tested those pieces with as many as 80 workers and they ran fine.
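
The splitting step could be done along these lines. This is a minimal sketch, not the script that was actually used: the function name and the output file naming are made up, and it only splits the file, without removing any fields from the records.

```python
def split_ndjson(input_path, parts, output_prefix="input_part"):
    """Distribute the lines of an ndjson file round-robin over `parts` files.

    The input is streamed line by line, so it is never loaded into memory
    as a whole. Returns the list of output paths.
    """
    paths = ["{}{}.ndjson".format(output_prefix, i) for i in range(parts)]
    outputs = [open(path, "w") for path in paths]
    try:
        written = 0
        with open(input_path) as source:
            for line in source:
                if not line.strip():
                    continue  # skip blank lines
                outputs[written % parts].write(line)
                written += 1
    finally:
        for out in outputs:
            out.close()
    return paths
```

For example, split_ndjson("input.ndjson", 4) would produce input_part0.ndjson through input_part3.ndjson, each of which can then be passed to pspdr as a separate --input.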

If I recall correctly, os.fork() is used by subprocess.Popen(), but this should be covered in the Stack Overflow discussion.
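
Whatever happens inside Popen, the failure in the traceback surfaces as an OSError with errno 12 (ENOMEM), so a caller can at least recognise it. The sketch below shows one way to do that; the helper names are hypothetical, this is not pathspider's code, and it does nothing about the underlying memory growth.

```python
import errno
import subprocess


def is_out_of_memory(exc):
    """True if exc is an OSError with errno ENOMEM, as in the traceback above."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOMEM


def check_call_reporting_enomem(cmd):
    """Run cmd; return False if the child could not be created due to ENOMEM.

    Other errors, including a non-zero exit status (CalledProcessError),
    are propagated unchanged.
    """
    try:
        subprocess.check_call(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError as exc:
        if is_out_of_memory(exc):
            return False
        raise
    return True
```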
