This repository has been archived by the owner on May 4, 2021. It is now read-only.

Cleaning script filter_hunalign_bitext.py silently fails when running out of memory #18

Open
achimr opened this issue Oct 24, 2017 · 0 comments

achimr commented Oct 24, 2017

When running baseline/filter_hunalign_bitext.py, e.g. like this:

nohup cat en-de.sent | ~/DataCollection/baseline/filter_hunalign_bitext.py - en-de.filtered --lang1 en --lang2 de -cld2 -deleted en-de.deleted 2> filter.log &

and the process runs out of memory, cleaning stops, filter.log stays empty, and no en-de.deleted file is written. The root cause is that the deleted segments are held in memory.

For now, the only indicator of the failure is the missing .deleted file, and the workaround is to allocate more memory and re-run the cleaning.
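
One possible direction for a fix, as a minimal sketch only: write each deleted segment to the -deleted file as soon as it is rejected, instead of collecting the segments and writing them at the end. The names filter_bitext and has_both_sides, and the tab-separated line format, are assumptions for illustration and are not taken from the script's actual code.

```python
import io


def has_both_sides(line):
    """Hypothetical keep-predicate: both tab-separated sides are non-empty."""
    parts = line.rstrip("\n").split("\t")
    return len(parts) == 2 and all(part.strip() for part in parts)


def filter_bitext(lines, kept_out, deleted_out, keep):
    """Route each line to kept_out or deleted_out immediately.

    Nothing is accumulated, so memory use does not grow with the
    number of deleted segments.
    """
    n_kept = n_deleted = 0
    for line in lines:
        if keep(line):
            kept_out.write(line)
            n_kept += 1
        else:
            deleted_out.write(line)
            n_deleted += 1
    return n_kept, n_deleted


if __name__ == "__main__":
    # Small in-memory demonstration; real use would pass open file objects.
    source = ["Hello\tHallo\n", "\tnur Deutsch\n", "Bye\tTschüss\n"]
    kept, deleted = io.StringIO(), io.StringIO()
    print(filter_bitext(source, kept, deleted, has_both_sides))
```

With this approach a partially written .deleted file would also remain on disk if the process is killed, which makes the failure easier to notice than a missing file.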
