Merge pull request #26 from tokuhirom/tokuhirom-patch-1
Use bunzip2 in streaming style
tokuhirom authored Aug 5, 2024
2 parents d62e22e + 7968862 commit 69b69a9
Showing 1 changed file with 2 additions and 5 deletions.
Makefile (7 changes: 2 additions, 5 deletions)
```diff
@@ -12,11 +12,8 @@ test:
 	autopep8 --max-line-length 180 -i *.py */*.py
 	flake8 . --count --exit-zero --max-complexity=30 --max-line-length=1200 --statistics
 
-dat/jawiki-latest-pages-articles.xml.bz2:
-	wget --no-verbose --no-clobber -O dat/jawiki-latest-pages-articles.xml.bz2 https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2
-
-dat/jawiki-latest-pages-articles.xml: dat/jawiki-latest-pages-articles.xml.bz2
-	bunzip2 --keep --force dat/jawiki-latest-pages-articles.xml.bz2
+dat/jawiki-latest-pages-articles.xml:
+	curl -s https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2 | bunzip2 > dat/jawiki-latest-pages-articles.xml
 
 dat/grepped.txt: dat/jawiki-latest-pages-articles.xml
 	grep -E "<title>.*</title>|'''[』|((]" dat/jawiki-latest-pages-articles.xml > dat/grepped.txt
```
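The new rule pipes the download from curl directly into bunzip2, so the compressed .bz2 file is never saved under dat/ and the separate wget target disappears. For anyone who wants the same streaming behavior from the project's Python side, here is a minimal sketch using only the standard library (`urllib.request` and `bz2.BZ2Decompressor`). The names `stream_bunzip2` and `download_and_bunzip2` and the 1 MiB chunk size are hypothetical and not part of this repository. The sketch assumes, as bunzip2 does, that the input may consist of several concatenated bzip2 streams.

```python
import bz2
import io
import urllib.request
from typing import BinaryIO

# Hypothetical read size: each read pulls at most 1 MiB of compressed data.
CHUNK_SIZE = 1 << 20


def stream_bunzip2(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Decompress bzip2 data from src into dst one chunk at a time.

    Mirrors `curl ... | bunzip2 > file`: the compressed input is never
    held in memory or on disk as a whole. Returns the number of
    decompressed bytes written.
    """
    decompressor = bz2.BZ2Decompressor()
    fed = False  # True once the current decompressor has received input
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = chunk
        while data:
            out = decompressor.decompress(data)
            fed = True
            dst.write(out)
            written += len(out)
            if decompressor.eof:
                # Like bunzip2, accept concatenated streams: restart on
                # whatever bytes followed this stream's end marker.
                data = decompressor.unused_data
                decompressor = bz2.BZ2Decompressor()
                fed = False
            else:
                data = b""
    if fed:
        # Input ended before the end-of-stream marker: likely a cut-off download.
        raise EOFError("bzip2 stream ended prematurely")
    return written


def download_and_bunzip2(url: str, out_path: str) -> int:
    """Hypothetical helper: stream-download url and decompress it to out_path."""
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as out:
        return stream_bunzip2(response, out)


if __name__ == "__main__":
    # Offline demo with in-memory data instead of the real jawiki dump.
    sample = b"<title>Example</title>\n" * 1000
    src = io.BytesIO(bz2.compress(sample))
    dst = io.BytesIO()
    stream_bunzip2(src, dst, chunk_size=64)
    assert dst.getvalue() == sample
    print("decompressed", len(dst.getvalue()), "bytes")
```

Raising `EOFError` on a truncated stream is a deliberate choice in this sketch: a download that is cut short should fail loudly rather than leave a partial XML file that the grep rule would then consume as if it were complete.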