ingest-gisaid: use gisaid.ndjson.xz file
Previous commits updated the various scripts called within
`ingest-gisaid` to support the compressed `gisaid.ndjson.xz` file.
Update `ingest-gisaid` to use `gisaid.ndjson.xz` instead of the
uncompressed `gisaid.ndjson` file.
joverlee521 committed Oct 26, 2021
1 parent 3f3e1d4 commit fcba89c
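Because the scripts called by `ingest-gisaid` now accept the compressed file, the data can be inspected as a stream instead of being decompressed to disk first. The following is a minimal sketch. It assumes XZ Utils (`xz`) is installed and that the file holds one JSON record per line. The function `count_ndjson_xz_records` is hypothetical and is not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of ncov-ingest: count the records in an
# xz-compressed NDJSON file without writing the decompressed data to disk.
set -euo pipefail

count_ndjson_xz_records() {
    local path="$1" n
    # xz --decompress --stdout streams the decompressed bytes to stdout;
    # in NDJSON, each newline-terminated line is one record.
    n="$(xz --decompress --stdout -- "$path" | wc -l)"
    # Arithmetic expansion strips any padding that wc may add.
    echo "$(( n ))"
}
```

Called as `count_ndjson_xz_records data/gisaid.ndjson.xz`, it prints the number of fetched records without creating an uncompressed copy of the file.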
Showing 2 changed files with 11 additions and 11 deletions.
README.md: 2 additions, 2 deletions
@@ -6,8 +6,8 @@ Relies on data from https://simplemaps.com/data/us-cities.
 ## Running locally
 If you're using Pipenv (see below), then run commands from `./bin/…` inside a `pipenv shell` or wrapped with `pipenv run ./bin/…`.

-1. Run `./bin/fetch-from-gisaid > data/gisaid.ndjson`
-2. Run `./bin/transform-gisaid data/gisaid.ndjson`
+1. Run `./bin/fetch-from-gisaid > data/gisaid.ndjson.xz`
+2. Run `./bin/transform-gisaid data/gisaid.ndjson.xz`
 3. Look at `data/gisaid/sequences.fasta` and `data/gisaid/metadata.tsv`

 ## Running automatically
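The two local steps in the README hunk above now pass a compressed file from `fetch-from-gisaid` to `transform-gisaid`. Assuming `./bin/fetch-from-gisaid` writes an xz-compressed stream, as the new `.xz` file name implies, an interrupted fetch leaves a truncated archive. `xz --test` can detect that before the transform runs. This is a sketch of an optional check. `is_complete_xz` is a hypothetical helper that neither the README nor the repository provides.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of ncov-ingest: confirm that a
# fetched .xz file is complete before passing it to ./bin/transform-gisaid.
set -euo pipefail

# Succeeds only if the file is a complete, valid .xz stream.
# xz --test decompresses the file, discards the output, and exits non-zero
# on corrupt or truncated input (e.g. a fetch interrupted mid-stream).
is_complete_xz() {
    xz --test -- "$1" 2>/dev/null
}
```

Between the README's steps 1 and 2, this might read `is_complete_xz data/gisaid.ndjson.xz && ./bin/transform-gisaid data/gisaid.ndjson.xz`.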
bin/ingest-gisaid: 9 additions, 9 deletions
@@ -73,35 +73,35 @@ main() {

         while [[ $((++attempt)) -le $max_attempts ]]; do
             echo "Fetch attempt $attempt"
-            if ./bin/fetch-from-gisaid > data/gisaid.ndjson; then
+            if ./bin/fetch-from-gisaid > data/gisaid.ndjson.xz; then
                 break
             else
                 echo "...FAILED"
-                rm data/gisaid.ndjson
+                rm data/gisaid.ndjson.xz
                 sleep 10
             fi
         done
-        if [[ ! -f data/gisaid.ndjson ]]; then
+        if [[ ! -f data/gisaid.ndjson.xz ]]; then
             echo "Failed to fetch"
             exit 1
         fi
         if [[ "$branch" == master ]]; then
-            ./bin/notify-on-record-change data/gisaid.ndjson "$S3_SRC/gisaid.ndjson.xz" "GISAID"
+            ./bin/notify-on-record-change data/gisaid.ndjson.xz "$S3_SRC/gisaid.ndjson.xz" "GISAID"
         fi
-        ./bin/upload-to-s3 --quiet data/gisaid.ndjson "$S3_DST/gisaid.ndjson.xz"
+        ./bin/upload-to-s3 --quiet data/gisaid.ndjson.xz "$S3_DST/gisaid.ndjson.xz"
     else
-        ./bin/download-from-s3 "$S3_DST/gisaid.ndjson.xz" "data/gisaid.ndjson"
+        ./bin/download-from-s3 "$S3_DST/gisaid.ndjson.xz" "data/gisaid.ndjson.xz"
     fi

     flagged_annotations="$(mktemp -t flagged-annotations-XXXXXX)"
     trap "rm -f '$flagged_annotations'" EXIT
-    ./bin/transform-gisaid data/gisaid.ndjson \
+    ./bin/transform-gisaid data/gisaid.ndjson.xz \
         --output-metadata data/gisaid/metadata.tsv \
         --output-fasta data/gisaid/sequences.fasta \
         --output-unix-newline > "$flagged_annotations"

-    # Remove gisaid.ndjson to save disk space.
-    rm data/gisaid.ndjson
+    # Remove gisaid.ndjson.xz to save disk space.
+    rm data/gisaid.ndjson.xz

     # Download old clades
     ./bin/download-from-s3 "$S3_DST/nextclade.tsv.gz" "data/gisaid/nextclade_old.tsv" || \
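The fetch loop in `bin/ingest-gisaid` retries `./bin/fetch-from-gisaid` and deletes the partially written `data/gisaid.ndjson.xz` after each failure. It then treats a missing file as a failed fetch. The sketch below factors that pattern into a reusable function. The name `fetch_with_retry` and its argument order are hypothetical, and diagnostics go to stderr rather than stdout.

```bash
#!/usr/bin/env bash
# Generalized sketch of the fetch-with-retry loop in bin/ingest-gisaid.
# fetch_with_retry is a hypothetical name; it is not defined in ncov-ingest.
set -euo pipefail

# Usage: fetch_with_retry OUTPUT MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND with stdout redirected to OUTPUT, up to MAX_ATTEMPTS times.
# Returns 0 on the first success, 1 if every attempt fails.
fetch_with_retry() {
    local output="$1" max_attempts="$2" delay="$3"
    shift 3
    local attempt=0
    while [[ $((++attempt)) -le $max_attempts ]]; do
        echo "Fetch attempt $attempt" >&2
        if "$@" > "$output"; then
            return 0
        fi
        echo "...FAILED" >&2
        # Delete the partial file so it cannot be mistaken for a good fetch.
        rm -f -- "$output"
        # Unlike the original loop, skip the delay after the final attempt.
        if [[ $attempt -lt $max_attempts ]]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

With illustrative values (the real `max_attempts` is set outside the hunk shown), the loop could become `fetch_with_retry data/gisaid.ndjson.xz 5 10 ./bin/fetch-from-gisaid || { echo "Failed to fetch"; exit 1; }`. Here the function's exit status takes the place of the separate `[[ ! -f data/gisaid.ndjson.xz ]]` check.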
