From fa51cecb937ef70c501cb5060393cdc80e25877c Mon Sep 17 00:00:00 2001
From: "Julien \"uj\" Abadji"
Date: Thu, 2 Feb 2023 10:43:45 +0100
Subject: [PATCH] Update generation-jeanzay.md

---
 docs/generation-jeanzay.md | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/docs/generation-jeanzay.md b/docs/generation-jeanzay.md
index d9e7b68..f4771d7 100644
--- a/docs/generation-jeanzay.md
+++ b/docs/generation-jeanzay.md
@@ -219,7 +219,28 @@ $OSCAR_TOOLS_BIN v2 compress $CORPUS $DST
 
 This step took around 2 hours, going from 12TB to 3.3TB
 
-## Packaging
+## Checksumming
 
-checksum + move into folders
-TODO
+The last step is to create `checksum` files for each language, so that users can verify that their downloads completed successfully.
+These files also act as the split list for [download-oscar](https://pypi.org/project/download-oscar/).
+
+```bash
+#!/bin/bash
+
+#SBATCH --partition=prepost
+#SBATCH --job-name=checksum_oscar # create a short name for your job
+#SBATCH --mail-type=BEGIN,END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
+#SBATCH --mail-user= # Where to send mail
+#SBATCH --nodes="1" # Number of nodes
+#SBATCH --ntasks-per-node="1" # One task per node
+#SBATCH --cpus-per-task="48" # Number of cores to reserve per task
+#SBATCH --time="20:00:00" # Maximum requested run time (HH:MM:SS)
+#SBATCH -A @cpu
+
+export OSCAR_TOOLS_BIN=
+export CORPUS=
+
+"$OSCAR_TOOLS_BIN" v2 checksum "$CORPUS"
+```
+
+This step took around 2 hours.
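
On the download side, checking a language folder against its checksum file could look like the sketch below. It is a minimal sketch under stated assumptions, not the documented download-oscar or oscar-tools workflow: it assumes each language folder holds a checksum file named `checksum.sha256` (a hypothetical name) written in the GNU `sha256sum` line format (`<hex digest>  <file name>`), and that GNU coreutils is installed. The function name `verify_language_dir` is also hypothetical; adapt the file name and format to whatever `$OSCAR_TOOLS_BIN v2 checksum` actually writes.

```bash
#!/usr/bin/env bash
# verify_language_dir DIR  (hypothetical helper)
# Checks every file listed in DIR/checksum.sha256 against its SHA-256 digest.
# ASSUMPTION: the name `checksum.sha256` and the GNU sha256sum line format
# ("<hex digest>  <file name>") are not confirmed by the docs above.
# Exit status: 0 if all listed files match, 1 if any is missing or differs,
# 2 if the checksum file itself is absent.
verify_language_dir() {
  local dir="$1"
  local checksum_file="checksum.sha256"  # hypothetical file name

  if [[ ! -f "$dir/$checksum_file" ]]; then
    echo "error: $dir/$checksum_file not found" >&2
    return 2
  fi

  # Run in a subshell from inside the folder so the relative file names in
  # the checksum file resolve; --quiet prints only the files that FAILED.
  (cd "$dir" && sha256sum --check --quiet "$checksum_file")
}

# Hypothetical usage: verify_language_dir ./fr_meta && echo "fr_meta OK"
```

The subshell leaves the caller's working directory unchanged, and `--quiet` limits output to failures, so a clean run prints nothing and only the exit status needs checking.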