This repository has been archived by the owner on Jun 30, 2021. It is now read-only.

New model upload format #17

Open: wants to merge 2 commits into base: master
82 changes: 47 additions & 35 deletions upload-current.sh
@@ -1,77 +1,89 @@
#!/usr/bin/env bash
#
# Upload trained model s3 in a format compatible with DeepRacer model import functionality.
# Example usage to upload the best model:
# ./upload-current.sh aws-deepracer-XXX model1 -b
#

S3_BUCKET=$1
S3_PREFIX=$2

MODEL_DIR=data/minio/bucket/current/model/

while getopts ":c:" opt; do
echo "Uploading to model ==> s3://$S3_BUCKET/$S3_PREFIX <=="

USE_BEST=false
while getopts ":c:b" opt; do
case $opt in
c) CHECKPOINT="$OPTARG"
;;
\?) echo "Invalid option -$OPTARG" >&2
;;
c) CHECKPOINT="$OPTARG"
;;
b) USE_BEST=true
;;
\?) echo "Invalid option -$OPTARG" >&2
;;
esac
done

CHECKPOINT_FILE=$MODEL_DIR"deepracer_checkpoints.json"
if [ ! -f ${CHECKPOINT_FILE} ]; then
echo "Checkpoint file not found!"
exit 1
else
echo "found checkpoint index file "$CHECKPOINT_FILE
fi;
fi
echo "found checkpoint index file "$CHECKPOINT_FILE

if [ -z "$CHECKPOINT" ]; then
echo "Checkpoint not supplied, checking for latest checkpoint"

LAST_CHECKPOINT=`cat $CHECKPOINT_FILE |jq ".last_checkpoint.name"`
BEST_CHECKPOINT=`cat $CHECKPOINT_FILE |jq ".best_checkpoint.name"`

CHECKPOINT=$LAST_CHECKPOINT

echo "latest checkpoint = "$CHECKPOINT
#echo "Checkpoint not supplied, checking for latest checkpoint"
LAST_CHECKPOINT=`cat $CHECKPOINT_FILE |jq ".last_checkpoint.name" | sed s/\"//g`
mharvan marked this conversation as resolved.
BEST_CHECKPOINT=`cat $CHECKPOINT_FILE |jq ".best_checkpoint.name" | sed s/\"//g`
if $USE_BEST; then
CHECKPOINT=$BEST_CHECKPOINT
echo "Using best checkpoint ==> $CHECKPOINT <=="
else
CHECKPOINT=$LAST_CHECKPOINT
echo "Using latest checkpoint ==> $CHECKPOINT <=="
fi
else
echo "Checkpoint supplied: ["${CHECKPOINT}"]"
fi

MODEL=`echo $CHECKPOINT |sed "s@^[^0-9]*\([0-9]\+\).*@\1@"`
mkdir -p checkpoint
rm -rf checkpoint
cp -a upload-template checkpoint
mkdir -p checkpoint/model
MODEL_FILE=$MODEL_DIR"model_"$MODEL".pb"
METADATA_FILE=$MODEL_DIR"model_metadata.json"


if test ! -f "$MODEL_FILE"; then
echo "$MODEL_FILE doesn't exist"
exit 1
else
cp $MODEL_FILE checkpoint/
fi

if test ! -f "$METADATA_FILE"; then
echo "$METADATA_FILE doesn't exist"
exit 1
else
cp $METADATA_FILE checkpoint/
fi

CHECKPOINT_FILES=`echo $CHECKPOINT* |sed "s/\"//g"`
for i in $( find $MODEL_DIR -type f -name $CHECKPOINT_FILES ); do
cp $i checkpoint/
cp -v $MODEL_FILE checkpoint/model/
cp -v $METADATA_FILE checkpoint/model/

CHECKPOINT_FILES=$MODEL_DIR/${CHECKPOINT}*
Contributor

It seems that, according to the spec, we need the current checkpoint and the previous one (two checkpoint sets). So either the previous checkpoint should also be copied, or simply all checkpoints (there are two sets per last and best anyway), i.e. CHECKPOINT_FILES=$MODEL_DIR/*.ckpt.*

Author
@mharvan, Aug 6, 2020

It works for race submission with a single checkpoint. Does something break without 2 checkpoints?

If we really need multiple checkpoints then we should upload all checkpoints.
Uploading all checkpoints takes longer, so I would only upload them all if this is really needed.
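
The two options discussed here (copy only the selected checkpoint, or copy every checkpoint set) could be put behind a single switch. The following is a minimal sketch, not the PR's code: the function `copy_checkpoints` and its `upload_all` argument are hypothetical names, and it assumes checkpoint files follow the `<name>.ckpt.*` pattern mentioned in the review.

```shell
# Sketch only: copy_checkpoints and its arguments are hypothetical,
# not part of upload-current.sh.
copy_checkpoints() {
  local model_dir=$1 checkpoint=$2 dest=$3 upload_all=${4:-false}
  mkdir -p "$dest"
  if [ "$upload_all" = true ]; then
    # Reviewer's suggestion: copy every checkpoint set (*.ckpt.*)
    cp "$model_dir"/*.ckpt.* "$dest"/
  else
    # Author's approach: copy only the files of the selected checkpoint
    cp "$model_dir"/"$checkpoint"* "$dest"/
  fi
}
```

Called as `copy_checkpoints "$MODEL_DIR" "$CHECKPOINT" checkpoint/model`, it copies the single selected checkpoint; passing `true` as a fourth argument copies all of them, at the cost of a longer upload.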

#for i in $( find $MODEL_DIR -type f -name ${CHECKPOINT}\* ); do
for i in $CHECKPOINT_FILES
do
cp -v $i checkpoint/model/
done

VAR1=`cat $CHECKPOINT_FILE |jq ".last_checkpoint = .best_checkpoint"`
VAR2=`echo $VAR1 |jq ".last_checkpoint.name = $CHECKPOINT"`
VAR3=`echo $VAR2 |jq ".best_checkpoint.name = $CHECKPOINT"`
echo $VAR3 >checkpoint/deepracer_checkpoints.json
echo $CHECKPOINT > checkpoint/model/.coach_checkpoint
# File deepracer_checkpoints.json is optional.

# upload files to s3
for filename in checkpoint/*; do
aws s3 cp $filename s3://$S3_BUCKET/$S3_PREFIX/model/
done
# Cleanup upload destination
aws s3 rm --recursive s3://$S3_BUCKET/$S3_PREFIX/
Contributor

This one removes all other folders, which makes AWS unhappy. Change it to: aws s3 rm s3://$S3_BUCKET/$S3_PREFIX/model --recursive

Author

I forgot to include the directory upload-template. So some files were missing and that was causing issues.

I would always clean up the whole prefix to ensure that only newly uploaded files are present. Otherwise, unknown old files could be affecting the import and your model.
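
To make the difference between the two cleanup scopes concrete, here is a small sketch. The helper `cleanup_target` is hypothetical (it is not in the PR); it only prints the S3 URI that would be removed, so the choice can be inspected before anything is deleted.

```shell
# Sketch only: cleanup_target is a hypothetical helper, not part of the PR.
# It prints the S3 URI that would be cleaned before the upload.
cleanup_target() {
  local bucket=$1 prefix=$2 scope=${3:-all}
  if [ "$scope" = model ]; then
    # Reviewer's variant: remove only the model/ folder
    printf 's3://%s/%s/model/\n' "$bucket" "$prefix"
  else
    # Author's variant: remove everything under the prefix
    printf 's3://%s/%s/\n' "$bucket" "$prefix"
  fi
}

# Preview the deletion without removing anything:
# aws s3 rm --recursive --dryrun "$(cleanup_target "$S3_BUCKET" "$S3_PREFIX")"
```

The `--dryrun` flag of `aws s3 rm` lists the objects that would be deleted, which is a cheap way to check that cleaning the whole prefix does not touch files you still need.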


tar -czvf ${CHECKPOINT}-checkpoint.tar.gz checkpoint/*
# Upload files to s3
aws s3 sync checkpoint/ s3://$S3_BUCKET/$S3_PREFIX/
Contributor

Here it should be just: aws s3 sync checkpoint/model s3://$S3_BUCKET/$S3_PREFIX/model

Author

The goal is to always upload a complete model with all required files, not just the model files. That also includes reward_function.py and ip/hyperparameters.json.
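
One way to enforce "always upload a complete model" is to check the staging directory before syncing. The sketch below is an assumption-laden illustration: `missing_upload_files` is a hypothetical helper, and the file list contains only the paths named in this discussion and in the script; the real import may require more.

```shell
# Sketch only: missing_upload_files is a hypothetical helper, not part of the PR.
# It prints every expected file that is absent from the staging directory.
missing_upload_files() {
  local staging_dir=$1 path
  # Paths named in this PR discussion and script; the actual import
  # format may require additional files.
  for path in reward_function.py ip/hyperparameters.json \
              model/model_metadata.json model/.coach_checkpoint; do
    [ -f "$staging_dir/$path" ] || printf '%s\n' "$path"
  done
}
```

The script could then run `missing=$(missing_upload_files checkpoint)` and only call `aws s3 sync` when `$missing` is empty, printing the list otherwise.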

Author

The tar is needed to keep an archive of what was uploaded. Once a new best model is found, SageMaker deletes the old best model, so you would no longer have a copy.
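
A minimal sketch of the backup step, assuming the staging directory should only be removed once the archive is readable. `backup_checkpoint` and its arguments are hypothetical names; the PR itself simply runs `tar -czvf` followed by `rm -rf`.

```shell
# Sketch only: backup_checkpoint is a hypothetical helper, not part of the PR.
# Archives the staging directory, then removes it if the archive can be listed.
backup_checkpoint() {
  local staging_dir=$1 archive_base=$2
  local archive="${archive_base}.tar.gz"
  # Archive relative to the parent so paths inside start with the directory name
  tar -czf "$archive" -C "$(dirname "$staging_dir")" "$(basename "$staging_dir")" || return 1
  # Delete the staging directory only after the archive has been verified
  tar -tzf "$archive" >/dev/null && rm -rf "$staging_dir"
}
```

For example, `backup_checkpoint checkpoint "$CHECKPOINT"` would leave `${CHECKPOINT}.tar.gz` in the working directory as the local copy of what was uploaded.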


# Backup checkpoint
tar -czvf ${CHECKPOINT}.tar.gz checkpoint
rm -rf checkpoint
echo 'done uploading model!'

echo 'done uploading model!'