The CUDL viewer provides a frontend display for the Cambridge Digital Library, allowing you to view books, manuscripts and other content with a full zoomable viewer and detailed metadata display.
It relies on data converted from standard TEI into a JSON format for display; this conversion is done by Cambridge using XSLT transformations. Some example data is included so you can take a look and get started.
The viewer uses Java Spring and Maven; you can find more instructions on getting the dependencies set up in our GitHub documentation. To build and run this you will need to be able to access public packages on GitHub using Maven - full instructions for this are on this page.
Once you have set up the required dependencies you are ready to build and run the CUDL viewer.
First make sure you have a copy of this repository using git:
git clone git@github.com:cambridge-collection/cudl-viewer.git
The sample data is linked as a git submodule, so we need to initialise it and download the data. Do this with the following commands:
git submodule init
git submodule update
Check that the data submodule is present: dl-data-samples should be at the directory:
data/dl-data-samples
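For a quick sanity check that the submodule data was fetched, you can ask git directly:
# Prints the pinned commit and path of the dl-data-samples submodule
git submodule status data/dl-data-samples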
To build the application into a WAR file for running locally, run:
mvn clean package
The WAR file will be created under target/.
To run the viewer:
docker-compose --env-file sample-data.env up
Once it is running you can access the Viewer at http://localhost:8888/.
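As a quick smoke test (assuming curl is installed and the containers have finished starting up), you can check that the Viewer responds:
# Expect an HTTP 200 (or a redirect) from the home page
curl -I http://localhost:8888/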
You may want to take a look at the data under our data directory. You can drag JPG or TIFF images into the folder data/dl-data-samples/iiif-images/dropzone while the application is running to automatically convert them to JP2 for zoomable images via IIIF. You can then reference them in any of the JSON files at data/dl-data-samples/processed-data/cudl-data/json by editing the thumbnailImageURL and IIIFImageURL properties.
NOTE: You will need to restart the application to pick up your changes.
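As an illustrative example (the image filename here is a placeholder for one of your own):
# Drop a source image into the dropzone while the application is running;
# it will be converted to a JP2 automatically
cp ~/Pictures/my-page.jpg data/dl-data-samples/iiif-images/dropzone/
# Then edit one of the files under data/dl-data-samples/processed-data/cudl-data/json
# to point its thumbnailImageURL and IIIFImageURL at the new image, and restart.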
The main theme, website name, colour and images are configured in the data under the data/dl-data-samples/processed-data/ui directory.
NOTE: You will need to restart the application to pick up your changes.
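To see what the sample data provides here, you can simply list the directory (file names will vary between datasets):
# UI configuration and assets used for the site theme
ls data/dl-data-samples/processed-data/ui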
The Viewer is configured in the sample-global.properties file. A template is available at docker/sample-global.properties.
Go through the file, updating defaults as desired. Most of the defaults are acceptable to get the Viewer running, but some must be changed:
cudl-viewer-content.html.path
cudl-viewer-content.images.path
These must point to the html/ and images/ directories in a local checkout of the data you're using (see above for details).
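As a sketch of what to check after editing (the paths in the expected output are placeholders for wherever your data checkout lives):
# Confirm the two content paths point at your local data checkout
grep -E '^cudl-viewer-content\.(html|images)\.path' docker/sample-global.properties
# Expected output along the lines of:
#   cudl-viewer-content.html.path=/path/to/your/data/html
#   cudl-viewer-content.images.path=/path/to/your/data/images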
The viewer uses the separate repository cudl-viewer-ui for all of its JavaScript and CSS dependencies, so this is a good place to look if you want to start customising your viewer instance.
All the JavaScript, CSS etc. used by the viewer comes from cudl-viewer-ui via a Maven dependency. You can make live changes to the UI JavaScript etc. without rebuilding each time by using the UI's devserver. This requires setting the cudl.ui.dev property to true in cudl-global.properties. You can also set cudl.ui.dev.baseUrl if the default of http://localhost:8080/ is not right for you.
Once enabled, the viewer will link to the UI's devserver instead of serving JavaScript/CSS from the dependency JARs.
See the cudl-viewer-ui README for instructions on running the UI devserver.
Download a sample of the CUDL data from S3 (e.g. s3://staging-cul-cudl-data-releases) or from Bitbucket. This should be placed in a separate directory at the same level as the cudl-viewer checkout.
cd ..
git clone git@bitbucket.org:CUDL/cudl-data-releases.git cudl-data-releases
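If you take the S3 route instead, something along these lines should work (you will need AWS credentials with read access, and you may want to restrict the copy to a particular release prefix rather than the whole bucket):
# Copy the data release into a sibling directory of the viewer checkout
aws s3 sync s3://staging-cul-cudl-data-releases ../cudl-data-releases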
To run the viewer:
docker-compose --env-file cudl-data.env up
This configuration uses the properties file at ./docker/cudl-global.properties instead of the sample-global.properties file, so any tweaks to the config when running locally should be made to this file.
When deployed, the Viewer requires cudl-global.properties to exist on the classpath. This file will be excluded from any WAR file generated as it contains the properties that vary between systems (DEV, BETA, LIVE etc.). This file should be copied into the classpath for your web container (e.g. the lib directory in Tomcat).
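For Tomcat that typically amounts to something like the following (assuming CATALINA_HOME points at your Tomcat installation; adjust to however your container's classpath is configured):
# Place cudl-global.properties on Tomcat's common classpath
cp cudl-global.properties "$CATALINA_HOME/lib/"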
You can run the following command to manually create the Docker image, then follow the instructions shown in the ECR repository on AWS to manually upload it.
docker build -t $REPOSITORY_URI_VIEWER:latest -f docker/ui/Dockerfile .
e.g.
docker build -t 563181399728.dkr.ecr.eu-west-1.amazonaws.com/sandbox-cudl-viewer:latest -f docker/ui/Dockerfile .
Then push to the sandbox ECR using e.g.
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 563181399728.dkr.ecr.eu-west-1.amazonaws.com
docker push 563181399728.dkr.ecr.eu-west-1.amazonaws.com/sandbox-cudl-viewer:latest
and to the cul-cudl ECR using
docker tag 563181399728.dkr.ecr.eu-west-1.amazonaws.com/sandbox-cudl-viewer:latest 438117829123.dkr.ecr.eu-west-1.amazonaws.com/cudl/viewer:latest
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 438117829123.dkr.ecr.eu-west-1.amazonaws.com
docker push 438117829123.dkr.ecr.eu-west-1.amazonaws.com/cudl/viewer:latest
The Live site is deployed to AWS ECS. This uses an Apache Tomcat container image to run the WAR file. To deploy a new version of the Viewer to ECS, a new Docker Image will need to be built. Currently images for the Live site are built using AWS CodeBuild.
Infrastructure supporting the CodeBuild project can be built using the Terraform code in the terraform/ subdirectory. This code will create the CodeBuild project, ECR Repository, SSM Parameters and IAM permissions. The CodeBuild project is configured to build the project https://github.com/cambridge-collection/cudl-viewer.git on the main branch.
Commands to run the Terraform from this directory are:
terraform init
terraform plan
terraform apply
CodeBuild will automatically detect the buildspec.yml file contained at the root of this project. This describes the steps that CodeBuild will run as part of a build. Currently CodeBuild will build the WAR file using Maven, build the container image using Docker, and push the image to the ECR repository.
The Maven part of the build is dependent on two files that would normally be located in the user's .m2 directory: toolchains.xml and settings.xml. The file toolchains.xml configures the JDK to be used by CodeBuild. The Maven build is dependent on a JAR file obtained from the GitHub repository https://github.com/orgs/cambridge-collection/packages?repo_name=cudl-viewer-ui. The file settings.xml sets up credentials that allow CodeBuild to authenticate with GitHub to download the required package. The SSM Parameters created by Terraform are used to supply values for the GITHUB_USER and GITHUB_TOKEN environment variables associated with the CodeBuild project, which are referred to in settings.xml. Note that the parameters created by Terraform will likely need to be updated manually to supply actual values. The build has been tested with a classic Personal Access Token with read:packages scope.
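For reference, the relevant part of such a settings.xml usually looks something like the sketch below; the server id is an assumption and must match the repository id declared in the project's pom.xml, with the credentials drawn from the GITHUB_USER and GITHUB_TOKEN environment variables mentioned above.
# Sketch only: write a minimal settings.xml wiring GitHub Packages credentials
# to the build's environment variables
cat > settings.xml <<'EOF'
<settings>
  <servers>
    <server>
      <!-- id must match the repository id in the project's pom.xml -->
      <id>github</id>
      <username>${env.GITHUB_USER}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
  </servers>
</settings>
EOF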
The Docker part of the build refers to the Dockerfile at docker/ui/Dockerfile.
Each build will need to be triggered manually in the AWS Console. Once complete a new tagged version of the image will be available in ECR.
It may be necessary to pull the image from the ECR repository used by CodeBuild in order to push it to another AWS Account for deployment. The new image version will need to be referenced in the image property of an ECS Container Definition to be deployed to a new ECS task.
To pull the image, log in with Docker and pull it using the commands:
aws ecr get-login-password --region $AWS_REGION_A | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID_A.dkr.ecr.$AWS_REGION_A.amazonaws.com
docker pull $AWS_ACCOUNT_ID_A.dkr.ecr.$AWS_REGION_A.amazonaws.com/$ENVIRONMENT_A-cudl-viewer:latest
where AWS_ACCOUNT_ID_A is the AWS Account ID of the source account, ENVIRONMENT_A is the name of the environment prefixed to the ECR Repository, and AWS_REGION_A is the source AWS region.
Then tag the image and push to the alternative ECR repository
docker tag "$AWS_ACCOUNT_ID_A.dkr.ecr.$AWS_REGION_A.amazonaws.com/$ENVIRONMENT_A-cudl-viewer:latest" "$AWS_ACCOUNT_ID_B.dkr.ecr.$AWS_REGION_B.amazonaws.com/cudl/viewer:latest"
aws ecr get-login-password --region $AWS_REGION_B | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID_B.dkr.ecr.$AWS_REGION_B.amazonaws.com
docker push "$AWS_ACCOUNT_ID_B.dkr.ecr.$AWS_REGION_B.amazonaws.com/cudl/viewer:latest"
where AWS_ACCOUNT_ID_B, ENVIRONMENT_B and AWS_REGION_B all refer to the target repository.
As previously noted, the WAR file needs to be deployed to Tomcat with a properties file cudl-global.properties. An example can be seen here. This file also needs to be available to the ECS deployment.
The image is built with an entrypoint.sh script that reads the value of an environment variable S3_URL and uses the AWS CLI to copy the file indicated to the path /etc/cudl-viewer/cudl-global.properties. This will be read by the Viewer application on startup. This can be configured in ECS by setting an environment variable S3_URL in the container definition and giving the ECS task permission to access the S3 location specified.
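In effect, the startup step is roughly equivalent to the following (a sketch of the behaviour described above, not the actual contents of entrypoint.sh):
# Fetch the environment-specific properties file from S3 before the application starts
aws s3 cp "$S3_URL" /etc/cudl-viewer/cudl-global.properties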
We have seen a frequent error during the docker build step, e.g.:
Sending build context to Docker daemon 387.7MB
Step 1/11 : FROM --platform=linux/amd64 tomcat:9.0.30-jdk11-openjdk
toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
Command did not exit successfully docker build -t $REPOSITORY_URI_VIEWER:latest -f docker/ui/Dockerfile . exit status 1
Phase complete: BUILD State: FAILED
As can be seen, this is due to rate limiting by Docker Hub, presumably because the shared IP addresses used by CodeBuild to access Docker Hub lead to a high volume of requests. We have found the build will succeed after a few retries.
Once you have the new image for the viewer in ECR, you can get the SHA value for this image and update the value in terraform.tfvars in the cudl-terraform repository (https://github.com/cambridge-collection/cudl-terraform); see that repo for details on how to apply the changes.
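One way to get the digest is via the AWS CLI (a sketch; the repository name, tag and region here are assumptions based on the examples above):
# Look up the sha256 digest of the image tagged 'latest'
aws ecr describe-images \
  --repository-name cudl/viewer \
  --image-ids imageTag=latest \
  --region eu-west-1 \
  --query 'imageDetails[0].imageDigest' \
  --output text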
Releasing uses the Maven release plugin.
Releases are tagged using the old CARET Ops tag format, which is <server-class>-<date-yyyymmdd><release-in-day>. <release-in-day> is a two-digit number, starting from 00, which increments for each subsequent release on the same day. For example, for the 1st release on 21/06/2024 for a production deployment we'd use production-2024062100.
Note that the last two digits make tags a pain to auto-generate, so you'll have to manually specify the tag value each time you tag.
After committing and testing all changes, switch to the main branch and run:
$ mvn release:prepare
You'll be prompted for the version and tag to use for the release, and the version to use for the next version. Use a tag in the above format for the first two. The third must be 1.0-SNAPSHOT.
After this finishes, you'll have two new commits on your main branch and a new tag in your local repo. You need to push all of these to the CUDL repo. Assuming your CUDL remote is cudl, and your created tag was production-2024062100, you'd run:
$ git push cudl main production-2024062100
Once the release has been tagged you can finish off by deploying the artifacts to the CUDL packages repository in GitHub using the command:
$ mvn release:perform
For more information, see the Documentation.