
Expat Cinema

Expat Cinema lists foreign movies with English subtitles that are screened in cinemas in the Netherlands. It can be found at https://expatcinema.com.

Deploy Cloud

Deploy Prod

A GitHub Action is used to deploy to AWS. The action is triggered by a push to the main branch.

The .env file in cloud/ is only used when running locally. When deploying via CI/CD, the environment variables are set in GitHub under Secrets and variables > Actions > Repository secrets. The .env file is not checked into git, so it isn't available in the CI/CD environment.

Deploy Dev

It's possible to create a dev stage by locally running e.g.

pnpm run synth  # synthesize the cdk stack for dev
pnpm run watch  # watch for changes, deploy to dev
pnpm run deploy # deploy to dev

Scrapers

Scheduled Prod

The scrapers run on a daily schedule defined in the cdk stack in cloud/lib/backend-stack.ts.

Manual Prod

  • Run cd cloud; pnpm run scrapers:prod to run the scrapers on the prod stage; the output is written to output/expatcinema-prod-scrapers.json.

Manual Dev

  • Run cd cloud; pnpm run scrapers to run the scrapers on the dev stage; the output is written to output/expatcinema-dev-scrapers.json.

If you want to run only a few scrapers, set the SCRAPERS environment variable in .env to specify which ones to run. After making changes, run pnpm run deploy followed by pnpm run scrapers.

Alternatively, since cdk watch doesn't trigger on .env file changes, when running pnpm run watch you can trigger a deploy by making a change in a .ts file, and afterwards run pnpm run scrapers.
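A minimal .env sketch (the scraper names are only examples taken from elsewhere in this README; use the names from scrapers/index.js):

```shell
# cloud/.env -- run only these two scrapers instead of the full set
SCRAPERS=kinorotterdam,ketelhuis
```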

pnpm run config:scraper can be used to get the lambda function configuration for the scrapers.

Deploy Web

Scheduled Prod

The web app is deployed on a daily schedule using GitHub Actions; the schedule is defined in .github/workflows/web.yml. The schedule is needed so the SSG (static site generator) picks up the latest data from the scrapers.

GitHub Actions is used: web/ uses JamesIves/github-pages-deploy-action to deploy to the gh-pages branch, and in the repository settings GitHub Pages is configured with gh-pages as its source branch, which triggers GitHub's built-in pages-build-deployment workflow.

Manual

The easiest way is to bump the version in web/package.json and push to master. This triggers a GitHub Action that deploys the web app to GitHub Pages. Note there's only a prod stage for the web app.

Running scrapers locally

Multiple scrapers

Note: Currently broken

pnpm run scrapers:local

Stores the output in cloud/output instead of in S3 buckets and DynamoDB.

Use the SCRAPERS environment variable in .env.local to define a comma-separated list of scrapers to run locally, overriding the default set of scrapers in scrapers/index.js.

Single scraper

To call a single scraper, run e.g. LOG_LEVEL=debug pnpm tsx scrapers/kinorotterdam.ts after adding something like the following to the scraper:

if (require.main === module) {
  extractFromMoviePage(
    'https://kinorotterdam.nl/films/cameron-on-film-aliens-1986/',
  ).then(console.log)
}

The LOG_LEVEL=debug prefix makes the scrapers' debug output show up in the console.

Quick local backup

Backup

Creates a backup of the S3 buckets and DynamoDB tables

cd backup/
export STAGE=prod
aws s3 sync s3://expatcinema-scrapers-output-$STAGE expatcinema-scrapers-output-$STAGE --profile casper
aws s3 sync s3://expatcinema-public-$STAGE expatcinema-public-$STAGE --profile casper
aws dynamodb scan --table-name expatcinema-scrapers-analytics-$STAGE --profile casper > expatcinema-scrapers-analytics-$STAGE.json
aws dynamodb scan --table-name expatcinema-scrapers-movie-metadata-$STAGE --profile casper > expatcinema-scrapers-movie-metadata-$STAGE.json

For the DynamoDB tables, it might be better to use the Export to S3 functionality in the AWS Console, as those exports can later be imported using aws dynamodb import-table.

To convert the DynamoDB JSON format to something more readable, use the following commands:

cd backup/
export STAGE=prod
jq -c '.Items[] |
  def dynamodb_to_json:
    if type == "object" then
      if has("S") then .S
      elif has("N") then (.N | tonumber)
      elif has("BOOL") then .BOOL
      elif has("NULL") then null
      elif has("L") then [.L[] | dynamodb_to_json]
      elif has("M") then .M | with_entries(.value |= dynamodb_to_json)
      else .
      end
    else .
    end;
  with_entries(.value |= dynamodb_to_json)
' expatcinema-scrapers-analytics-$STAGE.json > expatcinema-scrapers-analytics-$STAGE-converted.json

jq -c '.Items[] |
  def dynamodb_to_json:
    if type == "object" then
      if has("S") then .S
      elif has("N") then (.N | tonumber)
      elif has("BOOL") then .BOOL
      elif has("NULL") then null
      elif has("L") then [.L[] | dynamodb_to_json]
      elif has("M") then .M | with_entries(.value |= dynamodb_to_json)
      else .
      end
    else .
    end;
  with_entries(.value |= dynamodb_to_json)
' expatcinema-scrapers-movie-metadata-$STAGE.json > expatcinema-scrapers-movie-metadata-$STAGE-converted.json
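As a sanity check, here's the same conversion applied to a single hand-written item (the field names here are hypothetical, just to show the shape):

```shell
echo '{"Items":[{"scraper":{"S":"kinorotterdam"},"count":{"N":"12"}}]}' |
  jq -c '.Items[] |
    def dynamodb_to_json:
      if type == "object" then
        if has("S") then .S
        elif has("N") then (.N | tonumber)
        elif has("BOOL") then .BOOL
        elif has("NULL") then null
        elif has("L") then [.L[] | dynamodb_to_json]
        elif has("M") then .M | with_entries(.value |= dynamodb_to_json)
        else .
        end
      else .
      end;
    with_entries(.value |= dynamodb_to_json)'
# → {"scraper":"kinorotterdam","count":12}
```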

Restore

The S3 buckets can be restored by running the following commands:

cd backup/
export STAGE=prod
aws s3 sync expatcinema-scrapers-output-$STAGE s3://expatcinema-scrapers-output-$STAGE --profile casper
aws s3 sync expatcinema-public-$STAGE s3://expatcinema-public-$STAGE --profile casper

The DynamoDB tables can be restored by running the following commands. Note that this doesn't batch; it puts the items back one by one, which might be slow for large tables.

cd backup/
export STAGE=prod

jq -c '.Items[]' expatcinema-scrapers-analytics-$STAGE.json | while read -r item; do
  aws dynamodb put-item \
    --table-name expatcinema-scrapers-analytics-$STAGE \
    --item "$item" \
    --profile casper
done

jq -c '.Items[]' expatcinema-scrapers-movie-metadata-$STAGE.json | while read -r item; do
  aws dynamodb put-item \
    --table-name expatcinema-scrapers-movie-metadata-$STAGE \
    --item "$item" \
    --profile casper
done
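As a possible speed-up, aws dynamodb batch-write-item accepts up to 25 items per call. This is a sketch, not run against a live table: the jq filter groups the scanned items into payloads of that size, and the sample input here stands in for the real scan output file.

```shell
# Sample scan output standing in for expatcinema-scrapers-analytics-prod.json
cat > sample-scan.json <<'EOF'
{"Items":[{"id":{"S":"a"}},{"id":{"S":"b"}},{"id":{"S":"c"}}]}
EOF

# One batch-write-item payload per line, at most 25 items each
jq -c --arg table expatcinema-scrapers-analytics-prod '
  def chunks(n): range(0; length; n) as $i | .[$i:$i + n];
  [.Items[] | {PutRequest: {Item: .}}] | chunks(25) | {($table): .}
' sample-scan.json > batches.ndjson

# Each line can then be fed to:
#   aws dynamodb batch-write-item --request-items "$(head -1 batches.ndjson)" --profile casper
```

Failed writes come back in the response's UnprocessedItems field and would need to be retried, which the put-item loop above avoids at the cost of speed.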

Favicon

Chromium

Some scrapers need to run in a real browser, for which we use puppeteer and a lambda layer with Chromium.

Upgrading puppeteer and chromium

pnpm add puppeteer-core@22.6.3 @sparticuz/chromium@^123.0.1
pnpm add -D puppeteer@22.6.3

After installing the new versions of puppeteer and chromium, update the lambda layer in the cdk stack by doing a search and replace on arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda: and bumping the layer version, e.g. from 44 to 45.

Installing Chromium for use by puppeteer-core locally

Run the following command to install Chromium locally:

pnpm run install-chromium

To see if it's correctly installed, open it with pnpm run open-chromium, or see https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for other ways of running it.

Troubleshooting

When running a puppeteer-based scraper locally, e.g. AWS_PROFILE=casper pnpm tsx scrapers/ketelhuis.ts, and you get an error like

Error: Failed to launch the browser process! spawn /tmp/localChromium/chromium/mac_arm-1205129/chrome-mac/Chromium.app/Contents/MacOS/Chromium ENOENT

you need to install Chromium locally: run pnpm run install-chromium, which installs Chromium and updates LOCAL_CHROMIUM_EXECUTABLE_PATH in browser-local-constants.ts to point to the Chromium executable. See https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for more information about running Chromium locally.

To see if it's correctly installed, open it with pnpm run open-chromium
