Commit 71b3d7d

Merge pull request #36 from bendsouza2/documentation/uploader-readme
Documentation/uploader readme
2 parents 0a1ed07 + 43f0d5e commit 71b3d7d

2 files changed

+117
-23
lines changed

README.md

Lines changed: 57 additions & 23 deletions
@@ -1,27 +1,61 @@
11
# YT-Translator
22

33
### Project Overview
4-
The aim of this project is to provide language learning resources. The project was developed with Spanish as the target language to learn and English as the base language. However this can easily be changed in the project configuration to provide language learning resources for other languages. See project setup for details.
5-
6-
The project is still in development and is not yet in maintenance mode.
7-
8-
### Setup
9-
1. Install the requirements to your venv `pip install -r requirements.txt`
10-
2. To be able to import enchant in python we'll need to install the Enchant C library (on Mac)
11-
* Run `brew install enchant`
12-
* Configure the environment variables by opening the shell config file (~/.bashrc or ~/.zshrc) and adding the lines:
13-
* `export DYLD_LIBRARY_PATH="<PATH_TO_ENCHANT_INSTALL>:$DYLD_LIBRARY_PATH"`
14-
* `export ENCHANT_LIBRARY_PATH="<PATH_TO_ENCHANT_INSTALL>"`
15-
3. Configure other environment variables:
16-
* `OPENAI_API_KEY = <YOUR_API_KEY>`
17-
* `AWS_PUBLIC_KEY = <YOUR_KEY>`
18-
* This is the public key for the IAM role.
19-
* `AWS_SECRET_KEY = <YOUR_KEY>`
20-
* This is the secret key for the IAM role.
21-
* `YOUTUBE_CREDENTIALS = <OAUTH_CREDS>`
22-
* This is the OAUTH creds stored in JSON format for accessing the associated YouTube account
23-
4. Run `npm install` to install the Node.js dependencies (echogarden)
24-
5. In the constants.py file:
25-
* Set the LANGUAGE_TO_LEARN variable to the language you want to publish language learning videos for: e.g. `LANGUAGE_TO_LEARN = "es"`
26-
* Set the NATIVE_LANGUAGE variable to the language which should be used as a base language to learn the secondary language from: e.g. `NATIVE_LANGUAGE = "en"`
4+
This project aims to provide free and accessible language learning resources in the form of video content created using LLMs. The project was created with Spanish as the target language to learn, and English as the base language. However, this can easily be customised to provide language learning resources for other languages. I have written [detailed documentation](https://github.com/bendsouza2/yt-translator/tree/main/python/README.md) on the video creation side of the project, which is the place to start if you want to clone the project for your own use case.
275

6+
The video creation element of the project has been completed as above.
7+
8+
The web backend has been completed and the frontend is at the deployment stage.
9+
10+
You can [view examples of the video output here](https://www.youtube.com/channel/UCQjyvCIR9IkG02Q0Wmpz9sQ). New videos are uploaded every day at 12pm UTC.
11+
12+
13+
## Developer Customisation - Video Uploads
14+
15+
The project defaults to creating Spanish language learning content but has been designed to be easily customisable for other languages.
16+
17+
To customise the project and deploy the video creation capabilities, complete the following steps:
18+
19+
1. Clone the repository
20+
21+
2. In the constants.py file:
22+
- Set the LANGUAGE_TO_LEARN variable to the language you want to publish language learning videos for: e.g. `LANGUAGE_TO_LEARN = "es"`
23+
- Set the NATIVE_LANGUAGE variable to the language which should be used as a base language to learn the secondary language from: e.g. `NATIVE_LANGUAGE = "en"`
24+
25+
3. You'll need to authorise the uploader app. You can do this by running the yt_authenticator.py script, which will take you to an OAuth consent screen. Follow the prompts in your browser and click 'Allow all'. The OAuth credentials should automatically be written to your local directory and saved as an environment variable.
26+
27+
28+
4. Build the Docker image. In the base directory, run:
29+
- `docker build -t <IMAGE_NAME> -f python/Dockerfile .`
30+
31+
5. Push the Docker image to ECR. Assuming you have [configured your AWS CLI profile](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html) and set up the ECR repo, run:
32+
- `docker tag <REPO_NAME/IMAGE_NAME> <LINK_TO_REPO/IMAGE_NAME>`
33+
- `docker push <LINK_TO_REPO/IMAGE_NAME>`
34+
35+
6. Create a new Lambda function using the image hosted in ECR.
36+
- Note that the image is built for arm64 architecture, so make sure to specify that in the Lambda configuration
37+
- The default timeout on Lambda is 3 seconds. From my tests the function takes about a minute to run, so you'll need to increase the timeout limit.
38+
39+
7. Make sure the below environment variables have been added to the Lambda function:
40+
- `DB_HOST` = The RDS endpoint for the database
41+
- `DB_USER` = The username with read/write access
42+
- `DB_PASSWORD` = The password for the username
43+
- `DB_NAME` = The name of the database
44+
- `YT_CREDENTIALS` = The oauth credentials, including refresh token and access token
45+
46+
8. Assign an IAM role to the Lambda function which has the following permissions attached:
47+
- AmazonS3FullAccess
48+
- AmazonRDSFullAccess
49+
- AWSLambdaBasicExecutionRole
50+
51+
9. Assuming your RDS instance is within a VPC, your Lambda function needs to be within the same VPC to access it (Lambda functions don't have a static IP, so you can't just add an inbound rule allowing access from the Lambda function's IP).
52+
- This presents another problem: a Lambda function inside a VPC has no internet access by default, so it won't be able to make external API calls. [See the docs](https://repost.aws/knowledge-center/internet-access-lambda-function) on how to allow the Lambda function to make API calls from within a VPC.
53+
54+
10. Set up a trigger for the Lambda function. Mine just runs a cron job using EventBridge.
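
As a rough illustration of step 10, the sketch below builds the schedule expression an EventBridge rule would need for a once-a-day trigger. The `daily_cron_expression` helper is hypothetical and not part of the project. It assumes EventBridge's six-field `cron(minutes hours day-of-month month day-of-week year)` format, which rules evaluate in UTC, and the 12pm UTC upload time mentioned above.

```python
def daily_cron_expression(hour: int, minute: int = 0) -> str:
    """Return an EventBridge cron expression that fires once a day (UTC).

    Hypothetical helper for illustration; not part of the project.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    if not 0 <= minute <= 59:
        raise ValueError(f"minute must be between 0 and 59, got {minute}")
    # Field order: minutes hours day-of-month month day-of-week year.
    # EventBridge requires '?' in one of day-of-month / day-of-week.
    return f"cron({minute} {hour} * * ? *)"


if __name__ == "__main__":
    print(daily_cron_expression(12))  # cron(0 12 * * ? *)
```

The resulting string can be pasted into the schedule expression field when creating the EventBridge rule.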
55+
56+
57+
## Potential Problems
58+
Some problems I encountered in setup or that you might encounter if working with the project for the first time:
59+
60+
* I'm using enchant to verify that the 'word of the day' is real. The Dockerfile handles the install of enchant, but if you're working with a new language, the dictionary for that language may not be pre-installed. You can find a list of [available language dictionaries here](https://cgit.freedesktop.org/libreoffice/dictionaries/tree/). If you need to install a new dictionary, just add a line to the Dockerfile:
61+
- `RUN curl -o <OUTPUT_PATH> <LINK_TO_DICT_FILE>`

python/README.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
## Overview
2+
This readme documents how to implement the video upload functionality. For a general project overview, including website setup, see the [main documentation](https://github.com/bendsouza2/yt-translator).
3+
4+
## Video Creation
5+
Videos are created and uploaded by a Lambda function which runs once a day. This function:
6+
* Interacts with OpenAI (DALL·E) to generate images
7+
* Uses LLMs to create a word of the day and associated sentences/definitions for the word
8+
* Combines audio, text and images to generate a video using moviepy
9+
* Uploads the generated content to YouTube
10+
* Logs the video creation and metadata to a MySQL DB
11+
12+
## Developer Customisation
13+
14+
The project defaults to creating Spanish language learning content but has been designed to be easily customisable for other languages.
15+
16+
To customise the project and deploy the video creation capabilities, complete the following steps:
17+
18+
1. Clone the repository
19+
20+
2. In the constants.py file:
21+
- Set the LANGUAGE_TO_LEARN variable to the language you want to publish language learning videos for: e.g. `LANGUAGE_TO_LEARN = "es"`
22+
- Set the NATIVE_LANGUAGE variable to the language which should be used as a base language to learn the secondary language from: e.g. `NATIVE_LANGUAGE = "en"`
23+
24+
3. You'll need to authorise the uploader app. You can do this by running the yt_authenticator.py script, which will take you to an OAuth consent screen. Follow the prompts in your browser and click 'Allow all'. The OAuth credentials should automatically be written to your local directory and saved as an environment variable.
25+
26+
27+
4. Build the Docker image. In the base directory, run:
28+
- `docker build -t <IMAGE_NAME> -f python/Dockerfile .`
29+
30+
5. Push the Docker image to ECR. Assuming you have [configured your AWS CLI profile](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html) and set up the ECR repo, run:
31+
- `docker tag <REPO_NAME/IMAGE_NAME> <LINK_TO_REPO/IMAGE_NAME>`
32+
- `docker push <LINK_TO_REPO/IMAGE_NAME>`
33+
34+
6. Create a new Lambda function using the image hosted in ECR.
35+
- Note that the image is built for arm64 architecture, so make sure to specify that in the Lambda configuration
36+
- The default timeout on Lambda is 3 seconds. From my tests the function takes about a minute to run, so you'll need to increase the timeout limit.
37+
38+
7. Make sure the below environment variables have been added to the Lambda function:
39+
- `DB_HOST` = The RDS endpoint for the database
40+
- `DB_USER` = The username with read/write access
41+
- `DB_PASSWORD` = The password for the username
42+
- `DB_NAME` = The name of the database
43+
- `YT_CREDENTIALS` = The oauth credentials, including refresh token and access token
44+
45+
8. Assign an IAM role to the Lambda function which has the following permissions attached:
46+
- AmazonS3FullAccess
47+
- AmazonRDSFullAccess
48+
- AWSLambdaBasicExecutionRole
49+
50+
9. Assuming your RDS instance is within a VPC, your Lambda function needs to be within the same VPC to access it (Lambda functions don't have a static IP, so you can't just add an inbound rule allowing access from the Lambda function's IP).
51+
- This presents another problem: a Lambda function inside a VPC has no internet access by default, so it won't be able to make external API calls. [See the docs](https://repost.aws/knowledge-center/internet-access-lambda-function) on how to allow the Lambda function to make API calls from within a VPC.
52+
53+
10. Set up a trigger for the Lambda function. Mine just runs a cron job using EventBridge.
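
A missing variable from step 7 would otherwise only surface when the function runs, so a small check at the top of the handler can fail fast. This is a minimal sketch rather than the project's code: `missing_env_vars` and `REQUIRED_ENV_VARS` are hypothetical names, and the variable list is copied from step 7 above.

```python
import os
from typing import List, Mapping

# Names taken from step 7; adjust if your deployment uses others.
REQUIRED_ENV_VARS = (
    "DB_HOST",
    "DB_USER",
    "DB_PASSWORD",
    "DB_NAME",
    "YT_CREDENTIALS",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the real environment; passing a plain dict makes the check easy to try locally before deploying.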
54+
55+
56+
## Potential Problems
57+
Some problems I encountered in setup or that you might encounter if working with the project for the first time:
58+
59+
* I'm using enchant to verify that the 'word of the day' is real. The Dockerfile handles the install of enchant, but if you're working with a new language, the dictionary for that language may not be pre-installed. You can find a list of [available language dictionaries here](https://cgit.freedesktop.org/libreoffice/dictionaries/tree/). If you need to install a new dictionary, just add a line to the Dockerfile:
60+
- `RUN curl -o <OUTPUT_PATH> <LINK_TO_DICT_FILE>`
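
To find out whether a dictionary is missing before a video run fails, you can ask pyenchant directly. The sketch below is one way you might check, not the project's own code: `has_dictionary` is a hypothetical helper, while `enchant.dict_exists` is pyenchant's function for testing whether a dictionary is available for a language tag. The checker is passed in as a parameter so the function can be tried without Enchant installed.

```python
from typing import Callable, Optional


def has_dictionary(
    language: str,
    dict_exists: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if a spell-check dictionary is available for `language`.

    Hypothetical helper for illustration; not part of the project.
    """
    if dict_exists is None:
        # Imported lazily so this module loads even where Enchant is missing.
        import enchant

        dict_exists = enchant.dict_exists
    return bool(dict_exists(language))
```

Running `has_dictionary("es")` inside the container uses the real Enchant install; a `False` result for your target language suggests the dictionary still needs to be added to the Dockerfile.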
