Contributing Guide
We'd love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow.
All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.
Before opening a pull request to suggest a feature change, please open a GitHub issue to discuss the use case and feature proposal with the project maintainers. Once you are aligned, fork the project, make your improvements, and open a pull request!
Debussy is currently being developed on native Ubuntu/Debian Linux distributions or through Windows Subsystem for Linux (WSL). It may not work properly on other operating systems such as macOS, Windows/Cygwin, or CentOS/Fedora/FreeBSD.
Visual Studio Code (VS Code) or another IDE that supports Python (e.g. PyCharm).
If you're on Windows, use VS Code with Remote Development in WSL.
WARNING: The option "Git: Rebase When Sync" (setting ID git.rebaseWhenSync) must be enabled in File > Preferences > Settings.
If you are on Windows, install WSL 2 and Ubuntu, following Microsoft's tutorial. By default, WSL 2 has almost full access to your machine's resources:
- Access to the entire hard drive.
- Full use of processing resources.
- Up to 80% of available RAM.
- Up to 25% of available memory for swap.
This may not be desirable, since WSL 2 can consume almost every resource on your machine, but you can set limits.
Create a file called .wslconfig in the root of your user folder (e.g. C:\Users\<your_user>) and configure these settings:
[wsl2]
memory=8GB
processors=4
swap=2GB
These are example limits covering the most basic settings; adjust them to the resources available on your machine. For more details, see wsl-2-settings.
To apply these settings, the Linux distributions must be restarted, so we suggest running this command in PowerShell: wsl --shutdown
(This command shuts down all active WSL 2 instances; just open the terminal again to use WSL with the new settings.)
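To confirm that the limits took effect, one option is to inspect what the Linux side actually sees. The sketch below is only an illustration: show_wsl_limits is a hypothetical helper name, not part of WSL or Debussy, and it assumes a Linux environment where nproc, awk, and /proc/meminfo are available.

```shell
# Hypothetical helper: print the CPU count, RAM, and swap visible to this
# Linux environment. Run it inside the WSL 2 distribution after restarting it.
show_wsl_limits() {
    # The path is overridable only so the function can be tried on a sample file.
    local meminfo="${1:-/proc/meminfo}"
    echo "processors: $(nproc)"
    # /proc/meminfo reports sizes in kB; convert to MiB for readability.
    awk '/^MemTotal:/  { printf "memory_mib: %d\n", $2 / 1024 }
         /^SwapTotal:/ { printf "swap_mib: %d\n",   $2 / 1024 }' "$meminfo"
}
```

Call show_wsl_limits with no arguments inside the distribution. The reported memory is typically a little below the configured value, because the kernel reserves part of it.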
- Cloud SDK: on Windows, install it directly from WSL and use the command gcloud init --console-only for the initial setup. (Guide)
With this configuration, you do not need a service account locally; authentication is done with your GCP user.
- (Optional) For specific applications: ask for a GCP service account and JSON access key to be created, then set the GOOGLE_APPLICATION_CREDENTIALS environment variable.
# Create the environment variable:
export GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
# Display the credential path:
echo "$GOOGLE_APPLICATION_CREDENTIALS"
# Display the contents of the credential file:
cat "$GOOGLE_APPLICATION_CREDENTIALS"
Add the export line to your .bashrc file and reload it with source so that the environment variable is available in new sessions.
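One way to add the variable to the startup file without creating duplicates on repeated runs is sketched below. persist_env_var is a hypothetical helper written for this guide, not a Debussy or gcloud command, and it assumes a Bash-compatible shell with grep available.

```shell
# Hypothetical helper: append `export NAME="VALUE"` to a startup file
# unless that exact line is already present.
persist_env_var() {
    local name="$1" value="$2" rc_file="${3:-$HOME/.bashrc}"
    local line="export ${name}=\"${value}\""
    touch "$rc_file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Example call (the path is the placeholder used in this guide):
#   persist_env_var GOOGLE_APPLICATION_CREDENTIALS "path/to/credentials.json"
#   source ~/.bashrc
```

Because the check matches the whole line, running the helper again with a different path appends a second export; the last one in the file wins when the shell starts.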
- Python: comes by default on Linux, including Ubuntu distros on WSL. We currently use Python 3.8.x or 3.9.x. (Download)
To check the installed Python version, run the command python3 --version in the terminal.
- Create a Python virtual environment. To do this, in the chosen directory, run the following commands:
# Create the virtual environment:
python3 -m venv .debussy-env
# Activate the virtual environment:
source .debussy-env/bin/activate
- Alternatively, use virtualenvwrapper to manage your virtual environments:
# (Optional) Check the pip package manager version, and install if necessary:
pip3 --version
sudo apt install python3-pip
# Install virtualenv and virtualenvwrapper:
sudo pip install virtualenv virtualenvwrapper
# virtualenvwrapper configuration:
export WORKON_HOME=~/workspace/.virtualenvs
mkdir -p $WORKON_HOME
source /usr/local/bin/virtualenvwrapper.sh
# Creating the virtual environment:
mkvirtualenv debussy-env
# (Optional) If the virtual environment is not selected automatically:
workon debussy-env
# Add the following commands to your startup file (vim ~/.bashrc):
VIRTUALENVWRAPPER_PYTHON=$(which python3)
export WORKON_HOME=~/workspace/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
# Reload the .bashrc file:
source ~/.bashrc
- To install Debussy, open a terminal window and:
  - Clone the Debussy Concert repo
  - cd into the root directory, where setup.py is located
  - Enter: python setup.py install
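Before installing, it can help to confirm that the interpreter and virtual environment match the steps above. The following is a minimal sketch under stated assumptions: is_supported_python and check_python_env are hypothetical helper names, and the check relies on the VIRTUAL_ENV variable that the activate script exports.

```shell
# Hypothetical helper: succeed only for Python 3.8.x or 3.9.x,
# the versions this guide lists.
is_supported_python() {
    case "$1" in
        3.8.*|3.9.*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Hypothetical helper: report the python3 on PATH and the active virtualenv.
check_python_env() {
    local version
    version="$(python3 -c 'import platform; print(platform.python_version())')" || return 2
    if is_supported_python "$version"; then
        echo "python ${version}: supported"
    else
        echo "python ${version}: not supported (expected 3.8.x or 3.9.x)"
    fi
    # Empty when no environment has been activated in this shell.
    echo "virtualenv: ${VIRTUAL_ENV:-none active}"
}
```

Run check_python_env after activating the environment; if it reports "none active", the install would go to the system interpreter instead.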
If you're on Windows, we recommend using Docker Engine directly through your Ubuntu distro (Native Docker).
Install the pre-requisites:
sudo apt update && sudo apt upgrade
sudo apt remove docker docker-engine docker.io containerd runc
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release
Add the Docker repository to the Ubuntu sources list:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Install Docker Engine:
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
Give your current user permission to run Docker (log out and back in, or restart WSL, for the group change to take effect):
sudo usermod -aG docker $USER
Install Docker Compose:
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
Start the Docker service:
sudo service docker start
This command must be run every time Ubuntu is restarted. If the Docker service is not running, Docker commands fail with this error message:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
If the error persists after the service has started, it is usually related to permissions. To fix it, run the following command:
sudo chmod 666 /var/run/docker.sock
(Optional) In Windows 11 it is possible to specify a default command to be executed whenever WSL starts, which lets the Docker service start automatically. See WSL boot settings.
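Where the boot setting is not available, a small guard can avoid restarting a service that is already up. This is a sketch rather than an official Docker or WSL mechanism: ensure_docker_running is a hypothetical helper, and it assumes that docker info exits with a non-zero status when the daemon is unreachable, as current Docker CLI versions do.

```shell
# Hypothetical helper: start the Docker service only when the daemon
# is not already reachable.
ensure_docker_running() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is not installed" >&2
        return 1
    fi
    # Assumption: `docker info` fails when the client cannot reach the daemon.
    if docker info >/dev/null 2>&1; then
        echo "docker daemon already running"
    else
        echo "starting docker service"
        sudo service docker start
    fi
}
```

If you call this from ~/.bashrc, expect a sudo password prompt whenever the service has to be started.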
For development or trying out Debussy, we recommend using Astro. Follow their official guide to install astro-cli.
With the CLI installed, follow the getting started guide to create a project. Once you have an Astro project configured, you'll need to edit a few files.
First, mount the path to the Debussy Concert repository through the docker-compose.override.yml file, according to Astro's docs. The file will look like this:
version: "2"
services:
  scheduler:
    volumes:
      - /home/user/workspace/debussy_concert/debussy_concert:/usr/local/lib/python3.9/site-packages/debussy_concert
      - /home/user/workspace/debussy_concert/examples:/usr/local/airflow/dags/examples
      - /home/user/workspace/secrets/debussy-develop.json:/auth/debussy-develop.json
      - /home/user/workspace/environment/environment.yaml:/usr/local/airflow/dags/environment.yaml
    environment:
      - GOOGLE_APPLICATION_CREDENTIALS=/auth/debussy-develop.json
      - GCP_PROJECT=gcp-project-id
      - DEBUSSY_CONCERT__DAGS_FOLDER=/usr/local/airflow/dags
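A mount whose host path does not exist is an easy mistake to make in the override file above, so a quick check before starting the project can save a debugging round. The sketch below is an assumption-laden illustration: missing_volume_sources is a hypothetical helper, and it only understands the short host:container volume syntax with absolute host paths, as used in this example.

```shell
# Hypothetical helper: print every host path from `- /host:/container`
# volume entries that does not exist on this machine.
missing_volume_sources() {
    local compose_file="${1:-docker-compose.override.yml}"
    local host_path
    # Keep list items whose value starts with "/" and contains ":",
    # then strip the leading "- " and everything from the first ":" onward.
    # Environment entries (NAME=value) do not start with "/" and are skipped.
    sed -n 's|^[[:space:]]*-[[:space:]]*\(/[^:]*\):.*$|\1|p' "$compose_file" |
    while IFS= read -r host_path; do
        [ -e "$host_path" ] || echo "missing: $host_path"
    done
}
```

Run it from the Astro project directory; no output means every listed host path was found.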
Then, update the packages.txt file:
gcc
g++
unixodbc-dev
Finally, update the requirements.txt file with the dependencies:
mysql-connector-python==8.0.24
pymssql==2.1.5
#pyodbc==4.0.32
google-cloud-datacatalog==3.0.0
google-cloud-datastore==1.11.0
google-cloud-bigquery==2.13.1
google-cloud-pubsub==2.6.1
google-cloud-secret-manager==2.4.0
google-cloud-storage==1.38.0
Inject==4.3.1
yaml-env-var-parser
- make: https://www.gnu.org/software/make/manual/make.html
- Python 3.8.x or 3.9.x: https://wiki.python.org/moin/BeginnersGuide/Download
- gcloud SDK (for interacting with GCP): https://cloud.google.com/sdk/docs/install
- Apache Airflow: https://airflow.apache.org/