Welcome to my go_play GitHub repository, where I showcase my work in data engineering, data science, and machine learning. This repository serves as my ledger for my skills, knowledge, and passion for transforming data into actionable insights and innovative AI solutions. Arm yourself with information!
Each project in this repository is contained in its own directory.
In the scripts and notebooks directories, you'll find various Python scripts and Jupyter notebooks that I've written to perform data analysis, machine learning model training, and other data science tasks.
To use the scripts and notebooks in this repository:
-
Clone the repository:
git clone https://github.com/sarahgetter/go_play.git
-
Navigate to the desired project or script directory.
-
Follow the individual READMEs in each project or script directory for specific instructions on how to run and use them.
This repository includes work done with the following technologies:
- Python
- SQL
- TensorFlow
- PyTorch
- Pandas
- NumPy
- Scikit-learn
- Airflow
- DBT
- Terraform
- Kubernetes
I am a data science and machine learning enthusiast with a passion for turning complex datasets into meaningful stories.
While this is a personal portfolio, I am open to collaborations and contributions that can enhance the projects. Feel free to fork the repository and submit pull requests.
Build the Docker Image
Navigate to the directory containing the Dockerfile.
Run the following command to build the Docker image. You can replace data-science-env with your preferred image name:
docker build -t data-science-env .
Once the image is built, you can start a container using the following command:
docker run -p 8888:8888 data-science-env
This command maps the container's port 8888 to port 8888 on your host machine, allowing you to access Jupyter Notebook through your browser.
Open your web browser and navigate to: http://localhost:8888.
You might need a token to log in, which can be found in the terminal output where you started the Docker container.
Customization: You may want to customize the environment further by adding more packages or configuring Jupyter Notebook settings.
Data Persistence: Consider adding a volume to your Docker run command if you want to persist notebooks or data files outside the container.
docker run -p 8888:8888 -v $(pwd):/usr/src/app data-science-env
Docker Hub: You could also push the built image to a container registry like Docker Hub for easy access and sharing.
This setup provides a basic yet powerful data science environment, suitable for a variety of tasks in machine learning and data analysis.