A project demonstrating the integration of Docker, DuckDB, R, and {targets} for robust and reproducible data pipelines.

philiporlando/docker-duckdb-r

Portable Data Pipelines with Docker, DuckDB, and R

This project showcases the integration of Docker, DuckDB, and R to create efficient and portable data pipelines. By using Docker, we ensure a consistent environment across different machines, while DuckDB provides fast, in-process SQL analytics. The R programming language, along with the {duckdb}, {dbplyr}, and {targets} packages, is used to orchestrate and run the data processing tasks. The {renv} package is used alongside Docker to manage R package dependencies, ensuring reproducibility.
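To make the DuckDB and {dbplyr} combination concrete, here is a minimal sketch of an in-process query from R. It is not taken from this repository: the table name cars, the built-in mtcars data, and the in-memory database are assumptions chosen so the example runs on its own, provided {DBI}, {duckdb}, {dplyr}, and {dbplyr} are installed.

```r
# Sketch only: table name and data are illustrative, not from this repository.
library(DBI)
library(duckdb)
library(dplyr)  # dplyr relies on the installed dbplyr package to translate verbs to SQL

# Open an in-memory DuckDB database; a file path in dbdir would persist it on disk.
con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Copy a small built-in data frame into DuckDB as a table.
dbWriteTable(con, "cars", mtcars)

# Build a lazy query; dbplyr generates the SQL and DuckDB executes it in-process.
summary_tbl <- tbl(con, "cars") |>
  group_by(cyl) |>
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) |>
  arrange(cyl) |>
  collect()  # pull the aggregated rows back into an R tibble

print(summary_tbl)

# Close the connection and shut down the embedded database.
dbDisconnect(con, shutdown = TRUE)
```

Only the aggregated result is materialized in R; the grouping and averaging happen inside DuckDB, which is what keeps this approach practical for data larger than a comfortable in-memory data frame.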

Getting Started

To get started with this project, clone the repository and navigate to the directory:

git clone https://github.com/philiporlando/docker-duckdb-r.git
cd docker-duckdb-r

Building the Docker Image

Build the Docker image with the following command. The build sets up the R environment, installs all dependencies, and prepares the DuckDB database. The initial build may take a few minutes. The image tag docker-duckdb-r used here is an arbitrary choice; any name works as long as the same name is passed to docker run.

docker build -t docker-duckdb-r .

Running the Container

To run the Docker container and launch the {targets} pipeline, use:

docker run docker-duckdb-r

Repository Structure

  • Dockerfile: Defines the Docker image and specifies how the R environment is built.
  • R/: Contains R scripts with function definitions used by {targets}.
  • _targets.R: The target script file that defines the pipeline. See The {targets} R package user manual for more details.
  • data/: Any source data and the DuckDB database file are stored here.
  • tests/: Placeholder for a planned test suite built around {testthat}.
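As an illustration of how a _targets.R file can tie these pieces together, the following is a hypothetical sketch, not the pipeline shipped in this repository. The helper functions write_to_duckdb() and count_rows(), the target names, and the file example.duckdb are all invented for the example; in a layout like the one above, the helpers would normally live in R/ rather than inline.

```r
# _targets.R (sketch). All target, function, and file names are hypothetical.
library(targets)

# Packages that must be installed for the targets below to run.
tar_option_set(packages = c("DBI", "duckdb"))

# Write a data frame into a DuckDB file and return the file path.
write_to_duckdb <- function(df, db_path) {
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_path)
  on.exit(DBI::dbDisconnect(con, shutdown = TRUE))
  DBI::dbWriteTable(con, "cars", df, overwrite = TRUE)
  db_path  # returning the path lets {targets} track the file
}

# Query the DuckDB file and return a one-row data frame with the row count.
count_rows <- function(db_path) {
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_path, read_only = TRUE)
  on.exit(DBI::dbDisconnect(con, shutdown = TRUE))
  DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM cars")
}

# The pipeline: each target depends on the one before it.
list(
  tar_target(raw_data, mtcars),
  tar_target(db_file, write_to_duckdb(raw_data, "example.duckdb"), format = "file"),
  tar_target(row_count, count_rows(db_file))
)
```

With this file in the working directory, targets::tar_make() runs the pipeline and targets::tar_read(row_count) retrieves the final result. Declaring db_file with format = "file" makes {targets} watch the database file itself, so downstream targets rerun when it changes.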
