
strava-datastack

CompatibleWithStrava


Overview

WIP project to visualize Strava data using open-source tooling

🏃🚴 🤝 📊

🧰 Tools planned:

👋 Find me on Strava

Setup

Install required dependencies:

uv sync

uv is the preferred package manager for this project; if you are unfamiliar with uv:

  • uv sync = poetry install
  • uv run my_script.py = poetry run my_script.py = source .venv/bin/activate && python my_script.py
  • uv run dlt --help = source .venv/bin/activate && dlt --help

Strava Auth

Create your own Strava API Application

This is necessary to generate the credentials (e.g. client_id, client_secret, etc.) you will supply to the strava resource in strava.py. Don't worry, creating an application is just a few clicks.

(full details on the process of creating an app are supplied by strava here)

Review App

Once your app has been created, it will come with a refresh token and an access token. For the purposes of this setup (i.e. creating an automated dlt pipeline to export your strava data), these are utterly useless.


  • The initial Access Token expires after 6 hours and will need updating (this is how Strava manages access tokens)
  • The initial Refresh Token is minimally scoped with read access, which does not allow viewing of activity data (there is a separate activity:read scope that must be granted to a token to view activity data)

To get around these issues, we will create a new refresh token with the proper scopes that we can pass to dlt.

dlt can then use this refresh token that has already been authorized to fetch an up-to-date access token whenever it runs.

Generating new refresh token with proper scopes:

(these steps were ripped from the Strava developer docs)

  1. Paste the Client ID from your app into this URL where [REPLACE_WITH_YOUR_CLIENT_ID] is, and specify the required scopes where [INSERT_SCOPES] is:

    https://www.strava.com/oauth/authorize?client_id=[REPLACE_WITH_YOUR_CLIENT_ID]&response_type=code&redirect_uri=http://localhost/exchange_token&approval_prompt=force&scope=[INSERT_SCOPES]
    

    note: in order to get full read access to both your public and private data, these are the necessary scopes

    read_all,activity:read_all,profile:read_all
    
  2. Paste the updated URL into a browser, hit enter, and follow authorization prompts.

  3. After authorization, you should get a "This site can't be reached" error. Copy the authorization code (i.e. the code= value) from the returned URL.


  4. Run a cURL request to generate the refresh token that dlt will use:

    curl -X POST https://www.strava.com/oauth/token \
      -F client_id=YOURCLIENTID \
      -F client_secret=YOURCLIENTSECRET \
      -F code=AUTHORIZATIONCODE \
      -F grant_type=authorization_code

  5. If successful, the response will return a JSON payload containing a refresh_token (save this).
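Steps 1–5 can also be scripted. Here is a rough sketch in Python using only the standard library; the helper names (build_authorize_url, exchange_code) are illustrative and not part of this repo:

```python
import json
from urllib import parse, request

AUTH_URL = "https://www.strava.com/oauth/authorize"
TOKEN_URL = "https://www.strava.com/oauth/token"
SCOPES = "read_all,activity:read_all,profile:read_all"

def build_authorize_url(client_id: str) -> str:
    """Step 1: the authorize URL with your client_id and the full-read scopes filled in."""
    return AUTH_URL + "?" + parse.urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": "http://localhost/exchange_token",
        "approval_prompt": "force",
        "scope": SCOPES,
    })

def exchange_code(client_id: str, client_secret: str, code: str) -> dict:
    """Step 4: exchange the authorization code for a token payload (includes refresh_token)."""
    body = parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
    }).encode()
    with request.urlopen(TOKEN_URL, data=body) as resp:
        return json.load(resp)
```

Open the URL from build_authorize_url(...) in a browser, then pass the code= value from the redirect to exchange_code and save the refresh_token from the returned payload.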

secrets.toml setup

dlt offers various methods for storing connection credentials; shown below is an example secrets.toml for the Strava connection secrets:

[sources.strava.credentials]
access_token_url = "https://www.strava.com/oauth/token"
client_id = "<your_client_id>"
client_secret = "<your_client_secret>"
refresh_token = "<your_refresh_token>"
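Under the hood, dlt uses these credentials to run an OAuth refresh_token grant each time the pipeline starts, fetching a fresh short-lived access token. A rough sketch of that exchange (illustrative names, not this repo's code):

```python
import json
from urllib import parse, request

TOKEN_URL = "https://www.strava.com/oauth/token"

def refresh_grant_body(client_id: str, client_secret: str, refresh_token: str) -> str:
    """Form-encoded body for the refresh_token grant."""
    return parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    })

def fetch_access_token(client_id: str, client_secret: str, refresh_token: str) -> str:
    """Trade the long-lived refresh token for a ~6-hour access token."""
    body = refresh_grant_body(client_id, client_secret, refresh_token).encode()
    with request.urlopen(TOKEN_URL, data=body) as resp:
        return json.load(resp)["access_token"]
```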

Usage

dlt

With credentials defined, the strava-datastack dlt pipeline can be run via:

uv run strava.py

By default, this will just load the last 30 days of data.

This is done because Strava limits read requests to 1000/day. Depending on how many years of Strava data (and how many total activities) you have, you may need to break up your initial historical load across multiple days. It's recommended to start with a few months at first to get a sense of how many requests that is. Fair warning: if your timespan is too large and you hit the daily request limit, the pipeline will error, and you will need to retry loading all of that data (with a smaller time window) the following day.

Historical loads & backfills

To complete a larger, historical load, or run a backfill, you can pass in a --start-date and --end-date.

Initial load with data starting from January 2024*

uv run strava.py --start-date='2024-01-01'

Backfill data starting from January 2024 up until July 2024

uv run strava.py --start-date='2024-01-01' --end-date='2024-07-01'

*Note: if you have already run the pipeline once, you will have a last_value saved in your state, and you will need to provide an --end-date to temporarily override this (if you don't, your pipeline will think it is already up to date and not properly backfill)
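If you need to chunk a multi-year backfill to stay under the daily request limit, you can script the flags above into month-sized windows and run one (or a few) per day. A sketch; the month_windows helper is illustrative and not part of this repo:

```python
from datetime import date

def month_windows(start: date, end: date):
    """Yield (start, end) date pairs covering [start, end) one month at a time."""
    cur = start
    while cur < end:
        # first day of the next month (rolls over the year in December)
        nxt = date(cur.year + (cur.month == 12), cur.month % 12 + 1, 1)
        yield cur, min(nxt, end)
        cur = nxt

# print one backfill command per month-sized window
for lo, hi in month_windows(date(2024, 1, 1), date(2024, 7, 1)):
    print(f"uv run strava.py --start-date='{lo}' --end-date='{hi}'")
```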

You can check this stored last_value at any time by making use of the -v flag:

uv run dlt pipeline -v strava_datastack info

An example of the relevant output as it relates to last_value:

sources:
{
  "strava": {
    "resources": {
      "activities": {
        "incremental": {
          "start_date": {
            "initial_value": "2024-11-16T00:00:00Z",
            "last_value": "2024-12-15T20:48:45Z",
            "unique_hashes": [
              "k3Za3tV8zvNDwCyexCJQ"
            ]
          }
        }
      }
    }
  }
}

Local state:
first_run: False
_last_extracted_at: 2024-12-16 05:00:59.894989+00:00
_last_extracted_hash: QeJ5iEmeeGmB3zs5XxaM1oRvcARLpgsmifRxmxj84Og=
