24 changes: 24 additions & 0 deletions preparation/xylem/.dockerignore
@@ -0,0 +1,24 @@
# Build artifacts
_build/
deps/

# IDE
.vscode/
.idea/
*.iml

# Test
test/

# Docs
doc/

# Git
.git/
.gitignore

# macOS
.DS_Store

# Temp
tmp/
4 changes: 4 additions & 0 deletions preparation/xylem/.formatter.exs
@@ -0,0 +1,4 @@
# Used by "mix format"
[
  inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
]
26 changes: 26 additions & 0 deletions preparation/xylem/.gitignore
@@ -0,0 +1,26 @@
# The directory Mix will write compiled artifacts to.
/_build/

# If you run "mix test --cover", coverage assets end up here.
/cover/

# The directory Mix downloads your dependencies sources to.
/deps/

# Where third-party dependencies like ExDoc output generated docs.
/doc/

# If the VM crashes, it generates a dump, let's ignore it too.
erl_crash.dump

# Also ignore archive artifacts (built via "mix archive.build").
*.ez

# Ignore package tarball (built via "mix hex.build").
xylem-*.tar

# Temporary files, for example, from tests.
/tmp/

# Output data
/priv/data/wikidata/
29 changes: 29 additions & 0 deletions preparation/xylem/Dockerfile
@@ -0,0 +1,29 @@
FROM elixir:1.18.4-otp-27-slim

# Install CA certificates for HTTPS downloads
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Fix this issue on Apple Silicon: https://elixirforum.com/t/unable-to-compile-default-elixir-project-from-the-getting-started-guide/57199/12
ENV ERL_FLAGS="+JPperf true"

ENV MIX_ENV=prod

# Install Hex and rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# Copy the dependency manifests first (for better layer caching)
COPY mix.exs mix.lock ./
RUN mix deps.get && mix deps.compile

# Copy the rest of the code
COPY config config
COPY lib lib

RUN mix compile

VOLUME ["/app/data", "/app/priv"]

ENTRYPOINT ["mix", "xylem.generate"]
45 changes: 45 additions & 0 deletions preparation/xylem/README.md
@@ -0,0 +1,45 @@
# Xylem

Data pipeline for the BaumBie project. It loads tree species data from Wikidata, based on a CSV file of Wikidata IDs.

**Input:** `data/Baumarten-wikidata.csv` (CSV with Wikidata IDs)
**Output:** `priv/data/wikidata/raw/*.ttl` (RDF data in Turtle format)


## Usage with Docker

### Build the image

```bash
docker compose build
```

### Run the pipeline

```bash
# Process all species
docker compose run --rm xylem

# Limit the number of species
docker compose run --rm xylem --limit 10
```

### Options

- `--csv PATH` - Path to the input CSV file (default: `data/Baumarten-wikidata.csv`)
- `--raw PATH` - Directory for the output TTL files (default: `priv/data/wikidata/raw`)
- `--limit N` - Process only N species
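
The options can be combined in one invocation. The following is a minimal sketch, not part of the project: `xylem_cmd` is a hypothetical helper that assembles the documented flags with their default paths and prints the resulting command, so it can be inspected before it is run. It assumes the Compose service is named `xylem`, as in the commands above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of xylem): build the docker compose
# command from the documented flags and print it instead of running it.
xylem_cmd() {
  local limit="${1:-}"   # optional: number of species to process
  local cmd=(docker compose run --rm xylem
    --csv data/Baumarten-wikidata.csv
    --raw priv/data/wikidata/raw)
  # Append --limit only when a value was given.
  if [[ -n "$limit" ]]; then
    cmd+=(--limit "$limit")
  fi
  printf '%s\n' "${cmd[*]}"
}

# Print the command for a trial run over five species.
xylem_cmd 5
```

To execute the printed command instead of only displaying it, copy it into the shell once it looks right.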


## Local development (with Elixir)

```bash
# Install dependencies
mix deps.get

# Run the tests
mix test

# Run the pipeline
mix xylem.generate
```
9 changes: 9 additions & 0 deletions preparation/xylem/config/config.exs
@@ -0,0 +1,9 @@
import Config

config :sparql_client,
  protocol_version: "1.1",
  update_request_method: :direct

config :tesla, :adapter, Tesla.Adapter.Hackney

import_config "#{config_env()}.exs"
1 change: 1 addition & 0 deletions preparation/xylem/config/dev.exs
@@ -0,0 +1 @@
import Config
1 change: 1 addition & 0 deletions preparation/xylem/config/prod.exs
@@ -0,0 +1 @@
import Config
6 changes: 6 additions & 0 deletions preparation/xylem/config/test.exs
@@ -0,0 +1,6 @@
import Config

config :exvcr,
  vcr_cassette_library_dir: "test/fixtures/vcr_cassettes"

config :logger, level: :warning