11 changes: 1 addition & 10 deletions .github/workflows/release.yml
@@ -61,11 +61,6 @@ jobs:
out="${out} catalog-explorer"
fi

# experiment-tracker
if git diff --name-only $BASE $HEAD | grep -q '^src/experiment-tracker/'; then
out="${out} experiment-tracker"
fi

# workspace
if git diff --name-only $BASE $HEAD | grep -q '^src/workspace/'; then
out="${out} workspace"
@@ -109,7 +104,7 @@ jobs:
REG=flintml
OLD=${{ steps.check_version.outputs.old }}
NEW=${{ steps.check_version.outputs.new }}
all=( storage compute-manager experiment-server catalog-explorer experiment-tracker workspace reverse-proxy worker-base )
all=( storage compute-manager experiment-server catalog-explorer workspace reverse-proxy worker-base )
read -r -a changed <<< "${{ steps.detect.outputs.services }}"

for svc in "${all[@]}"; do
@@ -143,10 +138,6 @@ jobs:
CTX="src"
DOCKERFILE="src/catalog-explorer/Dockerfile"
;;
experiment-tracker)
CTX="src/experiment-tracker"
DOCKERFILE="src/experiment-tracker/Dockerfile"
;;
workspace)
CTX="src"
DOCKERFILE="src/workspace/Dockerfile"
15 changes: 11 additions & 4 deletions CHANGELOG.md
@@ -1,15 +1,22 @@
# Changelog

## [0.1.25] - ...
## [0.2.0]
- The objective of this release is to support mounting the Flint Metastore as a POSIX-like filesystem, which in turn reduces the number of Docker volumes required by the Control Plane.
- The Experiment Tracker has been collapsed into the Experiment Server for simplicity and improved robustness around `inotify` events.
- The Experiment Server now stores metrics in the Flint Metastore via JuiceFS, which maintains a metadata database as a SQLite file inside the `storage_meta` mount.
- The Workspace now mounts the Flint Metastore to store workspace files such as notebooks.
- Worker Containers also now mount the Flint Metastore, giving them visibility of workspace files. This enables, for example, executing one notebook from within another.

## [0.1.25]
- Redesigned networking to ensure control plane services communicate directly rather than via `reverse-proxy`. This avoids circular service dependencies in which `reverse-proxy` defines a route for `serviceX` and thus depends on it, while `serviceX` depends on `serviceY` and tries to reach `serviceY` through `reverse-proxy`, which itself depends on `serviceX`.

## [0.1.24] - ...
## [0.1.24]
- Added named volume for `workspace`.

## [0.1.23] - ...
## [0.1.23]
- First distributed release of FlintML.

...

## [0.1.0] - ...
## [0.1.0]
- First build release of FlintML.
4 changes: 2 additions & 2 deletions README.md
@@ -3,7 +3,7 @@
<img width="60%" src="docs/_assets/logo-text.png" alt="FlintML Logo Text" /><br/>

<!-- Badges, all inside the same HTML block -->
<img src="https://img.shields.io/badge/version-v0.1.25-cf051c" alt="Version 0.1.25" />
<img src="https://img.shields.io/badge/version-v0.2.0-cf051c" alt="Version 0.2.0" />
<img src="https://img.shields.io/badge/license-BSL_1.1-blue" alt="License BSL 1.1" />

</br>
@@ -53,7 +53,7 @@ To get a sense of what you can do with FlintML, check out the [Instacart Kaggle

### Data Storage

The `docker-compose.*.yml` in each FlintML release contains the named Docker volumes `storage_data`, `storage_meta`, `experiment_data` and `workspace_data`. If you wish to specify custom volumes, you should create an override `docker-compose.override.yml` and compose it when spinning up flint. See the [docs](https://docs.docker.com/compose/how-tos/multiple-compose-files/merge/).
FlintML ships with its own [Storage](docs/concepts.md#flint-control-plane) service that depends on the mounts `storage_data` and `storage_meta`. If you wish to specify custom volumes, create an override `docker-compose.override.yml` and compose it when spinning up FlintML. See the [docs](https://docs.docker.com/compose/how-tos/multiple-compose-files/merge/).
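
For example, a minimal `docker-compose.override.yml` might remap both named volumes to host bind mounts (a sketch only; the host paths are hypothetical):

```yaml
# docker-compose.override.yml -- a sketch; adjust paths for your host
volumes:
  storage_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/flint/storage_data   # hypothetical host path
  storage_meta:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/flint/storage_meta   # hypothetical host path
```

You would then compose both files, e.g. `docker compose -f docker-compose.build.yml -f docker-compose.override.yml up -d` (substitute the compose file shipped with your release).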

### Environment Variables

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.1.25
0.2.0
10 changes: 5 additions & 5 deletions docs/concepts.md
@@ -4,14 +4,14 @@

FlintML contains all the necessary components to enable end-to-end machine learning workloads. It accomplishes this by running several Docker Compose services, constituting the control plane. The control plane network is managed by an internal [nginx](https://github.com/nginx/nginx) container. The key services are:

- **Storage:** Powered by [Zenko CloudServer](https://github.com/scality/cloudserver), the Storage service houses the *data layer* upon which the [Flint Catalog](#flint-catalog) is built.
- **Workspace:** The Workspace service serves the FlintML user interface (JupyterLab skin + custom extensions.) A custom [KernelProvisioner](https://jupyter-client.readthedocs.io/en/latest/provisioning.html) communicates with the Compute Manager service.
- **Storage:** Powered by [Zenko CloudServer](https://github.com/scality/cloudserver), the Storage service comprises the *data layer* upon which the [Flint Catalog](#flint-catalog) is built. This data layer is referred to as the Flint Metastore. Data is persisted in the `storage_data` and `storage_meta` Docker volumes.
- **Workspace:** The Workspace service serves the FlintML user interface (JupyterLab skin + custom extensions.) A custom [KernelProvisioner](https://jupyter-client.readthedocs.io/en/latest/provisioning.html) communicates with the Compute Manager service. It mounts the Flint Metastore using [s3fs](https://github.com/s3fs-fuse/s3fs-fuse) and uses that mount as the JupyterLab working directory.
- **Compute Manager:** The Compute Manager orchestrates and controls all [Worker Containers](#worker-containers) via a configurable *driver*. All requests to start and stop Worker Containers are handled by this service.
- **Experiment Server:** Integrated with [Aim](https://github.com/aimhubio/aim), the Experiment Server acts as the controller for all ML experiments. It does NOT use the Storage service because it requires a filesystem backend. Thus, an `experiment_data` volume is required. Artifacts are referenced in experiment runs but stored as Objects in the Flint Catalog.
- **Experiment Server:** Integrated with [Aim](https://github.com/aimhubio/aim), the Experiment Server acts as the controller for all ML experiments and serves the Aim UI. Metrics live in the Flint Metastore as chunks via [JuiceFS](https://juicefs.com/en/). JuiceFS maintains its metadata in a SQLite database inside the `storage_meta` volume, coupling the lifetime of metrics metadata to the lifetime of metrics data. (A minimal logging sketch follows this list.)
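
As a rough sketch of what logging against the Experiment Server looks like from a notebook (`new_run` and the dict-style hyperparameters appear in the Instacart example; the `track` call assumes `new_run` returns an Aim-style `Run` object):

```python
from flint import new_run  # helper used in the Instacart example notebook

# Create a run; its metrics end up in the Flint Metastore via JuiceFS.
run = new_run(experiment="demo-experiment")
run["hparams"] = {"learning_rate": 0.1, "max_depth": 4}

# Assumption: new_run returns an Aim-style Run, so Aim's track() API applies.
for step, loss in enumerate([0.9, 0.5, 0.3]):
    run.track(loss, name="loss", step=step)
```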

## Flint Catalog

The Flint Catalog is a novel and simplified approach to a data catalog. It provides a logical repository that sits on top of the physical locations of files in Storage. This enables key capabilities around governance, search, discovery, lineage and reusability. FlintML will continue to deploy new such capabilities throughout development.
The Flint Catalog is a novel and simplified approach to a data catalog. It provides a logical repository that sits on top of the physical locations of files in the Flint Metastore. This enables key capabilities around governance, search, discovery, lineage and reusability. FlintML will continue to deploy new such capabilities throughout development. **The Flint Catalog does not govern workspace files or metrics.**

The Flint Catalog defines an Item as logically being a Table (i.e. Delta), or an Object (incl. Artifact sub-type.) All Items are unified within the catalog and considered first-class data-citizens. Rather than logically grouping Items by way of a hierarchical structure like `<catalog>.<schema>.<entity>` as used in the Unity Catalog, the Flint Catalog allows for full flexibility through the use of tags.
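
As an illustration of tag-based addressing (the `item_uri` helper below is ours, not part of flint; `scan_delta` is taken from the Instacart example notebook):

```python
from urllib.parse import urlencode

from flint import scan_delta  # as used in the Instacart example

def item_uri(name: str, **tags: str) -> str:
    """Build an Item URI: a name plus URL-encoded tags (hypothetical helper)."""
    return f"{name}?{urlencode(tags)}"

uri = item_uri("products", provider="kaggle", source="external")
# -> "products?provider=kaggle&source=external"
products = scan_delta(uri)  # equivalently, copy the path from the Catalog Explorer
```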

@@ -23,7 +23,7 @@ Importantly, Items are identified by their URI; an Item's URI is a url encoding

FlintML's Control Plane does not execute user code (i.e. development notebooks and workflows.) This work gets delegated to Worker Containers. A Worker Container is a Docker container running the `flintml/worker-base` image (directly, or indirectly via a derivative image.)

Each Worker Container runs a single Jupyter kernel that communicates with the Workspace, Storage and Experiment Server services. **Therefore, it is crucial that the host running Worker Containers has networking to the host of the Control Plane.**
Each Worker Container runs a single Jupyter kernel that communicates with the Workspace, Storage and Experiment Server services. **Therefore, it is crucial that the host running Worker Containers has networking to the host of the Control Plane.** Each Worker Container also mounts the Workspace files at its working directory. This enables you to, for example, execute one notebook from within another: `%run ./root-level-notebook.ipynb`.

Worker Containers are instantiated by the [Driver](#drivers) configured for use by the Compute Manager service.

42 changes: 22 additions & 20 deletions examples/instacart.ipynb
@@ -216,7 +216,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"id": "39081859-c5e0-4106-b67f-2c88824d36f7",
"metadata": {},
"outputs": [
@@ -242,10 +242,11 @@
" name=\"order_products__prior\",\n",
" tags={\n",
" \"source\": \"external\",\n",
" \"provider\": \"kaggle\"\n",
" \"provider\": \"kaggle\",\n",
" \"example\": \"instacart\"\n",
" }\n",
")\n",
"products = scan_delta(path=\"products?provider=kaggle&source=external\") # or copy table path from Catalog Explorer\n",
"products = scan_delta(path=\"products?provider=kaggle&source=external&example=instacart\") # or copy table path from Catalog Explorer\n",
"\n",
"# Lazy join\n",
"top_products = (\n",
@@ -277,18 +278,18 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"id": "f39330a1-15d8-4380-8fc5-028b0da868c3",
"metadata": {},
"outputs": [],
"source": [
"# Lazy load all tables\n",
"aisles = scan_delta(\"aisles?provider=kaggle&source=external\")\n",
"departments = scan_delta(\"departments?provider=kaggle&source=external\")\n",
"prior = scan_delta(\"order_products__prior?provider=kaggle&source=external\")\n",
"train = scan_delta(\"order_products__train?provider=kaggle&source=external\")\n",
"orders = scan_delta(\"orders?provider=kaggle&source=external\")\n",
"products = scan_delta(\"products?provider=kaggle&source=external\")\n",
"aisles = scan_delta(\"aisles?provider=kaggle&source=external&example=instacart\")\n",
"departments = scan_delta(\"departments?provider=kaggle&source=external&example=instacart\")\n",
"prior = scan_delta(\"order_products__prior?provider=kaggle&source=external&example=instacart\")\n",
"train = scan_delta(\"order_products__train?provider=kaggle&source=external&example=instacart\")\n",
"orders = scan_delta(\"orders?provider=kaggle&source=external&example=instacart\")\n",
"products = scan_delta(\"products?provider=kaggle&source=external&example=instacart\")\n",
"\n",
"# Join prior orders with order metadata\n",
"prior_full = (\n",
@@ -433,7 +434,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"id": "91a63c9d-3317-4cf8-bc34-b665381e9f65",
"metadata": {},
"outputs": [],
Expand All @@ -443,7 +444,7 @@
"from sklearn.preprocessing import LabelEncoder\n",
"\n",
"# Load full dataset into memory\n",
"df = read_delta(\"features?env=dev\")\n",
"df = read_delta(\"features?env=dev&example=instacart\")\n",
"df = df.drop([\"product_name\"])\n",
"\n",
"# Encode categorical columns\n",
@@ -471,7 +472,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"id": "0e070ada-620c-4d96-a491-f18502f3ae93",
"metadata": {},
"outputs": [
@@ -1469,9 +1470,9 @@
"run = new_run(experiment=\"xgb-reorder-predictor\")\n",
"run[\"hparams\"] = {\n",
" \"scale_pos_weight\": ratio,\n",
" \"n_estimators\": 100,\n",
" \"n_estimators\": 20,\n",
" \"learning_rate\": 0.1,\n",
" \"max_depth\": 6,\n",
" \"max_depth\": 4,\n",
"}\n",
"\n",
"# Define model\n",
Expand All @@ -1480,9 +1481,9 @@
" scale_pos_weight=ratio,\n",
" eval_metric=\"logloss\",\n",
" use_label_encoder=False,\n",
" n_estimators=100,\n",
" n_estimators=20,\n",
" learning_rate=0.1,\n",
" max_depth=6,\n",
" max_depth=4,\n",
" random_state=42\n",
")\n",
"\n",
@@ -1619,14 +1620,13 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"id": "6a525197-1426-494c-9926-5ff070d6e249",
"metadata": {},
"outputs": [],
"source": [
"from flint import drop_delta, delete_object\n",
"\n",
"delete_object(\"untrained-model?example=instacart&proj=instacart&run_id=cc31f135d4ff4c04a2eb53dd\")\n",
"drop_delta(\"features?env=dev&example=instacart\")\n",
"drop_delta(\"order_products__train?example=instacart&provider=kaggle&source=external\")\n",
"drop_delta(\"departments?example=instacart&provider=kaggle&source=external\")\n",
@@ -1642,7 +1642,9 @@
"id": "c5881504-29e7-45be-9999-127216128423",
"metadata": {},
"outputs": [],
"source": []
"source": [
"delete_object(\"untrained-model?example=instacart&proj=instacart&run_id=93a02bd3348846edb3e6179c\") # You will have a different run_id"
]
}
],
"metadata": {
28 changes: 0 additions & 28 deletions src/compute-manager/Dockerfile-old

This file was deleted.

17 changes: 8 additions & 9 deletions src/compute-manager/src/driver/local.py
@@ -2,6 +2,7 @@
import docker
from docker.models.containers import Container
from docker.errors import NotFound, APIError
from docker.types import LogConfig
from typing import Dict
import os
from typing import Tuple, Optional
@@ -73,11 +74,10 @@ async def launch_container(self, ctx: ContainerContext) -> None:

# Launch the container (do NOT start ipykernel yet)
container = await asyncio.to_thread(
self._docker.containers.run,
self._docker.containers.create,
image=self.worker_image,
name=f"flint__{project_name}__worker__{ctx.id}",
detach=True,
auto_remove=True,
name=f"flint__{project_name}__worker__{ctx.id}",
network=network_name,
labels={"flint.ephemeral": "true"},
environment={
@@ -87,20 +87,19 @@
"STORAGE_USER": os.environ.get("STORAGE_USER"),
"STORAGE_PASSWORD": os.environ.get("STORAGE_PASSWORD")
},
command=["sh", "-c", "poetry run python /root/watchdog.py >> /tmp/watchdog.log 2>&1"],
tty=True,
stdin_open=True,
volumes=volumes_dict,
devices=["/dev/fuse:/dev/fuse"],
cap_add=["SYS_ADMIN"],
security_opt=["apparmor:unconfined"],
log_config=LogConfig(type="local")
)

# Copy connection.json into container
await asyncio.to_thread(container.put_archive, "/tmp", tarstream.read())

# Run ipykernel inside the container
run_kernel_cmd = [
"poetry", "run", "python", "-m", "ipykernel_launcher", "-f", "/tmp/connection.json"
]
await asyncio.to_thread(container.exec_run, run_kernel_cmd, detach=True)
await asyncio.to_thread(container.start)

# Update container context
await asyncio.to_thread(container.reload)
32 changes: 12 additions & 20 deletions src/docker-compose.build.yml
@@ -3,6 +3,14 @@ x-storage-creds: &storage-creds
STORAGE_USER: ${STORAGE_USER:-admin}
STORAGE_PASSWORD: ${STORAGE_PASSWORD:-password}

x-s3fs: &s3fs
devices:
- /dev/fuse:/dev/fuse
cap_add:
- SYS_ADMIN
security_opt:
- apparmor:unconfined

services:

# --- BACKEND SERVICES ---
@@ -24,15 +32,15 @@ services:
restart: always

experiment-server:
<<: *storage-creds
<<: [*storage-creds, *s3fs]
build:
context: .
dockerfile: ./experiment-server/Dockerfile
depends_on:
storage:
condition: service_healthy
volumes:
- experiment_data:/repo
- storage_meta:/meta
restart: always

### --- FRONTEND SERVICES ---
Expand All @@ -46,26 +54,14 @@ services:
storage:
condition: service_healthy
restart: always

experiment-tracker:
build: ./experiment-tracker
depends_on:
experiment-server:
condition: service_healthy
volumes:
- experiment_data:/repo
restart: always

workspace:
<<: *storage-creds
<<: [*storage-creds, *s3fs]
build:
context: .
dockerfile: ./workspace/Dockerfile
depends_on:
- catalog-explorer
- experiment-tracker
volumes:
- workspace_data:/srv/workspace
restart: always

reverse-proxy:
@@ -81,14 +77,10 @@
condition: service_started
catalog-explorer:
condition: service_started
experiment-tracker:
condition: service_started
ports:
- "${FLINT_PORT:-8701}:80"
restart: always

volumes:
storage_data:
storage_meta:
experiment_data:
workspace_data:
storage_meta: