Commit 9126c80

docs: update Docker deployment docs for bridge networking migration (#2963)
- Create docker/README.md with full setup guide, env var reference, port table, health checks, and troubleshooting
- Fix hugegraph-store/docs/deployment-guide.md: replace wrong env vars (GRPC_HOST, RAFT_ADDRESS etc.) with correct HG_* names
- Update K8s manifest in deployment-guide.md to use HG_* env vars
- Fix 7 files pointing to dead docker/example/ directory
- Add Docker bridge network notes to PD configuration docs
- Add distributed cluster section to server Docker README

Relates to: #2952

* docs: clarify temporary entrypoint mount workaround in docker/README.md

The 3-node and single-node quickstart compose files currently mount entrypoint scripts from source as a workaround until updated Docker images are published with the new entrypoints baked in. Add a clear note explaining this temporary requirement so users are not confused about needing a full source clone to run the cluster.

---------

Co-authored-by: imbajin <jin@apache.org>
1 parent 8d758d5 commit 9126c80

File tree

11 files changed

+480
-206
lines changed

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -231,7 +231,7 @@ For distributed development:
231231
3. Build Store: `mvn clean package -pl hugegraph-store -am -DskipTests`
232232
4. Build Server with HStore backend: `mvn clean package -pl hugegraph-server -am -DskipTests`
233233

234-
See Docker Compose example: `hugegraph-server/hugegraph-dist/docker/example/`
234+
See Docker Compose examples: `docker/` directory. Single-node quickstart (pre-built images): `docker/docker-compose.yml`. Single-node dev build (from source): `docker/docker-compose.dev.yml`. 3-node cluster: `docker/docker-compose-3pd-3store-3server.yml`. See `docker/README.md` for full setup guide.
235235

236236
### Debugging Tips
237237

README.md

Lines changed: 18 additions & 11 deletions
@@ -173,11 +173,11 @@ flowchart TB
173173
### 5 Minutes Quick Start
174174

175175
```bash
176-
# Start HugeGraph with Docker
176+
# Start HugeGraph (standalone mode)
177177
docker run -itd --name=hugegraph -p 8080:8080 hugegraph/hugegraph:1.7.0
178178

179179
# Verify server is running
180-
curl http://localhost:8080/apis/version
180+
curl http://localhost:8080/versions
181181

182182
# Try a Gremlin query
183183
curl -X POST http://localhost:8080/gremlin \
@@ -208,13 +208,18 @@ docker run -itd --name=hugegraph -e PASSWORD=your_password -p 8080:8080 hugegrap
208208
```
209209

210210
For advanced Docker configurations, see:
211-
- [Docker Documentation](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#3-deploy)
212-
- [Docker Compose Example](./hugegraph-server/hugegraph-dist/docker/example)
213-
- [Docker README](hugegraph-server/hugegraph-dist/docker/README.md)
211+
212+
* [Docker Documentation](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#3-deploy)
213+
* [Docker Compose Examples](./docker/)
214+
* [Docker README](./docker/README.md)
215+
* [Server Docker README](hugegraph-server/hugegraph-dist/docker/README.md)
216+
217+
> **Docker Desktop (Mac/Windows)**: The 3-node distributed cluster (`docker/docker-compose-3pd-3store-3server.yml`) uses Docker bridge networking and works on all platforms including Docker Desktop. Allocate at least 12 GB memory to Docker Desktop.
214218
215219
> **Note**: Docker images are convenience releases, not **official ASF distribution artifacts**. See [ASF Release Distribution Policy](https://infra.apache.org/release-distribution.html#dockerhub) for details.
216220
>
217-
> **Version Tags**: Use release tags (`1.7.0`, `1.x.0`) for stable versions. Use `latest` for development features.
221+
> **Version Tags**: Use release tags (e.g., `1.7.0`) for stable deployments. The `latest` tag should only be used for testing or development.
222+
218223

219224
<details>
220225
<summary><b>Option 2: Download Binary Package</b></summary>
@@ -283,14 +288,16 @@ Once the server is running, verify the installation:
283288

284289
```bash
285290
# Check server version
286-
curl http://localhost:8080/apis/version
291+
curl http://localhost:8080/versions
287292

288293
# Expected output:
289294
# {
290-
# "version": "1.7.0",
291-
# "core": "1.7.0",
292-
# "gremlin": "3.5.1",
293-
# "api": "1.7.0"
295+
# "versions": {
296+
# "version": "v1",
297+
# "core": "1.7.0",
298+
# "gremlin": "3.5.1",
299+
# "api": "1.7.0"
300+
# }
294301
# }
295302

296303
# Try Gremlin console (if installed locally)

docker/README.md

Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
1+
# HugeGraph Docker Deployment
2+
3+
This directory contains Docker Compose files for running HugeGraph:
4+
5+
| File | Description |
6+
|------|-------------|
7+
| `docker-compose.yml` | Single-node cluster using pre-built images from Docker Hub |
8+
| `docker-compose.dev.yml` | Single-node cluster built from source (for developers) |
9+
| `docker-compose-3pd-3store-3server.yml` | 3-node distributed cluster (PD + Store + Server) |
10+
11+
## Prerequisites
12+
13+
- **Docker Engine** 20.10+ (or Docker Desktop 4.x+)
14+
- **Docker Compose** v2 (included in Docker Desktop)
15+
- **Memory**: Allocate at least **12 GB** to Docker Desktop (Settings → Resources → Memory). The 3-node cluster runs 9 JVM processes (3 PD + 3 Store + 3 Server) which are memory-intensive. Insufficient memory causes OOM kills that appear as silent Raft failures.
16+
17+
> [!IMPORTANT]
18+
> The 12 GB minimum is for Docker Desktop. On Linux with native Docker, ensure the host has at least 12 GB of free memory.
19+
---
20+
21+
## Single-Node Setup
22+
23+
Two compose files are available for running a single-node cluster (1 PD + 1 Store + 1 Server):
24+
25+
### Option A: Quick Start (pre-built images)
26+
27+
Uses pre-built images from Docker Hub. Best for **end users** who want to run HugeGraph quickly.
28+
29+
```bash
30+
cd docker
31+
HUGEGRAPH_VERSION=1.7.0 docker compose up -d
32+
```
33+
34+
- Images: `hugegraph/pd:1.7.0`, `hugegraph/store:1.7.0`, `hugegraph/server:1.7.0`
35+
- `pull_policy: always` — always pulls the specified image tag
36+
37+
> **Note**: Use release tags (e.g., `1.7.0`) for stable deployments. The `latest` tag is intended for testing or development only.
38+
- PD healthcheck endpoint: `/v1/health`
39+
- Single PD, single Store (`HG_PD_INITIAL_STORE_LIST: store:8500`), single Server
40+
- Server healthcheck endpoint: `/versions`
41+
42+
### Option B: Development Build (build from source)
43+
44+
Builds images locally from source Dockerfiles. Best for **developers** who want to test local changes.
45+
46+
```bash
47+
cd docker
48+
docker compose -f docker-compose.dev.yml up -d
49+
```
50+
51+
- Images: built from source via `build: context: ..` with Dockerfiles
52+
- No `pull_policy` — builds locally, doesn't pull
53+
- Entrypoint scripts are baked into the built image (no volume mounts)
54+
- PD healthcheck endpoint: `/v1/health`
55+
- Otherwise identical env vars and structure to the quickstart file
56+
57+
### Key Differences
58+
59+
| | `docker-compose.yml` (quickstart) | `docker-compose.dev.yml` (dev build) |
60+
|---|---|---|
61+
| **Images** | Pull from Docker Hub | Build from source |
62+
| **Who it's for** | End users | Developers |
63+
| **pull_policy** | `always` | not set (build) |
64+
65+
**Verify** (both options):
66+
```bash
67+
curl http://localhost:8080/versions
68+
```
69+
70+
---
71+
72+
## 3-Node Cluster Quickstart
73+
74+
```bash
75+
cd docker
76+
HUGEGRAPH_VERSION=1.7.0 docker compose -f docker-compose-3pd-3store-3server.yml up -d
77+
78+
# To stop and remove all data volumes (clean restart)
79+
docker compose -f docker-compose-3pd-3store-3server.yml down -v
80+
```
81+
82+
**Startup ordering** is enforced via `depends_on` with `condition: service_healthy`:
83+
84+
1. **PD nodes** start first and must pass healthchecks (`/v1/health`)
85+
2. **Store nodes** start after all PD nodes are healthy
86+
3. **Server nodes** start after all Store nodes are healthy
87+
88+
This ensures PD and Store are healthy before the server starts. The server entrypoint still performs a best-effort partition wait after launch, so partition assignment may take a little longer.
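The same ordering can be checked from the host after `docker compose up -d`. The following is a minimal sketch, not part of the compose files: the function names `wait_healthy` and `wait_cluster` and the retry defaults are hypothetical, while the URLs are the published endpoints used in the verification commands in this README.

```bash
# Poll a health URL until curl succeeds or the retries run out.
# Usage: wait_healthy URL [RETRIES] [DELAY_SECONDS]  (defaults are assumptions)
wait_healthy() {
  local url="$1" retries="${2:-30}" delay="${3:-5}" i=0
  while [ "$i" -lt "$retries" ]; do
    # -f makes curl exit non-zero on HTTP error statuses (4xx/5xx)
    if curl -fsS -o /dev/null "$url"; then
      echo "healthy: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out: $url" >&2
  return 1
}

# Check the three tiers in the same order that Compose starts them:
# PD first, then Store, then Server.
wait_cluster() {
  wait_healthy http://localhost:8620/v1/health &&
    wait_healthy http://localhost:8520/v1/health &&
    wait_healthy http://localhost:8080/versions
}
```

Calling `wait_cluster` returns non-zero as soon as one tier fails to become healthy, which makes it usable as a gate in a CI script.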
89+
90+
**Verify the cluster is healthy**:
91+
92+
```bash
93+
# Check PD health
94+
curl http://localhost:8620/v1/health
95+
96+
# Check Store health
97+
curl http://localhost:8520/v1/health
98+
99+
# Check Server (Graph API)
100+
curl http://localhost:8080/versions
101+
102+
# List registered stores via PD
103+
curl http://localhost:8620/v1/stores
104+
105+
# List partitions
106+
curl http://localhost:8620/v1/partitions
107+
```
108+
109+
---
110+
111+
## Environment Variable Reference
112+
113+
Configuration is injected via environment variables. The old `docker/configs/application-pd*.yml` and `docker/configs/application-store*.yml` files are no longer used.
114+
115+
### PD Environment Variables
116+
117+
| Variable | Required | Default | Maps To (`application.yml`) | Description |
118+
|----------|----------|---------|-----------------------------|-------------|
119+
| `HG_PD_GRPC_HOST` | Yes | - | `grpc.host` | This node's hostname/IP for gRPC |
120+
| `HG_PD_RAFT_ADDRESS` | Yes | - | `raft.address` | This node's Raft address (e.g. `pd0:8610`) |
121+
| `HG_PD_RAFT_PEERS_LIST` | Yes | - | `raft.peers-list` | All PD peers (e.g. `pd0:8610,pd1:8610,pd2:8610`) |
122+
| `HG_PD_INITIAL_STORE_LIST` | Yes | - | `pd.initial-store-list` | Expected stores (e.g. `store0:8500,store1:8500,store2:8500`) |
123+
| `HG_PD_GRPC_PORT` | No | `8686` | `grpc.port` | gRPC server port |
124+
| `HG_PD_REST_PORT` | No | `8620` | `server.port` | REST API port |
125+
| `HG_PD_DATA_PATH` | No | `/hugegraph-pd/pd_data` | `pd.data-path` | Metadata storage path |
126+
| `HG_PD_INITIAL_STORE_COUNT` | No | `1` | `pd.initial-store-count` | Min stores for cluster availability |
127+
128+
**Deprecated aliases** (still work but log a warning):
129+
130+
| Deprecated | Use Instead |
131+
|------------|-------------|
132+
| `GRPC_HOST` | `HG_PD_GRPC_HOST` |
133+
| `RAFT_ADDRESS` | `HG_PD_RAFT_ADDRESS` |
134+
| `RAFT_PEERS` | `HG_PD_RAFT_PEERS_LIST` |
135+
| `PD_INITIAL_STORE_LIST` | `HG_PD_INITIAL_STORE_LIST` |
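The fallback behaviour described for deprecated aliases (prefer the `HG_*` name, accept the old name with a warning) could look roughly like the sketch below. This is an illustration only and not the actual entrypoint code: the helper name `resolve_env` and the warning text are hypothetical, and it assumes the variables are exported and that `printenv` is available.

```bash
# resolve_env NEW_NAME OLD_NAME
# Prints the value of NEW_NAME if it is set and non-empty; otherwise falls
# back to OLD_NAME and writes a deprecation warning to stderr.
# Returns 1 if neither variable is set.
resolve_env() {
  local new="$1" old="$2" val
  if val=$(printenv "$new") && [ -n "$val" ]; then
    printf '%s\n' "$val"
  elif val=$(printenv "$old") && [ -n "$val" ]; then
    echo "WARN: $old is deprecated, use $new instead" >&2
    printf '%s\n' "$val"
  else
    return 1
  fi
}
```

For example, `grpc_host=$(resolve_env HG_PD_GRPC_HOST GRPC_HOST)` picks up either spelling while steering users toward the new one.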
136+
137+
### Store Environment Variables
138+
139+
| Variable | Required | Default | Maps To (`application.yml`) | Description |
140+
|----------|----------|---------|-----------------------------|-------------|
141+
| `HG_STORE_PD_ADDRESS` | Yes | - | `pdserver.address` | PD gRPC addresses (e.g. `pd0:8686,pd1:8686,pd2:8686`) |
142+
| `HG_STORE_GRPC_HOST` | Yes | - | `grpc.host` | This node's hostname (e.g. `store0`) |
143+
| `HG_STORE_RAFT_ADDRESS` | Yes | - | `raft.address` | This node's Raft address (e.g. `store0:8510`) |
144+
| `HG_STORE_GRPC_PORT` | No | `8500` | `grpc.port` | gRPC server port |
145+
| `HG_STORE_REST_PORT` | No | `8520` | `server.port` | REST API port |
146+
| `HG_STORE_DATA_PATH` | No | `/hugegraph-store/storage` | `app.data-path` | Data storage path |
147+
148+
**Deprecated aliases** (still work but log a warning):
149+
150+
| Deprecated | Use Instead |
151+
|------------|-------------|
152+
| `PD_ADDRESS` | `HG_STORE_PD_ADDRESS` |
153+
| `GRPC_HOST` | `HG_STORE_GRPC_HOST` |
154+
| `RAFT_ADDRESS` | `HG_STORE_RAFT_ADDRESS` |
155+
156+
### Server Environment Variables
157+
158+
| Variable | Required | Default | Maps To | Description |
159+
|----------|----------|---------|-----------------------------|-------------|
160+
| `HG_SERVER_BACKEND` | Yes | - | `backend` in `hugegraph.properties` | Storage backend (e.g. `hstore`) |
161+
| `HG_SERVER_PD_PEERS` | Yes | - | `pd.peers` | PD cluster addresses (e.g. `pd0:8686,pd1:8686,pd2:8686`) |
162+
| `STORE_REST` | No | - | Used by `wait-partition.sh` | Store REST endpoint for partition verification (e.g. `store0:8520`) |
163+
| `PASSWORD` | No | - | Enables auth mode | Optional authentication password |
164+
165+
**Deprecated aliases** (still work but log a warning):
166+
167+
| Deprecated | Use Instead |
168+
|------------|-------------|
169+
| `BACKEND` | `HG_SERVER_BACKEND` |
170+
| `PD_PEERS` | `HG_SERVER_PD_PEERS` |
171+
172+
---
173+
174+
## Port Reference
175+
176+
The table below reflects the published host ports in `docker-compose-3pd-3store-3server.yml`.
177+
The single-node compose file (`docker-compose.yml`) only publishes the REST/API ports (`8620`, `8520`, `8080`) by default.
178+
179+
| Service | Container Port | Host Port | Protocol | Purpose |
180+
|---------|---------------|-----------|----------|---------|
181+
| pd0 | 8620 | 8620 | HTTP | REST API |
182+
| pd0 | 8686 | 8686 | gRPC | PD gRPC |
183+
| pd0 | 8610 | - | TCP | Raft (internal only) |
184+
| pd1 | 8620 | 8621 | HTTP | REST API |
185+
| pd1 | 8686 | 8687 | gRPC | PD gRPC |
186+
| pd2 | 8620 | 8622 | HTTP | REST API |
187+
| pd2 | 8686 | 8688 | gRPC | PD gRPC |
188+
| store0 | 8500 | 8500 | gRPC | Store gRPC |
189+
| store0 | 8510 | 8510 | TCP | Raft |
190+
| store0 | 8520 | 8520 | HTTP | REST API |
191+
| store1 | 8500 | 8501 | gRPC | Store gRPC |
192+
| store1 | 8510 | 8511 | TCP | Raft |
193+
| store1 | 8520 | 8521 | HTTP | REST API |
194+
| store2 | 8500 | 8502 | gRPC | Store gRPC |
195+
| store2 | 8510 | 8512 | TCP | Raft |
196+
| store2 | 8520 | 8522 | HTTP | REST API |
197+
| server0 | 8080 | 8080 | HTTP | Graph API |
198+
| server1 | 8080 | 8081 | HTTP | Graph API |
199+
| server2 | 8080 | 8082 | HTTP | Graph API |
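The published host ports in the table follow one pattern: container port plus node index. A small sketch that encodes this mapping is shown below. The function name `host_port` and the role labels (`pd-rest`, `store-raft`, and so on) are hypothetical names chosen for this example; the port numbers come from the table above.

```bash
# host_port ROLE INDEX
# Prints the published host port for node INDEX (0, 1, or 2) in the
# 3-node compose file. PD Raft (8610) is internal only and has no entry.
host_port() {
  local role="$1" index="$2" base
  case "$role" in
    pd-rest)    base=8620 ;;
    pd-grpc)    base=8686 ;;
    store-grpc) base=8500 ;;
    store-raft) base=8510 ;;
    store-rest) base=8520 ;;
    server)     base=8080 ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  echo $((base + index))
}
```

For instance, `curl "http://localhost:$(host_port server 2)/versions"` would query `server2` on host port 8082.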
200+
201+
---
202+
203+
## Healthcheck Endpoints
204+
205+
| Service | Endpoint | Expected |
206+
|---------|----------|----------|
207+
| PD | `GET /v1/health` | `200 OK` |
208+
| Store | `GET /v1/health` | `200 OK` |
209+
| Server | `GET /versions` | `200 OK` with version JSON |
210+
211+
---
212+
213+
## Troubleshooting
214+
215+
### Containers Exiting or Restarting (OOM Kills)
216+
217+
**Symptom**: Containers exit with code 137 or enter restart loops. Raft logs show election timeouts.
218+
219+
**Cause**: Docker Desktop does not have enough memory. The 9 JVM processes require at least 12 GB.
220+
221+
**Fix**: Docker Desktop → Settings → Resources → Memory → set to **12 GB** or higher. Restart Docker Desktop.
222+
223+
```bash
224+
# Check if containers were OOM killed
225+
docker inspect hg-pd0 | grep -i oom
226+
docker stats --no-stream
227+
```
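To check several containers at once, the `docker inspect` call can be wrapped in a loop that reads the `State.OOMKilled` and `State.ExitCode` fields directly. This is a sketch under assumptions: the helper name `oom_report` is hypothetical, and container names other than `hg-pd0` are guesses that follow the same naming pattern.

```bash
# oom_report CONTAINER...
# Prints one line per container with its OOM-kill flag and last exit code.
# Containers that cannot be inspected are reported as "unknown".
oom_report() {
  local name state
  for name in "$@"; do
    state=$(docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' "$name" 2>/dev/null) \
      || state="unknown unknown"
    # state is "<oom_killed> <exit_code>"; split on the single space
    echo "$name oom_killed=${state% *} exit_code=${state#* }"
  done
}
```

A call such as `oom_report hg-pd0 hg-server0` makes an exit code of 137 combined with `oom_killed=true` easy to spot.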
228+
229+
### Raft Leader Election Failure
230+
231+
**Symptom**: PD logs show repeated `Leader election timeout`. Store nodes cannot register.
232+
233+
**Cause**: PD nodes cannot reach each other on the Raft port (8610), or `HG_PD_RAFT_PEERS_LIST` is misconfigured.
234+
235+
**Fix**:
236+
1. Verify all PD containers are running: `docker compose -f docker-compose-3pd-3store-3server.yml ps`
237+
2. Check PD logs: `docker logs hg-pd0`
238+
3. Verify network connectivity: `docker exec hg-pd0 ping pd1`
239+
4. Ensure `HG_PD_RAFT_PEERS_LIST` is identical on all PD nodes
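Step 4 can be automated by reading the variable from each running PD container and comparing the values. The sketch below is an assumption-laden illustration: `peers_consistent` is a hypothetical helper, it assumes `printenv` exists inside the PD image, and the container names are passed in by the caller rather than taken from the compose file.

```bash
# peers_consistent CONTAINER...
# Exit 0 if HG_PD_RAFT_PEERS_LIST is identical in every listed container,
# 1 on the first mismatch, 2 if a container cannot be queried.
peers_consistent() {
  local name value first="" seen=0
  for name in "$@"; do
    value=$(docker exec "$name" printenv HG_PD_RAFT_PEERS_LIST) || return 2
    if [ "$seen" -eq 0 ]; then
      first="$value"
      seen=1
    elif [ "$value" != "$first" ]; then
      echo "mismatch on $name: $value (expected $first)" >&2
      return 1
    fi
  done
  return 0
}
```

If the other PD containers follow the `hg-pd0` naming pattern, the call would be `peers_consistent hg-pd0 hg-pd1 hg-pd2`.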
240+
241+
### Partition Assignment Not Completing
242+
243+
**Symptom**: Server starts but graph operations fail. Store logs show `partition not found`.
244+
245+
**Cause**: PD has not finished assigning partitions to stores, or stores did not register successfully.
246+
247+
**Fix**:
248+
1. Check registered stores: `curl http://localhost:8620/v1/stores`
249+
2. Check partition status: `curl http://localhost:8620/v1/partitions`
250+
3. Wait for partition assignment (can take 1–3 minutes after all stores register)
251+
4. Check server logs for the `wait-partition.sh` script output: `docker logs hg-server0`
252+
253+
### Connection Refused Errors
254+
255+
**Symptom**: Stores cannot connect to PD, or Server cannot connect to Store.
256+
257+
**Cause**: Services are using `127.0.0.1` instead of container hostnames, or the `hg-net` bridge network is misconfigured.
258+
259+
**Fix**: Ensure all `HG_*` env vars use container hostnames (`pd0`, `store0`, etc.), not `127.0.0.1` or `localhost`.
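A quick way to catch this mistake before starting the cluster is to scan an address list for loopback hosts. The following is a minimal sketch; `has_loopback` is a hypothetical helper, and it only recognises `localhost` and `127.x.x.x` literals.

```bash
# has_loopback "host:port,host:port,..."
# Exit 0 if any host in the comma-separated list is a loopback address.
has_loopback() {
  local entry host
  local IFS=,
  for entry in $1; do
    host=${entry%%:*}   # strip the :port suffix
    case "$host" in
      localhost | 127.*) return 0 ;;
    esac
  done
  return 1
}
```

For example, `has_loopback "$HG_STORE_PD_ADDRESS" && echo "use container hostnames instead"` flags a value such as `127.0.0.1:8686`, while `pd0:8686,pd1:8686,pd2:8686` passes.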

hugegraph-pd/AGENTS.md

Lines changed: 2 additions & 2 deletions
@@ -247,7 +247,7 @@ store:
247247
### Common Configuration Errors
248248
249249
1. **Raft peer discovery failure**: `raft.peers-list` must include all PD nodes' `raft.address` values
250-
2. **Store connection issues**: `grpc.host` must be a reachable IP (not `127.0.0.1`) for distributed deployments
250+
2. **Store connection issues**: `grpc.host` must be a reachable IP (not `127.0.0.1`) for distributed deployments. In Docker bridge networking, use the container hostname (e.g., `pd0`) set via `HG_PD_GRPC_HOST` env var.
251251
3. **Split-brain scenarios**: Always run 3 or 5 PD nodes in production for Raft quorum
252252
4. **Partition imbalance**: Adjust `patrol-interval` for faster/slower rebalancing
253253

@@ -331,7 +331,7 @@ docker run -d -p 8620:8620 -p 8686:8686 -p 8610:8610 \
331331
hugegraph-pd:latest
332332
333333
# For production clusters, use Docker Compose or Kubernetes
334-
# See: hugegraph-server/hugegraph-dist/docker/example/
334+
# See: ../docker/docker-compose-3pd-3store-3server.yml and ../docker/README.md
335335
```
336336

337337
Exposed ports: 8620 (REST), 8686 (gRPC), 8610 (Raft)
