From 73f040121d67277dc81739c2334e45a11a158f11 Mon Sep 17 00:00:00 2001
From: Shiva
Date: Thu, 4 Dec 2025 18:26:22 +0530
Subject: [PATCH 1/2] add README with detailed usage and import instructions for dgraph-import

---
 dgraph-import/README.md | 117 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 dgraph-import/README.md

diff --git a/dgraph-import/README.md b/dgraph-import/README.md
new file mode 100644
index 0000000..408a90a
--- /dev/null
+++ b/dgraph-import/README.md
@@ -0,0 +1,117 @@
# Dgraph Import

## Overview

The `dgraph import` command bulk loads RDF/JSON data into a Dgraph cluster via snapshot-based import. It supports two workflows: generating a snapshot from data files or streaming an existing snapshot to a running cluster.

## Command Syntax

```
dgraph import [flags]
```

### Essential Flags

| Flag | Description |
|------|-------------|
| `--files, -f` | Path to RDF/JSON data files (e.g., `data.rdf`, `data.json`) |
| `--schema, -s` | Path to DQL schema file |
| `--graphql-schema, -g` | Path to GraphQL schema file |
| `--format` | File format: `rdf` or `json` |
| `--snapshot-dir, -p` | Path to an existing snapshot output directory for direct import |
| `--drop-all` | Drop all existing cluster data before import (enables the bulk loader) |
| `--drop-all-confirm` | Confirmation flag for the `--drop-all` operation |
| `--conn-str, -c` | Dgraph connection string (e.g., `dgraph://localhost:9080`) |

## Quick Start

### Bulk Import with Data and Schema

```
dgraph import --files data.rdf --schema schema.dql \
  --drop-all --drop-all-confirm \
  --conn-str dgraph://localhost:9080
```

Loads data from `data.rdf`, drops existing cluster data, generates a snapshot, and streams it to the cluster.

### Import from Existing Snapshot

```
dgraph import --snapshot-dir ./out --conn-str dgraph://localhost:9080
```

Directly streams snapshot data without the bulk loading phase.

## Snapshot Directory Structure

The bulk loader generates an `out` directory with per-group subdirectories:

```
out/
├── 0/
│   └── p/   # BadgerDB files for group 0
├── 1/
│   └── p/   # BadgerDB files for group 1
└── N/
    └── p/   # BadgerDB files for group N
```

When using `--snapshot-dir`, provide the `out` directory path. The import tool automatically locates the `p` directories within each group folder.

**Important:** Do not specify a `p` directory directly.

## How It Works

1. **Drop-All Mode**: With `--drop-all` and `--drop-all-confirm`, the bulk loader generates a snapshot from the provided data and schema files.
2. **Snapshot Streaming**: The snapshot is streamed to the cluster via gRPC.
3. **Consistency**: The cluster enters drain mode during import. On error, all data is dropped for safety.

## Import Examples

**RDF with DQL schema:**
```
dgraph import --files data.rdf --schema schema.dql \
  --drop-all --drop-all-confirm \
  --conn-str dgraph://localhost:9080
```

**JSON with GraphQL schema:**
```
dgraph import --files data.json --schema schema.dql \
  --graphql-schema schema.graphql --format json \
  --drop-all --drop-all-confirm \
  --conn-str dgraph://localhost:9080
```

**Existing snapshot:**
```
dgraph import --snapshot-dir ./out --conn-str dgraph://localhost:9080
```
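**Smoke test with inline sample data:**

For a quick end-to-end check before loading real data, the flow can be exercised with two tiny hand-written files. This is a minimal sketch: the file names, the sample triples, and the one-predicate schema are illustrative, not part of the tool; it assumes a local Alpha at `localhost:9080`.

```
# Tiny RDF dataset (illustrative content)
cat > sample.rdf <<'EOF'
_:alice <name> "Alice" .
_:bob   <name> "Bob" .
_:alice <knows> _:bob .
EOF

# Matching DQL schema (illustrative)
cat > sample.schema <<'EOF'
name: string @index(exact) .
knows: [uid] .
EOF

# Drops any existing data, bulk loads the files, and streams the snapshot
dgraph import --files sample.rdf --schema sample.schema \
  --drop-all --drop-all-confirm \
  --conn-str dgraph://localhost:9080
```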
## Benchmark Import

For testing with large datasets, Dgraph provides a sample dataset of one million RDF triples.

**Download benchmark files:**

```
wget -O 1million.rdf.gz "https://github.com/dgraph-io/dgraph-benchmarks/blob/main/data/1million.rdf.gz?raw=true"
wget -O 1million.schema "https://github.com/dgraph-io/dgraph-benchmarks/blob/main/data/1million.schema?raw=true"
```

**Run benchmark import:**

```
dgraph import --files 1million.rdf.gz --schema 1million.schema \
  --drop-all --drop-all-confirm \
  --conn-str dgraph://localhost:9080
```

## Important Notes

- When the `--drop-all` and `--drop-all-confirm` flags are set, **all existing data in the cluster will be dropped** before the import begins.
- Both the `--drop-all` and `--drop-all-confirm` flags are required for bulk loading; the command aborts without them.
- Live loader mode is not supported; only snapshot/bulk import is available.
- Ensure sufficient disk space for snapshot generation.
- The connection string must use the gRPC format: `dgraph://localhost:9080`.
\ No newline at end of file

From 64a5d5c67bdf51b4e1000addf61ee76d27bc87e2 Mon Sep 17 00:00:00 2001
From: Shiva
Date: Mon, 8 Dec 2025 11:55:12 +0530
Subject: [PATCH 2/2] resolve review comments

---
 dgraph-import/README.md | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/dgraph-import/README.md b/dgraph-import/README.md
index 408a90a..bf4de39 100644
--- a/dgraph-import/README.md
+++ b/dgraph-import/README.md
@@ -2,7 +2,19 @@
 
 ## Overview
 
-The `dgraph import` command bulk loads RDF/JSON data into a Dgraph cluster via snapshot-based import. It supports two workflows: generating a snapshot from data files or streaming an existing snapshot to a running cluster.
+The `dgraph import` command, introduced in **v25.0.0**, is designed to unify and simplify bulk and live data loading into Dgraph. Previously, users had to choose between `dgraph bulk` and `dgraph live`. With `dgraph import`, you now have a single command for both workflows, eliminating manual steps and reducing operational complexity.
+
+> **Note:**
+> The original intent was to support both bulk and live loading, but **live loader mode is not yet supported**. Only bulk/snapshot import is available.
+
+## How Data Is Imported
+
+When you run `dgraph import`, the tool first runs the bulk loader using the RDF/JSON and schema files you provide. This generates the snapshot data in the form of `p` directories (BadgerDB files) for each group.
+After the bulk loader completes, `dgraph import` connects to the Alpha endpoint, puts the cluster into drain mode, and **streams the contents of the generated `p` directories directly to the running cluster using gRPC bidirectional streaming**. Once the import is complete, the cluster exits drain mode and resumes normal operation.
+
+If you already have a snapshot directory (from a previous bulk load), you can use the `--snapshot-dir` flag to skip the bulk loading phase and stream the snapshot data directly to the cluster.
+
+This means you no longer need to stop Alpha nodes or manually manage files; `dgraph import` handles everything automatically.
 
 ## Command Syntax
 
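The "How Data Is Imported" section added in the hunk above condenses what used to be a manual, multi-step procedure. For comparison, a rough sketch of the pre-v25 workflow follows; the Zero address, the group-0 layout, and the Alpha data path are illustrative assumptions, not details from this patch.

```
# Before v25: run the standalone bulk loader against a Zero node...
dgraph bulk -f data.rdf -s schema.dql --zero localhost:5080

# ...then, with each Alpha stopped, copy its group's p directory into
# the Alpha's data directory by hand and restart (illustrative path)
cp -r out/0/p /data/alpha1/p

# With dgraph import, one command covers both steps:
dgraph import --files data.rdf --schema schema.dql \
  --drop-all --drop-all-confirm \
  --conn-str dgraph://localhost:9080
```

The copy step had to be repeated for every group, which is roughly the bookkeeping `dgraph import` now performs over gRPC.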
@@ -33,7 +45,7 @@ dgraph import --files data.rdf --schema schema.dql \
   --conn-str dgraph://localhost:9080
 ```
 
-Loads data from `data.rdf`, drops existing cluster data, generates a snapshot, and streams it to the cluster.
+Loads data from `data.rdf`, drops existing cluster data, runs the bulk loader to generate a snapshot, and streams it to the cluster.
 
 ### Import from Existing Snapshot
 
@@ -41,7 +53,7 @@ Loads data from `data.rdf`, drops existing cluster data, generates a snapshot, a
 dgraph import --snapshot-dir ./out --conn-str dgraph://localhost:9080
 ```
 
-Directly streams snapshot data without the bulk loading phase.
+Directly streams snapshot data (the output of a previous bulk load) into the cluster, without running the bulk loader again.
 
 ## Snapshot Directory Structure
 
@@ -64,7 +76,7 @@ When using `--snapshot-dir`, provide the `out` directory path. The import tool a
 ## How It Works
 
 1. **Drop-All Mode**: With `--drop-all` and `--drop-all-confirm`, the bulk loader generates a snapshot from the provided data and schema files.
-2. **Snapshot Streaming**: The snapshot is streamed to the cluster via gRPC.
+2. **Snapshot Streaming**: The snapshot (the contents of the `p` directories) is streamed to the cluster via gRPC, copying all data directly into the running cluster.
 3. **Consistency**: The cluster enters drain mode during import. On error, all data is dropped for safety.
 
 ## Import Examples
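Taken together, the two workflows this patch documents form a pipeline: the snapshot produced by one bulk-loading run can seed further clusters without re-running the bulk loader. A short sketch, assuming the bulk loader writes its `out` directory to the working directory (as the Snapshot Directory Structure section describes) and using a hypothetical second Alpha endpoint:

```
# First cluster: bulk load and stream; this also leaves the ./out
# snapshot directory behind for later reuse
dgraph import --files 1million.rdf.gz --schema 1million.schema \
  --drop-all --drop-all-confirm \
  --conn-str dgraph://localhost:9080

# Second cluster: reuse the same snapshot, skipping the bulk phase.
# "dgraph://other-host:9080" is an illustrative placeholder address.
dgraph import --snapshot-dir ./out \
  --conn-str dgraph://other-host:9080
```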