Remove ChromaDB/NetworkX/BM25 remnants, update docs for Graphiti #12

Merged
github-actions[bot] merged 8 commits into main from cleanup/remove-chromadb-networkx-remnants
Feb 15, 2026
@manana2520
Contributor

Summary

  • Update all documentation (README, ARCHITECTURE.md, ADRs) to reflect Neo4j + Graphiti architecture
  • Remove dead code from the old ChromaDB/NetworkX/BM25 architecture (~1,680 lines deleted)
  • Rename all VectorIndexer references to GraphitiIndexer
  • Clean up deprecated config settings, dependencies, and test code

Changes

Documentation (commit 1):

  • Updated README.md, ARCHITECTURE.md, GRAPH_DATABASE_PLAN.md
  • Marked ADR-0002 and ADR-0005 as superseded
  • Added ADR-0009 for Neo4j + Graphiti adoption

Code cleanup (commit 2):

  • Removed chromadb and rank-bm25 from pyproject.toml
  • Removed 9 deprecated config settings (CHROMA_*, BM25_*, GRAPH_DUAL_WRITE)
  • Deleted graph_builder.py (NetworkX) and graph_retriever.py (NetworkX)
  • Removed EntityExtractor, GovernanceMetadata, Entity, Relationship classes
  • Removed deprecated lifecycle stubs and rebuild-bm25 CLI command
  • Renamed VectorIndexer -> GraphitiIndexer in all callers and test mocks
  • Renamed index_to_chromadb -> index_to_graphiti in downloader
  • Updated ChromaDB references in comments/docstrings to Graphiti
  • Cleaned up test files for removed code

Test plan

  • All 280 unit tests pass
  • All 99 e2e tests collect successfully
  • All 38 integration tests collect successfully
  • All key imports verified (config, graph, models, indexer)
  • Zero remaining chromadb/networkx/rank_bm25 imports in source
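
The last check in the test plan could be reproduced with a small script like the one below. This is a minimal sketch, not the check the author actually ran: the function names `banned_imports_in_source` and `scan_tree` are hypothetical, and only the three package names come from the test plan.

```python
import ast
from pathlib import Path

# Top-level packages the cleanup is meant to eliminate (from the test plan).
BANNED = {"chromadb", "networkx", "rank_bm25"}


def banned_imports_in_source(source: str) -> set:
    """Return the banned top-level packages imported by one module's source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from pkg.sub import x"; relative imports are ignored.
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in BANNED:
                found.add(top)
    return found


def scan_tree(root: str) -> dict:
    """Map each offending .py file under root to the banned packages it imports."""
    offenders = {}
    for path in Path(root).rglob("*.py"):
        hits = banned_imports_in_source(path.read_text(encoding="utf-8"))
        if hits:
            offenders[str(path)] = hits
    return offenders
```

Calling `scan_tree` on the source directory and asserting that the result is empty would turn the check into a regression test. Parsing with `ast` rather than grepping avoids false positives from comments and docstrings that still mention the old libraries.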

Gemini Agent added 8 commits February 12, 2026 21:34

Update NEO4J_URI from old Cloud Run Neo4j service
(bolt+s://neo4j-4aosg235qq-uc.a.run.app:443) to GCE VM
(bolt://10.0.0.27:7687) for all production resources:
- confluence-sync, index-rebuild, sync-pipeline jobs
- slack-bot service

Switch NEO4J_PASSWORD from Secret Manager reference to
random_password.neo4j_prod_password.result to match the
actual password on the GCE VM.

Production Graphiti indexing was extremely slow (0.27 chunks/min vs
staging's 4.5) due to missing LLM_PROVIDER, GOOGLE_GENAI_USE_VERTEXAI,
and GRAPHITI_BULK_ENABLED env vars that staging already had. Without
proper Gemini config, Graphiti produced malformed JSON responses
triggering retries and exponential backoff.
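
A startup check along these lines might catch this kind of environment drift before indexing begins. It is a sketch under assumptions: the helper `missing_graphiti_env` is hypothetical and not part of this PR, and only the three variable names are taken from the commit message.

```python
import os

# Variable names come from the commit message; the helper itself is hypothetical.
REQUIRED_GRAPHITI_ENV = (
    "LLM_PROVIDER",
    "GOOGLE_GENAI_USE_VERTEXAI",
    "GRAPHITI_BULK_ENABLED",
)


def missing_graphiti_env(env=None) -> list:
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_GRAPHITI_ENV if not env.get(name, "").strip()]
```

A job entrypoint could call `missing_graphiti_env()` and exit with an error when the list is non-empty, so a misconfigured deployment fails fast instead of degrading into retries.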

The circuit breaker was resetting consecutive_failures to 0 before
the skip check could trigger (consecutive_failures >= MAX_RETRIES),
causing chunks that always fail to retry forever. Added separate
chunk_attempts counter that tracks retries per chunk independently
of the circuit breaker reset.
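
The idea behind this fix can be sketched as follows. This is an illustration, not the code from the commit: `index_chunks` and `index_one` are hypothetical names, the value of `MAX_RETRIES` is assumed, and the real breaker presumably pauses or backs off where this sketch only resets its counter.

```python
from collections import defaultdict

MAX_RETRIES = 3  # assumed value; the commit message only names the constant


def index_chunks(chunk_ids, index_one):
    """Index chunks, skipping any chunk that has failed MAX_RETRIES times.

    index_one(chunk_id) is a hypothetical callable that raises on failure.
    Returns (indexed, skipped) lists of chunk ids.
    """
    consecutive_failures = 0           # circuit-breaker state, reset when it trips
    chunk_attempts = defaultdict(int)  # per-chunk failures, never reset by the breaker
    pending = list(chunk_ids)
    indexed, skipped = [], []

    while pending:
        retry = []
        for chunk_id in pending:
            # The skip check reads the per-chunk counter, so a breaker reset
            # cannot hide a chunk that always fails.
            if chunk_attempts[chunk_id] >= MAX_RETRIES:
                skipped.append(chunk_id)
                continue
            try:
                index_one(chunk_id)
            except Exception:
                chunk_attempts[chunk_id] += 1
                consecutive_failures += 1
                if consecutive_failures >= MAX_RETRIES:
                    # Breaker trips; a real implementation would pause here.
                    consecutive_failures = 0
                retry.append(chunk_id)
            else:
                consecutive_failures = 0
                indexed.append(chunk_id)
        pending = retry
    return indexed, skipped
```

Because `chunk_attempts` only ever grows, every failing chunk reaches the skip threshold after a bounded number of passes, which is what guarantees the loop terminates.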

Neo4j 5.x stores auth in the system database, not flat files.
The previous auth reset (rm auth.ini/auth) was ineffective.
Now deletes databases/system and transactions/system directories
to force Neo4j to recreate auth from NEO4J_AUTH env var.
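
The reset described here amounts to removing two directories under the Neo4j data directory. The sketch below expresses that in Python for illustration only; the commit most likely does this in a startup script, the function `reset_system_database` is hypothetical, and the `data_dir` argument is an assumption about where the data directory is mounted.

```python
import shutil
from pathlib import Path

# Relative paths named in the commit message.
SYSTEM_DIRS = ("databases/system", "transactions/system")


def reset_system_database(data_dir: str) -> list:
    """Delete the system database directories under data_dir.

    Returns the relative paths that were actually removed. Intended to run
    only while Neo4j is stopped; everything stored in the system database
    is discarded, not just the credentials.
    """
    removed = []
    for rel in SYSTEM_DIRS:
        target = Path(data_dir) / rel
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(rel)
    return removed
```

Other databases under `databases/` are left untouched, and a second call returns an empty list, so the step is safe to repeat.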

All 6 intake-related Cloud Scheduler jobs were firing daily/weekly
without being intentionally enabled, causing duplicate pipeline runs.
Intake jobs should be run manually until a sync strategy is defined.

Removed: confluence-sync-daily, parse-daily, metadata-generation-daily,
index-rebuild-weekly, quality-scoring-daily, sync-pipeline-daily.

Kept: scheduler service account + IAM (used by backup.tf schedulers).

…metadata-generation, confluence-sync

These standalone jobs are remnants from pre-pipeline architecture.
The consolidated pipeline job (sync-pipeline) handles download+parse+index
in a single process. The standalone jobs can't work in Cloud Run anyway
because they need shared SQLite state between steps.

Quality-scoring and metadata-generation are dead features not used by
Graphiti search, and were burning Vertex AI Claude credits for nothing.

Only sync-pipeline remains for manual intake runs.

Update README, ARCHITECTURE.md, ADRs, and GRAPH_DATABASE_PLAN to reflect
the completed migration from ChromaDB/NetworkX/BM25 to Neo4j + Graphiti.
Mark ADR-0002 and ADR-0005 as superseded, add ADR-0009 for Neo4j + Graphiti.

…aphiti

- Remove chromadb and rank-bm25 dependencies from pyproject.toml
- Remove 9 deprecated config settings (CHROMA_*, BM25_*, GRAPH_DUAL_WRITE)
- Delete graph_builder.py (NetworkX) and graph_retriever.py (NetworkX)
- Remove EntityExtractor class (replaced by Graphiti), GovernanceMetadata,
  Entity, and Relationship models (replaced by Neo4j)
- Remove deprecated lifecycle stubs and rebuild-bm25 CLI command
- Rename VectorIndexer -> GraphitiIndexer in all callers and test mocks
- Rename index_to_chromadb -> index_to_graphiti in downloader
- Update all ChromaDB references in comments/docstrings to Graphiti
- Clean up test files: remove tests for deleted code, fix imports

github-actions[bot] merged commit e005a86 into main on Feb 15, 2026
8 checks passed