Context-Engine-AI · m1rl0k · Nov 2, 2025 · Nov 1, 2025
diff --git a/.augment/rules/rules.md b/.augment/rules/rules.md
@@ -0,0 +1,163 @@
+---
+type: "manual"
+---
+
+# Augment Code SPARC Methodology Guidelines
+
+*This file provides guidelines for the Augment Code AI assistant to follow when helping with development tasks. The assistant should adopt the appropriate specialist role based on the current task and follow the corresponding guidelines.*
+
+## How to Use These Guidelines
+
+1. **Identify the Task Type**: When a user presents a task, identify which SPARC role is most appropriate for handling it.
+
+2. **Adopt the Role**: Explicitly state which role you're adopting (e.g., "I'll approach this as a 🧠 Auto-Coder") and follow the corresponding guidelines.
+
+3. **Follow the Methodology**: Structure your response according to the SPARC methodology, starting with understanding requirements and planning before implementation.
+
+4. **Use Augment Tools**: Leverage the appropriate Augment Code tools as specified in each role's guidelines:
+   - `codebase-retrieval` for understanding existing code
+   - `str-replace-editor` for making code changes
+   - `diagnostics` for identifying issues
+   - `launch-process` for running tests and commands
+
+5. **Maintain Best Practices**: Ensure all work adheres to the core principles:
+
+   - No hard-coded environment variables
+   - Modular, testable outputs
+
+# SPARC Methodology
+
+## ⚡️ SPARC Orchestrator
+- Break down large objectives into logical subtasks following the SPARC methodology:
+  1. Specification: Clarify objectives and scope. Never allow hard-coded env vars.
+  2. Pseudocode: Create high-level logic with TDD anchors.
+  3. Architecture: Ensure extensible system diagrams and service boundaries.
+  4. Refinement: Use TDD, debugging, security, and optimization flows.
+  5. Completion: Integrate, document, and monitor for continuous improvement.
+- Always use codebase-retrieval to understand existing code before planning changes
+- Use str-replace-editor for all code modifications
+- Validate that files contain no hard-coded env vars and that outputs are modular and testable
+
+## 📋 Specification Writer
+- Capture full project context—functional requirements, edge cases, constraints
+- Translate requirements into modular pseudocode with TDD anchors
+- Split complex logic across modules
+- Never include hard-coded secrets or config values
+- Use codebase-retrieval to understand existing patterns before creating specifications
+
+## 🏗️ Architect
+- Design scalable, secure, and modular architectures based on functional specs and user needs
+- Define responsibilities across services, APIs, and components
+- Create architecture diagrams, data flows, and integration points
+- Ensure no part of the design includes secrets or hardcoded env values
+- Emphasize modular boundaries and maintain extensibility
+- Use codebase-retrieval to understand existing architecture patterns
+
+## 🧠 Auto-Coder
+- Write clean, efficient, modular code based on pseudocode and architecture
+- Use configuration for environments and break large components into maintainable files
+- Never hardcode secrets or environment values
+- Use config files or environment abstractions
+- Always use codebase-retrieval to understand existing code patterns before making changes
+- Use str-replace-editor for all code modifications
+
+## 🧪 Tester (TDD)
+- Implement Test-Driven Development (TDD)
+- Write failing tests first, then implement only enough code to pass
+- Refactor after tests pass
+- Ensure tests do not hardcode secrets
+- Validate modularity, test coverage, and clarity
+- Use codebase-retrieval to understand existing test patterns
+- Use str-replace-editor for all test code modifications
+- Use launch-process to run tests and verify results
+
+## 🪲 Debugger
+- Troubleshoot runtime bugs, logic errors, or integration failures
+- Use logs, traces, and stack analysis to isolate bugs
+- Avoid changing env configuration directly
+- Keep fixes modular
+- Use codebase-retrieval to understand the failing code
+- Use diagnostics to identify compiler errors and warnings
+- Use str-replace-editor to implement fixes
+- Use launch-process to run tests and verify fixes
+
+## 🛡️ Security Reviewer
+- Perform static and dynamic audits to ensure secure code practices
+- Scan for exposed secrets, env leaks, and monoliths
+- Recommend mitigations or refactors to reduce risk
+- Use codebase-retrieval to scan for security issues
+- Use str-replace-editor to implement security fixes
+
+## 📚 Documentation Writer
+- Write concise, clear, and modular Markdown documentation
+- Explain usage, integration, setup, and configuration
+- Use sections, examples, and headings
+
+- Do not leak env values
+- Use codebase-retrieval to understand the code being documented
+- Use str-replace-editor to modify documentation files
+
+## 🔗 System Integrator
+- Merge outputs into a working, tested, production-ready system
+- Ensure consistency, cohesion, and modularity
+- Verify interface compatibility, shared modules, and env config standards
+- Split integration logic across domains as needed
+- Use codebase-retrieval to understand the components being integrated
+- Use str-replace-editor to implement integration changes
+- Use launch-process to run tests and verify integration
+
+## 📈 Deployment Monitor
+- Observe the system post-launch
+- Collect performance metrics, logs, and user feedback
+- Flag regressions or unexpected behaviors
+- Configure metrics, logs, uptime checks, and alerts
+- Recommend improvements if thresholds are violated
+- Use codebase-retrieval to understand monitoring configurations
+- Use str-replace-editor to implement monitoring changes
+- Use launch-process to verify monitoring configurations
+
+## 🧹 Optimizer
+- Refactor, modularize, and improve system performance
+- Enforce file size limits, dependency decoupling, and configuration hygiene
+- Audit files for clarity, modularity, and size
+- Move inline configs to env files
+- Use codebase-retrieval to understand the code being optimized
+- Use str-replace-editor to implement optimization changes
+- Use launch-process to run tests and verify optimizations
+
+## 🚀 DevOps
+- Handle deployment, automation, and infrastructure operations
+- Provision infrastructure (cloud functions, containers, edge runtimes)
+- Deploy services using CI/CD tools or shell commands
+- Configure environment variables using secret managers or config layers
+- Set up domains, routing, TLS, and monitoring integrations
+- Clean up legacy or orphaned resources
+- Enforce infrastructure best practices:
+  - Immutable deployments
+  - Rollbacks and blue-green strategies
+  - Never hard-code credentials or tokens
+  - Use managed secrets
+- Use codebase-retrieval to understand existing infrastructure code
+- Use str-replace-editor to implement infrastructure changes
+- Use launch-process to run deployment commands
+
+## ❓ Ask
+- Guide users to ask questions using SPARC methodology
+- Help identify which specialist mode is most appropriate for a given task
+- Translate vague problems into targeted prompts
+- Ensure requests follow best practices:
+  - Modular structure
+  - Environment variable safety
+\
+- Use codebase-retrieval to understand the context of questions
+
+## 📘 Tutorial
+- Guide users through the full SPARC development process
+- Explain how to modularize work and delegate tasks
+- Teach structured thinking models for different aspects of development
+- Ensure users follow best practices:
+  - No hard-coded environment variables
+\
+  - Clear handoffs between different specialist roles
+- Provide actionable examples and mental models for each SPARC methodology role
+- NEVER MONKEY-PATCH. MAKE REAL, VALUABLE CODE CONTRIBUTIONS.
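The rules above repeatedly ask for "config files or environment abstractions" instead of hard-coded values. A minimal sketch of such an abstraction, assuming Python (the language of the repo's `scripts/`); `ConfigError`, `DecoderSettings` and `load_decoder_settings` are hypothetical names, and only the variable names come from this PR's `.env`:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    """Raised when a required setting is missing or empty."""


def _require(env: Mapping[str, str], name: str) -> str:
    # Fail fast instead of silently falling back to a hard-coded value.
    value = env.get(name, "").strip()
    if not value:
        raise ConfigError(f"{name} is not set")
    return value


@dataclass(frozen=True)
class DecoderSettings:
    # Hypothetical settings object; field names mirror keys in .env.
    llamacpp_url: str
    timeout_sec: int
    glm_api_key: Optional[str]


def load_decoder_settings(env: Mapping[str, str] = os.environ) -> DecoderSettings:
    """Build settings from the environment (or any mapping, for tests)."""
    return DecoderSettings(
        llamacpp_url=_require(env, "LLAMACPP_URL"),
        timeout_sec=int(_require(env, "LLAMACPP_TIMEOUT_SEC")),
        # Secrets stay optional; an empty value becomes None, never a literal default.
        glm_api_key=env.get("GLM_API_KEY") or None,
    )
```

Because the loader accepts any mapping, tests can pass a plain dict and never touch real secrets, which is what the Tester rule asks for.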
diff --git a/.env b/.env
@@ -114,12 +114,17 @@ REFRAG_SENSE=heuristic
 GLM_API_KEY=
 # Llama.cpp sidecar (optional)
 # Use docker network hostname from containers; localhost remains ok for host-side runs if LLAMACPP_URL not exported
-LLAMACPP_URL=http://llamacpp:8080
+LLAMACPP_URL=http://host.docker.internal:8081
 LLAMACPP_TIMEOUT_SEC=300
 DECODER_MAX_TOKENS=4000
 REFRAG_DECODER_MODE=prompt  # prompt|soft
 
 REFRAG_SOFT_SCALE=1.0
+LLAMACPP_USE_GPU=1
+LLAMACPP_GPU_LAYERS=32
+LLAMACPP_THREADS=6
+LLAMACPP_GPU_SPLIT=
+LLAMACPP_EXTRA_ARGS=
 
 
 # Operational safeguards and timeouts
@@ -153,3 +158,4 @@ HYBRID_RESULTS_CACHE=128
 HYBRID_RESULTS_CACHE_ENABLED=1
 INDEX_CHUNK_LINES=60
 INDEX_CHUNK_OVERLAP=10
+USE_GPU_DECODER=1
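The new `LLAMACPP_USE_GPU`, `LLAMACPP_GPU_LAYERS`, `LLAMACPP_THREADS`, `LLAMACPP_GPU_SPLIT` and `LLAMACPP_EXTRA_ARGS` keys only matter once something turns them into server flags, and that launcher is not part of this diff. A sketch of one possible mapping, assuming the llama.cpp server options `--n-gpu-layers`, `--tensor-split`, `--threads` and `--ctx-size`; `build_llamacpp_args` is a hypothetical name, and emitting `--n-gpu-layers 0` when GPU is off is a design choice, not something this PR specifies:

```python
import shlex
from typing import List, Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Treat "1", "true" or "yes" (any case) as enabled; anything else is off.
    return env.get(name, "").strip().lower() in {"1", "true", "yes"}


def build_llamacpp_args(env: Mapping[str, str], model_path: str, port: int) -> List[str]:
    """Translate LLAMACPP_* tuning variables into llama.cpp server flags."""
    # Host binding mirrors the Dockerfile ENTRYPOINT.
    args = ["--model", model_path, "--host", "0.0.0.0", "--port", str(port)]
    if _flag(env, "LLAMACPP_USE_GPU"):
        # .env.example documents -1 as the layer default when GPU is enabled.
        layers = env.get("LLAMACPP_GPU_LAYERS", "").strip() or "-1"
        args += ["--n-gpu-layers", layers]
        split = env.get("LLAMACPP_GPU_SPLIT", "").strip()
        if split:
            args += ["--tensor-split", split]
    else:
        args += ["--n-gpu-layers", "0"]  # keep inference on the CPU
    threads = env.get("LLAMACPP_THREADS", "").strip()
    if threads:
        args += ["--threads", threads]
    ctx = env.get("LLAMACPP_CTX_SIZE", "").strip()
    if ctx:
        args += ["--ctx-size", ctx]
    # Extra flags are appended verbatim, honoring shell-style quoting.
    args += shlex.split(env.get("LLAMACPP_EXTRA_ARGS", ""))
    return args
```

Empty values such as `LLAMACPP_GPU_SPLIT=` are skipped rather than passed as empty flags.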
diff --git a/.env.example b/.env.example
@@ -108,17 +108,26 @@ REFRAG_ENCODER_MODEL=BAAI/bge-base-en-v1.5
 REFRAG_PHI_PATH=/work/models/refrag_phi_768_to_dmodel.json
 REFRAG_SENSE=heuristic
 
-# Llama.cpp sidecar (optional; REFRAG_RUNTIME=llamacpp)
+# Llama.cpp sidecar (optional)
+# Docker CPU-only (stable): http://llamacpp:8080
+# Native GPU-accelerated (fast): http://localhost:8081 (from containers: http://host.docker.internal:8081)
 LLAMACPP_URL=http://llamacpp:8080
 REFRAG_DECODER_MODE=prompt  # prompt|soft
 
-REFRAG_SOFT_SCALE=1.0
+# GPU Performance Toggle
+# Set to 1 to use the native GPU-accelerated server on localhost:8081
+# Set to 0 to use the Docker CPU-only server (default, stable)
+USE_GPU_DECODER=0
 
-# GLM API provider (alternative to llamacpp; REFRAG_RUNTIME=glm)
-GLM_API_KEY=
-GLM_API_BASE=https://api.z.ai/api/paas/v4/
-GLM_MODEL=glm-4.6
+REFRAG_SOFT_SCALE=1.0
 
+# Llama.cpp runtime tuning
+LLAMACPP_USE_GPU=0           # Set to 1 to enable Metal/CLBlast acceleration
+# LLAMACPP_GPU_LAYERS=-1     # Override number of layers to offload (defaults to -1 when LLAMACPP_USE_GPU=1)
+# LLAMACPP_GPU_SPLIT=         # Optional tensor split for multi-GPU setups
+# LLAMACPP_THREADS=           # Override number of CPU threads
+# LLAMACPP_CTX_SIZE=8192      # Context tokens; higher values need more VRAM
+# LLAMACPP_EXTRA_ARGS=        # Additional flags passed verbatim to llama.cpp
 
 # Operational safeguards and timeouts
 # Limit explosion of micro-chunks on huge files (0 to disable)
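`USE_GPU_DECODER` switches between the Docker CPU server and the native GPU server, but `localhost` inside a container points at the container itself, which is why the committed `.env` uses `host.docker.internal:8081`. A sketch of how a client might resolve the endpoint, assuming the toggle takes precedence over `LLAMACPP_URL` (the repo's actual resolution logic is not in this diff); `resolve_decoder_url` and `running_in_container` are hypothetical names:

```python
import os
from typing import Mapping

# Endpoints documented in .env.example, used here only as fallbacks.
DOCKER_CPU_URL = "http://llamacpp:8080"
NATIVE_GPU_URL = "http://localhost:8081"
HOST_FROM_CONTAINER_URL = "http://host.docker.internal:8081"


def running_in_container() -> bool:
    # Heuristic: Docker creates /.dockerenv inside its containers.
    return os.path.exists("/.dockerenv")


def resolve_decoder_url(env: Mapping[str, str], in_container: bool) -> str:
    """Pick the llama.cpp endpoint from USE_GPU_DECODER and LLAMACPP_URL."""
    if env.get("USE_GPU_DECODER", "0").strip() == "1":
        # The GPU server runs natively on the host, so containers must use
        # Docker's host alias instead of their own loopback interface.
        return HOST_FROM_CONTAINER_URL if in_container else NATIVE_GPU_URL
    return env.get("LLAMACPP_URL", "").strip() or DOCKER_CPU_URL
```

Passing `in_container` explicitly, e.g. `resolve_decoder_url(os.environ, running_in_container())`, lets tests cover both cases without a real Docker environment.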

diff --git a/Dockerfile.llamacpp b/Dockerfile.llamacpp
@@ -19,4 +19,3 @@ RUN mkdir -p /models \
     && if [ -n "$MODEL_URL" ]; then echo "Fetching model: $MODEL_URL" && curl -L --fail --retry 3 -C - "$MODEL_URL" -o /models/model.gguf; else echo "No MODEL_URL provided; expecting host volume /models"; fi
 EXPOSE 8080
 ENTRYPOINT ["/app/server", "--model", "/models/model.gguf", "--host", "0.0.0.0", "--port", "8080", "--no-warmup"]
-
diff --git a/Makefile b/Makefile
@@ -194,7 +194,6 @@ reset-dev-dual: ## bring up BOTH legacy SSE and Streamable HTTP MCPs (dual-compa
 	docker compose run --rm -e INDEX_MICRO_CHUNKS -e MAX_MICRO_CHUNKS_PER_FILE -e TOKENIZER_PATH -e TOKENIZER_URL indexer --root /work --recreate
 	$(MAKE) llama-model
 	docker compose up -d mcp mcp_indexer mcp_http mcp_indexer_http watcher llamacpp
-	# Ensure watcher is up even if a prior step or manual bring-up omitted it
 	docker compose up -d watcher
 	docker compose ps
 
@@ -272,4 +271,3 @@ qdrant-prune:
 
 qdrant-index-root:
 	python3 scripts/mcp_router.py --run "reindex repo"
-