Skip to content

Update AI infrastructure: CLAUDE.md, .gitignore#2938

Open
jamesblackburn wants to merge 4 commits intomasterfrom
jb/2_infra_claude_docs
Open

Update AI infrastructure: CLAUDE.md, .gitignore#2938
jamesblackburn wants to merge 4 commits intomasterfrom
jb/2_infra_claude_docs

Conversation

@jamesblackburn
Copy link
Copy Markdown
Collaborator

@jamesblackburn jamesblackburn commented Feb 25, 2026

Update AI infrastructure: CLAUDE.md, build system, and review skill

Improves developer/AI workflow documentation, fixes build parallelism, and aligns the code
review skill with documentation requirements.

Changes

CLAUDE.md

  • Add user-facing documentation guidance (NumPy docstring format, tutorials, mkdocs
    checklist)
  • Expand ASV benchmarking instructions: release build requirement, run-from-root convention,
    available suites, --python=$(which python) usage
  • Add test-driven development guidelines (write failing test first)
  • Add git workflow guidelines (confirm before pushing)
  • Add branch work log convention
  • Fix code formatter guidance to scope to branch-changed files only
  • Remove manual CMAKE_BUILD_PARALLEL_LEVEL=16 from wheel build command (now set
    automatically)

cpp/CMake/CpuCount.cmake

  • Replace RAM warning with actual RAM-based job limiting: caps parallel jobs at RAM_MB / 3000 (~3 GB per compiler process) to prevent OOM during builds

setup.py

  • Fix cmake output parsing to handle the -- CMAKE_BUILD_PARALLEL_LEVEL= prefix emitted by
    CMake STATUS messages

docs/claude/PR_REVIEW_GUIDELINES.md

  • Add §21 DOCUMENTATION covering: NumPy-format docstring completeness, tutorial requirements
    for complex features, mkdocs.yml nav updates, edge case/limitation coverage, and
    docs/claude/ update obligations when modifying documented areas
  • Expand the Documentation checklist from 2 items to 6, aligning it with the requirements in
    CLAUDE.md

.gitignore

  • Broaden .ipynb_checkpoints/ pattern to match at any directory level

.cproject

  • Remove redundant <pathentry kind="src" path=""/> line

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benefits:

  • A bunch of ArcticDB based edge-case handling for claude to worry / think about.
  • Runs the different analyses in parallel in sub-agents

ArcticDB is a high-performance, serverless DataFrame database for the Python Data Science ecosystem. It provides a Python API backed by a C++ data-processing and compression engine, supporting S3, LMDB, Azure Blob Storage, and MongoDB backends.

## Claude-Maintained Documentation
## Documentation
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brought the doc updates section together: dev docs and user-docs.

@jamesblackburn jamesblackburn added documentation Improvements or additions to documentation minor Feature change, should increase minor version labels Feb 25, 2026
@jamesblackburn jamesblackburn force-pushed the jb/2_infra_claude_docs branch 2 times, most recently from b9ca1b2 to 30a0f61 Compare February 25, 2026 19:58
ARCTICDB_PROTOC_VERS=4 CMAKE_BUILD_PARALLEL_LEVEL=16 ARCTIC_CMAKE_PRESET=linux-debug pip install -ve .
```

To install packages which aren't available internally, use the following custom index:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to resolve specific package versions


## Benchmarking

**IMPORTANT: Always use a release build for benchmarking.** Debug builds have 10-30x overhead from disabled optimizations, assertions, and unoptimized template instantiation (e.g. sparrow/Arrow type system). Use `ARCTIC_CMAKE_PRESET=linux-release` for both C++ and Python benchmarks.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important or it sometimes runs benchmarks with a debug build

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Feb 27, 2026

ArcticDB Code Review Summary

This PR makes documentation and infrastructure changes to CLAUDE.md, .gitignore, PR_REVIEW_GUIDELINES.md, CpuCount.cmake, setup.py, and .cproject. No public Python API, storage backend, or on-disk format changes.

What Changed (all commits)

  • 423cd39: CLAUDE.md overhaul - user-facing doc guidance, ASV benchmarking, TDD workflow, branch work-log, scoped formatter command; .gitignore broadens .ipynb_checkpoints/ pattern; .cproject cleanup
  • 3aac516: CpuCount.cmake cap parallelism at RAM/3000 MB; setup.py fix cmake output parser (STATUS prefix); CLAUDE.md remove hardcoded CMAKE_BUILD_PARALLEL_LEVEL=16
  • 8f4e8d2: PR_REVIEW_GUIDELINES.md - add section 21 DOCUMENTATION; expand Documentation checklist from 2 items to 6
  • 6e65a7d (delta): Merge master into branch - no new PR code; brings in upstream commits 33f2e6a, 0f2e921, 9a5eb28, 4692ff8 (already reviewed and merged to master separately)

Delta Analysis (6e65a7d - merge of master)

No PR-owned code changed. The merge incorporates four master commits with no merge conflicts. Checklist items are unaffected by this delta.

Previously Flagged Issues

  • Benchmark suite names (CLAUDE.md line 224): addressed in earlier commits; ASV section simplified with a link to the Wiki.
  • xargs -r macOS incompatibility (line 261): acknowledged, acceptable - primary development is on Linux.
  • CMAKE_BUILD_PARALLEL_LEVEL=16 in build commands: fully removed; CpuCount.cmake now computes the appropriate level from available RAM.

Checklist

API & Compatibility

  • No breaking changes to public Python API (N/A - docs/build only)
  • On-disk format unchanged (N/A)
  • Protobuf schema backwards-compatible (N/A)

Memory & Safety

  • All memory/safety items (N/A - no C++ logic changes in this PR)

Correctness

  • CpuCount.cmake CI fix: STATUS to NOTICE routes informational message to stderr, leaving parseable stdout for GITHUB_ENV
  • setup.py output parser correctly handles STATUS prefix via string split after line filtering
  • CMAKE_BUILD_PARALLEL_LEVEL=16 fully removed from all CLAUDE.md build commands (auto-computed by CpuCount.cmake)

Code Quality

  • Section 21 content is internally consistent and aligns with CLAUDE.md documentation guidance
  • Section 2 cross-reference to section 21 correctly guides reviewers to the full format spec
  • Checklist expansion (2 to 6 items) matches CLAUDE.md documentation checklist exactly

Testing

  • No behavioural changes requiring tests (N/A - docs/build only)

Build & Dependencies

  • No CMakeLists dependency changes (N/A)
  • CpuCount.cmake change backward-compatible (NOTICE available since CMake 3.15+)

Security

  • No hardcoded credentials or secrets

PR Title & Description

  • Title clear and concise
  • Description covers all changed files
  • No API/breaking changes to call out

Documentation

  • New/modified public methods have complete NumPy-format docstrings - N/A (no Python API changes)
  • Complex or multi-use features have a tutorial - N/A
  • docs/mkdocs/mkdocs.yml nav updated for any new docs pages - N/A
  • Edge cases and limitations documented - section 21 explicitly adds this as a review obligation
  • docs/claude/ technical doc updated where code in a documented area changed - N/A (no C++ logic); PR_REVIEW_GUIDELINES.md itself is the docs/claude/ change

@jamesblackburn jamesblackburn changed the title Update AI infrastructure: CLAUDE.md, .gitignore, code review skill Update AI infrastructure: CLAUDE.md, .gitignore Feb 27, 2026
Copy link
Copy Markdown
Collaborator

@maxim-morozov maxim-morozov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments, but LGTM

@jamesblackburn jamesblackburn force-pushed the jb/2_infra_claude_docs branch from 0378f3f to d86f4aa Compare March 2, 2026 08:59

---

## 21. DOCUMENTATION
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added documentation review for the code-review skill

if(${_needed_mb} GREATER ${_physical_mb})
message(WARNING "We recommend 1500MB of physical RAM per processor core. We found ${_proc_count} cores and\
${_physical_mb}MB of RAM. Use the CMAKE_BUILD_PARALLEL_LEVEL environment variable to avoid thrashing.")
# C++ compiler processes can use up to 3GB each; cap parallelism to avoid OOM
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix periodic OOM on large-core client machine. Consider machine memory for parallel builds by default. cpp compiler commands can use op to 3GB

@jamesblackburn jamesblackburn force-pushed the jb/2_infra_claude_docs branch from 5463ba0 to 8d12ad7 Compare March 2, 2026 16:48
- Add documentation guidance: NumPy docstring format, tutorial and
  mkdocs.yml checklist for new features
- Expand ASV benchmarking section with suite names, env vars, and
  run-from-root/--python caveat; simplify by removing general ASV
  knowledge Claude already has
- Remove CMAKE_BUILD_PARALLEL_LEVEL from build commands (now automatic)
- Add TDD guidelines, git workflow (confirm before push), branch work
  log convention, and code formatter scoping guidance
- Broaden .ipynb_checkpoints/ gitignore pattern to any directory level
Replace the RAM warning with actual job limiting: caps parallelism at
RAM_MB/3000 (~3GB per compiler process) to prevent OOM. Uses
message(NOTICE) for the informational message so it goes to stderr and
doesn't break the CI step that pipes cmake stdout to \$GITHUB_ENV.
Updates setup.py to parse the cmake STATUS message prefix correctly.
Removes redundant .cproject source path entry.
Add §21 DOCUMENTATION covering NumPy-format docstring completeness,
tutorial requirements for complex features, mkdocs.yml nav updates,
edge case/limitation coverage, and docs/claude/ update obligations.
Expand the checklist from 2 items to 6 to align with CLAUDE.md.
@jamesblackburn jamesblackburn force-pushed the jb/2_infra_claude_docs branch from 8d12ad7 to 8f4e8d2 Compare March 2, 2026 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation minor Feature change, should increase minor version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants