Copilot AI commented Sep 27, 2025

This PR adds comprehensive Doxygen-style documentation to the core TritonBLAS API files as requested in issue #X. The documentation provides complete API reference material for both users and contributors.

Files Updated

include/tritonblas/origami.py

  • Module documentation: Added detailed file header explaining the hardware-aware optimization purpose and architecture
  • Constants: Documented dtype_to_str mapping with inline comments for each PyTorch data type
  • MatmulHeuristicResult class: Complete class documentation including:
    • Comprehensive overview of optimization algorithms and hardware considerations
    • Supported AMD GPU architectures (gfx950, gfx942/MI300X/MI300A, gfx908/MI100)
    • Matrix dimensions, data types, and execution modes explanation
  • Methods: Full documentation for all public and private methods with detailed parameter descriptions, algorithm explanations, and cross-references

include/tritonblas/matmul.py

  • Module documentation: Added detailed file header explaining the high-level API features and architecture
  • Constants: Documented all module-level variables including device properties, buffer sizes, and pre-allocated tensors
  • Functions: Complete documentation for all functions including:
    • _make_matmul_selector(): LRU caching mechanism and performance benefits
    • persistent_matmul_lt() and streamk_matmul_lt(): Detailed execution mode comparisons
    • matmul_lt() and matmul(): User-facing APIs with usage examples and code snippets
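The LRU caching described for `_make_matmul_selector()` can be illustrated with Python's `functools.lru_cache`. The sketch below is hypothetical. `make_selector` and its block-size rules are invented for illustration and are not TritonBLAS code. It shows the general idea: once a given shape and dtype have been seen, later calls return the cached result instead of rerunning the heuristic.

```python
from functools import lru_cache


# Hypothetical stand-in for an expensive heuristic query (not TritonBLAS code).
# All arguments must be hashable, because together they form the cache key.
@lru_cache(maxsize=1024)
def make_selector(m: int, n: int, k: int, dtype: str) -> tuple[int, int, int]:
    """Return toy (block_m, block_n, block_k) sizes for a GEMM problem."""
    # A real selector would consult a hardware model here. These fixed
    # rules only stand in for that work so the caching is observable.
    block_m = 256 if m >= 4096 else 128
    block_n = 256 if n >= 4096 else 128
    block_k = 64 if dtype in ("fp16", "bf16") else 32
    return block_m, block_n, block_k


if __name__ == "__main__":
    make_selector(1024, 1024, 512, "fp16")  # miss: heuristic runs
    make_selector(1024, 1024, 512, "fp16")  # hit: cached result reused
    print(make_selector.cache_info())
```

Because the key is the full argument tuple, a workload that repeats a small set of GEMM shapes pays the selection cost only once per shape.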

Documentation Features

  • Doxygen compliance: Uses proper Doxygen syntax (@brief, @param, @return, @details, @see, @throws, @pre, @note)
  • Cross-references: Extensive linking between related functions and classes for easy navigation
  • Performance insights: Detailed explanations of optimization algorithms, caching benefits, and hardware-specific considerations
  • Practical examples: Code snippets demonstrating proper API usage
  • Hardware compatibility: Complete AMD GPU architecture support matrix with data type compatibility
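One way to write the tags listed above in Python is with comment blocks that start with `##`. Doxygen parses these blocks with its special commands enabled. Plain docstrings, by contrast, are shown as preformatted text unless Doxygen's `PYTHON_DOCSTRING` option is set to `NO`. The helper `matmul_dims` below is hypothetical. It exists only to show the tag layout and is not part of the TritonBLAS API.

```python
## @brief Compute GEMM dimensions from two 2-D operand shapes.
#
#  @details Hypothetical helper used only to illustrate Doxygen tags in a
#  "##" comment block placed directly above the function it documents.
#
#  @param a_shape (rows, cols) of the left operand A.
#  @param b_shape (rows, cols) of the right operand B.
#  @return Tuple (M, N, K) for the product C = A @ B.
#  @throws ValueError If the inner dimensions of A and B differ.
#  @pre Both shapes are 2-tuples of positive integers.
#  @note Only shapes are checked; dtypes and devices are not.
#  @see matmul()
def matmul_dims(a_shape, b_shape):
    m, k_a = a_shape
    k_b, n = b_shape
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} != {k_b}")
    return m, n, k_a


if __name__ == "__main__":
    print(matmul_dims((1024, 512), (512, 1024)))
```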

Usage Example

The documented high-level API can be used as follows:

import torch
import tritonblas

# Create matrices
A = torch.randn(1024, 512, device="cuda", dtype=torch.float16)
B = torch.randn(512, 1024, device="cuda", dtype=torch.float16) 
C = torch.zeros(1024, 1024, device="cuda", dtype=torch.float16)

# High-level API with automatic optimization
result = tritonblas.matmul(A, B, C)

# Enable Stream-K for better load balancing
result = tritonblas.matmul(A, B, C, enable_streamk=True)
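The difference between the two execution modes can be sketched by counting how much K-loop work each workgroup receives. This is a simplified model under stated assumptions and is not TritonBLAS's scheduler. The block sizes (256×256×64) and the count of 10 workgroups are arbitrary, and the partial-tile fix-up that Stream-K needs is ignored. A persistent data-parallel schedule hands out whole output tiles. Stream-K instead splits the flattened (tile, k-iteration) range evenly across workgroups.

```python
import math


def tile_counts(m, n, k, bm, bn, bk):
    """Number of output tiles and K-loop iterations per tile."""
    tiles = math.ceil(m / bm) * math.ceil(n / bn)
    iters_per_tile = math.ceil(k / bk)
    return tiles, iters_per_tile


def persistent_work(tiles, iters_per_tile, num_wgs):
    # Data-parallel persistent schedule: workgroup w processes tiles
    # w, w + num_wgs, w + 2 * num_wgs, ..., each with its full K loop.
    return [len(range(w, tiles, num_wgs)) * iters_per_tile for w in range(num_wgs)]


def streamk_work(tiles, iters_per_tile, num_wgs):
    # Stream-K: treat all (tile, k-iteration) pairs as one range and give
    # each workgroup a contiguous, near-equal share. Tiles that straddle
    # two workgroups would need a partial-sum fix-up (not modeled here).
    total = tiles * iters_per_tile
    base, extra = divmod(total, num_wgs)
    return [base + (1 if w < extra else 0) for w in range(num_wgs)]


if __name__ == "__main__":
    tiles, iters = tile_counts(1024, 1024, 512, 256, 256, 64)
    print("persistent:", persistent_work(tiles, iters, 10))
    print("stream-k:  ", streamk_work(tiles, iters, 10))
```

For the 1024×1024×512 problem above, this model gives 16 tiles of 8 K-iterations each. Under the persistent schedule, the busiest workgroup does 16 iterations while four others do only 8. Under Stream-K, the 128 iterations are spread almost evenly, so no workgroup does more than 13.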

This documentation should make the API easier to learn for both users and new contributors to the TritonBLAS project.

Fixes #X

Original prompt

This section details the original issue you should resolve

<issue_title>[Documentation]: Need In-Code Function/API Documentation</issue_title>
<issue_description>### Description of errors

Document the code for the following files (use doxygen standard):

Comments on the Issue (you are @copilot in this section)

Fixes #7


Copilot AI and others added 2 commits September 27, 2025 04:39
Co-authored-by: neoblizz <9790745+neoblizz@users.noreply.github.com>
Co-authored-by: neoblizz <9790745+neoblizz@users.noreply.github.com>
Copilot AI changed the title [WIP] [Documentation]: Need In-Code Function/API Documentation [Documentation]: Add comprehensive Doxygen-style documentation to origami.py and matmul.py Sep 27, 2025
Copilot AI requested a review from neoblizz September 27, 2025 04:43