Add docs parsers, update types and methods #324
Open
atipugin wants to merge 33 commits into master from claude/analyze-type-attributes-016uWMgLZaQtXRCLy9wGABM5
+7,180
−3,902
This parser generates type_attributes.json from the official Telegram Bot API documentation (https://core.telegram.org/bots/api) since the OpenAPI schema is no longer maintained.

Features:
- Parses all three type categories:
  * Regular types (250): Types with fields and attributes (User, Message, etc.)
  * Union types (21): Polymorphic types (MessageOrigin, ChatMember, etc.)
  * Empty types (6): Marker types with no fields (ForumTopicClosed, etc.)
- Automatically detects 46+ new types from latest API version
- Handles arrays, nested types, and complex type references
- Preserves custom types (Error) not in official docs
- Validates against existing type_attributes.json format

Structure Analysis:
- Analyzed type_attributes.json schema and format
- Mapped HTML documentation structure to JSON format
- Identified detection patterns for all type categories
- Implemented field attribute parsing (required, default, items, etc.)

Parser Implementation (lib/telegram_api_parser.rb):
- Fetches and parses HTML using Nokogiri
- Detects types by analyzing h4 headers, tables, and lists
- Classifies types based on content patterns
- Generates JSON matching existing format
- Total: 278 types generated (250 regular + 21 union + 6 empty + 1 custom)

Documentation:
- README_PARSER.md: Comprehensive usage guide and maintenance instructions
- PARSER_SUMMARY.md: Detailed analysis and implementation summary

Validation Tools:
- test_html_fetch.rb: HTML structure explorer
- check_union_types.rb: Union type detection validator
- check_empty_types.rb: Empty type detection validator
- check_missing_unions.rb: Missing union type finder
- debug_union_detection.rb: Pattern matching debugger
- analyze_differences.rb: Compare existing vs generated types

Output:
- data/type_attributes_generated.json: Fresh types from current API (278 types)
- All existing types validated and matched
- 46+ new types detected from latest API updates

This enables automated updates of type definitions whenever Telegram releases new Bot API versions.

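
The three-way classification described above could be sketched roughly as follows. This is only an illustration, not the parser's actual code: `Section`, its fields, and `classify` are hypothetical names, and the sketch assumes each h4 section has already been reduced to its table rows and list items (the real parser works on Nokogiri nodes and may use additional content patterns).

```ruby
# Hypothetical, simplified stand-in for one parsed <h4> section of the docs page.
Section = Struct.new(:name, :table_rows, :list_items, keyword_init: true)

# Assumed decision order: a field table means a regular type, a bare list of
# subtypes means a union type, and a section with neither is an empty type.
def classify(section)
  return :regular unless section.table_rows.empty?
  return :union unless section.list_items.empty?

  :empty
end

user = Section.new(name: 'User', table_rows: [%w[id Integer]], list_items: [])
origin = Section.new(name: 'MessageOrigin', table_rows: [],
                     list_items: %w[MessageOriginUser MessageOriginChat])
closed = Section.new(name: 'ForumTopicClosed', table_rows: [], list_items: [])

[user, origin, closed].each { |s| puts "#{s.name}: #{classify(s)}" }
```
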
- Added support for Unicode smart quotes (\u201C and \u201D) used in HTML
- Extended patterns to match both 'always "value"' and 'must be value'
- Added 'currency' to discriminator fields list
- Now correctly parses all 95 required_value fields (vs 83 before)

Validated critical fields:
- BackgroundFillSolid.type = "solid"
- MessageOriginUser.type = "user"
- ChatMemberOwner.status = "creator"
- PassportElementErrorDataField.source = "data"
- RefundedPayment.currency = "XTR"

Added validation tools:
- comprehensive_comparison.rb: Compare generated vs existing types
- final_validation.rb: Validate critical discriminator fields

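
A minimal sketch of the two description patterns mentioned above, accepting both straight and smart quotes. The constant and method names are assumptions made for illustration, and the real parser's regexes may differ in detail.

```ruby
# Hypothetical patterns: 'always "value"' (straight or smart quotes)
# and the unquoted 'must be value' form.
REQUIRED_VALUE_PATTERNS = [
  /always\s+["\u201C]([^"\u201D]+)["\u201D]/i,
  /must be\s+["\u201C]?(\w+)["\u201D]?/i
].freeze

# Returns the fixed value named in a field description, or nil if none is found.
def required_value(description)
  REQUIRED_VALUE_PATTERNS.each do |pattern|
    match = description.match(pattern)
    return match[1] if match
  end
  nil
end

puts required_value("Type of the background fill, always \u201Csolid\u201D")
puts required_value('Error source, must be data')
```
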
These scripts were used to diagnose and fix the required_value parsing issue:
- check_background_fill.rb: Check specific type HTML structure
- check_discriminator_patterns.rb: Validate discriminator pattern detection
- check_passport_error.rb: Debug PassportElementError types
- check_required_values.rb: Verify required_value fields in generated JSON
- debug_discriminator.rb: Test regex pattern matching

This commit fixes multiple parsing issues to ensure the generated JSON is compatible with rebuild_types.rake and minimizes unwanted changes.

Issues Fixed:

1. Missing min_size/max_size parsing
   - Now extracts from 'N-M characters' pattern in descriptions
   - Correctly handles BotCommand.command (1-32), ChatLocation.address (1-64), etc.
   - Skips min_size if 0 to match existing format
2. Union field types (Integer or String)
   - Fields like chat_id now correctly represented as ["integer", "string"]
   - Previously only captured first type from 'A or B' pattern
   - Affects BotCommandScopeChat, BotCommandScopeChatMember, etc.
3. Nested array structures
   - Handles 'Array of Array of X' properly
   - InlineKeyboardMarkup.inline_keyboard now has correct nested structure
   - Other inline query results also fixed
4. Float → number type mapping
   - Changed TYPE_MAPPING to use 'number' instead of 'float'
   - Maintains backward compatibility with existing JSON
   - Matches rake task expectations (add_module_types converts number → Float)
5. Default values from descriptions
   - Parses 'Defaults to X' pattern
   - InlineQueryResultGif.thumbnail_mime_type now has default: "image/jpeg"
   - Handles quoted and unquoted defaults

Results:
- Types differing: 53 → 12 (77% reduction)
- Remaining 12 are legitimate API changes (new fields like checklist, direct_messages, suggested_post features)
- All structural differences resolved

Validation:
✓ min_size/max_size constraints
✓ Union field types
✓ Nested arrays
✓ Default values
✓ Correct type mapping
✓ Compatible with rebuild_types.rake

Added:
- detailed_comparison.rb: Tool to compare field-by-field differences
- parser_improvements_summary.md: Detailed documentation of fixes

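
Fixes 1 to 4 can be illustrated with a short sketch. The method names and the exact shape of the returned hashes are assumptions (the real type_attributes.json layout is not shown in this PR text), and this `TYPE_MAPPING` is a reduced stand-in for the parser's constant of the same name.

```ruby
# Reduced, assumed stand-in for the parser's TYPE_MAPPING ('Float' maps to 'number').
TYPE_MAPPING = {
  'Integer' => 'integer', 'String' => 'string',
  'Boolean' => 'boolean', 'Float' => 'number'
}.freeze

# Recursively parses 'Array of Array of X' and 'A or B' type expressions.
def parse_type(text)
  text = text.strip
  if (match = text.match(/\AArray of (.+)\z/))
    { 'type' => 'array', 'items' => parse_type(match[1]) }
  elsif text.include?(' or ')
    { 'type' => text.split(' or ').map { |t| TYPE_MAPPING.fetch(t.strip, t.strip) } }
  else
    { 'type' => TYPE_MAPPING.fetch(text, text) }
  end
end

# Extracts size limits from an 'N-M characters' phrase, skipping a zero minimum.
def size_constraints(description)
  match = description.match(/(\d+)-(\d+) characters/)
  return {} unless match

  constraints = { 'max_size' => match[2].to_i }
  constraints['min_size'] = match[1].to_i unless match[1].to_i.zero?
  constraints
end

p parse_type('Integer or String')
p parse_type('Array of Array of InlineKeyboardButton')
p size_constraints('Text of the command; 1-32 characters')
```
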
Problem:
- Parser incorrectly extracted 'default: "th"' from descriptions like
  'defaults to the value of other_field'
- This caused ChatPermissions.can_manage_topics to get 'default: "th"',
  which broke rebuild_types.rake
Root Cause:
- Regex '/defaults to\s+(\w+)(?!\s+value)/i' had a backtracking issue
- When trying to match 'defaults to the value':
  1. Captures 'the' with \w+
  2. Negative lookahead checks that the capture is NOT followed by ' value'
  3. Fails because 'the' IS followed by ' value'
  4. Backtracks and captures 'th' instead
  5. 'th' is followed by 'e value' (not ' value'), so the match succeeds
Solution:
- Made regex much more restrictive
- Only accept specific patterns:
  1. Quoted strings: 'defaults to "value"'
  2. Boolean literals: 'defaults to true|false'
  3. Numeric literals: 'defaults to 0'
- Skip all other patterns (like field references)
Before:
ChatPermissions.can_manage_topics: {"type": "boolean", "default": "th"}
After:
ChatPermissions.can_manage_topics: {"type": "boolean"}
Verified:
✓ Field references no longer captured as defaults
✓ Legitimate defaults still work ("image/jpeg", true, 0, etc.)
✓ No unwanted default values in rebuild_types output
Added:
- debug_default_parsing.rb: Tool to test default value regex patterns
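
The backtracking behaviour and the stricter replacement can be reproduced with a short sketch. `OLD_DEFAULT_PATTERN` is the regex quoted in the root cause above; `parse_default` and the example description are hypothetical and only approximate the restrictive matching described in the solution.

```ruby
# The original pattern: the negative lookahead can be satisfied by
# giving back one character of the \w+ capture.
OLD_DEFAULT_PATTERN = /defaults to\s+(\w+)(?!\s+value)/i

# Hypothetical restrictive parser: only quoted strings, booleans and integers
# are accepted; anything else (such as a field reference) yields nil.
def parse_default(description)
  case description
  when /defaults to\s+["\u201C]([^"\u201D]+)["\u201D]/i then Regexp.last_match(1)
  when /defaults to\s+(true|false)\b/i then Regexp.last_match(1).downcase == 'true'
  when /defaults to\s+(\d+)\b/i then Regexp.last_match(1).to_i
  end
end

description = 'Defaults to the value of can_pin_messages'
puts description.match(OLD_DEFAULT_PATTERN)[1] # prints "th"
p parse_default(description)                   # prints nil
p parse_default('MIME type of the thumbnail. Defaults to "image/jpeg"')
```
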
Problem:
- SuggestedPostPrice.currency was generating required_value: "one" and default: "one"
- Description: "Currency... must be one of \"XTR\" for Telegram Stars or \"TON\""
- The pattern "must be one of X or Y" indicates a choice, not a discriminator field

Root Cause:
- Regex /must be\s+(\w+)(?!\s+of)/i had backtracking issues
- Negative lookahead would fail on "one", backtrack and capture partial matches

Solution:
- Added explicit check for "must be one of" pattern before attempting match
- Pattern: !description.match?(/must be\s+one\s+of/i) && (match = description.match(/must be\s+(\w+)\b/i))
- This prevents the regex from even attempting to match on multi-choice fields

Results:
- SuggestedPostPrice.currency now correctly has no required_value/default
- RefundedPayment.currency still correctly has required_value: "XTR" (fixed value, not choice)
- All 94 discriminator fields validated correct

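
The guard from the solution can be wrapped in a small method to show its effect. The method name and the sample descriptions are assumptions; the two regexes are the ones quoted in the commit message.

```ruby
# Returns the fixed value from a 'must be X' phrase, or nil when the
# description offers a choice ('must be one of ...') or has no such phrase.
def must_be_value(description)
  return nil if description.match?(/must be\s+one\s+of/i)

  match = description.match(/must be\s+(\w+)\b/i)
  match && match[1]
end

p must_be_value('Currency. Currently, must be one of "XTR" for Telegram Stars or "TON"')
p must_be_value('Error source, must be data')
```
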
- Make Zeitwerk loader accessible as LOADER constant
- Add conditional eager loading via EAGER_LOAD env var
- Enable eager loading in spec_helper for test environment
- Ensures all classes are loaded upfront during tests for predictable behavior

- Move LOADER constant inside Telegram::Bot module for better encapsulation
- Call LOADER.eager_load directly in spec_helper after requiring the library
- Simpler and more explicit than using environment variable

- Store Zeitwerk loader as module instance variable @loader
- Add Telegram::Bot.eager_load! method for cleaner API
- Update spec_helper to use eager_load! instead of accessing constant
- More idiomatic Ruby interface

- Remove custom eager_load! method from Telegram::Bot
- Use Zeitwerk's built-in eager_load_namespace class method in spec_helper
- Simpler implementation without exposing loader or custom methods
- Cleaner separation of concerns

Update parser to properly handle float default values (e.g., 0.0, 1.5) instead of converting them to integers.

The parser now:
- Matches numeric patterns with optional decimal points (\d+\.?\d*)
- Uses to_f for values containing '.' and to_i for integers
- Preserves float representation (0.0 stays 0.0, not 0)

Files updated:
- lib/telegram_api_parser.rb: Updated numeric default parsing logic
- debug_default_parsing.rb: Updated to match new implementation
- test_float_defaults.rb: Added comprehensive test suite for float defaults

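
A sketch of the numeric handling described above, using the quoted \d+\.?\d* pattern. The method name is hypothetical and the surrounding 'defaults to' prefix is an assumption about how the pattern is anchored.

```ruby
# Parses a numeric default, keeping floats as floats and integers as integers.
def parse_numeric_default(description)
  match = description.match(/defaults to\s+(\d+\.?\d*)/i)
  return nil unless match

  raw = match[1]
  raw.include?('.') ? raw.to_f : raw.to_i
end

p parse_numeric_default('Defaults to 0.0')
p parse_numeric_default('Defaults to 1.5')
p parse_numeric_default('Defaults to 40 pixels')
```

One thing to watch with this pattern: a sentence-final period directly after an integer (as in 'Defaults to 0.') is also captured, so the value would be read as a float. A stricter pattern such as \d+(?:\.\d+)? may be worth considering if that case occurs in the docs.
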
- Move telegram_api_parser.rb from lib/ to rakelib/
- Create parse_telegram rake task similar to parse_schema
- Remove CLI execution section from parser (now handled by rake task)
- Rake task supports OUTPUT env variable to specify output file

- Move telegram_api_parser.rb to rakelib/docs_parsers/types_parser.rb
- Rename TelegramApiParser class to DocsParsers::TypesParser
- Create new DocsParsers::MethodsParser to parse API methods from docs
- Add rake task :parse_methods to generate methods.json
- Update :parse_docs rake task to use new parser structure

The methods parser extracts method names and return types from the Telegram Bot API documentation, supporting various return type patterns including:
- Simple types (Bool, String, Integer)
- Complex types (User, Message, etc.)
- Arrays (Array of X)
- Union types (X | Y)

Currently parses 111 API methods from the documentation.

Replace net/http with open-uri for simpler HTTP requests in both:
- rakelib/docs_parsers/types_parser.rb
- rakelib/docs_parsers/methods_parser.rb

This simplifies the fetch method implementation.

- Create rakelib/rebuild_methods.rake task that regenerates lib/telegram/bot/api/endpoints.rb from data/methods.json
- Add data/methods.json with 111 parsed API methods
- Task sorts methods alphabetically for consistency
- Run with: rake rebuild_methods

The rebuild_methods task reads the parsed methods from methods.json and generates the endpoints.rb file with the proper Ruby module structure and type expressions.

- Add rakelib/templates/endpoints.erb template for generating endpoints.rb
- Update rebuild_methods.rake to use ERB template like rebuild_types
- Simplifies the rake task code and makes it consistent with rebuild_types
- Template generates properly formatted endpoints.rb with no extra blank lines

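
The template-based generation could look roughly like the sketch below. Everything here is an assumption made for illustration: the inline template, the `ENDPOINTS` hash layout, the shape of the JSON input (method name mapped to a type expression string), and the `render_endpoints` helper. The actual endpoints.erb and methods.json in this PR may be structured differently.

```ruby
require 'erb'
require 'json'

# Hypothetical inline template; trim_mode '-' lets '<%-' and '-%>' drop
# the template's own lines so the output has no extra blank lines.
ENDPOINTS_TEMPLATE = <<~'ERB'
  ENDPOINTS = {
  <%- endpoints.each do |name, type| -%>
    '<%= name %>' => <%= type %>,
  <%- end -%>
  }.freeze
ERB

# Renders the template from a JSON string, sorting methods alphabetically.
def render_endpoints(methods_json)
  endpoints = JSON.parse(methods_json).sort.to_h
  ERB.new(ENDPOINTS_TEMPLATE, trim_mode: '-').result_with_hash(endpoints: endpoints)
end

puts render_endpoints('{"sendMessage": "Types::Message", "getMe": "Types::User"}')
```
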
Move all test/debug files created during parser development to tmp/:
- analyze_differences.rb
- check_background_fill.rb
- check_discriminator_patterns.rb
- check_empty_types.rb
- check_missing_unions.rb
- check_passport_error.rb
- check_required_values.rb
- check_union_types.rb
- comprehensive_comparison.rb
- debug_default_parsing.rb
- debug_discriminator.rb
- debug_union_detection.rb
- detailed_comparison.rb
- final_validation.rb
- test_discriminator_patterns.rb
- test_float_defaults.rb
- test_html_fetch.rb

These files are not needed in the repository.

- Add comprehensive YARD documentation to TypesParser class explaining:
  - Why we parse the docs (eliminate manual maintenance, ensure sync with API)
  - How we parse (three type categories, parsing patterns, improvements)
  - All six parser improvements (min/max size, union types, nested arrays, etc.)
  - Validation results (53 types differed before, 12 after - only API changes)
- Add comprehensive YARD documentation to MethodsParser class explaining:
  - Why we parse methods (accurate return types, IDE autocomplete)
  - How we parse (identify methods, extract descriptions, parse patterns)
  - Six parsing patterns (boolean success, arrays, unions, simple objects, etc.)
  - Type mapping to Ruby dry-types representations
- Delete parser_improvements_summary.md (information now in code documentation)

Documentation is now co-located with the implementation, making it easier to maintain and discover. YARD format enables better IDE integration and can generate API documentation if needed.

**TypesParser enhancements:**
- Add @problem_statement explaining why parser exists (OpenAPI no longer maintained)
- Add @output_format with complete JSON schema structure
- Add @detection_logic with decision tree for type classification
- Expand @how_we_parse with detailed HTML patterns for all three type categories
- Add @type_mappings showing all type conversions
- Add @dependencies, @performance, and @known_limitations sections
- Add @custom_types section explaining Error type
- Add @usage_workflow with step-by-step update process
- Enhance examples with more comprehensive usage patterns

**MethodsParser enhancements:**
- Add @problem_statement explaining the need for method parsing
- Add @output_format with example JSON structure
- Expand @how_we_parse with detailed HTML structure for methods
- Add @pattern_matching_details explaining skip words and fallbacks
- Add @type_mapping showing Ruby dry-types conversions
- Add @known_limitations section (6 key limitations)
- Add @dependencies, @performance sections
- Add @validation_approach with 5-step validation process
- Add @usage_workflow for method updates
- Enhance examples with programmatic usage patterns

**Documentation consolidation:**
- Delete README_PARSER.md (information now in TypesParser YARD docs)
- Delete PARSER_SUMMARY.md (information now in both parser YARD docs)

All parser documentation is now co-located with implementation, making it fully self-explanatory. Anyone can read the parser files and understand:
- Why we parse (problem statement)
- What we parse (output format)
- How we parse (detection logic, HTML patterns)
- What the limitations are
- How to use and validate the parsers

Replace custom YARD tags with standard YARD formatting:
- Remove custom tags: @overview, @problem_statement, @why_we_parse, @output_format, @how_we_parse, @detection_logic, @parser_improvements, @type_mappings, @validation_results, @dependencies, @performance, @known_limitations, @custom_types, @usage_workflow, @pattern_matching_details, @validation_approach
- Use standard YARD formatting:
  - Regular text for class description (no tags needed)
  - == for major sections (e.g., "== Why This Parser Exists")
  - === for subsections (e.g., "=== 1. Regular Types")
  - @note for important limitations and caveats
  - @example for usage examples (already standard)
  - @see for external references (already standard)

All information is preserved, now using proper YARD conventions that are compatible with standard documentation generators like yard-doc. Both TypesParser and MethodsParser are updated consistently.

No description provided.