feat(stable-api): Add ENCODING_GET for string/regexp encoding access #663

@ianks

Summary

Add stable API methods for accessing Ruby object encodings, equivalent to the C macros ENCODING_GET and ENCODING_GET_INLINED.

Motivation

Encoding handling is critical for text processing in Ruby extensions. Currently, users must call rb_enc_get(), which goes through a C function call instead of reading the object directly. Reading the encoding index straight from the object's flags would be more efficient.

Proposed API

/// Get the encoding index from a Ruby object (akin to `ENCODING_GET`).
///
/// Works for String, Regexp, and Symbol objects that store encoding
/// information in their flags.
///
/// # Safety
/// This function is unsafe because it dereferences a raw pointer to get
/// access to underlying Ruby data. The caller must ensure that the pointer
/// is valid and points to an object that has encoding information.
unsafe fn encoding_get(&self, obj: VALUE) -> c_int;

Implementation Notes

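  • Encoding index is stored in the object's flags, in a 7-bit field starting at ENCODING_SHIFT (FL_USHIFT + 10, i.e. bits 22-28 when FL_USHIFT is 12); confirm against the header for each Ruby version
  • Need to handle ENCODING_INLINE_MAX (127): when the inlined field holds this value, the real index does not fit in the flags and has to be looked up another way (the C macro falls back to rb_enc_get_index())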
  • May need to check FL_USHIFT and related flag constants
  • Reference: include/ruby/internal/encoding/encoding.h
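
The notes above can be sketched as a small, standalone bit-extraction routine. This is only an illustration, not the proposed rb-sys implementation: it works on a plain flags word instead of a real VALUE, the names `InlinedEncoding`, `encoding_get_inlined`, and `classify` are hypothetical, and the constant values (FL_USHIFT = 12, a 7-bit field at FL_USHIFT + 10) are assumptions taken from a reading of the referenced header that should be verified for each supported Ruby version.

```rust
// Assumed values, modelled on include/ruby/internal/encoding/encoding.h.
// Verify them against the headers of each Ruby version before relying on them.
const FL_USHIFT: u32 = 12;
const ENCODING_INLINE_MAX: usize = 127;
const ENCODING_SHIFT: u32 = FL_USHIFT + 10;
const ENCODING_MASK: usize = ENCODING_INLINE_MAX << ENCODING_SHIFT;

/// Outcome of reading the inlined encoding field (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub enum InlinedEncoding {
    /// The index fits in the flags and can be used as-is.
    Index(i32),
    /// The field holds the ENCODING_INLINE_MAX sentinel, so the real index
    /// must come from a slower lookup (rb_enc_get_index() in C).
    Overflow,
}

/// Mask and shift the flags word, like `ENCODING_GET_INLINED`.
pub fn encoding_get_inlined(flags: usize) -> i32 {
    ((flags & ENCODING_MASK) >> ENCODING_SHIFT) as i32
}

/// Decide whether the inlined value is usable or needs the fallback path.
pub fn classify(flags: usize) -> InlinedEncoding {
    let idx = encoding_get_inlined(flags);
    if idx as usize == ENCODING_INLINE_MAX {
        InlinedEncoding::Overflow
    } else {
        InlinedEncoding::Index(idx)
    }
}
```

In a real implementation the flags word would be read from the object's RBasic header behind the `unsafe` boundary described in the proposed API, and the `Overflow` case would delegate to the C fallback.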

Checklist

  • Add method to StableApiDefinition trait
  • Implement for each Ruby version (2.7, 3.0, 3.1, 3.2, 3.3, 3.4, 4.0)
  • Add C fallback in compiled.c
  • Add public macro wrapper in macros.rs
  • Add tests comparing Rust vs C implementation
  • Add to show-asm script for performance verification

Labels: enhancement (New feature or request)