Conversation

@pass-lin (Contributor) commented Sep 28, 2025

RWKV7 is one of the strongest RNN models available today, and we now provide a full implementation for it in keras_hub.

📚 References


🔗 Pre-trained Checkpoints (ModelScope)

Numerical-verification notebook

This is the first modern RNN architecture in keras_hub. With the resurgence of recurrent models, more pre-trained RNN backbones will follow; hence this PR also serves as a reference implementation for future work.

Current progress

  • [✅] backbone implementation
  • [✅] checkpoint-conversion script
  • [✅] tokenizer implementation
  • [TODO] unit tests / examples
  • [TODO] complete CausalLM task wrapper
  • [TODO] Add documentation

Summary of Changes

Hello @pass-lin, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the keras_hub library by integrating the RWKV7 model, a cutting-edge recurrent neural network. This addition not only provides a robust new model for users but also serves as a foundational reference implementation, encouraging the future inclusion of more modern RNN architectures within the library.

Highlights

  • RWKV7 Model Integration: Introduced the RWKV7 model, a powerful RNN architecture, into keras_hub, marking a significant expansion of the library's capabilities.
  • Comprehensive Implementation: The pull request includes a full implementation of the RWKV7 backbone, its dedicated tokenizer, and a causal language model wrapper.
  • Foundation for Modern RNNs: This is the first modern RNN architecture added to keras_hub, serving as a reference implementation and paving the way for future recurrent model additions.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This PR introduces the RWKV-7 model, a powerful RNN architecture, to keras_hub. The contribution is significant and includes the backbone, tokenizer, preprocessor, an incomplete task model, and a checkpoint conversion script. The implementation follows the modular structure of keras_hub.

However, there are several critical issues that must be addressed before this PR can be merged:

  1. Missing Tests: The PR lacks unit tests for all new components. According to the contribution guidelines, testing is a mandatory requirement.[^1]
  2. Incomplete CausalLM Task: The RWKV7CausalLM task model is a stub with TODOs, making it non-functional for generation.
  3. Critical Bugs: There are critical bugs in the tokenizer and preprocessor implementations that will cause runtime errors.
  4. Style Guide Violations: There are numerous style guide violations, including a filename typo, missing docstrings, and inconsistencies with the recommended model input structure.

I've left detailed comments on these issues. Once these are resolved, this will be a great addition to the library.


    def save_assets(self, dir_path):
        path = os.path.join(dir_path, VOCAB_FILENAME)
        with open(path, "wb") as file:

critical

The file is opened in binary write mode ('wb'), but you are attempting to write a string ('\n'.join(...)). This will raise a TypeError. You should open the file in text mode with UTF-8 encoding.

Suggested change
-        with open(path, "wb") as file:
+        with open(path, "w", encoding="utf-8") as file:

Comment on lines +24 to +50
    def call_with_cache(
        self,
        token_ids,
        cache,
        cache_update_index,
    ):
        pass  # TODO

    def _build_cache(self, token_ids):
        pass  # TODO

    def generate_step(
        self,
        inputs,
        stop_token_ids=None,
    ):
        pass  # TODO

    def score(
        self,
        token_ids,
        padding_mask=None,
        scoring_mode="logits",
        layer_intercept_fn=None,
        target_ids=None,
    ):
        pass  # TODO

critical

The CausalLM task is incomplete. Core methods like call_with_cache, _build_cache, generate_step, and score are just stubs. As a result, the model is not functional for generation. Please implement these methods. The PR should not be merged in this state.
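
To illustrate the contract these stubs need to satisfy for a recurrent model, here is a toy NumPy sketch (not the RWKV7 implementation): unlike a transformer KV cache, an RNN cache is just the recurrent state, so call_with_cache consumes the new tokens and returns logits, hidden states, and the updated state, while _build_cache seeds that state by running the prompt once.

import numpy as np

def toy_call_with_cache(token_ids, state, embedding, w_state, w_out):
    # Toy single-layer RNN; the "cache" is simply the recurrent state.
    x = embedding[token_ids]  # (batch, seq, hidden)
    hidden = []
    for t in range(x.shape[1]):
        state = np.tanh(x[:, t] + state @ w_state)
        hidden.append(state)
    hidden = np.stack(hidden, axis=1)
    logits = hidden @ w_out  # (batch, seq, vocab)
    return logits, hidden, state

rng = np.random.default_rng(0)
vocab, dim = 16, 8
embedding = rng.normal(size=(vocab, dim))
w_state = 0.1 * rng.normal(size=(dim, dim))
w_out = 0.1 * rng.normal(size=(dim, vocab))

# _build_cache analogue: run the prompt once to seed the state.
prompt = np.array([[1, 2, 3]])
state = np.zeros((1, dim))
logits, _, state = toy_call_with_cache(prompt, state, embedding, w_state, w_out)

# generate_step analogue: feed one token at a time, reusing the state.
next_token = logits[:, -1].argmax(axis=-1, keepdims=True)
logits, _, state = toy_call_with_cache(next_token, state, embedding, w_state, w_out)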

from modelscope import snapshot_download

from keras_hub.src.models.rwkv7.rwkv7_backbone import RWKV7Backbone
from keras_hub.src.models.rwkv7.rwkv7_casual_lm import RWKV7CausalLM

critical

There is a typo in the import path: rwkv7_casual_lm should be rwkv7_causal_lm. This will cause an ImportError.

Suggested change
-from keras_hub.src.models.rwkv7.rwkv7_casual_lm import RWKV7CausalLM
+from keras_hub.src.models.rwkv7.rwkv7_causal_lm import RWKV7CausalLM

        token_ids, padding_mask = self.packer(
            x, sequence_length=sequence_length, add_end_value=False
        )
        return token_ids

critical

The generate_preprocess method returns a single tensor, but generate_postprocess expects a dictionary {'token_ids': ..., 'padding_mask': ...}. This inconsistency will cause a TypeError during text generation. generate_preprocess should return a dictionary to match the expected input of generate_postprocess and for consistency with the base preprocessor class.[^1]

        return {
            "token_ids": token_ids,
            "padding_mask": padding_mask,
        }
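
For reference, the corrected method would look roughly like this; the build and tokenizer calls around the changed lines are assumed from the pattern used by other keras_hub causal LM preprocessors, not copied from this PR:

    def generate_preprocess(self, x, sequence_length=None):
        # Sketch patterned on other keras_hub causal LM preprocessors.
        if not self.built:
            self.build(None)
        x = self.tokenizer(x)
        token_ids, padding_mask = self.packer(
            x, sequence_length=sequence_length, add_end_value=False
        )
        return {
            "token_ids": token_ids,
            "padding_mask": padding_mask,
        }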

Style Guide References

Footnotes

  1. Preprocessors should handle padding, truncation, generating attention masks, and formatting the output into a dictionary of tensors that match the backbone's input signature.

Comment on lines +201 to +212
    def tokenize(self, inputs):
        self._check_vocabulary()
        tokens = self._tokenizer.encode(inputs)

        def tokens2ids(x):
            return [self.token_to_id(t) for t in x]

        if is_string_dtype(self.dtype):
            if isinstance(inputs, str):
                return tokens2ids(tokens)
            return [tokens2ids(t) for t in tokens]
        return tokens

critical

The logic for handling dtype='string' is incorrect. tokens2ids calls token_to_id on values that are already integer IDs, since self._tokenizer.encode() returns IDs rather than string tokens. If dtype is 'string', you should convert the integer IDs back to string tokens instead.

Suggested change
-    def tokenize(self, inputs):
-        self._check_vocabulary()
-        tokens = self._tokenizer.encode(inputs)
-        def tokens2ids(x):
-            return [self.token_to_id(t) for t in x]
-        if is_string_dtype(self.dtype):
-            if isinstance(inputs, str):
-                return tokens2ids(tokens)
-            return [tokens2ids(t) for t in tokens]
-        return tokens
+    def tokenize(self, inputs):
+        self._check_vocabulary()
+        ids = self._tokenizer.encode(inputs)
+        if is_string_dtype(self.dtype):
+            def ids_to_tokens(id_list):
+                return [self.id_to_token(i) for i in id_list]
+            if isinstance(inputs, str):
+                return ids_to_tokens(ids)
+            return [ids_to_tokens(id_list) for id_list in ids]
+        return ids

"keras_hub.models.RWKVTokenizer",
]
)
class RWKVTokenizer(tokenizer.Tokenizer):

medium

The RWKVTokenizer class is missing a docstring. Please add a comprehensive docstring that explains the tokenizer, its arguments, and includes usage examples, as per the style guide.[^1]

Style Guide References

Footnotes

  1. All public classes must have Google-style docstrings with a summary, examples, and documentation for parameters.

Comment on lines +394 to +395
- 0.5
) # soft-clamp to (-inf, -0.5)

medium

The hardcoded value -0.5 is a magic number. It would be better to define it as a named constant at the top of the file or as a class attribute to improve readability and maintainability.
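
For example, a schematic sketch (assuming a softplus-style clamp, as the inline comment suggests; names are illustrative):

import keras

# Named constant instead of a bare -0.5 literal at the call site.
SOFT_CLAMP_OFFSET = -0.5

decay_logits = keras.ops.ones((2, 4))  # placeholder tensor
# -softplus(-x) lies in (-inf, 0), so adding the offset keeps the result
# in (-inf, -0.5).
decay = -keras.ops.softplus(-decay_logits) + SOFT_CLAMP_OFFSET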



@keras_hub_export("keras_hub.models.RWKV7Backbone")
class RWKV7Backbone(Backbone):

medium

The RWKV7Backbone class is missing a docstring. Please add a Google-style docstring explaining the model's architecture, its parameters, and include a usage example, as specified in the style guide.[^1]

Style Guide References

Footnotes

  1. All public classes, methods, and functions must have Google-style docstrings, including a concise summary, comprehensive examples, and documentation for all parameters, return values, and exceptions.
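
For instance, a docstring along these lines would satisfy the style guide (the class summary, argument names, and example values below are illustrative sketches, not taken from the actual implementation):

# Illustrative docstring sketch; argument names and values are placeholders.
@keras_hub_export("keras_hub.models.RWKV7Backbone")
class RWKV7Backbone(Backbone):
    """The RWKV7 decoder network.

    This network implements the recurrent RWKV7 architecture: a token
    embedding followed by a stack of RWKV7 blocks (time-mix and
    channel-mix layers), producing a final sequence of hidden states.

    Args:
        vocabulary_size: int. The size of the token vocabulary.
        num_layers: int. The number of RWKV7 blocks.
        hidden_dim: int. The dimensionality of the embeddings and hidden
            states.
        head_size: int. The size of each time-mix head.
        dtype: string or `keras.mixed_precision.DTypePolicy`. The dtype to
            use for model computations and weights.

    Example:

        import numpy as np
        # Randomly initialized backbone; hyperparameters are illustrative.
        backbone = keras_hub.models.RWKV7Backbone(
            vocabulary_size=65536,
            num_layers=12,
            hidden_dim=768,
            head_size=64,
        )
        backbone(np.array([[1, 2, 3, 4]]))
    """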



@keras_hub_export("keras_hub.models.RWKV7CausalLMPreprocessor")
class RWKV7CausalLMPreprocessor(CausalLMPreprocessor):

medium

The RWKV7CausalLMPreprocessor class is missing a docstring. Please add a Google-style docstring explaining its purpose, parameters, and include a usage example, as specified in the style guide.[^1]

Style Guide References

Footnotes

  1. All public classes must have Google-style docstrings with a summary, examples, and documentation for parameters.

Comment on lines +87 to +92
        super().__init__(
            inputs=token_id_input,
            outputs=sequence_output,
            dtype=dtype,
            **kwargs,
        )

medium

The backbone's __init__ method only accepts a single token_ids tensor as input. For consistency with other models in keras_hub and to improve interoperability, the backbone should be modified to accept a dictionary of inputs, including token_ids and padding_mask.[^1] The padding_mask is currently computed inside the backbone, but it's better practice to have it as an explicit input.

Style Guide References

Footnotes

  1. Use standardized names for model input arguments to ensure interoperability. For text models, this includes token_ids and padding_mask. The backbone should accept a dictionary of these inputs.
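
As a toy, standalone illustration of the requested input signature (placeholder layers and sizes, not the actual RWKV7 wiring):

import keras
import numpy as np

# Functional model taking a dict of named inputs, as other keras_hub text
# backbones do.
token_id_input = keras.Input(shape=(None,), dtype="int32", name="token_ids")
padding_mask_input = keras.Input(shape=(None,), dtype="int32", name="padding_mask")
x = keras.layers.Embedding(65536, 64)(token_id_input)
# Placeholder for the RWKV7 blocks; here the mask just zeroes out padding.
mask = keras.ops.expand_dims(keras.ops.cast(padding_mask_input, x.dtype), axis=-1)
x = x * mask
model = keras.Model(
    inputs={"token_ids": token_id_input, "padding_mask": padding_mask_input},
    outputs=x,
)
outputs = model(
    {
        "token_ids": np.array([[1, 2, 3, 0]]),
        "padding_mask": np.array([[1, 1, 1, 0]]),
    }
)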
