Correctly handle unbatched audio inputs in Gemma3nAudioFeatureExtractor #42076

kho · 2025-11-06T23:12:00Z

What does this PR do?

Correctly handle unbatched audio inputs in Gemma3nAudioFeatureExtractor

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

RyanMullins · 2025-11-06T23:30:19Z

Thanks for sending this fix @kho!

RyanMullins · 2025-11-07T14:16:52Z

cc @zucchini-nlp @Rocketknight1 for a review to help us land this for Gemma 3n.

zucchini-nlp · 2025-11-07T14:29:26Z

src/transformers/models/gemma3n/feature_extraction_gemma3n.py

        if is_batched:
            raw_speech = [np.asarray([rs]).T for rs in raw_speech]
        elif not is_batched and not isinstance(raw_speech, np.ndarray):
            raw_speech = np.asarray(raw_speech)

        if not is_batched:  # always return a batch
-            raw_speech = [np.asarray([raw_speech])]
+            raw_speech = [np.asarray([raw_speech]).T]


Not strictly related to this PR, but let's unify these blocks a bit for easier reading, maybe something like below. Otherwise LGTM.

        if not is_batched:
            raw_speech = [np.asarray(raw_speech)]
        raw_speech = [np.asarray([rs]).T for rs in raw_speech]

kho · 2025-11-07

Good point! I just realized that we also don't need the np.asarray() call in the if statement. PTAL and thanks for reviewing!
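For readers following the thread, here is a minimal sketch of the shape difference under discussion. The function names `to_batch_before_fix` and `to_batch_unified` are hypothetical and do not exist in the library, and `is_batched` is passed in as a plain argument because the detection logic is not shown in the hunk above. The first function mirrors the pre-fix lines from the diff. The second follows the suggested unification with the extra `np.asarray()` call dropped, as described in the reply, and is not necessarily the exact merged code.

```python
import numpy as np


def to_batch_before_fix(raw_speech, is_batched):
    """Mirror of the pre-fix lines in the diff (hypothetical helper)."""
    if is_batched:
        raw_speech = [np.asarray([rs]).T for rs in raw_speech]
    elif not isinstance(raw_speech, np.ndarray):
        raw_speech = np.asarray(raw_speech)

    if not is_batched:  # always return a batch
        # No transpose here, so the single item ends up as (1, num_samples).
        raw_speech = [np.asarray([raw_speech])]
    return raw_speech


def to_batch_unified(raw_speech, is_batched):
    """Sketch of the unified logic discussed in the review (hypothetical helper)."""
    if not is_batched:
        # Wrap the single clip so both cases share the same code path.
        raw_speech = [raw_speech]
    # Every item becomes a column of shape (num_samples, 1).
    return [np.asarray([rs]).T for rs in raw_speech]


clip = [0.1, -0.2, 0.3, 0.0]

print(to_batch_before_fix(clip, is_batched=False)[0].shape)  # (1, 4)
print(to_batch_unified(clip, is_batched=False)[0].shape)     # (4, 1)
print(to_batch_unified([clip, clip[:2]], is_batched=True)[1].shape)  # (2, 1)
```

With the unified path, an unbatched clip gets the same per-item layout as each item of a batched input, which is the mismatch the one-line `.T` change in the diff addresses.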


github-actions · 2025-11-07T17:03:44Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3n

zucchini-nlp

LGTM

HuggingFaceDocBuilderDev · 2025-11-10T08:45:47Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


Correctly handle unbatched audio inputs in Gemma3nAudioFeatureExtractor

278080e

zucchini-nlp reviewed Nov 7, 2025


Simplify the logic for batching the unbatched speech input in Gemma3nAudioFeatureExtractor

d14e459

zucchini-nlp approved these changes Nov 10, 2025


zucchini-nlp enabled auto-merge (squash) November 10, 2025 08:36

zucchini-nlp merged commit 926c37a into huggingface:main Nov 10, 2025
18 checks passed
