fix stft detect to support torchaudio.transforms.MelSpectrogram #6548

Open
futz12 wants to merge 2 commits into Tencent:master from futz12:stft-window-detect

futz12 (Contributor) commented on Feb 14, 2026

With this change, MelSpectrogram can now be exported directly:

import torchaudio
from torchaudio.transforms import MelSpectrogram
import torch

# Synthetic input: two random waveforms of different lengths (16000 and 16800 samples)
waveform = torch.randn(1, 1, 16000)
waveform2 = torch.randn(1, 1, 16000 + 160 * 5)
sample_rate = 16000

# Create a MelSpectrogram transform
mel_transform = MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=1024,
    win_length=1024,
    hop_length=160,
    n_mels=128,
    f_min=30,
    f_max=8000,
)
# Apply the transform to the first waveform
mel_spectrogram = mel_transform(waveform)
print(mel_spectrogram.shape)  # layout: (batch, channels, n_mels, time)

import pnnx

pnnx.export(mel_transform, "mel_transform.pnnx", waveform, inputs2=waveform2)
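
The export call passes two waveforms of different lengths, which presumably lets pnnx see that the time axis varies between inputs. As a rough sanity check on the shapes involved, the sketch below computes how many STFT frames each input should produce. It assumes torchaudio's defaults of `center=True` and `pad=0`, which the snippet above does not override; `mel_frames` is a hypothetical helper, not part of pnnx or torchaudio.

```python
def mel_frames(num_samples: int, n_fft: int = 1024, hop_length: int = 160,
               center: bool = True) -> int:
    """Number of STFT frames for a 1-D signal (assumes no extra `pad`)."""
    # With center=True the signal is padded by n_fft // 2 on each side.
    padded = num_samples + (2 * (n_fft // 2) if center else 0)
    return 1 + (padded - n_fft) // hop_length


frames_a = mel_frames(16000)            # waveform
frames_b = mel_frames(16000 + 160 * 5)  # waveform2, five hops longer
print(frames_a, frames_b)               # 101 106
```

Under these assumptions the second input yields exactly five more frames than the first, one per extra hop of 160 samples.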
(dev) PS C:\Users\wuyex\Documents\Project\ncnn\tools\pnnx\cmake-build-relwithdebinfo\src> python .\mel_transform_ncnn.py
tensor([[[[ 19.7871,  32.1392,  48.4337,  ...,  18.9732, 102.2494, 169.9629],
          [  4.8176,  17.9500,  36.9528,  ...,  64.0518, 131.9300, 158.6522],
          [  1.8964,   9.1304,  18.8681,  ...,  74.7491,  58.5998,  50.5858],
          ...,
          [335.0807, 376.1324, 419.6162,  ..., 246.3479, 321.4625, 366.1712],
          [136.8933, 144.3150, 168.2224,  ..., 342.1469, 265.8184, 225.4766],
          [396.7309, 368.3260, 291.3547,  ..., 229.3970, 219.3196, 220.0465]]]])
(dev) PS C:\Users\wuyex\Documents\Project\ncnn\tools\pnnx\cmake-build-relwithdebinfo\src> python .\mel_transform_pnnx.py

tensor([[[[ 19.7937,  32.1496,  48.4496,  ...,  18.9794, 102.2833, 170.0189],
          [  4.8202,  17.9608,  36.9756,  ...,  64.0931, 132.0132, 158.7513],
          [  1.8974,   9.1336,  18.8738,  ...,  74.7806,  58.6124,  50.5914],
          ...,
          [335.1889, 376.2388, 419.7093,  ..., 246.4327, 321.5598, 366.2824],
          [136.9423, 144.3594, 168.2723,  ..., 342.2699, 265.9131, 225.5431],
          [396.8740, 368.4949, 291.4909,  ..., 229.4855, 219.4119, 220.1355]]]])
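
To put a number on how closely the two printed tensors agree, the sketch below computes the largest relative difference over the values visible in the first three rows of each output. Only the printed (truncated) values are used, so this is an indication rather than a full comparison; the variable names are illustrative.

```python
# Values copied from the first three printed rows of each output above.
ncnn_out = [
    19.7871, 32.1392, 48.4337, 18.9732, 102.2494, 169.9629,
    4.8176, 17.9500, 36.9528, 64.0518, 131.9300, 158.6522,
    1.8964, 9.1304, 18.8681, 74.7491, 58.5998, 50.5858,
]
pnnx_out = [
    19.7937, 32.1496, 48.4496, 18.9794, 102.2833, 170.0189,
    4.8202, 17.9608, 36.9756, 64.0931, 132.0132, 158.7513,
    1.8974, 9.1336, 18.8738, 74.7806, 58.6124, 50.5914,
]

# Largest relative difference, measured against the pnnx values.
max_rel = max(abs(a - b) / abs(b) for a, b in zip(ncnn_out, pnnx_out))
print(f"max relative difference: {max_rel:.2e}")
```

For these visible entries the relative difference stays below 1e-3.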

nihui closed this on Feb 15, 2026
nihui reopened this on Feb 15, 2026