[WIP] Fetch upstream #6

Draft
wants to merge 260 commits into
base: tokenize_hotwords_flag_master
Changes from 1 commit
260 commits
f1b311e
Handle audio files less than 10s long for speaker diarization. (#1412)
csukuangfj Oct 11, 2024
eefc172
JavaScript API with WebAssembly for speaker diarization (#1414)
csukuangfj Oct 11, 2024
2d412b1
Kotlin API for speaker diarization (#1415)
csukuangfj Oct 11, 2024
1851ff6
Java API for speaker diarization (#1416)
csukuangfj Oct 11, 2024
1ed803a
Dart API for speaker diarization (#1418)
csukuangfj Oct 11, 2024
5e273c5
Pascal API for speaker diarization (#1420)
csukuangfj Oct 12, 2024
94b26ff
Android JNI support for speaker diarization (#1421)
csukuangfj Oct 12, 2024
5a22f74
Android demo for speaker diarization (#1423)
csukuangfj Oct 13, 2024
99f320b
Release v1.10.28 (#1424)
csukuangfj Oct 13, 2024
df4150d
Upload speaker embedding models to huggingface (#1428)
csukuangfj Oct 14, 2024
77dd5f7
Update README.md (#1431)
semxum Oct 14, 2024
593b967
Add Go API for offline punctuation models (#1434)
csukuangfj Oct 16, 2024
471cbd8
updated onnxruntime-linux-aarch64.cmake so that libonnxruntime.so can…
shawl336 Oct 16, 2024
620597f
Support https://huggingface.co/Revai/reverb-diarization-v1 (#1437)
csukuangfj Oct 17, 2024
4783c8f
fix "log10" compile error by import CMATH lib (#1438)
Zazzle516 Oct 17, 2024
e0586f1
add more models for speaker diarization (#1440)
csukuangfj Oct 17, 2024
1af8ad8
Add Java API example for hotwords. (#1442)
csukuangfj Oct 18, 2024
bcaa91e
update java for hotword jar (#1444)
YeyuchenBa Oct 18, 2024
3edd8d7
add java android demo (#1454)
JameWade Oct 23, 2024
effd5ef
Add C++ API for streaming ASR. (#1455)
csukuangfj Oct 23, 2024
ceb69eb
Add C++ API for non-streaming ASR (#1456)
csukuangfj Oct 23, 2024
b3e05f6
Fix style issues (#1458)
csukuangfj Oct 24, 2024
a5295aa
Handle NaN embeddings in speaker diarization. (#1461)
csukuangfj Oct 24, 2024
2b40079
Add speaker identification with VAD and non-streaming ASR using ALSA …
Peakyxh Oct 24, 2024
b41f6d2
Support GigaAM CTC models for Russian ASR (#1464)
csukuangfj Oct 25, 2024
707cf79
Add GigaAM NeMo transducer model for Russian ASR (#1467)
csukuangfj Oct 25, 2024
d5a2f52
Release v1.10.29 (#1468)
csukuangfj Oct 25, 2024
3d6344e
Fix building node-addon for Windows x86. (#1469)
csukuangfj Oct 25, 2024
b06b460
Begin to support https://github.com/usefulsensors/moonshine (#1470)
csukuangfj Oct 26, 2024
0f2732e
Publish pre-built JNI libs for Linux aarch64 (#1472)
csukuangfj Oct 26, 2024
669f5ef
Add C++ runtime and Python APIs for Moonshine models (#1473)
csukuangfj Oct 26, 2024
bd4b223
Add Kotlin and Java API for Moonshine models (#1474)
csukuangfj Oct 26, 2024
2ca2985
Add C and C++ API for Moonshine models (#1476)
csukuangfj Oct 26, 2024
4a4659a
Add Swift API for Moonshine models. (#1477)
csukuangfj Oct 27, 2024
052b864
Add Go API examples for adding punctuations to text. (#1478)
csukuangfj Oct 27, 2024
3d3edab
Add Go API for Moonshine models (#1479)
csukuangfj Oct 27, 2024
6f261d3
Add JavaScript API for Moonshine models (#1480)
csukuangfj Oct 27, 2024
54468a7
Add Dart API for Moonshine models. (#1481)
csukuangfj Oct 27, 2024
cdd8e1b
Add Pascal API for Moonshine models (#1482)
csukuangfj Oct 27, 2024
3622104
Add C# API for Moonshine models. (#1483)
csukuangfj Oct 27, 2024
91e090f
Release v1.10.30 (#1484)
csukuangfj Oct 27, 2024
9eb493f
Publish pre-built wheels for Python 3.13 (#1485)
csukuangfj Oct 28, 2024
36a0e78
Add some commonly used models to README.md (#1486)
csukuangfj Oct 28, 2024
72dc68c
fix typo (#1488)
pengzhendong Oct 28, 2024
356da3b
Publish pre-built macos xcframework (#1490)
csukuangfj Oct 29, 2024
d9c586c
Removed unused TTS example code in .Net examples (#1492)
csukuangfj Oct 29, 2024
d9f65c9
Update pybind11 to support numpy 2.0 (#1493)
csukuangfj Oct 29, 2024
9fa3bc4
Fix reading tokens.txt on Windows. (#1497)
csukuangfj Oct 30, 2024
a3c89aa
Add two-pass ASR Android APKs for Moonshine models. (#1499)
csukuangfj Oct 31, 2024
9ab89c3
Support building GPU-capable sherpa-onnx on Linux aarch64. (#1500)
csukuangfj Nov 1, 2024
c5205f0
Add an example for computing RTF about streaming ASR. (#1501)
csukuangfj Nov 1, 2024
f0cced1
Publish pre-built wheels with CUDA support for Linux aarch64. (#1507)
csukuangfj Nov 3, 2024
6ee8c99
Fix building (#1508)
csukuangfj Nov 3, 2024
4eeb336
Export the English TTS model from MeloTTS (#1509)
csukuangfj Nov 3, 2024
86b1856
Reduce vad-sense-voice example code. (#1510)
whyb Nov 5, 2024
f94cca7
Fix: Reset sample-buffer after processing (#1521)
iteamvep Nov 8, 2024
f97daed
Fixes #1512 (#1522)
csukuangfj Nov 8, 2024
4fab3f2
Revert: [#1521] No need to reset sample-buffer (#1524)
iteamvep Nov 8, 2024
a16c9af
Add Lazarus example for Moonshine models. (#1532)
csukuangfj Nov 12, 2024
3f777b3
Add isolate_tts demo (#1529)
Spicely Nov 12, 2024
8436ba8
Add WebAssembly example for VAD + Moonshine models. (#1535)
csukuangfj Nov 13, 2024
c34ab35
Add Android APK for streaming Paraformer ASR (#1538)
csukuangfj Nov 14, 2024
b28b0c8
Support static build for windows arm64. (#1539)
csukuangfj Nov 15, 2024
e993c08
fix windows build (#1546)
endink Nov 16, 2024
9a48012
Use xcframework for Flutter iOS plugin. (#1547)
csukuangfj Nov 16, 2024
e424cc9
Support cross-compiling for HarmonyOS (#1553)
csukuangfj Nov 20, 2024
31d6206
HarmonyOS support for VAD. (#1561)
csukuangfj Nov 24, 2024
a4b79f0
Fix flutter ios (#1563)
csukuangfj Nov 26, 2024
298b6b6
Add non-streaming ASR support for HarmonyOS. (#1564)
csukuangfj Nov 26, 2024
2101227
Add streaming ASR support for HarmonyOS. (#1565)
csukuangfj Nov 26, 2024
109fb79
fix building for Android (#1568)
csukuangfj Nov 27, 2024
315d8e2
Publish `sherpa_onnx.har` for HarmonyOS (#1572)
csukuangfj Nov 28, 2024
f3f8961
Add VAD+ASR demo for HarmonyOS (#1573)
csukuangfj Nov 28, 2024
be159f9
Fix publishing har packages for HarmonyOS (#1576)
csukuangfj Nov 29, 2024
299f239
Add CI to build HAPs for HarmonyOS (#1578)
csukuangfj Nov 29, 2024
c9d3b6c
Add microphone demo about VAD+ASR for HarmonyOS (#1581)
csukuangfj Nov 30, 2024
a3d6e1a
Fix getting microphone permission for HarmonyOS VAD+ASR example (#1582)
csukuangfj Nov 30, 2024
dc3287f
Add HarmonyOS support for text-to-speech. (#1584)
csukuangfj Dec 1, 2024
0d6bf52
fix: support both old and new websockets request headers format (#1588)
JiayuXu0 Dec 3, 2024
47a2dd4
'update20241203' (#1589)
goddamnVincent Dec 4, 2024
74a8735
Add on-device text-to-speech (TTS) demo for HarmonyOS (#1590)
csukuangfj Dec 4, 2024
9352ccf
Release v1.10.33 (#1591)
csukuangfj Dec 4, 2024
84821b1
Fix building node-addon package (#1598)
csukuangfj Dec 6, 2024
91a43cc
Update doc links for HarmonyOS (#1601)
csukuangfj Dec 6, 2024
a743a44
Add on-device real-time ASR demo for HarmonyOS (#1606)
csukuangfj Dec 9, 2024
314545f
Add speaker identification APIs for HarmonyOS (#1607)
csukuangfj Dec 9, 2024
14944d8
Add speaker identification demo for HarmonyOS (#1608)
csukuangfj Dec 10, 2024
1bae408
Add speaker diarization API for HarmonyOS. (#1609)
csukuangfj Dec 10, 2024
914cbad
Add speaker diarization demo for HarmonyOS (#1610)
csukuangfj Dec 10, 2024
e011e84
Release v1.10.34 (#1611)
csukuangfj Dec 10, 2024
9d4659f
Add missing changes about speaker identification demo for HarmonyOS (#…
csukuangfj Dec 11, 2024
4dc4f1a
Provide sherpa-onnx.aar for Android (#1615)
csukuangfj Dec 12, 2024
be87f86
Use aar in Android Java demo. (#1616)
csukuangfj Dec 12, 2024
0f4b1f4
🔧 build(portaudio-go): Fixed version 1.0.3 (#1614)
deretame Dec 12, 2024
e54c1f4
Release v1.10.35 (#1617)
csukuangfj Dec 12, 2024
efb505f
Update AAR version in Android Java demo (#1618)
csukuangfj Dec 12, 2024
e639c70
Support linking onnxruntime statically for Android (#1619)
csukuangfj Dec 14, 2024
ed8d8e4
Update readme to include Open-LLM-VTuber (#1622)
csukuangfj Dec 16, 2024
5cc60de
Rename maxNumStences to maxNumSentences (#1625)
sawich Dec 16, 2024
70ee779
Support using onnxruntime 1.16.0 with CUDA 11.4 on Jetson Orin NX (Li…
csukuangfj Dec 19, 2024
86381e1
Update readme to include jetson orin nx and nano b01 (#1631)
csukuangfj Dec 19, 2024
7192e57
feat: add checksum action (#1632)
thewh1teagle Dec 20, 2024
b76cd90
Support decoding with byte-level BPE (bbpe) models. (#1633)
csukuangfj Dec 20, 2024
4681bdf
feat: enable c api for android ci (#1635)
thewh1teagle Dec 20, 2024
a3d6313
Update README.md (#1640)
Humorousf Dec 23, 2024
6613828
SherpaOnnxVadAsr: Offload runSecondPass to background thread for impr…
rominf Dec 24, 2024
d00d1c6
Fix GitHub actions. (#1642)
csukuangfj Dec 24, 2024
30a17b9
Release v1.10.36 (#1643)
csukuangfj Dec 24, 2024
fe3265a
Add new tts models for Latvia and Persian+English (#1644)
csukuangfj Dec 24, 2024
08d7713
Add a byte-level BPE Chinese+English non-streaming zipformer model (#…
csukuangfj Dec 24, 2024
b6f0f5f
Support removing invalid utf-8 sequences. (#1648)
csukuangfj Dec 25, 2024
268d562
Add TeleSpeech CTC to non_streaming_server.py (#1649)
csukuangfj Dec 26, 2024
38d64a6
Fix building macOS libs (#1656)
csukuangfj Dec 27, 2024
49154c9
Add Go API for Keyword spotting (#1662)
csukuangfj Dec 31, 2024
5c2cc48
Add swift online punctuation (#1661)
yujinqiu Dec 31, 2024
2c2926a
Add C++ runtime for Matcha-TTS (#1627)
csukuangfj Dec 31, 2024
b2ad6f6
Release v1.10.37 (#1663)
csukuangfj Dec 31, 2024
d353853
Fix initialize TTS in Python. (#1664)
csukuangfj Dec 31, 2024
ebe92e5
Remove spaces after punctuations for TTS (#1666)
csukuangfj Dec 31, 2024
0a43e9c
Add constructor fromPtr() for all flutter class with factory ctor. (#…
w-rui Dec 31, 2024
3422b93
Add Kotlin API for Matcha-TTS models. (#1668)
csukuangfj Dec 31, 2024
f457bae
Support Matcha-TTS models using espeak-ng (#1672)
csukuangfj Jan 2, 2025
a00d3b4
Add Java API for Matcha-TTS models. (#1673)
csukuangfj Jan 2, 2025
a4365da
Avoid adding tail padding for VAD in generate-subtitles.py (#1674)
csukuangfj Jan 3, 2025
9aa4897
Add C API for MatchaTTS models (#1675)
csukuangfj Jan 3, 2025
6489038
Add CXX API for MatchaTTS models (#1676)
csukuangfj Jan 3, 2025
0e299f3
Add JavaScript API (node-addon-api) for MatchaTTS models. (#1677)
csukuangfj Jan 3, 2025
bf3330c
Add HarmonyOS examples for MatchaTTS. (#1678)
csukuangfj Jan 3, 2025
8a60985
Upgraded to .NET 8 and made code style a little more internally consi…
Lamothe Jan 4, 2025
1ef9e5e
Update workflows to use .NET 8.0 also. (#1681)
Lamothe Jan 4, 2025
3eced3e
Add C# and JavaScript (wasm) API for MatchaTTS models (#1682)
csukuangfj Jan 5, 2025
1fe5fe4
Add Android demo for MatchaTTS models. (#1683)
csukuangfj Jan 5, 2025
6f085ba
Add Swift API for MatchaTTS models. (#1684)
csukuangfj Jan 5, 2025
46330b2
Add Go API for MatchaTTS models (#1685)
csukuangfj Jan 6, 2025
c6fcd32
Add Pascal API for MatchaTTS models. (#1686)
csukuangfj Jan 6, 2025
d7c95d3
Add Dart API for MatchaTTS models (#1687)
csukuangfj Jan 6, 2025
930986b
Release v1.10.38 (#1688)
csukuangfj Jan 6, 2025
6d18430
Fix building without TTS (#1691)
csukuangfj Jan 7, 2025
0cb2db3
Add README for android libs. (#1693)
csukuangfj Jan 7, 2025
ecc6538
Fix: export-onnx.py(expected all tensors to be on the same device) (#…
LuomingXu Jan 10, 2025
0d20558
Fix passing strings from C# to C. (#1701)
csukuangfj Jan 13, 2025
cbe07ac
Release v1.10.39 (#1702)
csukuangfj Jan 13, 2025
ce71b63
Fix building wheels (#1703)
csukuangfj Jan 13, 2025
9efe26a
Export kokoro to sherpa-onnx (#1713)
csukuangfj Jan 15, 2025
ffc6b48
Add C++ and Python API for Kokoro TTS models. (#1715)
csukuangfj Jan 16, 2025
af671e2
Add C API for Kokoro TTS models (#1717)
csukuangfj Jan 16, 2025
2d0869c
Fix style issues (#1718)
csukuangfj Jan 16, 2025
cc812e6
Add C# API for Kokoro TTS models (#1720)
csukuangfj Jan 16, 2025
ad61ad6
Add Swift API for Kokoro TTS models (#1721)
csukuangfj Jan 16, 2025
2086f8c
Add Go API for Kokoro TTS models (#1722)
csukuangfj Jan 16, 2025
4335e2a
Add Dart API for Kokoro TTS models (#1723)
csukuangfj Jan 16, 2025
46f2e32
Add Pascal API for Kokoro TTS models (#1724)
csukuangfj Jan 16, 2025
e8d499d
Add JavaScript API (node-addon) for Kokoro TTS models (#1725)
csukuangfj Jan 16, 2025
3a1de0b
Add JavaScript (WebAssembly) API for Kokoro TTS models. (#1726)
csukuangfj Jan 17, 2025
99cef41
Add Kotlin and Java API for Kokoro TTS models (#1728)
csukuangfj Jan 17, 2025
bad82f3
Update README.md for KWS to not use `git lfs`. (#1729)
csukuangfj Jan 17, 2025
2df43b3
Release v1.10.40 (#1731)
csukuangfj Jan 17, 2025
9d6c0e5
Fix UI for Android TTS Engine. (#1735)
csukuangfj Jan 20, 2025
e2f096b
Add iOS TTS example for MatchaTTS (#1736)
csukuangfj Jan 20, 2025
a2650b7
Add iOS example for Kokoro TTS (#1737)
csukuangfj Jan 20, 2025
b943341
Fix `dither` binding in Pybind11 to ensure independence from `high_fr…
jacklynblack Jan 20, 2025
8b989a8
Fix keyword spotting. (#1689)
csukuangfj Jan 20, 2025
e764fa6
Update readme to include https://github.com/hfyydd/sherpa-onnx-server…
csukuangfj Jan 20, 2025
5bcd7e1
Reduce vad-moonshine-c-api example code. (#1742)
whyb Jan 21, 2025
bc3322e
Support Kokoro TTS for HarmonyOS. (#1743)
csukuangfj Jan 22, 2025
66e02d8
Release v1.10.41 (#1744)
csukuangfj Jan 22, 2025
340ebca
Fix publishing wheels (#1746)
csukuangfj Jan 22, 2025
e259529
Update README to include https://github.com/xinhecuican/QSmartAssista…
csukuangfj Jan 23, 2025
030aaa7
Add Kokoro TTS to MFC examples (#1760)
csukuangfj Jan 24, 2025
73c3695
Refactor node-addon C++ code. (#1768)
csukuangfj Jan 25, 2025
f178e96
Add keyword spotter C API for HarmonyOS (#1769)
csukuangfj Jan 26, 2025
8847151
Add ArkTS API for Keyword spotting. (#1775)
csukuangfj Jan 29, 2025
59ff854
Add Flutter example for Kokoro TTS (#1776)
csukuangfj Jan 29, 2025
1d950a8
Initialize the audio session for iOS ASR example (#1786)
Ross-Fan Feb 2, 2025
8677d83
Fix: Prepend 0 to tokenization to prevent word skipping for Kokoro. (…
ahadjawaid Feb 3, 2025
08cefe8
Export Kokoro 1.0 to sherpa-onnx (#1788)
csukuangfj Feb 5, 2025
c84a833
Add C++ and Python API for Kokoro 1.0 multilingual TTS model (#1795)
csukuangfj Feb 6, 2025
4372a7a
Add Java and Kotlin API for Kokoro TTS 1.0 (#1798)
csukuangfj Feb 7, 2025
a52b819
Add Android demo for Kokoro TTS 1.0 (#1799)
csukuangfj Feb 7, 2025
7330f75
Add C API for Kokoro TTS 1.0 (#1801)
csukuangfj Feb 7, 2025
d815204
Add CXX API for Kokoro TTS 1.0 (#1802)
csukuangfj Feb 7, 2025
e2e0f25
Add Swift API for Kokoro TTS 1.0 (#1803)
csukuangfj Feb 7, 2025
e1a88a7
Add Go API for Kokoro TTS 1.0 (#1804)
csukuangfj Feb 7, 2025
ae32dfa
Add C# API for Kokoro TTS 1.0 (#1805)
csukuangfj Feb 7, 2025
35f5ff3
Add Dart API for Kokoro TTS 1.0 (#1806)
csukuangfj Feb 7, 2025
c254504
Add Pascal API for Kokoro TTS 1.0 (#1807)
csukuangfj Feb 7, 2025
19513af
Add JavaScript API (node-addon) for Kokoro TTS 1.0 (#1808)
csukuangfj Feb 7, 2025
0610679
Add JavaScript API (WebAssembly) for Kokoro TTS 1.0 (#1809)
csukuangfj Feb 7, 2025
c49bbce
Add Flutter example for Kokoro TTS 1.0 (#1810)
csukuangfj Feb 7, 2025
239b43c
Add iOS demo for Kokoro TTS 1.0 (#1812)
csukuangfj Feb 7, 2025
5ca2465
Add HarmonyOS demo for Kokoro TTS 1.0 (#1813)
csukuangfj Feb 7, 2025
f90f9da
Release v1.10.42 (#1814)
csukuangfj Feb 7, 2025
84c4fe6
Add MFC example for Kokoro TTS 1.0 (#1815)
csukuangfj Feb 7, 2025
51b4274
Update sherpa-onnx-tts.js VitsModelConfig.model can be none (#1817)
cgisky1980 Feb 8, 2025
d38cb81
Fix passing gb2312 encoded strings to tts on Windows (#1819)
csukuangfj Feb 8, 2025
69f489f
Support scaling the duration of a pause in TTS. (#1820)
csukuangfj Feb 8, 2025
ee7e622
Fix building wheels for linux aarch64. (#1821)
csukuangfj Feb 8, 2025
07391e6
Fix CI for Linux aarch64. (#1822)
csukuangfj Feb 8, 2025
1030bed
Release v1.10.43 (#1828)
csukuangfj Feb 9, 2025
7d62ccf
Export MatchaTTS fa-en model to sherpa-onnx (#1832)
csukuangfj Feb 10, 2025
9559a10
Add C++ support for MatchaTTS models not from icefall. (#1834)
csukuangfj Feb 10, 2025
2ac41d3
OfflineRecognizer supports create stream with hotwords (#1833)
kellkwang Feb 10, 2025
d5da943
Add PengChengStarling models to sherpa-onnx (#1835)
csukuangfj Feb 10, 2025
ad883d4
Support specifying voice in espeak-ng for kokoro tts models. (#1836)
csukuangfj Feb 10, 2025
73d7c25
Fix: made print sherpa_onnx_loge when it is in debug mode (#1838)
ahadjawaid Feb 10, 2025
f5bf8c8
Add Go API for audio tagging (#1840)
csukuangfj Feb 11, 2025
8b8ef10
Fix CI (#1841)
csukuangfj Feb 11, 2025
d617723
Update readme to contain links for pre-built Apps (#1853)
csukuangfj Feb 13, 2025
ce7c03b
Modify the model used (#1855)
JV-X Feb 13, 2025
115e9c2
Flutter OnlinePunctuation (#1854)
Dokotela Feb 13, 2025
944400e
Fix splitting text by languages for kokoro tts. (#1849)
csukuangfj Feb 13, 2025
3825cf3
Release v1.10.44 (#1857)
csukuangfj Feb 13, 2025
2dd84b4
[update] fixed bug: create golang instance succeed while the c struct…
ilibx Feb 14, 2025
60beff1
fixed typo in RTF calculations (#1861)
mah92 Feb 14, 2025
2337169
Export FireRedASR to sherpa-onnx. (#1865)
csukuangfj Feb 16, 2025
316424b
Add C++ and Python API for FireRedASR AED models (#1867)
csukuangfj Feb 16, 2025
d148860
Add Kotlin and Java API for FireRedAsr AED model (#1870)
csukuangfj Feb 17, 2025
193d313
Add C API for FireRedAsr AED model. (#1871)
csukuangfj Feb 17, 2025
1d49dd2
Add CXX API for FireRedAsr (#1872)
csukuangfj Feb 17, 2025
050df2a
Add JavaScript API (node-addon) for FireRedAsr (#1873)
csukuangfj Feb 17, 2025
7ad44bc
Add JavaScript API (WebAssembly) for FireRedAsr model. (#1874)
csukuangfj Feb 17, 2025
d95d431
Add C# API for FireRedAsr Model (#1875)
csukuangfj Feb 17, 2025
b03f6e6
Add Swift API for FireRedAsr AED Model (#1876)
csukuangfj Feb 17, 2025
b5d89d7
Add Dart API for FireRedAsr AED Model (#1877)
csukuangfj Feb 17, 2025
87a968b
Add Go API for FireRedAsr AED Model (#1879)
csukuangfj Feb 17, 2025
614c510
Add Pascal API for FireRedAsr AED Model (#1877) (#1880)
csukuangfj Feb 17, 2025
9711ab2
Release v1.10.45 (#1881)
csukuangfj Feb 17, 2025
26d5f1f
Fix kokoro lexicon. (#1886)
csukuangfj Feb 18, 2025
4e83b34
speaker-identification-with-vad-non-streaming-asr.py Lack of support …
luffy-git Feb 18, 2025
774cf66
Fix generating Chinese lexicon for Kokoro TTS 1.0 (#1888)
csukuangfj Feb 18, 2025
654d228
Reduce vad-whisper-c-api example code. (#1891)
whyb Feb 18, 2025
4801094
JNI Exception Handling (#1452)
iprovalo Feb 19, 2025
9c810ce
Fix #1901: UnicodeEncodeError running export_bpe_vocab.py (#1902)
sheldonrobinson Feb 20, 2025
ed922e6
Fix publishing pre-built windows libraries (#1905)
csukuangfj Feb 21, 2025
94728bf
Fixing Whisper Model Token Normalization (#1904)
iprovalo Feb 21, 2025
7774e35
feat: add mic example for better compatibility (#1909)
wanghsinche Feb 21, 2025
bafd110
Add onnxruntime 1.18.1 for Linux aarch64 GPU (#1914)
csukuangfj Feb 24, 2025
4d79e6a
Add C++ API for streaming zipformer ASR on RK NPU (#1908)
csukuangfj Feb 24, 2025
808587a
change [1<<28] to [1<<10], to fix build issues on GOARCH=386 that [1<…
franck-li Feb 25, 2025
70742b6
Flutter Config toJson/fromJson (#1893)
Dokotela Feb 25, 2025
dc2f7e9
Fix publishing linux pre-built artifacts (#1919)
csukuangfj Feb 25, 2025
0dcaf3a
go.mod set to use go 1.17, and use unsafe.Slice to optimize the code …
franck-li Feb 25, 2025
b042f5e
fix: AddPunct panic for Go(#1921)
xcel3011 Feb 25, 2025
2f9a2b2
Fix publishing macos pre-built artifacts (#1922)
csukuangfj Feb 26, 2025
82cb8a5
Minor fixes for rknn (#1925)
csukuangfj Feb 26, 2025
eebe199
Build wheels for rknn linux aarch64 (#1928)
csukuangfj Feb 26, 2025
337d5f7
Release v1.10.46 (#1929)
csukuangfj Feb 26, 2025
815ebac
Fix building wheels for Python 3.7 (#1933)
csukuangfj Feb 27, 2025
f5dfcf8
Add Kotlin and Java API for online punctuation models (#1936)
csukuangfj Feb 27, 2025
dfcbc8d
Add Kokoro v1.1-zh (#1942)
csukuangfj Feb 28, 2025
Fix keyword spotting. (k2-fsa#1689)
Reset the stream right after detecting a keyword
csukuangfj authored Jan 20, 2025
commit 8b989a851cbb759976d1f8d40cae91dd9362f816
33 changes: 1 addition & 32 deletions .github/scripts/test-python.sh
Original file line number Diff line number Diff line change
@@ -574,29 +574,6 @@ echo "sherpa_onnx version: $sherpa_onnx_version"
pwd
ls -lh

repo=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01
log "Start testing ${repo}"

pushd $dir
curl -LS -O https://github.com/pkufool/keyword-spotting-models/releases/download/v0.1/sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz
tar xf sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz
rm sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz
popd

repo=$dir/$repo
ls -lh $repo

python3 ./python-api-examples/keyword-spotter.py \
--tokens=$repo/tokens.txt \
--encoder=$repo/encoder-epoch-12-avg-2-chunk-16-left-64.onnx \
--decoder=$repo/decoder-epoch-12-avg-2-chunk-16-left-64.onnx \
--joiner=$repo/joiner-epoch-12-avg-2-chunk-16-left-64.onnx \
--keywords-file=$repo/test_wavs/test_keywords.txt \
$repo/test_wavs/0.wav \
$repo/test_wavs/1.wav

rm -rf $repo

if [[ x$OS != x'windows-latest' ]]; then
echo "OS: $OS"

@@ -612,15 +589,7 @@ if [[ x$OS != x'windows-latest' ]]; then
repo=$dir/$repo
ls -lh $repo

python3 ./python-api-examples/keyword-spotter.py \
--tokens=$repo/tokens.txt \
--encoder=$repo/encoder-epoch-12-avg-2-chunk-16-left-64.onnx \
--decoder=$repo/decoder-epoch-12-avg-2-chunk-16-left-64.onnx \
--joiner=$repo/joiner-epoch-12-avg-2-chunk-16-left-64.onnx \
--keywords-file=$repo/test_wavs/test_keywords.txt \
$repo/test_wavs/3.wav \
$repo/test_wavs/4.wav \
$repo/test_wavs/5.wav
python3 ./python-api-examples/keyword-spotter.py

python3 sherpa-onnx/python/tests/test_keyword_spotter.py --verbose

21 changes: 21 additions & 0 deletions .github/workflows/c-api.yaml
@@ -79,6 +79,27 @@ jobs:
otool -L ./install/lib/libsherpa-onnx-c-api.dylib
fi

- name: Test kws (zh)
shell: bash
run: |
gcc -o kws-c-api ./c-api-examples/kws-c-api.c \
-I ./build/install/include \
-L ./build/install/lib/ \
-l sherpa-onnx-c-api \
-l onnxruntime

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2

export LD_LIBRARY_PATH=$PWD/build/install/lib:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$PWD/build/install/lib:$DYLD_LIBRARY_PATH

./kws-c-api

rm ./kws-c-api
rm -rf sherpa-onnx-kws-*

- name: Test Kokoro TTS (en)
shell: bash
run: |
22 changes: 22 additions & 0 deletions .github/workflows/cxx-api.yaml
@@ -81,6 +81,28 @@ jobs:
otool -L ./install/lib/libsherpa-onnx-cxx-api.dylib
fi

- name: Test KWS (zh)
shell: bash
run: |
g++ -std=c++17 -o kws-cxx-api ./cxx-api-examples/kws-cxx-api.cc \
-I ./build/install/include \
-L ./build/install/lib/ \
-l sherpa-onnx-cxx-api \
-l sherpa-onnx-c-api \
-l onnxruntime

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2

export LD_LIBRARY_PATH=$PWD/build/install/lib:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$PWD/build/install/lib:$DYLD_LIBRARY_PATH

./kws-cxx-api

rm kws-cxx-api
rm -rf sherpa-onnx-kws-*

- name: Test Kokoro TTS (en)
shell: bash
run: |
@@ -151,24 +151,27 @@ class MainActivity : AppCompatActivity() {
stream.acceptWaveform(samples, sampleRate = sampleRateInHz)
while (kws.isReady(stream)) {
kws.decode(stream)
}

val text = kws.getResult(stream).keyword
val text = kws.getResult(stream).keyword

var textToDisplay = lastText

var textToDisplay = lastText
if (text.isNotBlank()) {
// Remember to reset the stream right after detecting a keyword

if (text.isNotBlank()) {
if (lastText.isBlank()) {
textToDisplay = "$idx: $text"
} else {
textToDisplay = "$idx: $text\n$lastText"
kws.reset(stream)
if (lastText.isBlank()) {
textToDisplay = "$idx: $text"
} else {
textToDisplay = "$idx: $text\n$lastText"
}
lastText = "$idx: $text\n$lastText"
idx += 1
}
lastText = "$idx: $text\n$lastText"
idx += 1
}

runOnUiThread {
textView.text = textToDisplay
runOnUiThread {
textView.text = textToDisplay
}
}
}
}
3 changes: 3 additions & 0 deletions c-api-examples/CMakeLists.txt
@@ -4,6 +4,9 @@ include_directories(${CMAKE_SOURCE_DIR})
add_executable(decode-file-c-api decode-file-c-api.c)
target_link_libraries(decode-file-c-api sherpa-onnx-c-api cargs)

add_executable(kws-c-api kws-c-api.c)
target_link_libraries(kws-c-api sherpa-onnx-c-api)

if(SHERPA_ONNX_ENABLE_TTS)
add_executable(offline-tts-c-api offline-tts-c-api.c)
target_link_libraries(offline-tts-c-api sherpa-onnx-c-api cargs)
150 changes: 150 additions & 0 deletions c-api-examples/kws-c-api.c
@@ -0,0 +1,150 @@
// c-api-examples/kws-c-api.c
//
// Copyright (c) 2025 Xiaomi Corporation
//
// This file demonstrates how to use the keyword spotter with sherpa-onnx's C API.
// clang-format off
//
// Usage
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
// tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
// rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
//
// ./kws-c-api
//
// clang-format on
#include <stdio.h>
#include <stdlib.h> // exit
#include <string.h> // memset

#include "sherpa-onnx/c-api/c-api.h"

int32_t main() {
SherpaOnnxKeywordSpotterConfig config;

memset(&config, 0, sizeof(config));
config.model_config.transducer.encoder =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/"
"encoder-epoch-12-avg-2-chunk-16-left-64.onnx";

config.model_config.transducer.decoder =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/"
"decoder-epoch-12-avg-2-chunk-16-left-64.onnx";

config.model_config.transducer.joiner =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/"
"joiner-epoch-12-avg-2-chunk-16-left-64.onnx";

config.model_config.tokens =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/tokens.txt";

config.model_config.provider = "cpu";
config.model_config.num_threads = 1;
config.model_config.debug = 1;

config.keywords_file =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/"
"test_keywords.txt";

const SherpaOnnxKeywordSpotter *kws = SherpaOnnxCreateKeywordSpotter(&config);
if (!kws) {
fprintf(stderr, "Please check your config\n");
exit(-1);
}

fprintf(stderr,
"--Test pre-defined keywords from test_wavs/test_keywords.txt--\n");

const char *wav_filename =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/3.wav";

float tail_paddings[8000] = {0}; // 0.5 seconds

const SherpaOnnxWave *wave = SherpaOnnxReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
exit(-1);
}

const SherpaOnnxOnlineStream *stream = SherpaOnnxCreateKeywordStream(kws);
if (!stream) {
fprintf(stderr, "Failed to create stream\n");
exit(-1);
}

SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples,
wave->num_samples);

SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
sizeof(tail_paddings) / sizeof(float));
SherpaOnnxOnlineStreamInputFinished(stream);
while (SherpaOnnxIsKeywordStreamReady(kws, stream)) {
SherpaOnnxDecodeKeywordStream(kws, stream);
const SherpaOnnxKeywordResult *r = SherpaOnnxGetKeywordResult(kws, stream);
if (r && r->json && strlen(r->keyword)) {
fprintf(stderr, "Detected keyword: %s\n", r->json);

// Remember to reset the keyword stream right after a keyword is detected
SherpaOnnxResetKeywordStream(kws, stream);
}
SherpaOnnxDestroyKeywordResult(r);
}
SherpaOnnxDestroyOnlineStream(stream);

// --------------------------------------------------------------------------

fprintf(stderr, "--Use pre-defined keywords + add a new keyword--\n");

stream = SherpaOnnxCreateKeywordStreamWithKeywords(kws, "y ǎn y uán @演员");

SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples,
wave->num_samples);

SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
sizeof(tail_paddings) / sizeof(float));
SherpaOnnxOnlineStreamInputFinished(stream);
while (SherpaOnnxIsKeywordStreamReady(kws, stream)) {
SherpaOnnxDecodeKeywordStream(kws, stream);
const SherpaOnnxKeywordResult *r = SherpaOnnxGetKeywordResult(kws, stream);
if (r && r->json && strlen(r->keyword)) {
fprintf(stderr, "Detected keyword: %s\n", r->json);

// Remember to reset the keyword stream
SherpaOnnxResetKeywordStream(kws, stream);
}
SherpaOnnxDestroyKeywordResult(r);
}
SherpaOnnxDestroyOnlineStream(stream);

// --------------------------------------------------------------------------

fprintf(stderr, "--Use pre-defined keywords + add two new keywords--\n");

stream = SherpaOnnxCreateKeywordStreamWithKeywords(
kws, "y ǎn y uán @演员/zh ī m íng @知名");

SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples,
wave->num_samples);

SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
sizeof(tail_paddings) / sizeof(float));
SherpaOnnxOnlineStreamInputFinished(stream);
while (SherpaOnnxIsKeywordStreamReady(kws, stream)) {
SherpaOnnxDecodeKeywordStream(kws, stream);
const SherpaOnnxKeywordResult *r = SherpaOnnxGetKeywordResult(kws, stream);
if (r && r->json && strlen(r->keyword)) {
fprintf(stderr, "Detected keyword: %s\n", r->json);

// Remember to reset the keyword stream
SherpaOnnxResetKeywordStream(kws, stream);
}
SherpaOnnxDestroyKeywordResult(r);
}
SherpaOnnxDestroyOnlineStream(stream);

SherpaOnnxFreeWave(wave);
SherpaOnnxDestroyKeywordSpotter(kws);

return 0;
}
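
The example above resets the keyword stream right after each detection. The following is a minimal sketch of why that pattern matters, using a toy matcher rather than sherpa-onnx itself. Everything here (`ToyStream`, `toy_accept`, `toy_has_keyword`, `toy_reset`, `count_detections`) is hypothetical and exists only for illustration. It assumes that a detected keyword stays pending in the stream until the caller resets it, which may not match how the real decoder behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy stream; NOT the sherpa-onnx decoder. */
typedef struct {
  const char *keyword; /* keyword to look for */
  size_t matched;      /* number of keyword characters matched so far */
} ToyStream;

/* Feed one "frame" (here: one character) into the stream. */
static void toy_accept(ToyStream *s, char c) {
  size_t len = strlen(s->keyword);
  if (s->matched == len) {
    return; /* assumption: a detection stays pending until reset */
  }
  if (c == s->keyword[s->matched]) {
    s->matched += 1;
  } else {
    /* naive restart, good enough for this sketch */
    s->matched = (c == s->keyword[0]) ? 1 : 0;
  }
}

/* Non-zero while a detected keyword is pending. */
static int toy_has_keyword(const ToyStream *s) {
  return s->matched == strlen(s->keyword);
}

/* Clear the pending detection so matching can start again. */
static void toy_reset(ToyStream *s) { s->matched = 0; }

/* Count how often a poll-after-every-frame loop reports the keyword. */
int count_detections(const char *input, const char *keyword,
                     int reset_after_detect) {
  assert(keyword[0] != '\0'); /* the sketch needs a non-empty keyword */
  ToyStream s = {keyword, 0};
  int count = 0;
  for (size_t i = 0; input[i] != '\0'; ++i) {
    toy_accept(&s, input[i]);
    if (toy_has_keyword(&s)) {
      count += 1;
      if (reset_after_detect) {
        toy_reset(&s); /* mirrors the reset-after-detection pattern */
      }
    }
  }
  return count;
}
```

With the input `"abxxab"` and the keyword `"ab"`, `count_detections("abxxab", "ab", 1)` returns 2, one report per occurrence. `count_detections("abxxab", "ab", 0)` returns 5, because the first detection is never cleared and every later poll reports it again.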
3 changes: 3 additions & 0 deletions cxx-api-examples/CMakeLists.txt
@@ -3,6 +3,9 @@ include_directories(${CMAKE_SOURCE_DIR})
add_executable(streaming-zipformer-cxx-api ./streaming-zipformer-cxx-api.cc)
target_link_libraries(streaming-zipformer-cxx-api sherpa-onnx-cxx-api)

add_executable(kws-cxx-api ./kws-cxx-api.cc)
target_link_libraries(kws-cxx-api sherpa-onnx-cxx-api)

add_executable(streaming-zipformer-rtf-cxx-api ./streaming-zipformer-rtf-cxx-api.cc)
target_link_libraries(streaming-zipformer-rtf-cxx-api sherpa-onnx-cxx-api)
