Audio.md

CNN-based audio segmentation toolkit (music/speech/speaker gender) https://github.com/ina-foss/inaSpeechSegmenter

Facebook's open-source CNN-based speech recognition toolkit, reported at about 5% word error rate with very fast training https://github.com/facebookresearch/wav2letter

Lightweight speech recognition decoding framework https://github.com/robin1001/xdecoder

Espresso: a fast end-to-end neural speech recognition toolkit https://github.com/freewym/espresso

PyTorch audio processing utilities and datasets https://github.com/audeering/audtorch

A complete guide to training on aidatatang_200zh with Kaldi https://github.com/datatang-ailab/aidatatang_200zh/blob/master/README.zh.md

Lightweight, fast speech synthesis: 'LightSpeech - A Light, Fast and Robust Speech Synthesis.' https://github.com/xcmyz/lightspeech

DeepSpectrum: a toolkit for extracting audio features with pretrained image CNNs https://github.com/DeepSpectrum/DeepSpectrum

Landmark-based audio fingerprinting https://github.com/dpwe/audfprint

Neural network speaker recognition/verification system based on Kaldi and TensorFlow https://github.com/mycrazycracy/tf-kaldi-speaker

PyTorch speech recognition framework: 'patter - speech-to-text framework in PyTorch with initial support for the DeepSpeech2 architecture' https://github.com/ryanleary/patter

Curated list of speaker diarization resources https://github.com/wq2012/awesome-diarization

Audio samples from ICML2019 "Almost Unsupervised Text to Speech and Automatic Speech Recognition" https://github.com/SpeechResearch/speechresearch.github.io

(PyTorch) Seq2Seq Transformer speech recognition for Mandarin https://github.com/ZhengkunTian/Speech-Tranformer-Pytorch

Deep neural network based speech enhancement toolkit https://github.com/jtkim-kaist/Speech-enhancement

Pretrained deep network models for music audio tagging https://github.com/jordipons/musiCNN

End-to-End Automatic Speech Recognition on PyTorch https://github.com/gentaiscool/end2end-asr-pytorch

(PyTorch) Source separation and speech signal extraction https://github.com/AppleHolic/source_separation

Code and models for evaluating a state-of-the-art lip reading network https://github.com/afourast/deep_lip_reading

Voice imitation: clone any voice in real time from 5 seconds of audio https://github.com/CorentinJ/Real-Time-Voice-Cloning

Program to benchmark various speech recognition APIs https://github.com/Franck-Dernoncourt/ASR_benchmark

Transformer-based text-to-speech (TTS) model https://github.com/xcmyz/Transformer-TTS

DIY smart speaker (resource list) https://github.com/voice-engine/make-a-smart-speaker/blob/master/zh.md

Cloning someone else's voice in real time with deep learning https://towardsdatascience.com/you-can-now-speak-using-someone-elses-voice-with-deep-learning-8be24368fa2b

Isolating instruments from stereo music using convolutional networks https://towardsdatascience.com/audio-ai-isolating-instruments-from-stereo-music-using-convolutional-neural-networks-584ababf69de

Isolating vocals from stereo music using convolutional neural networks https://towardsdatascience.com/audio-ai-isolating-vocals-from-stereo-music-using-convolutional-neural-networks-210532383785

Open-source voice interaction operating system for next-generation interactive devices https://github.com/yodaos-project/yodaos

Laughter detector https://github.com/ideo/LaughDetection

'ASRT_SpeechRecognition - A Deep-Learning-Based Chinese Speech Recognition System' by nl8590687 https://github.com/nl8590687/ASRT_SpeechRecognition

A PyTorch implementation of "Neural Speech Synthesis with Transformer Network" https://github.com/soobinseo/Transformer-TTS

This is research-code for Synthesizing Obama: Learning Lip Sync from Audio. https://github.com/supasorn/synthesizing_obama_network_training

Voice Operated Character Animation https://voca.is.tue.mpg.de/en https://github.com/TimoBolkart/voca

Deezer's (TensorFlow) source separation library; extracts vocals, piano, drums, and other stems from music directly from the command line https://github.com/deezer/spleeter

Open-source speech separation/enhancement library https://github.com/speechLabBcCuny/onssen

Feature extractor for DL speech processing. https://github.com/bepierre/SpeechVGG

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data https://github.com/KunZhou9646/Nonparallel-emotional-VC

This is a PyTorch re-implementation of Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition. https://github.com/foamliu/Speech-Transformer

Athena: an open-source end-to-end speech recognition engine https://github.com/athena-team/athena

Predicting Expressive Speaking Style from Text in End-to-End Speech Synthesis https://github.com/Yangyangii/TPGST-Tacotron

PyTorch implementation of LF-MMI for End-to-end ASR https://github.com/YiwenShaoStephen/pychain

Audio samples from ICML2019 "Almost Unsupervised Text to Speech and Automatic Speech Recognition" https://github.com/RayeRen/unsuper_tts_asr

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. https://github.com/CSTR-Edinburgh/ophelia

Efficient neural speech synthesis https://github.com/MlWoo/LPCNet

Code for Vision-Infused Deep Audio Inpainting (ICCV 2019) https://github.com/Hangz-nju-cuhk/Vision-Infused-Audio-Inpainter-VIAI

Deep-learning-based speech enhancement using Keras or PyTorch https://github.com/yongxuUSTC/sednn

Multi-voice singing voice synthesis https://github.com/MTG/WGANSing

"Singing" with doodles: synthesizing sound from images https://github.com/jeonghopark/SketchSynth-Simple

Chinese/English pronunciation dictionary for speech recognition https://github.com/speech-io/BigCiDian

Neural network speaker verification system implemented with Kaldi and TensorFlow https://github.com/someonefighting/tf-kaldi-speaker-master

wav2letter: Facebook's open-source low-latency online speech recognition framework https://github.com/facebookresearch/wav2letter/wiki/Inference-Framework

GridSound: an online digital audio editor https://github.com/GridSound/daw

Asteroid: a PyTorch-based audio source separation toolkit https://github.com/mpariente/ASSteroid

MelGAN: very fast audio synthesis https://github.com/descriptinc/melgan-neurips

Generating piano music with deep learning https://github.com/haryoa/note_music_generator

Large list of datasets for audio analysis and music retrieval https://www.audiocontentanalysis.org/data-sets/

Deploying real-time text-to-speech applications on GPUs using TensorRT https://devblogs.nvidia.com/how-to-deploy-real-time-text-to-speech-applications-on-gpus-using-tensorrt/

Using WaveNet to reunite speech-impaired users with their original voices (few-shot adaptive natural speech synthesis) https://deepmind.com/blog/article/Using-WaveNet-technology-to-reunite-speech-impaired-users-with-their-original-voices

(C++) Waveform image generation from audio files https://github.com/bbc/audiowaveform

DeepFake voice detection with temporal convolutions https://github.com/dessa-public/fake-voice-detection

Athena: an open-source (TensorFlow) implementation of an end-to-end automatic speech recognition engine https://github.com/didi/athena

SV2TTS https://github.com/CorentinJ/Real-Time-Voice-Cloning

Domain-specific automatic speech recognition models on GPUs: "How to Build Domain Specific Automatic Speech Recognition Models on GPUs" https://devblogs.nvidia.com/how-to-build-domain-specific-automatic-speech-recognition-models-on-gpus/

Introduction to digital signal processing for audio (notebooks) https://github.com/earthspecies/from_zero_to_DSP

at16k: a Python speech recognition library. 'at16k - Trained models for automatic speech recognition (ASR). A library to quickly build applications that require speech to text conversion.' https://github.com/at16k/at16k

Audio processing with 1D convolutional networks https://github.com/KinWaiCheuk/nnAudio

CRF-based data-efficient end-to-end speech recognition toolkit https://github.com/thu-spmi/CAT

Music source separation in the waveform domain: 'Music Source Separation in the Waveform Domain - source separation in the waveform domain for music' https://github.com/facebookresearch/demucs