sony/creativeai

Deep Generative Modeling


PaGoDA

A 64x64 pre-trained diffusion model is all you need for 1-step high-resolution SOTA generation

NeurIPS24

CTM

A unified framework enabling diverse samplers and SOTA 1-step generation

ICLR24

Applications:
[SoundGen]

SAN

Enhancing GAN with metrizable discriminators

ICLR24

Applications:
[Vocoder]

MPGD

Fast, Efficient, Training-Free, and Controllable diffusion-based generation method

ICLR24

HQ-VAE

Generalizing hierarchical VQ-VAEs with a Bayesian framework

TMLR

FP-Diffusion

Improving density estimation of diffusion models

ICML23

GibbsDDRM

Achieving blind inversion using DDPM

ICML23

Applications:
[DeReverb] [SpeechEnhance]

Consistency-type Models

A theoretically unified framework for "consistency" in diffusion models

ICML23 SPIGM Workshop

SQ-VAE

Improving codebook utilization and training stability

ICML22

AR-ELBO

Mitigating oversmoothness in VAEs

Neurocomputing

Multimodal NLP


DiffuCOMET

[ACL] [arXiv] [code]

DiffuCOMET: Contextual Commonsense Knowledge Diffusion

ACL24

CyCLIPs/CyCLAPs

[ACL] [arXiv]

On the Language Encoder of Contrastive Cross-modal Models

ACL24

DIIR

[ACL] [arXiv] [code]

Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning

ACL24

PeaCoK

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
(Outstanding Paper Award)

ACL23

ComFact

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge

EMNLP22 Findings

Music Technologies


Mixing Graph Estimation

Searching For Music Mixing Graphs: A Pruning Approach

DAFx24

Guitar Amp. Modeling

Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data

DAFx24

Text-to-Music Editing

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

IJCAI24

Instr.-Agnostic Trans.

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

ICASSP24

Vocal Restoration

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

ICASSP24

hFT-Transformer

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer

ISMIR23

Automatic Music Tagging

An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification

ICASSP23

Vocal Dereverberation

Unsupervised Vocal Dereverberation with Diffusion-based Generative Models

ICASSP23

Mixing Style Transfer

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

ICASSP23

Music Transcription

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

ICASSP23

Singing Voice Vocoder

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

ICASSP23

Distortion Effect Removal

Distortion Audio Effects: Learning How to Recover the Clean Signal

ISMIR22

Automatic Music Mixing

Automatic Music Mixing with Deep Learning and Out-of-Domain Data

ISMIR22

Sound Separation

Music Source Separation with Deep Equilibrium Models

ICASSP22

Automatic DJ Transition

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks

ICASSP22

Singing Voice Conversion

Robust One-Shot Singing Voice Conversion

Sound Separation

Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years

Cinematic Technologies


GenWarp

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

NeurIPS24

SoundCTM

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation

MMAudio

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Acoustic Inv. Rendering

Hearing Anything Anywhere

CVPR24

STARSS23

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

NeurIPS23

CLIPSep

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

ICLR23

BigVSAN Vocoder

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

ICASSP24

Zero-/Few-shot SELD

Zero- and Few-shot Sound Event Localization and Detection

ICASSP24

Audio Restoration: ViT-AE

Extending Audio Masked Autoencoders Toward Audio Restoration

WASPAA23

Diffiner

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

INTERSPEECH23

Sound Event Localization and Detection

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

ICASSP22

Hosted Challenges


SVG Challenge 2024

Sounding Video Generation Challenge 2024

DCASE Challenge Task 3

Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes

CPD Challenge 2023

Commonsense Persona-grounded Dialogue Challenge

SDX Challenge 2023

Sound Demixing Challenge 2023

MDX Challenge 2021

Music Demixing Challenge 2021

### Contact
Yuki Mitsufuji (yuhki.mitsufuji@sony.com)
