1044-Hours-Minnan-Dialect-Speech-Data-by-Mobile-Phone

Description

Hokkien (China) dialect scripted monologue smartphone speech dataset, recorded as monologues read from given prompts and covering short messages and 30+ customer consultation domains. Each recording is annotated with transcription text, gender, age, accent and other attributes. The data was collected from a large, geographically diverse pool of speakers (2,496 people from Quanzhou, Zhangzhou, Taiwan, Xiamen and other southern China districts), which is intended to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA and PIPL compliant. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/50?source=Github

Specifications

Format

16 kHz, 16-bit, WAV, mono channel
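Before training, it can be worth confirming that each file actually matches the stated format. The sketch below uses only Python's standard-library `wave` module. The helper names `check_wav_format` and `make_test_wav` are hypothetical and not part of any tooling shipped with the dataset; the in-memory silent files stand in for real recordings.

```python
import io
import wave

# Expected format as stated above: 16 kHz, 16-bit samples, mono, WAV container.
EXPECTED_RATE = 16_000
EXPECTED_SAMPWIDTH = 2  # bytes per sample (16 bit)
EXPECTED_CHANNELS = 1


def check_wav_format(source) -> list[str]:
    """Return a list of mismatches; an empty list means the file matches.

    `source` may be a filename or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wav.getframerate()} Hz != {EXPECTED_RATE} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"sample width {wav.getsampwidth() * 8} bit != 16 bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels != 1 (mono)")
    return problems


def make_test_wav(rate: int, sampwidth: int, channels: int, seconds: float = 0.1) -> io.BytesIO:
    """Build an in-memory silent WAV file (hypothetical helper for demonstration)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sampwidth)
        wav.setframerate(rate)
        n_frames = int(rate * seconds)
        wav.writeframes(b"\x00" * n_frames * sampwidth * channels)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_test_wav(16_000, 2, 1)))  # []
    print(check_wav_format(make_test_wav(44_100, 2, 2)))  # two mismatches
```

For real data, pass each file path to `check_wav_format` and collect any non-empty results.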

Content category

Customer consultation (covering 30+ domains); short messages

Recording condition

Low background noise (indoor)

Recording device

Smartphone; Android:iOS = 3:1

Country

China (CHN)

Language

Hokkien

Speaker

2,496 people; 55% female; 1,049 speakers are aged 21 to 25; speakers are from Quanzhou, Zhangzhou, Taiwan, Xiamen and other southern China districts

Features of annotation

Transcription text, gender, age, accent, noise
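The annotated attributes lend themselves to filtering subsets, for example selecting speakers of one accent within an age band. The sketch below assumes the annotations have already been parsed into one record per utterance; the `Utterance` field names and the `select` helper are hypothetical and do not describe the dataset's actual file schema. The sample records are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Utterance:
    # Hypothetical record layout: field names mirror the annotated attributes
    # (transcription, gender, age, accent, noise) but are assumptions.
    audio_path: str
    transcription: str
    gender: str
    age: int
    accent: str
    noise: str


def select(
    records: Iterable[Utterance],
    accent: Optional[str] = None,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
) -> list[Utterance]:
    """Keep records with the given accent whose age lies in an inclusive range."""
    out = []
    for r in records:
        if accent is not None and r.accent != accent:
            continue
        if min_age is not None and r.age < min_age:
            continue
        if max_age is not None and r.age > max_age:
            continue
        out.append(r)
    return out


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        Utterance("a.wav", "<transcript a>", "female", 23, "Quanzhou", "none"),
        Utterance("b.wav", "<transcript b>", "male", 34, "Xiamen", "low"),
        Utterance("c.wav", "<transcript c>", "female", 21, "Xiamen", "none"),
    ]
    young_xiamen = select(records, accent="Xiamen", min_age=21, max_age=25)
    print([r.audio_path for r in young_xiamen])  # ['c.wav']
```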

Licensing Information

Commercial License
