Large Scale Chinese Corpus for NLP
Updated May 23, 2024
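A curated collection of open-source SFT datasets, updated on an ongoing basis
Chinese question-answering corpus from the PTT Gossiping board
Chinese character dataset with per-character information such as stroke count, radical, pinyin, and English definitions/synonyms
QBQTC: a large-scale search matching dataset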
Pretrained model for Chinese Scientific Text
[CHABCNet] ABCNet on the Chinese dataset, building on Detectron2 (Facebook AI Research)
A large-scale offline Chinese handwritten signature dataset
🧠️🖥️2️⃣️0️⃣️0️⃣️1️⃣️🔠️🔢️ The linguistic:Chinese-Traditional category for AI2001, containing Chinese (Traditional) language linguistic datasets
🧠️🖥️2️⃣️0️⃣️0️⃣️1️⃣️🔠️🔢️ The linguistic:Chinese-Simplified category for AI2001, containing Chinese (Simplified) language linguistic datasets
Text Data and Data Analysis of Chinese Spring Festival Gala Comedy Sketches Over 40 Years
Top Economics Journals Publications Dataset and Data Analysis: Top 5 English Journals and Top 3 Chinese Journals
Text Data and Data Analysis of Focus Report, a Chinese Investigative TV Program, 2003-2023
Code repository for training Taiwan-ELM models, including data preprocessing, tokenizer development, and model fine-tuning.