DeepSeek-Distillation

This repository collects publicly available distillation datasets based on DeepSeek-R1. They are intended for researchers, students, and practitioners interested in exploring and enhancing the reasoning capabilities of large language models.

Datasets

These datasets are distilled from DeepSeek-R1.

1. Mathematics

2. Medical

3. General domain

  • sequelbox/Raiden-DeepSeek-R1: a 62.9K-example dataset of creative-reasoning and analytic-reasoning responses, testing the limits of DeepSeek-R1's reasoning skills (see the loading sketch after this list).
  • LLaVA-R1-100k: a large-scale multimodal reasoning dataset; image descriptions are generated with GPT-4o, and the reasoning traces are produced by the DeepSeek-R1 model.
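
A minimal sketch of inspecting one of the listed datasets, assuming it is hosted on the Hugging Face Hub under the repository ID given above and that the `datasets` library is installed; the split name and field names are assumptions, not guarantees of the dataset's actual schema:

```python
# Minimal sketch: load one of the listed distillation datasets from the
# Hugging Face Hub and inspect its schema. The repository ID matches the
# entry above; whether it exposes a "train" split is an assumption.
from datasets import load_dataset

# Stream the dataset so the full ~62.9K rows are not downloaded up front.
ds = load_dataset("sequelbox/Raiden-DeepSeek-R1", split="train", streaming=True)

# Look at the first record to see which fields (prompt, reasoning trace,
# final response, ...) the dataset actually provides.
first_example = next(iter(ds))
print(list(first_example.keys()))
```

The same pattern applies to the other datasets listed in this repository; only the repository ID changes.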

4. Mixed domains

Others

These reasoning datasets are distilled from other models, such as the GPT-4 series.
