Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark
Updated May 22, 2023 - Python
Official repository for the "VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias" paper.
[COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
A from-scratch implementation of a T5 model modified with Rotary Position Embeddings (RoPE). This project includes the code for pre-training on the C4 dataset in streaming mode with Flash Attention 2.