evaluation-benchmark

Here are 4 public repositories matching this topic...

CLUEbenchmark / PyCLUE

Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark

corpus tiny language-model albert bert chinese-language xlnet evaluation-benchmark chineseglue roberta-wwm-ext

Updated May 22, 2023
Python

stevejpapad / image-text-verification

Official repository for the "VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias" paper.

misinformation multimodal-deep-learning evaluation-benchmark

Updated Jan 11, 2024
Python

hhan1018 / NesTools

[COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

nested-structures evaluation-benchmark large-language-models tool-learning

Updated Jan 18, 2025
Python

LaBackDoor / rope-t5

A from-scratch implementation of a T5 model modified with Rotary Position Embeddings (RoPE). This project includes the code for pre-training on the C4 dataset in streaming mode with Flash Attention 2.

nlp pytorch sequence-to-sequence language-model from-scratch rope pre-training huggingface t5 evaluation-benchmark llm rotary-position-embedding flash-attention c4-dataset span-corruption

Updated Jul 9, 2025
Python