Repositories
Showing 10 of 329 repositories
- UI-TARS-desktop Public
A GUI agent application based on UI-TARS (a Vision-Language Model) that lets you control your computer using natural language.
- MTVQA Public
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of the multilingual text perception and comprehension capabilities of large multimodal models across nine widely used yet low-resource languages.
- LVLM_Interpretation Public
The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"