voidism/README.md

👋 You've reached the GitHub profile of Yung-Sung!

  • 👋 Hi, I’m Yung-Sung (@voidism)! I am a final-year PhD student in Electrical Engineering and Computer Science at CSAIL, Massachusetts Institute of Technology, where I work with Jim Glass.
  • My research focuses on large language models: hallucinations, factuality, and retrieval-augmented generation. In addition, I worked on pre-training MetaCLIP 2, a multilingual vision-language model pre-trained on worldwide web-scale data, during my internship at Meta FAIR.
  • My research has introduced several approaches for improving LLM factuality. DoLa enhances factuality through layer-wise knowledge contrasting during decoding. Lookback Lens detects and mitigates hallucinations by analyzing attention patterns under RAG settings. Most recently, SelfCite enables LLMs to generate accurate citations without external supervision.
  • Earlier, I worked on retrieval methods, developing DiffCSE for better sentence embeddings and Query Reranking for more accurate passage retrieval.
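The layer-contrasting idea behind DoLa can be sketched in a few lines. This is a simplified illustration, not the official implementation: it assumes you already have next-token logits from a final ("mature") layer and an early ("premature") layer, and scores tokens by the difference of their log-probabilities, after an adaptive plausibility filter keeps only tokens the final layer considers reasonably likely. The function name `dola_contrast` and the `alpha` threshold are illustrative choices.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over a 1-D logit vector."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def dola_contrast(final_logits, early_logits, alpha=0.1):
    """Simplified DoLa-style contrastive token selection.

    Scores each candidate token by the gap between the final layer's
    log-probability and an early layer's, amplifying knowledge that
    only emerges in later layers. Tokens whose final-layer probability
    falls below alpha * max probability are masked out first.
    """
    lp_final = log_softmax(np.asarray(final_logits, dtype=float))
    lp_early = log_softmax(np.asarray(early_logits, dtype=float))
    # Adaptive plausibility constraint: keep only tokens within a
    # factor alpha of the final layer's most likely token.
    plausible = lp_final >= lp_final.max() + np.log(alpha)
    scores = np.where(plausible, lp_final - lp_early, -np.inf)
    return int(np.argmax(scores))

# Toy example: the final layer slightly prefers token 0, but token 1
# gained far more probability between the early and final layers, so
# the contrastive score selects it instead of greedy decoding's pick.
final = [2.0, 1.9, 0.0]
early = [2.0, 0.0, 0.0]
print(dola_contrast(final, early))   # contrastive choice: token 1
print(int(np.argmax(final)))         # plain greedy choice: token 0
```

The plausibility mask matters: without it, rare tokens that the early layer assigns near-zero probability could win on the log-ratio alone even when the final layer also considers them implausible.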

[Google Scholar] [CV] [Twitter] [Github] [DBLP] [Blog] [Linkedin] [Instagram]

Pinned

  1. DoLa (Public)

     Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"

     Python · 510 stars · 59 forks

  2. DiffCSE (Public)

     Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings"

     Python · 296 stars · 28 forks

  3. huggingface/transformers (Public)

     🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal tasks, for both inference and training.

     Python · 149k stars · 30.3k forks

  4. s3prl/s3prl (Public)

     Self-Supervised Speech Pre-training and Representation Learning Toolkit

     Python · 2.5k stars · 512 forks

  5. facebookresearch/MetaCLIP (Public)

     ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

     Python · 1.7k stars · 73 forks

  6. facebookresearch/SelfCite (Public)

     Code for the ICML 2025 paper "SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models"

     Python · 15 stars