These are papers related to NLP and Deep Learning that I have read, ranging from basic to advanced. You can also check my Korean paper reviews by clicking the links in the tables below.
You can find more paper reviews, code implementations, and explanations of the underlying mathematics on my blog: https://cartinoe5930.tistory.com
I have also written several articles that explain some Deep Learning techniques in detail; these can be found in the tables below.
Paper Title | Paper or reference site Link | Paper Review |
---|---|---|
TinyBERT: Distilling BERT for Natural Language Understanding | https://arxiv.org/abs/1909.10351 | https://cartinoe5930.tistory.com/entry/TinyBERT-Distilling-BERT-for-Natural-Language-Understanding-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
DistilBERT: a distilled version of BERT | https://arxiv.org/abs/1910.01108 | https://cartinoe5930.tistory.com/entry/DistilBERT-a-distilled-version-of-BERT-smaller-faster-cheaper-and-lighter-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners (uses PET) | https://arxiv.org/abs/2009.07118 | https://cartinoe5930.tistory.com/entry/Its-Not-Just-Size-That-Matters-Small-Language-Models-Are-Also-Few-Shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
Paper Title | Paper or reference site Link | Paper Review |
---|---|---|
Adapter: Parameter-Efficient Transfer Learning for NLP | https://arxiv.org/abs/1902.00751 | https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97 |
Prefix-Tuning: Optimizing Continuous Prompts for Generation | https://arxiv.org/abs/2101.00190 | https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97 |
LoRA: Low-Rank Adaptation of Large Language Models | https://arxiv.org/abs/2106.09685 | https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97 |
Towards a Unified View of Parameter-Efficient Transfer Learning | https://arxiv.org/abs/2110.04366 | Will be uploaded later! |
UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning | https://arxiv.org/abs/2110.07577 | Will be uploaded later! |
(IA)^3: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning | https://arxiv.org/abs/2205.05638 | Will be uploaded later! |
QLoRA: Efficient Fine-tuning of Quantized LLMs | https://arxiv.org/abs/2305.14314 | Will be uploaded later! |
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates | https://arxiv.org/abs/2307.05695 | Will be uploaded later! |
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | https://arxiv.org/abs/2307.13269 | Will be uploaded later! |
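As a quick illustration of the low-rank adaptation idea behind the LoRA and QLoRA papers listed above, here is a minimal PyTorch sketch; the rank, scaling, and initialization values are illustrative assumptions, not values taken from the papers.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                     # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Example: adapt one 768 -> 768 projection; only A and B receive gradients.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16.0)
out = layer(torch.randn(4, 768))
```

Because only `A` and `B` are trainable, the per-layer trainable parameter count drops from `in_features * out_features` to `r * (in_features + out_features)`.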
Paper Title | Paper or reference site Link | Paper Review |
---|---|---|
Instruction Mining: High-quality Instruction Data Selection for Large Language Models | https://arxiv.org/abs/2307.06290 | No plan! |
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | https://arxiv.org/abs/2212.10465 | No plan! |
MoDS: Model-oriented Data Selection for Instruction Tuning | https://arxiv.org/abs/2311.15653 | No plan! |
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | https://arxiv.org/abs/2312.06585 | No plan! |
Magicoder: Source Code Is All You Need | https://arxiv.org/abs/2312.02120 | No plan! |
WaveCoder: Widespread and Versatile Enhanced Instruction Tuning with Refined Data Generation | https://arxiv.org/abs/2312.14187 | No plan! |
What Makes Good Data for Alignment: A Comprehensive Study of Automatic Data Selection in Instruction Tuning | https://arxiv.org/abs/2312.15685 | No plan! |
Paper Title | Paper or reference site Link | Paper Review |
---|---|---|
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | https://arxiv.org/abs/2205.14135 | https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad |
Exponentially Faster Language Modeling | https://arxiv.org/abs/2311.10770 | No plan! |
LLM in a flash: Efficient Large Language Model Inference with Limited Memory | https://arxiv.org/abs/2312.11514 | No plan! |
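For reference, PyTorch ships a fused attention kernel that can dispatch to a FlashAttention-style backend on supported GPUs; a minimal sketch (shapes, dtype, and device are assumptions):

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim); half precision on GPU is what the flash backend expects
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch picks a fused backend (FlashAttention, memory-efficient, or plain math)
# automatically; the output is the exact softmax attention result, computed
# without materializing the full seq_len x seq_len attention matrix.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```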
Paper Title | Paper or reference site Link | Paper Review |
---|---|---|
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | https://arxiv.org/abs/2005.11401 | No plan! |
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | https://arxiv.org/abs/2310.11511 | No plan! |
InstructRetro: Instruction Tuning Post Retrieval-Augmented Pretraining | https://arxiv.org/abs/2310.07713 | No plan! |
Retrieval-Augmented Generation for Large Language Models: A Survey | https://arxiv.org/abs/2312.10997 | No plan! |
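The retrieve-then-generate loop shared by the RAG papers above can be summarized in a few lines; the embeddings and the prompt template below are placeholders (assumptions), not the method of any specific paper:

```python
import numpy as np

def retrieve(query_vec, passage_vecs, passages, k=2):
    """Return the k passages whose embeddings are most similar (cosine) to the query embedding."""
    sims = passage_vecs @ query_vec / (
        np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    top = np.argsort(-sims)[:k]
    return [passages[i] for i in top]

def build_rag_prompt(question, retrieved_passages):
    """Prepend the retrieved passages to the question before calling the generator LLM."""
    context = "\n".join(retrieved_passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```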
Paper Title | Paper or reference site Link | Paper Review |
---|---|---|
BIG-Bench Hard: Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them | https://arxiv.org/abs/2210.09261 | Will be uploaded later! |
Large Language Models are not Fair Evaluators | https://arxiv.org/abs/2305.17926 | Will be uploaded later! |
MT-Bench: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | https://arxiv.org/abs/2306.05685 | Will be uploaded later! |
InstructEval: Towards Holistic Evaluation of Instruction-Tuned Large Language Models | https://arxiv.org/abs/2306.04757 | Will be uploaded later! |
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets | https://arxiv.org/abs/2307.10928 | Will be uploaded later! |
GAIA: A Benchmark for General AI Assistants | https://arxiv.org/abs/2311.12983 | No plan! |
Paper Title | Paper or reference site Link | Paper Review |
---|---|---|
Morpheme-aware Subword Tokenizer: An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks | https://arxiv.org/abs/2010.02534 | Will be uploaded later! |
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers | https://arxiv.org/abs/2109.04650 | Will be uploaded later! |
Paper Title | Paper or reference site Link | Paper Review |
---|---|---|
History of CNNs | LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, ResNeXt, Xception, MobileNet, DenseNet, EfficientNet, ConvNeXt | https://cartinoe5930.tistory.com/entry/CNN-network%EC%9D%98-%EC%97%AD%EC%82%AC |
ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://arxiv.org/abs/2010.11929 | https://cartinoe5930.tistory.com/entry/ViT-An-Image-Worth-16-x-16-Words-Transformers-for-Image-Recognition-at-Scale |
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | https://arxiv.org/abs/2103.14030 | https://cartinoe5930.tistory.com/entry/Swin-Transformer-Hierarchical-Vision-Transformer-using-Shifted-Windows-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
CLIP: Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/abs/2103.00020 | https://cartinoe5930.tistory.com/entry/CLIP-Learning-Transferable-Visual-Models-From-Natural-Language-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
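As a small illustration of the ViT paper listed above, the patch embedding can be written as a single strided convolution; the image size, patch size, and hidden dimension below are common ViT-Base-style values, used here only as an example.

```python
import torch
import torch.nn as nn

# Split a 224x224 image into 16x16 patches and project each patch to a 768-dim
# token. The class token, position embeddings, and the Transformer encoder
# itself are omitted here.
patch_embed = nn.Conv2d(in_channels=3, out_channels=768, kernel_size=16, stride=16)

img = torch.randn(1, 3, 224, 224)
tokens = patch_embed(img).flatten(2).transpose(1, 2)   # shape: (1, 196, 768), one token per patch
```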
Paper or Posting Title | Reference site Link | Review |
---|---|---|
Knowledge Distillation: Distilling the Knowledge in a Neural Network | https://arxiv.org/abs/1503.02531 | https://cartinoe5930.tistory.com/entry/Distilling-the-Knowledge-in-a-Neural-Network-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
What is Zero-shot, One-shot, Few-shot Learning? | see my blog! | https://cartinoe5930.tistory.com/entry/Zero-shot-One-shot-Few-shot-Learning%EC%9D%B4-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C |
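To complement the knowledge-distillation entry above, here is a minimal version of the distillation objective from Hinton et al. in PyTorch; the temperature and mixing weight are illustrative assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KL term (temperature T) mixed with the usual hard-label cross-entropy."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The T^2 factor keeps the gradient magnitude of the soft term comparable
    # to the hard-label term, as noted in the original paper.
    kd = F.kl_div(log_soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```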