
Kaggle LLM Science Exam #52

Open
1 task done
manisnesan opened this issue Aug 9, 2023 · 8 comments
Comments

@manisnesan

manisnesan commented Aug 9, 2023

  • Jeremy Twitter thread

  • Training set 200 science multiple choice questions autogenerated using GPT 3.5

  • RAG pattern

  • No retriever: use the LM alone. Pass the question directly to GPT 3.5 using the llm library.

  • GPT 3.5 got the wrong answer for the following:

Which of the following statements accurately describes the origin and significance of the triskeles symbol?

  • Dive deeper using Bing
  • Give the model a chance to "think about the answer" by first asking the question without the multiple-choice options. Instead ask:

Please accurately describe the origin and significance of the triskeles symbol

First go through each of the 5 options, explaining why it is or isn't a good description, and then finally say which you think is most accurate

followed by the multiple-choice options.

  • This two-step approach is a way to get better results
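The two-step flow can be sketched with the legacy openai Python API used elsewhere in this thread; the prompt wording and helper names here are my own, not from the competition:

```python
def two_step_messages(question, options):
    """Build the two user turns: an open-ended pass, then the options."""
    step1 = f"Please accurately describe the answer to: {question}"
    step2 = ("First go through each of the options, explaining why it is or "
             "isn't a good description, then say which is most accurate.\n"
             + "\n".join(f"{k}: {v}" for k, v in options.items()))
    return step1, step2

def ask_two_step(question, options, model="gpt-3.5-turbo"):
    import openai  # legacy 0.x API, as in the snippets below
    step1, step2 = two_step_messages(question, options)
    history = [{"role": "user", "content": step1}]
    first = openai.ChatCompletion.create(model=model, messages=history)
    history.append(first["choices"][0]["message"])  # keep the model's reasoning in context
    history.append({"role": "user", "content": step2})
    second = openai.ChatCompletion.create(model=model, messages=history)
    return second["choices"][0]["message"]["content"]
```

The point is that the model spends tokens reasoning about the open question before it ever sees the options.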

  • Enable the Page Context feature in Bing

Tricks

  • Open the page in Edge and use the Bing sidebar (you must first enable page context in Edge settings). It even works with PDFs!

To enable it, in Microsoft Edge go to Settings, type "sidebar" in the settings search, scroll down to "App specific settings", and click "Discover" > "Allow Web page access".

  • Allow the LLM to think. GPTs are autoregressive models: they produce tokens in sequence, one by one, and the more of this computation they perform, the better they do. So steer the chat toward your question's intent before asking for the final answer.

Example

Precisely and fully describe the answer to the following question: what is resistivity?

I will ask a multiple choice question, with 5 answers A-E.
First, output 'Options: ' followed by going through each of the 5 options, explaining why it is or isn't a good description.
Then, output 'Summary: ' followed by a description of which you think is most accurate, and why.
Finally, output 'Answers: ' followed by the 5 answers A-E sorted from best answer to worst. E.g 'Answers: B C E A D'.
Reminder: it's VERY IMPORTANT the final line of your response is the text 'Answers: ' followed by the sorted list of answers A-E.

Question: {r.prompt}
A: {r.A}
B: {r.B}
C: {r.C}
D: {r.D}
E: {r.E}

Also use the LLM to rewrite the query.
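Query rewriting can be sketched as a small prompt wrapper; the prompt wording below is an assumption, not from the thread:

```python
def rewrite_query_prompt(question):
    # Ask the model to restate the question as a self-contained search query.
    return ("Rewrite the following question as a concise, self-contained "
            "search query, keeping all technical terms:\n" + question)

def rewrite_query(question, model="gpt-3.5-turbo"):
    import openai  # legacy 0.x API, as used elsewhere in this thread
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": rewrite_query_prompt(question)}],
    )
    return resp["choices"][0]["message"]["content"].strip()
```

The rewritten query then feeds the retriever instead of the raw question text.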

@manisnesan

manisnesan commented Aug 9, 2023

  • Jonathan Whitaker P1 YouTube video

  • Intro to the Kaggle competition

    • Multiple Choice Question Answering (A-E) with 3 guesses allowed. Metric: MAP@3
  • Benchmarking with GPT3.5

    • Load the data
    • Create our prompts
      • system -> """Answer the following multiple-choice question by providing your top 3 guesses in order from most to least likely, using the following format: 'A C D' (just the letters separated by spaces)."""
      • user -> Question: $QUESTION. Answers: A:$A B:$B C:$C D:$D E:$E
    • Call the chat completions API with gpt-3.5-turbo model and messages as input
```python
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
)
```
  • Using the OpenAI function calling API to enforce structure on answers

      • Instead of relying on the model to respond in free form, we can get a structured output.
```python
# Define the function(s) the model will be able to use (in this case, only one)
functions = [
    {
        "name": "answer_question",
        "description": "Answers the provided question",
        "parameters": {
            "type": "object",
            "properties": {
                "reasoning": {
                    "type": "string",
                    "description": "Reasoning for what the answer could be. Keep it short.",
                },
                "answers": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": ["A", "B", "C", "D", "E"],
                    },
                    "description": "Your top 3 guesses, from most to least likely. e.g. ['A', 'D', 'C']",
                },
            },
            "required": ["reasoning", "answers"],
        },
    }
]
```
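A sketch of calling the API with this schema and parsing the result; forcing `function_call` to the named function is my assumption about how the notebook enforces structure:

```python
import json

def parse_function_args(message):
    """Extract the structured arguments from a function_call response message."""
    return json.loads(message["function_call"]["arguments"])

def top3_via_function_call(system_message, user_message, functions):
    import openai  # legacy 0.x API, as in the snippets above
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": system_message},
                  {"role": "user", "content": user_message}],
        functions=functions,
        function_call={"name": "answer_question"},  # force the structured call
    )
    return parse_function_args(resp["choices"][0]["message"])["answers"][:3]
```

Because the schema constrains `answers` to the enum A-E, parsing failures are far rarer than with free-form text.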
  • Using Llama2 as a classifier by examining the logits (next token predictions)
  • Using perplexity to evaluate question-answer pairs
    • Refer: LLM Perplexity Ranking Ensemble Kaggle notebook
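The logits-as-classifier idea can be sketched as follows: instead of generating text, score only the answer-letter tokens from the next-token logits. The ranking helper is pure Python; the model/tokenizer usage assumes a HuggingFace transformers causal LM (e.g. Llama-2) and is illustrative:

```python
def rank_letters(logits, letter_token_ids):
    """Rank answer letters by their next-token logit, best first."""
    scored = {letter: logits[tid] for letter, tid in letter_token_ids.items()}
    return sorted(scored, key=scored.get, reverse=True)

def llama2_top3(prompt, model, tokenizer):
    import torch  # assumes a transformers causal LM on the prompt's device
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # logits for the next token only
    letter_ids = {c: tokenizer(c, add_special_tokens=False).input_ids[-1]
                  for c in "ABCDE"}
    return rank_letters(logits, letter_ids)[:3]
```

This gives a full ranking over A-E in a single forward pass, which maps directly onto the MAP@3 metric.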

@manisnesan

Full playlist

@manisnesan

Differential Learning Rates and LoRA - notebook by Wayde

@manisnesan

RAG with additional dataset from Chris Deotte

@manisnesan

manisnesan commented Sep 23, 2023

? quantized to 8 bits
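The note above likely refers to loading model weights in int8. A minimal sketch via transformers with bitsandbytes, as was common in this competition (the model name is illustrative):

```python
def int8_load_kwargs():
    # Keyword arguments for AutoModelForCausalLM.from_pretrained:
    # shard across available devices and load weights as int8.
    return {"device_map": "auto", "load_in_8bit": True}

def load_int8(model_name="meta-llama/Llama-2-7b-hf"):
    from transformers import AutoModelForCausalLM  # requires bitsandbytes installed
    return AutoModelForCausalLM.from_pretrained(model_name, **int8_load_kwargs())
```

8-bit weights roughly halve memory versus fp16, which is how large models were squeezed onto Kaggle's T4 GPUs.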

@manisnesan

Perplexity
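The perplexity-ranking idea mentioned earlier (score each question+answer pair, pick the least surprising) reduces to exp of the mean negative log-likelihood. A minimal sketch, assuming you already have per-token log-probabilities from the LM:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), natural-log inputs."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def best_answer(options_logprobs):
    """Pick the option whose question+answer text has the lowest perplexity."""
    return min(options_logprobs, key=lambda k: perplexity(options_logprobs[k]))
```

For MAP@3, sort all five options by perplexity instead of taking only the minimum.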

@manisnesan

manisnesan commented Sep 24, 2023

Transformers - Primer by aman.ai

  • Mathematical background (Vectors, Matrix Multiplication, Dot Product, Masking, Sampling)
  • Attention (Additive/Multiplicative/Dot Product Attention, Self/Cross-Attention, Multihead Attention)
  • Core components of the Transformer Architecture (Embeddings, Positional Encoding, Skip Connections, Layer Normalization, Softmax)
  • Top-level Transformer Architecture (Encoder and Decoder stack)
  • Implementation details (Byte-Pair Encoding, Teacher Forcing, Label Smoothing)
  • Lessons learned (What are Transformers learning? Why is training them so hard?)
  • Pros/cons of Transformers relative to CNNs/RNNs
  • Relation between Transformers and Graph Neural Networks
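As a small worked example of the attention building block listed above, here is scaled dot-product attention in NumPy (my own sketch, not code from the primer):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights
```

With a zero query, the weights are uniform and the output is just the mean of the value rows, which makes the "weighted average of values" intuition concrete.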

🔹 GPT: http://gpt.aman.ai

  • Background: Generative Pre-Training, Transformer Decoder
  • GPT-1: Improving Language Understanding by Generative Pre-Training
  • GPT-2: Language Models are Unsupervised Multitask Learners
  • GPT-3: Language Models are Few-Shot Learners
  • GPT-4

🔹 BERT: http://bert.aman.ai

  • Background: Pre-Training, Transformer Encoder
  • Contextualized Embeddings
  • Masked Language Modeling (MLM)
  • Next Sentence Prediction (NSP)
  • BERT’s Encoder Architecture vs. Other Decoder Architectures
  • The Strength of Bidirectionality
  • Supervised Fine-Tuning

@manisnesan

https://www.kaggle.com/competitions/kaggle-llm-science-exam

Check the solution posts

  • How to fine-tune LLMs
  • How to properly use RAG techniques for augmenting LLMs
  • RAG chunking, embedding, similarity search, and other related techniques
  • Synthetic data generation for training models
  • How to optimize inference code for optimal runtime on limited HW resources.
  • How to fit large LLMs in small GPUs. People even managed to run 70B Llama2 on 2xT4.

https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/446414
