
Question about reproducing experimental results. #2

Open
wenhaoli-xmu opened this issue Dec 4, 2024 · 4 comments

Comments

wenhaoli-xmu commented Dec 4, 2024

HI😊, we are reproducing your experimental results as the baseline of our method.

We are confused by the following questions. 🤔

First, why does running the following code take more than one minute? Since the prompt is short, we expected it to finish very quickly.

In [8]: prompt = "TESLA company is found by"
In [9]: output = model(prompt=prompt)
...(a long time)...

Second, after waiting for over a minute, we finally got a result that looks like this:

In [10]: output
Out[10]: {'text': ['Nik']}

We think this output is not reasonable and want to know whether there are improper configurations in the following script:

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MagicpigConfig:
    # Server settings
    server_type: str = 'hf'
    server_host: str = '127.0.0.1'
    server_port: str = '5000'
    ssh_server: Optional[str] = None
    ssh_key_path: Optional[str] = None
    model_name_or_path: str = 'meta-llama/Llama-2-7b-chat-hf'

    # Sampling settings
    temperature: float = 0.0
    top_k: int = 32
    top_p: float = 1.0
    random_seed: int = 0
    stop_words: list = field(default_factory=list)
    sliding_window_size: Optional[int] = None
    threads: int = 1

    # MagicPIG sparse-attention settings (K/L presumably the LSH parameters)
    K: int = 10
    L: int = 150
    S: float = 4.0
    W: int = 64
    Q: int = 0
    QR: float = 0.0
    max_seq_length: int = 4096
    max_new_tokens: int = 128

If this configuration is improper for short-prompt generation, we would further like to know the most suitable configuration for different prompt lengths, e.g. 1K, 2K, 4K, and 8K.

dreaming-panda (Contributor) commented

I am unsure whether you can directly input a sentence (without tokenization) into a model. Can you run the RULER experiments?
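
For example, with a plain Hugging Face model the usual flow is to tokenize first and then call generate. A minimal sketch, independent of MagicPIG:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Standard Hugging Face flow: tokenize the prompt, then generate.
# Model name taken from the config in the question above.
name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("TESLA company is found by", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))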

wenhaoli-xmu (Author) commented Dec 14, 2024

☺️ Thanks a lot for the answer 🙏. We did use the code you tested on RULER, and we have figured out why it seemed slow.

One more question: how can I measure the concrete pruning rate used in the decoding phase 🤔? Since MagicPIG uses dynamic retrieval, it is not like Quest, which uses a fixed token budget.

By the way, once I have the concrete pruning rate, can I use the following formula to calculate the overall equivalent token budget 🤔?

sink_budget = 4
local_budget = 64
equivalent_budget = pruning_rate * prefill_context_length + sink_budget + local_budget
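
Concretely, something like the sketch below, where retrieved_counts is a hypothetical per-step log of how many context keys the retrieval actually selects; I assume it would have to be recorded inside the attention code:

def equivalent_budget(retrieved_counts, prefill_context_length,
                      sink_budget=4, local_budget=64):
    # Average fraction of the prefilled context retrieved per decoding step.
    pruning_rate = sum(retrieved_counts) / (len(retrieved_counts) * prefill_context_length)
    # Equivalent fixed token budget, per the formula above.
    return pruning_rate * prefill_context_length + sink_budget + local_budget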

dreaming-panda (Contributor) commented

I think your understanding is correct. BTW, we will release v0.2 next week, which may make it easier for you to evaluate.

wenhaoli-xmu (Author) commented

Thanks a lot!☺️ Looking forward to your new release.
