
Question about reproducing experimental results. #2

Open
wenhaoli-xmu opened this issue Dec 4, 2024 · 4 comments

Comments

wenhaoli-xmu commented Dec 4, 2024

HI😊, we are reproducing your experimental results as the baseline of our method.

We are confused by the following questions. 🤔

First, why does running the following code take more than one minute? Since the prompt is short, we expected it to finish very quickly.

In [8]: prompt = "TESLA company is found by"
In [9]: output = model(prompt=prompt)
...(a long time)...

Second, after waiting for over a minute, we finally got a result that looks like this:

In [10]: output
Out[10]: {'text': ['Nik']}

We think this output is not reasonable and want to know whether there are improper configurations in the following script:

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MagicpigConfig:
    # Server settings
    server_type: str = 'hf'
    server_host: str = '127.0.0.1'
    server_port: str = '5000'
    ssh_server: Optional[str] = None
    ssh_key_path: Optional[str] = None
    model_name_or_path: str = 'meta-llama/Llama-2-7b-chat-hf'

    # Sampling settings
    temperature: float = 0.0
    top_k: int = 32
    top_p: float = 1.0
    random_seed: int = 0
    stop_words: list = field(default_factory=list)
    sliding_window_size: Optional[int] = None
    threads: int = 1

    # MagicPIG sparse-attention settings (K/L presumably the LSH parameters)
    K: int = 10
    L: int = 150
    S: float = 4.0
    W: int = 64
    Q: int = 0
    QR: float = 0.0
    max_seq_length: int = 4096
    max_new_tokens: int = 128

If this configuration is improper for short-prompt generation, we would further like to know the most suitable configuration for different prompt lengths, e.g. 1K, 2K, 4K, and 8K.

dreaming-panda (Contributor) commented

I am unsure whether you can directly input a sentence (without tokenization) into a model. Can you run the RULER experiments?
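
For example, with a plain Hugging Face model the usual flow is to tokenize first and then call generate. A minimal sketch, independent of MagicPIG:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Standard Hugging Face flow: tokenize the prompt, then generate.
# Model name taken from the config in the question above.
name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("TESLA company is found by", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))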

wenhaoli-xmu (Author) commented Dec 14, 2024

☺️ Thanks a lot for the answer 🙏. We did use the code you tested on RULER, and we have figured out why it seemed slow.

One more question: how can I measure the concrete pruning rate used in the decoding phase 🤔? Since MagicPIG uses dynamic retrieval, it is not like Quest, which uses a fixed token budget.

By the way, once I have the concrete pruning rate, can I use the following formula to calculate the overall equivalent token budget 🤔?

sink_budget = 4
local_budget = 64
equivalent_budget = pruning_rate * prefill_context_length + sink_budget + local_budget
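
Concretely, something like the sketch below, where retrieved_counts is a hypothetical per-step log of how many context keys the retrieval actually selects; I assume it would have to be recorded inside the attention code:

def equivalent_budget(retrieved_counts, prefill_context_length,
                      sink_budget=4, local_budget=64):
    # Average fraction of the prefilled context retrieved per decoding step.
    pruning_rate = sum(retrieved_counts) / (len(retrieved_counts) * prefill_context_length)
    # Equivalent fixed token budget, per the formula above.
    return pruning_rate * prefill_context_length + sink_budget + local_budget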

dreaming-panda (Contributor) commented

I think your understanding is correct. BTW, we will release v0.2 next week, which may make it easier for you to evaluate.

wenhaoli-xmu (Author) commented

Thanks a lot!☺️ Looking forward to your new release.
