'Prepare training data' facing unhashable type error #114

Closed
Gautam-Rajeev opened this issue Feb 6, 2024 · 1 comment · Fixed by #115

Comments

@Gautam-Rajeev
Contributor

Unable to run:

from ragatouille import RAGTrainer
trainer = RAGTrainer(model_name="MyFineTunedColBERT",
                     pretrained_model_name="colbert-ir/colbertv2.0")

pairs = [
    ("What is the meaning of life ?", "The meaning of life is 42"),
    ("What is Neural Search?", "Neural Search is a terms referring to a family of ")]

trainer.prepare_training_data(raw_data=pairs)

Getting the error:

[<ipython-input-58-5dad2ddb53fa>](https://localhost:8080/#) in <cell line: 1>()
----> 1 trainer.prepare_training_data(raw_data=pairs)

[/usr/local/lib/python3.10/dist-packages/ragatouille/RAGTrainer.py](https://localhost:8080/#) in prepare_training_data(self, raw_data, all_documents, data_out_path, num_new_negatives, hard_negative_minimum_rank, mine_hard_negatives, hard_negative_model_size, pairs_with_labels, positive_label, negative_label)
    123 
    124         self.queries = set([x[0] for x in raw_data])
--> 125         self.collection = list(set(self.collection))
    126         seeded_shuffle(self.collection)
    127 

TypeError: unhashable type: 'Series'
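
The traceback shows `set(self.collection)` failing because at least one element of the collection was a pandas `Series`, which cannot be hashed. Below is a minimal sketch of a defensive check that could be run on the pairs before handing them to the trainer. It is not RAGatouille's code and not the change made in the fix: `validate_pairs` and `dedupe_collection` are hypothetical helper names introduced only for illustration, and the sketch assumes every item is meant to be a `(query, passage)` pair of plain strings.

```python
from typing import Iterable, List, Sequence, Tuple


def validate_pairs(raw_data: Iterable[Sequence]) -> List[Tuple[str, str]]:
    """Hypothetical helper: check that every item is a (query, passage) pair of str."""
    checked = []
    for i, item in enumerate(raw_data):
        if not isinstance(item, (tuple, list)) or len(item) != 2:
            raise ValueError(f"item {i} is not a (query, passage) pair: {item!r}")
        query, passage = item
        if not isinstance(query, str) or not isinstance(passage, str):
            # Fail early with a readable message instead of a later
            # "unhashable type" error raised from inside set().
            raise TypeError(
                f"item {i} must contain two strings, got "
                f"{type(query).__name__} and {type(passage).__name__}"
            )
        checked.append((query, passage))
    return checked


def dedupe_collection(passages: Iterable[str]) -> List[str]:
    """Hypothetical helper: drop duplicate passages, keeping first-seen order."""
    # dict keys must be hashable, the same requirement that set() imposes.
    return list(dict.fromkeys(passages))
```

Calling `validate_pairs(pairs)` on the list from the report returns it unchanged, since both items are tuples of strings. An item holding a non-string object, such as a list or a pandas `Series`, raises a `TypeError` that names the offending index.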
Gautam-Rajeev added a commit to Gautam-Rajeev/RAGatouille that referenced this issue Feb 6, 2024
Fixing issue described AnswerDotAI#114
@bclavie
Collaborator

bclavie commented Feb 7, 2024

Thank you for investigating and fixing this! I changed one minor nitpick on the PR, but otherwise it all looks good to me. The fix will be live on PyPI really soon!

bclavie added a commit that referenced this issue Feb 7, 2024

* Update RAGTrainer.py

Fixing issue described #114

* chore: error out if query is not a string

* chore: lint

---------

Co-authored-by: bclavie <ben@clavie.eu>