Are the downloadable finetuned weights for secondary structure prediction intra- or inter-family trained? #16
I noticed that the split in the archiveII dataset (`fam-fold`) is intra-family based. However, the paper claims that the fine-tuning on archiveII used an inter-family split. Could the author clarify how the downloadable weights were fine-tuned?
Hello 😄, I just tried out the weights that were used for the SRP family evaluation and the results are as expected. Are you sure you are using the right weights?
Thank you for your clarification! I'm now able to obtain the expected results for SRP using those weights. If I want to replicate the finetuning process as well, do I need to reorganize the folder hierarchy so that it follows a "one family vs. all other families" structure? Could you confirm if this is the correct approach?
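For concreteness, this is the kind of reorganization I mean. A minimal sketch in Python; the `archiveII` directory layout, the `.ct` extension, and the `heldout` family name are my assumptions, not necessarily the repo's expected structure:

```python
from pathlib import Path

# Hypothetical layout: archiveII/<family>/<sequence>.ct, one subdir per family.
data_dir = Path("archiveII")
heldout = "srp"  # family reserved for testing ("one family vs. all others")

train_files, test_files = [], []
for fam_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
    files = sorted(fam_dir.glob("*.ct"))
    if fam_dir.name == heldout:
        test_files.extend(files)   # held-out family: inter-family test set
    else:
        train_files.extend(files)  # all other families: training set

print(f"{len(train_files)} train files, {len(test_files)} test files ({heldout})")
```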
I did use the default data splits to fine-tune the pre-trained model (it seems the files are already organized in an inter-family format). However, the test results are far from expected. Could you please check whether the hyperparameters are properly set? I was using this command:

The summary metrics are:

Notice that the validation F1 is very high but the test F1 is very low.
I found a few discrepancies in the code compared to our internal version that caused the learning rate to be higher than expected. Thanks for pointing this out. High learning rates during fine-tuning tend to overwrite the pre-trained knowledge of the LM. I pushed new changes to the main branch. Could you please pull the latest commit and repeat the experiments? The results should now align with what we reported in the paper. You can use this command to run the experiment:
Thank you for double-checking the hyperparameters and updating the code, I really appreciate it. I pulled the latest version and fine-tuned the model on three datasets: archiveII_5s, archiveII_srp, and bpRNA. Here are the test metrics for reference:

5s: F1 = 0.860, Precision = 0.969, Recall = 0.780

Before proceeding with fine-tuning on the other archiveII families, I wanted to check if there's anything I might have overlooked. For instance, I noticed that the default batch size is set to 1, which is uncommon in machine learning. Was this the batch size used in the paper? Looking forward to your thoughts!
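For reference, the numbers above are consistent with the standard base-pair F1. A minimal sketch of that metric, assuming structures are represented as sets of `(i, j)` pair indices; the repo's exact implementation (e.g. per-sample averaging or allowing shifted pairs) may differ:

```python
def pair_scores(pred: set, ref: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 over predicted vs. reference base pairs."""
    tp = len(pred & ref)                      # correctly predicted pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Sanity check against the 5s numbers: 2 * 0.969 * 0.780 / (0.969 + 0.780) ≈ 0.864,
# close to the reported 0.860 (small gaps typically come from per-sample averaging).
```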
Yes, the batch size was set to one in the paper as well. While it is true that such a batch size is uncommon in machine learning, it isn't that unusual for DL-based SS prediction (for example, MXfold2 and UFold also set their training batch sizes to one). The main problem is that for SS prediction you usually need to featurize/model all possible nucleotide pairings, which leads to quadratic memory complexity. Since we conducted the fine-tuning experiments on somewhat "weaker" GPUs (12-16 GB) compared to the GPUs we used for pre-training (A100, ~80 GB), we set the batch size to one so that we could process somewhat longer sequences during training.
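To illustrate the quadratic blow-up, here is a minimal PyTorch sketch; the dimensions and the outer-concatenation featurization are illustrative, not the repo's actual code:

```python
import torch

L, d = 1000, 256                 # sequence length, per-nucleotide embedding dim
h = torch.randn(1, L, d)         # embeddings for a single sequence (batch = 1)

# Pair featurization: every (i, j) gets the concatenation [h_i, h_j],
# so memory grows as O(L^2 * d) per sample.
hi = h.unsqueeze(2).expand(-1, L, L, d)   # h_i broadcast over j
hj = h.unsqueeze(1).expand(-1, L, L, d)   # h_j broadcast over i
pair_feats = torch.cat([hi, hj], dim=-1)  # shape: (1, L, L, 2 * d)

# fp32: 1000 * 1000 * 512 * 4 bytes ≈ 2 GB for this one tensor alone, which is
# why batch size 1 leaves room for longer sequences on a 12-16 GB GPU.
print(pair_feats.shape, pair_feats.numel() * 4 / 1e9, "GB")
```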
Hey Rafael - Thanks for the amazing work, I enjoyed reading your manuscript. Would it be possible to make the fine-tuned secondary structure prediction weights for the 150M parameter model available? I only see the pre-trained ones. Thanks!