Thanks for the great work on RxRx3! It is an excellent dataset for HCS tasks. I'm downloading the RxRx3 dataset and have done some analysis on the data. Here are my questions and the problems I ran into:
Problems with the COMPOUND data
For rows where the "SMILES" column is NaN, the "treatment" column is either CRISPR_control or EMPTY_control. What do CRISPR_control and EMPTY_control mean, and what is the difference between them?
What does the string after the SMILES mean? For example, the string "|c:11,13,23,29,32,t:1,9,26|" in CC1=C(C(CC(=O)N1)C1=CC=C(C=C1)C(F)(F)F)C(=O)NC1=C(F)C=C2NN=CC2=C1 |c:11,13,23,29,32,t:1,9,26|
How should the label be defined if I want to use this dataset for supervised pre-training or a classification task? Should I use the SMILES/treatment directly as the label, or the SMILES/treatment at each concentration?
There seems to be an error in compound003/Plate36.tar: some images (PNG files) that are listed in the metadata CSV are missing from the archive.
The metadata CSV for the compound data contains 95,701 wells (well_id), while the embedding parquet files contain 220,800 wells (well_id) in total. How should the extra 125,099 wells (220,800 - 95,701) in the embedding parquet files be interpreted?
Which model was used to extract the embeddings? Could the model be made publicly available? Which dataset was it trained on, and how large was the training set?
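Two of the checks above can be sketched in a few lines of Python. This is a minimal illustration, not the code actually used for the analysis: `compare_well_ids` and `split_smiles_suffix` are hypothetical helper names, and the toy well IDs are placeholders rather than real RxRx3 identifiers. The trailing `|...|` block looks like a ChemAxon extended SMILES (CXSMILES) annotation, but that is an assumption the maintainers would need to confirm.

```python
def compare_well_ids(metadata_ids, embedding_ids):
    """Return the well IDs unique to each source and the IDs they share."""
    meta = set(metadata_ids)
    emb = set(embedding_ids)
    return {
        "only_in_metadata": meta - emb,
        "only_in_embeddings": emb - meta,
        "shared": meta & emb,
    }


def split_smiles_suffix(value):
    """Split 'SMILES |...|' into (smiles, suffix); suffix is '' when absent."""
    # SMILES strings contain no spaces, so the first " |" marks the suffix.
    core, sep, rest = value.partition(" |")
    suffix = "|" + rest if sep else ""
    return core.strip(), suffix


if __name__ == "__main__":
    # Toy IDs (hypothetical): in practice these would come from the
    # well_id column of the metadata CSV and of the parquet files.
    report = compare_well_ids(["w1", "w2"], ["w1", "w2", "w3", "w4"])
    print(len(report["only_in_embeddings"]), "wells have embeddings only")

    raw = ("CC1=C(C(CC(=O)N1)C1=CC=C(C=C1)C(F)(F)F)C(=O)NC1=C(F)C=C2NN=CC2=C1 "
           "|c:11,13,23,29,32,t:1,9,26|")
    smiles, suffix = split_smiles_suffix(raw)
    print(smiles)
    print(suffix)
```

Listing the IDs in `only_in_embeddings` would show whether the extra wells belong to specific experiments or plates, which may help narrow down where they come from.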
Problems with the CRISPR data
How should the label be defined if I want to use this dataset for supervised pre-training or a classification task? Should I use the gene directly as the label, or the treatment?
What does "EMPTY_control" mean in the gene and treatment columns?
The treatment notation is "gene name_guide number" (e.g. RXRX3-79420_guide_10). What does the "guide number" mean?
Plate2.tar, Plate3.tar, and Plate5.tar in the gene017 experiment fail to download: the download is interrupted partway through.
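To make the two labelling options concrete, the sketch below splits a treatment string into its gene and guide parts and builds a class index at either level. It assumes the `<gene>_guide_<number>` pattern seen in the example above and treats anything else (such as EMPTY_control) as its own class. `parse_treatment` and `build_label_index` are hypothetical helpers, not part of any RxRx3 tooling, and which level is appropriate is exactly the open question.

```python
def parse_treatment(treatment):
    """Split '<gene>_guide_<n>' into (gene, guide_number).

    Strings that do not match the pattern (e.g. controls) are returned
    unchanged with a guide number of None.
    """
    gene, sep, guide = treatment.rpartition("_guide_")
    if not sep or not guide.isdigit():
        return treatment, None
    return gene, int(guide)


def build_label_index(treatments, level="gene"):
    """Map each distinct label to an integer class index.

    level="gene" merges all guides of a gene into one class;
    any other value keeps every treatment string as its own class.
    """
    labels = []
    for t in treatments:
        gene, _ = parse_treatment(t)
        labels.append(gene if level == "gene" else t)
    classes = sorted(set(labels))
    return {name: i for i, name in enumerate(classes)}


if __name__ == "__main__":
    rows = ["RXRX3-79420_guide_10", "RXRX3-79420_guide_2", "EMPTY_control"]
    print(build_label_index(rows, level="gene"))
    print(build_label_index(rows, level="treatment"))
```

With gene-level labels, the two guides in the toy input collapse into a single class; with treatment-level labels, each guide stays separate.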