some problems for RxRx3 #9

liujunhongznn · 2023-07-17T11:57:54Z

Thanks for your great work on RxRx3! It is a very useful dataset for HCS tasks. I am downloading the RxRx3 dataset and have done some analysis on the data. Here are my questions:

Problems with the COMPOUND data

For rows where the "SMILES" column is NaN, the "treatment" column is either CRISPR_control or EMPTY_control. What do CRISPR_control and EMPTY_control mean, and what is the difference between them?
What does the string after the SMILES mean? For example, the string "|c:11,13,23,29,32,t:1,9,26|" in CC1=C(C(CC(=O)N1)C1=CC=C(C=C1)C(F)(F)F)C(=O)NC1=C(F)C=C2NN=CC2=C1 |c:11,13,23,29,32,t:1,9,26|
How should the label be defined if I want to use this dataset for supervised pre-training or a classification task? Should I use SMILES/treatment directly as the label, or SMILES/treatment at each concentration?
There seems to be an error in compound003/Plate36.tar: some images (PNG files) are missing from the archive even though they are listed in the metadata CSV.
The metadata CSV for the compound data contains 95,701 wells (well_id), while the embedding parquet files contain 220,800 wells (well_id) in total. How should I interpret the 125,099 extra wells (220,800 - 95,701) in the embedding parquet files?
Which model was used to extract the embeddings? Could the model be made publicly available? Which dataset was it trained on, and how large was the training set?
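For anyone who wants to reproduce the well-count comparison above, here is a minimal sketch of how the two sets of well IDs could be compared. It assumes only that both sources expose a `well_id` column; the helper names and the toy data are hypothetical and are not actual RxRx3 rows. With the real files, the embedding IDs would be read from the parquet files with a parquet reader of your choice.

```python
import csv
import io


def well_ids_from_csv(text, column="well_id"):
    """Collect the unique values of one column from CSV text."""
    return {row[column] for row in csv.DictReader(io.StringIO(text))}


def compare_wells(meta_ids, embedding_ids):
    """Report which well IDs appear in both sources or in only one of them."""
    meta_ids, embedding_ids = set(meta_ids), set(embedding_ids)
    return {
        "shared": meta_ids & embedding_ids,
        "only_in_metadata": meta_ids - embedding_ids,
        "only_in_embeddings": embedding_ids - meta_ids,
    }


# Toy stand-ins for the real files (hypothetical IDs, not RxRx3 data).
meta_csv = "well_id,treatment\nw1,A\nw2,B\nw2,B\n"
embedding_well_ids = ["w1", "w2", "w3", "w4"]

report = compare_wells(well_ids_from_csv(meta_csv), embedding_well_ids)
print({name: sorted(ids) for name, ids in report.items()})
```

Listing the `only_in_embeddings` IDs should make it easier to see whether the extra wells share a pattern, for example a particular experiment or plate.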

Problems with the CRISPR data

How should the label be defined if I want to use this dataset for supervised pre-training or a classification task? Should I use the gene directly as the label, or the treatment?
What does "EMPTY_control" mean in the gene and treatment columns?
The treatment notation is "gene name_guide number" (e.g. RXRX3-79420_guide_10). What does "guide number" mean?
Plate2.tar, Plate3.tar, and Plate5.tar in the gene017 experiment fail to download: the download is interrupted partway through.
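To help tell an interrupted download apart from an archive that is complete but lacks some images, a rough sketch of a local check with Python's standard `tarfile` module follows. The function name `tar_is_readable` and the in-memory demo archive are hypothetical. The check is only partial: it catches truncation inside a member's data, but an archive cut off exactly between two members can still look valid, so the returned member names should also be compared against the metadata CSV.

```python
import io
import tarfile


def tar_is_readable(fileobj):
    """Return (ok, member_names); ok is False if reading the archive fails."""
    names = []
    try:
        with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
            for member in archive:
                names.append(member.name)
                if member.isfile():
                    # Read the payload so truncation inside a member is detected.
                    archive.extractfile(member).read()
    except (tarfile.TarError, EOFError, OSError):
        return False, names
    return True, names


def make_demo_tar():
    """Build a small in-memory tar with two fake image files (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in ("a.png", "b.png"):
            payload = b"x" * 2000
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


complete = make_demo_tar()
truncated = complete[:700]  # cut off in the middle of the first member's data

print(tar_is_readable(io.BytesIO(complete)))
print(tar_is_readable(io.BytesIO(truncated)))
```

For a file on disk, the same function can be called with `open(path, "rb")` as the file object.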
