Software garden and add figshare DOIs (#52)
* software garden and add figshare DOIs

* review suggestions
jenna-tomkinson authored Aug 19, 2024
1 parent c4c0e7b commit 9ff205a
Showing 5 changed files with 120 additions and 103 deletions.
2 changes: 1 addition & 1 deletion 0.download_data/README.md
@@ -15,5 +15,5 @@ cd 0.download_data
source download_plates.sh
```

To download 4 plates from figshare, it took about 50 minutes.
To download 5 plates from figshare, it took about __58__ minutes.
There is the option to parallelize this in the future depending on needs.
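The timing note above leaves parallel downloads as a future option. Below is a minimal sketch of one approach, assuming each plate can be downloaded independently of the others. `download_all_plates` and the per-plate `download_plate` callable are hypothetical names, not functions from this repository. Because downloading is I/O-bound, a standard-library thread pool is a reasonable fit.

```python
import concurrent.futures
from typing import Callable, Dict


def download_all_plates(
    plate_info: Dict[str, dict],
    download_plate: Callable[[str, dict], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run a (hypothetical) per-plate download function concurrently.

    Threads suit this I/O-bound work; each plate is submitted as its own task.
    Returns a mapping of plate name to whatever download_plate returns.
    """
    results: Dict[str, str] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit one task per plate and remember which future belongs to which plate
        futures = {
            executor.submit(download_plate, plate, info): plate
            for plate, info in plate_info.items()
        }
        for future in concurrent.futures.as_completed(futures):
            plate = futures[future]
            # .result() re-raises any exception raised inside the worker thread
            results[plate] = future.result()
    return results


if __name__ == "__main__":
    # stand-in for a real download: only reports which figshare record it would fetch
    def fake_download(plate: str, info: dict) -> str:
        return f"{info['figshare_id']}/versions/{info['version_number']}"

    demo = {
        "Plate_4": {"figshare_id": "23671056", "version_number": "1"},
        "Plate_5": {"figshare_id": "26759914", "version_number": "1"},
    }
    print(download_all_plates(demo, fake_download))
```

In practice, `download_plate` would wrap the existing per-plate download logic from the notebook, and `max_workers` caps how many figshare requests run at once.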
76 changes: 58 additions & 18 deletions 0.download_data/download_plates.ipynb
@@ -85,24 +85,33 @@
" # save extracted files from figshare download to a folder in the `0.download_data` directory\n",
" \"output_dir\": pathlib.Path(\"./Plate_2\"),\n",
" },\n",
" # these plates are combined due to the figshare project containing zip files with the images for each\n",
" # plate that will need to be extracted in a second step\n",
" # these plates are combined due to the figshare project containing zip files with the images for each plate and\n",
" # is extracted in a second step\n",
" \"Plates_3_and_3_prime\": {\n",
" \"figshare_id\": \"22592890\",\n",
" \"version_number\": \"2\",\n",
" \"output_folder\": \"Plates_3_zip\",\n",
" # save extracted zip files from figshare download to a folder in the `0.download_data` directory\n",
" \"output_dir\": pathlib.Path(\"./Plates_3_and_3_prime\"),\n",
" },\n",
" # this plate data was added to figshare as a zip file due to the size of the data (13GB) and will\n",
" # need to be extracted in a second step\n",
" # this plate data was added to figshare as a zip file due to the size of the data and\n",
" # is extracted in a second step\n",
" \"Plate_4\": {\n",
" \"figshare_id\": \"23671056\",\n",
" \"version_number\": \"1\",\n",
" \"output_folder\": \"Plates_4_zip\",\n",
" # save extracted zip file from figshare download to a folder in the `0.download_data` directory\n",
" \"output_dir\": pathlib.Path(\"./Plate_4_zip\"),\n",
" },\n",
" # this plate data was added to figshare as a zip file due to the size of the data and\n",
" # is extracted in a second step\n",
" \"Plate_5\": {\n",
" \"figshare_id\": \"26759914\",\n",
" \"version_number\": \"1\",\n",
" \"output_folder\": \"Plates_5_zip\",\n",
" # save extracted zip file from figshare download to a folder in the `0.download_data` directory\n",
" \"output_dir\": pathlib.Path(\"./Plate_5_zip\"),\n",
" },\n",
"}"
]
},
@@ -131,7 +140,9 @@
"The metadata has been moved into its own directory!\n",
"The downloaded zip file contents have been extracted into Plates_3_and_3_prime folder for plate with ID 22592890/versions/2!\n",
"The metadata has been moved into its own directory!\n",
"The downloaded zip file contents have been extracted into Plate_4 folder for plate with ID 23671056/versions/1!\n",
"The downloaded zip file contents have been extracted into Plate_4_zip folder for plate with ID 23671056/versions/1!\n",
"The metadata has been moved into its own directory!\n",
"The downloaded zip file contents have been extracted into Plate_5_zip folder for plate with ID 26759914/versions/1!\n",
"The metadata has been moved into its own directory!\n"
]
}
@@ -184,8 +195,34 @@
" \"path_to_zip_file\": pathlib.Path(\"./Plate_4_zip/plate_4.zip\"),\n",
" \"extraction_path\": pathlib.Path(\"./Plate_4\"),\n",
" },\n",
"}\n",
"\n",
" \"Plate_5\": {\n",
" \"path_to_zip_file\": pathlib.Path(\"./Plate_5_zip/Plate_5.zip\"),\n",
" \"extraction_path\": pathlib.Path(\"./Plate_5\"),\n",
" }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Starting extraction on Plate_3 zip file!\n",
"All images/files within the zip file have been extracted to Plate_3!\n",
"Starting extraction on Plate_3_prime zip file!\n",
"All images/files within the zip file have been extracted to Plate_3_prime!\n",
"Starting extraction on Plate_4 zip file!\n",
"All images/files within the zip file have been extracted to Plate_4!\n",
"Starting extraction on Plate_5 zip file!\n",
"All images/files within the zip file have been extracted to Plate_5!\n"
]
}
],
"source": [
"for plate, info in zip_images_dictionary.items():\n",
" # set the parameters for the function as variables based on the plate dictionary info\n",
" path_to_zip_file = info[\"path_to_zip_file\"]\n",
@@ -205,31 +242,34 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Remove folder with zip files for both plates 3 and 3 prime since all files have been extracted"
"### Remove folder with zip files since all files have been extracted"
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The directory containing zip files from Figshare has been deleted as the files have been extracted!\n"
"The directory containing zip files from Figshare for Plate_3 has been deleted as the files have been extracted!\n",
"The directory containing zip files from Figshare for Plate_4 has been deleted as the files have been extracted!\n",
"The directory containing zip files from Figshare for Plate_5 has been deleted as the files have been extracted!\n"
]
}
],
"source": [
"# remove the directory with the zip files for Plates 3 and 3 prime only since the zips are no longer needed\n",
"if zip_images_dictionary[\"Plate_3\"][\"path_to_zip_file\"].exists():\n",
" # remove the parent directory with the zip files as we have moved all the images\n",
" parent_directory = os.path.dirname(zip_images_dictionary[\"Plate_3\"][\"path_to_zip_file\"])\n",
" shutil.rmtree(parent_directory)\n",
" print(\"The directory containing zip files from Figshare has been deleted as the files have been extracted!\")\n",
"else:\n",
" print(\"The path to the zip file does not exist. Please check to make sure that the data was downloaded properly from figshare.\")"
"# remove the directory with the zip files for Plates 3, 3 prime, 4, and 5 only since the zips are no longer needed\n",
"for plate in [\"Plate_3\", \"Plate_4\", \"Plate_5\"]:\n",
" if zip_images_dictionary[plate][\"path_to_zip_file\"].exists():\n",
" # remove the parent directory with the zip files as we have moved all the images\n",
" parent_directory = os.path.dirname(zip_images_dictionary[plate][\"path_to_zip_file\"])\n",
" shutil.rmtree(parent_directory)\n",
" print(f\"The directory containing zip files from Figshare for {plate} has been deleted as the files have been extracted!\")\n",
" else:\n",
" print(f\"The path to the zip file for {plate} does not exist. Please check to make sure that the data was downloaded properly from Figshare.\")"
]
}
],
46 changes: 32 additions & 14 deletions 0.download_data/scripts/download_plates.py
@@ -53,24 +53,33 @@
# save extracted files from figshare download to a folder in the `0.download_data` directory
"output_dir": pathlib.Path("./Plate_2"),
},
# these plates are combined due to the figshare project containing zip files with the images for each
# plate that will need to be extracted in a second step
# these plates are combined due to the figshare project containing zip files with the images for each plate and
# is extracted in a second step
"Plates_3_and_3_prime": {
"figshare_id": "22592890",
"version_number": "2",
"output_folder": "Plates_3_zip",
# save extracted zip files from figshare download to a folder in the `0.download_data` directory
"output_dir": pathlib.Path("./Plates_3_and_3_prime"),
},
# this plate data was added to figshare as a zip file due to the size of the data (13GB) and will
# need to be extracted in a second step
# this plate data was added to figshare as a zip file due to the size of the data and
# is extracted in a second step
"Plate_4": {
"figshare_id": "23671056",
"version_number": "1",
"output_folder": "Plates_4_zip",
# save extracted zip file from figshare download to a folder in the `0.download_data` directory
"output_dir": pathlib.Path("./Plate_4_zip"),
},
# this plate data was added to figshare as a zip file due to the size of the data and
# is extracted in a second step
"Plate_5": {
"figshare_id": "26759914",
"version_number": "1",
"output_folder": "Plates_5_zip",
# save extracted zip file from figshare download to a folder in the `0.download_data` directory
"output_dir": pathlib.Path("./Plate_5_zip"),
},
}
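Each entry above pairs a `figshare_id` with a `version_number`. The sketch below shows how those two fields could address one specific article version through figshare's public REST API (v2). It assumes the `/v2/articles/{id}/versions/{version}` endpoint and a JSON response whose `files` list carries `name` and `download_url` keys. The helper names are hypothetical, and this is not the repository's own download code.

```python
import json
import urllib.request
from typing import Any, Dict, List

# assumed base URL of figshare's public REST API
FIGSHARE_API = "https://api.figshare.com/v2"


def article_version_url(figshare_id: str, version_number: str) -> str:
    """Build the API URL for one version of a figshare article."""
    return f"{FIGSHARE_API}/articles/{figshare_id}/versions/{version_number}"


def file_download_links(article_json: Dict[str, Any]) -> List[Dict[str, str]]:
    """Keep only the file name and download URL from an article-version response."""
    return [
        {"name": f["name"], "download_url": f["download_url"]}
        for f in article_json.get("files", [])
    ]


def fetch_file_links(figshare_id: str, version_number: str) -> List[Dict[str, str]]:
    """Query the API (network access required) and list the files in that version."""
    url = article_version_url(figshare_id, version_number)
    with urllib.request.urlopen(url) as response:
        return file_download_links(json.load(response))
```

For example, `fetch_file_links("26759914", "1")` would list the file(s) attached to version 1 of the Plate 5 record. Keeping URL building and response parsing separate from the network call makes those two parts easy to check offline.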


@@ -119,8 +128,16 @@
"path_to_zip_file": pathlib.Path("./Plate_4_zip/plate_4.zip"),
"extraction_path": pathlib.Path("./Plate_4"),
},
"Plate_5": {
"path_to_zip_file": pathlib.Path("./Plate_5_zip/Plate_5.zip"),
"extraction_path": pathlib.Path("./Plate_5"),
}
}


# In[6]:


for plate, info in zip_images_dictionary.items():
# set the parameters for the function as variables based on the plate dictionary info
path_to_zip_file = info["path_to_zip_file"]
@@ -135,17 +152,18 @@
)


# ### Remove folder with zip files for both plates 3 and 3 prime since all files have been extracted
# ### Remove folder with zip files since all files have been extracted

# In[6]:
# In[7]:


# remove the directory with the zip files for Plates 3 and 3 prime only since the zips are no longer needed
if zip_images_dictionary["Plate_3"]["path_to_zip_file"].exists():
# remove the parent directory with the zip files as we have moved all the images
parent_directory = os.path.dirname(zip_images_dictionary["Plate_3"]["path_to_zip_file"])
shutil.rmtree(parent_directory)
print("The directory containing zip files from Figshare has been deleted as the files have been extracted!")
else:
print("The path to the zip file does not exist. Please check to make sure that the data was downloaded properly from figshare.")
# remove the directory with the zip files for Plates 3, 3 prime, 4, and 5 only since the zips are no longer needed
for plate in ["Plate_3", "Plate_4", "Plate_5"]:
if zip_images_dictionary[plate]["path_to_zip_file"].exists():
# remove the parent directory with the zip files as we have moved all the images
parent_directory = os.path.dirname(zip_images_dictionary[plate]["path_to_zip_file"])
shutil.rmtree(parent_directory)
print(f"The directory containing zip files from Figshare for {plate} has been deleted as the files have been extracted!")
else:
print(f"The path to the zip file for {plate} does not exist. Please check to make sure that the data was downloaded properly from Figshare.")
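The cleanup loop above deletes each zip file's parent directory. According to its comment, it relies on Plate 3 and Plate 3 prime sharing one folder. The sketch below is an alternative that extracts every zip with the standard-library `zipfile` module and then removes each parent directory exactly once, after all extractions have finished. It assumes a dictionary shaped like `zip_images_dictionary`, with `path_to_zip_file` and `extraction_path` keys. `extract_and_clean_up` is a hypothetical name, not the repository's extraction function.

```python
import pathlib
import shutil
import zipfile
from typing import Dict, List


def extract_and_clean_up(zip_info: Dict[str, Dict[str, pathlib.Path]]) -> List[pathlib.Path]:
    """Extract each zip to its extraction path, then delete the zip folders.

    Returns the list of directories that were removed.
    """
    parents_to_remove: List[pathlib.Path] = []
    for plate, info in zip_info.items():
        zip_path = pathlib.Path(info["path_to_zip_file"])
        if not zip_path.exists():
            print(f"Zip file for {plate} not found at {zip_path}; skipping.")
            continue
        extraction_path = pathlib.Path(info["extraction_path"])
        extraction_path.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(extraction_path)
        # several plates can share a parent folder, so record each folder only once
        if zip_path.parent not in parents_to_remove:
            parents_to_remove.append(zip_path.parent)

    # deletion happens only after the loop; if any extraction raises, nothing is
    # removed, so a shared folder is never deleted while a zip inside it is unextracted
    for parent in parents_to_remove:
        shutil.rmtree(parent)
    return parents_to_remove
```

Collecting the parent folders first means the list of plates to clean up no longer has to be maintained by hand alongside the dictionary.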

2 changes: 1 addition & 1 deletion 3.processing_features/1.pycytominer_bulk_pipelines.ipynb
@@ -396,7 +396,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.18"
"version": "3.8.19"
}
},
"nbformat": 4,
97 changes: 28 additions & 69 deletions README.md
@@ -1,15 +1,36 @@
# NF1 Cell Painting Data

In this repository, we perform the image-based analysis and some analysis of the morphology data.
In this repository, we generate image analysis and image-based profiling pipelines to extract and format single-cell morphological profiles.

We train a machine learning model to predict NF1 genotype within a separate repository called: [NF1_SchwannCell_data_analysis](https://github.com/WayScience/NF1_SchwannCell_data_analysis).
Please visit the above repository for further information on the generation and validation of the logistic regression model.
We train a machine learning model to predict NF1 genotype, evaluate, and generate figures within a separate repository called: [NF1_SchwannCell_data_analysis](https://github.com/WayScience/NF1_SchwannCell_data_analysis).
Please visit the above repository for further information on the generation, validation, and figures from this model for the manuscript.

**Note:** All metadata files are located in the [download data module](./0.download_data/). All larger files, including `SQLite` outputs from CellProfiler and `parquet` processed data file from pycytominer, will need to be downloaded using git LFS after the repo is cloned.
## Goal

It is important to study Schwann cells from NF1 patients because NF1 causes patients to develop neurofibromas, which are peripheral nerve tumors forming bumps underneath the skin that appear due to the decrease of Ras-GAP neurofibromin production.
This decrease in production occurs when the NF1 gene is mutated (NF1 +/-).

**The goal of this project is to predict NF1 genotype from Schwann cell morphology.**
We apply cell image analysis to Cell Painting images and use representation learning to extract morphology features.
We will apply machine learning to the morphology features to discover a biomarker of NF1 genotype.
Once we discover a biomarker from these cells, we hope that our method can be used for drug discovery to treat this rare disease.

## Data

The data we use is a modified Cell Painting assay on [Schwann cells](https://www.ncbi.nlm.nih.gov/books/NBK544316/) from patients with [Neurofibromatosis type 1 (NF1)](https://medlineplus.gov/genetics/condition/neurofibromatosis-type-1/).
The images are publicly available on figshare, under the [NF1 Schwann Cell Genotype Cell Painting Assay project](https://figshare.com/projects/NF1_Schwann_Cell_Genotype_Cell_Painting_Assay/161620).

The data is as follows:

| Plate | DOI | Description |
|-------|-----|-------------|
| Plate 1 | https://doi.org/10.6084/m9.figshare.22233292 | Preliminary plate of 8 wells with image sets of three Cell Painting channels for wildtype and null cells. |
| Plate 2 | https://doi.org/10.6084/m9.figshare.22233700 | Preliminary plate of 32 wells with image sets of three Cell Painting channels for wildtype and null cells. |
| Plates 3 and 3 prime | https://doi.org/10.6084/m9.figshare.22592890 | Plates utilized for modelling. Each contain 48 wells, with Plate 3 treated with 10% FBS and prime treated with 5% FBS. These plate contain all three *NF1* genotypes, with varying seeding densities. |
| Plate 4 | https://doi.org/10.6084/m9.figshare.23671056 | Plate containing 60 wells with null and wildtype cells either not treated or treated with siRNAs. We do not include this plate for modelling or evaluation. The seeding density is 1000 cells. |
| Plate 5 | https://doi.org/10.6084/m9.figshare.26759914 | Plate containing 48 wells with all three *NF1* genotypes used for modelling. The seeding density is 1000 cells. |
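In the table above, each figshare DOI ends with the same number that the download scripts use as `figshare_id` (for example, 22592890 for Plates 3 and 3 prime). The small sketch below assumes that the `10.6084/m9.figshare.<id>` pattern holds for every plate. It recovers the ID from a DOI URL so that the table and the download configuration can be cross-checked. `figshare_id_from_doi` is a hypothetical helper, not part of this repository.

```python
import re

# matches DOIs of the form 10.6084/m9.figshare.<digits>, with or without a URL prefix
FIGSHARE_DOI_PATTERN = re.compile(r"10\.6084/m9\.figshare\.(\d+)$")


def figshare_id_from_doi(doi: str) -> str:
    """Return the numeric figshare article ID embedded in a figshare DOI."""
    match = FIGSHARE_DOI_PATTERN.search(doi.strip())
    if match is None:
        raise ValueError(f"Not a figshare DOI: {doi!r}")
    return match.group(1)


# DOIs from the plate table paired with the figshare_id values in the download dictionary
expected_ids = {
    "https://doi.org/10.6084/m9.figshare.22592890": "22592890",
    "https://doi.org/10.6084/m9.figshare.23671056": "23671056",
    "https://doi.org/10.6084/m9.figshare.26759914": "26759914",
}
for doi, figshare_id in expected_ids.items():
    assert figshare_id_from_doi(doi) == figshare_id
```

The pattern is anchored at the end of the string, so a DOI that carries any trailing suffix would require the pattern to be extended.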

There are two versions of the Cell Painting assay in this repository:

In this modified Cell Painting, there are three channels for plates 1 and 2:

@@ -19,7 +40,7 @@ In this modified Cell Painting, there are three channels for plates 1 and 2:

![Modified_Cell_Painting.png](example_figures/Modified_Cell_Painting.png)

In this modified Cell Painting, there are four channels for plates 3 and 3':
In this modified Cell Painting, there are four channels for all the rest of the plates:

- `DAPI` (Nuclei)
- `GFP` (Endoplasmic Reticulum)
@@ -28,70 +49,8 @@ In this modified Cell Painting, there are four channels for plates 3 and 3':

![Modified_CellPainting_Plate3.png](example_figures/Modified_CellPainting_Plate3.png)

Plates 1 and 2 measure Cell Painting in isogenic Schwann cells with two different NF1 genotypes:

**Plate 1**
- Wild type (`WT +/+`): In column 6 from the plate (e.g C6, D6, etc.)
- Null (`Null -/-`): In column 7 from the plate (e.g C7, D7, etc.)
There are only rows C-F in this plate.

![plate1_platemap](./0.download_data/metadata/platemap_figures/plate1_platemap_figure.png)

**Plate 2**
- Wild type (`WT +/+`): Columns 1 and 6
- Null (`Null -/-`): Columns 7 and 12
This plate uses all rows (e.g., A-H)

![plate2_platemap](./0.download_data/metadata/platemap_figures/plate2_platemap_figure.png)

Plates 3 and 3' measure Cell Painting in isogenic Schwann cells with all three different NF1 genotypes:

**Plate 3 and 3'(prime)**
For these plates, we looking at different seeding densities to identify which will lower the cell count contribution on the features and identify differential features between genotypes.
As well, the plates have different culturing conditions, where plate 3 cells were cultured in 10% FBS versus plate 3 prime culturing in 5% FBS.
- Wild type (`WT +/+`): Columns 1-3
- Heterzygous (`HET +/-`): Columns 5-7
- Null (`Null -/-`): Columns 9-11
- Seeding density:
- 500 -> Columns 1, 5, and 9
- 1000 -> Columns 2, 6, and 10
- 2000 -> Columns 3, 7, and 11
- 4000 -> Columns 4, 8, and 12

![plate3_platemap](./0.download_data/metadata/platemap_figures/plate3_platemap_figure.png)

**Plate 4**
For plate 4, we will be looking at how using different siRNA constructs to downregulate neurofibromin production in NF1 WT cells impacts the morphology as dose increases.
We will be able to compare this to controls (e.g., untreated WT and Null cells).

The cells were cultured in 5% FBS.

![plate4_platemap_genotype](./0.download_data/metadata/platemap_figures/plate4_platemap_figure_genotype.png)

There are 8 replicates of NF1 Null cells and the rest of the wells contain NF1 WT cells.

![plate4_platemap_dose](./0.download_data/metadata/platemap_figures/plate4_platemap_figure_dose.png)

There are three different siRNA constructs used in this plate, all with the same dose curve from 0.001 nM - 0.1 nM.
Any well with a 0 nM concentration are not treated with a construct.

**Plate 5**
For plate 5, we are specifically comparing morphology between genotypes with the same seeding density (n=4000).
We use all three genotypes (WT, HET, and Null).

The cells were cultured in 5% FBS.

![plate5_platemap](./0.download_data/metadata/platemap_figures/plate5_platemap_figure.png)

## Goal

It is important to study Schwann cells from NF1 patients because NF1 causes patients to develop neurofibromas, which are peripheral nerve tumors forming bumps underneath the skin that appear due to the decrease of Ras-GAP neurofibromin production.
This decrease in production occurs when the NF1 gene is mutated (NF1 +/-).

**The goal of this project is to predict NF1 genotype from Schwann cell morphology.**
We apply cell image analysis to Cell Painting images and use representation learning to extract morphology features.
We will apply machine learning to the morphology features to discover a biomarker of NF1 genotype.
Once we discover a biomarker from these cells, we hope that our method can be used for drug discovery to treat this rare disease.
For more information on plate maps and plate map figures, please go to the [metadata folder](./0.download_data/metadata/) in the first module.
All larger files, including `SQLite` outputs from CellProfiler and `parquet` processed data file from pycytominer, will need to be downloaded using git LFS after the repo is cloned.

## Repository Structure
