Database details
Edwin Huang edited this page Mar 31, 2025 · 2 revisions
- Every "record" in MongoDB represents one project. When using PyMongo to filter and query the database, each record can be represented as a Python dictionary.

- Above is a screenshot of the "runs" field in a small project.
- In the "runs" field, there may be n samples.
- Each sample, sample_n, can have m features.
- Each feature, feature_m, has the following fields:
- ["AA_PDF_file", "AA_PNG_file", "AA_amplicon_number", "AA_directory", "AA_summary_file", "All_genes", "CNV_BED_file", "Captured_interval_length", "Classification", "Complexity_score", "Feature_BED_file", "Feature_ID", "Feature_maximum_copy_number", "Feature_median_copy_number", "Filter_flag", "Location", "Oncogenes", "Reference_version", "Run_metadata_JSON", "Sample_metadata_JSON", "Sample_name", "Sample_type", "Tissue_of_origin", "cnvkit_directory", "extra_metadata_from_csv"]
- To query and filter on some conditions, you can either write a PyMongo query directly, or fetch the project by its project ID first and then filter with Pandas.
- For example:
- Call get_one_project to get a project by its project ID. The result should be a Python dictionary.
- From the project dictionary, read the 'runs' field, which gives you the features.
- Call replace_space_to_underscore to obtain a features list, then wrap it using pd.DataFrame.
- Code example:
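The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `get_one_project_stub` and `features_from_runs` are hypothetical stand-ins for `get_one_project` and `replace_space_to_underscore` (whose real signatures are not shown on this page), the sample values are placeholders, and the sketch assumes each sample in "runs" maps to a list of feature dictionaries.

```python
import pandas as pd


def get_one_project_stub(project_id):
    """Hypothetical stand-in for get_one_project: returns a project dictionary.

    The real function would look the project up in MongoDB by its ID.
    All values below are placeholders, not real data.
    """
    return {
        "runs": {
            "sample_1": [
                {"Sample name": "sample_1", "Feature ID": "feature_1",
                 "Classification": "ecDNA", "Feature maximum copy number": 12.5},
                {"Sample name": "sample_1", "Feature ID": "feature_2",
                 "Classification": "Linear", "Feature maximum copy number": 5.0},
            ],
            "sample_2": [
                {"Sample name": "sample_2", "Feature ID": "feature_1",
                 "Classification": "ecDNA", "Feature maximum copy number": 8.0},
            ],
        }
    }


def features_from_runs(runs):
    """Hypothetical stand-in for replace_space_to_underscore.

    Flattens every sample's features into one list and replaces spaces
    in the field names with underscores.
    """
    features = []
    for sample_features in runs.values():
        for feature in sample_features:
            features.append(
                {key.replace(" ", "_"): value for key, value in feature.items()}
            )
    return features


# 1. Get the project dictionary by project ID (placeholder ID).
project = get_one_project_stub("example-project-id")

# 2. Read the "runs" field and flatten it into a list of feature dictionaries.
features = features_from_runs(project["runs"])

# 3. Wrap the list in a DataFrame and filter with ordinary Pandas expressions.
df = pd.DataFrame(features)
ecdna = df[df["Classification"] == "ecDNA"]
high_copy_ecdna = ecdna[ecdna["Feature_maximum_copy_number"] > 10]

print(high_copy_ecdna[["Sample_name", "Feature_ID"]])
```

Once the features are in a DataFrame, any of the fields listed earlier (for example `Oncogenes` or `Tissue_of_origin`) can be used as a filter column in the same way.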
- DB as of 11/20/24
- "runs" is a key in the project, and its value is a dictionary.
- Its keys are the samples: [sample_1, sample_2, ... sample_n]
- Each sample holds the feature fields listed for "runs" above.
- "sample_data" is a dictionary in each project.
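To make the nesting concrete, here is a small sketch of what a project dictionary might look like in Python. It assumes each sample key in "runs" maps to a list of feature dictionaries; only three of the feature fields are shown, every value is a placeholder, and `count_features` is a hypothetical helper rather than part of the codebase.

```python
# Hypothetical project record: all values are placeholders, not real data.
project = {
    "runs": {
        "sample_1": [
            {"Sample_name": "sample_1", "Feature_ID": "feature_1", "Classification": "ecDNA"},
            {"Sample_name": "sample_1", "Feature_ID": "feature_2", "Classification": "Linear"},
        ],
        "sample_2": [
            {"Sample_name": "sample_2", "Feature_ID": "feature_1", "Classification": "ecDNA"},
        ],
    },
    # A dictionary in each project; its contents are not described on this page.
    "sample_data": {},
}


def count_features(project):
    """Return {sample key: number of features} for a project dictionary."""
    return {sample: len(features) for sample, features in project["runs"].items()}


feature_counts = count_features(project)
for sample, n_features in feature_counts.items():
    print(sample, n_features)
```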