Release v2.1.0 (#189)

* feat: store user compounds data in the filesystem * test: add unit tests to LocalFileStorage * chore: add missing newlines, remove extra newlines * Add base plate class * Add plate reading from dir * add missing dependencies * echo file processing * add echo files parser + tests * update dependencies v2 * add missing newlines * add missing newlines :)) * Add docs * add suggestions * resolve styling issue * Refactor BMG files reading * Add tests * Add summary visualizations * Modify summary tuple method * Fix docstring * Modify test for plate * Optimize dataframe reading * Define PlateSummary type * 72 set up pre-commit (#75) * Add pre-commit * Add black workflow * Reformat files * 73 detect outliers control values (#77) * Fix colours in control values plot * Add outliers detection * Add test for outliers * Apply review fixes * Fix tests * Separate plate_array from df * Fix typing and docs * Change plot types * Change plates viz to plotly * 74 inhibitionactivation values (#79) * Fix colours in control values plot * Add outliers detection * Add test for outliers * Apply review fixes * Fix tests * Separate plate_array from df * compute inhibition/activation * merge activation/inhibition with echo files * check upon the values * fix combining echo bmg files * drop unnecessary columns * remove outliers from the resultant df * split compounds/controls, add z-score * add visualizations * add tests * improve docs and plot --------- Co-authored-by: Zuza Gawrysiak <gawrysiak.zuzanna@gmail.com> * closes #80 general restructuring (#81) * chore: merge src into dashboard * update: set basic pages structure * update: add stages placeholders to Primary Screening process * chore: for the time being move old code to LEGACY folder, remove outdated layouts * feat: add decorator for error handling * fix: correct import in a unit test * update: make data_folder a class attribute to facilitate global configuration * fix: correct LocalFileStorage test * chore: remove redundant style 
attr * Plot improvements (#85) * Filter low quality plates * Make plot grid size modifiable * Add z per plate plot * Change template to plotly_white * Apply review fixes * Add plotly template as const * Stage 1 and Stage 4 uploading (#86) * Add reading ioFile in bmg * Saving BML in Storage * Add echo reading * Add coments * Resolve tests problems * Pre-commit * Pre-commit changes * Solve problems * Bmg refactor * Bmg refactor * Echo parser refactor * Fix * Fix comment * 82 implement stage 2 (#90) * fix: add correct name to elements property, set stages container style to take full width * fix: subplots' titles not visibile correctly in heatmaps plot * feat: implement paginated heatmap plates viewer * add datatable for statistics preview, extend controls * add docstring * fix: sizing on smaller screens * Add stage 3 in dash (#89) * Add stage 3 in dash * Add z threshold slider * Refactor filtering lq plates * Change return annotation * 83 implement stage 5 in dashboard (#92) * add stage 5 table * add plots ad datatable * add interactive z-score * change act/inh/zscore plots * add RangeSlider * change data table styling * resolve imports/typos * Add stage 3 improvements (#95) * Change view of stage 1 and 4 (#93) * Change view of stage 1 and 4 * Resolve problems * Divide info into two parts * Change view * Change view * closes 96 update stage 2 layout (#97) * adjust stage 2 layout * hide plot controls * disable zooming and panning on heatmap plots * closes 98 add report stage revamp process controls (#99) * add placeholder for stage 6 * add functional controls component * remove old function for creating controls component * add icons to prev/next stage buttons * add border to controls component * remove commented code * closes 101 add correlation process (#104) * chore: rename primary_screening package to screening * update: rename primary-screening process page to screening * update: dummy element to be created per-process page * feat: add correlation process basis * 
feat: fill in first stage html * feat: add files parsing and validation placeholder * feat: implement remaining stages * Export screening results to csv file (#107) * add save report v1 * download echo_bmg_combined.csv * add custom csv name * clean callbacks.py * update delete_file * remove custom file name * remove imports * fix csv export (#112) * Report generation (#108) * Generate simple raport * Add storage for report data * Add plot in raport * Resolve problems * Clean code * Add secondary screening plots (#103) * Add secondary screening plots * Add concentration calculation * Fix docstring * Save report to download (#113) * 115 deanonymize compounds (#117) * Add eos * Update regex * Add eos to correlation stage * Fix test * Fix well naming issues * Remove unnecessary changes * Fix test * Remove rows without eos * closes 116 UI for hit validation process (#119) * Add hit validation process page stages, implement first stage * add parameters setting to first stage * adjust parameter change callback * implement second stage of hit validation process * remove step from concentration bounds inputs * reorder pages in nav bar * Update INH/ACT/Z-SCORE plots (#114) * add area charts * add hline * add ranges to act/inh * add filter criteria * add well/plate distinction * filter compounds to save * alter serialization * include bmg controls * fix test * fix test #2 * Pipeline check and fixes (#120) * change z-score calculation * handle different echo files * alter act/inh * fix test * fix tests & add exceptions button * filter low quality plates * change the origin * Add url to EOS datable (#129) * Add eos url * Fix styling * Save app settings to JSON file (#131) * Save json from primmary screening * clean * pre-commit * Rename methods and dicts * Add saving settings in correlation and hit validation * Style buttons * closes 126 implement third process (#133) * remove .venv from git tracking * implement basic parsing * implement file parsing and graph creation * 
implement csv downloading * remove unneeded files --------- Co-authored-by: njytwf <bartosz.stachowiak@aptiv.com> * closes 138 control hit plot (#139) * fix initial process callbacks not setting stored uuid * implement stacking control * round top/bottom values * closes 140 configuration for hit determination (#142) * ui for controls * parametrize hit determination * connect state variables with the configuration callback * remove whitespace * Add plots to screening report (#143) * Add data projections (#132) * update combining process * add data projection process * add dropdowns * add eos links, save file * remove an outdated test * alter combine_assays_for_projection * add loading sign/dynamic table * add controls to the report * Revert "add controls to the report" This reverts commit 14328d2. * remove apply button * add controls to the plot * revert legacy change * include both ACT & INH to projections * update umap package version * remove legacy, return uuid initially * Implement SMILES predictor (#141) * Train xgboost * Predict on ecbd data * Add smoter * Save preds to pq * Fix plotted eoses (#151) * Add predictions to dashboard (#146) * Add predictions to dashboard * Add SMILES plotting * Change predictions file to pq * Add missing requirements * Comment out umap (#153) * Comment out umap * Format notebook * Update default projection * merge inh/act into feature (#144) * merge inh/act into feature * remove functools.partial wrapper * 136 Save individual EOS report (#155) * Add statistics * Create report * Add concentration for 50% modulation * Uppercase letter * closes 147 app redesign 💅💅 (#154) * navbar and homepage re-design * make logo smaller, set correct logo src * redesign about page * reformat file * add smartart to about page * Correlation report (#157) * Create report correlation * Create report correlation * Fix Correlation stage * Add typing * Fix concentration_50 statistic (#161) * Create report correlation * Fix concentration_50 * closes 159 
minor UI adjustments (#160) * add link * make main pages responsive * hit browser styling fixes * adjust stacking controls styling * replace buttons list with searchable dropdown for hit browser component selection * fix individual report generation * Add structural similarity (#156) * Add structural similarity * Small refactor * Fix file upload * Improve clustering * Move plot to plots * Update colors * Export eos plots to the final report (#168) * add report generation * add dependencies * alter saved plots * fix: handle case when last page is of size 0 (#169) * closes 165 update projections (#170) * add 3d projection plots * facilitate selected datapoints download for Visualization stage * facilitate selected datapoints download for Similarity stage * disable "Download selected" buttons when 3d plotting enabled * 147 app redesign v2 (#162) * unify plots view, add loaders * add page blockers * divide eos/echo loaders * remove unnecessary comments * change select file color * correct heatmap loader * alter upload text * add info icon * alter info icon position * fix info icon * Add ML experiment setup (#176) * Add ml experiment setup * Add feature selection, hp tuning and docs * Split projection stage (#174) * split stages * add projections processes descriptions * closes 166 minor styling (#175) * unify precision in screening process * fix unit display for concentration50 * add kaleido dependency * unify precision in data projection process * add thousands delimiter * restyle controls to include process name * replace screening statistics charts with bar plots * adjust styling for tooltip * allow to extend tooltip with custom styling * add bottom padding to page container * add tooltips to hit validation process * move tooltip annotation to components module * add tooltips to Screening process * reduce process title size * add tooltips to data projection process * extend docstring of the annotating function * add tooltips to correlation process * adjust 
concentration slider desc * update sliders tooltip descriptions for correlation * adjust coloring of mean value bar plots * add missing information in various descriptions * render error message when user uploads less than 3 screening files in data projection process * readd missing stage * restore controls chart to scatter plot * Update version (#179) * Update version * Remove dev * fix responsiveness (#181) * Add reproducibility (#177) * Reproducing Screening and Correlation * Reproducing Hit Validation * Remove checkbox changning * Change gitignore * Add alerts * Resolve * Reports refactor (#178) * Reproducing Screening and Correlation * Reproducing Hit Validation * Remove checkbox changning * Create Header in reports * Merge develop * restore changes from broken branch (#185) * After testing adjustments (#184) * add projections smiles info * add modulation_50/concentration_50 to report * add low quality plate csv * fix test * correct typos/pca_smiles_summary * remove pca_smiles_summary * Add activity filtering (#171) * Add activity filtering * Update layout * Add top and bottom thresholds * Add lines and separate threshold change * Add save button * Move button and rename cols * Update button * Refactor uploading files (#187) * Add text on upload * Individual text * Refactor * Add missing words * Resolve merge problem * Bump version (#188) --------- Co-authored-by: Bartosz Stachowiak <sbartekt@op.pl> Co-authored-by: Bartosz Stachowiak <72276326+Tremirre@users.noreply.github.com> Co-authored-by: azywot <agata.zywot@gmail.com> Co-authored-by: AndrzejKaj <101563276+AndrzejKaj@users.noreply.github.com> Co-authored-by: Agata <82370491+azywot@users.noreply.github.com> Co-authored-by: njytwf <bartosz.stachowiak@aptiv.com>
zuzg · Dec 14, 2023 · 0eb69b8
1 parent 435da6b
commit 0eb69b8
Showing 32 changed files with 1,233 additions and 205 deletions.
diff --git a/.gitignore b/.gitignore
@@ -2,7 +2,7 @@ tmp/
 **/out/
 
 # Data
-data
+./data
 data/raw/*
 !data/raw/.gitkeep
 notebooks/data

diff --git a/dashboard/app.py b/dashboard/app.py
@@ -10,7 +10,7 @@
 FONT_AWESOME_CDN = (
     "https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css"
 )
-VERSION = "v2.0.0"
+VERSION = "v2.1.0"
 
 fs_dir = os.environ.get("DRUG_SCREENING_DATA_DIR", ".drug-screening-data")
 

diff --git a/dashboard/assets/style.css b/dashboard/assets/style.css
@@ -89,6 +89,12 @@
     column-gap: 2rem;
 }
 
+.grid-1-1-projections {
+    display: grid;
+    grid-template-columns: repeat(2, 1fr);
+    justify-content: center;
+    width: 50%;
+}
 
 .upload-box {
     width: 100%;
@@ -277,4 +283,9 @@
     main.grid-1-1-1-1 {
         grid-template-columns: repeat(1, 1fr);
     }
+
+    main.grid-1-1-projections {
+        grid-template-columns: repeat(1, 1fr);
+        width: 100%;
+    }
 }
diff --git a/dashboard/data/bmg_plate.py b/dashboard/data/bmg_plate.py
@@ -1,9 +1,13 @@
 import io
 import numpy as np
 import pandas as pd
+import logging
+
 from collections import namedtuple
 from enum import Enum, auto
 
+logger = logging.getLogger(__name__)
+
 
 PlateSummary = namedtuple(
     "PlateSummary",
@@ -159,30 +163,46 @@ def parse_bmg_file(filename: str, filecontent: io.StringIO) -> np.ndarray:
             df = pd.read_csv(filecontent, header=None)
             plate = df.to_numpy()
             break
-        well, value = line.split()
+        if not line.strip():
+            continue
+        cells = line.split()
+        if len(cells) != 2:
+            raise ValueError(
+                f"Wrong format of file {filename} - line {i} has {len(cells)} cells instead of 2"
+            )
+        well, value = cells
         i, j = well_to_ids(well)
         plate[i, j] = value
     return barcode, plate
 
 
-def parse_bmg_files(files: tuple[str, io.StringIO]) -> tuple[pd.DataFrame, np.ndarray]:
+def parse_bmg_files(
+    files: tuple[str, io.StringIO]
+) -> tuple[pd.DataFrame, np.ndarray, dict[str, str]]:
     """
     Parse file from iostring with BMG files to DataFrame
 
     :param files: tuple containing names and content of files
-    :return: DataFrame with BMG files (=plates) as rows
+    :return: DataFrame with BMG files (=plates) as rows,
+        plate values as np.ndarray, and a dictionary mapping
+        failed filenames to their error messages
     """
     plate_summaries = []
     plate_values = []
+    failed_files = {}
     for filename, filecontent in files:
-        barcode, plate_array = parse_bmg_file(filename, filecontent)
-        plate = Plate(barcode, plate_array)
-        z_wo, outliers_mask = calculate_z_outliers(plate)
-        plate_summaries.append(get_summary_tuple(plate, z_wo))
-        plate_values.append([plate.plate_array, outliers_mask])
+        try:
+            barcode, plate_array = parse_bmg_file(filename, filecontent)
+            plate = Plate(barcode, plate_array)
+            z_wo, outliers_mask = calculate_z_outliers(plate)
+            plate_summaries.append(get_summary_tuple(plate, z_wo))
+            plate_values.append([plate.plate_array, outliers_mask])
+        except Exception as e:
+            logger.warning(f"Error while parsing file {filename}: {e}")
+            failed_files[filename] = str(e)
     df = pd.DataFrame(plate_summaries)
     plate_values = np.asarray(plate_values)
-    return df, plate_values
+    return df, plate_values, failed_files
 
 
 def calculate_activation_inhibition_zscore(
@@ -261,17 +281,18 @@ def get_activation_inhibition_zscore_dict(
 
 def filter_low_quality_plates(
     df: pd.DataFrame, plate_array: np.ndarray, threshold: float = 0.5
-) -> tuple[pd.DataFrame, np.ndarray]:
+) -> tuple[pd.DataFrame, pd.DataFrame, np.ndarray]:
     """
     Remove plates with Z factor lower than threshold
 
     :param df: DataFrame with control values
     :param plate_array: array with plate values
     :param threshold: Z factor threshold value
-    :return: high quality plates
+    :return: high quality plates, low quality plates, high quality plate array
     """
     quality_mask = df.z_factor > threshold
     quality_df = df[quality_mask]
+    low_quality_df = df[~quality_mask][["barcode", "z_factor"]]
     low_quality_ids = np.where(quality_mask == False)
     quality_plates = np.delete(plate_array, low_quality_ids, axis=0)
-    return quality_df, quality_plates
+    return quality_df, low_quality_df, quality_plates
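Review note: the two hunks above combine stricter per-line validation with per-file error collection. The following is a minimal standalone sketch of that pattern, not the project's implementation. The helper names `parse_well_lines` and `parse_many` are hypothetical, and the sketch assumes each data line holds a well label and a numeric value separated by whitespace.

```python
import io


def parse_well_lines(filename: str, filecontent: io.StringIO) -> dict[str, float]:
    """Parse '<well> <value>' lines, skipping blanks and rejecting malformed rows."""
    values = {}
    for line_no, line in enumerate(filecontent, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not treated as errors
        cells = line.split()
        if len(cells) != 2:
            raise ValueError(
                f"Wrong format of file {filename} - line {line_no} "
                f"has {len(cells)} cells instead of 2"
            )
        well, value = cells
        values[well] = float(value)  # a non-numeric value also raises ValueError
    return values


def parse_many(files):
    """Parse every file; collect failures by filename instead of aborting the batch."""
    parsed, failed_files = {}, {}
    for filename, filecontent in files:
        try:
            parsed[filename] = parse_well_lines(filename, filecontent)
        except ValueError as e:
            failed_files[filename] = str(e)
    return parsed, failed_files
```

Catching only `ValueError` here is a deliberate narrowing of the diff's `except Exception`; whether that is appropriate depends on which errors the real parser can raise.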
diff --git a/dashboard/data/determination.py b/dashboard/data/determination.py
@@ -1,6 +1,5 @@
-import pandas as pd
 import numpy as np
-
+import pandas as pd
 from scipy.optimize import curve_fit
 
 
@@ -33,7 +32,29 @@ def find_argument_four_param_logistic(
     :param slope: the steepness of the curve
     :return: argument of the function for given y
     """
-    return ic50 * ((lower_limit - upper_limit) / (y - upper_limit) - 1) ** (1 / slope)
+    x = ic50 * ((lower_limit - upper_limit) / (y - upper_limit) - 1) ** (1 / slope)
+    if isinstance(x, complex):
+        x = np.nan
+    return x
+
+
+def calculate_modulation_ic50_and_concentration_50(row: pd.Series) -> pd.Series:
+    """
+    Calculates modulation_ic50 and concentration_50 (concentration for modulation = 50) for given row.
+
+    :param row: row of the dataframe
+    :return: row with calculated modulation_ic50 and concentration_50
+    """
+    modulation_ic50 = four_param_logistic(
+        row["ic50"], row["BOTTOM"], row["TOP"], row["ic50"], row["slope"]
+    )
+    concentration_50 = find_argument_four_param_logistic(
+        50, row["BOTTOM"], row["TOP"], row["ic50"], row["slope"]
+    )
+
+    return pd.Series(
+        {"modulation_ic50": modulation_ic50, "concentration_50": concentration_50}
+    )
 
 
 def curve_fit_for_activation(screen_df: pd.DataFrame) -> pd.DataFrame:
@@ -140,7 +161,17 @@ def process_activation_df(
         & (activation_df.ic50 < concentration_upper_bound)
         & (activation_df.ic50 > concentration_lower_bound)
     )
-    return activation_df
+
+    cols = activation_df.columns.to_list()
+    modulation_concentration = ["modulation_ic50", "concentration_50"]
+    pos = cols.index("slope")
+    column_order = cols[:pos] + modulation_concentration + cols[pos:]
+
+    activation_df[modulation_concentration] = activation_df.apply(
+        lambda row: calculate_modulation_ic50_and_concentration_50(row), axis=1
+    )
+
+    return activation_df[column_order]
 
 
 def perform_hit_determination(
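Review note: the complex-number guard added to `find_argument_four_param_logistic` is easier to follow next to the forward curve. Below is a minimal sketch using plain Python floats. The body of `four_param_logistic` is not shown in this diff, so the forward form here is an assumption obtained by algebraically inverting the expression in the hunk above; the argument order follows the call in `calculate_modulation_ic50_and_concentration_50`.

```python
import math


def four_param_logistic(x, lower_limit, upper_limit, ic50, slope):
    # Assumed forward form, inferred from the inverse shown in the diff.
    return upper_limit + (lower_limit - upper_limit) / (1 + (x / ic50) ** slope)


def find_argument_four_param_logistic(y, lower_limit, upper_limit, ic50, slope):
    # Raises ZeroDivisionError when y == upper_limit (the asymptote is never reached).
    base = (lower_limit - upper_limit) / (y - upper_limit) - 1
    # For y outside the (lower_limit, upper_limit) range the base is negative,
    # and a negative Python float raised to a fractional power yields a complex.
    x = ic50 * base ** (1 / slope)
    if isinstance(x, complex):
        return math.nan
    return x
```

Under this assumed form, evaluating the curve at `x == ic50` gives the midpoint between the two limits, which is why `modulation_ic50` and `concentration_50` can differ whenever the fitted limits are not exactly 0 and 100. With NumPy scalars (as produced by `DataFrame.apply`), the same out-of-range case yields `nan` with a runtime warning rather than a complex value.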

diff --git a/dashboard/data/json_reader.py b/dashboard/data/json_reader.py
@@ -0,0 +1,21 @@
+import base64
+import io
+import json
+
+
+def load_data_from_json(content: str | None, name: str | None) -> dict | None:
+    if content is None:
+        return None
+    file = None
+
+    extension = name.rsplit(".", 1)[-1] if name else ""
+    if extension == "json":
+        _, content_string = content.split(",")
+        decoded = base64.b64decode(content_string)
+        file = io.StringIO(decoded.decode("utf-8"))
+
+    loaded_data = None
+    if file:
+        loaded_data = json.load(file)
+
+    return loaded_data
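Review note: `load_data_from_json` receives the `contents` string that Dash's `dcc.Upload` produces, which has the form `data:<mime>;base64,<payload>`. The sketch below shows the same decoding step in isolation. The function name `decode_json_upload` and the settings key used in the usage example are hypothetical; the sketch also tolerates filenames with more than one dot, which is an assumption about desired behaviour rather than something the commit states.

```python
import base64
import json


def decode_json_upload(content, name):
    """Decode a base64 data-URL upload into a dict; return None if it is not JSON."""
    if content is None or name is None:
        return None
    if not name.lower().endswith(".json"):
        return None
    # Everything after the first comma is the base64 payload.
    _, _, content_string = content.partition(",")
    decoded = base64.b64decode(content_string)
    return json.loads(decoded.decode("utf-8"))


# Usage with a hypothetical settings payload:
payload = base64.b64encode(json.dumps({"z_threshold": 0.5}).encode("utf-8")).decode("ascii")
settings = decode_json_upload(
    "data:application/json;base64," + payload, "screening.settings.json"
)
```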
diff --git a/dashboard/data/preprocess.py b/dashboard/data/preprocess.py
@@ -162,5 +162,5 @@ def calculate_concentration(
     :param summary_assay_volume: to divide by
     :return: dataframe
     """
-    df["Concentration"] = df["Actual Volume_y"] * concetration / summary_assay_volume
+    df["Concentration"] = df["Actual Volume_1"] * concetration / summary_assay_volume
     return df
diff --git a/dashboard/data/validation.py b/dashboard/data/validation.py
@@ -10,6 +10,10 @@ def validate_correlation_dataframe(corr_df: pd.DataFrame) -> None:
     """
     if not corr_df.columns.is_unique:
         raise ValueError("Column names must be unique.")
+    if not ("% ACTIVATION" in corr_df.columns or "% INHIBITION" in corr_df.columns):
+        raise ValueError("Column with ACTIVATION/INHIBITION not found")
+    if "EOS" not in corr_df.columns:
+        raise ValueError("Column with EOS not found")
     ...  # TODO: add more validation
 
 

diff --git a/dashboard/pages/components.py b/dashboard/pages/components.py
@@ -26,6 +26,9 @@
         dcc.Store(id="report-data-hit-validation-input", storage_type="local"),
         dcc.Store(id="report-data-hit-validation-hit-browser", storage_type="local"),
         dcc.Store(id="activation-inhibition-screening-options", storage_type="local"),
+        dcc.Store(id="loaded-setings-screening", storage_type="local"),
+        dcc.Store(id="loaded-setings-correlation", storage_type="local"),
+        dcc.Store(id="loaded-setings-hit-validation", storage_type="local"),
     ],
 )
 
@@ -213,7 +216,7 @@ def make_file_list_component(
                                 className="col",
                                 children=html.Ul(
                                     children=[
-                                        html.Li(name.split(".")[0])
+                                        html.Li(name)
                                         for name in successfull_filenames[i::num_cols]
                                     ]
                                 ),
@@ -239,7 +242,7 @@ def make_file_list_component(
                                 className="col",
                                 children=html.Ul(
                                     children=[
-                                        html.Li(name.split(".")[0])
+                                        html.Li(name)
                                         for name in failed_filenames[i::num_cols]
                                     ]
                                 ),
@@ -298,3 +301,27 @@ def annotate_with_tooltip(
         element.children = [element.children]
     element.children.insert(0, tooltip)
     return element
+
+
+def make_new_upload_view(
+    text1: str,
+    text2: str,
+) -> list[html.Div]:
+    """
+    Prepare children for drag and drop zone.
+
+    :param text1: text for first element, it should be response on uploaded file.
+    :param text2: text to inform that new file can still be uploaded
+    :return: list of html.Div
+    """
+    return [
+        html.Div(text1),
+        html.Div(
+            [
+                "Drag and Drop or ",
+                html.A("Select", className="select-file"),
+                " ",
+                text2,
+            ],
+        ),
+    ]