Commit d9407b5

Minor additions and bug fixes (#13)
* Water raman scans processing and viz
* Debugging the S3 demo data download
* attempting to migrate from circleCI to github actions
* attempting to migrate from circleCI to github actions
* attempting to migrate from circleCI to github actions
* attempting to migrate from circleCI to github actions
* attempting to migrate from circleCI to github actions
* attempting to migrate from circleCI to github actions
* attempting to migrate from circleCI to github actions
* Playing with github actions. Publish to pypi on release.
* integrating pre-commit and black
* getting the GH action linter working
* GH action for docs
* GH action for docs
* Debugging GH action for docs
* Debugging GH action for docs
* Debugging GH action for docs
* increment minor version for new release
* added some tests for new plotting functions.
* debugging codecov GH action.
* debugging codecov GH action.
* debugging codecov GH action.
* debugging codecov GH action.
* Update README
* JOSS paper prep.
* Added the MIT REMORA instrument and fixed minor bugs.
1 parent 0f41b69 commit d9407b5

16 files changed, +254 / -53 lines changed

.github/workflows/codecov.yml

Lines changed: 31 additions & 0 deletions
```diff
@@ -0,0 +1,31 @@
+name: Codecov
+on: [push]
+jobs:
+  run:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+    env:
+      OS: ${{ matrix.os }}
+      PYTHON: '3.7'
+    steps:
+    - uses: actions/checkout@master
+    - name: Setup Python
+      uses: actions/setup-python@master
+      with:
+        python-version: 3.7
+    - name: Generate coverage report
+      run: |
+        python -m pip install --upgrade pip
+        pip install -e .[tests]
+        pip install pytest-cov
+        pytest --cov=./ --cov-report=xml
+    - name: Upload coverage to Codecov
+      uses: codecov/codecov-action@v1.0.5
+      with:
+        token: ${{ secrets.CODECOV_TOKEN }}
+        file: ./coverage.xml
+        flags: unittests
+        name: codecov-umbrella
+        fail_ci_if_error: true
```

README.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -1,13 +1,13 @@
 # PyEEM
 
-![Test](https://github.com/drewmee/PyEEM/workflows/Test/badge.svg)
-[![Read the Docs](https://readthedocs.org/projects/pyeem/badge/?version=latest)](https://pyeem.readthedocs.io/)
 [![PyPi version](https://img.shields.io/pypi/v/pyeem.svg 'pypi version')](https://pypi.org/project/pyeem/)
 [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pyeem.svg)](https://pypi.org/project/pyeem/)
+[![Test](https://github.com/drewmee/PyEEM/workflows/Test/badge.svg)](https://github.com/drewmee/PyEEM/actions?query=workflow%3ATest)
+[![Read the Docs](https://readthedocs.org/projects/pyeem/badge/?version=latest)](https://pyeem.readthedocs.io/)
+[![codecov](https://codecov.io/gh/drewmee/PyEEM/branch/master/graph/badge.svg?token=RAPG3XDZ6H)](https://codecov.io/gh/drewmee/PyEEM)
+[![Code style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/drewmee/PyEEM/master?filepath=docs%2Fsource%2Ftutorials%2Fnotebooks)
 [![License](https://img.shields.io/github/license/mashape/apistatus.svg)](https://github.com/drewmee/PyEEM/blob/master/LICENSE)
-[![Code style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
-<!--- Badge for codecov -->
 
 Python library for the preprocessing, analysis, and visualization of Excitation Emission Matrices (EEMs).
 
```

docs/source/LICENSE_opcsim

Lines changed: 21 additions & 0 deletions
```diff
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2016-2020 David H Hagan and Jesse H Kroll
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
```
File renamed without changes.

docs/source/tutorials/notebooks/tutorial_2.ipynb

Lines changed: 34 additions & 8 deletions
Large diffs are not rendered by default.

paper/paper.md

Lines changed: 12 additions & 28 deletions
```diff
@@ -1,5 +1,5 @@
 ---
-title: 'PyEEM: A Python library for the preprocessing, correction, deconvolution and analysis of Excitation Emission Matrices (EEMs).'
+title: 'PyEEM: A Python library for the preprocessing, correction, and analysis of Excitation Emission Matrices (EEMs).'
 tags:
   - python
   - fluorescence
@@ -9,10 +9,12 @@ tags:
 authors:
   - name: Drew Meyers
     affiliation: "1, 2"
+  - name: Jay W Rutherford
+    affiliation: 3
   - name: Qinmin Zheng
     affiliation: 2
   - name: Fabio Duarte
-    affiliation: "2, 3"
+    affiliation: "2, 4"
   - name: Carlo Ratti
     affiliation: 2
   - name: Harold H Hemond
@@ -24,42 +26,24 @@ affiliations:
     index: 1
   - name: Senseable City Lab, Massachusetts Institute of Technology
     index: 2
-  - name: Pontifícia Universidade Católica do Paraná, Brazil
+  - name: Department of Chemical Engineering, University of Washington
     index: 3
+  - name: Pontifícia Universidade Católica do Paraná, Brazil
+    index: 4
 date: 2020-07-08
 bibliography: paper.bib
 ---
 
-# Statement of Need
-
-Fluorescence Excitation and Emission Matrix Spectroscopy (EEMs) is a popular analytical technique in environmental monitoring. In particular, it has been applied extensively to investigate the composition and concentration of dissolved organic matter (DOM) in aquatic systems [@Coble1990;@McKnight2001;@Fellman2010]. Historically, EEMs have been combined with multi-way techniques such as PCA, ICA, and PARAFAC in order to decompose chemical mixtures [@Bro1997;@Stedmon2008;@Murphy2013;@CostaPereira2018]. More recently, machine learning approaches such as convolutional neural networks (CNNs) and autoencoders have been applied to EEMs for source sepearation of chemical mixtures [@Cuss2016;@Peleato2018;@Ju2019;@Rutherford2020]. However, before these source separation techniques can be performed, several preprocessing and correction steps must be applied to the raw EEMs. In order to achieve comparability between studies, standard methods to apply these corrections have been developed [@Ohno2002;@Bahram2006;@Lawaetz2009;@R.Murphy2010;@Murphy2011;@Kothawala2013]. These standard methods have been implemented in Matlab and R packages [@Murphy2013;@Massicotte;Pucher2019]. However until PyEEM, no Python package existed which implemented these standard correction steps. Furthermore, the Matlab and R implementations impose metadata schemas on users which limit their ability to track several important metrics corresponding with each measurement set. By providing a Python implementation, researchers will now be able to more effectively leverage Python's large scienfitic computing ecosystem when working with EEMs.
-
-In addition to the implementation of the preprocessing and correction steps, PyEEM also provides researchers with the ability to create augmented mixture and single source training data from a small set of calibration EEM measurements. The augmentation technique relies on the fact that fluorescnce spectra are linearly additive in mixtures, according to Beer's law [source]. This augmentation technique was first described in Rutherford et al., in which it was used to train a CNN to predict the concentration of single sources of pollutants in spectral mixtures [@Rutherford2020]. Additionally, augmented and synthetic data has shown promise in improving the performace of deep learning models in several fields [@Nikolenko2019].
-
-PyEEM provides the first open source implementation of such an augmentation technique for EEMs. PyEEM also provides plots toolbox useful in the interpretation of EEMs... [@Hansen2018]
-
 # Summary
 
-- A summary describing the high-level functionality and purpose of the software for a diverse, non-specialist audience...
-- Description of how the software enables some new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler)...
-- Description of how the software is feature-complete (i.e. no half-baked solutions) and designed for maintainable extension (not one-off modifications of existing tools)...
+Fluorescence Excitation and Emission Matrix Spectroscopy (EEMs) is a popular analytical technique in environmental monitoring. In particular, it has been applied extensively to investigate the composition and concentration of dissolved organic matter (DOM) in aquatic systems [@Coble1990;@McKnight2001;@Fellman2010]. Historically, EEMs have been combined with multi-way techniques such as PCA, ICA, and PARAFAC to decompose chemical mixtures [@Bro1997;@Stedmon2008;@Murphy2013;@CostaPereira2018]. More recently, deep learning approaches such as convolutional neural networks (CNNs) and autoencoders have been applied to EEMs for source separation of chemical mixtures [@Cuss2016;@Peleato2018;@Ju2019;@Rutherford2020]. However, before these source separation techniques can be used, several preprocessing and correction steps must be applied to the raw EEMs. To achieve comparability between studies, standard methods for applying these corrections have been developed [@Ohno2002;@Bahram2006;@Lawaetz2009;@R.Murphy2010;@Murphy2011;@Kothawala2013]. PyEEM provides a Python implementation of these standard preprocessing and correction steps for EEM measurements produced by several common spectrofluorometers.
 
-PyEEM is a python library for the preprocessing, correction, deconvolution and analysis of Excitation Emission Matrices (EEMs)...
+In addition to the standard preprocessing and correction steps, PyEEM provides researchers with the ability to create augmented single source and mixture training data from a small set of calibration EEM measurements. The augmentation technique relies on the fact that fluorescence spectra are linearly additive in mixtures, according to Beer's law. This technique was first described by Rutherford et al., who used it to train a CNN to predict the concentration of single sources of pollutants in spectral mixtures [@Rutherford2020]. Additionally, augmented and synthetic data have shown promise in improving the performance of deep learning models in several fields [@Nikolenko2019].
 
-- Supported instruments, example datasets
-- Metadata schema [@Hansen2018]
-- Preprocessing, corrections, and filtering:
-  - Cropping and wavelength filtering [SOURCE]
-  - Blank subtraction [SOURCE]
-  - Scattering removal [@Bahram2006]
-    - Include Zepp 2004.
-  - Inner-filter effect correction [@Ohno2002;@Kothawala2013]
-  - Raman normalization [@Lawaetz2009;@Murphy2011]
-- Augmentation [@Rutherford2020]
-- plots [@Hansen2018]
+Finally, PyEEM provides an extensive Matplotlib-based visualization toolbox for interpreting EEM datasets. It includes several ways of plotting EEMs and a visualization of the Raman scatter peak area over time, among other plots.
 
-# Acknowledgements
+# Statement of Need
 
-We acknowledge contributions from...
+Prior to PyEEM, no open source Python package existed for working with EEMs, although such libraries have existed for MATLAB and R for some time [@Murphy2013;@Massicotte;@Pucher2019]. By providing a Python implementation, PyEEM lets researchers more effectively leverage Python's large scientific computing ecosystem when working with EEMs. Furthermore, the existing MATLAB and R libraries do not provide deep learning techniques for decomposing chemical mixtures from EEMs; instead, they provide PARAFAC methods for this task. Although PARAFAC has been widely used for some time, it has limitations, and recent work has shown promise for deep learning approaches. For this reason, PyEEM provides a toolbox for generating augmented training data as well as an implementation of the CNN architecture reported in Rutherford et al., which has been shown to successfully decompose spectral mixtures [@Rutherford2020].
 
 # References
```
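The augmentation idea described in the paper (fluorescence spectra add linearly in mixtures) can be illustrated with a minimal NumPy sketch. It assumes each source's prototypical EEM was measured at a known reference concentration and that the mixture stays in the dilute regime where Beer's law holds, so scaling and summing the prototypes approximates a mixture EEM. The helper `mix_eems` and the toy 2x2 arrays are hypothetical and are not part of PyEEM's API.

```python
import numpy as np


def mix_eems(proto_eems, proto_concs, target_concs):
    """Hypothetical helper: synthesize a mixture EEM by linear superposition.

    Assumes each prototypical EEM scales linearly with concentration, so a
    source at concentration c contributes (c / c_proto) * EEM_proto.
    """
    # Stack the prototypes into shape (n_sources, n_emission, n_excitation).
    protos = np.asarray(proto_eems, dtype=float)
    # Per-source scale factor relative to its reference concentration.
    scale = np.asarray(target_concs, dtype=float) / np.asarray(proto_concs, dtype=float)
    # Weighted sum over the source axis: sum_i scale[i] * protos[i].
    return np.tensordot(scale, protos, axes=1)


# Two toy "EEMs" measured at a reference concentration of 1.0.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.0, 2.0], [2.0, 0.0]])
mixture = mix_eems([a, b], proto_concs=[1.0, 1.0], target_concs=[0.5, 2.0])
print(mixture)  # 0.5 * a + 2.0 * b
```

Drawing `target_concs` at random for each synthetic sample would yield a training set of mixtures; how PyEEM samples concentrations is not shown in this commit.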

pyeem/analysis/models/rutherfordnet.py

Lines changed: 11 additions & 1 deletion
```diff
@@ -16,6 +16,8 @@
 )
 from tensorflow.keras.models import Sequential
 
+# from tensorflow.keras.optimizers import Adam
+
 
 class RutherfordNet:
     """The convolutional neural network (CNN) described in Rutherford et al. 2020."""
@@ -86,6 +88,12 @@ def create_model(
         default_compile_kws = dict(
             loss="mean_squared_error", optimizer="adam", metrics=["accuracy"]
         )
+        """
+        opt = Adam(learning_rate=0.0001)
+        default_compile_kws = dict(
+            loss="mean_squared_error", optimizer=opt, metrics=["accuracy"]
+        )
+        """
         compile_kws = dict(default_compile_kws, **compile_kws)
         model.compile(**compile_kws)
         return model
@@ -229,7 +237,9 @@ def get_test_data(self, dataset, routine_results_df):
         """
         test_samples_df = self._isolate_test_samples(dataset, routine_results_df)
 
-        sources = test_samples_df.index.get_level_values("source").unique().values
+        sources = (
+            test_samples_df.index.get_level_values("source").unique().dropna().values
+        )
         sources = np.delete(sources, np.where(sources == "mixture"))
 
         X = []
```
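The `get_test_data` change adds `.dropna()` before the `"mixture"` label is removed. A small pandas sketch shows why this matters: if some rows carry no source label (assumed here for illustration), `get_level_values("source").unique()` includes `NaN`, which would otherwise be treated as a source. The index and labels are toy data, and the sketch removes `"mixture"` with a boolean mask, which is equivalent to the `np.delete(..., np.where(...))` call in the diff.

```python
import numpy as np
import pandas as pd

# Toy (source, sample) index; one row has no source label (NaN).
index = pd.MultiIndex.from_tuples(
    [("source_a", 1), ("mixture", 2), (np.nan, 3), ("source_b", 4), ("source_a", 5)],
    names=["source", "sample"],
)
test_samples_df = pd.DataFrame({"value": range(5)}, index=index)

# Without dropna(): NaN survives unique() and looks like another source.
raw = test_samples_df.index.get_level_values("source").unique().values

# With dropna(): only real labels remain; then drop the "mixture" label.
sources = test_samples_df.index.get_level_values("source").unique().dropna().values
sources = sources[sources != "mixture"]
print(raw)
print(sources)  # ['source_a' 'source_b']
```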

pyeem/augmentation/base.py

Lines changed: 4 additions & 2 deletions
```diff
@@ -40,7 +40,7 @@ def prototypical_spectrum(dataset, source_df):
     )
 
     proto_eems = []
-    for index, row in source_df.iterrows():
+    for index, row in source_df[source_df["prototypical_sample"]].iterrows():
         eem_path = row["hdf_path"]
         eem = pd.read_hdf(dataset.hdf, key=eem_path)
         proto_eems.append(eem)
@@ -51,11 +51,13 @@ def prototypical_spectrum(dataset, source_df):
         "concentration"
     ].mean()
 
+    """
     weights = []
     for i in range(len(proto_eems)):
         weights.append(random.uniform(0, 1))
-
     proto_eem = np.average([eem.values for eem in proto_eems], axis=0, weights=weights)
+    """
+    proto_eem = np.average([eem.values for eem in proto_eems], axis=0)
 
     proto_eem = pd.DataFrame(
         data=proto_eem, index=proto_eems[0].index, columns=proto_eems[0].columns
```
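The `prototypical_spectrum` change restricts the average to rows flagged in the `prototypical_sample` column and replaces the random weighting with a plain element-wise mean. The sketch below reproduces that selection and averaging on in-memory DataFrames; the dict standing in for the HDF5 store and the toy wavelengths are assumptions made so that it runs without PyEEM or a dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the HDF5 store: hdf_path -> EEM DataFrame.
em = [300.0, 310.0]  # emission wavelengths (index)
ex = [250.0, 260.0]  # excitation wavelengths (columns)
eems = {
    "a": pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=em, columns=ex),
    "b": pd.DataFrame([[3.0, 4.0], [5.0, 6.0]], index=em, columns=ex),
    "c": pd.DataFrame([[9.0, 9.0], [9.0, 9.0]], index=em, columns=ex),
}
source_df = pd.DataFrame(
    {"hdf_path": ["a", "b", "c"], "prototypical_sample": [True, True, False]}
)

# Boolean mask keeps only the rows flagged as prototypical samples.
proto_eems = [
    eems[row["hdf_path"]]
    for _, row in source_df[source_df["prototypical_sample"]].iterrows()
]

# Unweighted element-wise mean across the selected EEMs.
proto_eem = np.average([eem.values for eem in proto_eems], axis=0)
proto_eem = pd.DataFrame(
    data=proto_eem, index=proto_eems[0].index, columns=proto_eems[0].columns
)
print(proto_eem)
```

The mask assumes `prototypical_sample` has a `bool` dtype; an object column containing `NaN` cannot be used as a boolean mask and raises an error.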

pyeem/instruments/MIT/__init__.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+from .remora import Remora
+
+name = "MIT"
+instruments = [Remora]
```

pyeem/instruments/MIT/remora.py

Lines changed: 83 additions & 0 deletions
```diff
@@ -0,0 +1,83 @@
+import pandas as pd
+
+
+class Remora:
+    """The MIT REMORA, a compact, field-deployable spectrofluorometer."""
+
+    manufacturer = "MIT"
+    """Name of Manufacturer."""
+
+    name = "REMORA"
+    """Name of Instrument."""
+
+    supported_models = ["REMORA-V1"]
+    """List of supported models."""
+
+    def __init__(self, model, sn=None):
+        """
+        Args:
+            model (str): The model name of the instrument.
+            sn (str or int, optional): The serial number of the instrument.
+                Defaults to None.
+        """
+        self.model = model
+        self.sn = sn
+
+    @staticmethod
+    def load_eem(filepath):
+        """Loads an Excitation Emission Matrix which is generated by the instrument.
+
+        Args:
+            filepath (str): The filepath of the data file.
+
+        Returns:
+            pandas.DataFrame: An Excitation Emission Matrix.
+        """
+        eem_df = pd.read_csv(filepath, index_col=0)
+        eem_df.columns = eem_df.columns.astype(float)
+        eem_df = eem_df.sort_index(axis=0)
+        eem_df = eem_df.sort_index(axis=1)
+        eem_df.index.name = "emission_wavelength"
+        return eem_df
+
+    def load_absorbance(filepath):
+        """Loads an absorbance spectrum which is generated by the instrument.
+
+        Args:
+            filepath (str): The filepath of the data file.
+
+        Returns:
+            pandas.DataFrame: An absorbance spectrum.
+        """
+        absorb_df = pd.read_csv(filepath, index_col=0)
+        absorb_df.index.name = "excitation_wavelength"
+        absorb_df = absorb_df.sort_index(axis=0)
+        absorb_df.index = absorb_df.index.astype("float64")
+        return absorb_df
+
+    def load_water_raman(filepath):
+        """Loads a water Raman spectrum which is generated by the instrument.
+
+        Args:
+            filepath (str): The filepath of the data file.
+
+        Returns:
+            pandas.DataFrame: A water Raman spectrum.
+        """
+        raman_df = pd.read_csv(filepath, index_col=0)
+        raman_df.columns = raman_df.columns.astype(float)
+        raman_df = raman_df.sort_index(axis=0)
+
+        raman_df = raman_df.rename(columns={raman_df.columns[0]: "intensity"})
+        raman_df.index.name = "emission_wavelength"
+        return raman_df
+
+    @staticmethod
+    def load_spectral_corrections():
+        """TODO - Should load instrument specific spectral corrections which will
+        be used in data preprocessing.
+
+        Raises:
+            NotImplementedError: On the TODO list...
+        """
+        raise NotImplementedError()
```
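The REMORA loaders can be exercised without an instrument by feeding them an in-memory CSV. The sketch below mirrors `load_eem` and `load_absorbance` as standalone functions; the CSV layout (emission wavelengths in the first column, excitation wavelengths in the header row) is inferred from the code rather than documented, and the file contents are hypothetical. In the absorbance loader it converts the index to `float64` before sorting, so numeric order is guaranteed, and it reassigns the result of `sort_index`, which returns a new DataFrame rather than sorting in place.

```python
import io

import pandas as pd


def load_eem(filepath):
    """Standalone mirror of Remora.load_eem."""
    eem_df = pd.read_csv(filepath, index_col=0)
    eem_df.columns = eem_df.columns.astype(float)  # header strings -> floats
    eem_df = eem_df.sort_index(axis=0)  # ascending emission wavelength
    eem_df = eem_df.sort_index(axis=1)  # ascending excitation wavelength
    eem_df.index.name = "emission_wavelength"
    return eem_df


def load_absorbance(filepath):
    """Standalone mirror of Remora.load_absorbance."""
    absorb_df = pd.read_csv(filepath, index_col=0)
    absorb_df.index = absorb_df.index.astype("float64")
    absorb_df = absorb_df.sort_index(axis=0)  # sort_index returns a new frame
    absorb_df.index.name = "excitation_wavelength"
    return absorb_df


# Hypothetical file contents, with wavelengths deliberately out of order.
eem_csv = io.StringIO("em,260,250\n310,0.4,0.3\n300,0.2,0.1\n")
absorb_csv = io.StringIO("ex,absorbance\n260,0.05\n250,0.08\n")

eem = load_eem(eem_csv)
absorb = load_absorbance(absorb_csv)
print(eem)
print(absorb)
```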

pyeem/instruments/__init__.py

Lines changed: 9 additions & 2 deletions
```diff
@@ -1,6 +1,13 @@
-from . import agilent, horiba, tecan
+from . import MIT, agilent, horiba, tecan
 from .base import _get_dataset_instruments_df, get_supported_instruments
 
 supported, _supported = get_supported_instruments()
 
-__all__ = ["agilent", "horiba", "tecan", "get_supported_instruments", "supported"]
+__all__ = [
+    "agilent",
+    "horiba",
+    "tecan",
+    "MIT",
+    "get_supported_instruments",
+    "supported",
+]
```

pyeem/instruments/base.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import pandas as pd
 
-from . import agilent, horiba, tecan
+from . import MIT, agilent, horiba, tecan
 
 
 def get_supported_instruments():
@@ -17,6 +17,7 @@ def get_supported_instruments():
         agilent.name: agilent.instruments,
         horiba.name: horiba.instruments,
         tecan.name: tecan.instruments,
+        MIT.name: MIT.instruments,
     }
     # instruments = [Aqualog, Fluorolog, Cary]
     df = pd.DataFrame()
```

pyeem/plots/augmentations.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -157,7 +157,7 @@ def single_source_animation(
     max_val = ss_np.max()
 
     default_plot_kws = dict(vmin=min_val, vmax=max_val)
-    plot_kws = dict(default_fig_kws, **plot_kws)
+    plot_kws = dict(default_plot_kws, **plot_kws)
 
     default_kwargs = dict(zlim_min=min_val, zlim_max=max_val, title=None)
     kwargs = dict(default_kwargs, **kwargs)
```
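The one-line fix in `single_source_animation` matters because `dict(defaults, **overrides)` copies the defaults and lets caller-supplied keys win; merging with `default_fig_kws` meant the computed `vmin`/`vmax` defaults were never applied to the plot. A minimal sketch of the idiom, using a hypothetical `merge_kws` helper and toy values:

```python
def merge_kws(defaults, overrides):
    """Hypothetical helper: copy defaults, then let caller-supplied keys win."""
    return dict(defaults, **overrides)


min_val, max_val = 0.0, 5.0
default_plot_kws = dict(vmin=min_val, vmax=max_val)

# The caller overrides vmax and adds cmap; vmin falls back to the default.
plot_kws = merge_kws(default_plot_kws, {"vmax": 3.0, "cmap": "viridis"})
print(plot_kws)  # {'vmin': 0.0, 'vmax': 3.0, 'cmap': 'viridis'}
```

`{**defaults, **overrides}` is equivalent for string keys and, unlike the `dict(..., **...)` form, also accepts non-string keys. Neither form mutates `default_plot_kws`.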

setup.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -32,7 +32,7 @@
         "numpy<1.19.0,>=1.18.5",
         "pandas>=1.0.5",
         "xlrd >= 1.0.0",
-        "h5py>=2.10.0",
+        "h5py<2.11.0,>=2.10.0",
         "tables>=3.6.1",
         "matplotlib>=3.3.0",
         "celluloid>=0.2.0",
```

tests/test_instruments.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@
 
 
 class TestInstruments:
-    manufacturers = ["Agilent", "Horiba", "Tecan"]
+    manufacturers = ["Agilent", "Horiba", "Tecan", "MIT"]
     """
     manuf_instruments = {
         pyeem.instruments.agilent.name: pyeem.instruments.agilent.instruments,
```
