Conversation
wfondrie
left a comment
These are welcome changes. Thanks!
| "scipy", | ||
| "tqdm", | ||
| "statsmodels", | ||
| "biopython", # ... we can implement a fasta parser ... |
comment: Indeed we could implement a FASTA parser, it just hasn't seemed worth it with how little use Gopher has gotten in recent times. Maybe worth it now though.
Fair enough, although I'm not sure: biopython is a really heavy dependency and a FASTA parser is about 20 lines :P so I usually re-implement it if I am not already using pyteomics. I guess that since this project already depends on scipy+numpy+pandas+matplotlib+statsmodels, biopython is not a big deal.
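For reference, a minimal sketch of what such a hand-rolled FASTA parser could look like. The function name `parse_fasta` and the choice to key records by the first whitespace-delimited token of the header are assumptions made for illustration; this is not code from gopher, biopython, or pyteomics.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID to its sequence.

    Hypothetical helper: the ID is taken to be the header text after '>'
    up to the first whitespace, which is an assumption, not gopher's rule.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

It accepts any iterable of lines, so an open file handle works as well as a list of strings, e.g. `with open(path) as handle: records = parse_fasta(handle)`.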
With respect to the README, I don't think too much is needed. However the docs should be expanded and updated as needed: https://talusbio.github.io/gopher/ |
That one is built on release, right?
| schema["Protein.Ids"] = str | ||
| proteins = pd.read_table(proteins_tsv, dtype=schema, usecols=list(schema)) | ||
| proteins["Protein.Ids"] = proteins["Protein.Ids"].str.split(";").str[0] |
The current version of gopher requires that the index column be the accessions and that all columns are intensity columns. If you don't drop "Genes", "Protein.Group" etc, downstream functions such as test_enrichment will not work (unless you are also planning on changing those).
They are not being read, so they don't have to be dropped.
But the returned data frame will have them, right? And then test_enrichment will throw an error, unless you've also modified that.
It does not; the test checks (and proves) that this is not the case.
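The behaviour under discussion can be checked in isolation. The sketch below assumes a made-up DIA-NN-style table: the `run_1`/`run_2` column names, the way `schema` is built, and the final `set_index` step are illustrative assumptions, not the PR's actual code. Only the two `read_table` and `str.split` lines mirror the diff.

```python
import io

import pandas as pd

# Hypothetical input: annotation columns followed by two intensity columns.
tsv = (
    "Protein.Group\tProtein.Ids\tGenes\trun_1\trun_2\n"
    "P1;P2\tP1;P2\tGENE1\t10\t2.5\n"
    "P3\tP3\tGENE3\t7\t8\n"
)

# Assumed schema: every intensity column is read as float, plus the ID column.
schema = {"run_1": float, "run_2": float}
schema["Protein.Ids"] = str

# usecols restricts parsing to the schema's columns, so "Protein.Group" and
# "Genes" never reach the returned frame.
proteins = pd.read_table(io.StringIO(tsv), dtype=schema, usecols=list(schema))
proteins["Protein.Ids"] = proteins["Protein.Ids"].str.split(";").str[0]

# Assumed final step: use the first accession as the index.
proteins = proteins.set_index("Protein.Ids")
proteins.index.name = "Protein"

print(list(proteins.columns))
```

Because `usecols` filters at parse time, nothing has to be dropped afterwards, and the explicit `float` dtypes mean integer-looking inputs such as `10` come back as floats.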
ltatka
left a comment
- The read_diann function would ideally work with S3 paths too
- read_diann function will always throw errors as the intensity columns are "unexpected"
- The test_enrichment function requires that all columns are intensity columns, so either this function needs to be updated or the read_diann function should drop extra annotation columns.
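If the second option were taken, one possible way to drop annotation columns is to keep only numeric ones. This is a sketch under assumptions: `drop_annotation_columns` is a hypothetical helper that does not exist in gopher, and it presumes that every intensity column is numeric and every annotation column is not.

```python
import pandas as pd


def drop_annotation_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only numeric columns (hypothetical helper, not part of gopher)."""
    return df.select_dtypes(include="number")


# Made-up frame with one annotation column and two intensity columns.
frame = pd.DataFrame(
    {"Genes": ["GENE1", "GENE3"], "run_1": [10.0, 7.0], "run_2": [2.5, 8.0]},
    index=pd.Index(["P1", "P3"], name="Protein"),
)
intensities = drop_annotation_columns(frame)
```

Filtering by dtype avoids hard-coding annotation column names, at the cost of silently dropping an intensity column that was parsed as text.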
tests/unit_tests/tabular_test.py
Outdated
| expected.index.name = "Protein" | ||
| result = read_diann(mock_data) | ||
| assert_frame_equal(result, expected) |
@ltatka this is the test I am talking about. The frame contents are the same, except that the values are now always floats, whereas before they could be a mix of ints and floats depending on the input data.
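The int-versus-float difference matters for this kind of test because `assert_frame_equal` compares dtypes by default. A small illustration with made-up values (the frames below are not the PR's fixtures):

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# "mixed" mimics the old behaviour: one int column, one float column.
mixed = pd.DataFrame({"run_1": [10, 7], "run_2": [2.5, 8.0]}, index=["P1", "P3"])
# "floats" mimics the new behaviour: every column is float.
floats = mixed.astype(float)

try:
    assert_frame_equal(mixed, floats)  # default check_dtype=True
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True

# The values themselves are equal once dtypes are ignored.
assert_frame_equal(mixed, floats, check_dtype=False)
```

So an expected frame built with integer literals would need to be cast to float (or compared with `check_dtype=False`) to match a reader that always returns floats.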
Addressing some of the things I noted here: #24
and extras that I noticed while reviewing it.
Also, I noticed the CLI still takes only EncyclopeDIA inputs; not sure if adding support for the new input is something we want to do. The README is also pretty slim, and I feel like it would be worth adding to it: it mentions how to run the tool but says nothing about what to expect when running it, what the use case is, or references for any of those.