Chore/gen cleanup by jspaezp · Pull Request #25 · TalusBio/gopher

jspaezp · 2025-03-27T17:02:08Z

Addressing some of the things I noted here: #24
and extras that I noticed while reviewing it.

also ... noticed the cli still takes only encyclopedia inputs ... not sure if adding support for the new input is something we want to do. ... also the readme is pretty slim ... I feel like it would be worth adding to it (I mean ... it mentions how to run it but says nothing about what to expect when running it ... or what the use case is ... or references to any of those ...)

wfondrie

These are welcome changes. Thanks!

wfondrie · 2025-03-27T17:19:04Z

pyproject.toml

+  "scipy",
+  "tqdm",
+  "statsmodels",
+  "biopython", # ... we can implement a fasta parser ...


comment: Indeed we could implement a FASTA parser, it just hasn't seemed worth it with how little use Gopher has gotten in recent times. Maybe worth it now though.

fair enough ... although IDK ... biopython is a really heavy dependency and a fasta parser is like 20 lines :P so I usually go with the re-implementing if I am not already using pyteomics. I guess that since this project already depends on scipy+numpy+pandas+matplotlib+statsmodels ... bio is not a big deal
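For reference, the "20 lines" claim above is roughly right. Below is a minimal sketch of such a parser; `read_fasta` is a hypothetical name and is not part of gopher, and it assumes plain-text FASTA with `>` header lines and no special handling of comments or malformed records.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Usage on an in-memory example with made-up entries.
example = io.StringIO(">sp|P1|A\nMKT\nLLV\n>sp|P2|B\nAAA\n")
records = list(read_fasta(example))
```

Whether this is preferable to biopython depends on how much of the FASTA header gopher needs to interpret; the sketch only splits records and leaves the header untouched.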

wfondrie · 2025-03-27T17:20:59Z

With respect to the README, I don't think too much is needed. However the docs should be expanded and updated as needed: https://talusbio.github.io/gopher/

jspaezp · 2025-03-27T17:34:15Z

With respect to the README, I don't think too much is needed. However the docs should be expanded and updated as needed: https://talusbio.github.io/gopher/

That one is built on release, right?

ltatka · 2025-03-27T18:34:58Z

gopher/parsers/tabular.py

+    schema["Protein.Ids"] = str
+
+    proteins = pd.read_table(proteins_tsv, dtype=schema, usecols=list(schema))
+    proteins["Protein.Ids"] = proteins["Protein.Ids"].str.split(";").str[0]


The current version of gopher requires that the index column be the accessions and that all columns are intensity columns. If you don't drop "Genes", "Protein.Group" etc, downstream functions such as test_enrichment will not work (unless you are also planning on changing those).

They are not being read, so they don't have to be dropped.

But the returned data frame will have them, right? And then test_enrichment will throw an error, unless you've also modified that.

It does not; the test checks (and proves) that this is not the case.
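The behavior discussed in this thread can be sketched in isolation: when `usecols` is given, `pd.read_table` only returns the listed columns, so annotation columns never reach the data frame. The table contents and the `sample_a`/`sample_b` intensity column names below are made up, and how the real `read_diann` builds its `schema` is an assumption here.

```python
import io

import pandas as pd

# Hypothetical DIA-NN-style protein table (column names for the
# intensities are invented for this illustration).
tsv = (
    "Protein.Group\tProtein.Ids\tGenes\tsample_a\tsample_b\n"
    "P1;P2\tP1;P2\tG1\t1\t2.5\n"
    "P3\tP3\tG3\t3\t4\n"
)

# Assumed schema: intensity columns as float, plus the accession column.
schema = {"sample_a": float, "sample_b": float}
schema["Protein.Ids"] = str

# Only the columns named in the schema are parsed and returned.
proteins = pd.read_table(io.StringIO(tsv), dtype=schema, usecols=list(schema))

# Keep the first accession of each protein group and use it as the index.
proteins["Protein.Ids"] = proteins["Protein.Ids"].str.split(";").str[0]
proteins = proteins.set_index("Protein.Ids")
proteins.index.name = "Protein"
```

Under these assumptions the result has the accessions as its index and only intensity columns, which is the shape the comment above says `test_enrichment` requires.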

ltatka

The read_diann function would ideally work with S3 paths too.
The read_diann function will always throw errors, as the intensity columns are "unexpected".
The test_enrichment function requires that all columns are intensity columns, so either this function needs to be updated or the read_diann function should drop extra annotation columns.

jspaezp · 2025-03-27T18:42:33Z

tests/unit_tests/tabular_test.py

    expected.index.name = "Protein"

    result = read_diann(mock_data)
    assert_frame_equal(result, expected)


@ltatka this is the test I am talking about: the frame contents are the same, except that the values are now always floats, whilst before they could be a mix of ints and floats depending on the input data.
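As a side note on why the int-versus-float difference matters for a test like this: `assert_frame_equal` compares dtypes by default, so a frame of ints and a frame of equal-valued floats are not considered equal. A small sketch with made-up data (not the actual test fixture):

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Same values, different dtypes (int64 vs float64).
ints = pd.DataFrame({"sample_a": [1, 3]})
floats = pd.DataFrame({"sample_a": [1.0, 3.0]})

try:
    assert_frame_equal(ints, floats)
    dtype_mismatch_detected = False
except AssertionError:
    # Raised because check_dtype=True is the default.
    dtype_mismatch_detected = True

# Once both sides are floats, the comparison passes.
assert_frame_equal(ints.astype(float), floats)
```

Reading every intensity column as float therefore gives a predictable dtype for the expected frame, regardless of whether the input file happens to contain whole numbers.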

No reply after the changes were made

jspaezp added 2 commits March 27, 2025 09:51

feat: efficient data read and pyproject cleanup

8c17372

chore: deleted redundant config

3c4a0dc

jspaezp requested review from ltatka and wfondrie and removed request for wfondrie March 27, 2025 17:02

wfondrie approved these changes Mar 27, 2025

jspaezp added 3 commits March 27, 2025 10:22

chore: updated pre-commit

b1c455f

chore: updated gh actions

c31ee0b

chore: black

822c432

ltatka reviewed Mar 27, 2025

gopher/parsers/tabular.py (outdated, resolved)

ltatka reviewed Mar 27, 2025

gopher/parsers/tabular.py (resolved)

ltatka reviewed Mar 27, 2025

ltatka previously requested changes Mar 27, 2025

jspaezp commented Mar 27, 2025

feat: added explicit tests for s3 on diann data

3cf875a

jspaezp requested review from ltatka and wfondrie March 27, 2025 19:07

wfondrie approved these changes Apr 22, 2025

View reviewed changes

jspaezp merged commit 1888328 into main Apr 22, 2025
2 checks passed

jspaezp merged 6 commits into main from chore/gen_cleanup
