Skip to content

Chore/gen cleanup#25

Merged
jspaezp merged 6 commits intomainfrom
chore/gen_cleanup
Apr 22, 2025
Merged

Chore/gen cleanup#25
jspaezp merged 6 commits intomainfrom
chore/gen_cleanup

Conversation

@jspaezp
Copy link
Copy Markdown
Contributor

@jspaezp jspaezp commented Mar 27, 2025

Addressing some of the tings I noted here: #24
and extras that I noticed while reviewing it.

also ... noticed the cli still takes only encyclopedia inputs ... not sure if adding support for the new input is something we want to do. ... also the readme is pretty slim ... I feel like it would be worth adding to it (I mean ... it mentions how to run it but says nothing about what to expect when running it ... or what the use case is ... or references to any of those ...)

@jspaezp jspaezp requested review from ltatka and wfondrie and removed request for wfondrie March 27, 2025 17:02
Copy link
Copy Markdown
Member

@wfondrie wfondrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are welcome changes. Thanks!

"scipy",
"tqdm",
"statsmodels",
"biopython", # ... we can implement a fasta parser ...
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: Indeed we could implement a FASTA parser, it just hasn't seemed worth it with how little use Gopher has gotten in recent times. Maybe worth it now though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough ... although IDK ... biopython is a really heavy dependency and a fasta parser is like 20 lines :P so I usually go with the re-implementing if I am not already using pyteomics. I guess that since this project already depends on scipy+numpy+pandas+matplotlib+statsmodels ... bio is not a big deal

@wfondrie
Copy link
Copy Markdown
Member

With respect to the README, I don't think too much is needed. However the docs should be expanded and updated as needed: https://talusbio.github.io/gopher/

@jspaezp
Copy link
Copy Markdown
Contributor Author

jspaezp commented Mar 27, 2025

With respect to the README, I don't think too much is needed. However the docs should be expanded and updated as needed: https://talusbio.github.io/gopher/

That one is build on-release right?

schema["Protein.Ids"] = str

proteins = pd.read_table(proteins_tsv, dtype=schema, usecols=list(schema))
proteins["Protein.Ids"] = proteins["Protein.Ids"].str.split(";").str[0]
Copy link
Copy Markdown
Member

@ltatka ltatka Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version of gopher requires that the index column be the accessions and that all columns are intensity columns. If you don't drop "Genes", "Protein.Group" etc, downstream functions such as test_enrichment will not work (unless you are also planning on changing those).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are not being read, so they dont have to be dropped.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the returned data frame will have them, right? And then test_enrichment will throw an error, unless you've also modified that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does not, the test checks (and proves) that is not the case.

ltatka
ltatka previously requested changes Mar 27, 2025
Copy link
Copy Markdown
Member

@ltatka ltatka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The read_diann function would ideally work with S3 paths too
  • read_diann function will always throw errors as the intensity columns are "unexpected"
  • The test_enrichment function requires that all columns are intensity columns, so either this function need to be updated or the read_diann function should drop extra annotation columns.

expected.index.name = "Protein"

result = read_diann(mock_data)
assert_frame_equal(result, expected)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ltatka this is the test I am talking about, the frame contents are the same except for the fact that its always floats, whilst before they could be a mix of ints and floats depending on the input data.

@jspaezp jspaezp requested review from ltatka and wfondrie March 27, 2025 19:07
@jspaezp jspaezp dismissed ltatka’s stale review April 22, 2025 19:52

No reply after the changes were made

@jspaezp jspaezp merged commit 1888328 into main Apr 22, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants