Updating the extract galaxy tools script to add two more columns one for reduced EDAM operation and one for reduced EDAM topic #52

Merged
merged 14 commits into from
Jun 4, 2024

Conversation

Collaborator

@EngyNasr EngyNasr commented Feb 1, 2024

This PR adds the Jupyter notebook and R scripts that update tools.tsv to include a reduced EDAM operation column and produce some figures. More updates and figures will follow.

…perations reduction script along with the resulted updatedtools.tsv file
@paulzierep
Collaborator

To use this in the CI, could you please:

func(df) -> df (with 2 new columns having reduced terms)

…for reduced EDAM operation and one for reduced EDAM topic, such that if a tool has multiple EDAM terms and some of them are on the same branch in the EDAM ontology, we keep only the ones at the leaf of this branch
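The reduction rule described in this commit message (drop any term that is an ancestor of another term on the same tool) can be sketched as follows. This is only an illustration, not the code from the PR: the hierarchy is a hand-written toy dictionary rather than the EDAM OWL file, its parent/child pairs have not been checked against EDAM, and the names `TOY_PARENTS`, `ancestors` and `reduce_terms` are hypothetical.

```python
# Toy child -> parents mapping (an assumption for illustration, NOT loaded from EDAM).
TOY_PARENTS = {
    "Pairwise sequence alignment": {"Sequence alignment"},
    "Sequence alignment": {"Alignment"},
    "Alignment": {"Analysis"},
    "Visualisation": set(),
}


def ancestors(term, parents):
    """Return every ancestor of `term` reachable through the `parents` mapping."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def reduce_terms(terms, parents):
    """Keep only the terms that are not an ancestor of another term in the list.

    `terms` is a comma-separated string; the result uses the same format.
    """
    # Split, trim, and de-duplicate while preserving the original order.
    names = list(dict.fromkeys(t.strip() for t in terms.split(",") if t.strip()))
    # Collect every term that is an ancestor of at least one listed term.
    covered = set()
    for name in names:
        covered |= ancestors(name, parents)
    return ", ".join(n for n in names if n not in covered)


reduced = reduce_terms("Alignment, Sequence alignment, Visualisation", TOY_PARENTS)
```

With the toy mapping, "Alignment" is dropped because "Sequence alignment" sits below it, while "Visualisation" is kept because nothing else in the list descends from it.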
@EngyNasr EngyNasr changed the title Adding the Jupyter notebook and R scripts nd one for reduced EDAM topic, such that if a tool has multiplUpdating the extract galaxy tools script to add two more columns one for reduced EDAM operation ae EDAM terms Feb 21, 2024
@EngyNasr EngyNasr changed the title nd one for reduced EDAM topic, such that if a tool has multiplUpdating the extract galaxy tools script to add two more columns one for reduced EDAM operation ae EDAM terms Updating the extract galaxy tools script to add two more columns one for reduced EDAM operation and one for reduced EDAM topic Feb 21, 2024
@paulzierep
Collaborator

Thanks @EngyNasr
I did run this locally, and it seems to work.
But can you please:

  • fix linting
  • add comments and function docstring
  • look at this warning:
/home/paul/git/galaxyproject/galaxy_tool_extractor/bin/extract_galaxy_tools.py:532: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  return pd.Series([row[0], ', '.join(new_terms)])  # Combine the new terms with commas
/home/paul/git/galaxyproject/galaxy_tool_extractor/bin/extract_galaxy_tools.py:532: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  return pd.Series([row[0], ', '.join(new_terms)])  # Combine the new terms with commas
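The warning comes from `row[0]`: when `row` is a Series indexed by column names, an integer key is treated as a position, and pandas has deprecated that fallback. A minimal sketch of the fix the warning itself suggests, using `.iloc[0]`, is shown below. The function name `keep_id_and_join` and the column names in the sample row are assumptions for illustration; only the `return` expression mirrors the line quoted in the warning.

```python
import pandas as pd


def keep_id_and_join(row, new_terms):
    """Return the first value of `row` plus the new terms joined with commas."""
    # .iloc[0] is an explicit positional lookup, so no FutureWarning is raised
    # even though the row is labelled by column names.
    return pd.Series([row.iloc[0], ", ".join(new_terms)])


# Hypothetical row with string labels, similar to what DataFrame.apply(axis=1) passes.
sample_row = pd.Series({"Tool id": "example_tool", "EDAM operation": "Alignment"})
result = keep_id_and_join(sample_row, ["Sequence alignment", "Visualisation"])
```

If the position of the identifier column is not guaranteed, looking the value up by its label (for example `row["Tool id"]` in this sketch) would be the more robust choice.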

Collaborator

@paulzierep paulzierep left a comment

please simplify as suggested

# Convert the cleaned row to a list of EDAM terms using the provided ontology
edam_ontology = get_ontology('https://edamontology.org/EDAM_1.25.owl').load()

terms = cleaned_row
Collaborator

why ?

        # only keep the class if it is not a parent class
        if include_class:
            new_classes.append(cla)
    except Exception as e:
Collaborator

what errors are we getting ?

@@ -516,9 +562,32 @@ def export_tools(

    if add_usage_stats:
        df = add_usage_stats_for_all_server(df)

    # df_edam = df[df['To keep']==True]
    df_edam1 =df[df['EDAM operation'].notna()]
Collaborator

@paulzierep paulzierep Feb 22, 2024

Sorry, but this seems far too complicated. Can you not just do:

df[["EDAM operation (reduced)", "EDAM topic (reduced)"]] = df[["EDAM operation", "EDAM topic"]].map(reduced_edam_term)

where reduced_edam_term is a function that takes the EDAM terms as input and returns the reduced form.
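A minimal sketch of that suggestion is given below, under a few stated assumptions. `DataFrame.map` only exists in pandas 2.1 and later (earlier versions call it `applymap`), so the sketch applies `Series.map` column by column, which behaves the same way here and also runs on older versions. The `reduced_edam_term` body is a hypothetical stand-in that drops terms from a hard-coded `PARENT_TERMS` set; the real function would consult the EDAM ontology. The wrapper name `add_reduced_edam_columns` is also an assumption.

```python
import pandas as pd

# Hypothetical stand-in for "terms that are parents of other terms"; the real
# reducer would derive this from the EDAM ontology instead of a fixed set.
PARENT_TERMS = {"Alignment", "Biology"}


def reduced_edam_term(terms):
    """Take a comma-separated string of EDAM terms and return the reduced form."""
    names = [t.strip() for t in terms.split(",") if t.strip()]
    return ", ".join(n for n in names if n not in PARENT_TERMS)


def add_reduced_edam_columns(df):
    """Return a copy of `df` with two extra columns holding the reduced terms."""
    out = df.copy()
    for col in ("EDAM operation", "EDAM topic"):
        # na_action="ignore" leaves missing values untouched instead of
        # passing them to the reducer.
        out[f"{col} (reduced)"] = out[col].map(reduced_edam_term, na_action="ignore")
    return out


tools = pd.DataFrame(
    {
        "EDAM operation": ["Alignment, Sequence alignment", None],
        "EDAM topic": ["Genomics", "Biology, Genomics"],
    }
)
tools_reduced = add_reduced_edam_columns(tools)
```

Keeping the reducer as a plain string-to-string function means the same function can be unit-tested on its own and reused for both columns, which is the `func(df) -> df` shape requested earlier in the thread.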

@paulzierep
Collaborator

The test worked on my local branch: paulzierep@6a01007
Can we merge it, @bebatut?

@bebatut bebatut merged commit bdd3b46 into galaxyproject:main Jun 4, 2024
2 checks passed
neoformit pushed a commit to nomadscientist/galaxy_codex that referenced this pull request Jul 28, 2024
Updating the extract galaxy tools script to add two more columns one for reduced EDAM operation and one for reduced EDAM topic