Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat 363 analysis step #371

Closed
wants to merge 14 commits into from
Closed

Feat 363 analysis step #371

wants to merge 14 commits into from

Conversation

Sun-flow
Copy link
Contributor

No description provided.

@Sun-flow Sun-flow added the enhancement New feature or request label Jun 13, 2023
@Sun-flow Sun-flow requested a review from saskiad June 13, 2023 16:02
@Sun-flow Sun-flow self-assigned this Jun 13, 2023

## These could be subtypes of analysis/processing

class ProcessType(Enum):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different from the ProcessType in processing.py?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, ProcessName

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, ProcessName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processing.py currently does not have a ProcessType class, in my implementation at least. ProcessName would have values that should go here, most likely. I just had some examples in so that we could talk about the general structure first.




class DataTouchInfo:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need doc strings for this class

src/aind_data_schema/datamanipulation.py Outdated Show resolved Hide resolved

## These could be subtypes of analysis/processing

class ProcessType(Enum):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, ProcessName



class DataTouchInfo:
touch_type: TouchType = Field(..., title="Manipulation type")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason the title isn't the same as the name itself? Or vice versa?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didnt know if using touch as a forward facing field would make sense to users? even though it makes senseo nthe backend. I can change one or the other to match.

notes: Optional[str] = None


class CodeManipulation:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need docstring

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a time when code is used? is that the idea? So not so much code manipulation, but data manipulation? Why is this separate from data touch? Oh because there can be data touches that don't use code... eg manual annotations. ... okay. I'm still not sure if this needs to be its own class or not... either it's a class that is optional or these are optional fields in the data touch class...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes all of these classes are going to be necessary, but I think dividing them up based on topic is useful for organization, and lets us see exactly what we are using. Additionally if more fields need to be added, there's some clear organization for where they go based on what they are related to.




class DataTouchInfo:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need the code manipulation class? (or code information?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataTouch is imported by AnalysisStep and ProcessStep. There are two structure options here:

DataTouch (overarching "Step" class):

  • class for basic info
  • either process/analysis tag
  • class for relevent step (analysis/processing)
  • class for codemanipulation info

but it seems you prefer the separation of Analysis/processing, so alternative looks like:

Process:

  • basic info about process (name/desc)
  • class for DataTouchInfo
  • class for codemanipulation info

Analysis:

  • basic info about analysis (name/desc)
  • class for DataTouchInfo
  • class for codemanipulation info

And some file tracking DataManipulations (we should maybe just change to this name,,,, since DataTouch is so weird...) would be a series of nodes (Process + Analysis Steps) tracking the history.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I realize I got confused - too many different things swimming in my head at once. We want processing and analysis together - but I think then I missed something. Because the DataTouch is still the step, and we need something that has the list of steps, for both processing and analysis.
(and apologies if you jsut removed it in response to this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me make a figma doc detailing this organization, and I can show it to you later in the week

analysis_name: str = Field(..., description="Name of the analysis method used", title="Analysis name")
analysis_type: AnalysisType = Field(..., description="Type of analysis performed on dataset", title="Analysis type")
touch_info: DataTouchInfo = Field(..., description="General information regarding the data manipulation performed")
analysis_code: CodeManipulation = Field(..., description="Info regarding code used to manipulate data")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think I prefer this to be in the data touch class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure I am following this comment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the analysis_code - I think this needs to live in the DataTouch class because it's part of how the data was used.

processing_type: ProcessType = Field(..., description="Type of processing performed on dataset", title="Processing type")
touch_info: DataTouchInfo = Field(..., description="General information regarding the data manipulation performed")
analysis_code: CodeManipulation = Field(..., description="Info regarding code used to manipulate data")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need an Analysis model that creates a list of analysis steps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! youve addressed my prior question, lol. It can easily, i would like to discuss briefly what it is that this model is doing first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry - I think I confused myself. Let me take another look.

@@ -10,6 +10,7 @@

from aind_data_schema.base import AindCoreModel, AindModel
from aind_data_schema.imaging.tile import Tile
from aind_data_schema.datamanipulation import DataTouchInfo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused - are we keeping this schema? How is this different from the processing in the datamanipulation file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relic of an earlier edit, ideally we could drop the processing.py file after this update, though we could also have two separate files for analysis + processing if we wanted

"""Description of a single processing step"""

analysis_name: str = Field(..., description="Name of the analysis method used", title="Analysis name")
analysis_type: AnalysisType = Field(..., description="Type of analysis performed on dataset", title="Analysis type")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if we need an AnalysisType enum - or rather if we can even create it. I think this has to be a description that the scientist provides. Which is sad because they will probably not be good descriptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats why I thought analysis type might be useful? at least to have some overarching classes -- that they could then provide further description for

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we need to have overarching classes of analysis, and I'm not sure we have them...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case then that's fine, we won't need an enum. It will be easy to remove in favor of a string.

@saskiad
Copy link
Collaborator

saskiad commented Oct 25, 2023

@Sun-flow can we delete this PR? We already closed this issue, correct? (I might be confusing things so correct me if I'm wrong)

@saskiad
Copy link
Collaborator

saskiad commented Jan 2, 2024

@Sun-flow can you confirm that this PR should be closed and deleted?

@saskiad saskiad closed this Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants