Notebooks and Data Assets
Notebooks are separate from login users (this differs from the current implementation). Each Notebook is a single workspace with a single file space on EFS. Notebooks can be tied to either Users or Teams.
The Metadatabase keeps track of Notebooks, including the datasets each Notebook uses. It also keeps track of Teams, including which users belong to each team.
A team's dataset access is the lowest common denominator of its members' access. For example, a team with a Google user can only use MAG, even if the rest of the team has access to WoS.
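The "lowest common denominator" rule is a set intersection over the members' access. The sketch below assumes each member's access is a plain set of dataset names. The function name `team_dataset_access` is hypothetical.

```python
def team_dataset_access(member_access: list[set[str]]) -> set[str]:
    """Datasets every team member can access (hypothetical helper).

    member_access holds one set of dataset names per team member.
    An empty team is treated as having access to nothing.
    """
    if not member_access:
        return set()
    # set.intersection returns a new set, so no member's set is aliased.
    return set.intersection(*member_access)


# Example from the text: an IU user with WoS and MAG, plus a Google user with MAG only.
iu_user = {"MAG", "WoS"}
google_user = {"MAG"}
print(team_dataset_access([iu_user, google_user]))  # {'MAG'}
```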
A Notebook that uses a restricted dataset (e.g., WoS) cannot be shared with users who lack access to that dataset. For example, if a Notebook already contains WoS-based datasets, it cannot be shared with Google users.
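The sharing rule reduces to a subset check: every dataset the Notebook uses must appear in the prospective user's access. This is a minimal sketch that assumes both sides are sets of dataset names. `can_share_notebook` is a hypothetical name.

```python
def can_share_notebook(notebook_datasets: set[str], user_access: set[str]) -> bool:
    """True if the user can access every dataset the notebook uses (hypothetical helper)."""
    # `<=` on sets is the subset test.
    return notebook_datasets <= user_access


notebook = {"WoS", "MAG"}
print(can_share_notebook(notebook, {"MAG"}))         # False: a Google user lacks WoS
print(can_share_notebook(notebook, {"MAG", "WoS"}))  # True
```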
Query results are not automatically sent to a Notebook's file space. Users must send results and datasets to Notebooks manually, through an interface. A user can only send a dataset to a Notebook if the dataset is derived from datasets the Notebook has access to. For example, if a Notebook belongs to a Team that includes a Google member, a user from IU cannot send a WoS-based dataset to that Notebook.
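A manual "send to notebook" action could check the asset's origin dataset against the Notebook's access before copying anything. This sketch assumes a Data Asset records one origin dataset and that the Notebook's file space can be modelled as a list of filenames. `DataAsset`, `send_to_notebook`, and the filenames are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    filename: str
    origin_dataset: str  # e.g. "WoS" or "MAG"


def send_to_notebook(asset: DataAsset, notebook_access: set[str], notebook_files: list[str]) -> None:
    """Copy an asset into a notebook's file space only if the notebook may use its origin dataset."""
    if asset.origin_dataset not in notebook_access:
        raise PermissionError(f"notebook has no access to {asset.origin_dataset}")
    notebook_files.append(asset.filename)


# A team notebook with a Google member can only use MAG (intersection of member access).
team_access = {"MAG", "WoS"} & {"MAG"}
files: list[str] = []
send_to_notebook(DataAsset("mag_results.csv", "MAG"), team_access, files)  # accepted
try:
    send_to_notebook(DataAsset("wos_results.csv", "WoS"), team_access, files)
except PermissionError as err:
    print(err)  # notebook has no access to WoS
```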
If an institution loses access to a dataset, its users lose access to that dataset and to any notebooks that use it, but they are not removed from their teams.
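One way to get this behavior is to derive a user's access from their institution on every check instead of storing it on the user or the team. Revoking a dataset then blocks access immediately while team membership stays untouched. The sketch below assumes that design. The in-memory tables and function names are hypothetical stand-ins for Metadatabase lookups.

```python
# Hypothetical in-memory stand-ins for Metadatabase tables.
institution_permissions: dict[str, set[str]] = {"IU": {"WoS", "MAG"}}
user_institution: dict[str, str] = {"alice": "IU"}
team_members: dict[str, set[str]] = {"team-1": {"alice"}}


def user_access(user: str) -> set[str]:
    """Access is looked up from the user's institution on every call, never cached."""
    return institution_permissions.get(user_institution.get(user, ""), set())


def can_open(user: str, notebook_datasets: set[str]) -> bool:
    return notebook_datasets <= user_access(user)


notebook = {"WoS"}
print(can_open("alice", notebook))             # True
institution_permissions["IU"].discard("WoS")   # IU loses its WoS access
print(can_open("alice", notebook))             # False
print("alice" in team_members["team-1"])       # True: membership is untouched
```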
Individual users can have a single individual notebook. Individual users can create teams. A team has a single notebook.
Users cannot share individual notebooks with other individuals. Team notebooks are shared with all users in a team. To invite users to a team, the team creator will enter an email. If the email is in the system, that user will be added to the team. If the user is not in the system, the creator will be notified.
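The invitation flow can be sketched as a lookup by email address. An existing user is added to the team. Otherwise the creator is notified. This assumes a mapping from email to user id and a notification callback. `invite_to_team`, the callback, and the message text are hypothetical.

```python
from typing import Callable


def invite_to_team(
    team_members: set[str],
    email: str,
    users_by_email: dict[str, str],
    notify_creator: Callable[[str], None],
) -> bool:
    """Add the user with this email to the team, or notify the creator if none exists."""
    user_id = users_by_email.get(email)
    if user_id is None:
        notify_creator(f"No user found for {email}")
        return False
    team_members.add(user_id)
    return True


members = {"creator"}
messages: list[str] = []
users = {"bob@example.edu": "bob"}
invite_to_team(members, "bob@example.edu", users, messages.append)    # bob is added
invite_to_team(members, "carol@example.org", users, messages.append)  # creator is notified
```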
There will be Data Assets (data subsets), Tools (visualizations, machine learning algorithms, etc.), Notebooks, Teams, Projects, and Packages.
Notebooks contain Data Assets, notebook code, and environment information. A Team has a single shared Notebook. At any point, a Notebook can be "published": a snapshot of its environment is created and assigned a DOI. We are calling these snapshots "publications" for now.
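Publishing can be pictured as freezing the Notebook's contents into an immutable record, identifying that record with a content hash, and attaching a DOI. The DOI step depends on whichever registration service is chosen, so it appears here only as a caller-supplied function. `publish`, `Publication`, and the placeholder `10.0000/...` DOI are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Publication:
    notebook_id: str
    content_hash: str  # identifies the exact snapshot contents
    doi: str


def publish(
    notebook_id: str,
    datasets: set[str],
    code: str,
    environment: dict[str, str],
    mint_doi: Callable[[str], str],  # hypothetical hook to a DOI registration service
) -> Publication:
    snapshot = {"datasets": sorted(datasets), "code": code, "environment": environment}
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    content_hash = hashlib.sha256(payload).hexdigest()
    return Publication(notebook_id, content_hash, mint_doi(content_hash))


pub = publish(
    "nb-1", {"MAG"}, "print('hi')", {"python": "3.11"},
    mint_doi=lambda h: f"10.0000/example.{h[:8]}",  # placeholder, not a real DOI
)
```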
Packages are self-contained data assets and/or tools that can be run independently to produce an output. Packages can be chained together to form a pipeline.
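If each Package is treated as a function from input to output, chaining Packages into a pipeline is function composition: each Package's output becomes the next one's input. This is a minimal sketch under that assumption. The `pipeline` helper and the example Packages are hypothetical.

```python
from typing import Any, Callable

Package = Callable[[Any], Any]


def pipeline(*packages: Package) -> Package:
    """Chain packages so each one's output is fed to the next (left to right)."""
    def run(data: Any) -> Any:
        for package in packages:
            data = package(data)
        return data
    return run


def recent_records(records: list[dict]) -> list[dict]:
    return [r for r in records if r["year"] >= 2020]


def count_records(records: list[dict]) -> int:
    return len(records)


records = [{"year": 2019}, {"year": 2021}, {"year": 2022}]
print(pipeline(recent_records, count_records)(records))  # 2
```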
Projects are logical groupings of Teams, Data Assets, and Tools; they do not take permissions into account.
Metadata:
Notebooks
- notebook jupyter user id (either the user or team id when notebook is created)
- notebook token
- datasets included in notebook (updated when new datasets are added)
- team id (if a team notebook)
- user id (if an individual notebook)
Teams
- list of users
Projects
- list of teams
- list of data assets
- list of tools
Data Assets (created upon query)
- job id
- job status
- filename of result file
- user who created the asset
- origin dataset
Users
- username
- login token
- institution
Data Sets
- dataset name
- institution permissions
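The metadata above maps directly onto a set of record types. This sketch uses Python dataclasses. Field names follow the list where possible. `Team.team_id` is an assumed identifier, since Notebooks refer to a team id. The rule that a Notebook has exactly one owner, a team or a user, is an assumption read from the "if a team notebook" / "if an individual notebook" entries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    username: str
    login_token: str
    institution: str


@dataclass
class DataSet:
    name: str
    institution_permissions: set[str] = field(default_factory=set)  # institutions allowed


@dataclass
class DataAsset:  # created upon query
    job_id: str
    job_status: str
    result_filename: str
    created_by: str  # user who created the asset
    origin_dataset: str


@dataclass
class Team:
    team_id: str  # assumed identifier, referenced by Notebook.team_id
    users: list[str] = field(default_factory=list)


@dataclass
class Notebook:
    jupyter_user_id: str  # the user or team id when the notebook is created
    token: str
    datasets: set[str] = field(default_factory=set)  # updated as datasets are added
    team_id: Optional[str] = None  # set for a team notebook
    user_id: Optional[str] = None  # set for an individual notebook

    def __post_init__(self) -> None:
        # Assumed rule: exactly one owner, either a team or an individual user.
        if (self.team_id is None) == (self.user_id is None):
            raise ValueError("set exactly one of team_id or user_id")


@dataclass
class Project:  # logical grouping only; no permission checks
    teams: list[str] = field(default_factory=list)
    data_assets: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
```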