notebooks

drs
idc
in progress
registry
schema-registry
search
vus
wes
Ewings Sarcoma - FHIR and DRS.ipynb
FASPNotebook01-BasicFASP.ipynb
FASPNotebook02.ipynb
FASPNotebook06 - Direct identification of file.ipynb
FASPNotebook06.ipynb
FASPNotebook07-JMJD1C Example.ipynb
FASPNotebook08-Validation.ipynb
FASPNotebook09-CRDC-BDC.ipynb
FASPNotebook10-PhenoPackets.ipynb
FASPNotebook11-GECCOviaSDL.ipynb
FASPNotebook12-Elixir.ipynb
FASPNotebook14-SRAExample.ipynb
FASPNotebook15-GTEXExample-GCP.ipynb
FASPNotebook17-GTEX_TCGA_Federated_Analysis.ipynb
FASPNotebook18-GTEXExample-AWS.ipynb
FASPNotebookGWAS.ipynb
Four WES servers.ipynb
GECCO_Gen3_Public_SSD.ipynb
GECCO_Gen3_on_SB.ipynb
GECCO_Interactive.ipynb
GTEXExample-PFB.ipynb
Kids First Familial Leukemia - FHIR and DRS.ipynb
Part 1 - FASPNotebook09-CRDC-BDC.ipynb
README.md
docs.json

These are the Jupyter notebooks for the GA4GH Federated Analysis Systems Project (FASP).

These notebooks supersede the FASP-Scripts scripts used at the GA4GH Plenary 2020.

[TOC]

FASPScripts Notebooks

The notebooks follow a basic three-step pattern used throughout FASP. Each step corresponds to a different GA4GH API, as outlined here:

- Data Connect - to identify subjects and samples of interest based on attributes of those subjects and samples
- Data Repository Service (DRS) - to obtain authorized access to files (genomic sequences)
- Workflow Execution Service (WES) - to perform a workflow on those files
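The three steps above can be sketched as plain HTTP request descriptions. This is a minimal illustration rather than the fasp package's own client code: the paths and form fields follow the published GA4GH Data Connect, DRS and WES specifications as I understand them, while the function names, the `example.org` hosts and the default workflow type are hypothetical. The functions only build the requests; they do not send them or handle authorization.

```python
import json
from urllib.parse import quote


def data_connect_search(base_url, sql):
    """Step 1: describe a Data Connect search (POST {base_url}/search)."""
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/search",
        "json": {"query": sql},
    }


def drs_access(base_url, object_id, access_id):
    """Step 2: describe a DRS request for an access URL to one file."""
    path = (
        f"/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"
        f"/access/{quote(access_id, safe='')}"
    )
    return {"method": "GET", "url": base_url.rstrip("/") + path}


def wes_run(base_url, workflow_url, params,
            workflow_type="WDL", workflow_type_version="1.0"):
    """Step 3: describe a WES run submission (sent as multipart/form-data)."""
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/ga4gh/wes/v1/runs",
        "form": {
            "workflow_url": workflow_url,
            "workflow_type": workflow_type,
            "workflow_type_version": workflow_type_version,
            # WES expects the workflow parameters as a JSON-encoded string.
            "workflow_params": json.dumps(params),
        },
    }


# Hypothetical hosts, query and identifiers, for illustration only.
search = data_connect_search("https://search.example.org", "SELECT 1")
access = drs_access("https://drs.example.org", "abc123", "gs")
run = wes_run("https://wes.example.org", "https://example.org/wf.wdl", {"input": "x"})
```

In a real notebook, the results of step 1 would supply the DRS identifiers for step 2, and the access URLs from step 2 would become workflow parameters in step 3.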

In any notebook, more than one implementation of a given API may be used at each step: where different data sources need to be searched, where files are in different cloud locations, or where a workflow needs to be run local to those files.
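As a hedged illustration of running a workflow local to the files, the sketch below picks a WES endpoint based on the `type` of a DRS object's access methods (for example `gs` or `s3`, as defined in the DRS specification). The endpoint URLs, the mapping and the function name are hypothetical; the notebooks may select servers differently.

```python
# Hypothetical mapping from DRS access method type to a WES endpoint
# running in the same cloud as the file.
WES_BY_CLOUD = {
    "gs": "https://wes-gcp.example.org",
    "s3": "https://wes-aws.example.org",
}


def choose_wes(drs_object, wes_by_cloud=WES_BY_CLOUD):
    """Return (access_type, wes_url) for the first access method
    that has a co-located WES endpoint."""
    for method in drs_object.get("access_methods", []):
        wes_url = wes_by_cloud.get(method.get("type"))
        if wes_url is not None:
            return method["type"], wes_url
    raise LookupError("no WES endpoint is co-located with this file")


# A trimmed-down DRS object; real objects carry more fields.
example_object = {"access_methods": [{"type": "https"}, {"type": "s3"}]}
chosen = choose_wes(example_object)
```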

Some notebooks use a non-GA4GH API that provides equivalent functionality. The motivation for each notebook was to search particular datasets together in a federated way. Where those data were not available through a GA4GH API, a proprietary API was used. In some cases the data sources used in notebooks were created for purposes of demo/exploration; in some of these, that meant creating scrambled versions of controlled access datasets. In other cases controlled access subject and specimen data were searched but were accessed from private stores maintained under access control.

In all cases where controlled access sequence data was used, it remains under the access control of the repositories that make it available (EGA, NIH Cloud Platforms).

The table below indicates for each notebook where a GA4GH API could be used (blue) and where a proprietary API (grey) was used.

**Prerequisites to run notebooks**

fasp package - install (e.g. with pip) from the fasp-scripts directory
Settings file
- The examples directory contains a template settings file with a number of parameters for the FASP scripts. Place a copy of this file in your file system and set the environment variable FASP_SETTINGS to point to it. Edit the settings as appropriate.
Python 3
- See the code for the modules required
A folder in your home directory called .keys containing keys for various services. Not all keys are required for all scripts.
- bdc_credentials.json - api_key file obtained from BioDataCatalyst
- crdc_credentials.json - api_key file obtained from Cancer Research Data Commons
- anvil_credentials.json - api_key file obtained from Anvil
- sevenbridges_keys.json - keys for CGC and/or Cavatica
The following modules are used by different scripts. Because not every script is likely to be relevant to every user, these modules are not installed with the fasp package. Please install those needed for the scripts you will run.
- Google Life Sciences API enabled for your GCP account
- BigQuery python libraries - for scripts that use BigQuery
- Seven Bridges API
- pyega3 - EGA client libraries for download. See also EGA documentation for client API.
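The prerequisites above can be checked before opening a notebook. The following is a minimal sketch, not part of the fasp package: it only tests that FASP_SETTINGS points at an existing file, which of the listed key files are present in ~/.keys, and which optional modules can be imported. The function names are hypothetical, and the import names (`google.cloud.bigquery`, `sevenbridges`, `pyega3`) are assumptions about how the listed libraries are installed.

```python
import importlib.util
import os
from pathlib import Path

# Key file names as listed in the prerequisites.
KEY_FILES = [
    "bdc_credentials.json",
    "crdc_credentials.json",
    "anvil_credentials.json",
    "sevenbridges_keys.json",
]

# Assumed import names for the optional modules; adjust to what you install.
OPTIONAL_MODULES = ["google.cloud.bigquery", "sevenbridges", "pyega3"]


def module_available(name):
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is missing.
        return False


def preflight(env=None, home=None):
    """Report which prerequisites are in place; nothing is modified."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    settings = env.get("FASP_SETTINGS")
    return {
        "settings_file_found": bool(settings) and Path(settings).is_file(),
        "keys_found": [k for k in KEY_FILES if (home / ".keys" / k).is_file()],
        "modules_found": [m for m in OPTIONAL_MODULES if module_available(m)],
    }


if __name__ == "__main__":
    for item, value in preflight().items():
        print(f"{item}: {value}")
```

Since not all keys and modules are needed for every notebook, a missing entry in the report is only a problem if the notebook you plan to run uses that service.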
