git clone <repository>
cd ~/pipeline
make wheel
pip install ./dist/*.whl
# make sure your Python environment's bin/ directory is on your PATH
rg_pipeline -h
Dependency: install the pipeline component first.
cd ~/visualization
make wheel
pip install ./dist/*.whl
# make sure your Python environment's bin/ directory is on your PATH
rg_visual -h
cd ~/notebooks
pip install -r requirements.txt
The data source manifest registers multiple data sources across multiple data types: structured (databases, CSV, ...), semi-structured (JSON, XML, ...), and unstructured (plain text, logs, or even an HTTP request, ...).
Note: currently, only pickle files are supported (see the example manifest below).
# ~/pipeline/data_source_manifest.json
[{
    "type": "category/type",
    "access": "the access of the data source",
    "access2": "",
    ...
}]
- "type": a pre-defined type of one datasource
- file/pickle
- file/csv
- file/json
- database/mysql
- database/mssql
- database/mongodb
- text/plain
- text/json
- http/json
- "access": a specific access of the datasource
- path
- connection string
- credentials
- "access2": some other extra add-on info
~/pipeline/rg_pipeline/ingest/experiment.py
- Subject
- PK subject_id
- subject_name
- Session
- PK session_id
- sample_number
- session_date
- FK subject_id
- Stimulation
- PK stimulation_id
- fps
- movie: the stimulus movie array; movie duration = n_frames / fps
- movie_shape: derived from movie.shape, for restoring the movie array
- n_frames
- pixel_size
- stim_height
- stim_width
- stimulus_onset
- x_block_size
- y_block_size
- FK session_id
- SpikeGroup
- PK spike_group_id
- FK stimulation_id
- Spike
- PK spike_id
- spike_time
- spike_movie_time: derived from (spike_time - stimulus_onset), i.e. when the spike was detected relative to the movie's playback time (to be confirmed)
- sta: Spike-Triggered Average of a number of movie frames before the current spike
- FK spike_group_id
Note: the data-loading step could still be optimized through the model definitions, using dj.Imported/dj.Computed: https://tutorials.datajoint.io/beginner/building-first-pipeline/python/importing-data.html
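For reference, here is a minimal DataJoint sketch of how the top of this hierarchy might be declared, plus the dj.Computed pattern the note above refers to. The field types, the SessionSummary table, and the make body are illustrative assumptions; the real definitions live in experiment.py.

```python
import datajoint as dj

schema = dj.schema('USER_retinal')  # the real schema name is set dynamically per user

@schema
class Subject(dj.Manual):
    definition = """
    subject_id   : int          # PK
    ---
    subject_name : varchar(64)
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session_id    : int         # PK
    ---
    sample_number : int
    session_date  : date
    """

# With dj.Computed, derived rows are produced by populate() calling make(),
# instead of being inserted manually by the loader:
@schema
class SessionSummary(dj.Computed):  # hypothetical table, for illustration only
    definition = """
    -> Session
    ---
    n_spikes : int
    """

    def make(self, key):
        # hypothetical: compute the per-session value, then insert it
        self.insert1(dict(key, n_spikes=0))
```

With this pattern, SessionSummary.populate() would fill the table for every Session row automatically, which is what the tutorial link above describes.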
Error: pydot seems to be incompatible with macOS 11.1; the traceback below shows it failing to locate the Graphviz dot binary on PATH. Already issued a bug at #924.
Traceback (most recent call last):
File "./pipeline/run.py", line 137, in <module>
main(args)
File "./pipeline/run.py", line 118, in main
plot_erd(tpath)
File "./pipeline/run.py", line 34, in plot_erd
dj.ERD(schema).save(os.path.join(tpath, 'ERD.svg'), format='svg')
File "/Users/yam/opt/anaconda3/envs/datajoint/lib/python3.6/site-packages/datajoint/diagram.py", line 347, in save
f.write(self.make_svg().data)
File "/Users/yam/opt/anaconda3/envs/datajoint/lib/python3.6/site-packages/datajoint/diagram.py", line 314, in make_svg
return SVG(self.make_dot().create_svg())
File "/Users/yam/opt/anaconda3/envs/datajoint/lib/python3.6/site-packages/pydot.py", line 1734, in new_method
format=f, prog=prog, encoding=encoding)
File "/Users/yam/opt/anaconda3/envs/datajoint/lib/python3.6/site-packages/pydot.py", line 1933, in create
raise OSError(*args)
FileNotFoundError: [Errno 2] "dot" not found in path.
~/pipeline/rg_pipeline/legacy/loader.py
Initially, I wanted to build a general, universal loader that could handle multiple data types/sources without the implementation depending on what the data looks like (its content format). However, that is only possible when the incoming data follows a common industrial or academic format, so in practice there is no big advantage over a plain functional approach.
~/pipeline/rg_pipeline/load_utils.py
load_<category>_<type>(datasource:dict)
load_file_pickle(datasource:dict)
load_file_csv(datasource:dict)
load_file_json(datasource:dict)
load_database_mysql(datasource:dict)
load_database_mssql(datasource:dict)
load_text_plain(datasource:dict)
load_text_json(datasource:dict)
load_http_json(datasource:dict)
Note: currently, only pickle files are supported.
Note: the datasource:dict argument is parsed from data_source_manifest.json; for more detail, see the Data Source Manifest section above. A dispatch sketch follows below.
Note: maybe add a feature to load streaming data.
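A minimal sketch of how a manifest entry might be routed to these loaders by its "type" field. The load_datasource/load_manifest helpers and the pickle loader body are assumptions, not the actual implementation:

```python
import json
import pickle

def load_file_pickle(datasource: dict):
    # for file/* sources, "access" holds the file path
    with open(datasource["access"], "rb") as f:
        return pickle.load(f)

def load_datasource(datasource: dict):
    # route "file/pickle" -> load_file_pickle, "database/mysql" -> load_database_mysql, ...
    loader_name = "load_" + datasource["type"].replace("/", "_")
    loader = globals().get(loader_name)
    if loader is None:
        raise NotImplementedError(f"no loader for type {datasource['type']!r}")
    return loader(datasource)

def load_manifest(manifest_path: str) -> list:
    # parse data_source_manifest.json and load every registered source
    with open(manifest_path) as f:
        return [load_datasource(ds) for ds in json.load(f)]
```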
# schema is set dynamically as 'USER_retinal'
# Credential config (optional): easier for debugging
# Required: create a credential config file, e.g. ~/pipeline/config.ini
rg_pipeline -c <CONFIG_INI_PATH>
# Input credentials directly: easier for automation (CI/CD)
# Build tables
rg_pipeline -db tutorial-db.datajoint.io -u <USERNAME> -p <PASSWORD> -b
# Clean up tables
rg_pipeline -db tutorial-db.datajoint.io -u <USERNAME> -p <PASSWORD> -b -cln
# Load data
rg_pipeline -db tutorial-db.datajoint.io -u <USERNAME> -p <PASSWORD> -b -l <DATASOURCE_MANIFEST_JSON_PATH>
# Load data with log
rg_pipeline -db tutorial-db.datajoint.io -u <USERNAME> -p <PASSWORD> -b -l <DATASOURCE_MANIFEST_JSON_PATH> > <LOG_PATH>
# Test
rg_pipeline -db tutorial-db.datajoint.io -u <USERNAME> -p <PASSWORD> -t
# Save dj.ERD as SVG (currently blocked by the dj.ERD error above)
rg_pipeline -db tutorial-db.datajoint.io -u <USERNAME> -p <PASSWORD> -er <DIR>
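For reference, a hypothetical layout for the config.ini consumed by -c, and how it might be read. Every section and key name here is an assumption; check the actual parser in the pipeline source:

```python
import configparser

# Hypothetical ~/pipeline/config.ini (section/key names are assumptions):
#
#   [database]
#   host = tutorial-db.datajoint.io
#   user = <USERNAME>
#   password = <PASSWORD>

config = configparser.ConfigParser()
config.read("config.ini")
db = config["database"]
print(db["host"], db["user"])  # password deliberately not printed
```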
Based on the task description, the wiki's explanation of STA, and a few articles discussing receptive fields in deep learning, my understanding is that the STRF is an averaged movie frame, computed from the several movie frames shown just before each detected spike.
For example, suppose a spike is detected at 3.5 seconds, and at that moment the stimulus movie is playing frame N. If the STRF delay is 5 frames, then STRF = avg(frames N-4 through N). The final result should be a 'blurred' frame with the same pixel height/width as the stimulus movie.
~/pipeline/rg_pipeline/compute_utils.py
get_frame_idx(spike_movie_time:float, fps:float)->int
: from spike detection time -> spike-movie relative time -> spike frame index
get_frame_2darray(stimulation:dict, movie:np.ndarray, n_frames:int)->np.ndarray
: get the spike frame from the spike frame index
get_sta(stimulation:dict, movie:np.ndarray, spike_movie_time:float, n_delays:int=STA_DELAY)->np.ndarray
: get the average of n_delays previous frames before the spike frame
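A minimal numpy sketch of the STA computation these helpers implement, under the assumptions above (frame index = spike_movie_time * fps; the spike frame itself is included in the average). This is an illustration, not the actual compute_utils code:

```python
import numpy as np

STA_DELAY = 5  # assumed default number of frames to average

def sta_sketch(movie: np.ndarray, spike_movie_time: float, fps: float,
               n_delays: int = STA_DELAY) -> np.ndarray:
    # movie has shape (n_frames, stim_height, stim_width)
    frame_idx = int(spike_movie_time * fps)   # frame playing at spike time
    start = max(0, frame_idx - n_delays + 1)  # clamp at the movie start
    # average the n_delays frames up to and including the spike frame
    return movie[start:frame_idx + 1].mean(axis=0)
```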
# Credential config (optional): easier for debugging
# Required: create a credential config file, e.g. ~/visualization/config.ini
rg_visual -c <CONFIG_INI_PATH>
# Input credentials directly: easier for automation (CI/CD)
rg_visual -db tutorial-db.datajoint.io -u <USERNAME> -p <PASSWORD>
# then access http://localhost:8050
# other people under the same network can access http://<YOUR_LOCAL_IP>:8050
~/visualization/rg_visual/plot_utils.py
plot_frame(n_frames:int, frame:np.ndarray)->go.Figure
: plot the currently selected spike's frame
plot_sta(n_frames:int, n_frames_of_delay:int, sta:np.ndarray)->go.Figure
: plot the STA averaged from the selected spike's frame and the given number of previous frames
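A minimal Plotly sketch of what plot_sta might produce: the averaged frame rendered as a grayscale heatmap. The layout details are assumptions, not the actual implementation:

```python
import numpy as np
import plotly.graph_objects as go

def plot_sta_sketch(n_frames: int, n_frames_of_delay: int, sta: np.ndarray) -> go.Figure:
    # render the averaged (STA) frame as an image-like heatmap
    fig = go.Figure(go.Heatmap(z=sta, colorscale="Greys"))
    fig.update_layout(
        title=f"STA over {n_frames_of_delay} frames before frame {n_frames}",
        yaxis_autorange="reversed",  # display like an image (row 0 on top)
    )
    return fig
```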
# STRF exploration, initial visualization
~/notebooks/visualization_explore.ipynb
# dataset exploration
~/notebooks/data_discovery.ipynb
# database config test
~/notebooks/db_conn_test.ipynb
- logger is not ready yet
- the DataModel-Loader design can still be optimized
- the STRF calculation/storage still has some uncertainty, so STRF results have not been loaded into the database yet