Prototype search engine for experiment object store

braingeneers/search

Braingeneers Search

Braingeneers NRP bucket crawler with an experiment and file explorer, hosted at search.braingeneers.gi.ucsc.edu

NOTE: 2023-04-02-e-hc328_unperturbed contains primary and spike-sorted NWB files

Install

pip install -r requirements.txt

Develop

First, create a small crawl database:

python crawl.py --count 10

Then run the server locally in debug and auto-reload mode:

make debug-server
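The References below point at SQLite's FTS5 extension for indexing. As a rough illustration of how a crawl database can support quick full-text search over object paths, here is a minimal sketch using Python's built-in sqlite3 module. The single-column `objects` table and the `build_index`/`search` helpers are hypothetical and not necessarily what crawl.py creates; it also assumes the local SQLite build includes FTS5, which most do.

```python
import sqlite3


def build_index(paths):
    """Create an in-memory FTS5 index over object paths (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    # One full-text column. The default unicode61 tokenizer treats punctuation
    # such as '/', '-', '_' and '.' as separators, so path parts become tokens.
    db.execute("CREATE VIRTUAL TABLE objects USING fts5(path)")
    db.executemany("INSERT INTO objects (path) VALUES (?)", [(p,) for p in paths])
    db.commit()
    return db


def search(db, query, limit=10):
    """Return matching paths, best BM25 match first (FTS5's hidden 'rank' column)."""
    rows = db.execute(
        "SELECT path FROM objects WHERE objects MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

For example, indexing the two sample NWB keys listed further down and searching for `kilosort2` returns only the spike-sorted file, while `unperturbed` matches both.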

Run

Build the Docker images and start them using Docker Compose:

make build
make up
make follow

NOTE: docker-compose.yml is configured to run on the braingeneers server so that it integrates with the mission control reverse proxy, which exposes this service as search.braingeneers.gi.ucsc.edu

h5wasm to read NWB files directly in the browser

h5wasm runs the full HDF5 library natively in the browser. Using Emscripten's FS.createLazyFile, h5wasm can be given a virtual file backed by HTTP that uses range requests to read the h5 file incrementally over the wire. This paves the way to providing a presigned S3 URL so that a browser-based app can directly access an h5 file in a cloud store.

Unfortunately, a presigned URL can only be generated for a single HTTP method, and h5wasm performs a HEAD request to discover capabilities (such as range request support) before making any GET. To work around this, the Flask server in this repo responds to the HEAD request directly and then answers the GET request with a redirect to a presigned URL, so that the browser pulls the data directly from S3. This requires that the HEAD response advertise the right capabilities. The downside of this approach is a redirect from the proxy to the client for every chunk. Another approach, taken by Flatiron's dendro, is to fork h5wasm and use an aborted fetch just to get the content length.

Here is the detailed sequence of requests and responses that h5wasm makes, which leads to this incremental reading:

h5wasm HEAD Request and Response

HEAD /data/aff5f64d-9a69-4ff3-a6fe-13a3f30dca50 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: localhost:5282
Referer: http://localhost:5282/static/worker.js
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: keep-alive
Content-Length: 4966709395
Content-Type: application/octet-stream
Date: Mon, 18 Mar 2024 15:17:34 GMT
ETag: W/"11795069-4966709395-2024-03-11T19:11:17.335Z"
Keep-Alive: timeout=5
Last-Modified: Mon, 11 Mar 2024 19:11:17 GMT
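Because a presigned URL only covers GET, the proxy has to produce a HEAD reply like the one above itself (Accept-Ranges and Content-Length are what matter) and then hand GETs off to S3. A minimal, framework-neutral sketch of that split, assuming hypothetical `get_size` and `presign_get` hooks; with boto3 these could wrap `head_object(...)["ContentLength"]` and `generate_presigned_url("get_object", ...)`. This is an illustration, not the repo's actual route code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Reply:
    """Status code plus headers, easy to map onto a Flask response."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def handle_data_request(
    method: str,
    key: str,
    get_size: Callable[[str], int],     # hypothetical hook: object size in bytes
    presign_get: Callable[[str], str],  # hypothetical hook: presigned GET URL
) -> Reply:
    """Answer HEAD locally and redirect GET to a presigned S3 URL."""
    if method == "HEAD":
        # h5wasm checks these capabilities before it issues any ranged GET
        return Reply(200, {
            "Accept-Ranges": "bytes",
            "Content-Length": str(get_size(key)),
            "Content-Type": "application/octet-stream",
        })
    if method == "GET":
        # The presigned URL is signed for GET only; the browser follows the
        # redirect and pulls the data directly from S3.
        return Reply(302, {"Location": presign_get(key)})
    return Reply(405, {"Allow": "GET, HEAD"})
```

Keeping the decision logic separate from the web framework and the S3 client makes it testable with plain stub functions and no credentials.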

h5wasm First GET Request and Response

GET /data/aff5f64d-9a69-4ff3-a6fe-13a3f30dca50 HTTP/1.1
Accept: */*
Accept-Encoding: identity
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: localhost:5282
Range: bytes=0-1048575
Referer: http://localhost:5282/static/worker.js
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: keep-alive
Content-Length: 1048576
Content-Range: bytes 0-1048575/4966709395
Content-Type: application/octet-stream
Date: Mon, 18 Mar 2024 15:17:34 GMT
ETag: W/"11795069-4966709395-2024-03-11T19:11:17.335Z"
Keep-Alive: timeout=5
Last-Modified: Mon, 11 Mar 2024 19:11:17 GMT
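The 206 reply above is plain range arithmetic: the request for bytes=0-1048575 yields Content-Length: 1048576 and Content-Range: bytes 0-1048575/4966709395. A small sketch of that arithmetic, assuming single ranges only and fixed 1 MiB chunks like the first request (whether every later h5wasm request uses the same chunk size is not shown in this trace, so `CHUNK_SIZE` and the helper names are assumptions):

```python
import re
from typing import Dict, Optional, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MiB, matching the first request above (bytes=0-1048575)


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Parse a single 'bytes=start-end' header into an inclusive (start, end).

    Returns None for malformed or unsatisfiable ranges (a server would reply 416).
    """
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if m is None or (m.group(1) == "" and m.group(2) == ""):
        return None
    if m.group(1) == "":
        # Suffix form 'bytes=-N': the last N bytes of the object
        n = int(m.group(2))
        return (max(size - n, 0), size - 1) if n > 0 and size > 0 else None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else size - 1  # open-ended 'bytes=N-'
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)


def partial_content_headers(start: int, end: int, size: int) -> Dict[str, str]:
    """Headers for a 206 Partial Content reply covering bytes start..end."""
    return {
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
    }


def chunk_range(index: int, size: int, chunk: int = CHUNK_SIZE) -> str:
    """Range header a fixed-size lazy reader would send for chunk `index`."""
    start = index * chunk
    end = min(start + chunk, size) - 1  # the last chunk is usually shorter
    return f"bytes={start}-{end}"
```

With these numbers the 4966709395-byte file spans 4737 chunks of 1 MiB, and the last one is requested as bytes=4966055936-4966709394.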

Sample NWB Files on NRP

s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2.nwb
s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2_kilosort2_curated_s1.nwb
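The sample files are given as s3:// URIs, while S3 APIs (for example the `Params` of boto3's `generate_presigned_url`) take the bucket and key separately. A small hypothetical helper, not part of this repo, that splits them using only the standard library:

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key).

    Keys containing '?' or '#' would be mis-parsed by urlparse; the sample
    keys here contain neither.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```

For the first sample file this gives the bucket `braingeneers` and a key starting with `ephys/2023-04-02-e-hc328_unperturbed/shared/`.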

References

Indexing

SQLite FTS5 Extension

Quick full-text search using SQLite

NWB

Neurodata Without Borders (NWB)

A NWB-based dataset and processing pipeline of human single-neuron activity during a declarative memory task

NWB Examples

h5wasm

h5wasm wrapper for h5 from http

How h5wasm accesses files over http via Emscripten lazy loading

GitHub thread on accessing h5 files via range requests

Chunking and indexing note in an issue

React components to visualize and graph h5 data (uses h5wasm)
