feat: fuzon-http implementation #20

Merged 18 commits on Oct 2, 2024
48 changes: 40 additions & 8 deletions Cargo.lock

Binary file added docs/img/fuzon-http.svg
3 changes: 3 additions & 0 deletions fuzon-http/Cargo.toml
@@ -6,6 +6,9 @@ edition.workspace = true

[dependencies]
actix-web = "4.9.0"
clap = { version = "4.5.18", features = ["derive"] }
env_logger = "0.11.5"
fuzon = { version = "0.2.2", path = "../fuzon" }
log = "0.4.22"
serde = { version = "1.0.210", features = ["derive"] }
serde_json = "1.0.128"
87 changes: 87 additions & 0 deletions fuzon-http/README.md
@@ -0,0 +1,87 @@
# fuzon-http

This web server deploys fuzon as a web service.
All ontologies are loaded once at server startup, and their indices are kept in memory.

## Configuration

The server takes a configuration file as input to determine which ontologies to load and which collections to load them into. Collections are individual matchers that can be queried independently.

## Installation

```shell
cd fuzon-http
cargo build --release
```

## Usage

Start the server with:

```shell
../target/release/fuzon-http --config config/example.json
```

Fuzzy matching queries should use `GET /top?collection={collection}&top={top}&query={query}`.

```shell
# example
$ curl 'http://localhost:8080/top?collection=cell_type&top=3&query=leukocyte'
{
"codes": [
{
"label":"\"myeloid leukocyte\"",
"uri":"<http://purl.obolibrary.org/obo/CL_0000766>",
"score":null
},
{
"label":"\"nongranular leukocyte\"",
"uri":"<http://purl.obolibrary.org/obo/CL_0002087>",
"score":null
},
{
"label":"\"myeloid leukocyte migration\"",
"uri":"<http://purl.obolibrary.org/obo/GO_0097529>",
"score":null
}
]
}
```
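
Queries containing spaces or reserved characters need percent-encoding before they go into the `query` parameter. The following is a minimal std-only sketch of building a `/top` URL from a client. The helper names `percent_encode` and `top_url` are hypothetical, and the `usize` type for `top` is an assumption, since the request struct's fields are not shown here.

```rust
/// Percent-encode a string for use as a URL query value.
/// Unreserved characters (RFC 3986) pass through; every other byte becomes %XX.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the URL for a `/top` request.
/// `base` is an assumed server address such as "http://localhost:8080".
fn top_url(base: &str, collection: &str, top: usize, query: &str) -> String {
    format!(
        "{}/top?collection={}&top={}&query={}",
        base,
        percent_encode(collection),
        top,
        percent_encode(query)
    )
}
```

For example, `top_url("http://localhost:8080", "cell_type", 3, "myeloid leukocyte")` encodes the space in the query as `%20`.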

To discover available collections, use `GET /list`.

```shell
# example
$ curl 'http://localhost:8080/list'
{
"collections": ["cell_type","source_material","taxon_id"]
}
```

## Example

Here is a minimal example of how fuzon-http may be used from a tool.
It is a bash script that continuously reads user input, retrieves the 10 best-matching codes from the server, and displays them in the terminal.

```bash
#!/bin/bash
keys=""
while IFS= read -r -n1 -s key; do
# delete chars when backspace is pressed
if [[ $key == $'\x7f' ]]; then
keys="${keys%?}"
else
keys="${keys}${key}"
fi
# Clear terminal output
tput ed
echo "input: ${keys}"
curl -s "http://localhost:8080/top?query=${keys}&top=10&collection=cell_type" | jq -r '.codes[] | "\(.label) \(.uri)"'
# move cursor up 11 lines (1 for input display + 10 codes)
tput cuu 11
done
```

And here it is in action:

![](../docs/img/fuzon-http.svg)
9 changes: 9 additions & 0 deletions fuzon-http/config/example.json
@@ -0,0 +1,9 @@
{
"host": "::",
"port": 8080,
"collections": {
"cell_type": ["https://purl.obolibrary.org/obo/cl.owl"],
"source_material": ["https://purl.obolibrary.org/obo/uberon.owl"],
"taxon_id": ["https://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl"]
}
}
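
The `host` value `"::"` in this example is the IPv6 unspecified address. As a rough illustration of how such a host/port pair turns into a socket address, here is a std-only sketch using `ToSocketAddrs`, the trait that the standard library implements for `(host, port)` tuples. The helper name `first_addr` is hypothetical, and whether a socket bound to `::` also accepts IPv4 connections depends on the operating system's dual-stack settings.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolve a host/port pair through the standard library's `ToSocketAddrs`
/// and return the first resulting address.
fn first_addr(host: &str, port: u16) -> io::Result<SocketAddr> {
    (host, port)
        .to_socket_addrs()?
        .next()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no address resolved"))
}
```

With the values from this config, `first_addr("::", 8080)` yields the address `[::]:8080`.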
88 changes: 58 additions & 30 deletions fuzon-http/src/main.rs
@@ -1,9 +1,13 @@
use std::collections::HashMap;
use actix_web::{get, web, App, HttpServer, Responder, Result};
use fuzon::{TermMatcher};
use actix_web::{get, middleware, web, App, HttpServer, Responder, Result};
use clap::Parser;
use fuzon::TermMatcher;
use log::info;
use serde::{Deserialize, Serialize};
use serde_json;
use std::env;
use std::sync::Arc;
use std::fs::File;

// URL query parameters when requesting matching codes
#[derive(Debug, Deserialize)]
@@ -15,92 +19,116 @@ pub struct CodeRequest {

// Response model containing matching codes
#[derive(Debug, Serialize)]
pub struct MatchResponse {
pub struct CodeMatch {
label: String,
uri: String,
score: Option<f64>,
}

#[derive(Debug, Serialize)]
pub struct MatchResponse {
codes: Vec<CodeMatch>,
}

// Config file structure
#[derive(Clone, Debug, Deserialize)]
struct Config {
collections: HashMap<String, String>,
host: String,
port: u16,
collections: HashMap<String, Vec<String>>,
}

// Shared app state built from config and used by services
#[derive(Clone, Debug)]
struct AppState {
collections: Arc<HashMap<String, TermMatcher>>,
}

impl AppState {
fn from_config(data: Config) -> Self {
let collections = data
let collections = data.clone()
.collections
.into_iter()
.map(|(k, v)| (k, TermMatcher::from_paths(vec![&v]).unwrap())).collect();
.inspect(|(k, _)| info!("Loading collection: {}...", k))
.map(|(k, v)| (
k,
TermMatcher::from_paths(
v.iter().map(|s| &**s).collect()).unwrap()
)
)
.collect();

info!("Initialized with: {:?}", &data);
AppState { collections: Arc::new(collections) }
}
}

// Used for debugging
#[get("/hello/{name}")]
async fn greet(name: web::Path<String>) -> impl Responder {
format!("Hello {name}!")
}

// list collections: /list
#[get("/list")]
async fn list(data: web::Data<AppState>) -> impl Responder {

let mut response = HashMap::new();
let collections : Vec<String> = data.collections.keys().cloned().collect();
response.insert("collections".to_string(), collections);

web::Json(collections)
web::Json(response)

}

// Top matching codes from collection for query: /top?collection={collection}&query={foobar}&top={10}
#[get("/top")]
async fn top(data: web::Data<AppState>, req: web::Query<CodeRequest>) -> Result<impl Responder> {

let top_terms: Vec<MatchResponse> = data.collections
let top_terms: Vec<CodeMatch> = data.collections
.get(&req.collection)
.expect(&format!("Collection not found: {}", req.collection))
.top_terms(&req.query, req.top)
.into_iter()
.map(|t| MatchResponse {
.map(|t| CodeMatch {
label: t.label.clone(), uri: t.uri.clone(), score: None
})
.collect();

Ok(web::Json(top_terms))
Ok(web::Json(MatchResponse{ codes: top_terms }))
}

/// http server to serve the fuzon terminology matching api
#[derive(Parser, Debug)]
#[command(version, about, long_about = None)]
struct Args {
/// Path to the configuration file.
#[clap(short, long)]
config: String,
}


#[actix_web::main] // or #[tokio::main]
async fn main() -> std::io::Result<()> {
env::set_var("RUST_LOG", "fuzon_http=info,actix_web=warn,actix_server=info");
env_logger::init();
let args = Args::parse();
let config_path = args.config;

let config: Config = serde_json::from_reader(
File::open(config_path).expect("Failed to open config file.")
).expect("Failed to parse config.");
let host = config.host.clone();
let port = config.port as u16;

// NOTE: config as inline json for debugging, later this will be a config file
let config_data = r#"
{
"collections": {
"cell_type": "https://purl.obolibrary.org/obo/cl.owl",
"source_material": "https://purl.obolibrary.org/obo/uberon.owl",
"taxon_id": "https://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl"
}
}"#;
let data = web::block(move ||
serde_json::from_str::<Config>(config_data).unwrap()
AppState::from_config(config)
)
.await
.expect("Failed to parse config");
.expect("Failed to initialize state from config.");

HttpServer::new(move || {
App::new()
.app_data(web::Data::new(AppState::from_config(data.clone())))
.service(greet)
.wrap(middleware::Logger::default())
.app_data(web::Data::new(data.clone()))
.service(list)
.service(top)
})
.bind(("127.0.0.1", 8080))?
.bind((host, port))?
.run()
.await
}
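
One observation on the `top` handler above: it calls `.expect(...)` on the collection lookup, so a request naming an unknown collection panics inside the handler. A possible alternative, sketched here with the standard library only, is to return the lookup failure as a value that the handler could map to an error response. The helper `find_collection` is hypothetical and generic over the stored type, so it makes no assumptions about `TermMatcher`.

```rust
use std::collections::HashMap;

/// Look up a collection by name, returning an error message instead of panicking.
fn find_collection<'a, T>(
    collections: &'a HashMap<String, T>,
    name: &str,
) -> Result<&'a T, String> {
    collections
        .get(name)
        .ok_or_else(|| format!("Collection not found: {}", name))
}
```

How the `Err` case is turned into an HTTP status (for instance a 404) is left open here, since it depends on the error types the server chooses to use.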
1 change: 1 addition & 0 deletions fuzon/src/lib.rs
@@ -33,6 +33,7 @@ lazy_static! {
};
}

#[derive(Debug, Clone)]
pub struct TermMatcher {
pub terms: Vec<Term>,
}
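
In fuzon-http, `AppState` keeps its matchers behind an `Arc`, so the `data.clone()` handed to each worker copies a pointer rather than the loaded terms. The following std-only sketch illustrates that sharing behavior. `SharedState` and `build_state` are hypothetical stand-ins, with plain strings in place of `TermMatcher`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for the server state: the map sits behind an `Arc`,
/// so cloning the struct only bumps a reference count.
#[derive(Clone)]
struct SharedState {
    collections: Arc<HashMap<String, Vec<String>>>,
}

/// Build a small state with one collection (illustrative data only).
fn build_state() -> SharedState {
    let mut map = HashMap::new();
    map.insert("cell_type".to_string(), vec!["leukocyte".to_string()]);
    SharedState { collections: Arc::new(map) }
}
```

Cloning the result of `build_state()` gives a second handle to the same map, which can be checked with `Arc::ptr_eq`.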