prototype in Jupyter if necessary, but create an application, possibly in another repo (NMDC or BBOP?)
take an 80/20 low hanging fruit approach
save intermediate data; make it easy to backtrack the results of the repairs
keep track of possible losses "already added by me"
NCBI XML -> JSON/MongoDB (Jasper's or Mark's similar approaches) -> DuckDB
not all paths are added to DuckDB yet
attributes may be concatenated with ||| when writing into DuckDB ???
new potential losses for benefit of performance/manageability
lowercase and normalize whitespace?
are there user-provided attributes that NCBI "failed" to harmonize?
take a stance on what the reasonable values are (see submission schema)
suggest opportunities to route unreasonable values into other slots/fields/attributes?
split wherever CURIEs were found and make a unique collection of strings. annotate with multiple ontologies using OAK... using one combined backend or iteratively?
try to include drag remediation?
assume 0 or more occurrences of CURIEs with {prefix}{delimiter}{local_id}
where prefix and local_id can be letters, numbers or both, with a min and max len
delimiters: mostly expect : or _. may include some whitespace
there are some without delimiters... how many? ENVO1234567
just ENV instead of ENVO
ENVO:label pattern
NCIT etc with different length local_ids or with letters
some prefixes with numbers
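
The {prefix}{delimiter}{local_id} assumptions above can be sketched as a pair of regular expressions. This is only a rough starting point, not a settled design: the length bounds, the function names (`find_curies`, `extract_curies`, `split_labels`) and the punctuation stripped from labels are all assumptions made for illustration, since the notes do not fix a min and max len. It does not repair `ENV` into `ENVO`, it ignores the `ENVO:label` pattern, and it will produce false positives (for example `depth_100`), so results would still need filtering against a list of known prefixes.

```python
import re

# Assumed bounds: the notes only say "a min and max len", so these are guesses.
DELIMITED = re.compile(
    r"\b([A-Za-z][A-Za-z0-9]{1,9})"  # prefix: starts with a letter, may contain digits
    r"\s*[:_]\s*"                     # delimiter : or _, optional surrounding whitespace
    r"([A-Za-z]?\d{3,10})\b"          # local_id: digits, optionally one leading letter
)
# No delimiter at all, e.g. ENVO1234567 (letters-only prefix, assumed 5+ digits).
UNDELIMITED = re.compile(r"\b([A-Za-z]{2,10})(\d{5,10})\b")


def find_curies(text):
    """Return (start, end, 'prefix:local_id') for each CURIE-like token, in text order."""
    hits = []
    for pattern in (DELIMITED, UNDELIMITED):
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), f"{m.group(1)}:{m.group(2)}"))
    # Earliest match first; on ties prefer the longer one, then drop overlaps.
    hits.sort(key=lambda h: (h[0], -h[1]))
    kept, last_end = [], 0
    for hit in hits:
        if hit[0] >= last_end:
            kept.append(hit)
            last_end = hit[1]
    return kept


def extract_curies(text):
    """Return the CURIE-like tokens, rewritten with ':' as the delimiter (case unchanged)."""
    return [curie for _, _, curie in find_curies(text)]


def split_labels(text):
    """Split text wherever a CURIE-like token was found; return the unique leftover strings."""
    pieces, pos = [], 0
    for start, end, _ in find_curies(text):
        pieces.append(text[pos:start])
        pos = end
    pieces.append(text[pos:])
    seen = {}
    for piece in pieces:
        # Collapse whitespace, then strip bracket/separator debris (an assumed set).
        cleaned = " ".join(piece.split()).strip(" []()|;,")
        if cleaned:
            seen.setdefault(cleaned.lower(), cleaned)  # case-insensitive de-duplication
    return list(seen.values())


print(extract_curies("terrestrial biome [ENVO:00000446]"))  # ['ENVO:00000446']
print(extract_curies("ENVO1234567"))                         # ['ENVO:1234567']
print(split_labels("terrestrial biome [ENVO:00000446]"))     # ['terrestrial biome']
```

Keeping the match offsets in `find_curies` makes it possible to save both the extracted CURIEs and the leftover label strings as intermediate data, so a repair can be traced back to the original value.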
i.e. extracting ontology class CURIEs from env_broad_scale, env_local_scale and env_medium
follow up with Jasper, Peter, Paramvir and Mikaela
https://www.kbase.us/team/