Home
function idig_view(UUID=FALSE, type="record")
function idig_search(rq="", query=FALSE, sort=FALSE, limit=0, offset=0, fields=c(..), max_items=100000)
function idig_media(mq="", rq="", query=FALSE, sort=FALSE, limit=0, offset=0, fields=c(..), max_items)
function idig_toprecords(rq="", fields=c("scientificname"), count=10)
function idig_topmedia(mq="", rq="", fields=c("scientificname"), count=10)
function idig_count(rq="")
function idig_metafields()
- camelCase functions, underscore variables (insert debate here, but camelCase and period.separator appear popular: http://stackoverflow.com/questions/1944910/what-is-your-preferred-style-for-naming-variables-in-r)
- Prefix API endpoints with "idig", e.g. "idigSearch". This will mean there is a 1-1 correspondence between the API and the R methods.
- Add methods that are language/library-specific by tacking stuff onto the end of the endpoint name, e.g. "idigSearchGetAll()"
Just drop GET entirely.
The lightest option would be to skip the object for now and go with named params on the idigSearch method, which would also require no user code changes in the future:
results <- idigSearch(rq='json string')
# if people want lists now
# query <- list(family=list("asteraceae","fagaceae"))
# results <- idigSearch(rq=jsonlite::toJSON(query))
# later let people write this
# results <- idigSearch(query=idigQuery(stuff))
Later, we can add idigSearch(query=...) to take a query object without having people change their existing code.
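A minimal sketch of how that could stay backward compatible, assuming the query object is nothing more than an S3 wrapper around a JSON string. The `idigQuery()` constructor as written here and the `resolveRq()` helper are hypothetical names for illustration, not part of any existing package:

```r
# Hypothetical query object: a thin S3 wrapper around a JSON string.
idigQuery <- function(json) {
  stopifnot(is.character(json), length(json) == 1)
  structure(list(json = json), class = "idigQuery")
}

# Resolve both calling styles to the single JSON string sent as "rq".
# A query object wins if one is supplied; otherwise rq passes through.
resolveRq <- function(rq = "", query = FALSE) {
  if (inherits(query, "idigQuery")) {
    return(query$json)
  }
  rq
}

# Existing code keeps working unchanged:
resolveRq(rq = '{"family":"asteraceae"}')
# Newer code can pass an object instead:
resolveRq(query = idigQuery('{"genus":"acer"}'))
```

Because `query` defaults to `FALSE`, callers who only ever pass `rq` never see the new parameter.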
I wrote the below before changing my mind: I think I want to do a query object. Nested lists are just going to be trouble. The object can have a "fromJson" method that just takes JSON text for those who are chaining APIs or who want to just write JSON. Otherwise, probably a 1:1 matching of https://github.com/iDigBio/idigbio-search-api/blob/master/app/lib/query-shim.js is best. And update the iDigBio Query format documentation to include the query shim methods so that there is 1:1:1 correspondence between a query format snippet, a query shim method, and an R query object method.
- Rows named "1", "2", etc
- Place UUID in row as a column, always present, user can't turn off
- Allow users to specify column names and pull prefixed dwc:country from the data list and unprefixed country from the indexTerms list. Why? I imagine at some point indexTerms will be cleaned data and data will be original. This will let people choose. Also, we don't have to modify the API returns now with fancy logic to drop some but not all namespace prefixes in the data list. More work on my side to concat indexTerms and data terms though. I assume indexed terms will continue to be un-namespaced.
- Alter API to take parameter "fields" which is just a flat list of terms either from indexTerms list or data list and return only those fields.
- No "fields" parameter means return some default set, proposed: occurrence ID, institution code, collection code, catalog number, genus, species, scientific name, date collected, lat, long -> Enough for most people, no raw data only cleaned, and skips the verbatim and text fields which are chunky. (Only slightly afraid of people thinking this means that this is all iDigBio has in it...)
- Support fields=all to return everything known.
- Try to type & factor (if appropriate) fields that are from indexTerms according to what they are in ES. Data types may not match up but take a look. Never type fields from data. Always force user to type them.
- If "fields" is given, probably have enough information to try pre-allocating data frame. Try out some options here to see if that is a useful performance improvement AFTER everything else works and someone says it's slow :)
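The row and column rules above could be sketched roughly as follows. This assumes each record arrives as a list with `uuid`, `indexTerms`, and `data` elements holding scalar values; that shape, and the `recordsToDataFrame()` name, are assumptions for illustration rather than the actual API return format. Typing of indexTerms fields is left out.

```r
# Build a data frame from a list of records.
# Assumed record shape (hypothetical): list(uuid=, indexTerms=list(), data=list())
recordsToDataFrame <- function(records, fields) {
  # uuid is always present, so drop it from the requested fields
  fields <- setdiff(fields, "uuid")

  # Namespaced fields ("dwc:country") come from data,
  # un-namespaced ones ("country") come from indexTerms.
  pick <- function(rec, field) {
    src <- if (grepl(":", field, fixed = TRUE)) rec$data else rec$indexTerms
    value <- src[[field]]
    # Everything stays character here; missing values become NA
    if (is.null(value)) NA_character_ else as.character(value)
  }

  df <- data.frame(
    uuid = vapply(records, function(r) r$uuid, character(1)),
    stringsAsFactors = FALSE
  )
  for (f in fields) {
    df[[f]] <- vapply(records, pick, character(1), field = f)
  }
  df  # default row names are "1", "2", ...
}

recs <- list(
  list(uuid = "a", indexTerms = list(country = "united states"),
       data = list("dwc:country" = "USA")),
  list(uuid = "b", indexTerms = list(), data = list())
)
recordsToDataFrame(recs, c("country", "dwc:country"))
```

Since the column count is known once "fields" is given, this is also the place where pre-allocation could be tried later if it turns out to be slow.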
Objects can have attributes, cool. We'll roll with Francois' suggestions about setting attribution as an attribute of the returned data frame and having a formatter function return it given a result data frame.
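A rough sketch of that idea in base R, assuming the attribution is just a character vector of provider names. The attribute name "attribution", the two function names, and the output wording are all placeholders, not a settled design:

```r
# Attach attribution to a result data frame as an attribute.
withAttribution <- function(df, attribution) {
  attr(df, "attribution") <- attribution
  df
}

# Formatter: given a result data frame, return a printable attribution string.
formatAttribution <- function(df) {
  attribution <- attr(df, "attribution", exact = TRUE)
  if (is.null(attribution)) {
    return("No attribution recorded.")
  }
  paste("Data provided by:", paste(attribution, collapse = "; "))
}

res <- withAttribution(
  data.frame(uuid = "a", stringsAsFactors = FALSE),
  c("Recordset A", "Recordset B")
)
formatAttribution(res)
```

One thing to check before committing to this: some data frame operations, such as subsetting, may drop custom attributes, so the attribution might not survive user manipulation of the result.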
Haven't thought about it.
The other APIs provide access to some other stuff: taxa, organizations, field names, etc. Probably we should do the same; the decision to be made is whether that's a bare list of everything or searchable. Taxa is probably the first one to consider, and I assume that can be done off ES and added to the search API easily.
Look @ Francois's stuff for JSON definitions between Python and R