Classifiers
The ability to subset and manipulate the content of the knowledge base to meet the diverse needs of its users is crucial. Some users may need to query the entire content, while others may only require a specific subset of resources that meet certain criteria. This section outlines the essential mechanisms and attributes for classifying or categorizing resources when querying the knowledge base. Ideally, these should be reflected in some form through the API to provide flexibility and accessibility for all users.
The following attributes are currently available to filter or classify OpenAPI resources:
- class: available in the Postgres resource view and the metadata and can be used to distinguish between OpenAPI2 and OpenAPI3 resources
- collection: the harvester used to collect this resource. Current options are 'kin' and 'postman_apis'
- validity: a true/false boolean attribute found in the resource metadata (`isValid`)
- size: expressed in bytes and available in the Postgres resource view and the metadata
- version: available in the JSON metadata under the
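To make these attributes concrete, here is a minimal sketch of how they could be combined into a filter. It assumes that each resource's metadata is available as a Python dict keyed by the attribute names listed above (`class`, `collection`, `isValid`, `size`). The dict layout, the sample values such as `"OpenAPI3"`, and the `filter_resources` helper are hypothetical and do not describe the actual schema.

```python
from typing import Any, Iterable, Iterator, Optional


# Hypothetical record layout: keys mirror the attributes listed above
# (class, collection, isValid, size); the real metadata schema may differ.
def filter_resources(
    resources: Iterable[dict[str, Any]],
    cls: Optional[str] = None,          # "class" is a Python keyword, hence "cls"
    collection: Optional[str] = None,   # e.g. "kin" or "postman_apis"
    valid: Optional[bool] = None,       # compared against isValid
    max_size: Optional[int] = None,     # upper bound in bytes
) -> Iterator[dict[str, Any]]:
    """Yield the resources that match every filter that is not None."""
    for r in resources:
        if cls is not None and r.get("class") != cls:
            continue
        if collection is not None and r.get("collection") != collection:
            continue
        if valid is not None and r.get("isValid") != valid:
            continue
        if max_size is not None and r.get("size", 0) > max_size:
            continue
        yield r


resources = [
    {"id": 1, "class": "OpenAPI3", "collection": "kin", "isValid": True, "size": 2048},
    {"id": 2, "class": "OpenAPI2", "collection": "postman_apis", "isValid": False, "size": 512},
    {"id": 3, "class": "OpenAPI3", "collection": "postman_apis", "isValid": True, "size": 900000},
]
matches = filter_resources(resources, cls="OpenAPI3", valid=True, max_size=100_000)
print([r["id"] for r in matches])  # [1]
```

Each filter that is left as `None` is simply ignored. This mirrors how optional query parameters would typically behave in an API.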
Custom search criteria can naturally be expressed directly in Postgres queries, and many of our research questions could technically serve as classifiers. However, not all of these options can realistically be exposed as API parameters.
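One way to keep the API surface small while leaving Postgres open for ad hoc queries is to expose only a whitelist of filters. The hypothetical sketch below shows this approach. The table name `resource_view`, the column names, and the `build_query` helper are assumptions. The `%s` placeholders follow the parameter style used by psycopg, and values are always passed separately from the SQL text.

```python
from typing import Any

# Hypothetical whitelist: public API parameter -> SQL predicate on an assumed
# "resource_view" table. Any criterion not listed here is only reachable
# through direct Postgres queries.
EXPOSED_FILTERS = {
    "class": '"class" = %s',
    "collection": "collection = %s",
    "min_size": "size >= %s",
    "max_size": "size <= %s",
}


def build_query(params: dict[str, Any]) -> tuple[str, list[Any]]:
    """Translate API parameters into a parameterized SQL query.

    Unknown parameters are rejected instead of being interpolated, and
    values are returned separately as bind parameters.
    """
    unknown = set(params) - set(EXPOSED_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    clauses = [EXPOSED_FILTERS[name] for name in params]
    values = list(params.values())
    sql = "SELECT id FROM resource_view"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values


sql, values = build_query({"class": "OpenAPI3", "max_size": 100000})
print(sql)     # SELECT id FROM resource_view WHERE "class" = %s AND size <= %s
print(values)  # ['OpenAPI3', 100000]
```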
We are currently exploring the use of Spectral rules to expand the classification of resources. Essentially, any set of Spectral rules can be applied to evaluate OpenAPI resources. The result is a pass/fail status and a score, which can be computed either for the entire ruleset or for individual rules. The resulting report is stored in the database as a JSON-formatted resource attachment, which can then be used to build a query filter or serve as an analytical dimension. This lets us compute statistics on API compliance with policies and best practices, and it also lets us use a ruleset as a classifier that matches a specific definition.
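As a rough illustration, a Spectral JSON report could be reduced to per-rule and overall pass/fail values plus a score before it is stored as an attachment. Spectral's JSON output is an array of diagnostics, each with a rule `code` and a numeric `severity` (0 = error, 1 = warning, 2 = info, 3 = hint). The scoring policy below is an assumption made for illustration and is not something Spectral computes itself: only errors fail a rule, and the score is the fraction of rules that pass. The `summarize_report` name is hypothetical.

```python
import json
from typing import Any

ERROR = 0  # Spectral severity for errors (1 = warning, 2 = info, 3 = hint)


def summarize_report(report_json: str, rule_ids: list[str]) -> dict[str, Any]:
    """Derive per-rule and overall pass/fail plus a score from a Spectral report.

    Assumed policy: a rule fails if it produced at least one error-level
    diagnostic; the score is the fraction of rules in the ruleset that passed.
    """
    diagnostics = json.loads(report_json)
    failed = {d["code"] for d in diagnostics if d.get("severity") == ERROR}
    rules = {rule: rule not in failed for rule in rule_ids}
    passed = sum(rules.values())
    return {
        "rules": rules,                     # rule id -> passed?
        "passed": passed == len(rule_ids),  # the whole ruleset passes
        "score": passed / len(rule_ids) if rule_ids else 1.0,
    }


report = json.dumps([
    {"code": "operation-description", "severity": 1},  # warning: not a failure
    {"code": "info-contact", "severity": 0},           # error: rule fails
])
summary = summarize_report(report, ["info-contact", "operation-description", "oas3-schema"])
print(summary["passed"], round(summary["score"], 2))  # False 0.67
```

Because the summary is itself plain JSON, it could be stored next to the raw report and queried with Postgres JSON operators as a filter or dimension.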
A ruleset can be generic (e.g. a common definition or public policy) or reflect the perspective or definition of a specific organization or individual. For example, we could have multiple rulesets qualifying an API as 'popular', 'real', or 'valid'.
Note that for this to work generically, each ruleset (and rule within that set) must have a clear and unique identifier. These should be stored in our GitHub repo.
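The sketch below shows why unique identifiers matter once several rulesets are used as classifiers. Rule results are keyed by a qualified identifier (`<ruleset id>#<rule id>`), so two rulesets can each contain a rule with the same short name without being confused. The `Ruleset` type, the path-like ruleset ids, the rule names `has-contact` and `has-servers`, and the `#` separator are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ruleset:
    """Hypothetical ruleset descriptor; ids are assumed to be unique repo paths."""
    ruleset_id: str             # e.g. a path in the GitHub repo (assumed)
    label: str                  # label assigned when a resource passes, e.g. "valid"
    rule_ids: tuple[str, ...]   # short rule names within this ruleset

    def qualified(self, rule_id: str) -> str:
        """Globally unique rule identifier: '<ruleset_id>#<rule_id>'."""
        return f"{self.ruleset_id}#{rule_id}"


def classify(failed_rules: set[str], rulesets: list[Ruleset]) -> set[str]:
    """Return the labels of every ruleset whose rules all passed.

    failed_rules holds qualified rule ids, so identically named rules in
    different rulesets cannot be mixed up.
    """
    return {
        rs.label
        for rs in rulesets
        if not any(rs.qualified(r) in failed_rules for r in rs.rule_ids)
    }


valid = Ruleset("rulesets/valid.yaml", "valid", ("oas3-schema",))
real = Ruleset("rulesets/real.yaml", "real", ("has-contact", "has-servers"))
failed = {real.qualified("has-servers")}
print(classify(failed, [valid, real]))  # {'valid'}
```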
There are many other classifiers we would like to associate with the OpenAPI resources and need to investigate how they can be implemented. These include:
- provenance: who is behind this API, who are the custodians
- industry classification (e.g. using NAICS)
- sector: public, academic, private
- consumers: who are the user communities
- concepts: what is the API about, keywords, subjects, etc.
- lifecycle stage: where the API stands in the producer and consumer lifecycle
- operational: whether it is up and running somewhere, and whether it is public or private
- language: which language the API speaks (ISO 639-1)
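While these classifiers are still under investigation, a hypothetical container such as the one below would let them be attached to a resource alongside the existing attributes. The field names and the `Sector` values follow the list above, but the structure itself is an assumption. The language check only verifies the ISO 639-1 *format* (two lowercase letters) and does not check membership in the official code list. The NAICS code is stored as an unvalidated string.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Sector(Enum):
    PUBLIC = "public"
    ACADEMIC = "academic"
    PRIVATE = "private"


# Format check only: two lowercase ASCII letters, like ISO 639-1 codes.
_ISO_639_1 = re.compile(r"[a-z]{2}")


@dataclass
class ResourceClassifiers:
    """Hypothetical container for the classifiers listed above."""
    provenance: Optional[str] = None        # who is behind / custodian of the API
    naics_code: Optional[str] = None        # industry classification (unvalidated)
    sector: Optional[Sector] = None
    consumers: list[str] = field(default_factory=list)  # user communities
    concepts: list[str] = field(default_factory=list)   # keywords, subjects
    lifecycle_stage: Optional[str] = None
    operational: Optional[bool] = None      # up and running somewhere?
    public: Optional[bool] = None           # public vs private
    language: Optional[str] = None          # ISO 639-1 code

    def __post_init__(self) -> None:
        if self.language is not None and not _ISO_639_1.fullmatch(self.language):
            raise ValueError(f"not an ISO 639-1 style code: {self.language!r}")


c = ResourceClassifiers(sector=Sector.PUBLIC, language="en", concepts=["weather"])
print(c.sector.value, c.language)  # public en
```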