From 8bbcb3875ddefdb41e14ed88d925b977aeb83e65 Mon Sep 17 00:00:00 2001 From: Csaky Date: Wed, 17 Apr 2024 16:39:49 -0700 Subject: [PATCH 1/5] Add basic docs --- docs/Architecture-Hosted.md | 45 +++ docs/Architecture.md | 57 +++ docs/Authentication.md | 42 +++ docs/Buckets.md | 18 + docs/Endpoint-Notes.md | 68 ++++ docs/Hosted-Service-Onboarding.md | 42 +++ docs/Hosting-Considerations.md | 16 + docs/Metadata-Tag.md | 85 +++++ docs/Permissions.md | 180 +++++++++ docs/Product-Roadmap.md | 107 ++++++ docs/Self-Hosting-COMS.md | 29 ++ docs/Synchronization.md | 27 ++ docs/Testing.md | 492 +++++++++++++++++++++++++ docs/Use-Case-Examples.md | 35 ++ docs/about-us.md | 7 - docs/configuration.md | 131 ++++++- docs/getting-started.md | 7 - docs/images/coms_architecture.png | Bin 0 -> 39884 bytes docs/images/coms_erd_audit.png | Bin 0 -> 3171 bytes docs/images/coms_erd_public.png | Bin 0 -> 76397 bytes docs/images/coms_network_flow.png | Bin 0 -> 15933 bytes docs/images/coms_self_architecture.png | Bin 0 -> 14735 bytes docs/images/coms_sync_flow.png | Bin 0 -> 33073 bytes docs/images/queue_manager_state.png | Bin 0 -> 18152 bytes docs/index.md | 45 ++- docs/tips-and-tricks.md | 7 - mkdocs.yml | 25 +- 27 files changed, 1427 insertions(+), 38 deletions(-) create mode 100644 docs/Architecture-Hosted.md create mode 100644 docs/Architecture.md create mode 100644 docs/Authentication.md create mode 100644 docs/Buckets.md create mode 100644 docs/Endpoint-Notes.md create mode 100644 docs/Hosted-Service-Onboarding.md create mode 100644 docs/Hosting-Considerations.md create mode 100644 docs/Metadata-Tag.md create mode 100644 docs/Permissions.md create mode 100644 docs/Product-Roadmap.md create mode 100644 docs/Self-Hosting-COMS.md create mode 100644 docs/Synchronization.md create mode 100644 docs/Testing.md create mode 100644 docs/Use-Case-Examples.md delete mode 100644 docs/about-us.md delete mode 100644 docs/getting-started.md create mode 100644 docs/images/coms_architecture.png create 
mode 100644 docs/images/coms_erd_audit.png create mode 100644 docs/images/coms_erd_public.png create mode 100644 docs/images/coms_network_flow.png create mode 100644 docs/images/coms_self_architecture.png create mode 100644 docs/images/coms_sync_flow.png create mode 100644 docs/images/queue_manager_state.png delete mode 100644 docs/tips-and-tricks.md diff --git a/docs/Architecture-Hosted.md b/docs/Architecture-Hosted.md new file mode 100644 index 0000000..8b9f784 --- /dev/null +++ b/docs/Architecture-Hosted.md @@ -0,0 +1,45 @@ +This page outlines the architecture and deployment features of the BC Gov Hosted COMS service. It is mainly intended for a technical audience, and for people who want to have a better understanding of how we have the service deployed. + +**Note:** For more details of the COMS application itself and how it works, see the [Architecture](Architecture) overview. + +## Table of Contents + +- [Infrastructure](#infrastructure) +- [High Availability](#high-availability) +- [Network Connectivity](#network-connectivity) +- [Database connection Pooling](#database-connection-pooling) +- [Horizontal Autoscaling](#horizontal-autoscaling) + +## Infrastructure + +The BC Govt. Hosted COMS service runs on the OpenShift container ecosystem. The following diagram provides a general logical overview of main component relations. Main network traffic flows are shown in fat arrows, while secondary network traffic relations are shown with a simple black line. + +![Hosted COMS Architecture](images/coms_architecture.png) + +**Figure 1 - The general infrastructure and network topology of the BC Govt. hosted COMS** + +### High Availability + +The COMS API and Database are all designed to be highly available within an OpenShift environment. The Database achieves high availability by leveraging [Patroni](https://patroni.readthedocs.io/en/latest/). COMS is designed to be a scalable and atomic microservice. 
On the OCP4 platform, there can be between 2 and 16 running replicas of the COMS microservice depending on service load. This allows the service to reliably handle a large variety of request volumes and scale resources appropriately. + +### Network Connectivity + +In general, all network traffic enters through the BC Govt. API Gateway. A specifically tailored Network Policy rule exists to allow only the network traffic we expect to receive from the API Gateway. When a client connects to the COMS API, the connection passes through OpenShift's router and load balancer before landing on the API Gateway. That connection is then forwarded to one of the COMS API pod replicas. Figure 1 represents the general network traffic direction with the outlined fat arrows. The direction of each arrow indicates which component initiates the TCP/IP connection. + +COMS uses a database connection pool to maintain persistent database connections. Pooling allows the service to avoid the overhead of a repeated TCP/IP 3-way handshake every time a connection is started. By reusing existing connections in the pool, we can pipeline traffic and improve network efficiency. Within our architecture, we pool connections from COMS to Patroni. The OpenShift load balancer follows the default Kubernetes scheduling behavior. + +### Database connection Pooling + +We introduced connection pooling for Patroni connections to mitigate network traffic overhead. As our volume of traffic increased, it became expensive to create and destroy network connections for each transaction. While low volumes of traffic can be handled without any notable delay to the user, we started encountering issues when scaling up and increasing total transaction flow within COMS. + +By reusing connections whenever possible, we were able to avoid the TCP/IP 3-way handshake done on every new connection. Instead, we could leverage existing connections to pipeline traffic and improve general efficiency.
We observed an almost 3x increase in total transaction throughput after switching to pooling. + +### Horizontal Autoscaling + +To make sure our application can scale horizontally (run many copies of itself), we had to ensure that all processes in the application are self-contained and atomic. Since we have no guarantee of which pod instance will be handling which task at any given moment, we must ensure that every unit of work is clearly defined and atomic so that we can prevent deadlocks or double executions. + +While implementing Horizontal Autoscaling is relatively simple using a [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) (HPA) construct in OpenShift, we can only take advantage of it if the application is able to handle the different types of pod lifecycles. Based on usage metrics such as CPU and memory load, the HPA can increase or decrease the number of replicas on the platform to meet demand. + +In our testing, we were able to reliably scale up to around 17 pods before our Patroni database began to crash. While we haven't been able to reliably isolate the cause, we suspect that either the underlying Postgres database can only handle up to 100 concurrent connections (and is thus ignoring Patroni's max connection limit of 500), or the database containers are simply running out of memory before they can handle more connections. This is why we decided to cap our HPA at a maximum of 16 pods at this time. + +Our current limiting factor for scaling higher is the number of connections our database can support. If we need to scale past 16 pods, we will need to consider more managed solutions for pooling database connections, such as [PgBouncer](https://www.pgbouncer.org/).
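If the connection ceiling is what limits scaling, the replica cap follows from simple arithmetic: every pod may open up to its pool's maximum number of connections, so the replica count times the per-pod pool size must stay under what the database can serve. The sketch below makes that budget explicit; `maxReplicas` is a hypothetical helper, and the pool size and reserved-connection count are illustrative inputs, not the actual COMS or Patroni settings.

```typescript
// Hypothetical sizing helper: the inputs are illustrative, not COMS or Patroni defaults.
function maxReplicas(
  dbMaxConnections: number, // connections the database can actually serve
  poolMaxPerPod: number, // upper bound of each pod's connection pool
  reserved: number = 0, // connections kept free for admin tools, replication, etc.
): number {
  if (poolMaxPerPod <= 0) {
    throw new RangeError("poolMaxPerPod must be positive");
  }
  const available = Math.max(0, dbMaxConnections - reserved);
  // Every replica may open its full pool, so replicas * poolMaxPerPod <= available.
  return Math.floor(available / poolMaxPerPod);
}

console.log(maxReplicas(100, 5)); // 20
console.log(maxReplicas(100, 5, 10)); // 18
```

A pooler such as PgBouncer changes the right-hand side of this budget: pods connect to the pooler, which multiplexes their sessions onto a smaller set of real server connections.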
diff --git a/docs/Architecture.md b/docs/Architecture.md new file mode 100644 index 0000000..899e395 --- /dev/null +++ b/docs/Architecture.md @@ -0,0 +1,57 @@ +This page outlines the general architecture and design principles of COMS. It is mainly intended for a technical audience, and for people who want to have a better understanding of how the system works. + +## Table of Contents + +- [Infrastructure](#infrastructure) +- [Database Structure](#database-structure) +- [Code Design](#code-design) + +## Infrastructure + +![COMS Architecture](images/coms_self_architecture.png) + +**Figure 1 - The general infrastructure and network topology of COMS** + +## Database Structure + +The PostgreSQL database schema is defined and maintained through managed, code-first migrations. We generally store tables containing users, objects, buckets, permissions, and how they relate to each other. As COMS is a back-end microservice, lines of business can leverage COMS without being tied to a specific framework or language. The following figures depict the database schema structure as of April 2023 for the v0.4.0 release. + +![COMS Public ERD](images/coms_erd_public.png) + +**Figure 3 - The public schema for a COMS database** + +Database design focuses on simplicity and succinctness. It effectively tracks the user, the object, the bucket, the permissions, and how they relate to each other. We enforce foreign key integrity by invoking onUpdate and onDelete cascades in Postgres. This ensures that we do not have dangling references when entries are removed from the system. Metadata and tags are represented as many-to-many relationships to maximize reverse search speed. + +![COMS Audit ERD](images/coms_erd_audit.png) + +**Figure 4 - The audit schema for a COMS database** + +We use a generic audit schema table to track any update and delete operations done on the database. This table is only modified by the database itself via table triggers, and is not normally accessible to the COMS application.
This should meet most general security, tracking and auditing requirements. + +## Code Design + +COMS is a relatively small and compact microservice with a very focused approach to handling and managing objects. However, not all design choices are self-evident just from inspecting the codebase. The following section will cover some of the main reasons why the code was designed the way it is. + +### Organization + +The code structure in COMS follows a simple, layered structure following best practice recommendations from Express, Node, and ES6 coding styles. The application has the following discrete layers: + +| Layer | Purpose | +| ---------- | -------------------------------------------------------------------------------------------- | +| Controller | Contains controller express logic for determining what services to invoke and in what order | +| DB | Contains the direct database table model definitions and typical modification queries | +| Middleware | Contains middleware functions for handling authentication, authorization and feature toggles | +| Routes | Contains defined Express routes for defining the COMS API shape and invokes controllers | +| Services | Contains logic for interacting with either S3 or the Database for specific tasks | +| Validators | Contains logic which examines and enforces incoming request shapes and patterns | + +Each layer is designed to focus on one specific aspect of business logic. Calls between layers are designed to be deliberate, scoped, and contained. This hopefully makes it easier to tell at a glance what each piece of code is doing and what it depends on. For example, the validation layer sits between the routes and controllers. It ensures that incoming network calls are properly formatted before proceeding with execution. + +#### Middleware + +COMS middleware focuses on ensuring that the appropriate business logic filters are applied as early as possible. 
Concerns such as feature toggles, authentication and authorization are handled here. Express executes middleware in the order in which it is registered: each middleware function runs in sequence and then invokes the `next` callback to pass control down the chain. Because of this, we must ensure that the order in which we register and execute our middleware adheres to the following pattern: + +1. Run the `require*` middleware functions first (these generally involve the middleware found in `featureToggle.js`) +2. Validation and structural checks +3. Permission and authorization checks +4. Any remaining middleware hooks before invoking the controller diff --git a/docs/Authentication.md b/docs/Authentication.md new file mode 100644 index 0000000..cc5b231 --- /dev/null +++ b/docs/Authentication.md @@ -0,0 +1,42 @@ +This page describes how to authenticate requests to the COMS API. The [Authentication Modes](Configuration#authentication-modes) must be enabled in the COMS configuration. + +**Note:** The BC Gov Hosted COMS service only allows OIDC Authentication using JWTs issued by the [Pathfinder SSO `standard` keycloak realm](https://github.com/bcgov/sso-keycloak/wiki#standard-service). + +## OIDC Authentication + +With [OIDC mode](Configuration#oidc-keycloak) enabled, requests to the COMS API can be authenticated using a **User ID token** (JWT) issued by an OIDC authentication realm. The JWT should be sent in an `Authorization` header (type `Bearer` token). + +COMS will only accept JWTs issued by one OIDC realm (specified in the COMS config). JWTs are typically issued to an application and saved to a user's browser when they sign in to a website through the [Authorization Code Flow](https://openid.net/specs/openid-connect-core-1_0.html#CodeFlowAuth). Both the website (client app) and the instance of COMS must be [configured to use the same OIDC authentication realm](https://github.com/bcgov/common-object-management-service/blob/master/app/README.md#keycloak-variables) in order for the JWT to be valid.
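A minimal sketch of attaching the user's ID token to a COMS request. It assumes `idToken` has already been obtained by your client app through the Authorization Code Flow, and that the runtime provides the standard `fetch` and `URL` APIs (modern browsers, Node.js 18+). `bearerHeaders` and `headObject` are hypothetical helper names, not part of COMS; the `HEAD /object/{objectId}` endpoint is the one described in the Endpoint Notes.

```typescript
// Build the Authorization header for OIDC mode from a user's ID token (JWT).
function bearerHeaders(idToken: string): Record<string, string> {
  if (idToken.trim() === "") {
    throw new Error("An ID token is required for OIDC mode");
  }
  return { Authorization: `Bearer ${idToken}` };
}

// Fetch an object's details without downloading its contents.
// baseUrl must end with "/" so that new URL() resolves the path beneath it,
// e.g. "https://coms-dev.api.gov.bc.ca/api/v1/".
function headObject(baseUrl: string, objectId: string, idToken: string): Promise<Response> {
  const url = new URL(`object/${encodeURIComponent(objectId)}`, baseUrl);
  return fetch(url.toString(), { method: "HEAD", headers: bearerHeaders(idToken) });
}
```

Keeping header construction in one helper makes it easy to send the same token to every COMS call and to fail early when the user is not signed in.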
+ +When COMS receives the request, it will validate the JWT (by calling the OIDC realm's token endpoint). The JWT is a reliable way of verifying the user's identity, on which the COMS permission model is based. + +Downloading an object also relies on S3 pre-signed URLs as part of its authentication flow: + +### Authentication flow for readObject + +Reference: see the [API Specification](https://coms.api.gov.bc.ca/api/v1/docs#tag/Object/operation/readObject) for more details. + +A common use case for COMS is to download a specific object from object storage. +Depending on the `download` mode specified in the request, the COMS `readObject` endpoint will return one of the following: + +1. The file directly from S3, by first doing an HTTP 302 redirect to a temporary pre-signed S3 object URL +2. The file streamed/proxied through COMS +3. The temporary pre-signed S3 object URL itself + +COMS uses the redirect flow by default because it avoids unnecessary network hops. For very large object transactions, redirection also has the added benefit of maximizing COMS microservice availability. Since the large transaction does not pass through COMS, COMS remains free to handle other client requests. + +![COMS Network Flow](images/coms_network_flow.png) + +**Figure 2 - The general network flow for a typical COMS object request** + +## Basic Auth + +If [Basic Auth Mode](Configuration#basic-auth) is enabled in your COMS instance, requests to the COMS API can be authenticated using an HTTP Authorization header (type `Basic`) containing the username and password configured in COMS. + +This mode offers more direct access for a 'service account' authorized in the scope of the application rather than for a specific user, and bypasses the COMS object/bucket permission model. + +Basic Auth mode is not available on the BC Gov hosted COMS service.
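In Basic Auth mode the header value follows the standard `Basic` scheme (RFC 7617): the word `Basic` followed by the base64 encoding of `username:password`. A small sketch, assuming a runtime that provides the global `btoa` and `TextEncoder` (modern browsers, Node.js 16+); `basicAuthHeader` is a hypothetical helper name.

```typescript
// Build a Basic Auth header value from the username/password configured in COMS.
function basicAuthHeader(username: string, password: string): string {
  if (username.indexOf(":") !== -1) {
    // RFC 7617: the user-id part must not contain a colon.
    throw new Error("Basic Auth usernames cannot contain ':'");
  }
  // btoa() only accepts Latin-1 characters, so encode to UTF-8 bytes first.
  const bytes = new TextEncoder().encode(`${username}:${password}`);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `Basic ${btoa(binary)}`;
}

console.log(basicAuthHeader("user", "pass")); // Basic dXNlcjpwYXNz
```

Send the result as the `Authorization` header. Because this mode bypasses the permission model, these credentials belong in server-side code rather than in a browser app.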
+ +## Unauthenticated Mode + +[Unauthenticated Mode](Configuration#unauthenticated-auth) configuration is generally recommended when you expect to run COMS in a highly secured network environment and do not have concerns about access control to objects as you have another application handling that already. diff --git a/docs/Buckets.md b/docs/Buckets.md new file mode 100644 index 0000000..86499c1 --- /dev/null +++ b/docs/Buckets.md @@ -0,0 +1,18 @@ + +### Configuring Buckets + +- COMS is [configured with a 'default' bucket](Configuration#object-storage). Various object management endpoints will use this bucket if no `bucketId` parameter is provided. (**Note:** the default bucket fall-back behaviour is not available in the BC Gov Hosted COMS service.) + +- Additional buckets can be added to the COMS system using the [createBucket](https://coms.api.gov.bc.ca/api/v1/docs#tag/Bucket/operation/createBucket) endpoint. + +- When a bucket is created, if the createBucket API request is authenticated with a User ID token (JWT), that user will be granted all [5 permissions](Permissions#permission-codes). Bucket Permissions can be granted to other users ([bucketAddPermissions](https://coms.api.gov.bc.ca/api/v1/docs#tag/Permission/operation/bucketAddPermissions)), if the request is authenticated with a JWT for a user with `MANAGE` permission. + +If you are self-hosting COMS you can also manage permissions for any object or bucket by using these endpoints with [basic authentication](Authentication#basic-auth). + +### Using the Bucket **Key** + +When you create a bucket in COMS, technically you are 'mounting' your S3 bucket (actual bucket provisioned) at a specified path in the `key` property of the [createBucket](https://coms-dev.api.gov.bc.ca/api/v1/docs#tag/Bucket/operation/createBucket) request body. + +COMS will only operate with objects at that 'folder' within the actual bucket. A COMS `bucket` can more accurately be thought of as a 'mount' to a single path within a bucket. 
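The 'mount' idea can be expressed as a key-prefix check. This is a sketch of the behaviour as this page describes it (a COMS bucket covers only the objects directly at its path, and objects in deeper 'sub-folders' need a mount of their own), not the COMS implementation; `isDirectlyUnderMount` is a hypothetical helper.

```typescript
// Sketch of the documented mount behaviour, not COMS's actual code.
// mountKey: the `key` given to createBucket; objectKey: a full S3 object key.
function isDirectlyUnderMount(mountKey: string, objectKey: string): boolean {
  // Treat "/docs/reports/" and "docs/reports" as the same mount path.
  const trimmed = mountKey.replace(/^\/+|\/+$/g, "");
  const prefix = trimmed === "" ? "" : `${trimmed}/`;
  if (objectKey.indexOf(prefix) !== 0) return false;
  const rest = objectKey.slice(prefix.length);
  // Objects in deeper "sub-folders" would need a COMS bucket mounted there.
  return rest !== "" && rest.indexOf("/") === -1;
}

console.log(isDirectlyUnderMount("docs/reports", "docs/reports/a.pdf")); // true
console.log(isDirectlyUnderMount("docs/reports", "docs/reports/2024/a.pdf")); // false
```

Appending `/` to the prefix keeps a mount at `docs/reports` from accidentally matching a sibling folder such as `docs/reportsX/`.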
+ +To work with objects in 'sub-folders' (with other prefixes), you can create multiple COMS 'buckets' mounted at different paths by specifying different keys. diff --git a/docs/Endpoint-Notes.md b/docs/Endpoint-Notes.md new file mode 100644 index 0000000..5b5463f --- /dev/null +++ b/docs/Endpoint-Notes.md @@ -0,0 +1,68 @@ +This page outlines the general usage patterns and organization of the COMS API. This article is intended for a technical audience, and for people who are planning on using the API endpoints. + +**The COMS API is documented using the [OpenAPI Specification](https://coms.api.gov.bc.ca/api/v1/docs)** + +## Table of Contents + +- [Bucket](#bucket) +- [Object](#object) + - [Metadata](#metadata) + - [Tag](#tag) + - [Versions](#versions) +- [Permission](#permission) +- [Sync](#sync) +- [User](#user) + +## Bucket + +Bucket operations offer the usual CRUD operations for bucket resource management. For example: + +- `CREATE /bucket` and `PATCH /bucket/{bucketId}` will pre-emptively check whether the proposed credential changes represent a network-accessible bucket. These endpoints will return an error if they are unable to validate the bucket. + +## Object + +Object endpoints directly manipulate S3 objects and the information inherent to them. These endpoints form the core of COMS, focusing on CRUD operations for the objects themselves. + +- Uploading (`POST /object`) or updating an object (`POST /object/{objectId}`) accepts a file in a multipart/form-data body. You can include metadata (via headers) and tags (via query parameters) in this request. +- `GET /object/{objectId}` is the main endpoint for users to directly access and download a single object. +- `HEAD /object/{objectId}` should be used when you need information about the object but do not want the binary stream of the object itself. +- `DELETE /object/{objectId}` deletes either the object or a specific version of the object.
COMS follows the S3 standard for [deleting versioned objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeletingObjectVersions.html) + - If versioning is enabled, calling `DELETE /object/{objectId}` is a soft-delete that adds a 'delete-marker' version. To restore the object, remove the delete-marker with `DELETE /object/{objectId}?versionId={VersionId of delete-marker}`. To hard-delete a versioned object, you must delete the last version with `DELETE /object/{objectId}?versionId={last version}`. + - Calling the `DELETE` endpoint on a bucket without versioning performs a hard-delete. +- The `GET /object` search and the `PATCH /object/{objectId}/public` public toggle require a backing database in order to function. + +### Metadata + +Metadata operation endpoints focus on manipulating the metadata of S3 objects. Each endpoint will create a copy of the object with the modified metadata attached. + +More details found here: [Metadata and Tags](Metadata-Tag) + +### Tag + +Tag operation endpoints focus on manipulating the tags of S3 objects. Unlike metadata, tags can be modified without the need to create new versions of the object. + +More details found here: [Metadata and Tags](Metadata-Tag) + +### Versions + +Version-specific operations focus on listing and discovering the versioning information known to COMS. While the majority of version-specific operations are available as query parameters on the Object endpoints, the `GET /object/{objectId}/version` endpoint lets users discover and list which versions are available to work with. + +## Permission + +Permission operation endpoints focus on associating users with objects through specific permissions. All of these endpoints require a database to function.
Existing permissions can be searched for using `GET /permission/object` and `GET /permission/bucket`, and standard create, read and delete operations for permissions exist to allow users to modify access control for specific objects they have management permissions over. + +More details found here: [Permissions](Permissions) + +## Sync + +*Available in COMS v0.7+* + +Sync endpoints allow synchronizing COMS' internal state with that of the actual S3 bucket/object. This can be useful for setting up a S3 bucket with preexisting files for use with COMS without having to re-upload everything through the COMS API, or for synchronizing changes made through an external S3 client (e.g. S3 Browser, Cyberduck etc) to an object already managed by COMS. + +API calls to the sync endpoints do not immediately add all detected changes to COMS' internal database; instead, they are added to a queue where they are eventually processed. The endpoint `GET /sync/status` returns the number of items that are currently sitting in this queue. + +At the time of writing, synchronization is not done automatically, so the sync endpoints must be used in order for COMS to know of any changes to the bucket/object. + +## User + +User operation endpoints focus on exposing known tracked users and identity providers. These endpoints serve as a reference point for finding the right user and identity to manipulate in the Permission endpoints. As COMS is relatively agnostic to how a user logs in (it only cares that you exist), the onus of determining which identity provider a user uses falls onto the line of business to handle, should that be something that needs monitoring. diff --git a/docs/Hosted-Service-Onboarding.md b/docs/Hosted-Service-Onboarding.md new file mode 100644 index 0000000..9ee2878 --- /dev/null +++ b/docs/Hosted-Service-Onboarding.md @@ -0,0 +1,42 @@ + +The COMS API is available as a hosted service for BC Government client applications. 
+ +Some important aspects of the hosted service to consider: + +### Authentication + +- Requests to the COMS API must be authorized using a **User ID token** (OAuth JWT) issued in the Pathfinder SSO ['Standard'](https://github.com/bcgov/sso-keycloak/wiki#standard-service) realm. Typically, a user would sign in to your app (website) and your app would call COMS with that user's JWT. + +- Basic Auth or authentication using a service account (e.g. client credentials) is currently not available on the Hosted COMS service. These options are available if you self-host COMS. + +### Acquiring a Bucket + +- Object Storage buckets must be obtained by the client. Any S3-compatible bucket will work (for example: AWS S3 and Minio). OCIO provides a low-cost [Object Storage service](https://ssbc-client.gov.bc.ca/services/ObjectStorage/overview.htm). NRM clients can request a bucket through the [Optimization Team](https://apps.nrs.gov.bc.ca/int/confluence/display/OPTIMIZE/NRM+Object+Storage+Service). + +- Once provisioned, you can add your bucket to COMS using the [createBucket](https://coms.api.gov.bc.ca/api/v1/docs#tag/Bucket/operation/createBucket) endpoint. See: [Managing Buckets](Buckets). + +- **Bucket credentials** (`Access Key ID` and `Secret Access Key`) are stored in the database as encrypted strings. Encryption is done by NodeJS's internal `crypto` library. The encryption key is assigned to a `SERVER_PASSPHRASE` environment variable, and is only available inside the scope of the COMS app container. + +### Privacy Controls + +- The stricter [Privacy Controls](Configuration#privacy-controls) setting is enabled in the Hosted service (`READ` permission on the bucket or object is required to discover or access the file and related data). This removes the ability to search for objects that you don't have permission to read.
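Conceptually, strict privacy mode filters every result through the caller's `READ` grants, on either the object itself or the bucket that contains it. The sketch below illustrates that rule with in-memory data only; the `ObjectRecord` and `Grant` shapes and the `visibleObjects` helper are hypothetical and do not reflect how COMS implements the filtering internally.

```typescript
// Hypothetical shapes for illustration; not the COMS database schema.
interface ObjectRecord {
  id: string;
  bucketId: string;
}

interface Grant {
  userId: string;
  permCode: string; // e.g. "READ" or "MANAGE"
  objectId?: string; // set for object-level grants
  bucketId?: string; // set for bucket-level grants
}

// A user only discovers objects they can READ, either directly on the
// object or through a READ grant on the bucket that contains it.
function visibleObjects(userId: string, objects: ObjectRecord[], grants: Grant[]): ObjectRecord[] {
  const readGrants = grants.filter((g) => g.userId === userId && g.permCode === "READ");
  return objects.filter((o) =>
    readGrants.some((g) => g.objectId === o.id || g.bucketId === o.bucketId)
  );
}
```

Under this rule an object without a matching `READ` grant is simply absent from search results, rather than being returned with an access error.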
+ +### Additional features + +- **BCBox Integration:** Using the Hosted COMS service has the added benefit of being able to integrate your application with [BCBox](https://bcbox.nrs.gov.bc.ca/), a hosted drop-box style interface for sharing files. + +- A **Synchronization** feature is [coming soon](Product-Roadmap) that will allow COMS to manage objects that were already in the bucket or were modified outside of the COMS API. + +### Environments + +- As part of your development workflow, ensure your application is using the correct COMS environment. COMS only accepts JWTs issued in the corresponding SSO `standard` realm. + + COMS environments: + - Development: [https://coms-dev.api.gov.bc.ca/api/v1/](https://coms-dev.api.gov.bc.ca/api/v1/) + - Test: [https://coms-test.api.gov.bc.ca/api/v1/](https://coms-test.api.gov.bc.ca/api/v1/) + - Production: [https://coms.api.gov.bc.ca/api/v1/](https://coms.api.gov.bc.ca/api/v1/) +

+*** +
+ +**Note:** Please also review the [Hosting Considerations](Hosting-Considerations) page, and reasons to [self-host](Self-Hosting-COMS). diff --git a/docs/Hosting-Considerations.md b/docs/Hosting-Considerations.md new file mode 100644 index 0000000..3b77ca6 --- /dev/null +++ b/docs/Hosting-Considerations.md @@ -0,0 +1,16 @@ +### Should I self-host COMS or use the hosted service? + +Feature Comparison: + +|   Feature |   Hosted |   Self-Hosted | +| :--- | :--- | :--- | +|   Keycloak Realm |   SSO '[Standard Realm](https://github.com/bcgov/sso-keycloak/wiki#standard-service)' |   any OIDC realm +|   IDP support |   `IDIR`
  `Basic BCeID`
  `Business BCeID` |   Configurable +|   [BCBox](https://bcbox.nrs.gov.bc.ca/) integration | | +|   Hosting Platform |   [OpenShift](Architecture-Hosted#infrastructure) |   [Source Code](https://github.com/bcgov/common-object-management-service/)
  [Docker](https://hub.docker.com/r/bcgovimages/common-object-management-service/)
  [OpenShift](Architecture-Hosted#infrastructure) +|   Database Custodians |   Us |   You +|   Object Storage Custodians |   You |   You +|   Multi-bucket support | | +|   Strict [Privacy mode](Configuration#privacy-controls) | |   Configurable +|   [No-Auth mode](Configuration#unauthenticated)| |   Configurable +|   Custom configuration options | | diff --git a/docs/Metadata-Tag.md b/docs/Metadata-Tag.md new file mode 100644 index 0000000..ee02559 --- /dev/null +++ b/docs/Metadata-Tag.md @@ -0,0 +1,85 @@ +This page outlines the general design used for managing Metadata and Tags on S3 Objects. This page is mainly targeted at users and at people who are planning on implementing and leveraging the API endpoints. + +## Table of Contents + +- [Overview](#overview) + - [Metadata](#metadata) + - [Tag](#tag) +- [Usage in COMS](#usage-in-coms) + - [General Operations](#general-operations) + - [Search](#search) + +## Overview + +In general, metadata is "data that provides information about other data", but is not considered a part of the content of the data itself. Your line of business may require metadata to do things like the following: + +- Describe the contents of the object +- Explain the structure of the object +- Track administrative lifecycles of the object +- Reference other related objects +- Record legal/licensing information about the object + +For these scenarios, having a pragmatic way to assign, manage, and look up these pieces of metadata effectively is indispensable. While S3 does support assigning and managing metadata and tags, the S3 API does not provide a way to efficiently search for objects using metadata and tags. This is where COMS can fill in the gap. + +### Metadata + +S3 supports the manipulation of metadata on S3 objects. The key behavior to understand with metadata is that in S3, metadata is considered a part of the object definition itself.
As such, each operation on metadata will create a copy of the object with the modified metadata attached. When the metadata for an object has to change, if the object resides in a version-enabled bucket, it will create a new version of the object with the new metadata and a copy of the original object bytestream. + +Other general key notes to consider when implementing user-defined metadata are the following: + +- S3 stores user-defined metadata keys in lowercase. +- The request header maximum size for user-defined metadata shall not exceed 2KB in size. +- The size of user-defined metadata is measured by taking the sum of the number of bytes in the UTF-8 encoding of each key and value. +- Avoid using characters outside the US-ASCII and UTF-8 standards for metadata values + +More details found here: [AWS: Working with object metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html) + +### Tag + +S3 also supports the manipulation of tags on S3 objects. While tags are logically similar to metadata, S3 treats tags differently than metadata. The key behavior to understand with tags is that in S3, unlike metadata, tags can be modified without the need to create new versions of the object. As such, operations on tags can be ad-hoc manipulated without triggering the creation of a new version of the object. + +Other general key notes to consider when implementing user-defined tags are the following: + +- Only up to 10 tags may be associated with an object at a time. +- Tags that are associated with an object must have unique tag keys. +- A tag key can be up to 128 Unicode characters in length +- A tag value can be up to 256 Unicode characters in length. +- Keys and values are case sensitive. + +More details found here: [AWS: Categorizing your storage using tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) + +## Usage in COMS + +COMS for the most part follows the general patterns found in the S3 API. 
To this end, since S3 handles metadata in `x-amz-meta-*` headers, COMS does the same. Since tags do not have as well-defined a structure, COMS follows the spirit of the Tagging > TagSet structure by using a deepObject query model to define multiple key/value tagsets in the query (for example, `tagset[x]=a&tagset[y]=b`). In COMS, the pattern for interacting with metadata and tags is consistent across the entire API, so you can expect metadata in headers and tags in the query parameters. + +### General Operations + +For most general object operations, we recommend that you define your metadata and tags, should you need them, when creating or uploading the objects themselves. This is because the createObject and updateObject endpoints can both handle metadata and tags in the same request. By defining them during the creation and update stages, you can minimize the number of network calls made to COMS and the S3 endpoint. + +However, we also support out-of-band metadata and tag manipulation with a set of PATCH, PUT and DELETE operations. These operations allow you to add, replace or delete metadata and tags respectively for a specific object. New object versions will be transparently generated if metadata is altered. As such, COMS supports the full lifecycle of metadata and tag management at any point in time. + +### Search + +One of the most powerful features of COMS is its dynamic searchObjects endpoint. Using its database, it is capable of searching metadata and tags. + +This search works using a set intersection model; you can be as specific or as broad with your search parameters as you need, and the search endpoint will return the objects that match all of them. + +For example, to search for objects matching the following criteria...
+
+* **Metadata:** `foo=bar`, `baz=bam`
+
+* **Tags:** `x=a`, `y=b`
+
+...add the following to the API call:
+
+* **Headers:** `x-amz-meta-foo=bar`, `x-amz-meta-baz=bam`
+
+* **Query parameters:** `tagset[x]=a&tagset[y]=b`
+
+The response will be a list of objects that have all of the specified metadata and tags, as expected for a set intersection.
+
+The search endpoint also lets you search for objects that have a specific key, regardless of its value. For example, searching for objects with the metadata key `foo` (i.e. `x-amz-meta-foo=""`) and the tag key `x` (i.e. `tagset[x]`) will return objects that have both keys, whatever their corresponding values are.
+
+These metadata and tag selectors can also be combined with the other query parameters supported by [the search query endpoint](https://coms-dev.api.gov.bc.ca/api/v1/docs#tag/Object/operation/searchObjects).
+
+Search results can also be scoped to the current user's permissions by enabling the COMS `PrivacyMask` [Privacy Configuration](Configuration#privacy-controls).
diff --git a/docs/Permissions.md b/docs/Permissions.md
new file mode 100644
index 0000000..91a0fbb
--- /dev/null
+++ b/docs/Permissions.md
+This page outlines the general design used for managing user access control to S3 objects. It targets users and developers who are planning to implement and leverage COMS.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Access Model](#access-model)
+  - [Endpoints](#endpoints)
+- [Permission Codes](#permission-codes)
+  - [Bucket/Object Inheritance](#bucketobject-inheritance)
+    - [Examples](#examples)
+    - [Response Scope Expansion](#response-scope-expansion)
+  - [Mode Considerations](#mode-considerations)
+- [Overrides](#overrides)
+  - [Public](#public)
+
+## Overview
+
+One of the core features of COMS is its focus on leveraging your specified Identity and Access Management (IAM) provider to manage access control and permissions to your resources. Access control on bucket and object resources is enforced when COMS is running in either `OIDCAUTH` or `FULLAUTH` mode. There are several notable nuances to how COMS leverages these permissions, which are discussed in further depth below.
+
+## Access Model
+
+COMS uses a Discretionary Access Control (DAC) model for granting access to and sharing buckets and objects. This model maximizes the ability of users and clients to choose, at will, whom they share their resources with. The primary benefits of the DAC model are:
+
+1. Simplicity - as long as a user has a permission attached to the resource, they will be able to access the resource.
+1. Flexibility - decentralized access control management, allowing resource owners to grant and revoke access to their objects at will without the overhead of going through a chain of command.
+1. Granularity - the data owner is able to add or remove access permissions based on individual needs and concerns.
+
+The key takeaway from the COMS access control model is its decentralized design. The original creator of a resource has general ownership rights to share and distribute it at will.
+
+### Endpoints
+
+A suite of endpoints under the `/permission` path lets users interact with the COMS permission system. These endpoints focus on the following goals:
+
+- List and search for all resources that a user has explicit or implicit permissions to
+- Create, update and delete permission bindings between users and bucket or object resources
+
+Permission operation endpoints focus on associating users with resources through specific permissions. Existing permissions can be searched for using `GET /permission/`, and standard create, read and delete operations allow users to modify access control for the specific resources they have management permissions over.
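The access rules described on this page (bucket permissions cascading to objects, the public-object override for reads, and the authentication-mode exceptions) can be condensed into a small decision function. The following is a minimal TypeScript sketch under stated assumptions, not the COMS implementation: the `Binding` and `ObjectRef` shapes, the `usedBasicAuth` flag, and the `canAccessObject` and `visibleBucketIds` names are all hypothetical. The sketch also assumes that each binding targets either a bucket or an object.

```typescript
// Hypothetical, simplified model of the COMS permission rules; not the COMS source.
type PermCode = "CREATE" | "READ" | "UPDATE" | "DELETE" | "MANAGE";
type AuthMode = "NOAUTH" | "BASICAUTH" | "OIDCAUTH" | "FULLAUTH";

// Assumption: a permission binding targets either a bucket or an object.
interface Binding {
  userId: string;
  permCode: PermCode;
  bucketId?: string;
  objectId?: string;
}

interface ObjectRef {
  id: string;
  bucketId: string;
  isPublic: boolean;
}

// Can `userId` perform `perm` on `obj`?
function canAccessObject(
  mode: AuthMode,
  usedBasicAuth: boolean,
  userId: string,
  obj: ObjectRef,
  perm: PermCode,
  bindings: Binding[],
): boolean {
  // NOAUTH and BASICAUTH ignore permissions entirely.
  if (mode === "NOAUTH" || mode === "BASICAUTH") return true;
  // In FULLAUTH, a Basic authorization header behaves as a system superuser.
  if (mode === "FULLAUTH" && usedBasicAuth) return true;
  // The public flag bypasses the DAC for the read object operation.
  if (perm === "READ" && obj.isPublic) return true;
  // Otherwise a matching binding on the object, or on its bucket, is required
  // (bucket permissions "cascade" to the objects inside the bucket).
  return bindings.some(
    (b) =>
      b.userId === userId &&
      b.permCode === perm &&
      (b.objectId === obj.id || b.bucketId === obj.bucketId),
  );
}

// Which buckets does the user know about? With objectPerms, a binding on any
// object inside a bucket makes that bucket visible, without granting READ on it.
function visibleBucketIds(
  userId: string,
  bindings: Binding[],
  objects: ObjectRef[],
  objectPerms: boolean,
): Set<string> {
  const visible = new Set<string>();
  for (const b of bindings) {
    if (b.userId !== userId) continue;
    if (b.bucketId !== undefined) visible.add(b.bucketId);
    if (objectPerms && b.objectId !== undefined) {
      const obj = objects.find((o) => o.id === b.objectId);
      if (obj) visible.add(obj.bucketId);
    }
  }
  return visible;
}
```

In this model, a user who holds only `UPDATE` on bucket B can update any object in B. A `READ` binding on a single object in bucket C makes C visible only when `objectPerms` is set, mirroring the Alice examples on this page.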
+
+Any authorized user can query the current permission state to determine whether they have access to certain resources. However, only users with the `MANAGE` permission on a resource can modify, grant and revoke permissions on that resource at their discretion.
+
+## Permission Codes
+
+The COMS DAC model has five discrete permission codes. Each code represents a different set of actions that may be performed on the resource. For the most part, the permissions follow general CRUD principles and should be relatively self-explanatory.
+
+| PermCode | Permission | Description |
+| --- | --- | --- |
+| `CREATE` | Create | Grants resource creation permission. Normally only the owner will have this permission assigned. |
+| `READ` | Read | Grants resource read permission. Ignored for objects that have been made public. |
+| `UPDATE` | Update | Grants resource update permission. Allows the user to upload a new version and/or edit metadata/tags for the object, or to edit bucket details. |
+| `DELETE` | Delete | Grants resource deletion permission. Allows the user to delete objects and versions. |
+| `MANAGE` | Manage | Grants resource permission management. Allows the user to add/remove these permissions for other users. |
+
+Note that if you have the `MANAGE` permcode, it is possible to remove your own permissions and lock yourself out of your own resources if you are not careful! COMS does not provide any safeguards against accidental lockouts when authenticating as a user. If this occurs, you must contact the custodian of your COMS instance to restore your permissions on the affected resources.
+
+### Bucket/Object Inheritance
+
+As of COMS v0.4.0, permissions are multi-leveled and apply to both buckets and objects. While the original core concepts of the DAC model still apply, there are a few key points of note:
+
+- Permission grants and revocations focus on binding a user to a specific resource with a specific permission code.
+- Objects reside in specific buckets. As such, permissions on objects are now computed based on whether you have permissions on either the bucket or the object.
+- You can expand the response scope to also include inherited permissions by adding the `bucketPerms` or `objectPerms` query parameter on the respective endpoints.
+
+#### Examples
+
+While the following examples are non-exhaustive, they provide a general idea of how permission transitivity applies in COMS.
+
+Suppose Alice wishes to update object O, which resides in bucket B. For this to happen, one of the following must be true:
+
+1. Alice has the `UPDATE` permission for object O.
+2. Alice has the `UPDATE` permission for bucket B. As object O resides in bucket B, the permission "cascades" to the object.
+
+Suppose Alice wishes to manage object O, which resides in bucket B. For this to happen, one of the following must be true:
+
+1. Alice has the `MANAGE` permission for object O.
+2. Alice has the `MANAGE` permission for bucket B. As object O resides in bucket B, the permission "cascades" to the object.
+
+Suppose Alice wishes to read, or at least see, bucket B. For this to happen, one of the following must be true:
+
+1. Alice has the `READ` permission for bucket B.
+2. Alice has at least one permission binding with an object O that resides in bucket B. This is visible by using the `objectPerms` query parameter. In this scenario, Alice cannot read the bucket itself, but they will know that it exists.
+
+Suppose Alice wishes to list all buckets they have access to. Alice will know about bucket B when one of the following conditions is true:
+
+1. Alice has at least one permission binding with bucket B.
+2. Alice has at least one permission binding with an object O residing in bucket B. This is visible by using the `objectPerms` query parameter.
+
+#### Response Scope Expansion
+
+There will be situations where you want to expand the scope of your permission search to also include implicitly accessible resources. This can be done by adding either the `objectPerms` or `bucketPerms` query parameter to your API call. For example:
+
+`GET /permission/bucket?userId=2d7f3e23-4643-47dc-b4b8-451c0844251e&objectPerms=true`
+
+```json
+[
+  {
+    "bucketId": "13e4e09b-5f79-48ab-985e-e4dc753a8b6a",
+    "permissions": [
+      {
+        "id": "9fae9b19-6db3-40f2-b644-53a6c3fa87a6",
+        "bucketId": "13e4e09b-5f79-48ab-985e-e4dc753a8b6a",
+        "userId": "2d7f3e23-4643-47dc-b4b8-451c0844251e",
+        "permCode": "CREATE",
+        "createdBy": "2d7f3e23-4643-47dc-b4b8-451c0844251e",
+        "createdAt": "2022-08-24T23:00:29.806Z",
+        "updatedBy": null,
+        "updatedAt": "2022-08-24T23:00:29.756Z"
+      },
+      {
+        "id": "ce80040d-eb44-4170-8aea-364db8cab74a",
+        "bucketId": "13e4e09b-5f79-48ab-985e-e4dc753a8b6a",
+        "userId": "2d7f3e23-4643-47dc-b4b8-451c0844251e",
+        "permCode": "READ",
+        "createdBy": "2d7f3e23-4643-47dc-b4b8-451c0844251e",
+        "createdAt": "2022-08-24T23:00:29.806Z",
+        "updatedBy": null,
+        "updatedAt": "2022-08-24T23:00:29.756Z"
+      }
+    ]
+  },
+  {
+    "bucketId": "ce602214-8da4-48a2-a994-877e0415ea64",
+    "permissions": []
+  }
+]
+```
+
+`GET /permission/object?userId=2d7f3e23-4643-47dc-b4b8-451c0844251e&bucketPerms=true`
+
+```json
+[
+  {
+    "objectId": "13e4e09b-5f79-48ab-985e-e4dc753a8b6a",
+    "permissions": [
+      {
+        "id": "9fae9b19-6db3-40f2-b644-53a6c3fa87a6",
+        "objectId": "13e4e09b-5f79-48ab-985e-e4dc753a8b6a",
+        "userId": "2d7f3e23-4643-47dc-b4b8-451c0844251e",
+        "permCode": "CREATE",
+        "createdBy": "2d7f3e23-4643-47dc-b4b8-451c0844251e",
+        "createdAt": "2022-08-24T23:00:29.806Z",
+        "updatedBy": null,
+        "updatedAt": "2022-08-24T23:00:29.756Z"
+      },
+      {
+        "id": "ce80040d-eb44-4170-8aea-364db8cab74a",
+        "objectId": "13e4e09b-5f79-48ab-985e-e4dc753a8b6a",
+        "userId": "2d7f3e23-4643-47dc-b4b8-451c0844251e",
+        "permCode": "READ",
+        "createdBy": "2d7f3e23-4643-47dc-b4b8-451c0844251e",
+        "createdAt": "2022-08-24T23:00:29.806Z",
+        "updatedBy": null,
+        "updatedAt": "2022-08-24T23:00:29.756Z"
+      }
+    ]
+  },
+  {
+    "objectId": "ce602214-8da4-48a2-a994-877e0415ea64",
+    "permissions": []
+  }
+]
+```
+
+The key point about these scope-expanded responses is that they reveal a broader list of objects or buckets. However, because these additional entries are implicit inferences, their `permissions` arrays will not contain any explicit permission objects. For more details and clarification, refer to the COMS OpenAPI specification, which can be found under the `/api/v1/docs` path of your COMS instance.
+
+### Mode Considerations
+
+The above permission system is only enforced if your instance of COMS is running in either `OIDCAUTH` or `FULLAUTH` mode. COMS also requires a database, as it needs a way of persisting permission information. However, the following modes have alternate behaviors:
+
+- Both `NOAUTH` and `BASICAUTH` modes completely ignore permissions, as permission and security enforcement is out of scope for these modes. This applies whether there is a backing database or not.
+- While running in `FULLAUTH` mode, if the client authenticates with a Basic authorization header, permissions are ignored, as Basic auth behaves as a system superuser with "sudo" permissions over the COMS system. This applies whether there is a backing database or not.
+
+For more specific information on COMS deployment modes and how they differ, please take a look at the COMS [Configuration guide](Configuration#authentication-modes).
+
+## Overrides
+
+While the COMS DAC model is generally precise about what users are able to do, there are specific escape hatches and situations where COMS supersedes or ignores the DAC in favor of a different ruleset. This is because, while the DAC is rich and capable of expressing many access control needs, there are situations where the user may not be known in advance, so binding a permission to that potential user is not possible.
+
+### Public
+
+The simplest way for data owners to share their files is to set the object as public through the `PATCH /object/:objId/public` endpoint. When an object is public, any anonymous user or entity will be able to read and download that specific object. An object with the public flag ignores all of the granular DAC permission codes for the read object operation.
diff --git a/docs/Product-Roadmap.md b/docs/Product-Roadmap.md
new file mode 100644
index 0000000..921bc78
--- /dev/null
+++ b/docs/Product-Roadmap.md
+Below is a rough outline of the features that we are targeting for COMS.
+
+## v0.1.0 - Minimum Viable Product (MVP)
+
+### General
+
+* [x] General Documentation
+* [x] Database action auditing
+* [x] Over 50% coverage in unit tests
+
+### Authentication
+
+* [x] Multiple authentication modes
+  * [x] Unauthenticated
+  * [x] Basic
+  * [x] OIDC
+  * [x] Full (Basic and OIDC)
+* [x] Support multiple identity providers (e.g. IDIR, BCeID)
+
+### Object Operations
+
+* [x] Upload multiple objects to storage
+* [x] Expiring object download links
+* [x] Object versioning and history
+
+### Permission Management
+
+* [x] Share objects
+* [x] Search for OIDC users
+* [x] Toggle object for public access
+* [x] Update and manage object permissions
+
+## v0.2.0 - Version Tracking
+
+* [x] Explicit version management
+* [x] Soft-delete objects
+* [x] Enhance validation layer support
+
+## v0.3.0 - Metadata & Tagging
+
+* [x] Object metadata
+* [x] Object tagging
+* [x] Object discovery via metadata/tagging
+
+## v0.4.0 - Multi-tenant Bucket support
+
+* [x] Multi-bucket support
+* [x] Database refactor
+* [x] Permission cascade/inheritance refactor
+* [x] API shape extensions
+
+## v0.4.1 - Reliability Pass
+
+* [x] Improve architectural documentation
+* [x] Refactor objection model mock pattern; improve unit test coverage
+* [x] Validate bucket credentials on add and allow folder key to be optional
+* [x] Performance improvements
+
+## v0.4.2 - Domain Scoped Metadata Keys
+
+* [x] Change mandatory metadata keys to be domain scoped
+
+## v0.5.0 - Filename Support
+
+* [x] Support arbitrary filenames for S3 Objects
+* [x] Add First Nations glyph support to filenames
+* [x] Track and enforce `coms-id` as an S3 tag
+* [x] Deprecate `coms-id` and `coms-name` S3 metadata enforcement
+* [x] Add tracking support for version-specific S3 ETags
+
+## v0.6.0 - File Transit
+
+* [x] Add new PUT endpoints for uploading files
+* [x] Extend filesize limit from 50GB to 5TB
+* [x] Add more consistent error responses for upload failure situations
+
+## v0.7.0 - Synchronization
+
+* [x] Bucket and object synchronization support
+  * [x] Add retroactive backfill for existing S3 buckets and directories
+  * [x] Retroactive bucket and object database population
+  * [x] Implement merge-conflict logic for data collision scenarios
+  * [x] Add global synchronization support for `coms-id` tags
+  * [x] Add proactive `coms-id` tag annotation support for objects
+  * [x] Add status probe endpoint for synchronization
+* [x] Fix request timeouts from large file uploads
+* [x] Remove metadata key/value length constraints
+* [x] RFC 7807 error reporting compliance
+
+## v0.8.0 - Invite Links, Pagination and S3 Public synchronization
+
+* [x] Add paginated object search support
+* [x] Add public permission tracking support from S3 Object ACL endpoints
+* [x] Track S3 last modified date on versions
+* [x] Add bucket last synchronized date and object last synced date
+* [x] Add child bucket creation support
+* [x] Improve environment variable support for recognizing truthy string representations
+* [x] Add invite link deferred permission grant support (READ)
+* [x] Security improvements and various bugfixes
+
+## v0.9.0 - Continuous Improvement
+
+* [ ] Add additional permissions to invite link
+
+## TBD - Feature ideas only - subject to further feedback
+
+* [ ] TBD
\ No newline at end of file
diff --git a/docs/Self-Hosting-COMS.md b/docs/Self-Hosting-COMS.md
new file mode 100644
index 0000000..d8a0913
--- /dev/null
+++ b/docs/Self-Hosting-COMS.md
+
+To compare features with the BC Gov Hosted Service, see the [Hosting Considerations](Hosting-Considerations) page.
+
+## Reasons to self-host
+
+- There's a [Docker image](https://hub.docker.com/r/bcgovimages/common-object-management-service/) and [Helm chart](https://github.com/bcgov/common-object-management-service/blob/master/charts/coms/Chart.yaml) to help deploy COMS on OpenShift.
+- Your application uses a custom OIDC realm or has custom integration requirements with other IDPs.
+- You just need a user-friendly, REST-based S3 client 'wrapper'.
+- You can configure COMS to suit your needs:
+  - Choose one of the different [Authentication Modes](Configuration#authentication-modes)
+  - Set a default S3 bucket to use for all operations
+  - Disable the strict [Privacy Controls](Configuration#privacy-controls) to make object metadata searchable
+- You want to modify the COMS source code before running it (it's a REST API built with Node.js and Express)
+- You want to be the custodian of the COMS database that contains user permissions and document metadata
+
+## Getting started
+
+To run COMS on your local computer, see the following:
+
+- [Application README](https://github.com/bcgov/common-object-management-service/blob/master/app/README.md)
+- [Docker Image](https://hub.docker.com/r/bcgovimages/common-object-management-service/)
+- [GitHub repo](https://github.com/bcgov/common-object-management-service/)
+- [API Specification](https://coms.api.gov.bc.ca/api/v1/docs)
+
+## Contact us to find out more
+
+COMS is developed by the [Common Services Team](https://bcgov.github.io/common-service-showcase/).
+Email:
+Community help: [Rocket.Chat](https://chat.developer.gov.bc.ca/channel/nr-common-services-showcase)
diff --git a/docs/Synchronization.md b/docs/Synchronization.md
new file mode 100644
index 0000000..89697d7
--- /dev/null
+++ b/docs/Synchronization.md
+It is possible to directly modify the contents of a COMS-managed S3 bucket without going through COMS itself, as long as the user has the correct S3 bucket credentials.
+
+However, as there is no mechanism for S3 to directly notify COMS of any changes, this can lead to a discrepancy between what's actually in the S3 bucket and what COMS thinks is in the bucket.
+
+To avoid this, COMS can be told to **synchronize** a bucket: it examines an S3 bucket it manages and updates the entries in its database to match what's actually in the bucket.
+
+## The synchronization process
+
+Clients can trigger the sync process through a [set of dedicated API endpoints](https://coms.api.gov.bc.ca/api/v1/docs#tag/Sync).
+
+When these endpoints are called, COMS checks both the S3 bucket and its database for a list of objects in the bucket, merges them into a single list without duplicates, and enqueues each resulting object (or **job**) into a shared queue in the COMS database, before returning the number of objects enqueued.
+
+The actual synchronization is handled by a separate sync service, which polls the database queue for new jobs every 10 seconds. Once a job is picked up, the service compares the corresponding object's state in S3 and in COMS. In particular, it checks whether the object exists in S3, in COMS, or both (that is, whether it's a new or deleted file), compares its tags and metadata, and updates the database accordingly.
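The merge-and-enqueue step and the per-job comparison described above can be sketched in TypeScript as follows. This is a simplified, in-memory illustration under stated assumptions, not the COMS implementation. The real queue is the `object_queue` database table, and the `ObjectState` shape, the `SyncAction` values and all function names are hypothetical.

```typescript
// Hypothetical, in-memory model of the COMS sync flow; not the COMS source.
type KeyValues = Record<string, string>;

interface ObjectState {
  tags: KeyValues;
  metadata: KeyValues;
}

type SyncAction = "create" | "delete" | "update" | "none";

// Merge the S3 listing and the database listing into one job list without duplicates.
function buildSyncJobs(s3Keys: string[], dbKeys: string[]): string[] {
  return Array.from(new Set([...s3Keys, ...dbKeys]));
}

// Order-insensitive comparison of two tag or metadata sets.
function sameKeyValues(a: KeyValues, b: KeyValues): boolean {
  const aKeys = Object.keys(a);
  return aKeys.length === Object.keys(b).length && aKeys.every((k) => b[k] === a[k]);
}

// Decide what the database needs, given an object's state in S3 and in COMS.
function decideAction(inS3?: ObjectState, inDb?: ObjectState): SyncAction {
  if (inS3 && !inDb) return "create"; // new file added directly to S3
  if (!inS3 && inDb) return "delete"; // file removed directly from S3
  if (inS3 && inDb) {
    const changed =
      !sameKeyValues(inS3.tags, inDb.tags) || !sameKeyValues(inS3.metadata, inDb.metadata);
    return changed ? "update" : "none";
  }
  return "none";
}

// Process jobs one at a time until the queue is empty; returns how many were processed.
function drainQueue(queue: string[], processJob: (key: string) => void): number {
  let processed = 0;
  while (queue.length > 0) {
    const key = queue.shift() as string;
    processJob(key);
    processed += 1;
  }
  return processed;
}
```

For example, merging `["a.txt", "b.txt"]` from S3 with `["b.txt", "c.txt"]` from the database yields three jobs. In this simplified model, three is the count the sync endpoint would report.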
+
+![COMS sync flow](images/coms_sync_flow.png)
+**Figure 1 - an illustration of the sync process**
+
+### The `queueManager` service
+
+The actual sync work is performed by the `queueManager` (labeled "Sync service" in the sequence diagram above), which runs on a thread separate from the COMS API.
+
+Every 10 seconds, it polls the queue, which is implemented as a database table named `object_queue`. If the queue is empty, it goes back to waiting for another 10 seconds.
+
+If the queue is not empty, it takes a job from the queue and performs the sync process on the associated file. Once that process completes, it checks the queue for another job to process. If the queue is empty, it goes back to waiting for another 10 seconds before polling the queue again.
+
+![queueManager state](images/queue_manager_state.png)
+**Figure 2 - an illustration of `queueManager`'s possible states, as it polls the queue and performs sync jobs.**
diff --git a/docs/Testing.md b/docs/Testing.md
new file mode 100644
index 0000000..dad7058
--- /dev/null
+++ b/docs/Testing.md
+Regression testing may be automated in the future; in the meantime, here is a list of test cases to run through.
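Many of the test cases below repeat the same check: call `GET /sync/status` every few seconds, assert the value of the first response, and wait until the status reaches `0`. To script that step, a minimal TypeScript sketch follows. It is hypothetical: `waitForSyncQueue` and `sameFileNames` are not part of COMS, and both the HTTP call and the delay are injected as functions, so the helper does not depend on any particular HTTP client or credentials.

```typescript
// Hypothetical test helpers; not part of COMS.
type StatusReader = () => Promise<number>; // e.g. wraps an HTTP call to GET /sync/status
type Sleep = (ms: number) => Promise<void>;

interface PollResult {
  firstCount: number;
  polls: number;
}

// Polls the sync queue until it is empty, asserting the first reading.
async function waitForSyncQueue(
  readStatus: StatusReader,
  expectedFirst: number,
  sleep: Sleep,
  intervalMs = 3000, // assumption: "every few seconds"
  maxPolls = 100,
): Promise<PollResult> {
  const firstCount = await readStatus();
  if (firstCount !== expectedFirst) {
    throw new Error(`expected ${expectedFirst} queued jobs on the first call, got ${firstCount}`);
  }
  let count = firstCount;
  let polls = 1;
  while (count !== 0) {
    if (polls >= maxPolls) {
      throw new Error(`queue still had ${count} jobs after ${polls} polls`);
    }
    await sleep(intervalMs);
    count = await readStatus();
    polls += 1;
  }
  return { firstCount, polls };
}

// Pass criterion helper: same number of files and the same names, ignoring order.
function sameFileNames(comsNames: string[], s3Names: string[]): boolean {
  if (comsNames.length !== s3Names.length) return false;
  const a = [...comsNames].sort();
  const b = [...s3Names].sort();
  return a.every((name, i) => name === b[i]);
}
```

In a real run, `readStatus` would wrap an authenticated request to `GET /sync/status` on your COMS instance, and `sleep` would wrap a timer.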
+
+## Synchronization
+
+- [Buckets with versioning](#buckets-with-versioning)
+  - [Sync a particular bucket (versioned)](#sync-a-particular-bucket-versioned)
+    - 🟢 Test case: create COMS bucket for S3 bucket with existing objects
+    - 🟢 Test case: directly uploading files via S3
+  - [Sync the default bucket (versioned)](#sync-the-default-bucket-versioned)
+    - 🟢 Test case: upload some files to the default bucket
+  - [Syncing tags (versioned)](#syncing-tags-versioned)
+    - 🟢 Test case: adding tags directly via S3
+    - 🟢 Test case: deleting tags directly via S3
+    - 🟢 Test case: reuse `objectId` from existing `coms-id` tag
+    - 🟢 Test case: deleting the `coms-id` tag
+    - 🟢 Test case: syncing objects with 10 tags
+  - [Sync a particular object (versioned)](#sync-a-particular-object-versioned)
+    - 🟢 Test case: update existing object
+    - 🟢 Test case: sync object with no new changes
+    - 🟢 Test case: soft-deleting objects directly via S3
+    - 🟢 Test case: hard-deleting objects directly via S3
+    - 🟢 Test case: undoing a soft deletion directly via S3
+- [Buckets without versioning](#buckets-without-versioning)
+  - [Sync a particular bucket (unversioned)](#sync-a-particular-bucket-unversioned)
+    - 🟢 Test case: create COMS bucket for S3 bucket with existing objects
+    - 🟢 Test case: directly uploading files via S3
+  - [Sync the default bucket (unversioned)](#sync-the-default-bucket-unversioned)
+    - 🟢 Test case: upload some files to the default bucket
+  - [Syncing tags (unversioned)](#syncing-tags-unversioned)
+    - 🟢 Test case: adding tags directly via S3
+    - 🟢 Test case: deleting tags directly via S3
+    - 🟢 Test case: reuse `objectId` from existing `coms-id` tag
+    - 🟢 Test case: deleting the `coms-id` tag
+    - 🟢 Test case: syncing objects with 10 tags
+  - [Sync a particular object (unversioned)](#sync-a-particular-object-unversioned)
+    - 🟢 Test case: update existing object
+    - 🟢 Test case: sync object with no new changes
+    - 🟢 Test case: deleting an object
+
+Tests should be run on all of the following types of S3 buckets:
+
+- Buckets with versioning enabled
+- Buckets with versioning disabled
+- NRM object storage service (Dell ECS, but mostly S3-compliant)
+  - Both versioned and unversioned buckets
+
+### Buckets with versioning
+
+#### Sync a particular bucket (versioned)
+
+Sync a particular bucket so that COMS tracks all of the files in the corresponding S3 bucket.
+
+##### 📝 Preconditions
+
+- An existing S3 bucket is available
+
+##### 🟢 Test case: create COMS bucket for S3 bucket with existing objects
+
+1. Upload some files directly to an S3 bucket
+2. Create a COMS bucket (i.e. `PUT /bucket`) for said S3 bucket and save the `bucketId`
+3. `GET /bucket/:bucketId/sync`
+4. Call `GET /sync/status` every few seconds, until it returns `0`
+
+   - On the first call, assert that it returns `{number of files uploaded via S3}`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+###### Pass criteria
+
+- `{number of files in COMS bucket} === {number of files in S3}`
+- `{names of files in COMS bucket} === {names of files in S3}`
+- Files marked as soft-deleted in S3 are tracked and marked as such by COMS
+
+##### 🟢 Test case: directly uploading files via S3
+
+1. Create a COMS bucket (i.e. `PUT /bucket`) and save the `bucketId`
+2. Upload some files directly via S3
+3. `GET /bucket/:bucketId/sync`
+4. Call `GET /sync/status` every few seconds, until it returns `0`
+
+   - On the first call, assert that it returns `{number of files uploaded via S3}`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+###### Pass criteria
+
+- `{number of files in COMS bucket} === {number of files in S3}`
+- `{names of files in COMS bucket} === {names of files in S3}`
+
+#### Sync the default bucket (versioned)
+
+Sync the default bucket so that COMS tracks all of the files in the corresponding S3 bucket.
+ +##### 📝 Preconditions + +- COMS has been configured with a default S3 bucket + +##### 🟢 Test case: upload some files to the default bucket + +1. Upload some files directly via S3 +2. `GET /sync` +3. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +###### Pass criteria + +- `{number of files in default COMS bucket} === {number of files in S3}` +- `{names of files in default COMS bucket} === {names of files in S3}` + +#### Syncing tags (versioned) + +Sync a bucket and ensure that any changes to object tags made externally are correctly synced to COMS. + +##### 📝 Preconditions + +- An existing S3 bucket is available and is being managed by COMS + +##### 🟢 Test case: adding tags directly via S3 + +1. Upload file to bucket via COMS API and save the `objectId` +2. Add a tag directly via S3 +3. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +4. `GET /object/tagging?objectId={objectId}` + +###### Pass criteria + +- COMS tags for the object are the same as its S3 object tags + +##### 🟢 Test case: deleting tags directly via S3 + +1. Upload file to bucket via COMS API with some tags(i.e. `PUT /object?tagset[{key}]={value}`) and save the `objectId` +2. Delete a tag directly via S3 +3. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +4. `GET /object/tagging?objectId={objectId}` + +###### Pass criteria + +- COMS tags for the object are the same as its S3 object tags + +##### 🟢 Test case: reuse `objectId` from existing `coms-id` tag + +1. 
Upload file to bucket via the COMS API (i.e. `PUT /object`) +2. Delete the table entry for the corresponding object in the COMS database (i.e. `DELETE FROM object WHERE id={objectId}`) +3. `GET /bucket/:bucketId/sync` +4. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +###### Pass criteria + +- The `coms-id` S3 tag is unchanged before/after deleting the object from the COMS `object` table +- `objectId` should equal the `coms-id` S3 tag + +##### 🟢 Test case: deleting the `coms-id` tag + +1. Upload file to bucket via COMS API and save the `objectId` +2. Delete the `coms-id` tag directly via S3 +3. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +4. `GET /object/tagging?objectId={objectId}` +5. `GET /object/:objectId/` + +###### Pass criteria + +- `coms-id` tag is present and has `{objectId}` as its value +- `GET /object/:objectId/` returns a HTTP 200 + +##### 🟢 Test case: syncing objects with 10 tags + +1. Upload a file to the bucket directly via S3. +2. Add 10 tags to the file uploaded via S3. +3. `GET /bucket/:bucketId/sync` +4. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +###### Pass criteria + +- File is added to COMS with all its tags +- No `coms-id` tag is added + +#### Sync a particular object (versioned) + +Sync a particular object so that COMS tracks all of the changes in the corresponding S3 object. + +##### 📝 Preconditions + +- There is a COMS bucket that links to an existing **versioned** S3 bucket + +##### 🟢 Test case: update existing object + +1. 
Upload a file via the COMS API (i.e. `PUT /object`)
+2. Overwrite the just-uploaded file with a newer version directly via S3
+3. `GET /object/:objectId/sync`
+4. Call `GET /sync/status` every few seconds
+
+   - On the first call, assert that it returns `1`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+5. `GET /object/:objectId/`
+
+###### Pass criteria
+
+- `{contents of file retrieved via COMS} === {contents of file uploaded via S3}`
+- `{latest version ID of file on COMS} === {latest version ID of file on S3}`
+
+##### 🟢 Test case: sync object with no new changes
+
+1. Upload a file via the COMS API (i.e. `PUT /object`)
+2. `GET /object/:objectId/sync`
+3. Call `GET /sync/status` every few seconds
+
+   - On the first call, assert that it returns `1`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+4. `GET /object/:objectId/`
+
+###### Pass criteria
+
+- `GET /sync/status` returns `0`
+- `{contents of file retrieved via COMS} === {contents of file uploaded via COMS}`
+- `{latest version ID of file on COMS} === {latest version ID of file on S3}`
+
+##### 🟢 Test case: soft-deleting objects directly via S3
+
+1. Upload a file via the COMS API (i.e. `PUT /object`)
+2. Soft-delete the object directly via S3
+3. `GET /object/:objectId/sync`
+4. Call `GET /sync/status` every few seconds
+
+   - On the first call, assert that it returns `1`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+5. `GET /object/:objectId/`
+6. `GET /object/:objectId/version`
+
+###### Pass criteria
+
+- `{contents of file retrieved via COMS} === {contents of file uploaded via S3}`
+- `{latest S3 version ID of file on COMS} === {latest version ID of file on S3}`
+  - The latest version of the COMS object is a delete marker; i.e. `isLatest === true && deleteMarker === true`
+
+##### 🟢 Test case: hard-deleting objects directly via S3
+
+1. Upload a file via the COMS API (i.e. `PUT /object`)
+2. Delete the latest version of the object directly via S3
+3. `GET /object/:objectId/sync`
+4. Call `GET /sync/status` every few seconds
+
+   - On the first call, assert that it returns `1`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+5. `GET /object/:objectId/`
+
+###### Pass criteria
+
+- `GET /object/:objectId/` returns HTTP 404
+
+##### 🟢 Test case: undoing a soft deletion directly via S3
+
+1. Upload a file via the COMS API (i.e. `PUT /object`)
+2. Soft-delete the object via the COMS API
+3. Restore the deleted object directly via S3 (i.e. by deleting the latest version, which should be a "delete marker" version)
+4. `GET /object/:objectId/sync`
+5. Call `GET /sync/status` every few seconds
+
+   - On the first call, assert that it returns `1`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+6. `GET /object/:objectId/`
+7. `GET /object/:objectId/version`
+
+###### Pass criteria
+
+- `{contents of file retrieved via COMS} === {contents of file version restored in S3}`
+- `{latest version ID of file on COMS} === {latest version ID of file on S3}`
+  - The latest version of the COMS object is not a delete marker; i.e. `isLatest === true && deleteMarker === false`
+
+### Buckets without versioning
+
+#### Sync a particular bucket (unversioned)
+
+Sync a particular bucket so that COMS tracks all of the files in the corresponding S3 bucket.
+
+##### 📝 Preconditions
+
+- An existing S3 bucket is available
+
+##### 🟢 Test case: create COMS bucket for S3 bucket with existing objects
+
+1. Upload some files directly to an S3 bucket
+2. Create a COMS bucket (i.e. `PUT /bucket`) for that S3 bucket and save the `bucketId`
+3. `GET /bucket/:bucketId/sync`
+4. Call `GET /sync/status` every few seconds
+
+   - On the first call, assert that it returns `{number of files uploaded via S3}`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+###### Pass criteria
+
+- `{number of files in COMS bucket} === {number of files in S3}`
+- `{names of files in COMS bucket} === {names of files in S3}`
+- Files marked as soft-deleted in S3 are tracked and marked as such by COMS
+
+##### 🟢 Test case: directly uploading files via S3
+
+1. Create a COMS bucket (i.e. `PUT /bucket`) and save the `bucketId`
+2. Upload some files directly via S3
+3. `GET /bucket/:bucketId/sync`
+4. Call `GET /sync/status` every few seconds
+
+   - On the first call, assert that it returns `{number of files uploaded via S3}`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+###### Pass criteria
+
+- `{number of files in COMS bucket} === {number of files in S3}`
+- `{names of files in COMS bucket} === {names of files in S3}`
+
+#### Sync the default bucket (unversioned)
+
+Sync the default bucket so that COMS tracks all of the files in the corresponding S3 bucket.
+
+##### 📝 Preconditions
+
+- COMS has been configured with a default S3 bucket
+
+##### 🟢 Test case: upload some files to the default bucket
+
+1. Upload some files directly via S3
+2. `GET /sync`
+3. Call `GET /sync/status` every few seconds
+
+   - On the first call, assert that it returns `{number of files uploaded via S3}`
+   - Continue to call the endpoint every few seconds until it returns `0`
+
+###### Pass criteria
+
+- `{number of files in default COMS bucket} === {number of files in S3}`
+- `{names of files in default COMS bucket} === {names of files in S3}`
+
+#### Syncing tags (unversioned)
+
+Sync a bucket and ensure that any changes to object tags made externally are correctly synced to COMS.
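Most test cases on this page repeat the same polling step (call `GET /sync/status` every few seconds, check the first value, then wait for `0`), and the tag test cases that follow compare COMS tags with S3 tags. If you automate these checks, helpers along the following lines may be useful. This is a minimal TypeScript sketch, not part of COMS: `waitForSyncIdle`, `httpStatus` and `tagSetsEqual` are hypothetical names, and it assumes Node.js 18+ (global `fetch`), a bearer token, that `GET /sync/status` responds with a bare JSON number, and that tags are `{ key, value }` pairs.

```typescript
// Hypothetical test-harness helpers (not part of COMS).
// Assumes Node.js 18+ and that GET /sync/status returns a bare JSON number.

type StatusFn = () => Promise<number>;

interface PollResult {
  first: number; // value from the first call, for the "on the first call, assert..." check
  calls: number; // total number of status calls made
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls getStatus every intervalMs until it returns 0; throws once timeoutMs has elapsed.
async function waitForSyncIdle(
  getStatus: StatusFn,
  intervalMs = 3000,
  timeoutMs = 120_000,
): Promise<PollResult> {
  const deadline = Date.now() + timeoutMs;
  const first = await getStatus();
  let current = first;
  let calls = 1;
  while (current !== 0) {
    if (Date.now() >= deadline) {
      throw new Error(`sync queue not drained after ${timeoutMs} ms (last value: ${current})`);
    }
    await sleep(intervalMs);
    current = await getStatus();
    calls += 1;
  }
  return { first, calls };
}

// Hypothetical status source backed by the COMS API.
// baseUrl (no trailing slash) and token are placeholders; OIDC bearer auth is assumed.
function httpStatus(baseUrl: string, token: string): StatusFn {
  return async () => {
    const res = await fetch(`${baseUrl}/sync/status`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) throw new Error(`GET /sync/status failed with HTTP ${res.status}`);
    return Number(await res.json());
  };
}

// Assumed tag shape; compares two tag sets while ignoring order.
interface Tag {
  key: string;
  value: string;
}

function tagSetsEqual(a: Tag[], b: Tag[]): boolean {
  if (a.length !== b.length) return false;
  const normalize = (tags: Tag[]) => tags.map((t) => JSON.stringify([t.key, t.value])).sort();
  const na = normalize(a);
  const nb = normalize(b);
  return na.every((entry, i) => entry === nb[i]);
}
```

For example, after calling `GET /bucket/:bucketId/sync`, a test could run `waitForSyncIdle(httpStatus(baseUrl, token))` and assert on the returned `first` value. Injecting the status source keeps the polling logic testable without a running COMS instance, and the timeout stops a stuck sync queue from hanging a test run.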
+ +##### 📝 Preconditions + +- An existing S3 bucket is available and is being managed by COMS + +##### 🟢 Test case: adding tags directly via S3 + +1. Upload file to bucket via COMS API and save the `objectId` +2. Add a tag directly via S3 +3. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +4. `GET /object/tagging?objectId={objectId}` + +###### Pass criteria + +- COMS tags for the object are the same as its S3 object tags + +##### 🟢 Test case: deleting tags directly via S3 + +1. Upload file to bucket via COMS API with some tags(i.e. `PUT /object?tagset[{key}]={value}`) and save the `objectId` +2. Delete a tag directly via S3 +3. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +4. `GET /object/tagging?objectId={objectId}` + +###### Pass criteria + +- COMS tags for the object are the same as its S3 object tags + +##### 🟢 Test case: reuse `objectId` from existing `coms-id` tag + +1. Upload file to bucket via the COMS API (i.e. `PUT /object`) +2. Delete the table entry for the corresponding object in the COMS database (i.e. `DELETE FROM object WHERE id={objectId}`) +3. `GET /bucket/:bucketId/sync` +4. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +###### Pass criteria + +- The `coms-id` S3 tag is unchanged before/after deleting the object from the COMS `object` table +- `objectId` should equal the `coms-id` S3 tag + +##### 🟢 Test case: deleting the `coms-id` tag + +1. Upload file to bucket via COMS API and save the `objectId` +2. 
Delete the `coms-id` tag directly via S3 +3. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +4. `GET /object/tagging?objectId={objectId}` +5. `GET /object/:objectId/` + +###### Pass criteria + +- `coms-id` tag is present and has `{objectId}` as its value +- `GET /object/:objectId/` returns a HTTP 200 + +##### 🟢 Test case: syncing objects with 10 tags + +1. Upload a file to the bucket directly via S3. +2. Add 10 tags to the file uploaded via S3. +3. `GET /bucket/:bucketId/sync` +4. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `{number of files uploaded via S3}` +- Continue to call the endpoint every few seconds until returns `0` + +###### Pass criteria + +- File is added to COMS with all its tags +- No `coms-id` tag is added + +#### Sync a particular object (unversioned) + +Sync a particular object so that COMS tracks all of the changes in the corresponding S3 object. + +##### 📝 Preconditions + +- There is a COMS bucket that links to an existing **unversioned** S3 bucket + +##### 🟢 Test case: update existing object + +1. Upload file via COMS API (i.e. `PUT /object`) +2. Overwrite just-uploaded file directly via S3 +3. `GET /object/:objectId/sync` +4. Call `GET /sync/status` every few seconds + +- On the first call, assert that it returns `1` +- Continue to call the endpoint every few seconds until returns `0` + +###### Pass criteria + +- `{contents of file retrieved via COMS} === {contents of file uploaded via S3}` + +##### 🟢 Test case: sync object with no new changes + +1. Upload file via COMS API (i.e. `PUT /object`) +2. `GET /object/:objectId/sync` +3. Call `GET /sync/status` every few seconds + +- On the first call, assert that it returns `1` +- Continue to call the endpoint every few seconds until returns `0` + +4. 
`GET /object/:objectId/` + +###### Pass criteria + +- `GET /sync/status` returns `0` +- `{contents of file retrieved via COMS} === {contents of file uploaded via S3}` + +##### 🟢 Test case: deleting an object + +1. Upload file via COMS API (i.e. `PUT /object`) +2. Delete file directly via S3 +3. `GET /object/:objectId/sync` +4. Call `GET /sync/status` every few seconds, until it returns `0` + +- On the first call, assert that it returns `1` +- Continue to call the endpoint every few seconds until returns `0` + +5. `GET /object/:objectId/` + +###### Pass criteria + +- `GET /object/:objectId/` returns HTTP 404 diff --git a/docs/Use-Case-Examples.md b/docs/Use-Case-Examples.md new file mode 100644 index 0000000..780a79d --- /dev/null +++ b/docs/Use-Case-Examples.md @@ -0,0 +1,35 @@ +### Common COMS API use-cases + +- Users of your web application submitting documents to BC Government employees. +- Uploading files and making them publicly available to download in a browser. +- Sharing files between an authorized group of users. +- Integration with BCBox allows users to find their files in our hosted user-interface. + +### API Usage Patterns + +The following steps describe how a document management interface in your application could potentially leverage COMS: + +#### Uploading a file + +1. `User A` logs in to your application via an OIDC Authentication realm; the user's JWT is stored in the browser's cache. +2. The user fills in a web form on your website attaching a document from their computer (a common user experience). +3. When the user submits this web form, the client application sends a HTTP multipart/form-data POST to the [Create Object](https://coms.api.gov.bc.ca/api/v1/docs#tag/Object/operation/createObjects) endpoint (attaching the User A's JWT in an Authorization header). +4. 
COMS will do the following + - Inspect the JWT and create a user record in the COMS database + - Pass the file to an S3 bucket + - Grant all PERMISSIONS to that user + +#### Sharing a file + +5. Client application can do the following: + - Call the COMS [User Search](https://coms.api.gov.bc.ca/api/v1/docs#tag/User/operation/searchUsers) endpoint to return a list of matching users in the COMS database. + - Call the [Add Permission](https://coms.api.gov.bc.ca/api/v1/docs#tag/Permission/operation/objectAddPermissions) endpoint to grant a Government employee READ permission on the file. + +#### Downloading a file + +6. `User B` logs in to the client application. +7. Client application makes request to [Read Object](https://coms.api.gov.bc.ca/api/v1/docs#tag/Object/operation/readObject) endpoint (with User B's JWT in an Authorization header) + - COMS will verify User B using the JWT and look up permissions for the file in the COMS database + - COMS will then respond with a redirect to a pre-signed url to the source object in the storage server or allow direct download via proxy. Read the section on [OIDC AUthentication](Authentication#authentication-flow-for-readobject) for more details. + +For full implementation details of the COMS API, visit the [BCBox](https://github.com/bcgov/bcbox) repository. 
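The upload and download steps above could be wired into a Node.js 18+ (or browser) client roughly as follows. This is a hedged TypeScript sketch that only builds the requests and does not send them: `endpoint`, `buildUploadRequest` and `buildReadRequest` are hypothetical helpers, and the `file` form-field name, the optional `bucketId` query parameter and the example base URL are assumptions. The method follows step 3 above (a multipart `POST`), while the Testing page uses `PUT /object`, so confirm the method and body format against the linked Create Object documentation before relying on it.

```typescript
// Hypothetical helpers that build (but do not send) COMS requests for a signed-in user.
// Assumptions: global fetch API types (Node.js 18+ or a browser), a base URL such as
// https://coms.example.gov.bc.ca/api/v1, and the multipart field name "file".

function endpoint(baseUrl: string, path: string): URL {
  // Ensure a trailing slash so relative resolution keeps the /api/v1 prefix
  const base = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  return new URL(path, base);
}

// Step 3: send the user's file, with their JWT in the Authorization header.
function buildUploadRequest(
  baseUrl: string,
  jwt: string,
  file: Blob,
  filename: string,
  bucketId?: string,
): Request {
  const form = new FormData();
  form.append("file", file, filename);
  const url = endpoint(baseUrl, "object");
  if (bucketId) url.searchParams.set("bucketId", bucketId); // omit to target the default bucket
  // No explicit Content-Type: the runtime sets multipart/form-data with a boundary
  return new Request(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${jwt}` },
    body: form,
  });
}

// Step 7: read an object as User B.
function buildReadRequest(baseUrl: string, jwt: string, objectId: string): Request {
  return new Request(endpoint(baseUrl, `object/${encodeURIComponent(objectId)}`), {
    method: "GET",
    headers: { Authorization: `Bearer ${jwt}` },
  });
}
```

Passing either request to `fetch()` sends it; if COMS answers the read with a redirect to a pre-signed URL, `fetch` follows it by default.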
diff --git a/docs/about-us.md b/docs/about-us.md
deleted file mode 100644
index ae4f37c..0000000
--- a/docs/about-us.md
+++ /dev/null
@@ -1,7 +0,0 @@
-# About us
-
-## Suggested Content
-
-Introduce your team or project for a more in-depth understanding for your users
-
-
diff --git a/docs/configuration.md b/docs/configuration.md
index 3ed2ced..0373ace 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1,7 +1,130 @@
 # Configuration
 
-## Suggested Content
+This page outlines the general deployment decisions to consider before standing up COMS. It is intended mainly for a technical audience and for anyone who wants a better understanding of how the system's features interact. For instructions on running COMS, please refer to our [Application README](https://github.com/bcgov/common-object-management-service/blob/master/app/README.md).
 
-* Explain configuration/customization requirements, if necessary
-* Guide on tailoring the service to specific needs
-* Provide instructions for integrating with other systems if needed
\ No newline at end of file
+
+- [Object Storage](#object-storage)
+- [Authentication Modes](#authentication-modes)
+- [Bucket Credentials Encryption](#bucket-credential-encryption)
+- [Privacy Controls](#privacy-controls)
+
+
+COMS is configured using the Node.js [config](https://www.npmjs.com/package/config) library.
+The environment variables for the COMS application are listed [here](https://raw.githubusercontent.com/bcgov/common-object-management-service/master/app/config/custom-environment-variables.json) and can be set in each deployment environment. This page explains these configuration options.
+
+**Note:** Some features are enabled using `enabled: "true"`. To disable such a feature, omit this line (or the corresponding environment variable) entirely from your config.
+
+## Object Storage
+
+This group of variables defines a **default** object storage location and bucket. A default is required: it sets the scope for COMS endpoints where `bucketId` is optional, and the default bucket is used whenever no `bucketId` parameter is passed.
+
+```sh
+"objectStorage": {
+  "accessKeyId": "", # eg: The ECS Object User or IAM ID
+  "bucket": "", # eg: climatedocs
+  "defaultTempExpiresIn": "