[DS-4301] Added Content Reports section and Filtered Collections report therein #202

Merged on Feb 28, 2024 (25 commits)
Commits
177457e
Added a missing link and deleted a duplicate entry
Sep 13, 2022
1d35992
Added the Filtered Collections report spec
Sep 13, 2022
022db3c
Merge branch 'DSpace:main' into main
jeffmorin Nov 30, 2022
7a51061
Filtered Items report
Dec 2, 2022
9e094d3
Merge branch 'main' of github.com:jeffmorin/RestContract
Dec 2, 2022
77bcc5e
JSON and content fixes
Dec 2, 2022
3470e30
JSON fix
Dec 2, 2022
b8a1274
JSON fix
Dec 2, 2022
1527355
Improved API documentation
Jan 9, 2023
8bad9b3
Changed contentreports category to contentreport
Jan 11, 2023
a1a13dd
Merge branch 'DSpace:main' into main
jeffmorin Feb 15, 2023
3cc4598
Merge branch 'DSpace:main' into main
jeffmorin Mar 1, 2023
5455187
Merge branch 'DSpace:main' into main
jeffmorin Apr 19, 2023
41e9ac3
Added GET endpoint to Filtered Items report
Apr 19, 2023
001a342
Merge branch 'DSpace:main' into main
jeffmorin May 1, 2023
b6af0ae
Merge branch 'DSpace:main' into main
jeffmorin May 25, 2023
26648ee
Merge branch 'DSpace:main' into main
jeffmorin Nov 20, 2023
718ca8a
Updated to latest version from main branch
Nov 21, 2023
a552cb8
Merge branch 'DSpace:main' into main
jeffmorin Dec 18, 2023
6386f46
Merge branch 'DSpace:main' into main
jeffmorin Feb 12, 2024
768e541
Merge branch 'DSpace:main' into main
jeffmorin Feb 20, 2024
487f59b
Merge branch 'DSpace:main' into main
jeffmorin Feb 22, 2024
014af1b
Added beta feature warning in both Content Report pages
Feb 22, 2024
9b2bdca
Removed POST endpoints from documentation
Feb 27, 2024
1d4a1ff
Merge branch 'DSpace:main' into main
jeffmorin Feb 27, 2024
149 changes: 149 additions & 0 deletions contentreport-filteredcollections.md
@@ -0,0 +1,149 @@
# Filtered Collections report
[Back to the list of all defined endpoints](endpoints.md)

This endpoint provides aggregated statistics about the number of items per collection according to selected filters.

NOTE: This is currently a beta feature.


**GET /api/contentreport/filteredcollections**

The endpoint takes a `filters` query parameter whose value is a comma-separated list of filters
like the following:
```
?filters=is_discoverable,has_multiple_originals,has_pdf_original
```

Alternatively, the comma-separated list can be replaced by a repetition of the `filters` parameter
for each requested filter:
```
?filters=is_discoverable&filters=has_multiple_originals&filters=has_pdf_original
```
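
Both forms can be produced with a standard URL-encoding helper. The following is a minimal Python sketch using only `urllib.parse`; the function name `build_filters_query` is illustrative and not part of the DSpace API.

```python
from urllib.parse import urlencode


def build_filters_query(filters, repeat=False):
    """Return the query string for the given filter names.

    repeat=False -> a single comma-separated `filters` value
    repeat=True  -> one `filters` parameter per filter
    """
    if repeat:
        # doseq=True emits one key=value pair per list element.
        return urlencode({"filters": list(filters)}, doseq=True)
    # safe="," keeps the separating commas unescaped.
    return urlencode({"filters": ",".join(filters)}, safe=",")


selected = ["is_discoverable", "has_multiple_originals", "has_pdf_original"]
print("?" + build_filters_query(selected))
print("?" + build_filters_query(selected, repeat=True))
```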


Please see [below](#available-filters) for the list of available filters.

## Report contents

For each collection, the basic report consists of:
* name (label) and handle of the collection
* name (label) and handle of the parent community
* total number of items
* number of items matching all selected filters

In addition, a `summary` element provides the total number of items and the total number of items matching all filters
for the whole repository.

An example JSON response document to `/api/contentreport/filteredcollections`:
```json
{
  "id": "filteredcollections",
  "collections": [
    {
      "label": "Collection 1",
      "handle": "100/1",
      "values": {
        "is_discoverable": 23,
        "has_multiple_originals": 3,
        "has_pdf_original": 14
      },
      "community_label": "Community 1",
      "community_handle": "20.500.11794/1",
      "nb_total_items": 23,
      "all_filters_value": 3
    },
    {
      "label": "Collection 2",
      "handle": "100/2",
      "values": {
        "is_discoverable": 1,
        "has_multiple_originals": 0,
        "has_pdf_original": 0
      },
      "community_label": "Community 1",
      "community_handle": "20.500.11794/1",
      "nb_total_items": 1,
      "all_filters_value": 0
    },
    {
      "label": "Collection 3",
      "handle": "100/3",
      "values": {
        "is_discoverable": 1,
        "has_multiple_originals": 0,
        "has_pdf_original": 1
      },
      "community_label": "Community 1",
      "community_handle": "20.500.11794/1",
      "nb_total_items": 1,
      "all_filters_value": 0
    }
  ],
  "summary": {
    "label": null,
    "handle": null,
    "values": {
      "is_discoverable": 25,
      "has_multiple_originals": 3,
      "has_pdf_original": 15
    },
    "community_label": null,
    "community_handle": null,
    "nb_total_items": 25,
    "all_filters_value": 3
  },
  "type": "filtered-collections",
  "_links": {
    "self": {
      "href": "http://localhost:8080/dspace-server/api/contentreport/filteredcollections"
    }
  }
}
```
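
In the sample above, each `summary` figure equals the sum of the corresponding per-collection figures. The Python sketch below recomputes those totals from an already-parsed response. The helper name `recompute_summary` is hypothetical, and the summing rule is inferred from the example rather than stated by this specification, so treat it as a sanity check only.

```python
from collections import Counter


def recompute_summary(report):
    """Sum the per-collection counts of a Filtered Collections report."""
    values = Counter()
    total_items = 0
    all_filters = 0
    for collection in report["collections"]:
        values.update(collection["values"])  # adds counts key by key
        total_items += collection["nb_total_items"]
        all_filters += collection["all_filters_value"]
    return {
        "values": dict(values),
        "nb_total_items": total_items,
        "all_filters_value": all_filters,
    }


# Counts copied from the sample response; labels and handles omitted.
report = {
    "collections": [
        {"values": {"is_discoverable": 23, "has_multiple_originals": 3,
                    "has_pdf_original": 14},
         "nb_total_items": 23, "all_filters_value": 3},
        {"values": {"is_discoverable": 1, "has_multiple_originals": 0,
                    "has_pdf_original": 0},
         "nb_total_items": 1, "all_filters_value": 0},
        {"values": {"is_discoverable": 1, "has_multiple_originals": 0,
                    "has_pdf_original": 1},
         "nb_total_items": 1, "all_filters_value": 0},
    ],
}
print(recompute_summary(report))
```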

## Available filters

The available filters are as follows:

* Item Property Filters
  * `is_item`: Is Item - always true
  * `is_withdrawn`: Withdrawn Items
  * `is_not_withdrawn`: Available Items - Not Withdrawn
  * `is_discoverable`: Discoverable Items - Not Private
  * `is_not_discoverable`: Not Discoverable - Private Item
* Basic Bitstream Filters
  * `has_multiple_originals`: Item has Multiple Original Bitstreams
  * `has_no_originals`: Item has No Original Bitstreams
  * `has_one_original`: Item has One Original Bitstream
* Bitstream Filters by MIME Type
  * `has_doc_original`: Item has a Doc Original Bitstream (PDF, Office, Text, HTML, XML, etc)
  * `has_image_original`: Item has an Image Original Bitstream
  * `has_unsupp_type`: Has Other Bitstream Types (not Doc or Image)
  * `has_mixed_original`: Item has multiple types of Original Bitstreams (Doc, Image, Other)
  * `has_pdf_original`: Item has a PDF Original Bitstream
  * `has_jpg_original`: Item has JPG Original Bitstream
  * `has_small_pdf`: Has unusually small PDF
  * `has_large_pdf`: Has unusually large PDF
  * `has_doc_without_text`: Has document bitstream without TEXT item
* Supported MIME Type Filters
  * `has_only_supp_image_type`: Item Image Bitstreams are Supported
  * `has_unsupp_image_type`: Item has Image Bitstream that is Unsupported
  * `has_only_supp_doc_type`: Item Document Bitstreams are Supported
  * `has_unsupp_doc_type`: Item has Document Bitstream that is Unsupported
* Bitstream Bundle Filters
  * `has_unsupported_bundle`: Has bitstream in an unsupported bundle
  * `has_small_thumbnail`: Has unusually small thumbnail
  * `has_original_without_thumbnail`: Has original bitstream without thumbnail
  * `has_invalid_thumbnail_name`: Has invalid thumbnail name (assumes one thumbnail for each original)
  * `has_non_generated_thumb`: Has non-generated thumbnail
  * `no_license`: Doesn't have a license
  * `has_license_documentation`: Has documentation in the license bundle
* Permission Filters
  * `has_restricted_original`: Item has Restricted Original Bitstream
  * `has_restricted_thumbnail`: Item has Restricted Thumbnail
  * `has_restricted_metadata`: Item has Restricted Metadata

Possible response statuses:

* 200 OK - The requested report was found and its data has been returned.
* 403 Forbidden - The user session is not authorized to access the report.
140 changes: 140 additions & 0 deletions contentreport-filtereditems.md
@@ -0,0 +1,140 @@
# Metadata query (aka Filtered Items) report
[Back to the list of all defined endpoints](endpoints.md)

This endpoint provides a custom query API to select items from existing collections,
according to given Boolean and metadata filters.

NOTE: This is currently a beta feature.


**GET /api/contentreport/filtereditems**

The report parameters are described [below](#report-parameterization).

Additionally, a `pageNumber` parameter is available to retrieve results starting at a given page
(according to `pageLimit`, the maximum number of items per page). Page numbering starts at 0.
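
As a sketch of how a client might walk through the pages, the zero-based page numbers to request can be computed from a known total number of matching items. The Python helper below, `page_numbers`, is illustrative only; it assumes the client already knows that total (for instance from the `itemCount` value in the response, if that value is indeed the overall total, which this page does not state).

```python
import math


def page_numbers(total_items, page_limit):
    """Return the zero-based page numbers needed to cover total_items."""
    if page_limit <= 0:
        raise ValueError("page_limit must be positive")
    return range(math.ceil(total_items / page_limit))


# 40 matching items at 15 per page -> pages 0, 1 and 2.
for page in page_numbers(40, 15):
    print(f"?pageNumber={page}&pageLimit=15")
```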

All parameters except `pageNumber` and `pageLimit` are repeatable. Multiple values can be expressed either
by repeating the corresponding parameter, e.g.:
```
?filters=is_discoverable&filters=has_multiple_originals&filters=has_pdf_original
```

or by using a comma-separated value, e.g.:

```
?filters=is_discoverable,has_multiple_originals,has_pdf_original
```

The exception is the `queryPredicates` parameter, which supports only parameter repetition for multiple values,
to avoid ambiguity when a predicate value contains commas.
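
A minimal Python sketch of a query string that follows these rules: `filters` is sent as one comma-separated value, while `queryPredicates` is repeated once per predicate. The helper name `build_items_query` and the predicate strings are placeholders invented for this example; this page does not define the predicate syntax.

```python
from urllib.parse import parse_qs, urlencode


def build_items_query(filters, predicates):
    """Build a Filtered Items query string (illustrative helper)."""
    # The joined filters value is percent-encoded as a whole and decodes
    # back to a comma-separated list.
    params = [("filters", ",".join(filters))]
    # One queryPredicates parameter per predicate: a comma inside a
    # predicate value is percent-encoded and never acts as a separator.
    params += [("queryPredicates", p) for p in predicates]
    return urlencode(params)


# Placeholder predicates; the real syntax is not specified on this page.
predicates = ["author is Smith, John", "publisher is Elsevier"]
query = build_items_query(["is_discoverable", "has_pdf_original"], predicates)
print(query)

# Decoding restores each predicate intact, embedded comma included.
assert parse_qs(query)["queryPredicates"] == predicates
```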

Please see [below](#report-parameterization) for parameterization details.

## Report contents

An example JSON response document to `/api/contentreport/filtereditems` (metadata removed for brevity):
```json
{
  "id": "filtereditems",
  "items": [
    {
      "id": "07e388ff-f22b-4d4f-8275-acab5c3edacc",
      "uuid": "07e388ff-f22b-4d4f-8275-acab5c3edacc",
      "name": "Enhancing the lubricity of an environmentally friendly Swedish diesel fuel MK1",
      "handle": "20.500.11794/42",
      "metadata": {
        "dc.contributor.author": [
          {
            "value": "Smith, John",
            "language": null,
            "authority": "6eee383a-f126-4705-9ffb-b4aa4832070e",
            "confidence": 600,
            "place": 0
          }
        ],
        "dc.publisher": [
          {
            "value": "Elsevier",
            "language": "fr_CA",
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "inArchive": true,
      "discoverable": true,
      "withdrawn": false,
      "lastModified": "2015-11-23T17:30:21.463+00:00",
      "entityType": "Publication",
      "owningCollection": {
        "id": "d98a828c-45c2-43d9-9861-6b9800bf14f5",
        "uuid": "d98a828c-45c2-43d9-9861-6b9800bf14f5",
        "name": "Articles publiés dans des revues avec comité de lecture",
        "handle": "100/1",
        "metadata": {
          "dc.identifier.uri": [
            {
              "value": "http://localhost:4000/handle/100/1",
              "language": null,
              "authority": null,
              "confidence": -1,
              "place": 0
            }
          ],
          "dspace.entity.type": [
            {
              "value": "Publication",
              "language": null,
              "authority": null,
              "confidence": -1,
              "place": 0
            }
          ]
        },
        "type": "collection"
      },
      "type": "item"
    },
    {
      ...
    }
  ],
  "itemCount": 40,
  "type": "filtereditemsreport",
  "_links": {
    "self": {
      "href": "http://localhost:8080/dspace-server/api/contentreport/filtereditems"
    }
  }
}
```

## Report parameterization

The parameters are specified as follows:

* `collections`: UUIDs of the collections in which to search for items. If none are provided, the whole repository is searched.
* `presetQuery`: Not used on the REST API side. It identifies a predefined set of query predicates
  defined in the Angular layer.
* `queryPredicates`: Predicates used to filter matching items. They can be predefined (see `presetQuery` above)
  or defined specifically by the user. As mentioned above, this is the only repeatable parameter whose multiple
  values cannot be given as a comma-separated list.
* `pageLimit`: Maximum number of items per page.
* `filters`: Supplementary filters. These are the same as those available in the Filtered Collections report.
  Please see [/api/contentreport/filteredcollections](contentreport-filteredcollections.md#available-filters) for details.
* `additionalFields`: Fields to add to the basic report for each item included in the report.

The _basic report_ mentioned above includes, for each item:

* Sequential number (order of appearance in the report)
* UUID
* Parent collection
* Handle
* Title
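
The basic report fields can be derived from the JSON response shown earlier. The Python sketch below assumes that the sequential number is simply the position of the item in the `items` array (starting at 1) and that `name` carries the title. Both are assumptions, as is the helper name `basic_report_rows`.

```python
def basic_report_rows(report):
    """Yield (number, uuid, collection name, handle, title) per item."""
    for number, item in enumerate(report["items"], start=1):
        yield (
            number,
            item["uuid"],
            item["owningCollection"]["name"],
            item["handle"],
            item["name"],
        )


# Trimmed-down version of the sample response.
report = {
    "items": [
        {
            "uuid": "07e388ff-f22b-4d4f-8275-acab5c3edacc",
            "name": "Enhancing the lubricity of an environmentally "
                    "friendly Swedish diesel fuel MK1",
            "handle": "20.500.11794/42",
            "owningCollection": {
                "name": "Articles publiés dans des revues avec comité de lecture"
            },
        }
    ]
}
for row in basic_report_rows(report):
    print(row)
```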

Possible response statuses:

* 200 OK - The requested report was found and its data has been returned.
* 403 Forbidden - The user session is not authorized to access the report.
2 changes: 2 additions & 0 deletions endpoints.md
@@ -56,6 +56,8 @@
* [/api/authz/features](features.md)
* [/api/statistics](statistics.md)
* [/api/tools/itemrequests](item-requests.md)
* [/api/contentreport/filteredcollections](contentreport-filteredcollections.md)
* [/api/contentreport/filtereditems](contentreport-filtereditems.md)

## Actuator endpoints
The following endpoints are implemented using [Spring Boot Actuator](https://docs.spring.io/spring-boot/docs/current/reference/html/actuator.html#actuator.enabling) and are enabled by default: