search content pages #111

Merged
merged 11 commits into from
Jul 18, 2024
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
title: Filter
title: LLM
weight: 2
---

TBD
TODO
92 changes: 92 additions & 0 deletions content/Data.norge.no/Search/Search/_index.md
@@ -0,0 +1,92 @@
---
title: Search
weight: 1
---

The service responsible for handling searches is [fdk-search-service](https://github.com/Informasjonsforvaltning/fdk-search-service), which is described by [this OpenAPI specification](https://raw.githubusercontent.com/Informasjonsforvaltning/fdk-search-service/main/openapi.yaml).

Production endpoint: <https://search.api.fellesdatakatalog.digdir.no/search>
Demo endpoint: <https://search.api.demo.fellesdatakatalog.digdir.no/search>
Staging endpoint: <https://search.api.staging.fellesdatakatalog.digdir.no/search>

Simple example using the staging endpoint:
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search' -H 'Content-Type: application/json' -d '{"query":"test"}'
```
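
The same request can be sketched in Python using only the standard library. `build_search_request` is a hypothetical helper name, not part of the service; it only constructs the request, and nothing is sent unless the commented-out lines are enabled.

```python
import json
import urllib.request

STAGING_SEARCH_URL = "https://search.api.staging.fellesdatakatalog.digdir.no/search"


def build_search_request(query: str, url: str = STAGING_SEARCH_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON search body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("test")

# To actually send the request (requires network access):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```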

### Searchable fields

There are three searchable fields: `title`, `description` and `keyword`. By default, the service looks for matches for the query in all three fields, but the `fields` object in the search body can be used to choose which of them are searched.

Example where only hits from the description field are included:
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search' -H 'Content-Type: application/json' -d '{"query":"test", "fields": {"title":false,"description":true,"keyword":false}}'
```
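
A minimal Python sketch of how such a body could be assembled. The helper `search_body` and the constant `SEARCHABLE_FIELDS` are illustrative names, not part of the service; the sketch assumes every field is listed explicitly in the body, as in the curl example above.

```python
import json

SEARCHABLE_FIELDS = ("title", "description", "keyword")


def search_body(query, include):
    """Return a search body that only searches the field names in `include`."""
    unknown = set(include) - set(SEARCHABLE_FIELDS)
    if unknown:
        raise ValueError(f"unknown searchable fields: {sorted(unknown)}")
    # Every searchable field is listed, set to True only if it was requested.
    fields = {name: name in include for name in SEARCHABLE_FIELDS}
    return {"query": query, "fields": fields}


body = search_body("test", include={"description"})
payload = json.dumps(body)  # JSON text to use as the POST body
```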

#### Boosting

Hits from some fields are prioritized over others, e.g. a match in the title field is ranked above a match in the description field.

| Field | Boost |
| ------ | ------ |
| title, full phrase match | 30 |
| title, partial match | 15 |
| keyword | 5 |
| description | 1 |

Take the title "Test search service" and the two queries "test service" and "search service". The first query has two partial matches, "test" and "service", for a combined search value of 15 + 15 = 30. The second query has three matches: two partial matches, "search" and "service", and one full phrase match, "search service", for a combined search value of 15 + 15 + 30 = 60.
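
The arithmetic in this example can be reproduced with a small Python sketch. It is a simplified model of the title boosts in the table only, assuming whitespace-separated, case-insensitive word matching; it is not the service's actual scoring, and `title_score` is a hypothetical name.

```python
TITLE_PHRASE_BOOST = 30
TITLE_PARTIAL_BOOST = 15


def title_score(title: str, query: str) -> int:
    """Sum the title boosts for a query, following the worked example."""
    title_words = title.lower().split()
    query_words = query.lower().split()

    # Each query word that occurs in the title counts as one partial match.
    score = sum(TITLE_PARTIAL_BOOST for word in query_words if word in title_words)

    # The whole query occurring as a contiguous phrase is a full phrase match.
    n = len(query_words)
    phrase_found = n > 0 and any(
        title_words[i:i + n] == query_words
        for i in range(len(title_words) - n + 1)
    )
    if phrase_found:
        score += TITLE_PHRASE_BOOST
    return score


first = title_score("Test search service", "test service")     # 15 + 15
second = title_score("Test search service", "search service")  # 15 + 15 + 30
```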

### Specific resource types

Each resource type has its own endpoint. The available endpoints are `/datasets`, `/data-services`, `/concepts`, `/information-models`, `/events` and `/services`.

Example using the datasets endpoint:
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search/datasets' -H 'Content-Type: application/json' -d '{"query":"test"}'
```
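
A short Python sketch for deriving the endpoint URL from a resource type. `search_url` is an illustrative helper; the list of resource types is copied from the paragraph above.

```python
STAGING_SEARCH_URL = "https://search.api.staging.fellesdatakatalog.digdir.no/search"

RESOURCE_ENDPOINTS = (
    "datasets",
    "data-services",
    "concepts",
    "information-models",
    "events",
    "services",
)


def search_url(resource_type=None, base=STAGING_SEARCH_URL):
    """Return the general search URL, or the URL for one resource type."""
    if resource_type is None:
        return base
    if resource_type not in RESOURCE_ENDPOINTS:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    return f"{base}/{resource_type}"


datasets_url = search_url("datasets")
```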

### Pagination

All search results are paginated. The page size and page number can be customized with the `pagination` field in the search body.

Example using the pagination field, with the page number set to 5 and 10 hits per page:
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search' -H 'Content-Type: application/json' -d '{"query":"test","pagination":{"size":10,"page":5}}'
```
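
A sketch of generating the bodies for a run of consecutive pages in Python. `page_bodies` is a hypothetical helper. Whether page numbering starts at 0 or 1 is not stated here, so the caller supplies the first page number; check the OpenAPI specification before relying on either.

```python
def page_bodies(query, size, first_page, page_count):
    """Yield one search body per page for `page_count` consecutive pages."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    for page in range(first_page, first_page + page_count):
        yield {"query": query, "pagination": {"size": size, "page": page}}


# Bodies for pages 5, 6 and 7 with 10 hits per page.
bodies = list(page_bodies("test", size=10, first_page=5, page_count=3))
```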

### Filtering

It's possible to filter the search result. See `SearchFilters` in the [OpenAPI specification](https://raw.githubusercontent.com/Informasjonsforvaltning/fdk-search-service/main/openapi.yaml) for a list of all available filters and the type of value each one accepts.

Example using the data theme filter:
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search' -H 'Content-Type: application/json' -d '{"query":"test","filters":{"dataTheme":{"value":["ENVI"]}}}'
```

Example using the open data filter:
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search' -H 'Content-Type: application/json' -d '{"query":"test","filters":{"openData":{"value":true}}}'
```

Example using the formats filter:
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search' -H 'Content-Type: application/json' -d '{"query":"test","filters":{"formats":{"value":["MEDIA_TYPE application/json"]}}}'
```
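
The three examples share the same shape: each filter name maps to an object with a `value` key. A minimal Python sketch of that pattern follows; `filtered_body` is an illustrative helper that does not validate filter names or value types, so the OpenAPI specification remains the authority on what is accepted.

```python
import json


def filtered_body(query, **filters):
    """Wrap each filter value in the {"value": ...} object used by the search body."""
    return {
        "query": query,
        "filters": {name: {"value": value} for name, value in filters.items()},
    }


theme_body = filtered_body("test", dataTheme=["ENVI"])
combined_body = filtered_body("test", openData=True, formats=["MEDIA_TYPE application/json"])
payload = json.dumps(combined_body)  # JSON text to use as the POST body
```

Combining several filters in one body, as in `combined_body`, is an assumption; the examples above only show one filter at a time.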

#### Aggregations

Each search result includes aggregations of the possible filter values for the query. An aggregation is always included for each filter: a list of the filter options represented in the total search result, with a count of how many hits each option has.

Given that this search:
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search' -H 'Content-Type: application/json' -d '{"query":"test"}'
```
has this as the aggregation in the result:
```
"aggregations":{"accessRights":[{"key":"PUBLIC","count":5},{"key":"RESTRICTED","count":16}]}
```

The next example would therefore have 5 hits in its result, since the aggregation values show that the query has 5 hits where the value of the access rights field is PUBLIC and 16 where the value is RESTRICTED.
```Shell
curl -X POST 'https://search.api.staging.fellesdatakatalog.digdir.no/search' -H 'Content-Type: application/json' -d '{"query":"test","filters":{"accessRights":{"value":"PUBLIC"}}}'
```
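
A small Python sketch of reading such a count from a result before applying the filter. It uses only the aggregation fragment shown above, wrapped in braces to make it valid JSON; all other fields of the real response are left out, and `expected_hits` is a hypothetical helper.

```python
import json

result = json.loads(
    '{"aggregations":{"accessRights":['
    '{"key":"PUBLIC","count":5},{"key":"RESTRICTED","count":16}]}}'
)


def expected_hits(result, filter_name, key):
    """Return the aggregation count for one filter option, or 0 if it is absent."""
    for bucket in result.get("aggregations", {}).get(filter_name, []):
        if bucket["key"] == key:
            return bucket["count"]
    return 0


public_hits = expected_hits(result, "accessRights", "PUBLIC")
restricted_hits = expected_hits(result, "accessRights", "RESTRICTED")
```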
54 changes: 54 additions & 0 deletions content/Data.norge.no/Search/Sparql/_index.md
@@ -0,0 +1,54 @@
---
title: SPARQL
weight: 3
---

[Read more about SPARQL here](https://www.w3.org/TR/sparql11-overview/).

Production endpoint: <https://sparql.fellesdatakatalog.digdir.no>
Demo endpoint: <https://sparql.demo.fellesdatakatalog.digdir.no>
Staging endpoint: <https://sparql.staging.fellesdatakatalog.digdir.no>

Simple example using the staging endpoint:
```Shell
curl -X POST 'https://sparql.staging.fellesdatakatalog.digdir.no/?query=SELECT%20%2A%20WHERE%20%7B%20?sub%20?pred%20?obj%20.%20%7D%20LIMIT%201'
```
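
Percent-encoding a query by hand is error-prone, so here is a sketch that builds the same kind of URL with Python's standard library. `sparql_url` is an illustrative helper. Unlike the curl example, it also encodes the `?` of each variable as `%3F`, which decodes to the same query.

```python
from urllib.parse import parse_qs, quote, urlsplit

STAGING_SPARQL_URL = "https://sparql.staging.fellesdatakatalog.digdir.no/"


def sparql_url(query, endpoint=STAGING_SPARQL_URL):
    """Return the endpoint URL with the SPARQL query percent-encoded."""
    return f"{endpoint}?query={quote(query, safe='')}"


query = "SELECT * WHERE { ?sub ?pred ?obj . } LIMIT 1"
url = sparql_url(query)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```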

Data.norge.no has a simple GUI for SPARQL queries:
- production <https://data.norge.no/sparql>
- demo <https://demo.fellesdatakatalog.digdir.no/sparql>
- staging <https://staging.fellesdatakatalog.digdir.no/sparql>

### Query examples

List all properties and objects where the subject is this dataset <https://staging.fellesdatakatalog.digdir.no/datasets/04edc67b-046c-37a8-9822-29f03d2f1e80>:

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?property ?object

WHERE {
  ?dataset a dcat:Dataset .
  ?record foaf:primaryTopic ?dataset .
  ?record a dcat:CatalogRecord .
  ?record dct:identifier "04edc67b-046c-37a8-9822-29f03d2f1e80" .
  ?dataset ?property ?object .
}
```

List all dataset titles:

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?title

WHERE {
  ?dataset a dcat:Dataset .
  ?dataset dct:title ?title .
}
```
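
The first query can be reused for other datasets by substituting the identifier. The sketch below does this in Python under the assumption that dataset identifiers are UUIDs, as the one in the example appears to be; `dataset_properties_query` is a hypothetical helper, and the UUID check also prevents arbitrary text from being spliced into the query.

```python
import uuid

QUERY_TEMPLATE = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?property ?object

WHERE {
  ?dataset a dcat:Dataset .
  ?record foaf:primaryTopic ?dataset .
  ?record a dcat:CatalogRecord .
  ?record dct:identifier "DATASET_ID" .
  ?dataset ?property ?object .
}
"""


def dataset_properties_query(dataset_id):
    """Return the properties query for one dataset identifier."""
    # uuid.UUID raises ValueError for anything that is not a UUID (assumed format).
    canonical_id = str(uuid.UUID(dataset_id))
    return QUERY_TEMPLATE.replace("DATASET_ID", canonical_id)


sparql = dataset_properties_query("04edc67b-046c-37a8-9822-29f03d2f1e80")
```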
24 changes: 24 additions & 0 deletions content/Data.norge.no/Search/_index.md
@@ -0,0 +1,24 @@
---
title: Search
weight: 8
---

There are three different ways to search for data in Data.norge.no, each based on a different technology.

### Search

The main search service in Data.norge.no is based on Elasticsearch. This service searches the text in a limited selection of fields, and has filters and other advanced functionality that help users navigate.

[See this page for more info](https://informasjonsforvaltning.github.io/data.norge.no/search/search)

### LLM

The LLM search service uses a Large Language Model (LLM) built from the same data as the main search. It accepts naturally worded queries, whereas the main search requires the exact wording of titles or keywords.

[See this page for more info](https://informasjonsforvaltning.github.io/data.norge.no/search/llm)

### SPARQL

The SPARQL search service is based on the RDF query language SPARQL, and is our most advanced and powerful search. Users have to write queries that follow the correct syntax, but can search all harvested data points, not just pre-selected fields.

[See this page for more info](https://informasjonsforvaltning.github.io/data.norge.no/search/sparql)
38 changes: 0 additions & 38 deletions content/Data.norge.no/Søk_og_filter/Sparql/_index.md

This file was deleted.

12 changes: 0 additions & 12 deletions content/Data.norge.no/Søk_og_filter/Søk/_index.md

This file was deleted.

14 changes: 0 additions & 14 deletions content/Data.norge.no/Søk_og_filter/_index.md

This file was deleted.