query_aggregate() limited to certain field types #9

ChRauh · 2023-08-29T12:29:29Z

Might be obvious from a database perspective, but it took me a long time to figure out: the axes argument in query_aggregate() is apparently limited to "date" and "keyword" field types, if I'm not mistaken. Relatedly, the set_fields() function seems to work only on empty indices (i.e. before uploading documents).

Both issues should be flagged in the documentation and/or backed by more informative error messages than the current 'internal server error' (HTTP 500) to save future users some time ...

Thank you!

# No field specification 
my_data <- data.frame(date = rep(as.Date("2020-01-01"), 6),
                      title = paste0("Title ", seq(1:6)),
                      text = paste0("bla bla blub ", seq(1:6)),
                      aggregator = c("A", "A", "A", "B", "B", NA))
create_index("my_index", description = "My Index")
upload_documents("my_index", my_data)
get_fields("my_index")
query_aggregate(index = "my_index",  
                queries = NULL,
                axes = list(list(name="aggregator",
                                 field="aggregator")))  # HTTP 500 error
set_fields(index = "my_index", list(aggregator = "keyword")) # HTTP 500 error

# With field specification 
my_data <- data.frame(date = rep(as.Date("2020-01-01"), 6),
                      title = paste0("Title ", seq(1:6)),
                      text = paste0("bla bla blub ", seq(1:6)),
                      aggregator = c("A", "A", "A", "B", "B", NA))
create_index("my_index", description = "My Index")
set_fields(index = "my_index", list(aggregator = "keyword")) # Before upload!
get_fields("my_index")
upload_documents("my_index", my_data)
query_aggregate(index = "my_index",  
                queries = NULL,
                axes = list(list(name="aggregator",
                                 field="aggregator")))  # Ah!
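Since the field types apparently have to be set before any documents are uploaded, it may help to build the field specification from the data frame up front. The following is only a minimal sketch: `keyword_spec()` is a hypothetical helper, not part of amcat4r, and it only produces the same named list that is passed to `set_fields()` above. The server calls are left as comments because they need a running AmCAT instance.

```r
# Hypothetical helper, not part of amcat4r: builds the named list that
# set_fields() is given above, marking the chosen columns as "keyword".
keyword_spec <- function(data, columns) {
  # Fail early if a requested column does not exist in the data.
  unknown <- setdiff(columns, names(data))
  if (length(unknown) > 0) {
    stop("Columns not found in data: ", paste(unknown, collapse = ", "),
         call. = FALSE)
  }
  stats::setNames(as.list(rep("keyword", length(columns))), columns)
}

my_data <- data.frame(
  date = rep(as.Date("2020-01-01"), 6),
  title = paste0("Title ", 1:6),
  aggregator = c("A", "A", "A", "B", "B", NA)
)

spec <- keyword_spec(my_data, "aggregator")
str(spec)

# With a running AmCAT server, the order that worked above would be:
# create_index("my_index", description = "My Index")
# set_fields(index = "my_index", spec)   # before upload
# upload_documents("my_index", my_data)
```

Checking the column names locally means a typo is caught before anything is sent to the server.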

JBGruber · 2023-09-21T12:28:49Z

I added a specific error to flag this:

library(amcat4r)
amcat_login("http://localhost/amcat", cache = 2L)
#> ✔ Authentication at http://localhost/amcat successful!
# No field specification
my_data <- data.frame(
  date = rep(as.Date("2020-01-01"), 6),
  title = paste0("Title ", seq(1:6)),
  text = paste0("bla bla blub ", seq(1:6)),
  aggregator = c("A", "A", "A", "B", "B", NA)
)
create_index("my_index", description = "My Index")
upload_documents("my_index", my_data)
get_fields("my_index")
#> # A tibble: 5 × 2
#>   name       type 
#>   <chr>      <chr>
#> 1 aggregator text 
#> 2 date       date 
#> 3 text       text 
#> 4 title      text 
#> 5 url        url
query_aggregate(
  index = "my_index",
  queries = NULL,
  axes = list(list(
    name = "aggregator",
    field = "aggregator"
  ))
)
#> Error in `query_aggregate()`:
#> ! Aggregation axes need to be either date or keyword fields. Check the
#>   field types with `get_field()`

Created on 2023-09-21 with reprex v2.0.2

Generally, the HTTP 500 errors are a little annoying. I think it would be better if we piped the errors from amcat through to the user. However, in this case the error I see in the server logs is quite confusing as well:

'Fielddata is disabled on [aggregator] in [my_index]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [aggregator] in order to load field data by uninverting the inverted index. Note that this can use significant memory.'
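For illustration, a client-side check along these lines can be sketched without a server. This is a minimal sketch under stated assumptions, not the code from the commit: `check_axes()` is a hypothetical helper, the allowed types are taken from the error message above, and the `fields` data frame is typed in by hand to mirror the `get_fields()` output shown earlier.

```r
# Hypothetical helper, not part of amcat4r: stops with a readable message
# if an aggregation axis refers to a field that cannot be aggregated on.
check_axes <- function(axes, fields, allowed = c("date", "keyword")) {
  # `fields` is assumed to have character columns `name` and `type`,
  # like the tibble returned by get_fields().
  axis_fields <- vapply(axes, function(axis) axis$field, character(1))
  types <- fields$type[match(axis_fields, fields$name)]
  # A field is rejected if it is unknown or has a non-aggregatable type.
  bad <- is.na(types) | !types %in% allowed
  if (any(bad)) {
    labels <- ifelse(is.na(types[bad]), "missing", types[bad])
    stop(
      "Cannot aggregate on: ",
      paste0(axis_fields[bad], " (", labels, ")", collapse = ", "),
      ". Allowed types: ", paste(allowed, collapse = ", "), ".",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hand-written copy of the field table from the reprex above.
fields <- data.frame(
  name = c("aggregator", "date", "text", "title", "url"),
  type = c("text", "date", "text", "text", "url")
)

axes_ok <- list(list(name = "date", field = "date"))
axes_bad <- list(list(name = "aggregator", field = "aggregator"))

check_axes(axes_ok, fields)  # passes silently
result <- tryCatch(
  check_axes(axes_bad, fields),
  error = function(e) conditionMessage(e)
)
print(result)
```

Naming the offending field and its current type in the message points the user at the fix (a keyword field) without them having to read the Elasticsearch log.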

JBGruber added a commit that referenced this issue Sep 21, 2023

address #9

2864479
