query_aggregate() limited to certain field types #9

Open
ChRauh opened this issue Aug 29, 2023 · 1 comment

ChRauh commented Aug 29, 2023

This might be obvious from a database perspective, but it took me a long time to figure out: the axes argument in query_aggregate() is apparently limited to "date" and "keyword" field types, if I'm not mistaken. Relatedly, the set_fields() function seems to work only on empty indices (i.e., before uploading documents).

Both issues should be flagged in the documentation and/or backed by more informative error messages than the current 'internal server error' (HTTP 500) to save future users some time ...

Thank you!

# No field specification 
my_data <- data.frame(date = rep(as.Date("2020-01-01"), 6),
                      title = paste0("Title ", seq(1:6)),
                      text = paste0("bla bla blub ", seq(1:6)),
                      aggregator = c("A", "A", "A", "B", "B", NA))
create_index("my_index", description = "My Index")
upload_documents("my_index", my_data)
get_fields("my_index")
query_aggregate(index = "my_index",  
                queries = NULL,
                axes = list(list(name="aggregator",
                                 field="aggregator")))  # HTTP 500 error
set_fields(index = "my_index", list(aggregator = "keyword")) # HTTP 500 error

# With field specification 
my_data <- data.frame(date = rep(as.Date("2020-01-01"), 6),
                      title = paste0("Title ", seq(1:6)),
                      text = paste0("bla bla blub ", seq(1:6)),
                      aggregator = c("A", "A", "A", "B", "B", NA))
create_index("my_index", description = "My Index")
set_fields(index = "my_index", list(aggregator = "keyword")) # Before upload!
get_fields("my_index")
upload_documents("my_index", my_data)
query_aggregate(index = "my_index",  
                queries = NULL,
                axes = list(list(name="aggregator",
                                 field="aggregator")))  # Ah!
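
Until the server reports this more clearly, a client-side check along the following lines could catch the problem before the request is sent. This is only a sketch: `check_axes()` is a hypothetical helper, not part of amcat4r, and it assumes the field table has `name` and `type` columns like the tibble returned by get_fields(). The sample table is typed in by hand so the snippet runs without a server.

```r
# Hypothetical helper (not an amcat4r function): stop early if any
# aggregation axis points at a field that is not a date or keyword field.
# `fields`: data frame with columns `name` and `type`.
# `axes`: the same list structure that is passed to query_aggregate().
check_axes <- function(fields, axes, allowed = c("date", "keyword")) {
  axis_fields <- vapply(axes, function(axis) axis$field, character(1))
  types <- fields$type[match(axis_fields, fields$name)]
  # Unknown fields yield NA types and are reported as "missing"
  bad <- is.na(types) | !(types %in% allowed)
  if (any(bad)) {
    labels <- ifelse(is.na(types[bad]), "missing", types[bad])
    stop(
      "Cannot aggregate on: ",
      paste0(axis_fields[bad], " (", labels, ")", collapse = ", "),
      ". Axes must be of type ", paste(allowed, collapse = " or "), ".",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hand-written stand-in for the output of get_fields("my_index")
fields <- data.frame(
  name = c("aggregator", "date", "text", "title", "url"),
  type = c("text", "date", "text", "text", "url"),
  stringsAsFactors = FALSE
)
axes <- list(list(name = "aggregator", field = "aggregator"))

# Capture the error message instead of aborting the script
result <- tryCatch(check_axes(fields, axes), error = function(e) conditionMessage(e))
print(result)
```

With a live index, the hand-written table would be replaced by the result of get_fields("my_index"), and the check would run just before query_aggregate().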

JBGruber added a commit that referenced this issue Sep 21, 2023
JBGruber (Member) commented

I added a specific error to flag this:

library(amcat4r)
amcat_login("http://localhost/amcat", cache = 2L)
#> ✔ Authentication at http://localhost/amcat successful!
# No field specification
my_data <- data.frame(
  date = rep(as.Date("2020-01-01"), 6),
  title = paste0("Title ", seq(1:6)),
  text = paste0("bla bla blub ", seq(1:6)),
  aggregator = c("A", "A", "A", "B", "B", NA)
)
create_index("my_index", description = "My Index")
upload_documents("my_index", my_data)
get_fields("my_index")
#> # A tibble: 5 × 2
#>   name       type 
#>   <chr>      <chr>
#> 1 aggregator text 
#> 2 date       date 
#> 3 text       text 
#> 4 title      text 
#> 5 url        url
query_aggregate(
  index = "my_index",
  queries = NULL,
  axes = list(list(
    name = "aggregator",
    field = "aggregator"
  ))
)
#> Error in `query_aggregate()`:
#> ! Aggregation axes need to be either date or keyword fields. Check the
#>   field types with `get_field()`

Created on 2023-09-21 with reprex v2.0.2

In general, the HTTP 500 errors are a little annoying. I think it would be better to pipe the errors from amcat through to the user. In this case, however, the error I see in the server logs is quite confusing as well:

'Fielddata is disabled on [aggregator] in [my_index]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [aggregator] in order to load field data by uninverting the inverted index. Note that this can use significant memory.'
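
If server errors were passed through, a message like this one could be shortened on the client before it is shown. The following is a rough sketch of that idea, not what amcat4r does: `explain_server_error()` is a hypothetical function, and it assumes the raw message text reaches the client unchanged.

```r
# Hypothetical helper (not an amcat4r function): turn the Elasticsearch
# "Fielddata is disabled" message into a short hint, and pass every
# other message through unchanged.
explain_server_error <- function(msg) {
  pattern <- "Fielddata is disabled on \\[([^]]+)\\] in \\[([^]]+)\\]"
  hit <- regmatches(msg, regexec(pattern, msg))[[1]]
  if (length(hit) == 3) {
    # hit[1] is the full match, hit[2] the field name, hit[3] the index name
    return(sprintf(
      "Field '%s' in index '%s' is a text field and cannot be used for aggregation; use a keyword field instead.",
      hit[2], hit[3]
    ))
  }
  msg
}

server_msg <- paste(
  "Fielddata is disabled on [aggregator] in [my_index].",
  "Text fields are not optimised for operations that require per-document field data."
)
hint <- explain_server_error(server_msg)
print(hint)
```

Matching on the message text is brittle, since the wording may differ between Elasticsearch versions, so unmatched messages are returned as they are.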
