Comparison view treats multi-word value as multiple tokens #385

Open
arildm opened this issue Aug 27, 2024 · 2 comments

arildm commented Aug 27, 2024

  1. With the Svenska partiprogram och valmanifest (vivill) corpus selected, save two searches for comparison
  2. Compare the searches using the parti attribute
  3. Click any of the multi-word party names, e.g. Folkpartiet liberalerna
  4. Expected: Some results
  5. Actual: No results

Apparently, the API request has cqp2=[_.text_party_name = "Folkpartiet"] [_.text_party_name = "liberalerna"]

@arildm arildm added the bug label Aug 27, 2024

arildm commented Aug 28, 2024

The backend /loglike response doesn't distinguish a multi-word value from multiple tokens. Compare these calls:

"han"+verb vs. "hon"+verb by sense: Space in string separates tokens

{ "loglike": {
  "hon..1:-1.000 vara..1:-1.000": 2375.04,
  "han..1:-1.000 vara..1:-1.000": -1774.16,
  "hon..1:-1.000 skola..4:-1.000": 1062.87,
  // ...

"frihet" vs. "jämlikhet" by party: Space in string does not separate tokens

{ "loglike": {
  "Feministiskt initiativ": 78.12,
  "V\u00e4nsterpartiet": 74.7,
  "Moderaterna": -73.75,
  // ...

Perhaps we can interpret the string value as one or more tokens depending on the input queries (set1_cqp and set2_cqp)? But changing the response format would probably be a more robust approach.
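A minimal sketch of the first idea, assuming the frontend still has `set1_cqp` and `set2_cqp` at hand when it reads the response. The helpers `countCqpTokens` and `keyToTokens` are hypothetical names, and the two-token example queries are assumed rather than taken from the actual calls. The token counter is deliberately naive: it only counts top-level `[...]` expressions and skips quoted strings.

```typescript
// Hypothetical helper: count top-level [...] token expressions in a CQP query.
// Naive on purpose: it skips double-quoted strings but knows nothing about
// repetition operators or other CQP syntax.
function countCqpTokens(cqp: string): number {
  let count = 0
  let depth = 0
  let inString = false
  for (let i = 0; i < cqp.length; i++) {
    const ch = cqp[i]
    if (inString) {
      if (ch === "\\") i++ // skip the escaped character
      else if (ch === '"') inString = false
    } else if (ch === '"') {
      inString = true
    } else if (ch === "[") {
      if (depth === 0) count++
      depth++
    } else if (ch === "]") {
      depth--
    }
  }
  return count
}

// Hypothetical helper: split a response key on spaces only when at least one
// of the input queries matches more than one token.
function keyToTokens(key: string, set1Cqp: string, set2Cqp: string): string[] {
  const maxTokens = Math.max(countCqpTokens(set1Cqp), countCqpTokens(set2Cqp))
  return maxTokens > 1 ? key.split(" ") : [key]
}

// Single-token queries: the space belongs to the value.
console.log(keyToTokens("Feministiskt initiativ", '[word = "frihet"]', '[word = "jämlikhet"]'))

// Two-token queries (assumed CQP): the space separates tokens.
console.log(
  keyToTokens("hon..1:-1.000 vara..1:-1.000", '[word = "han"] [pos = "VB"]', '[word = "hon"] [pos = "VB"]'),
)
```

This still cannot tell a multi-word value apart from token boundaries when a multi-token query is compared by an attribute whose values contain spaces, which is why a changed response format looks more robust.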


arildm commented Sep 2, 2024

This is where the string in the response is split on whitespace:

const tokenLists = key.split("/").map((tokens) => tokens.split(" "))
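To illustrate the effect, here is a small sketch that wraps the split above in a function and feeds it a multi-word party name. `splitKey` and `toCqp` are hypothetical names used only for this illustration, and `toCqp` is an assumed simplification of how the follow-up query gets built, not the actual frontend code.

```typescript
// Mirrors the split quoted above.
function splitKey(key: string): string[][] {
  return key.split("/").map((tokens) => tokens.split(" "))
}

// Hypothetical helper: emit one CQP token expression per list entry.
function toCqp(attribute: string, tokens: string[]): string {
  return tokens.map((value) => `[${attribute} = "${value}"]`).join(" ")
}

// The multi-word value is broken into two tokens, giving the failing query.
for (const tokens of splitKey("Folkpartiet liberalerna")) {
  console.log(toCqp("_.text_party_name", tokens))
  // [_.text_party_name = "Folkpartiet"] [_.text_party_name = "liberalerna"]
}

// Keeping the value whole gives a single-token query.
console.log(toCqp("_.text_party_name", ["Folkpartiet liberalerna"]))
// [_.text_party_name = "Folkpartiet liberalerna"]
```

The first query asks for two consecutive tokens whose text-level attribute equals one word each, which no text in the corpus satisfies, so it matches the "No results" outcome in the report.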
