Separate vocabulary and language menus and detect language of text #21

Open: 52 commits to merge into base branch main

juhoinkinen (Member) commented Oct 30, 2024

This changes the first menu from "Vocabulary and text language" (which selected an Annif project directly) to "Vocabulary" and adds a separate "Text language" menu. The Annif project is now determined by the combination of the selected vocabulary and language, so the text language can be selected independently of the vocabulary. When the selected vocabulary has no Annif projects for some languages, those languages are disabled in the language menu.
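
As a rough illustration of the vocabulary and language combination described above, here is a minimal sketch. It assumes each project object has `project_id` and `language` fields and that project ids follow a `<vocabularyId>-<language>` pattern; the helper names and the sample data are hypothetical and not taken from this PR.

```javascript
// Assumption: project ids look like "<vocabularyId>-<language>".
function vocabularyIdOf(project) {
  const suffix = "-" + project.language;
  return project.project_id.endsWith(suffix)
    ? project.project_id.slice(0, -suffix.length)
    : project.project_id;
}

// Find the project matching the two menu selections, or null if none exists.
function findProject(projects, vocabularyId, language) {
  const match = projects.find(
    p => vocabularyIdOf(p) === vocabularyId && p.language === language
  );
  return match || null;
}

// Languages to disable in the language menu for the selected vocabulary.
function disabledLanguagesFor(projects, vocabularyId, allLanguages) {
  const supported = new Set(
    projects
      .filter(p => vocabularyIdOf(p) === vocabularyId)
      .map(p => p.language)
  );
  return allLanguages.filter(lang => !supported.has(lang));
}

// Illustrative sample data only (not real project ids).
const exampleProjects = [
  { project_id: "alpha-fi", language: "fi" },
  { project_id: "alpha-sv", language: "sv" },
  { project_id: "alpha-en", language: "en" },
  { project_id: "beta-fi", language: "fi" },
];
```

With this sample data, selecting the vocabulary "beta" would leave only "fi" enabled in the language menu.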

Also adds automatic language detection, which updates the selection in the language menu. Language detection runs when

  • a file or URL is given as input
  • the user edits the text in the textarea and the text contains at least 10 characters; detection is triggered after a delay of 1 second
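
One way to get the textarea behaviour listed above is a debounce, sketched below. This is an assumption about the approach, not the code in this PR: `createDetectionScheduler` and its options are hypothetical names, and the timer functions are injectable only so the sketch can be exercised without waiting.

```javascript
// Sketch: call `detect(text)` only once the text has been left unchanged for
// `delayMs` and is at least `minLength` characters long.
function createDetectionScheduler(detect, options = {}) {
  const {
    minLength = 10,
    delayMs = 1000,
    setTimer = setTimeout,
    clearTimer = clearTimeout,
  } = options;
  let pending = null;

  return function onTextChanged(text) {
    // A newer edit always cancels the previously scheduled detection.
    if (pending !== null) {
      clearTimer(pending);
      pending = null;
    }
    if (text.length < minLength) {
      return; // too short, skip detection
    }
    pending = setTimer(() => {
      pending = null;
      detect(text);
    }, delayMs);
  };
}
```

The returned function would be called from the textarea's input handler, so that rapid typing results in a single detection request after the user pauses.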

This PR branch is based on the branch of PR #18, so that PR should be merged to main first. Closes #9.

juhoinkinen marked this pull request as ready for review (January 13, 2025 07:23)
juhoinkinen changed the title from "Detect language of text" to "Separate vocabulary and language menus and detect language of text" (Jan 13, 2025)
osma self-requested a review (February 3, 2025 08:28)
juhoinkinen changed the base branch from update-2024 to main (February 6, 2025 13:49)
osma (Member) left a comment

Looks good in general, I left a few comments and suggestions.

var textract_base_url = 'https://ai.dev.finto.fi/textract/'//'http://localhost:8001/textract/';
var textract_base_url = 'https://ai.dev.finto.fi/textract/'
} else {
// local development with VS Code Live Server extension - use APIs of Annif on localhost via Live Server proxy (overcomes CORS error by /v1/detect-language)

Do we want to have logic like this on the main branch?
(I'm fine either way, just asking whether this is intentional)

},
body: JSON.stringify({
text: this.text,
languages: ["fi", "sv", "en"] // TODO Here should be only langs that selected project supports

Is the TODO a problem in practice for current Finto AI? For some vocabs, we don't support all languages.
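
For illustration, a minimal sketch of how the request could be limited to the languages the selected vocabulary supports. The function name, the `Content-Type` header, and the idea of passing in a precomputed list of supported languages are assumptions; only the `text` and `languages` body fields come from the snippet above.

```javascript
// Build fetch() options for the detect-language call, sending only the
// languages that the selected vocabulary has projects for.
// `supportedLanguages` is assumed to be computed elsewhere in the app.
function buildDetectLanguageRequest(text, supportedLanguages) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: text,
      languages: supportedLanguages,
    }),
  };
}
```

The result could then be passed as the second argument to `fetch`, replacing the hard-coded `["fi", "sv", "en"]` list.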

.then(response => response.json())
.then(data => {
this.text_language = data.results[0].language;
// this.text_language_detection_score = data.results[0].score; TODO Add to tooltip

Is the TODO a problem? Should it be fixed now instead of leaving it like this?
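
If the score is to be shown in a tooltip, something along these lines might do. This is only a sketch: it assumes the score is a number between 0 and 1 and that the first result may be missing or have no language; `readDetection` and the tooltip wording are hypothetical.

```javascript
// Extract the top detection result and a tooltip string from the response.
function readDetection(data) {
  const top = data && Array.isArray(data.results) ? data.results[0] : undefined;
  if (!top || !top.language) {
    return { language: null, score: null, tooltip: "Language could not be detected" };
  }
  const hasScore = typeof top.score === "number";
  const tooltip = hasScore
    ? "Detected language: " + top.language + " (" + Math.round(top.score * 100) + " %)"
    : "Detected language: " + top.language;
  return { language: top.language, score: hasScore ? top.score : null, tooltip };
}
```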

@@ -254,7 +323,9 @@ const mainApp = createApp({
})
.then(data => {
this.projects = data.projects
this.selected_project = this.projects[0].project_id
// Assume vocabulary id is a prefix of project id

Ideally, the vocab ID would be returned by the Annif REST API, but it doesn't expose it currently. I opened an issue about it on the Annif side.

computed: {
disabledLanguages() {
// Map of languages and their enabling criteria based on vocabularyId
return {

Aren't the duplicate keys here problematic? Does this really work as intended?
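
To illustrate the concern: in a JavaScript object literal, a repeated key silently replaces the earlier entry, so only the last condition takes effect. The sketch below uses hypothetical vocabulary ids and is not the object from this PR.

```javascript
// With duplicate keys, only the last `en` entry survives.
function disabledWithDuplicateKeys(vocabularyId) {
  return {
    en: vocabularyId === "alpha",
    en: vocabularyId === "beta", // replaces the entry above
  };
}

// Combining the conditions under a single key keeps both of them.
function disabledCombined(vocabularyId) {
  return {
    en: ["alpha", "beta"].includes(vocabularyId),
  };
}
```

Here `disabledWithDuplicateKeys("alpha").en` evaluates to `false`, while `disabledCombined("alpha").en` evaluates to `true`.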

juhoinkinen and others added 3 commits March 6, 2025 15:55
Co-authored-by: Osma Suominen <osma.suominen@helsinki.fi>
Co-authored-by: Osma Suominen <osma.suominen@helsinki.fi>
Co-authored-by: Osma Suominen <osma.suominen@helsinki.fi>
Labels: enhancement (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close the issue "Detect language of text"
2 participants