
Set language #463

Open
ran-j opened this issue Aug 31, 2018 · 9 comments

@ran-j

ran-j commented Aug 31, 2018

Hello.

Instead of using a separate stemmer for each language:

Portugese | X |   |   | PorterStemmerPt
Russian | X |   |   | PorterStemmerRu
Swedish | X |   |   | PorterStemmerSv

could you add a language setting, like:

natural.Language('PT-BR');

and then just use:

'strings'.stem()

@Hugo-ter-Doest
Collaborator

Hugo-ter-Doest commented Sep 1, 2018

I'm not sure I understand what you mean. At the moment, stemmers for different languages are separate functions. Do you mean that you set the language of natural once and then use the stemmers through a single call? So if I call natural.setLanguage('Pt'), all modules such as the stemmer, tokenizer, and sentiment analyzer are set to Portuguese?

Interesting idea. We need to improve the language system of natural, so maybe this is a way to go.

@PauloQuerido

PauloQuerido commented Sep 1, 2018

This is secondary, but the timing is perfect to ask for a change in the documentation: "Portugese" is a misspelling of "Portuguese".

(And yes, a general language configuration is a good idea.)

@ran-j
Author

ran-j commented Sep 1, 2018

I mean using one function for all languages and setting the language you want to use. For example:

// set the PT language
natural.Language('PT-BR');

// and then use the function:
'strings'.stem()

That way the language can be switched dynamically:

// set the PT language
natural.Language('PT-BR');

// and then use the function:
'strings'.stem()

// set the EN language
natural.Language('EN-ES');

// and then use the function:
'strings'.stem()

@Hugo-ter-Doest
Collaborator

It seems that you are using ISO language codes. PT-BR is Brazilian Portuguese. But what is EN-ES, or do you mean EN-US?
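Codes such as PT-BR combine a language subtag with a region subtag. Since the stemmers mentioned in this thread are named per language (PorterStemmerPt, PorterStemmerRu), one option would be to drop the region and keep only the language part. A minimal sketch of that normalization follows; `primaryLanguage` is a hypothetical helper for illustration, not part of natural:

```javascript
// Hypothetical helper (not part of natural): reduce a tag such as
// 'PT-BR' or 'en_US' to its lowercase primary language subtag.
function primaryLanguage(tag) {
  if (typeof tag !== 'string' || tag.trim() === '') {
    throw new TypeError('language tag must be a non-empty string');
  }
  // Accept both '-' and '_' as separators and keep only the first part.
  return tag.trim().split(/[-_]/)[0].toLowerCase();
}

console.log(primaryLanguage('PT-BR')); // pt
console.log(primaryLanguage('EN-US')); // en
console.log(primaryLanguage('Pt'));    // pt
```

This deliberately ignores the region, so PT-BR and PT-PT would select the same stemmer. Whether that is acceptable depends on how region-specific the language modules turn out to be.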

@ran-j
Author

ran-j commented Sep 2, 2018

Yes, haha, I meant EN-US. It was just an example.

Do you get the idea?

@Hugo-ter-Doest
Collaborator

Hugo-ter-Doest commented Sep 2, 2018

Yeah, I get the idea. It will take quite a bit of refactoring of the modules that support multiple languages. Plus, I'm afraid we would have to introduce a global setting that is visible to all modules. Something like:

function setLanguage(l) {
  global.language = l;
}

Also, a default language must be set in natural's index file.
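The same idea could also be sketched with a module-scoped setting instead of a property on `global`, so that the state stays inside one config module. The example below is only an illustration under stated assumptions: the toy stemmers stand in for natural's per-language stemmers and are not the real Porter algorithms, and the names `setLanguage`, `getLanguage`, `stem`, and `DEFAULT_LANGUAGE` are hypothetical.

```javascript
// Sketch only: toy stand-ins for per-language stemmers such as
// PorterStemmerPt. They strip a plural suffix and nothing more.
const stemmers = {
  en: { stem: (word) => word.toLowerCase().replace(/s$/, '') },
  pt: { stem: (word) => word.toLowerCase().replace(/(es|s)$/, '') },
};

// Assumed default; the thread only says that a default must exist.
const DEFAULT_LANGUAGE = 'en';

// Module-scoped state instead of global.language.
let currentLanguage = DEFAULT_LANGUAGE;

// Select the language used by later calls; reject unknown languages.
function setLanguage(language) {
  const key = String(language).toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(stemmers, key)) {
    throw new Error(`Unsupported language: ${language}`);
  }
  currentLanguage = key;
}

function getLanguage() {
  return currentLanguage;
}

// Dispatch to the stemmer registered for the current language.
function stem(word) {
  return stemmers[currentLanguage].stem(word);
}

console.log(stem('strings')); // string
setLanguage('pt');
console.log(stem('flores')); // flor
```

Keeping the setting behind `setLanguage` and `getLanguage` means unsupported languages fail at the point where they are set rather than deep inside a stemmer, and other modules can read the current language without touching `global`.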

@ran-j
Author

ran-j commented Sep 2, 2018

Yes, like that. The code will be better organized and it will be easier to change the language. I'm asking because I'm creating a chatbot, and this would be helpful for me.

@Hugo-ter-Doest
Collaborator

Hugo-ter-Doest commented Sep 11, 2018

I did some work on runtime language support. Please have a look at this branch. There is a section at the top of the README about language support, and I refactored the Porter stemmer. Also, some tests were added for the config module and the generic Porter stemmer.

@ran-j
Author

ran-j commented Sep 11, 2018

Yes, like that. How can I help improve the PT language functions (sentiment, stemmer, tokenizer)?
