What languages are supported #47

Open
gasyoun opened this issue Mar 11, 2021 · 2 comments

Comments

@gasyoun

gasyoun commented Mar 11, 2021

Is it English only?

@christofs

No, many languages are supported, including Chinese, Japanese, Korean, Hebrew, Arabic, Coptic, and languages written in Cyrillic, as long as your texts are encoded in UTF-8. See page 10 of the how-to for details: https://github.com/computationalstylistics/stylo_howto/raw/master/stylo_howto.pdf

@Frenzie
Contributor

Frenzie commented Mar 12, 2021

It depends a bit on the specific context you're thinking of. For basic tokenization just about anything goes, but for pronoun detection you might have to add your own pronoun list.

\arguments{
\item{corpus.lang}{an optional argument specifying the language of the texts
analyzed: available languages are \code{English}, \code{Latin},
\code{Polish}, \code{Dutch}, \code{French}, \code{German}, \code{Spanish},
\code{Italian}, and \code{Hungarian} (default is \code{English}).}
}
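To make the distinction concrete, here is a minimal, language-agnostic sketch in Python. It is not stylo's API: the function names and the Russian pronoun list are hypothetical, chosen only to show that tokenizing UTF-8 text needs no language-specific data, while pronoun removal needs a list you supply yourself when the language is not among those documented above.

```python
import re

# Hypothetical pronoun list for a language outside the documented set
# (Russian here); an assumption for illustration, not shipped with stylo.
CUSTOM_PRONOUNS = {"я", "ты", "он", "она", "оно", "мы", "вы", "они"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    # In Python 3, the word-character class matches Unicode letters,
    # so the same rule works for Cyrillic, Hebrew, Arabic, and so on.
    return re.findall(r"\w+", text.lower())


def drop_pronouns(tokens, pronouns):
    """Return the tokens that are not in the supplied pronoun set."""
    return [token for token in tokens if token not in pronouns]


tokens = tokenize("Я читаю книгу, а они пишут письма.")
filtered = drop_pronouns(tokens, CUSTOM_PRONOUNS)
print(filtered)
```

The tokenizer is the same for every script; only `CUSTOM_PRONOUNS` has to change per language.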
