Different results from command line tool #49

Open
nirzohar opened this issue Dec 19, 2018 · 3 comments

Comments

@nirzohar

The predict-prob method returns different results in the Java wrapper and in the native command-line tool.
For example, see the results of test05PredictProba in the JFastTextTest class (or test with your own model).
The probability returned by Java is: 0.500125
The probability returned by the native C++ tool is: 0.500075

This looks like a minor difference, but when I test the probabilities with large model files, I see a huge gap between the returned probabilities.

@carschno
Contributor

I've been able to reproduce this and have isolated the issue to the trailing newline character. Apparently, this is rooted in FastText itself; however, the problem probably does not arise there because it operates on line-by-line input, whereas the Java API allows for arbitrary (multi-line) strings.

$ echo "Weak wifi otherwise all ok" | fasttext predict-prob model.bin - 5
__label__60 0.678738 __label__80 0.315212 __label__40 0.0055875 __label__100 0.000415088  __label__20 6.88466e-05

Now without trailing newline:

$ echo -n "Weak wifi otherwise all ok" | fasttext predict-prob model.bin - 5
__label__60 0.807072 __label__80 0.126261 __label__40 0.049052 __label__4 0.00411388 __label__5 0.00340998

Running on the command line, using the java package (created with mvn clean package):

$ echo "Weak wifi otherwise all ok" | java -jar JFastText/target/jfasttext-0.4-jar-with-dependencies.jar  predict-prob model.bin - 5
__label__60 0.678737 __label__80 0.315212 __label__40 0.0055875 __label__100 0.000415088 __label__20 6.88467e-05

Again, without trailing newline:

$ echo -n "Weak wifi otherwise all ok" | java -jar JFastText/target/jfasttext-0.4-jar-with-dependencies.jar  predict-prob model.bin - 5
__label__60 0.807072 __label__80 0.126261 __label__40 0.049052 __label__4 0.00411388 __label__5 0.00340998

In the Java API, this is also reproducible. With trailing newline:

	JFastText jft = new JFastText();
	jft.loadModel("model.bin");
	List<ProbLabel> predictions = jft.predictProba("Weak wifi otherwise all ok\n", 5);

Without trailing newline:

	JFastText jft = new JFastText();
	jft.loadModel("model.bin");
	List<ProbLabel> predictions = jft.predictProba("Weak wifi otherwise all ok", 5);

The results are the same as above with echo and echo -n respectively.
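Since the command line tool reads its input line by line while the Java API accepts arbitrary strings, one way to make the two agree is to split the text into lines and terminate each with exactly one newline before predicting. The following is only a minimal sketch of that idea; the class and method names (LineNormalizer, toNewlineTerminatedLines) are hypothetical and not part of JFastText, and skipping blank lines is an assumption about the desired behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class LineNormalizer {

    /**
     * Splits the text on any line break, trims each line, drops blank lines,
     * and terminates every remaining line with exactly one '\n'.
     */
    public static List<String> toNewlineTerminatedLines(String text) {
        List<String> lines = new ArrayList<>();
        if (text == null) {
            return lines;
        }
        // "\\R" matches any Unicode line break sequence (Java 8+).
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (!line.isEmpty()) {
                lines.add(line + "\n");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : toNewlineTerminatedLines("Weak wifi otherwise all ok")) {
            // Each element could then be passed to jft.predictProba(line, 5).
            System.out.print(line);
        }
    }
}
```

Each returned string would then be passed to predictProba separately, which should mirror what the echo examples above do for a single line.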

@carschno
Contributor

This is actually a known issue in FastText, see:
facebookresearch/fastText#435 and facebookresearch/fastText#165

@kun368

kun368 commented Feb 25, 2022

Based on what @carschno mentioned, I used this to get the right results:

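public Map<String, Double> predictTopLabel(String text, int k) {
    Map<String, Double> scoreMap = new LinkedHashMap<>();
    // Trim the input and append exactly one newline, matching the CLI's line-by-line input.
    text = StringUtils.trimToEmpty(text) + "\n";
    final List<JFastText.ProbLabel> pl = model.predictProba(text, k);
    for (JFastText.ProbLabel i : CollectionUtils.emptyIfNull(pl)) {
        // ProbLabel holds a log probability; convert it back with exp.
        final double prob = Math.exp(i.logProb);
        // Divide by a double: Math.round returns a long, so dividing by an int literal would truncate.
        final double score = Math.round(prob * 100000000) / 100000000.0;
        scoreMap.put(i.label, score);
    }
    return scoreMap;
}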
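
A note on the rounding step: Math.round returns a long, so the division that follows has to use a floating-point divisor, otherwise it is carried out as integer division and any probability below 1 collapses to 0. The sketch below illustrates the difference in isolation; the class and method names (ProbRounding, roundTo8Decimals, probabilityFromLog) are hypothetical helpers, not part of JFastText.

```java
public class ProbRounding {

    /** Rounds to 8 decimal places using a floating-point divisor. */
    public static double roundTo8Decimals(double value) {
        return Math.round(value * 1e8) / 1e8;
    }

    /** Converts a log probability to a probability rounded to 8 decimals. */
    public static double probabilityFromLog(double logProb) {
        return roundTo8Decimals(Math.exp(logProb));
    }

    /** For comparison only: long / int is integer division and truncates. */
    public static double roundWithIntegerDivision(double value) {
        return Math.round(value * 100000000) / 100000000;
    }

    public static void main(String[] args) {
        System.out.println(roundTo8Decimals(0.5));          // floating-point division
        System.out.println(roundWithIntegerDivision(0.5));  // integer division
        System.out.println(probabilityFromLog(0.0));        // exp(0) = 1
    }
}
```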
