ai-ku/glookup
GLOOKUP Copyright (c) 2008-2014, Deniz Yuret

This is the code used in:

Deniz Yuret. 2008. Smoothing a Tera-word Language Model. In the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.

See http://goo.gl/rmD87d for details.

The glookup program reads ngram patterns with wildcards (represented with the '_' character) from stdin and prints their counts from the Web1T Google ngram data (whose path is given by the -p option). Please see glookup.1 (man page), or glookup.txt (plain text format) for documentation.

The model.pl script optimizes and tests various language models. See 'perldoc model.pl', or model.txt for documentation. Typical usage:

    model.pl -patterns < text > patterns
    glookup -p web1t_path < patterns > counts
    model.pl -counts counts < text

The glookup.pl script quickly searches for a given pattern in uncompressed Google Web1T data. Use the C version for bulk processing, the perl version to get a few counts quickly.
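To illustrate the wildcard pattern syntax described above, here is a minimal Python sketch of what a pattern count means. It is not the glookup implementation, which is written in C and is built for bulk lookups. The sketch assumes that each '_' matches exactly one token and that the count of a pattern is the sum of the counts of all matching ngrams. It also assumes input lines in the Web1T layout of space-separated tokens, a tab, then the count. The names pattern_count, matches, and SAMPLE are hypothetical and the sample counts are made up for the example.

```python
from typing import Iterable, List

WILDCARD = "_"  # wildcard character used in glookup patterns


def matches(pattern: List[str], ngram: List[str]) -> bool:
    """True if the ngram has the same length as the pattern and every
    non-wildcard position agrees (assumption: '_' matches one token)."""
    return len(pattern) == len(ngram) and all(
        p == WILDCARD or p == w for p, w in zip(pattern, ngram)
    )


def pattern_count(pattern: str, lines: Iterable[str]) -> int:
    """Sum the counts of all ngram lines that match the pattern.

    Each line is assumed to look like 'w1 w2 ... wn<TAB>count'.
    This is a linear scan, fine for a demo but far too slow for Web1T.
    """
    pat = pattern.split()
    total = 0
    for line in lines:
        ngram, _, count = line.rstrip("\n").partition("\t")
        if count and matches(pat, ngram.split()):
            total += int(count)
    return total


# Made-up sample data in the assumed 'ngram<TAB>count' layout.
SAMPLE = [
    "the cat sat\t120",
    "the dog sat\t80",
    "the cat ran\t45",
]

if __name__ == "__main__":
    print(pattern_count("the _ sat", SAMPLE))  # 200
    print(pattern_count("the cat _", SAMPLE))  # 165
```

A linear scan like this is only useful for reasoning about what a count means on a handful of lines. For real Web1T data, use glookup or glookup.pl as described above.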