Keywords and keyphrases (multi-word units) are widely used in large document collections.
They describe the content of single documents and provide a kind of semantic metadata that is useful for a wide variety of purposes.
In libraries, professional indexers select keyphrases from a controlled vocabulary (also called subject headings) according to defined cataloguing rules. On the Internet, digital libraries and other data repositories also use keyphrases (there called content tags or content labels) to organize their data and provide thematic access to it.
KEA is an algorithm for extracting keyphrases from text documents. It can be used either for free indexing or for indexing with a controlled vocabulary. It was developed at the University of Waikato, in the Digital Libraries and Machine Learning Labs of the Computer Science Department, by Eibe Frank and Olena Medelyan.
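To make the two indexing modes concrete, here is a minimal Python sketch. It is not KEA's or OpenKM's code. It assumes the two features named in KEA's published description, TF×IDF and the relative position of a phrase's first occurrence. It leaves out the trained classifier that KEA uses to rank candidates, and it smooths the IDF term for simplicity. The function names `candidate_phrases` and `phrase_features` and the `vocabulary` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def candidate_phrases(text, max_len=3):
    """Return the word list and (phrase, start_index) pairs for all 1..max_len word n-grams.

    Simplification: punctuation is dropped, so n-grams may span sentence boundaries.
    """
    words = re.findall(r"[a-z]+", text.lower())
    pairs = []
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            if i + n <= len(words):
                pairs.append((" ".join(words[i:i + n]), i))
    return words, pairs


def phrase_features(text, corpus, vocabulary=None, max_len=3):
    """Map each candidate phrase to (tfidf, first_occurrence).

    vocabulary=None mimics free indexing; passing a set of allowed phrases
    mimics indexing with a controlled vocabulary (hypothetical parameter).
    """
    words, pairs = candidate_phrases(text, max_len)
    if not words:
        return {}
    counts = Counter(phrase for phrase, _ in pairs)
    first_seen = {}
    for phrase, index in pairs:
        first_seen.setdefault(phrase, index)  # keep the earliest position only
    corpus_sets = [{p for p, _ in candidate_phrases(doc, max_len)[1]} for doc in corpus]
    features = {}
    for phrase, count in counts.items():
        if vocabulary is not None and phrase not in vocabulary:
            continue
        doc_freq = sum(1 for phrases in corpus_sets if phrase in phrases)
        # Add-one smoothing is an assumption of this sketch, not part of KEA.
        idf = math.log2((len(corpus) + 1) / (doc_freq + 1))
        features[phrase] = (count / len(words) * idf, first_seen[phrase] / len(words))
    return features


doc = "digital libraries index documents. digital libraries use keyphrases"
corpus = ["cats sleep", "dogs bark", "libraries lend books"]
for phrase, (tfidf, first) in sorted(phrase_features(doc, corpus).items(),
                                     key=lambda item: -item[1][0])[:3]:
    print(phrase, round(tfidf, 3), round(first, 3))
```

Phrases that are frequent in the document, rare in the reference corpus, and first seen early tend to be better keyphrase candidates. A real extractor would combine these features with a trained model instead of sorting by TF×IDF alone.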
The OpenKM keyphrase extraction summarization service is open-source software distributed under the GNU Affero General Public License.
$ git clone [git-repo-url] openkm-community
$ cd openkm-community
$ mvn clean package
The KEA Summarization service is supported by developers and technical enthusiasts through the user community forum. If you want to raise an issue, please follow the recommendations below:
- Before you post a question, please search the forum to see whether someone has already reported it or asked about it.
- If the question does not already exist, create a new post.
- Please provide as much detail as possible in the issue report. We need to know the OpenKM version, operating system, browser, and anything else you think might help us understand the problem or question.
KEA Summarization is available to the open-source community under the GNU Affero General Public License. The OpenKM source code is available to the entire community, which is free to use, modify, and redistribute it under the terms of that license.