JSearch is open-source software for extracting text and finding keywords in HWP and Office documents.
<dependency>
<groupId>io.github.qwefgh90</groupId>
<artifactId>jsearch</artifactId>
<version>0.3.0</version>
</dependency>
- It should work with various document types, e.g. hwp, pdf, office.
- It should support extracting strings and quickly finding keywords in documents.
- It will be distributed as a jar library.
- All functions are synchronous.
- The result of an extraction contains the full string.
- The result of a keyword search contains the word count.
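To make the last two points concrete, here is a minimal sketch of what a synchronous keyword count over an already extracted string could look like. It does not use or reproduce the JSearch API: the class `KeywordCountSketch` and the method `countOccurrences` are hypothetical names chosen for illustration, and the sketch assumes case-sensitive, non-overlapping matching.

```java
/**
 * Illustrative sketch only; NOT part of the JSearch API.
 * All names here are hypothetical.
 */
final class KeywordCountSketch {

    private KeywordCountSketch() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Counts non-overlapping, case-sensitive occurrences of keyword in text.
     * Returns 0 when either argument is null or the keyword is empty.
     */
    static int countOccurrences(String text, String keyword) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        // Each call blocks until the scan finishes, i.e. it is synchronous.
        while ((from = text.indexOf(keyword, from)) != -1) {
            count++;
            from += keyword.length(); // skip past this match
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the full string an extraction step would return.
        String extracted = "HWP and PDF documents; HWP is a word processor format.";
        System.out.println(countOccurrences(extracted, "HWP")); // prints 2
    }
}
```

In this sketch the caller gets the count back directly from the method call, which mirrors the synchronous style described above; a real library would first have to parse the document format to obtain the string.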
This software was developed with reference to the open specification of the HWP (한/글) document file format (.hwp) published by Hancom, Inc.: http://www.hancom.co.kr/userofficedata.userofficedataList.do?menuFlag=3
The part that handles the .hwp format is forked from the source of the java-hwp project.