na9da/haskell-jusText

haskell-jusText

This is a Haskell clone of the Python jusText project. It removes boilerplate content from HTML pages, leaving just the main content. jusText applies a set of heuristics to identify the main content of a page. You can read more about it in the thesis by Jan Pomikálek.

Building

  stack install
  haskell-jusText <htmlFile> <stopwordsFile>

Stopword files for different languages are available in the original Python jusText repository.
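
To illustrate why a stopword file is needed, here is a minimal sketch of the kind of context-free heuristic jusText uses: a text block is judged by its length, its link density, and the share of its words that are stopwords. This is not the code in this repository. The names `BlockClass`, `Thresholds`, `stopwordDensity`, and `classify` are hypothetical, the threshold values are assumed to match the defaults of the Python jusText, and the real algorithm has further rules plus a context-sensitive pass that are omitted here.

```haskell
module Main where

import Data.Char (isAlpha, toLower)

-- | Hypothetical block classes for a simplified context-free pass.
data BlockClass = Good | NearGood | Short | Bad
  deriving (Show, Eq)

-- | Hypothetical threshold record; field names are illustrative.
data Thresholds = Thresholds
  { lengthLow      :: Int
  , lengthHigh     :: Int
  , stopwordsLow   :: Double
  , stopwordsHigh  :: Double
  , maxLinkDensity :: Double
  }

-- | Assumed defaults (believed to mirror the Python jusText defaults).
defaults :: Thresholds
defaults = Thresholds 70 200 0.30 0.32 0.2

-- | Fraction of words in the text that appear in the stopword list.
--   Words are lowercased and stripped of non-letters before lookup.
stopwordDensity :: [String] -> String -> Double
stopwordDensity stopwords text
  | null ws   = 0
  | otherwise = fromIntegral hits / fromIntegral (length ws)
  where
    ws   = filter (not . null) (map (filter isAlpha) (words (map toLower text)))
    hits = length (filter (`elem` stopwords) ws)

-- | Simplified classification of one block. The link density
--   (characters inside links / all characters) is passed in by the caller.
classify :: Thresholds -> [String] -> Double -> String -> BlockClass
classify t stopwords linkDensity text
  | linkDensity > maxLinkDensity t = Bad
  | len < lengthLow t              = if linkDensity > 0 then Bad else Short
  | density >= stopwordsHigh t     = if len > lengthHigh t then Good else NearGood
  | density >= stopwordsLow t      = NearGood
  | otherwise                      = Bad
  where
    len     = length text
    density = stopwordDensity stopwords text

main :: IO ()
main = do
  -- A tiny inline stopword list; in practice this would be read
  -- from the stopwords file given on the command line.
  let stopwords = ["the", "of", "and", "a", "in", "on", "at", "it", "for"]
  print (classify defaults stopwords 0 "Home")
  print (classify defaults stopwords 0
           "The cat sat on the mat and looked at the dog in the yard of the house for a while.")
```

Running prose in a given language tends to contain many stopwords, while navigation bars and link lists contain few, which is why the stopword list has to match the language of the page.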
