- Internal refactoring for clarity
- Do a better job of stripping out script tags
- Update dependencies
- Update dependencies
- Fix bug in min_clause_words_filter (used in article_sentence_extractor)
- Allow tests to run in Docker
- Update CircleCI config so builds continue to work
- Add architecture flow
- Code formatting
- Add min words filter specs
- Add label action specs
- Add missing test case to ignorable element spec
- Add merge_next case to text block spec
- DRY up includes
- Add KeepEverythingWithMinKWords Extractor
- Add ArticleSentence Extractor
- Add LargestContent Extractor
- Add KeepEverything Extractor
- Add NumWordsRules Extractor
- Add Canola Extractor
- Add Default Extractor
- Tweak dependency to use Nokogiri 1.6.6.2 or newer
- Add Apache 2.0 license to reflect original work by Christian Kohlschütter
- Fix newline character escaping bug
- Add Article Extractor