- [] Articles from ONA selected via keywords/collection/dates
- [] Sitemap ingestion
- [] Generic support for other APIs
- [] Set up mediacloud directory proxy
- [] Deduplication
- [] Sentence-level deduplication
- [] Classifier threshold
- Metadata subsets for final return
- Entity Extraction (via API)
- [] spaCy NER
- [] N-Grams
- [] Byline Detection
- [] Quote extraction and attribution
- [] Link extraction and network generation
- [] NYT-based topic/theme detection
- [] Sentence-level story splitting
- [] Train a word2vec model
- [] Media-to-media link count (i.e., a table of the most linked-to sources)
- [] Media-to-document link count (i.e., a table of the most linked-to documents)
- [] Country-level tagging: what region is this about?
- CSV
- [] Network Maps
- [] Kibana Instance Export
- [] Custom Tooling...?
- S3 buckets
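
As a rough illustration of the link extraction and media-to-media link count items above, here is a minimal sketch using only the Python standard library. Nothing in it comes from this project: the names `LinkCollector`, `domain_of`, and `media_link_counts` are hypothetical, and so is the assumed input shape of `(source_url, html)` pairs. It treats each domain as one "media source", which is a simplification.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc


def media_link_counts(documents):
    """Count inbound links per target domain.

    `documents` is assumed to be an iterable of (source_url, html) pairs.
    Relative links and links back to the source's own domain are skipped.
    """
    counts = Counter()
    for source_url, html in documents:
        source = domain_of(source_url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            target = domain_of(href)
            if target and target != source:
                counts[target] += 1
    return counts
```

Calling `media_link_counts(docs).most_common(10)` would then give a table of the most linked-to sources. Keying the counter on the full target URL instead of its domain would give the media-to-document variant.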