(Future User Story): I'm a PageOneX user, and I want to simply type in a phrase or keyword, choose my newspapers and date range, and see an automatically generated PageOneX visualization of front page coverage.
Notes:
Step 1: solve keyword search (use LexisNexis, Google News, Media Cloud, etc. to find the dates of stories containing the keywords, limiting the search to page A1).
Step 2: use OCR to locate the keywords on the front pages for those dates.
Step 3: use machine learning to train software to identify the spatial boundaries of the article that contains the keyword, and select those boundaries. :)
Voilà! Automated PageOneX. :)
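A minimal sketch of the three steps in Python. It assumes the search results and OCR output have already been fetched into the hypothetical `Story` records and `(word, Box)` pairs below. It only matches single-word keywords. Step 3 replaces the machine-learning model with a plain containment test against article regions that a trained model would supply. All names here are illustrative, not an existing PageOneX API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Story:
    """One search hit, as a news archive might return it (hypothetical shape)."""
    paper: str
    published: date
    page: str
    text: str


@dataclass(frozen=True)
class Box:
    """Rectangle in page pixel coordinates: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Box") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


def front_page_dates(stories, keyword):
    """Step 1: sorted (paper, date) pairs of A1 stories mentioning the keyword."""
    kw = keyword.lower()
    return sorted({(s.paper, s.published) for s in stories
                   if s.page == "A1" and kw in s.text.lower()})


def keyword_hits(ocr_words, keyword):
    """Step 2: boxes of OCR'd words equal to a single-word keyword.

    ocr_words is a list of (word, Box) pairs, the kind of word-level output
    an OCR engine produces for a scanned front page (assumed format).
    """
    kw = keyword.lower()
    return [box for word, box in ocr_words
            if word.lower().strip(".,;:!?\"'") == kw]


def article_regions(hits, articles):
    """Step 3 stand-in: keep each article region that encloses at least one hit.

    A trained model would predict `articles`; here they are simply passed in.
    """
    return [a for a in articles if any(a.contains(h) for h in hits)]
```

Given the OCR words for one front page and a list of article rectangles, `article_regions(keyword_hits(ocr_words, "budget"), articles)` returns the rectangles that enclose at least one keyword hit. Those are the areas an automated visualization would select.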
This approach, using information extracted from PDFs, looks promising: https://github.com/samzhang111/frontpages/
You first have to set up the script to download the front pages from the Newseum daily.
@samzhang111 has also worked with some Python libraries to detect spatial boundaries in PDFs.
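As a rough illustration of detecting spatial boundaries from PDF-extracted information, the sketch below groups text boxes into blocks. Two boxes are joined when they share some horizontal extent and the vertical gap between them is at most `max_gap` points. This is a simple geometric heuristic, not the method used in samzhang111/frontpages. It assumes the boxes have already been extracted as `(x0, y0, x1, y1)` tuples. For example, pdfminer.six layout objects expose a `bbox` tuple in this form. `group_blocks` and `max_gap` are hypothetical names.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x0 < x1 and y0 < y1


def _linked(a: Rect, b: Rect, max_gap: float) -> bool:
    """True if a and b share some horizontal extent and are vertically close."""
    h_overlap = min(a[2], b[2]) - max(a[0], b[0])
    v_gap = max(a[1], b[1]) - min(a[3], b[3])  # negative when they overlap vertically
    return h_overlap > 0 and v_gap <= max_gap


def group_blocks(boxes: List[Rect], max_gap: float = 5.0) -> List[Rect]:
    """Merge linked text boxes (connected components) and return each block's bounding box."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if _linked(boxes[i], boxes[j], max_gap):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Rect]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    # The bounding box of each group is a candidate article/block boundary.
    blocks = [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    ]
    return sorted(blocks)
```

Union-find keeps the grouping transitive, so a column of closely spaced lines collapses into one block even though its first and last lines are far apart. The pairwise loop is quadratic, which should be acceptable for the number of text boxes on a single page.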