Broad strokes: we determined that the MVP fulfills the pieces not currently achieved by Splunk—which could serve as the entire front end at very small scale. This means data integrity and depth are the core of the value added by PDAP.
Ingestion → Archival Storage → Search & analysis
Data stewardship
Transparency
Discipline
“Librarian”
"I want police data."
Make a query → get information
Enter search via UI (selects) or code → present specific data
api (json), chart, csv
"I want to be able to analyze the police data I found."
Analysis tools or analysis that is done for you
Find extremes in the data automatically
"PDAP needs to verify data."
- Guard the submissions process
- Credibility score for each type of data
"PDAP needs to be like a librarian."
- Nonintrusive
- Pedigree
- Legally captured?
- Multiple sources?
- Anonymity?
"What does it mean to verify data?"
"We need to be able to get data out of cold storage."
- Eventually it'll be too much data for Splunk
Query data → Analyze data → Export insights
Upload data → Analyze data*
*The user agrees that we can keep the data, and provides information or verification about it.
Save an Analysis → Share the Analysis with someone else
Save an Analysis → Revisit it with updated data
&#xNAN;Alert user if an analysis changes based on updated information
- easily write regex
- accept any type of data
- oddly / non-delimited
- many file types
- faster analysis / searching on the server rather than locally
- automatically find "interesting fields"
- search
- analysis
Supply data to PDAP by volunteering or other sources
Verify submitted data → Request more info from a submitter
Provide an unprecedented breadth of data
Safely archive historic data for the foreseeable future
Understand the categorization structure
&#xNAN;We need to make sure the structure is future-proof, and establish policies for sortation that cannot easily be corrupted.