Data Collection.md

aliases: Data Acquisition

Definition

  • Data Collection (also called Data Acquisition) is the process of finding and accessing new data sources
  • It happens outside the Data Warehouse/Lake and may involve other organizations
  • Not to be confused with [[synonimns/Data Ingestion]], which is the process of loading new data into our Data Warehouse/Lake

Data Collection Examples

  • [[Reading Files]]
  • [[Data Crawling]]
  • [[Accessing Databases]]
  • [[Web API | Calling REST API]]
  • [[Consuming WebSockets]]
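
As a rough illustration of two of the examples above (reading files and accessing databases), the following minimal sketch uses only the Python standard library. The file name, table name, and sample values are hypothetical and exist only for the demo; a real source would live outside our own systems.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def read_csv_rows(path):
    """Reading Files: load every row of a CSV file as a dict."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def query_rows(connection, sql, params=()):
    """Accessing Databases: run a query and return the rows as dicts."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(sql, params)]


# Hypothetical file source, written to a temporary directory for the demo.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "stations.csv"
csv_path.write_text("station,city\nS1,Tartu\nS2,Tallinn\n", encoding="utf-8")
file_rows = read_csv_rows(csv_path)

# Hypothetical database source, kept in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, temperature REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("S1", 4.5), ("S2", 6.0)])
db_rows = query_rows(conn, "SELECT station, temperature FROM readings ORDER BY station")
```

In both cases the code only reads from the source; storing the result in the Data Warehouse/Lake would be a separate ingestion step.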

Not to be confused with [[Data Ingestion]]

  • Maintaining a [[Distributed File System]]
  • Using a [[Distributed Message Queue]]
  • Using a [[Publishing Subscribe System]]

Data Source Selection Criteria

[.column]

  • Credibility
  • Completeness
  • Accuracy
  • Verifiability
  • Currency
  • Accessibility

[.column]

  • Compliance
  • Cost
  • Legal issues
  • Security
  • Storage
  • Provenance
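
One possible way to apply criteria like these when comparing candidate sources is a weighted score. The sketch below is only an assumption about how that could look: the weights, the 0 to 5 rating scale, the subset of criteria, and the two candidate sources are all hypothetical and not part of these notes.

```python
# Hypothetical weights: the notes list the criteria but do not rank them.
WEIGHTS = {"credibility": 3, "completeness": 2, "currency": 2, "cost": 1}


def score_source(ratings, weights=WEIGHTS):
    """Weighted average of 0-5 ratings; a criterion without a rating counts as 0."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * ratings.get(name, 0) for name in weights)
    return weighted / total_weight


# Hypothetical candidates; for "cost", a higher rating means a cheaper source.
candidates = {
    "open_data_portal": {"credibility": 5, "completeness": 3, "currency": 2, "cost": 5},
    "commercial_feed": {"credibility": 4, "completeness": 5, "currency": 5, "cost": 1},
}

# Best-scoring source first.
ranked = sorted(candidates, key=lambda name: score_source(candidates[name]), reverse=True)
```

Criteria such as Compliance or Legal issues are often hard requirements rather than scores, so in practice they may be better handled as filters applied before any ranking.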