Data cubes for earth observation data
A data cube is a multidimensional array of values, typically used to represent data in a structured format for analysis and querying. While the number of dimensions can vary, three dimensions are most common. This is because much of the data we encounter is two-dimensional, such as spreadsheets or images. When you stack hundreds or thousands of these two-dimensional datasets—like satellite images taken over different months—you create a three-dimensional structure, with time serving as the third dimension. This stacked format resembles a cube, hence the name "data cube."
In a geographic context, the dimensions for width and length correspond to the x- and y-coordinates, i.e., longitude and latitude. Depending on the use case, the third dimension could be depth (e.g., when working with geologic or oceanographic data), altitude (e.g., when analyzing atmospheric data), or time, which is the most common third dimension in earth observation.
Architecture of a data cube; Source: Kopp et al.
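The following minimal sketch illustrates this structure in Python with xarray; the values are random and the variable name "reflectance" is only an example, but the dimension layout (time, y, x) is the one described above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A tiny cube: 12 monthly "images" of 100 x 100 pixels stacked along a time axis
values = np.random.rand(12, 100, 100)

cube = xr.DataArray(
    values,
    dims=("time", "y", "x"),
    coords={"time": pd.date_range("2023-01-01", periods=12, freq="MS")},
    name="reflectance",  # hypothetical variable name
)
print(cube.sizes)  # {'time': 12, 'y': 100, 'x': 100}
```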
When observing the Earth, the amount of data collected and stored is enormous. The organized, array-based structure of a data cube makes it quick and easy for users to find and retrieve the data they need.
Users can make different kinds of queries, as shown in the sketch after this list:
- Request data from a specific location (e.g., a time series of observations from that area).
- Request data from a specific time (e.g., an image from that date).
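A minimal sketch of both query types, assuming a hypothetical NetCDF cube "example_cube.nc" with an "ndvi" variable and dimensions time, y, x:

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset("example_cube.nc")  # hypothetical file with dims time, y, x

# Query 1: a time series for a specific location (nearest grid cell)
ts = ds["ndvi"].sel(x=7.62, y=51.96, method="nearest")

# Query 2: the full spatial image for a specific date
img = ds["ndvi"].sel(time=np.datetime64("2023-06-01"), method="nearest")

print(ts.sizes)   # remaining dimension: time
print(img.sizes)  # remaining dimensions: y, x
```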
The multidimensional structure enables complex analyses, such as time series analysis and multidimensional querying.
Data cubes are designed to handle large volumes of data, which is important in Earth observation where the datasets can reach petabytes in size. Their scalability ensures that they can accommodate growing datasets without sacrificing performance.
Data cubes can integrate data from various sources and sensors, harmonizing different formats and spatial resolutions (see the resampling sketch below). This helps researchers examine trends, patterns, and relationships in the data across different dimensions, such as geographic location and time.
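One common harmonization step is bringing data onto a shared grid. A minimal sketch, assuming two hypothetical single-band files at different spatial resolutions:

```python
import xarray as xr

fine = xr.open_dataarray("sensor_a_10m.nc")    # hypothetical 10 m data
coarse = xr.open_dataarray("sensor_b_60m.nc")  # hypothetical 60 m data

# Interpolate the fine-resolution data onto the coarse grid (shared x/y coordinates assumed)
fine_on_coarse_grid = fine.interp_like(coarse)

# Stack both layers into one cube along a new "band" dimension
cube = xr.concat([fine_on_coarse_grid, coarse], dim="band")
```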
Once data is stored and processed correctly in a data cube, it can be used directly for computation, e.g. for machine learning.
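As an illustration, a minimal sketch that derives one feature per pixel from the time dimension and clusters the pixels with scikit-learn; the cube file and variable name are the hypothetical ones from the earlier example:

```python
import numpy as np
import xarray as xr
from sklearn.cluster import KMeans

ds = xr.open_dataset("example_cube.nc")    # hypothetical cube with dims time, y, x
features = ds["ndvi"].mean("time")         # one feature per pixel: temporal mean

X = np.nan_to_num(features.values.reshape(-1, 1))  # shape (n_pixels, n_features), NaNs replaced

labels = KMeans(n_clusters=4, n_init=10).fit_predict(X)
label_map = labels.reshape(features.shape)  # back onto the y/x grid
```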
The biggest challenge when working with data cubes is the sheer volume of data, which can reach petabytes. Storing such amounts of data locally requires infrastructure whose costs can be prohibitively high. Currently, it is often most effective to use public commercial clouds such as Amazon AWS, Google Cloud, or Microsoft Azure to optimize data storage. Additionally, image compression techniques can help manage storage needs.
Beyond storage solutions, decisions must be made regarding file formats (e.g., JPEG, GeoTIFF) and their respective advantages and disadvantages. It's also essential to plan how to process data within the data cube and consider its overall design for optimal performance.
Decisions regarding processing and design impact performance during computationally intensive tasks. Specific hardware and algorithms are often required to facilitate these computations effectively.
While a data cube with correctly stored data is practical, several data integration steps are required before the data becomes ‘analysis ready data’ (ARD); a cloud-masking sketch follows this list:
- The data must be geometrically correct and in the appropriate coordinate system.
- Atmospheric corrections must be applied to eliminate distortions.
- Cloud masking is necessary to remove clouds, ensuring a clear view of the surface.
- Data from different sources must be standardized to ensure compatibility.
- The time dimension must be accurately represented.
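Cloud masking is one of these steps. A minimal sketch, assuming a hypothetical Sentinel-2 L2A cube with an "SCL" scene-classification band (in the SCL scheme, values 8, 9, and 10 mark medium-probability cloud, high-probability cloud, and thin cirrus):

```python
import xarray as xr

cube = xr.open_dataset("sentinel2_cube.nc")  # hypothetical file

cloud_classes = [8, 9, 10]                   # cloud medium/high probability, thin cirrus
cloud_mask = ~cube["SCL"].isin(cloud_classes)

# Set cloudy pixels to NaN in the spectral bands so later statistics ignore them
masked = cube[["B02", "B03", "B04", "B08"]].where(cloud_mask)
```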
Many users who want to work with data cubes may not possess the technical skills or experience needed to execute complex queries and data analyses effectively. This can lead to inefficient use of the data and limit an organization's ability to derive insights and make decisions based on them.
Legal and ethical challenges in data cubes and Earth observation primarily revolve around privacy concerns and data protection regulations. Organizations must handle the collected data responsibly, respect individuals' privacy, and avoid misuse. To ensure that data cubes follow the FAIR principles, the ISO 19123-1 and 19123-3 standards were introduced and have been updated.
Data cubes have been effectively used for managing water resources by tracking changes in water extent, monitoring wetlands, and assessing flood risks.
Environmental monitoring is another application: time series analyses can be used to monitor forest cover, coastline changes, and urban expansion. This is of interest, for example, to governments that want to track illegal activities.
By providing timely and easily accessible data, data cubes support disaster response efforts. For example, they have been used to track flood risks and assess the aftermath of extreme weather events. This also informs governmental decision-making during and after natural disasters.
Developed by Geoscience Australia and hosted on the National Computational Infrastructure at the Australian National University, the Australian GeoScience Data Cube represents a significant advancement in Earth observation data management. This pixel-aligned collection comprises over 300,000 Landsat scenes from across Australia, all geometrically and spectrally calibrated to accurately reflect the Earth's surface. As the first continental Landsat data cube featuring overlapping temporal scenes, it paves the way for innovative analytical approaches.
Australia uses data from the Data Cube for various purposes, such as scientific research, improving community safety, exploring mineral resources, navigation, and supporting the country's prosperity.
The Australian Geoscience Data Cube laid the foundation for Digital Earth Australia (DEA). DEA is a newer initiative that provides a comprehensive platform, enabling users to access and analyze Earth observation data through user-friendly tools and services. This platform extends the utility of the data cube, offering insights for decision-making across sectors like environmental monitoring, agriculture, and disaster management.
DEA provides free satellite imagery amounting to nearly 1.5 petabytes. The data is 'Analysis Ready Data' (ARD), making analyses as easy as possible for potential users.
Digital Earth Africa (DE Africa) is the largest project under the Committee on Earth Observation Satellites (CEOS) initiative, utilizing Open Data Cube technology to create a comprehensive data cube for the African continent. It includes datasets from Landsat and Sentinel-1, offering a rich collection of satellite imagery and web services. Through DE Africa, users can explore spectral indices over time and perform interactive change detection using spectral and radar data. A notable example is the detection of changes in the Volta River area in central Ghana. The data is easily accessible via the Africa GeoPortal.
Kopp et al.: Achieving the Full Vision of Earth Observation Data Cubes
https://www.mdpi.com/2306-5729/4/3/94 (accessed: 06.10.2024)
OBSERVER: Data cubes: Enabling and facilitating Earth Observation applications
https://www.copernicus.eu/en/news/news/observer-data-cubes-enabling-and-facilitating-earth-observation-applications (accessed: 05.10.2024)
Earth observation data cubes for water resources management
https://www.space4water.org/s4w/web/news/earth-observation-data-cubes-water-resources-management (accessed: 06.10.2024)
Digital Earth Africa
https://www.digitalearthafrica.org (accessed: 06.10.2024)
ISO 19123-1
https://committee.iso.org/sites/tc211/home/projects/projects---complete-list/iso-19123-1.html (accessed: 08.10.2024)
Australian GeoScience Data Cube
https://www.ga.gov.au/scientific-topics/dea/about/open-data-cube (accessed: 04.10.2024)
STAC client for Python:

import json      # for displaying query results
import pystac
import requests  # for interacting with APIs

from pystac import Catalog, get_stac_version  # pystac extension for loading existing catalogs
from pystac_client import Client               # pystac extension for, among other things, searching STACs

root_catalog = Catalog.from_file('https://raw.githubusercontent.com/stac-utils/pystac/main/docs/example-catalog/catalog.json')
root_catalog.describe()  # structure of the catalog

print(f"ID: {root_catalog.id}")
print(f"Title: {root_catalog.title or 'N/A'}")
print(f"Description: {root_catalog.description or 'N/A'}")

collections = list(root_catalog.get_collections())   # get_collections() and other functions are explained in the handout
print(f"Number of collections: {len(collections)}")  # number of available collections
print("Collections IDs:")
for collection in collections:
    print(f"- {collection.id}")

items = list(root_catalog.get_all_items())
print(f"Number of items: {len(items)}")
for item in items:
    print(f"- {item.id}")

item = root_catalog.get_item("LC80140332018166LGN00", recursive=True)  # a single item, used in the following

print(item.geometry)
print(item.bbox)
print(item.datetime)
print(item.collection_id)
item.get_collection()  # query which collection the item belongs to

print(item.common_metadata.instruments)
print(item.common_metadata.platform)
print(item.common_metadata.gsd)

for asset_key in item.assets:  # .assets returns all assets of an item
    asset = item.assets[asset_key]
    print('{}: {} ({})'.format(asset_key, asset.href, asset.media_type))  # asset_key, href and media_type fill the {} placeholders

asset = item.assets['B3']
print(asset.to_dict())  # similar to the query with .format

for asset_key in item.assets:
    asset = item.assets[asset_key]
    asset_url = asset.href
    file_name = asset_key + '.' + asset.media_type.split('/')[-1]

    # request the data from the API
    response = requests.get(asset_url)  # using the requests library

    # save the file
    with open(file_name, 'wb') as f:
        f.write(response.content)

    print(f'{file_name} downloaded.')

catalog_url = 'https://planetarycomputer.microsoft.com/api/stac/v1'
client = Client.open(catalog_url)  # the client interacts with the API endpoint (URL)

search = client.search(
    collections=['sentinel-2-l2a'],
    bbox=[-47.02148, -17.35063, -42.53906, -12.98314],
    datetime='2023-01-01/2023-01-31',
    limit=10
)

items = list(search.items())
print(len(items))
print(items)
item = items[5]
print(f"Item ID: {item.id}")
print(f"Item datetime: {item.datetime}")

for asset_key, asset in item.assets.items():
    print(f"Asset Key: {asset_key}")
    print(f"Asset URL: {asset.href}")
    print(f"Asset Media Type: {asset.media_type}")
STAC client for R:

install.packages("rstac")
install.packages("sf")
install.packages("terra")
install.packages("tibble")

library(terra)
library(sf)
library(tibble)
library(rstac)

stac_url <- "https://planetarycomputer.microsoft.com/api/stac/v1"

s_obj <- stac(stac_url)
str(s_obj)

get_request(s_obj)

s_obj %>% get_request()

conformance_classes <- s_obj %>%
  conformance() %>%
  get_request()
conformance_classes

collections_query <- s_obj %>% collections()

collections_query %>% get_request()

stac_search(
  q = s_obj,
  collections = "usgs-lcmap-conus-v13",
  datetime = "2021-01-01/2021-12-31",
  limit = 10
) %>% get_request()

ashe <- read_sf(system.file("shape/nc.shp", package = "sf"))[1, ]
plot(st_geometry(ashe))

ashe_bbox <- ashe %>%
  st_transform(4326) %>%
  st_bbox()
ashe_bbox

stac_query <- stac_search(
  q = s_obj,
  collections = "usgs-lcmap-conus-v13",
  bbox = ashe_bbox,
  datetime = "2021-01-01/2021-12-31",
  limit = 10
) %>% get_request()
stac_query

signed_stac_query <- items_sign(
  stac_query,
  sign_planetary_computer()  # authentication with the Planetary Computer
)
signed_stac_query

output_directory <- "C:/Users/lraeu/OneDrive/Desktop/Geosoftware II/geosoft2-2024/data"  # adjust to your local directory
assets_download(signed_stac_query, "lcpri", output_dir = output_directory, overwrite = TRUE)

output_file <- file.path("C:/Users/lraeu/OneDrive/Desktop/Geosoftware II/geosoft2-2024/data/lcmap/CU/V13/025011/2021/LCMAP_CU_025011_2021_20220721_V13_CCDC/LCMAP_CU_025011_2021_20220629_V13_LCPRI.tif") %>%
  rast()
plot(output_file)

rast("C:/Users/lraeu/OneDrive/Desktop/Geosoftware II/geosoft2-2024/data/B1.tiff")

ashe %>%
  st_transform(st_crs(output_file)) %>%
  st_geometry() %>%
  plot(add = TRUE, lwd = 3)