This section explains how to use the Collection Builder application to collect data from Sentinel and Landsat providers, how to use the surface reflectance processors, and how to publish the data sets.
If you have not yet read how to install or deploy the system, please refer to the :doc:`installation` or :doc:`deploy` documentation.
The resource /api/radcor is used to dispatch tasks for data collection, processing, and publishing. It accepts the following parameters:
- w, s, e, n: Bounding box limits (required when neither scenes nor tiles is set);
- scenes: List of scenes to collect directly from the catalog (optional);
- tiles: List of tiles to collect from the catalog. The grid reference is taken from the collection used in the download task (optional);
- catalog: The catalog to search;
- catalog_args: Custom keyword arguments for the catalog. The options include username, password, progress, etc. Default is unset;
- dataset: The dataset name offered by the catalog. See more in BDC-Collectors;
- start: Start datetime;
- end: End datetime;
- cloud: Maximum cloud cover factor. Default is 100;
- action: Argument to dispatch execution. The supported values are:
  - start: searches the catalog and dispatches the matched scenes;
  - preview: searches the catalog with the given parameters and returns the matched values;
- tasks: Defines the intended execution and the collection in which to store the data. The supported values are:
  - download: Tries to download data from the remote server using bdc-collectors and the model bdc.collections_providers for download priorities;
  - correction: Applies the Surface Reflectance Processor according to the model Collection.metadata in BDC-Catalog;
  - publish: Publishes the collection in the database. This step also generates the bdc.quicklook and the band indexes from bdc.bands;
  - post: Applies the post-processing step to the datasets;
  - harmonization: Applies data harmonization to Landsat-5, Landsat-7, Landsat-8 and Sentinel-2 products using the module sensor-harm;

The parameter collection is a key identifier for the given collection, in the form CollectionName-Version. The collection must already exist in your database before you can trigger a data collection. The tasks parameter can be nested to define an order of execution. For example, if you need to download data and then publish it, you can chain the tasks as follows:

"tasks": [
    {
        "type": "download",
        "collection": "LC8_DN-1",
        "args": {},
        "tasks": [
            {
                "type": "publish",
                "collection": "LC8_DN-1",
                "args": {},
                "tasks": []
            }
        ]
    }
]

These tasks are evaluated using the Celery chain and group concepts: entries listed side by side in the same tasks array are dispatched as a Celery group, whereas nested tasks are dispatched as a Celery chain.
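The nesting described above can also be generated programmatically. The following is a minimal Python sketch, not part of Collection Builder: the helper name chain_tasks is hypothetical, and it only builds the nested "tasks" structure for steps that should run one after another.

```python
from typing import Any, Dict, List, Tuple


def chain_tasks(steps: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Nest (task type, collection) pairs so each step wraps the next one.

    Hypothetical helper for illustration; it only produces the JSON
    structure expected under the "tasks" key of the request body.
    """
    nested: List[Dict[str, Any]] = []
    # Build from the last step backwards so every task contains its successor.
    for task_type, collection in reversed(steps):
        nested = [{
            "type": task_type,
            "collection": collection,
            "args": {},
            "tasks": nested,
        }]
    return nested


# Download first, then publish, both on the LC8_DN-1 collection.
tasks = chain_tasks([("download", "LC8_DN-1"), ("publish", "LC8_DN-1")])
```

The resulting list can be placed under the "tasks" key of the request body sent to /api/radcor.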
You can download a Sentinel 2 scene from the provider SciHub with dataset S2MSI1C using the following example:
curl -XPOST -H "Content-Type: application/json" \
    --data '{
        "w": -45.9,
        "s": -12.74,
        "n": -12.6,
        "e": -45.8,
        "catalog": "SciHub",
        "dataset": "S2MSI1C",
        "start": "2020-01-09T00:00:00",
        "end": "2020-01-10T23:59:59",
        "cloud": 100,
        "action": "start",
        "force": true,
        "tasks": [
            {
                "type": "download",
                "collection": "S2_L1C-1",
                "args": {},
                "tasks": [
                    {
                        "type": "correction",
                        "collection": "S2_L2A-1",
                        "args": {},
                        "tasks": [
                            {
                                "type": "publish",
                                "collection": "S2_L2A-1",
                                "args": {}
                            }
                        ]
                    }
                ]
            }
        ]
    }' \
    localhost:5000/api/radcor
The output of the above request can be seen below:
{
    "S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523": {
        "pathrow": "23LLF",
        "sceneid": "S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523",
        "type": "MSIL1C",
        "date": "2020-01-10",
        "cloud": 31.5879,
        "footprint": "MULTIPOLYGON (((-45.834015 -13.655483318755016, -45.830658 -12.662710833236419, -46.841522 -12.657636587187465, -46.84897 -13.649996915046348, -45.834015 -13.655483318755016)))",
        "tileid": "23LLF",
        "size": "813.67 MB",
        "link": "https://scihub.copernicus.eu/apihub/odata/v1/Products('9e16c509-06d5-4387-81e6-8d4f08f2ad72')/$value",
        "icon": "https://scihub.copernicus.eu/apihub/odata/v1/Products('9e16c509-06d5-4387-81e6-8d4f08f2ad72')/Products('Quicklook')/$value"
    },
    "Results": 1
}
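The response shown above maps each scene identifier to its metadata and carries the total under the "Results" key. As a rough sketch of how such a response could be summarized in Python (the function name summarize_scenes is hypothetical, and the sample is an abridged copy of the output above):

```python
from typing import Any, Dict, List


def summarize_scenes(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one compact record per scene, skipping the "Results" counter."""
    scenes = []
    for key, value in response.items():
        if key == "Results":
            continue  # "Results" holds the scene count, not a scene
        scenes.append({
            "sceneid": value["sceneid"],
            "date": value["date"],
            "cloud": value["cloud"],
            "tileid": value["tileid"],
        })
    return scenes


# Abridged copy of the response shown above.
sample = {
    "S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523": {
        "sceneid": "S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523",
        "date": "2020-01-10",
        "cloud": 31.5879,
        "tileid": "23LLF",
        "size": "813.67 MB",
    },
    "Results": 1,
}

summary = summarize_scenes(sample)
```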
Note
The parameter "action": "start" can be replaced by "action": "preview" in order to perform just a query in the provider. This option will not download the data but will show the scenes found in the provider.
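For readers who prefer Python over curl, a preview request could be assembled as sketched below. This is an illustration under assumptions: build_payload and build_request are hypothetical helpers, the URL reuses the host and port of the curl example, and the validation only mirrors the rule that a bounding box is required when neither scenes nor tiles is set. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Assumption: same host and port as the curl example.
RADCOR_URL = "http://localhost:5000/api/radcor"


def build_payload(catalog: str, dataset: str, start: str, end: str,
                  tasks: List[Dict[str, Any]],
                  bbox: Optional[Dict[str, float]] = None,
                  scenes: Optional[List[str]] = None,
                  tiles: Optional[List[str]] = None,
                  cloud: float = 100,
                  action: str = "preview") -> Dict[str, Any]:
    """Assemble a request body for /api/radcor (hypothetical helper)."""
    if action not in ("start", "preview"):
        raise ValueError("action must be 'start' or 'preview'")
    payload: Dict[str, Any] = {
        "catalog": catalog, "dataset": dataset,
        "start": start, "end": end,
        "cloud": cloud, "action": action, "tasks": tasks,
    }
    if bbox is not None:
        missing = {"w", "s", "e", "n"} - set(bbox)
        if missing:
            raise ValueError(f"bounding box is missing keys: {sorted(missing)}")
        payload.update(bbox)
    elif not scenes and not tiles:
        # The bounding box is required when neither scenes nor tiles is set.
        raise ValueError("provide a bounding box, scenes, or tiles")
    if scenes:
        payload["scenes"] = scenes
    if tiles:
        payload["tiles"] = tiles
    return payload


def build_request(payload: Dict[str, Any]) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request without sending it."""
    return urllib.request.Request(
        RADCOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_payload(
    catalog="SciHub", dataset="S2MSI1C",
    start="2020-01-09T00:00:00", end="2020-01-10T23:59:59",
    tasks=[{"type": "download", "collection": "S2_L1C-1", "args": {}, "tasks": []}],
    bbox={"w": -45.9, "s": -12.74, "e": -45.8, "n": -12.6},
)
request = build_request(payload)
```

Passing the request to urllib.request.urlopen would send it; that step needs a running Collection Builder instance and is therefore left out of the sketch.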
You can check the status in the logs of the download container:
docker logs -f bdc-collection-builder-worker-download --tail 200
[2020-04-28 09:45:15,093: INFO/MainProcess] Received task: bdc_collection_builder.celery.tasks.download[5efed43b-b913-4877-b9e2-e97c3c9a8947]
[2020-04-28 09:45:16,220: INFO/ForkPoolWorker-2] Starting Download S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523 - bdc020...
[2020-04-28 09:45:17,598: INFO/ForkPoolWorker-2] Downloading image https://scihub.copernicus.eu/apihub/odata/v1/Products('9e16c509-06d5-4387-81e6-8d4f08f2ad72')/$value in /home/gribeiro/data/bdc-collection-builder/Repository/Archive/S2_MSI/2020-01/S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523.zip, user AtomicUser(bdc020, released=False), size 813 MB
Note
Depending on the acquisition date, you may face issues with the Long Term Archive, in which Sentinel-2 scenes are offline. By default, BDC-Collection-Builder reschedules offline scenes to be retried in one hour, as follows:
[2022-09-15 14:18:18,035: INFO/ForkPoolWorker-2] Starting Download Task for S2_L1C(id=4, scene_id=S2A_MSIL1C_20200110T132231_N0208_R038_T23LLG_20200110T145523)
[2022-09-15 14:18:18,039: INFO/ForkPoolWorker-2] Trying to download from SciHub(id=5)
[2022-09-15 14:18:19,644: INFO/ForkPoolWorker-1] Downloading 9e16c509-06d5-4387-81e6-8d4f08f2ad72 to /tmp/download_11kkzi7e_S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523/S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523.zip
[2022-09-15 14:18:19,644: WARNING/ForkPoolWorker-1] Product 9e16c509-06d5-4387-81e6-8d4f08f2ad72 is not online. Triggering retrieval from long term archive.
[2022-09-15 14:18:19,795: INFO/ForkPoolWorker-2] Downloading 64cdfd4f-2b92-442c-87a2-1a7728600dd7 to /tmp/download_2bv0k4nz_S2A_MSIL1C_20200110T132231_N0208_R038_T23LLG_20200110T145523/S2A_MSIL1C_20200110T132231_N0208_R038_T23LLG_20200110T145523.zip
[2022-09-15 14:18:20,363: INFO/ForkPoolWorker-1] Task bdc_collection_builder.celery.tasks.download[64a19e93-f493-4f05-9e6b-5278ddfaecef] retry: Retry in 3600s: DataOfflineError('S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523')
[2022-09-15 14:18:20,624: INFO/MainProcess] Received task: bdc_collection_builder.celery.tasks.download[dc5b849c-9f5b-4a8e-a965-45d522305fab] ETA:[2022-09-15 18:18:20.622378+00:00]
[2022-09-15 14:18:20,627: INFO/ForkPoolWorker-2] Task bdc_collection_builder.celery.tasks.download[dc5b849c-9f5b-4a8e-a965-45d522305fab] retry: Retry in 3600s: DataOfflineError('S2A_MSIL1C_20200110T132231_N0208_R038_T23LLG_20200110T145523')
You can change the retry interval (in seconds) with the environment variable TASK_RETRY_DELAY, for example TASK_RETRY_DELAY=3600.
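To find out which scenes were rescheduled, the retry entries in the worker log can be scanned. The sketch below is only an illustration: the function name offline_retries is hypothetical, and the regular expression is derived solely from the log lines shown in this section, so other versions may format these messages differently.

```python
import re
from typing import Iterable, List, Tuple

# Pattern derived from the "retry: Retry in 3600s: DataOfflineError('...')"
# lines shown in this section; it is an assumption that the format is stable.
RETRY_PATTERN = re.compile(
    r"retry: Retry in (?P<delay>\d+)s: DataOfflineError\('(?P<scene>[^']+)'\)"
)


def offline_retries(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (scene id, retry delay in seconds) for each offline retry entry."""
    found = []
    for line in lines:
        match = RETRY_PATTERN.search(line)
        if match:
            found.append((match.group("scene"), int(match.group("delay"))))
    return found


log_lines = [
    "[2022-09-15 14:18:18,039: INFO/ForkPoolWorker-2] Trying to download from SciHub(id=5)",
    "[2022-09-15 14:18:20,363: INFO/ForkPoolWorker-1] Task "
    "bdc_collection_builder.celery.tasks.download[64a19e93-f493-4f05-9e6b-5278ddfaecef] "
    "retry: Retry in 3600s: "
    "DataOfflineError('S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523')",
]

retries = offline_retries(log_lines)
```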
You can download a Landsat-8 scene using the following example:
curl -XPOST -H "Content-Type: application/json" \
    --data '{
        "w": -48.33171,
        "s": -14.06716,
        "e": -46.21973,
        "n": -11.95925,
        "catalog": "USGS",
        "dataset": "landsat_ot_c2_l1",
        "start": "2020-02-20T00:00:00",
        "end": "2020-02-23T23:59:59",
        "cloud": 100,
        "action": "start",
        "force": true,
        "tasks": [
            {
                "type": "download",
                "collection": "LC8_DN-1",
                "args": {},
                "tasks": [
                    {
                        "type": "publish",
                        "collection": "LC8_DN-1",
                        "args": {}
                    }
                ]
            }
        ]
    }' \
    localhost:5000/api/radcor
You can check the status in the logs of the download container:
docker logs -f bdc-collection-builder-worker-download --tail 200
[2022-09-15 14:45:27,455: INFO/MainProcess] Received task: bdc_collection_builder.celery.tasks.download[f1445319-086d-46b1-9abf-6f1979ebd143]
[2022-09-15 14:45:27,456: INFO/MainProcess] Received task: bdc_collection_builder.celery.tasks.download[f040f03e-d5f6-4e7a-b6db-990fc7ea240a]
[2022-09-15 14:45:27,518: INFO/ForkPoolWorker-1] Starting Download Task for LC8_DN(id=2, scene_id=LC08_L1TP_221069_20200223_20200822_02_T1)
[2022-09-15 14:45:27,518: INFO/ForkPoolWorker-2] Starting Download Task for LC8_DN(id=2, scene_id=LC08_L1TP_221068_20200223_20200822_02_T1)
The resource /api/radcor/restart is responsible for restarting tasks in BDC-Collection-Builder.
In order to restart a failed task in Collection Builder, you must get the activity identifier (id) from the table collection_builder.activities.
For example, if you need to restart a Sentinel 2 download task whose sceneid is S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523, use the following commands:
Connect to the database in Docker:
docker exec -it bdc-collection-builder-pg psql -U postgres -d bdc
Use the following queries to search by activity type (download or publish) and sceneid S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523:
SELECT id, activity_type, collection_id, sceneid FROM collection_builder.activities
WHERE activity_type = 'download'
AND sceneid = 'S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523';
SELECT id, activity_type, collection_id, sceneid FROM collection_builder.activities
WHERE activity_type = 'publish'
AND sceneid = 'S2A_MSIL1C_20200110T132231_N0208_R038_T23LLF_20200110T145523';
After that, use the id to restart a Collection Builder activity:
curl -XGET -H "Content-Type: application/json" "localhost:5000/api/radcor/restart?ids=1"
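When the restart call is scripted, the URL can be built with proper query-string encoding. The following is a small sketch under assumptions: restart_url is a hypothetical helper, the base URL reuses the host and port of the curl example, and only a single activity id is handled because that is the only form shown here.

```python
import urllib.parse

# Assumption: same host and port as the curl example.
BASE_URL = "http://localhost:5000"


def restart_url(activity_id: int, base_url: str = BASE_URL) -> str:
    """Build the restart URL for one activity id (hypothetical helper)."""
    query = urllib.parse.urlencode({"ids": int(activity_id)})
    return f"{base_url}/api/radcor/restart?{query}"


url = restart_url(1)
```

The resulting string can be passed to curl or to urllib.request.urlopen against a running instance.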
Note
If the activity does not exist in the database, you must dispatch an execution as described in the sections Collecting Sentinel 2 L1C Images and Collecting Landsat-8 Level 1 Images.