CUMULUS-3960: Updated PostToCmr task to be able to republish granules #3906

Merged
merged 18 commits into from
Jan 31, 2025
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -72,6 +72,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
user-data for compatibility with Amazon Linux 2023 AMI
- Fixed `tf-modules/cumulus` scripts to use Instance Metadata Service V2
- Updated `fake-provider-cf.yml` to work for Amazon Linux 2023 AMI
- **CUMULUS-3960**
- Updated `PostToCmr` task to be able to `republish` granules
- **CUMULUS-3965**
- Updated `tf-modules/cumulus/ecs_cluster` and `fake-provider-cf.yml` launch templates to require IMDSv2
- **CUMULUS-3990**
22 changes: 22 additions & 0 deletions packages/cmrjs/src/cmr-utils.js
@@ -35,6 +35,16 @@ const {

const log = new Logger({ sender: '@cumulus/cmrjs/src/cmr-utils' });

/**
* @typedef {{
* provider: string,
* clientId: string,
* username?: string,
* password?: string,
* token?: string
* }} CmrCredentials
*/

function getS3KeyOfFile(file) {
if (file.filename) return parseS3Uri(file.filename).Key;
if (file.filepath) return file.filepath;
@@ -233,6 +243,17 @@ async function publish2CMR(cmrPublishObject, creds, cmrRevisionId) {
throw new Error(`invalid cmrPublishObject passed to publis2CMR ${JSON.stringify(cmrPublishObject)}`);
}

/**
Contributor

Is there a reason why this wasn't here before? Did we never need to remove a granule from CMR until now, or was there just another way to do it (via the API or something)?

Contributor

Mainly for my curiosity, I guess.

Contributor Author

The api package has _removeGranuleFromCmr, which does more than just the CMR delete.

Contributor

So this is like a mini version of that, where it only deletes. Sounds good and pretty useful 👍

* Remove granule from CMR.
*
* @param {string} granuleUR - the granuleUR
* @param {CmrCredentials} creds - credentials needed to post to CMR service
*/
async function removeFromCMR(granuleUR, creds) {
const cmrClient = new CMR(creds);
return await cmrClient.deleteGranule(granuleUR);
}

/**
* Returns the S3 object identified by the specified S3 URI and (optional)
* entity tag, retrying up to 5 times, if necessary.
@@ -1249,6 +1270,7 @@ module.exports = {
publish2CMR,
reconcileCMRMetadata,
removeEtagsFromFileObjects,
removeFromCMR,
updateCMRMetadata,
uploadEcho10CMRFile,
uploadUMMGJSONCMRFile,
2 changes: 2 additions & 0 deletions packages/cmrjs/src/index.js
@@ -17,6 +17,7 @@ const {
granulesToCmrFileObjects,
reconcileCMRMetadata,
removeEtagsFromFileObjects,
removeFromCMR,
updateCMRMetadata,
} = require('./cmr-utils');

@@ -34,6 +35,7 @@ module.exports = {
publish2CMR,
reconcileCMRMetadata,
removeEtagsFromFileObjects,
removeFromCMR,
granulesToCmrFileObjects,
updateCMRMetadata,
};
4 changes: 2 additions & 2 deletions tasks/post-to-cmr/.nycrc.json
@@ -5,6 +5,6 @@
],
"statements": 94.0,
"functions": 88.0,
"branches": 97.0,
"branches": 88.0,
Contributor Author

@jennyhliu Jan 28, 2025

As before, only the handler function is not covered. I'm not sure how to improve this.

"lines": 94.0
}
}
2 changes: 2 additions & 0 deletions tasks/post-to-cmr/README.md
@@ -25,6 +25,8 @@ Config object fields:
| process | string | (required) | Process the granules went through
| stack | string | (required) | Name of deployment stack
| cmr | object | (required) | CMR credentials object
| concurrency | number | 20 | Maximum concurrency of requests to CMR
| republish | boolean | false | Whether to remove published granules from CMR and republish them again

### Input

48 changes: 39 additions & 9 deletions tasks/post-to-cmr/index.js
@@ -1,12 +1,14 @@
'use strict';

const keyBy = require('lodash/keyBy');
const pMap = require('p-map');
const cumulusMessageAdapter = require('@cumulus/cumulus-message-adapter-js');
const {
addEtagsToFileObjects,
granulesToCmrFileObjects,
metadataObjectFromCMRFile,
publish2CMR,
removeFromCMR,
removeEtagsFromFileObjects,
} = require('@cumulus/cmrjs');
const { getCmrSettings, getS3UrlOfFile } = require('@cumulus/cmrjs/cmr-utils');
@@ -85,6 +87,28 @@ function checkForMetadata(granules, cmrFiles) {
});
}

/**
* Remove granules from CMR
*
* @param {object} params - parameter object
* @param {Array<object>} params.granules - granules to remove
* @param {object} params.cmrSettings - CMR credentials
* @param {number} params.concurrency - Maximum concurrency of requests to CMR
* @throws {Error} - Error from CMR request
*/
async function removeGranuleFromCmr({ granules, cmrSettings, concurrency }) {
const granulesToUnpublish = granules.filter((granule) => granule.published || !!granule.cmrLink);
Contributor

It's considered published if it doesn't have a granule.cmrLink?

Contributor Author

In theory a published granule should have a cmrLink, but we should still be able to delete the granule from CMR if we don't have the link.

await pMap(
granulesToUnpublish,
(granule) => removeFromCMR(granule.granuleId, cmrSettings),
{ concurrency }
);

if (granulesToUnpublish.length > 0) {
log.info(`Removing ${granulesToUnpublish.length} out of ${granules.length} granules from CMR for republishing`);
}
}
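
To make the selection rule from the thread above concrete, here is a minimal sketch of how the filter behaves. The sample granules and the helper name `selectGranulesToUnpublish` are made up for illustration; only the predicate itself is taken from the diff.

```javascript
// Hypothetical sample granules; only `published` and `cmrLink` matter here.
const granules = [
  { granuleId: 'g1', published: true, cmrLink: 'https://example.com/g1' },
  { granuleId: 'g2', published: false, cmrLink: 'https://example.com/g2' },
  { granuleId: 'g3', published: true },
  { granuleId: 'g4', published: false },
  { granuleId: 'g5' },
];

// Same predicate as in removeGranuleFromCmr: a granule is selected when it is
// flagged as published OR carries a non-empty cmrLink.
const selectGranulesToUnpublish = (items) =>
  items.filter((granule) => granule.published || !!granule.cmrLink);

const selectedIds = selectGranulesToUnpublish(granules).map((g) => g.granuleId);
console.log(selectedIds); // [ 'g1', 'g2', 'g3' ]
```

Note that `!!granule.cmrLink` tests for the presence of a link, so a granule without a link is only selected when `published` is truthy.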

/**
* Post to CMR
*
@@ -108,7 +132,17 @@ function checkForMetadata(granules, cmrFiles) {
*/
async function postToCMR(event) {
const { cmrRevisionId, granules } = event.input;
const { etags = {} } = event.config;
const { etags = {}, republish = false, concurrency = 20 } = event.config;

const cmrSettings = await getCmrSettings({
...event.config.cmr,
...event.config.launchpad,
});

// if republish is true, unpublish granules which are already published
if (republish) {
await removeGranuleFromCmr({ granules, cmrSettings, concurrency });
}
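
The ordering introduced here (delete first, then publish, both capped by `concurrency`) can be sketched as follows. This is an illustration under assumptions, not the task's code: `mapWithLimit` is a simplified, hypothetical stand-in for p-map, `fakeCmrCall` replaces the real CMR requests with a short timer, and the sketch maps over granule IDs in both phases, whereas the task publishes CMR metadata files.

```javascript
// Minimal concurrency-limited map (hypothetical stand-in for p-map).
// It stops handing out new work after the first failure and rejects with it.
async function mapWithLimit(items, mapper, { concurrency }) {
  const results = new Array(items.length);
  let next = 0;
  let failed = false;
  const worker = async () => {
    while (!failed && next < items.length) {
      const index = next;
      next += 1;
      try {
        results[index] = await mapper(items[index]);
      } catch (error) {
        failed = true;
        throw error;
      }
    }
  };
  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}

async function republishSketch(config, granuleIds) {
  // Same defaults as the task config: republish off, 20 concurrent requests.
  const { republish = false, concurrency = 20 } = config;
  const calls = [];
  let inFlight = 0;
  let maxInFlight = 0;

  // Fake CMR request: records the call and tracks how many run at once.
  const fakeCmrCall = (label) => async (granuleId) => {
    inFlight += 1;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5));
    inFlight -= 1;
    calls.push(`${label}:${granuleId}`);
  };

  // All deletes finish before any publish starts.
  if (republish) {
    await mapWithLimit(granuleIds, fakeCmrCall('delete'), { concurrency });
  }
  await mapWithLimit(granuleIds, fakeCmrCall('publish'), { concurrency });
  return { calls, maxInFlight };
}

republishSketch({ republish: true, concurrency: 2 }, ['g1', 'g2', 'g3'])
  .then(({ calls, maxInFlight }) => console.log(calls, maxInFlight));
```

Because the delete phase is awaited as a whole, a failed delete rejects before any publish request is sent, which matches what the "fails to republish granules when CMR is down" test expects.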

granules.forEach((granule) => addEtagsToFileObjects(granule, etags));

@@ -122,18 +156,14 @@ async function postToCMR(event) {

const startTime = Date.now();

const cmrSettings = await getCmrSettings({
...event.config.cmr,
...event.config.launchpad,
});

// post all meta files to CMR
const results = await Promise.all(
updatedCMRFiles.map((cmrFile) => publish2CMR(cmrFile, cmrSettings, cmrRevisionId))
const results = await pMap(
updatedCMRFiles,
(cmrFile) => publish2CMR(cmrFile, cmrSettings, cmrRevisionId),
{ concurrency }
Contributor Author

stopOnError: true is the default. Setting it to false would change the current lambda behavior.

Contributor Author

Jonathan suggested the Promise.allSettled behavior, which means adding stopOnError: false to pMap, like
{ concurrency, stopOnError: false }, but that changes the current behavior of post-to-cmr, makes unit tests fail, and is out of the scope of this ticket.
The comment above explains why it's not added.

Contributor

Ohhh ok, that makes sense, thanks for letting me know 🙌

);
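
The trade-off in this thread can be sketched with native promises only. `Promise.all` fails fast, roughly like p-map's default `stopOnError: true`, while `Promise.allSettled` lets every request finish, which is the closest native analogue to `stopOnError: false` (p-map itself then rejects with an aggregated error rather than returning statuses). `fakePublish` and the file names are hypothetical stand-ins for `publish2CMR` and real CMR files.

```javascript
// Hypothetical stand-in for publish2CMR: rejects for one file, resolves otherwise.
const fakePublish = async (cmrFile) => {
  if (cmrFile === 'bad.cmr.xml') throw new Error(`CMR rejected ${cmrFile}`);
  return { filename: cmrFile, conceptId: `concept-${cmrFile}` };
};

const files = ['a.cmr.xml', 'bad.cmr.xml', 'c.cmr.xml'];

// Fail-fast: the first rejection rejects the whole batch, so the caller sees
// a single error. This mirrors the task's current behavior.
const failFast = async () => {
  try {
    await Promise.all(files.map((file) => fakePublish(file)));
    return 'all published';
  } catch (error) {
    return `failed: ${error.message}`;
  }
};

// Settle-all: every request runs to completion and failures are collected.
const settleAll = async () => {
  const outcomes = await Promise.allSettled(files.map((file) => fakePublish(file)));
  return {
    published: outcomes.filter((o) => o.status === 'fulfilled').length,
    failed: outcomes.filter((o) => o.status === 'rejected').length,
  };
};

failFast().then((summary) => console.log(summary));
settleAll().then((summary) => console.log(summary));
```

One difference worth keeping in mind: with `Promise.all` every request has already started by the time one fails, whereas p-map with a concurrency limit stops starting new mapper calls after the first rejection.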

const endTime = Date.now();

const outputGranules = buildOutput(
results,
granules
3 changes: 2 additions & 1 deletion tasks/post-to-cmr/package.json
@@ -42,7 +42,8 @@
"@cumulus/cumulus-message-adapter-js": "2.3.0",
"@cumulus/errors": "19.1.0",
"@cumulus/launchpad-auth": "19.1.0",
"lodash": "^4.17.21"
"lodash": "^4.17.21",
"p-map": "^4.0.0"
},
"devDependencies": {
"@cumulus/cmr-client": "19.1.0",
10 changes: 10 additions & 0 deletions tasks/post-to-cmr/schemas/config.json
@@ -44,6 +44,11 @@
}
}
},
"concurrency": {
"type": "number",
"description": "Maximum concurrency of requests to CMR",
"default": 20
},
"launchpad": {
"type": "object",
"description": "credentials needed to get launchpad token",
@@ -64,6 +69,11 @@
}
}
},
"republish": {
"type": "boolean",
"description": "Whether to remove published granules from CMR and republish them again.",
"default": false
},
"skipMetaCheck": {
"description": "Adds the option to allow PostToCMR to pass when processing a granule without a metadata file.",
"default": false,
62 changes: 61 additions & 1 deletion tasks/post-to-cmr/tests/cmr-test.js
@@ -108,7 +108,7 @@ test.serial('postToCMR throws error if CMR correctly identifies the xml as inval
}
});

test.serial('postToCMR fails when CMR is down', async (t) => {
test.serial('postToCMR fails to publish granules when CMR is down', async (t) => {
sinon.stub(cmrClient.CMR.prototype, 'getToken');
const { bucket, payload } = t.context;
const newPayload = payload;
@@ -133,6 +133,27 @@ test.serial('postToCMR fails when CMR is down', async (t) => {
}
});

test.serial('postToCMR fails to republish granules when CMR is down', async (t) => {
sinon.stub(cmrClient.CMR.prototype, 'getToken');
const newPayload = cloneDeep(t.context.payload);
newPayload.config.republish = true;
newPayload.config.concurrency = 2;
newPayload.input.granules[0].published = true;
newPayload.input.granules[0].cmrLink = randomString;

sinon.stub(cmrClient.CMR.prototype, 'deleteGranule').throws(new CMRInternalError());
t.teardown(() => {
cmrClient.CMR.prototype.deleteGranule.restore();
});

try {
await t.throwsAsync(postToCMR(newPayload),
{ instanceOf: CMRInternalError });
} finally {
cmrClient.CMR.prototype.getToken.restore();
}
});

test.serial('postToCMR raises correct error', async (t) => {
sinon.stub(cmrClient.CMR.prototype, 'getToken');
const { bucket, payload } = t.context;
@@ -190,6 +211,45 @@ test.serial('postToCMR succeeds with correct payload', async (t) => {
});
});

test.serial('postToCMR successfully republishes granules with correct payload', async (t) => {
const { bucket, payload } = t.context;
const newPayload = cloneDeep(payload);
newPayload.config.concurrency = 2;
newPayload.config.republish = true;
newPayload.input.granules[0].published = true;
newPayload.input.granules[0].cmrLink = randomString;

const granuleId = newPayload.input.granules[0].granuleId;
const cmrFileKey = `${granuleId}.cmr.xml`;

sinon.stub(cmrClient.CMR.prototype, 'deleteGranule');
sinon.stub(cmrClient.CMR.prototype, 'ingestGranule').callsFake(resultThunk);
t.teardown(() => {
cmrClient.CMR.prototype.deleteGranule.restore();
cmrClient.CMR.prototype.ingestGranule.restore();
});

await s3PutObject({
Contributor

I get the other calls/functions, but what does this s3PutObject do in the case of the test?

Contributor Author

The postToCMR function reads the CMR metadata file from S3 and ingests it into CMR, so the test has to upload that file first.

Bucket: bucket,
Key: cmrFileKey,
Body: fs.createReadStream(path.join(path.dirname(__filename), 'data', 'meta.xml')),
});

await validateInput(t, newPayload.input);
await validateConfig(t, newPayload.config);
const output = await postToCMR(newPayload);
await validateOutput(t, output);
t.is(output.granules.length, 1);
t.is(
output.granules[0].cmrLink,
`https://cmr.uat.earthdata.nasa.gov/search/concepts/${result['concept-id']}.echo10`
);
output.granules.forEach((g) => {
t.true(Number.isInteger(g.post_to_cmr_duration));
t.true(g.post_to_cmr_duration >= 0);
});
});

test.serial('postToCMR immediately succeeds using metadata file ETag', async (t) => {
const newPayload = cloneDeep(t.context.payload);
const granuleId = newPayload.input.granules[0].granuleId;