Skip to content

Conversation

@sharkySharks
Copy link

@sharkySharks sharkySharks commented Jan 31, 2024

Description

While we assert in the product schema that the sku field is not null, it should also not be an empty string. Recently ran into some customer issues where an empty sku field left events unprocessed. This schema needs to be updated in the snowplow console, but I will wait for feedback here before making any schema changes. Also interested in the process for updating this schema within snowplow.

Related Issue

Please see internal adobe jira issue DSCS-863.

Motivation and Context

sku is a required field, which means it should also not be empty.

How Has This Been Tested?

Downstream processing pipelines for recommendations were breaking because the sku field was empty for some events for some customers. In those pipelines queries that assumed the sku would not be empty on product events have been updated to explicitly remove empty strings as well as the null values so that the pipelines do not error with missing sku information. Twofold, schema validation should also handle empty product event sku information so that events that do not meet our expectations fail fast and return accurate assumptions to our customers about which product information may be missing required sku data.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have signed the Adobe Open Source CLA.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly. - I believe the schema is self documenting, but happy to update other docs if necessary
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@@ -174,4 +175,4 @@
"version": "2-0-5"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this version will also need to be bumped up accordingly

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing this is a patch bump to 2-0-6 or would be consider this a minor change to 2-1-0 since this could start rejecting events from stores where the sku field is empty? Also not entirely clear if schema changes are entirely manual and this file is used to reflect what is up in snowplow.

@aniham
Copy link
Collaborator

aniham commented Feb 1, 2024

@sharkySharks I like the idea and glad we are doing this.

The file you're changing here is for documentation only. The correct steps to actually do what you're trying to do are below:

  1. Go into the Snowplow console and create a new version of the Product schema in QA. 2.1.0 is correct - this is a breaking change that will result in many more events ending up in the Bad Bucket.
  2. Test locally against Snowplow QA by changing the schema URL to point to new version:
    https://github.com/adobe/commerce-events/blob/main/packages/storefront-events-collector/src/schemas.ts#L9
  3. Publish Snowplow changes to Prod
  4. Submit PR here that changes docs AND schema file above ^^ to point to new URL

Let me know if any of this is unclear!

@sharkySharks
Copy link
Author

@aniham
Copy link
Collaborator

aniham commented Feb 2, 2024

@sharkySharks yes for QA you have to be on VPN, sorry forgot to mention earlier

@@ -9,7 +9,8 @@
},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I am thinking about this, should we also add minLength to the name field since it is also required?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why "name" is a required field. I don't think we have any filters related to that in the code today. If we do, we can consider adding minLength otherwise I'd leave it as is

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we are unsure of the effects then I'll just keep the minLength added to sku and we can revisit name at a different point. I'm not sure we can be sure of all of the ways adding minLength to name will change things 😬

@sharkySharks
Copy link
Author

For context on the version number, snowplow did not allow for the minor version to be set to 2-1-0, this is the message that I received and decided to go with the patch bump version:

Screenshot 2024-02-02 at 11 17 15 AM

@sharkySharks
Copy link
Author

sharkySharks commented Feb 2, 2024

Update on testing, I was able to test this out in the qa environment locally, sent productPageView and addToCart events from localhost, saw them end up in the good bucket in qa kibana. Nothing in the bad bucket from those events. Not sure if other events need to be tested as well.

I think we should also consider minLength for the required name field as well.

Also, I do not have production promotion permissions in the snowplow console as far as I can tell. But latest code here is what I tested against.

@aniham
Copy link
Collaborator

aniham commented Feb 5, 2024

@sharkySharks this sounds right. I think you tested the right things though you should probably do a search through the repo to make sure no other events are using the product context (or if they are, to test those as well).

Next steps will be:

  • merge new snowplow schema version to prod (push button)
  • merge your code here to QA
  • have shirley and vinod run their smoke tests and verify all is good in QA

After that is done, you should be able to publish your change here to prod!

@aniham
Copy link
Collaborator

aniham commented Feb 5, 2024

@sharkySharks note that once this change goes into prod, we're going to see a uptick of events going to the bad bucket - that is the correct behavior and what we want ofc - but worth keeping an eye out and doing some reporting on what changed.

@krithikachandran
Copy link

For context on the version number, snowplow did not allow for the minor version to be set to 2-1-0, this is the message that I received and decided to go with the patch bump version:

Screenshot 2024-02-02 at 11 17 15 AM

I think it should be 3-0-0 since it's a breaking change. That way, we can roll it back easily if something goes wrong

@sharkySharks
Copy link
Author

@krithikachandran I am good with using the major version upgrade, particularly if we think it is going to be a breaking change even if it is a bug fix. I will redeploy it out with that version and test.

@aniham I don't have permissions to push the schema to prod when it is ready. I don't see that button, if you could check my permissions I can do that after I test the major version upgrade.

@aniham
Copy link
Collaborator

aniham commented Feb 5, 2024

@sharkySharks you're now admin in Snowplow - should be able to merge!

@sharkySharks
Copy link
Author

Ran both positive and negative tests against the new schema 3-0-0.

Here is what happens in the negative case:

Screenshot 2024-02-06 at 6 03 27 PM

migrated the schema to production in snowplow.

Screenshot 2024-02-06 at 6 06 01 PM

If all looks good to you here I'll wait for a final sign off on the latest schema version update on this pr and then go through QA testing tomorrow for the app.

@sharkySharks
Copy link
Author

Hey folks, I've updated this pr, I have not tested this yet as I want to get feedback on the potential changes first and then move forward with making the changes in snowplow for qa and run through all the testing paths once there is agreement on these changes.

I went ahead and added minLength to required fields for string and array fields that did not also include null in their accepted types. I have not done the work to verify what fields should still be required or not, I just blanketed all required string and array fields with minLength for schemas in this repo.

This is coming out of numerous outages this year where required fields were empty and broke various parts of ML pipeline processing.

@sharkySharks sharkySharks changed the title DSCS-863: add minLength to product schema DSCS-863: add minLength to all required schema fields, where applicable Feb 22, 2024
"query": {
"type": "string"
"type": "string",
"minLength": 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this maps to phrase in productSearch request. If so, there are cases where it could be empty (when browsing a category page, gql query filters on a category but no phrase is provided).

"properties": {
"code": {
"type": "string"
"type": "string",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rarely seen finalAdjustments being set in price data. Doesn't look like it's marked as required, we can remove this

"name": {
"type": "string"
"type": "string",
"minLength": 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here if the search returns no results? We might still need the events but with empty fields?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants