Key and value serializers for the same topic use the same Glue schema #93

Open
laxgoalie392 opened this issue Oct 19, 2021 · 7 comments
Labels
question Further information is requested research Needs research

Comments

@laxgoalie392

Note: I'm coming from a world where we use the Confluent Schema Registry, so I apologize if I'm misunderstanding something.

Confluent Schema Registry registers two schemas per topic, one for values and one for keys, suffixed with -value and -key respectively. It looks like the serializers support this by accepting an isKey argument. However, when I run my application, the key and value serializers both seem to use the same schema in Glue, and the second schema shows up as Failed. Am I doing something wrong, or is this a bug?

AWSSchemaNamingStrategy.java#L31 seems to be the culprit: one function takes in isKey and then drops it.

@blacktooth
Contributor

Do you want to use the data object to determine the second schema?

@blacktooth blacktooth added question Further information is requested PR welcome Users are welcome to submit PR research Needs research and removed PR welcome Users are welcome to submit PR labels Dec 8, 2021
@laxgoalie392
Author

I assume it would be determined the same way as for message values. The important part is that keys and values can't be registered under the same schema, because your message keys can (and most likely do) have a different schema from the values.

@OneCricketeer

OneCricketeer commented Jun 25, 2022

The only fix for this is to explicitly override the key converter's naming strategy with your own class, since no other implementation exists. Or just don't use structured types for the keys to begin with, e.g. stay with numbers or strings.

Otherwise, data is effectively ignored in that default method, so both keys and values return transportName (the topic name without any suffix).

https://github.com/awslabs/aws-glue-schema-registry/blob/master/common/src/main/java/com/amazonaws/services/schemaregistry/common/AWSSchemaNamingStrategyDefaultImpl.java
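
To make the collision concrete, here is a minimal, self-contained Java model of the behaviour described above. `NamingStrategy` and `DefaultNaming` are hypothetical stand-ins written for this illustration, not the library's actual source; they only mimic the reported behaviour of accepting `isKey` and then returning the bare topic name.

```java
public class NamingCollisionDemo {
    /** Hypothetical stand-in for the naming-strategy interface (not the library's source). */
    interface NamingStrategy {
        String getSchemaName(String transportName);

        // Models the reported behaviour: data and isKey are accepted but ignored.
        default String getSchemaName(String transportName, Object data, boolean isKey) {
            return getSchemaName(transportName);
        }
    }

    /** Hypothetical default strategy: the schema name is just the topic name. */
    static class DefaultNaming implements NamingStrategy {
        @Override
        public String getSchemaName(String transportName) {
            return transportName;
        }
    }

    static String keySchemaName(String topic) {
        return new DefaultNaming().getSchemaName(topic, null, true);
    }

    static String valueSchemaName(String topic) {
        return new DefaultNaming().getSchemaName(topic, null, false);
    }

    public static void main(String[] args) {
        // Both calls resolve to the same name, so key and value schemas collide.
        System.out.println(keySchemaName("orders"));   // prints "orders"
        System.out.println(valueSchemaName("orders")); // prints "orders"
    }
}
```

Under this model the key schema and the value schema are both registered as `orders`, which would match the symptom of the second registration failing.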

@mohitpali
Contributor

You could also use these settings to provide separate schema names for the key and the value:

key.converter.schemaName=<YOUR_KEY_SCHEMA_NAME>
value.converter.schemaName=<YOUR_VALUE_SCHEMA_NAME>

@laxgoalie392
Author

laxgoalie392 commented Aug 2, 2022

We ended up creating a subclass of com.amazonaws.services.schemaregistry.common.AWSSchemaNamingStrategy and pointing the schemaNameGenerationClass property at it.

import com.amazonaws.services.schemaregistry.common.AWSSchemaNamingStrategy

// Mimics Confluent-style naming: <topic>-key for keys, <topic>-value for values.
class ConfluentSchemaNamingStrategy extends AWSSchemaNamingStrategy {
  override def getSchemaName(transportName: String, data: Any, isKey: Boolean): String =
    s"$transportName-${if (isKey) "key" else "value"}"

  // Without isKey there is no way to tell keys from values, so fall back to the value suffix.
  override def getSchemaName(transportName: String): String = s"$transportName-value"
}

It still feels like a bug that, if both your key and value are Avro, the library will try to register the two schemas under the same schema in Glue. It's pretty rare that we have an Avro key, but it has happened.
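
For anyone not on Scala, the same idea might look roughly like this in Java. This is only a sketch: `NamingStrategy` is a hypothetical local stand-in so the example runs on its own, and `ConfluentStyleNaming` is an illustrative name. A real implementation would implement the library's `AWSSchemaNamingStrategy` instead, and its exact method set should be checked against the library source.

```java
public class SuffixNamingDemo {
    /** Hypothetical stand-in for the naming-strategy interface (not the library's source). */
    interface NamingStrategy {
        String getSchemaName(String transportName);

        default String getSchemaName(String transportName, Object data, boolean isKey) {
            return getSchemaName(transportName);
        }
    }

    /** Appends -key or -value to the topic name, mirroring the Scala class above. */
    static class ConfluentStyleNaming implements NamingStrategy {
        @Override
        public String getSchemaName(String transportName, Object data, boolean isKey) {
            return transportName + (isKey ? "-key" : "-value");
        }

        // No isKey available here, so fall back to the value suffix.
        @Override
        public String getSchemaName(String transportName) {
            return transportName + "-value";
        }
    }

    static String schemaNameFor(String topic, boolean isKey) {
        return new ConfluentStyleNaming().getSchemaName(topic, null, isKey);
    }

    public static void main(String[] args) {
        System.out.println(schemaNameFor("orders", true));  // prints "orders-key"
        System.out.println(schemaNameFor("orders", false)); // prints "orders-value"
    }
}
```

With distinct names per side, the key and value schemas no longer compete for the same Glue schema. The real class would then be named in the schemaNameGenerationClass property, as described above.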

@kothapet

You probably also need the flexibility to set the registry names for the key and the value separately.
See the issue I opened for the Debezium connector and the workaround I am using:
#234

key.converter.registryName=<YOUR_KEY_REGISTRY_NAME>
value.converter.registryName=<YOUR_VALUE_REGISTRY_NAME>
