diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index e218c39..581aa15 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -1,9 +1,5 @@ .Processing data -* Change Data Capture (CDC) -** xref:use-cases-architectures:change-data-capture/index.adoc[] -** xref:use-cases-architectures:change-data-capture/table-schema-evolution.adoc[] -** xref:use-cases-architectures:change-data-capture/consuming-change-data.adoc[] -** xref:use-cases-architectures:change-data-capture/questions-and-patterns.adoc[] +* xref:cdc-for-cassandra:ROOT:cdc-concepts.adoc[Change Data Capture (CDC)] * xref:astra-streaming:getting-started:real-time-data-pipelines-tutorial.adoc[Build real-time data pipelines with {product}] .Migrating to {pulsar} diff --git a/modules/ROOT/pages/index.adoc b/modules/ROOT/pages/index.adoc index 8806857..4d4de4d 100644 --- a/modules/ROOT/pages/index.adoc +++ b/modules/ROOT/pages/index.adoc @@ -17,7 +17,7 @@ Learn about the amazing things you can do with all of our streaming products. We've included best practices for Apache Pulsar, a full connector reference, - and examples for getting the most out of Astra's CDC feature. + and examples for getting the most out of CDC. xref:astra-streaming:getting-started:real-time-data-pipelines-tutorial.adoc[Build real-time data pipelines with {product}] @@ -36,7 +36,7 @@ Delivered as a fully-managed SaaS with boundless scale, massive throughput and low latency, Astra Streaming is the simplest way to modernize your event driven architecture and turbo charge your data in motion strategy. - xref:astra-streaming:getting-started:index.adoc[Astra Streaming quickstat] + xref:astra-streaming:getting-started:index.adoc[Astra Streaming quickstart] @@ -120,10 +120,11 @@

xref:astra-streaming:developing:astream-cdc.adoc[Change Data Capture]

- Change Data Capture (CDC) for Astra Streaming enables you to send data - changes in real time throughout your entire ecosystem. With a wide range of - connectors to data warehouses, messaging systems, data lakes as well as client - libraries, you can send your data wherever it needs to go in real time. + Change Data Capture (CDC) enables you to send data + changes in real time throughout your entire ecosystem. + With a wide range of connectors to data warehouses, + messaging systems, and data lakes, as well as client libraries, + you can send your data wherever it needs to go in real time. xref:astra-streaming:developing:astream-cdc.adoc[] diff --git a/modules/use-cases-architectures/pages/change-data-capture/consuming-change-data.adoc b/modules/use-cases-architectures/pages/change-data-capture/consuming-change-data.adoc deleted file mode 100644 index 56a295b..0000000 --- a/modules/use-cases-architectures/pages/change-data-capture/consuming-change-data.adoc +++ /dev/null @@ -1,38 +0,0 @@ -= Consuming change data with {pulsar-reg} -:navtitle: Consuming change data -:description: This article describes how to consume change data with {pulsar-reg}. -:csharp: C# - -[NOTE] -==== -This article is a continuation of the xref:change-data-capture/index.adoc[] article. Please read that article first to understand the fundamentals of what resources are being used. -==== - -== {pulsar-short} clients - -Each client handles message consumption a little differently, but there is one overall pattern to follow. As we learned in the previous sections, a CDC message arrives as an Avro GenericRecord of type KeyValue. Typically, the first step is to separate the key and value portions of the message. You will find the {cass-short} table's key fields in the key of the record and the change data in the value portion of the record, both of which are Avro records themselves. From there, you'll want to deserialize the Avro records and extract the information you need. - -Below are example implementations for each runtime consuming messages from the CDC data topic. - -While these examples are in the `astra-streaming-examples` repository, they are not {product}-specific. -You can use these examples to consume CDC data topics in your own {cass-short}/{pulsar-short} clusters. - -* svg:common::icons/logos/csharp.svg[role="icon text-xl",name="C#"] https://github.com/datastax/astra-streaming-examples/blob/master/csharp/astra-cdc/Program.cs[{csharp} CDC project example] -* svg:common::icons/logos/go.svg[role="icon text-xl",name="Go"] https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang CDC project example] -* svg:common::icons/logos/java.svg[role="icon text-xl",name="Java"] https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/consumers/CDCConsumer.java[Java CDC consumer example] -* svg:common::icons/logos/nodejs.svg[role="icon text-xl",name="Node.js"] https://github.com/datastax/astra-streaming-examples/blob/master/nodejs/astra-cdc/consumer.js[Node.js CDC consumer example] -* svg:common::icons/logos/python.svg[role="icon text-xl",name="Python"] https://github.com/datastax/astra-streaming-examples/blob/master/python/astra-cdc/cdc_consumer.py[Python CDC consumer example] - -== {pulsar-short} functions - -It is very common to have a function consuming the CDC data. Functions usually perform additional processing on the data and pass it to another topic. Like a client consumer, a function needs to deserialize the message data.
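Whether in a client or a function, the deserialization step looks roughly the same. Here is a minimal, hypothetical Java client sketch of the pattern described above; the service URL, topic, and subscription names are placeholders you would replace with your own:

[source,java]
----
import org.apache.pulsar.client.api.*;
import org.apache.pulsar.client.api.schema.GenericRecord;
import org.apache.pulsar.common.schema.KeyValue;

public class CdcConsumerSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder service URL
                .build()) {
            // AUTO_CONSUME resolves the topic's KeyValue schema at runtime
            Consumer<GenericRecord> consumer = client.newConsumer(Schema.AUTO_CONSUME())
                    .topic("persistent://my-tenant/my-namespace/data-mytable") // placeholder data topic
                    .subscriptionName("cdc-sketch")
                    .subscribe();

            Message<GenericRecord> msg = consumer.receive();

            // Separate the key (the table's primary key fields) from the value (the change data)
            KeyValue<GenericRecord, GenericRecord> kv =
                    (KeyValue<GenericRecord, GenericRecord>) msg.getValue().getNativeObject();
            GenericRecord key = kv.getKey();
            GenericRecord val = kv.getValue(); // may be null for deletes

            key.getFields().forEach(f ->
                    System.out.println("key " + f.getName() + " = " + key.getField(f)));
            if (val != null) {
                val.getFields().forEach(f ->
                        System.out.println("value " + f.getName() + " = " + val.getField(f)));
            }
            consumer.acknowledge(msg);
        }
    }
}
----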
Below are examples of different functions consuming messages from the CDC data topic. - -While these examples are in the `astra-streaming-examples` repository, they are not {product}-specific. You can use these examples to consume CDC data topics in your own {cass-short}/{pulsar-short} clusters. - -* svg:common::icons/logos/go.svg[role="icon text-xl",name="Go"] https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang CDC project example] -* svg:common::icons/logos/java.svg[role="icon text-xl",name="Java"] https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/functions/CDCFunction.java[Java CDC function example] -* svg:common::icons/logos/python.svg[role="icon text-xl",name="Python"] https://github.com/datastax/astra-streaming-examples/blob/master/python/cdc-in-pulsar-function/deschemaer.py[Python CDC function example] - -== See also - -* xref:use-cases-architectures:change-data-capture/questions-and-patterns.adoc[] \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/change-data-capture/index.adoc b/modules/use-cases-architectures/pages/change-data-capture/index.adoc deleted file mode 100644 index acc3434..0000000 --- a/modules/use-cases-architectures/pages/change-data-capture/index.adoc +++ /dev/null @@ -1,90 +0,0 @@ -= Change Data Capture (CDC) pattern with {cass-reg} and {pulsar-reg} -:navtitle: Overview -:description: This article describes how to capture changes in an {cass-reg} database and publish them to {pulsar-reg} as events. - -Change Data Capture (CDC) is a design pattern used in software development to capture and propagate changes made to data in a system. The CDC pattern is commonly used in real-time data streaming applications to enable near-real-time processing of data changes. In a typical CDC implementation, a change to a row of data (insert, update, delete) is detected and recorded. The change (or mutation) is made available to downstream systems as an event for further processing. This allows applications to react quickly to changes in the data without adding unneeded load on the data store, enabling real-time data processing and analytics. - -Before we get into the specifics of CDC, let’s first understand the resources needed to complete the flow. - -[TIP] -==== -Throughout this document, "Source Connector" will refer to the CDC Source Connector component, while "source connector" refers to any other source connector in a {pulsar-short} deployment. - -For more information about the CDC Source Connector component, see the xref:cdc-for-cassandra:ROOT:index.adoc[{company} CDC for {cass-reg} connector documentation]. -==== - -== {pulsar-reg} source connectors - -Source connectors in {pulsar} are responsible for ingesting data from external sources into the {pulsar-short} system. They can be used to collect data from a variety of sources, including databases, message queues, and file systems. When the source connector “sees” data, it streams the data to a {pulsar-short} topic, making it easy to ingest, process, and analyze data from disparate sources in your {pulsar-short}-based applications. - -{pulsar-short} offers extensible APIs where developers can use a defined interface to develop their own connector. The interface takes much of the boilerplate burden away from developers and gets them right to the purpose of the connector.
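To give a sense of the shape of that interface, here is a simplified, hypothetical sketch of a custom source; a real connector adds configuration handling, batching, and error handling:

[source,java]
----
import java.util.Map;
import org.apache.pulsar.functions.api.Record;
import org.apache.pulsar.io.core.Source;
import org.apache.pulsar.io.core.SourceContext;

// Simplified sketch: Pulsar calls open() once, then read() in a loop,
// and produces each returned Record to the configured destination topic.
public class ExampleSource implements Source<byte[]> {
    @Override
    public void open(Map<String, Object> config, SourceContext context) {
        // connect to the upstream system using the supplied configuration
    }

    @Override
    public Record<byte[]> read() throws Exception {
        byte[] payload = pollUpstream(); // hypothetical helper: block until the next change
        return () -> payload;            // Record's only required method is getValue()
    }

    @Override
    public void close() {
        // release upstream connections
    }

    private byte[] pollUpstream() { return new byte[0]; } // placeholder
}
----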
Creating a connector means adding the know-how to work with data from the source and adapt it into a compliant message produced with the {pulsar-short} client. - -As you’ll learn in the next section, the {cass-short} Source Connector plays a central role in capturing change data. To run a source connector, you provide configuration about what data will be selected, how to connect with the upstream system, and the destination topic for the new message. The source connector takes care of producing the message. {pulsar-short} source connectors run as {pulsar-short} functions within the cluster, so many of the features of functions apply (like the number of instances to run and how to configure the function instance running environment). Metrics and logs for a source connector are automatically made a part of the cluster. - -=== Monitoring source connectors - -Monitoring a source connector includes two areas: health and performance. Every connector in {pulsar-short} emits basic metrics about its health, including stats like the number of records received from the source and the number of messages written to the destination topic. Connectors also emit debugging metrics like the number of exceptions thrown by the source. Performance metrics build on the health metrics with source-specific measurements. Refer to the https://pulsar.apache.org/docs/reference-metrics/#connectors[connectors area of {pulsar-short} metrics] for a complete list and explanation of metrics. - -=== Source connector logs - -Most {pulsar-short} source connectors emit logs that show lifecycle events as well as custom events specific to the connector type. All logs are handled the same way core cluster logs are handled. By default, they are written to the console and collected by log4j destinations. If you are using function workers, you can access log files on their disk. Refer to {pulsar-short}'s https://pulsar.apache.org/docs/io-debug/[connector debugging guide] for more information. - -== {pulsar-short} schemas and the schema registry - -The {pulsar} schema registry is a feature of a {pulsar-short} cluster that manages the schemas of messages sent and received on {pulsar-short} topics. In {pulsar-short}, messages are stored as bytes. Schemas provide a way to serialize and deserialize messages with a particular structure or type, allowing for interoperability between different systems. - -The schema registry in {pulsar-short} stores and manages schema definitions for all message types sent and received in {pulsar-short}. The schema registry enforces schema compatibility rules, such as requiring a producer to send messages that conform to a certain schema, or rejecting messages that don't match the schema. - -Schemas follow a primitive or complex type. Primitive schemas are simple data types like bool, int, string, and float. Because {pulsar-short} is written in Java, its primitives are based on Java types. When a different client runtime is used, a conversion may need to be done. Refer to the https://pulsar.apache.org/docs/schema-understand/#primitive-type[{pulsar-short} primitive types table] for a full reference. - -Complex schemas introduce a more structured way of messaging. The two types of complex messages are KeyValue and Struct. KeyValue is JSON-formatted text that separates custom labels from their values. Struct is a custom class definition expressed as Avro, JSON, or Protobuf. - -KeyValue offers an interesting way to encode a message called “Separated”. This option stores the message key and the message payload separately, which allows the key information to be stored as a different data type than the payload and offers special compression capabilities. CDC takes advantage of separated KeyValue messages when it produces both the event and data topics.
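As an illustration of separated encoding (not CDC-specific), here is a hypothetical producer using a KeyValue schema with `SEPARATED` encoding; the service URL and topic name are placeholders:

[source,java]
----
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.common.schema.KeyValue;
import org.apache.pulsar.common.schema.KeyValueEncodingType;

public class SeparatedEncodingSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build()) {
            // SEPARATED stores the key in the message key and only the value in
            // the payload, so each side can use its own schema (and compression)
            Schema<KeyValue<String, Integer>> schema = Schema.KeyValue(
                    Schema.STRING, Schema.INT32, KeyValueEncodingType.SEPARATED);

            try (Producer<KeyValue<String, Integer>> producer =
                         client.newProducer(schema).topic("kv-demo").create()) {
                producer.send(new KeyValue<>("row-key", 42));
            }
        }
    }
}
----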
=== Namespace schema configurations - -In the context of CDC, there are a few schema configurations of note. All of these are specific to the namespace where the event and data topics are logically located. - -- *schema-compatibility-strategy*: this setting instructs the {pulsar-short} broker on how to handle new schemas introduced to existing topics by producers. This is relevant to CDC when a table's design is changed. For example, if a new column is added, the registered schema is changed to include that new value. The chosen schema-compatibility-strategy decides if the namespace will allow this. If schema validations are enabled, this option decides what strategy is used. {pulsar-short}'s default strategy is "FULL", which means existing optional table columns can be modified. Learn more about the different types of strategies in the https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy[{pulsar-short} docs]. - -- *allow-auto-update-schema*: given the compatibility strategy, this setting is a flag that determines if an update to the schema is generally allowed. CDC sets this to `true` so changes in a table's design can automatically propagate to the topic. - -- *schema-autoupdate-strategy*: when auto-update is enabled (set to `true`), this setting directs the broker how to ensure consumers of a topic are able to process messages. If a consumer attempts to connect with a schema that does not match the current schema, this strategy decides if it is allowed to receive messages. CDC sets this to `BACKWARD_TRANSITIVE`, which means if optional table columns have been added or a column has been removed, the old schema is allowed. - -- *schema-validation-enforce*: this flag limits how producers and consumers are allowed to be configured. When enabled (`true`), producer and consumer clients must have a schema set before sending messages. When disabled (`false`), {pulsar-short} allows producers and consumers without a set schema to send or receive messages. CDC disables this option (set to `false`), so producers and consumers do not have to know the message schema ahead of time.
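CDC applies these settings for you, but for reference, here is a hypothetical sketch of applying equivalent namespace policies with the Java admin client (in recent {pulsar-short} releases, the auto-update strategy is expressed through the schema compatibility strategy); the admin URL and namespace are placeholders:

[source,java]
----
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.SchemaCompatibilityStrategy;

public class CdcNamespacePoliciesSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build()) {
            String ns = "my-tenant/my-namespace"; // placeholder namespace

            // Allow old consumers after additive table changes (see above)
            admin.namespaces().setSchemaCompatibilityStrategy(ns,
                    SchemaCompatibilityStrategy.BACKWARD_TRANSITIVE);
            // Let schema changes propagate automatically to the topic
            admin.namespaces().setIsAllowAutoUpdateSchema(ns, true);
            // Do not require producers/consumers to declare a schema up front
            admin.namespaces().setSchemaValidationEnforced(ns, false);
        }
    }
}
----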
== {cass-short} change data capture (CDC) agent and {cass-short} Source Connector for {pulsar} - -The {cass-short} CDC agent is a process running on each node in a {cass-short} cluster that watches for data changes on tables that have enabled the CDC feature. Using {cass-short}'s https://cassandra.apache.org/doc/4.0/cassandra/configuration/cass_yaml_file.html#commitlog_sync[commitlog_sync option], the agent periodically syncs a separate log in a special “cdc_raw” directory. Each log entry is a CDC event. The CDC agent creates a new event message containing the row coordinates of the changed data and produces the message to a downstream {pulsar-short} cluster. - -Each table that has CDC enabled also has a corresponding Source Connector in {pulsar-short}. This is unlike the CDC agent, where the process runs on each {cass-short} node, keeping a log of all table changes. Each table-specific Source Connector subscribes to the events topic the agent is producing messages to. When the connector “sees” a message for its table, it uses the row coordinates within the message to retrieve the mutated data from {cass-short} and create a new message with the specifics. That new message is written to a data topic where others can subscribe and receive CDC messages. - -=== Event deduplication - -A particular advantage of the Source Connector is its deduplication feature. You might have read about {pulsar-short}'s built-in https://pulsar.apache.org/docs/2.11.x/concepts-messaging/#message-deduplication[deduplication capabilities] - these are *not* utilized in the message flow because CDC needs finer-grained control to detect duplicates. As the CDC agent discovers a new commit log, a deterministic identifier is created using the MD5 hash algorithm. That key identifier is added to the event message. - -When message consumers like the Source Connector connect to the event topic, they establish a subscription type. {pulsar-short} has four subscription types: exclusive, shared, failover, and key_shared. In a typical CDC flow, the Source Connector will have multiple instances running in parallel. When multiple consumers are part of a key_shared subscription, {pulsar-short} delivers messages with the same hash key to the same consumer, no matter how many times the key is sent. - -When a {cass-short} cluster has multiple hosts (with multiple commit logs), and they all use the same mutation to calculate the same hash key, the same consumer will always receive it. Each Source Connector keeps a cache of hashes it has seen and ensures duplicates are dropped before producing the data message. - -Learn more about {pulsar-short}'s key_shared subscription type in the https://pulsar.apache.org/docs/2.11.x/concepts-messaging/#key_shared[{pulsar-short} documentation]. - -== Putting together the CDC flow - -Now that you understand the different resources used in this CDC pattern, let’s follow the flow to see how a CDC message is produced. - -. Create a {pulsar-short} tenant to hold CDC messages. -.. Create a namespace (or use the “default”). -.. Create a topic for event messages. -.. Create a topic for data messages. -. Start the CDC source connector in {pulsar-short} by setting the destination topic (aka the data topic), the event topic, and {cass-short} connection info (along with other settings). -. Configure the {cass-short} change agent with a working directory and the {pulsar-short} service URL (along with other settings) in the {cass-short} node (restart is required). -. Create a {cass-short} table and enable CDC. -. Insert a row of data into the table. -.. The change agent will detect a mutation to the table and write a log. -.. The log will be converted to an event message and written to the events topic. -.. The source connector will complete the flow by producing a final change message to the data topic.
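As a minimal illustration of steps 4 and 5, here is a hypothetical sketch using the Java driver; it assumes a reachable {cass-short} cluster and an existing `demo` keyspace, and the exact CDC table option syntax can vary by {cass-short} distribution:

[source,java]
----
import com.datastax.oss.driver.api.core.CqlSession;

public class CdcFlowSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .withKeyspace("demo") // placeholder keyspace
                .build()) {
            // Step 4: create a table with CDC enabled
            // (on Apache Cassandra the table option is `cdc = true`)
            session.execute("CREATE TABLE IF NOT EXISTS orders "
                    + "(id uuid PRIMARY KEY, total double) WITH cdc = true");

            // Step 5: insert a row; the change agent logs the mutation, emits an
            // event message, and the Source Connector produces the change message
            session.execute("INSERT INTO orders (id, total) VALUES (uuid(), 42.0)");
        }
    }
}
----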
- -== Next steps - -* xref:use-cases-architectures:change-data-capture/table-schema-evolution.adoc[] \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/change-data-capture/questions-and-patterns.adoc b/modules/use-cases-architectures/pages/change-data-capture/questions-and-patterns.adoc deleted file mode 100644 index f933483..0000000 --- a/modules/use-cases-architectures/pages/change-data-capture/questions-and-patterns.adoc +++ /dev/null @@ -1,107 +0,0 @@ -= CDC questions and patterns with {cass-reg} and {pulsar-reg} -:navtitle: Questions and patterns -:description: This article collects common questions and patterns for CDC with {cass-reg} and {pulsar-reg}. - -We have collected common questions and patterns from our customers who are using CDC. We hope this helps you get the most out of the feature. Please also refer to the xref:cdc-for-cassandra:ROOT:faqs.adoc[CDC for {cass-short} FAQs] in the official documentation for more information. - -.How do I know if CDC is enabled on a table? -[%collapsible] -==== -You can check the CDC status of a table by running the following CQL query: - -`SELECT * FROM system_distributed.cdc_local WHERE keyspace_name = 'keyspace_name' AND table_name = 'table_name';` - -If the CDC status is `enabled`, CDC is enabled on the table. If the status is `disabled` or `null`, CDC is not enabled on the table. - -To enable CDC on the table, run the following CQL query: - -`ALTER TABLE keyspace_name.table_name WITH cdc = {'enabled': true};` - -To disable CDC on the table, run the following CQL query: - -`ALTER TABLE keyspace_name.table_name WITH cdc = {'enabled': false};` -==== - -.How do I know if the {cass-short} agent is running? -[%collapsible] -==== -You can check the status of the {cass-short} agent by running the following CQL query: - -`SELECT * FROM system_distributed.cdc_local WHERE keyspace_name = 'cdc' AND table_name = 'raw_cdc';` - -The `status` column is `running` if the agent is running; a `stopped` or `null` status means it is not running. - -To start the agent, run the following CQL query: - -`ALTER TABLE cdc.raw_cdc WITH cdc = {'enabled': true};` - -To stop the agent, run the following CQL query: - -`ALTER TABLE cdc.raw_cdc WITH cdc = {'enabled': false};` -==== - -.What happens to unacknowledged event messages the {cass-short} agent can’t deliver? -[%collapsible] -==== -Unacknowledged messages mean the CDC agent was not able to produce the event message in {pulsar-short}. If this is the case, the table row mutation fails and the {cass-short} client sees an exception, so no data is committed to {cass-short} and no event is created. - -Another scenario might be that the {pulsar-short} broker is too busy to process messages and a backlog has been created. In this case, {pulsar-short}'s backlog policies take effect and event messages are handled accordingly. The data is committed to {cass-short}, but there might be additional latency in event message creation. - -The design of CDC in {cass-short} assumes that when table changes are synced to the cdc_raw log, another process will be draining that log. There is a max log size setting that disables writes to the table when the set threshold is reached. If a connection to the {pulsar-short} cluster is needed for the log to be drained, and it's not responsive, the log will begin to fill, which can impact a table's write availability. - -For more, see the xref:cdc-for-cassandra:ROOT:install.adoc#scaling-up-your-configuration[Scaling up your configuration] section in the official documentation. -====
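Backlog behavior is governed by namespace-level quotas. Purely as an illustration (not something CDC configures for you), here is a hypothetical sketch of setting a backlog quota with the Java admin client; the admin URL, namespace, and limits are placeholders:

[source,java]
----
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.BacklogQuota;

public class BacklogQuotaSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build()) {
            // Hold producers when the backlog exceeds ~1 GiB instead of dropping data
            admin.namespaces().setBacklogQuota("my-tenant/my-namespace", // placeholder
                    BacklogQuota.builder()
                            .limitSize(1024L * 1024 * 1024)
                            .retentionPolicy(BacklogQuota.RetentionPolicy.producer_request_hold)
                            .build());
        }
    }
}
----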
- -.Does the {cass-short} Source Connector use a dead-letter topic? -[%collapsible] -==== -A dead-letter topic is used when a message cannot be delivered to a consumer: perhaps the message acknowledgment time expired (no consumer acknowledged receipt of the message), a consumer negatively acknowledged the message, or a retry letter topic is in use and retries were exhausted. - -The {cass-short} Source Connector creates a consumer to receive new event messages from the CDC agent, but does not configure a dead-letter topic. It is assumed that parallel instances, broker compute, and function worker compute will be sized to handle the workload. -==== - -.How do I scale CDC to handle my production loads? -[%collapsible] -==== -There are three areas of scalability to focus on. The first is the hosts in the {cass-short} cluster. The CDC agent runs on each host in its own JVM. If you are administering your own {cass-short} cluster, you can tune the JVM compute properties to handle the appropriate workload. If you are using {cass-short} in a serverless environment, the JVM is already set to handle significant load. - -The second area of focus is the number of {cass-short} Source Connector instances running. This is initially set when the Source Connector is created, and can be updated throughout the life of the running connector. Depending on your {pulsar-short} configuration, an instance can represent a process thread on the broker or a function worker. If using Kubernetes, this could be a pod. Each represents a different scaling strategy, such as increasing compute, adding more workers, or adding more Kubernetes nodes. - -Finally, the third area focuses on managing the broker backlog size and throughput tolerances. There is potentially a large number of messages being created, so you must ensure the {pulsar-short} cluster is sized correctly. Our Luna Streaming xref:luna-streaming:install-upgrade:production-cluster-sizing.adoc[] guide can help you understand this better. -==== - -.How do I filter table data by column? -[%collapsible] -==== -Transformation functions are a great way to manipulate CDC data, with no code required. Put one inline to watch the data topic and write the results to a different topic with a memorable name, like "filtered-data". If you prefer to write your own filtering logic, see the sketch following this question. - -Learn more about xref:streaming-learning:functions:index.adoc[transformation functions]. -====
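If you would rather write code than use the built-in transforms, the same filtering idea can be expressed as a small custom function. A hypothetical sketch (the dropped field name is a placeholder, and a real CDC function would first deserialize the Avro KeyValue payload as shown earlier):

[source,java]
----
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Hypothetical column filter: pass JSON change events through, dropping one field.
// Placeholder logic only; production code would use a real JSON/Avro library.
public class ColumnFilterFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        return input.replaceAll("\"ssn\"\\s*:\\s*\"[^\"]*\",?", "");
    }
}
----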
- -.Multi-region CDC using the {cass-short} sink -[%collapsible] -==== -One of the requirements of CDC is that both the {cass-short} and {pulsar-short} clusters need to be in the same cloud region (or on-premises data center). If you are using geo-replication, you need the change data to be replicated across multiple clusters. The most manageable way to handle this is to use {pulsar-short}'s {cass-short} sink to "watch" the CDC data topic and write the change to a different {cass-short} table (in another Org). - -The {cass-short} sink requires the following provisions: - -- Use the CDC data topic as its source of messages -- Provide a secure bundle (credentials) to another {cass-short} cluster -- Map message values to a specific table in the other cluster -- Use {pulsar-short}'s delivery guarantee to ensure success -- Use {pulsar-short}'s connector health metrics to monitor failures -==== - -.Migrating table data using CDC -[%collapsible] -==== -Migrating data between tables solves quite a few different challenges. The basic approach is to use a {cass-short} sink to watch the source table's CDC data topic and write to another table while mapping columns appropriately. As the original table is phased out, the number of messages decreases to zero, while consumers watch the new table's CDC data topic. Refer to the "Multi-region CDC" question above for more detail. -==== \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/change-data-capture/table-schema-evolution.adoc b/modules/use-cases-architectures/pages/change-data-capture/table-schema-evolution.adoc deleted file mode 100644 index b96604a..0000000 --- a/modules/use-cases-architectures/pages/change-data-capture/table-schema-evolution.adoc +++ /dev/null @@ -1,65 +0,0 @@ -= {cass-reg} table schema evolution with CDC -:navtitle: Table schema evolution -:description: This article describes how table schema changes are handled in the {cass-reg} Connector for {pulsar-reg}. - -[NOTE] -==== -This article is a continuation of the xref:change-data-capture/index.adoc[] article. Please read that article first to understand the fundamentals of what resources are being used. -==== - -The message schema is of particular importance in completing the CDC pattern. Initially, it is set to match the {cass-short} table's schema as closely as possible, but some data types are not known in {pulsar-short} (or more accurately, not known in Avro). To overcome this, there are adaptations performed when the {cass-short} Source Connector builds the {pulsar-short} message. Some types are not compatible and cannot be adapted. In this case, those columns of data are dropped while creating the {pulsar-short} message. - -To better understand exactly how the CDC agent constructs the event message, here is pseudo code for how the schema is created:

[source,java]
----
// Pseudo code: how the schema is created
// an Avro (SchemaType.AVRO) GenericRecord holding all key fields in the Cassandra table
GenericRecord keyFields = ...;
// an org.apache.cassandra.schema.TableMetadata from converting a log entry to a mutation instance
TableMetadata tableMetadata = ...;

Schema<KeyValue<byte[], MutationValue>> keyValueSchema = Schema.KeyValue(
    Schema.BYTES,                     // the key fields GenericRecord, converted to byte[]
    Schema.AVRO(MutationValue.class), // the mutation value, set as SchemaType.AVRO
    KeyValueEncodingType.SEPARATED
);
----

- -Notice the two types used in KeyValue. The byte array is an Avro-encoded record that documents the table's primary key(s). The MutationValue is an extended Avro record that carries information about what changed and how to retrieve the specifics. - -CDC sets the initial topic schema on the first change it detects. Once the initial topic schema has been set, a “happy path” has been established to create change data events in {pulsar-short}. - -Inevitably, table design will change. Columns are added, updated, or even removed. When these changes occur, the resources that are part of the CDC flow need to be adapted to keep this happy path of event data. This can become quite a complex set of decisions; as such, there are specific changes a CDC flow can tolerate before the flow needs to be recreated entirely.
Here is a brief summary of how the data message schema is created: - -. Receive the GenericRecord event message of type KeyValue. -. Use a {cass-short} client to query for row data. -. Convert the data to a GenericRecord of type KeyValue and return it. -. The connector interface produces a new message to the destination topic. - -== Adding a table column - -This is the easiest of scenarios for table design change. Assuming the new column's data type is compatible with the source connector, a new schema will replace the existing one, and message compatibility will be kept. Note that because the schema auto-update compatibility strategy is set to BACKWARD_TRANSITIVE, the new column must be optional, which is the default of any non-primary-key column. - -An example of adding a column: - -`ALTER TABLE [keyspace_name.] table_name ADD my_super_awesome_column text;` - -== Updating a table column - -Altering a table column includes renaming a column or changing a column's type. Assuming the new column's data type is compatible with the source connector, a new schema will replace the existing schema and message compatibility will be kept. Once a table has been created, a table's primary key(s) cannot be modified. This fits well with the CDC pattern. - -While technically updating columns is possible when CDC is enabled, it is not recommended. Instead, changes to a {cass-short} table should be additive only. (If you are familiar with data migrations, this concept is the same.) To change the name or type of a table column, add a new column. The resulting event messages will have a reference to both columns, and you can handle this migration downstream. - -Note that this recommendation assumes a schema compatibility strategy of BACKWARD_TRANSITIVE. If you are using a different schema compatibility strategy, table updates will be handled differently. - -== Removing a table column - -Removing a table column is a simple command in CQL. The resulting CDC data messages will simply not include that data anymore. - -An example of removing a column: - -`ALTER TABLE [keyspace_name.] table_name DROP my_super_awesome_column;` - -== Next - -Let's move on to consuming event data in {pulsar-short}! xref:use-cases-architectures:change-data-capture/consuming-change-data.adoc[]. \ No newline at end of file