@Dimitrionian (Collaborator)

Add Apache HBase Provider

This PR introduces a new Apache HBase Provider for Apache Airflow, enabling integration with Apache HBase, a distributed, scalable big data store built on Apache Hadoop.

Features

  • Table Operations: Create, delete, and manage HBase tables with column families
  • Data Operations: Insert, retrieve, scan, and batch operations on table data
  • Backup & Restore: Full and incremental backup operations with restore capabilities
  • Monitoring: Sensors for table existence, row counts, and column values
  • SSL/TLS Support: Secure connections with certificate validation
  • Kerberos Authentication: Enterprise authentication with keytab support
  • Connection Pooling: High-performance connection pooling with optimized batch processing
  • Performance Optimization: Configurable batch sizes and parallel execution

@Dimitrionian self-assigned this Jan 13, 2026
exists = bool(row_data)
self.log.info("Row %s in table %s exists: %s", self.row_key, self.table_name, exists)
return exists
except Exception as e:

This comment was marked as resolved.

Collaborator Author

Remove the exception handling so that exceptions propagate up to Airflow, which can handle them and retry out of the box.
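A minimal sketch of the row-existence check with the `try`/`except` removed, so that a connection or lookup error reaches the caller (in Airflow, the task runner, which applies the task's retry settings) instead of being reported as "row not found". It assumes a happybase-style table whose `row()` returns a dict of bytes column names to bytes values, empty when the row is missing; `FakeTable` and `row_exists_poke` are hypothetical stand-ins, not the provider's actual classes.

```python
import logging
from typing import Dict

log = logging.getLogger(__name__)


class FakeTable:
    """Hypothetical in-memory stand-in for a happybase-style table object."""

    def __init__(self, rows: Dict[bytes, Dict[bytes, bytes]], fail: bool = False):
        self._rows = rows
        self._fail = fail

    def row(self, row_key: bytes) -> Dict[bytes, bytes]:
        if self._fail:
            # Simulate a transient failure such as a dropped Thrift connection.
            raise ConnectionError("HBase Thrift server unreachable")
        # A missing row yields an empty dict, as assumed above.
        return self._rows.get(row_key, {})


def row_exists_poke(table: FakeTable, table_name: str, row_key: str) -> bool:
    """Hypothetical poke() body: no try/except, so errors reach the caller."""
    row_data = table.row(row_key.encode("utf-8"))
    exists = bool(row_data)
    log.info("Row %s in table %s exists: %s", row_key, table_name, exists)
    return exists
```

With the handler gone, a `ConnectionError` fails the poke outright rather than returning `False`, which would otherwise look exactly like "condition not met yet" and keep the sensor poking until its timeout.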

self.log.info("Table %s has %d rows, expected: %d", self.table_name, row_count,
self.expected_count)
return row_count == self.expected_count
except Exception as e:

This comment was marked as resolved.

Collaborator Author

Remove the exception handling so that exceptions propagate up to Airflow, which can handle them and retry out of the box.

self.column, self.row_key, self.expected_value, actual_value
)
return matches
except Exception as e:

This comment was marked as resolved.

Collaborator Author

Remove the exception handling so that exceptions propagate up to Airflow, which can handle them and retry out of the box.

}

dag = DAG(
"example_hbase_backup_simple",

This comment was marked as resolved.

Collaborator Author

This is an old version left over from development and testing. The point is that Airflow copies DAGs into the /dags folder to run them. I will remove it.

Do we need the example DAGs?
I don't like this default DAG name when there is no corresponding DAG in the file system.
Maybe I'm wrong.

Oh, you're right. I'll hide this comment.

- airflow.providers.hbase.example_dags.example_hbase
- airflow.providers.hbase.example_dags.example_hbase_ssl
- airflow.providers.hbase.example_dags.example_hbase_advanced
- airflow.providers.hbase.example_dags.example_hbase_backup_simple

This comment was marked as resolved.

Collaborator Author

This is an old version left over from development and testing. The point is that Airflow copies DAGs into the /dags folder to run them. I will remove it.


def __init__(
self,
action: str,

This comment was marked as resolved.

Collaborator Author

Added Enum logic
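The thread does not show the final code, so the following is only one way Enum-based validation of the `action` argument could look. `BackupAction`, its members, and `HypotheticalBackupOperator` are invented for illustration (the members loosely echo `hbase backup` subcommands, but that mapping is an assumption). Mixing in `str` keeps plain-string arguments working, and an invalid value fails as soon as the operator is constructed, which in Airflow happens at DAG parse time.

```python
from enum import Enum
from typing import Union


class BackupAction(str, Enum):
    """Hypothetical set of actions an HBase backup operator might accept."""

    CREATE = "create"
    RESTORE = "restore"
    DESCRIBE = "describe"
    HISTORY = "history"


class HypotheticalBackupOperator:
    """Hypothetical operator showing only the argument validation."""

    def __init__(self, action: Union[BackupAction, str]) -> None:
        # BackupAction(value) looks the member up by value; passing a member
        # returns it unchanged, and an unknown string raises ValueError.
        self.action = BackupAction(action)
```

Because `BackupAction` subclasses `str`, `BackupAction.CREATE == "create"` holds, so code that still compares against plain strings keeps working during a migration.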

return False


class HBaseRowCountSensor(BaseSensorOperator):

This comment was marked as resolved.

Collaborator Author

Warning added
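A sketch of a row-count check that logs a warning, assuming the added warning concerns cost: HBase keeps no stored row count, so counting means scanning every row of the table. `FakeScanTable` (a stand-in for a happybase-style table whose `scan()` yields `(row_key, columns)` pairs) and `row_count_poke` are hypothetical.

```python
import logging
from typing import Dict, Iterator, Tuple

log = logging.getLogger(__name__)

Row = Tuple[bytes, Dict[bytes, bytes]]


class FakeScanTable:
    """Hypothetical stand-in for a happybase-style table with scan()."""

    def __init__(self, rows: Dict[bytes, Dict[bytes, bytes]]):
        self._rows = rows

    def scan(self) -> Iterator[Row]:
        # Yield (row_key, columns) pairs in key order, like an HBase scan.
        for key in sorted(self._rows):
            yield key, self._rows[key]


def row_count_poke(table: FakeScanTable, table_name: str, expected_count: int) -> bool:
    # HBase stores no row count, so this walks every row on every poke.
    log.warning(
        "Counting rows in %s requires a full table scan; this can be slow on large tables",
        table_name,
    )
    row_count = sum(1 for _ in table.scan())
    log.info("Table %s has %d rows, expected: %d", table_name, row_count, expected_count)
    return row_count == expected_count
```

Against a real cluster, passing a key-only filter such as `"FirstKeyOnlyFilter() AND KeyOnlyFilter()"` to the scan reduces the data transferred, but each poke is still a full scan.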

self.log.info("Row %s not found in table %s", self.row_key, self.table_name)
return False

actual_value = row_data.get(self.column.encode('utf-8'), b'').decode('utf-8')

This comment was marked as resolved.

Collaborator Author

Added a direct bytes comparison and a test.
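One way such a direct bytes comparison could look: encode the expected value once and compare it with the raw cell, instead of decoding the cell as in the original `.decode('utf-8')` line. The helper name and the choice to treat a missing column as a non-match are assumptions, not taken from the PR.

```python
from typing import Dict, Optional, Union


def column_value_matches(
    row_data: Dict[bytes, bytes], column: str, expected_value: Union[str, bytes]
) -> bool:
    """Hypothetical helper: compare raw cell bytes instead of decoding them."""
    expected = (
        expected_value.encode("utf-8") if isinstance(expected_value, str) else expected_value
    )
    # A missing column gives None rather than b"", so it never matches
    # an expected empty string by accident.
    actual: Optional[bytes] = row_data.get(column.encode("utf-8"))
    return actual is not None and actual == expected
```

Compared with decoding, this neither raises `UnicodeDecodeError` on binary cells (for example `b"\xff\xfe"`) nor lets a missing column compare equal to an expected empty string.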

name: Standard
description: Empty standard provider
state: ready
source-date-epoch: 1734000000

This comment was marked as resolved.

Collaborator Author

It seems I copied it from another provider, but it doesn't matter: it's managed by the release manager.

`Apache HBase <https://hbase.apache.org/>`__

state: ready
source-date-epoch: 1734000000

This comment was marked as resolved.

Collaborator Author

It seems I copied it from another provider, but it doesn't matter: it's managed by the release manager. Nonetheless, I changed it to an actual date for the HBase provider.
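`source-date-epoch` is a Unix timestamp in seconds (UTC), presumably fed to reproducible package builds in the same way as the `SOURCE_DATE_EPOCH` convention. A small sketch for deriving a value from a calendar date and for checking what an existing value means; the helper names are hypothetical. For reference, the copied value 1734000000 corresponds to 2024-12-12 10:40:00 UTC.

```python
from datetime import datetime, timezone


def to_source_date_epoch(dt: datetime) -> int:
    """Seconds since 1970-01-01T00:00:00Z for a timezone-aware datetime."""
    if dt.tzinfo is None:
        # A naive datetime would be interpreted in local time and drift.
        raise ValueError("use a timezone-aware datetime")
    return int(dt.timestamp())


def from_source_date_epoch(epoch: int) -> datetime:
    """Inverse: turn an epoch value back into a UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)
```

For example, `to_source_date_epoch(datetime(2024, 12, 12, 10, 40, tzinfo=timezone.utc))` returns 1734000000.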

3 participants