Add ADBC driver support for Arrow Flight SQL #856

Closed

@vishwamartur commented Nov 19, 2024

Related to #276

Add support for ADBC (Arrow Database Connectivity) driver for Arrow Flight SQL.

  • ADBC Driver Implementation
    • Add src/Processors/Formats/Impl/ADBCDriver.cpp to implement the ADBC driver for Arrow Flight SQL.
    • Add src/Processors/Formats/Impl/ADBCDriver.h to declare the ADBC driver class and its methods.
  • Configuration
    • Modify src/configure_config.cmake to include ADBC driver support and necessary libraries.
  • Documentation
    • Update README.md to include information about ADBC driver support, examples, and usage instructions.
  • Testing
    • Add tests/ADBCDriverTest.cpp to implement tests for the ADBC driver, including connection and query execution tests.

/claim #276

@CLAassistant commented Nov 19, 2024

CLA assistant check
All committers have signed the CLA.

algora-pbc bot commented Nov 19, 2024

💵 To receive payouts, sign up on Algora, link your Github account and connect with Stripe.

@jovezhong
Contributor

Thanks for the PR. We will be reviewing it shortly.

@jovezhong
Contributor

Hi @vishwamartur,

Thanks for the PR. I am checking with our engineering team to see who is the best person to look into the implementation details. Here is what I am expecting:

  • Ability to send SQL queries/DDL to Timeplus from all languages ADBC supports. According to https://arrow.apache.org/adbc/current/index.html, these are C, C++, Go, Java, Python, and R. You updated the README with a C++ example that uses gRPC to Arrow Flight SQL. Can you also work on the ADBC drivers in other languages?
  • For large result sets, ADBC should perform the same as or better than JDBC (row-based).
  • It would be great to support streaming SQL with ADBC.

I will arrange some blog/video around the ADBC/Arrow support when the PR is merged.

Hope it makes sense, and feel free to let us know your thoughts.

@vishwamartur
Author

Hi @jovezhong,

Thank you for the detailed feedback and suggestions!

To start, we’d like to focus on fully implementing and stabilizing the ADBC driver support in C++. Once the C++ implementation is complete and meets the required performance and functionality benchmarks (e.g., large result set handling, streaming SQL), we can then plan to extend support to other languages like Go, Java, Python, and R.

This phased approach will allow us to ensure a solid foundation before expanding to other ecosystems. Let me know if this sounds good, or if you have any immediate priorities that require parallel development in other languages.

Thanks!
@vishwamartur

@jovezhong
Contributor

Sounds good. Let's make the C++ driver the first feature-complete ADBC driver, then expand to more languages. In priority order, from high to low: C++ > Java > Python > Go. You don't need to work on an R adapter. Ideally, we contribute the ADBC driver for Timeplus, similar to https://arrow.apache.org/adbc/current/driver/postgresql.html

@zeroshade

Looking at this, it doesn't appear to have much to do with ADBC in anything but name. Does Timeplus already support Arrow FlightSQL? If so, then nothing needs to be done, as all of the ADBC bindings would be able to use the FlightSQL driver to connect and query data from any one of multiple languages (Go, C++, C, Python, R, Rust, Java, etc.).

If Timeplus doesn't already support FlightSQL, then you need to implement the ADBC C interface to create a driver, ideally as a shared object library that can be separately distributed as a client rather than built into Timeplus directly. I can help with that if needed.

@jovezhong
Contributor

Thanks Matt for the comment. Today the Timeplus Proton server does not have FlightSQL built in. I'll leave further discussion to you and @vishwamartur.

To be clear, we want ADBC support more than FlightSQL.

@zeroshade

I just want to clarify: @vishwamartur, is the goal here to have an ADBC driver to connect to Timeplus with? Or for Timeplus to connect to other sources via ADBC? That will affect what is expected to be implemented here.

@zeroshade commented Nov 20, 2024

@jovezhong just to be clear, if Timeplus exposes a Flight SQL server for connectivity, you would get ADBC support for free via the Flight SQL ADBC (and ODBC/JDBC) driver that already exists.

That said, I believe you already are built on ClickHouse, so it shouldn't be too difficult to create an ADBC driver which can use the ClickHouse protocol for connecting and retrieving Arrow formatted data, right?
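
To make the "ADBC support for free" point concrete, here is a minimal Python sketch of how a client might query a Flight SQL endpoint through the existing ADBC Flight SQL driver. It assumes the `adbc-driver-flightsql` and `pyarrow` packages are installed. The function name `query_flight_sql` and the URI in the usage comment are illustrative placeholders, and Proton does not expose a Flight SQL server today, so treat this as a picture of the client side only.

```python
def query_flight_sql(uri: str, sql: str):
    """Run `sql` against a Flight SQL server and return a pyarrow.Table."""
    # Imported lazily so this module still loads without the optional packages.
    import adbc_driver_flightsql.dbapi as flight_sql

    with flight_sql.connect(uri) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            # Results arrive as Arrow columnar data rather than row by row.
            return cur.fetch_arrow_table()


# Hypothetical usage; host and port are placeholders, not a real Proton endpoint:
# table = query_flight_sql("grpc://localhost:50051", "SELECT 1")
```

No Timeplus-specific client code appears here: the server side would carry all of the integration work.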

@vishwamartur
Author

Hi @zeroshade,

Thanks for the clarification! The goal is to create an ADBC driver for clients to connect to Timeplus. Leveraging the ClickHouse protocol to retrieve Arrow-formatted data makes sense, given our architecture.

If you have any specific suggestions for implementing the ADBC C interface or designing the driver as a shared library, I’d greatly appreciate it.

Looking forward to your thoughts!

Best,
Vishwa

@zliang-min
Collaborator

@vishwamartur I might have missed something, but looking at the PR, I don't see how this lets someone create an ADBC driver to connect to Timeplus Proton. Could you help me understand how this works, please?

@zliang-min
Collaborator

zliang-min commented Nov 20, 2024

> @jovezhong just to be clear, if Timeplus exposes a Flight SQL server for connectivity, you would get ADBC support for free via the Flight SQL ADBC (and ODBC/JDBC) driver that already exists.
>
> That said, I believe you already are built on ClickHouse, so it shouldn't be too difficult to create an ADBC driver which can use the ClickHouse protocol for connecting and retrieving Arrow formatted data, right?

@zeroshade yes, Proton also has Arrow format support as ClickHouse does, but there are gaps, as the implementation is not up to date with the ClickHouse repo at the moment. This may or may not affect implementing an ADBC driver (I don't know much about implementing one). I also don't know whether the ADBC interface already supports streaming. Since Proton is a streaming data engine, this is one thing to pay attention to when implementing a database driver for it.

@vishwamartur
Author

vishwamartur commented Nov 20, 2024

[image attached]

@zeroshade, could you please suggest any changes?

@zliang-min, if I’m mistaken, I would appreciate your guidance and suggestions for improvements. I’ll do my best to implement them.

@zliang-min
Collaborator

@vishwamartur to achieve the goal of connecting to Timeplus Proton via an ADBC driver, there are two options:

  • Create a Timeplus Proton-specific ADBC driver that understands how to talk to Proton to execute queries. Such a driver can live in its own repo, and it does not have to be implemented in C++ (for example, you can use Go and the Proton Go driver to implement a Go ADBC driver for Proton).
  • Or, as @zeroshade mentioned, implement a Flight SQL server in Proton, so that people can use any existing ADBC driver to connect to Proton and run queries.

The second option allows maximum availability and makes it easier to integrate with the existing ecosystem. The first option is probably easier, but it has big limitations (it limits which languages can be used, and it's hard to utilize what is already there in the ecosystem).

Hopefully this makes sense.

@zeroshade

> The second option allows maximum availability and makes it easier to integrate with the existing ecosystem. The first option is probably easier, but it has big limitations (it limits which languages can be used, and it's hard to utilize what is already there in the ecosystem).

It actually doesn't limit the languages as much as you'd expect. For example, the current ADBC FlightSQL driver is implemented in Go and distributed as a C shared object that can be loaded by ADBC driver managers. If you implement the Go ADBC Interface, then it's a simple case to use the existing SDK to create a distributable driver that can be easily loaded by any ADBC driver manager.
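
As a rough illustration of loading such a shared-object driver through a driver manager, the Python sketch below uses `adbc_driver_manager.dbapi.connect` with a `driver` path. It assumes the `adbc-driver-manager` and `pyarrow` packages are installed. The function name `stream_batches`, the library name in the usage comment, and the `"uri"` option key are hypothetical, since no Timeplus ADBC driver exists in this thread and each driver defines its own options.

```python
from typing import Iterator


def stream_batches(driver_path: str, uri: str, sql: str) -> Iterator:
    """Yield Arrow record batches from a driver loaded by the ADBC driver manager."""
    # Imported lazily so this module still loads without the optional packages.
    import adbc_driver_manager.dbapi as adbc

    # `driver` names or points to a shared library that exports the ADBC C API.
    # The "uri" option key is an assumption; a real driver documents its own keys.
    with adbc.connect(driver=driver_path, db_kwargs={"uri": uri}) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            # fetch_record_batch() returns a pyarrow.RecordBatchReader,
            # so batches can be consumed incrementally instead of all at once.
            for batch in cur.fetch_record_batch():
                yield batch


# Hypothetical usage; the library name and URI are placeholders:
# for batch in stream_batches("libadbc_driver_timeplus.so", "localhost:8463", "SELECT 1"):
#     print(batch.num_rows)
```

Reading batch by batch is what a client would likely do for large result sets. Whether an unbounded streaming query behaves well through this path depends on the driver and is not established here.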

> @zeroshade, could you please suggest any changes?

I would argue that ADBC Driver belongs in the same box as SDK, JDBC/ODBC and Data/BI Connectors. An ADBC driver is just another driver, similar in concept to a JDBC or ODBC driver (but columnar and Arrow-native instead of row-oriented).

@vishwamartur
Author

I’ve made the changes in this pull request. Could you please review them and share your suggestions? I’m happy to make any necessary updates.
