-
Notifications
You must be signed in to change notification settings - Fork 3
Maint/upgrade to duckdb v1.4 #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Maint/upgrade to duckdb v1.4 #27
Conversation
When using RDKit from spack with vcpkg toolchain, the Boost::system, Boost::serialization, and Boost::iostreams targets must be explicitly created before find_package(RDKit). vcpkg's BoostConfig only creates targets for components explicitly requested in find_package(Boost). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add is_substruct tests to mol_search.test covering substructure searching with benzene, pyridine, and functional group patterns - Add edge_cases.test with tests for NULL handling, invalid SMILES, stereochemistry, complex molecules (taxol), charged atoms, isotopes, and ring systems - Document that is_exact_match treats stereoisomers as equivalent - Add notes/ directory with RDKit vcpkg port documentation for future 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Replace ExtensionUtil with ExtensionLoader for all registration calls - Update Extension::Load signature to use ExtensionLoader& instead of DuckDB& - Use DUCKDB_CPP_EXTENSION_ENTRY macro for loadable extension entry point - Update multi_file includes to new duckdb/common/multi_file/ path - Convert between vector<string> and vector<OpenFileInfo> for file lists - Update duckdb and extension-ci-tools submodules to v1.4 All 8 test cases pass (232 assertions). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
bodowd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @teaguesterling
I liked the additional tests and adapting for the new multi_file_list.hpp was very helpful.
I didn't understand the vcpkg related changes. Could you help me understand those better? Does it make building the extension easier? Or maybe it's useful for something else?
|
|
||
| ## Summary | ||
|
|
||
| RDKit is not available in vcpkg. Creating a C++ only port is feasible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help me understand what this markdown file is for?
|
|
||
| The project uses: | ||
| - **vcpkg toolchain** for DuckDB extension build system | ||
| - **Spack-installed RDKit** at `/mnt/aux-data/teague/Dev/spack/var/spack/environments/duckdb/.spack-env/view/` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This path seems to be specific to your system. Is this intended? Also in other portions of this file, for example line 34
| DUCKDB_EXTENSION_API void duckdb_rdkit_init(duckdb::DatabaseInstance &db) { | ||
| duckdb::DuckDB db_wrapper(db); | ||
| db_wrapper.LoadExtension<duckdb::DuckdbRdkitExtension>(); | ||
| DUCKDB_CPP_EXTENSION_ENTRY(duckdb_rdkit, loader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed the parquet extension uses this macro #ifdef DUCKDB_BUILD_LOADABLE_EXTENSION , but the extension template does not.
I'm not familiar with why to use one or the other. Is there an advantage of one approach over the other?
| //! get the files - convert string paths to OpenFileInfo | ||
| vector<OpenFileInfo> file_infos; | ||
| for (auto &file : bind_data->files) { | ||
| file_infos.emplace_back(file); | ||
| } | ||
| SimpleMultiFileList file_list(std::move(file_infos)); | ||
| auto all_files = file_list.GetAllFiles(); | ||
| bind_data->files.clear(); | ||
| for (auto &file_info : all_files) { | ||
| bind_data->files.push_back(file_info.path); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(No change requested, just making a note to remind when looking at this in the futur)
SimpleMultiFileList::GetAllFiles() from duckdb v1.4 now returns a vector of OpenFileInfo. bind_data is expecting vector<string> still. In the future, perhaps possible to change bind_data to be vector<OpenFileInfo> and pull out the path to the file when needed. It might allow this part of the code to be simpler
|
Hey! I should have waited a bit longer to open this. I was trying to get it to build with the standard duckdb extension tool chain and getting boost into vcpkg was a first step there. I'm currently getting it to also pull rdkit as a git submodule so it could be added to the overall extension ci tool chain. I accidentally opened this pr to your repo instead of my fork. |
Are the changes for bumping to duckdb v1.4, like the changes involved with If they are independent, perhaps the PR can be split. Then we could merge the parts that are independent from the vcpkg work first while you are figuring the vcpkg issues out. What do you think? |
|
I'm happy to split this up! Will open new PRs with each feature.
…On Tue, Dec 9, 2025, 13:50 bodowd ***@***.***> wrote:
*bodowd* left a comment (bodowd/duckdb_rdkit#27)
<#27 (comment)>
Hey! I should have waited a bit longer to open this. I was trying to get
it to build with the standard duckdb extension tool chain and getting boost
into vcpkg was a first step there. I'm currently getting it to also pull
rdkit as a git submodule so it could be added to the overall extension ci
tool chain.
I accidentally opened this pr to your repo instead of my fork.
Are the changes for bumping to duckdb v1.4, like the changes involved with #include
"duckdb/common/multi_file/multi_file_reader.hpp" , independent from the
vcpkg work?
If they are independent, perhaps the PR can be split.
Also the additional tests seem independent from the vcpkg work and could
be a different PR from this current one as well.
Then we could merge the parts that are independent from the vcpkg work
first while you are figuring the vcpkg issues out. What do you think?
—
Reply to this email directly, view it on GitHub
<#27 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAARLWPKUTNARTAAOGTPTWD4A473NAVCNFSM6AAAAACOKECPNCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTMMZUGQZTAOBYGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
No description provided.