Skip to content

Conversation

@teaguesterling
Copy link

No description provided.

teaguesterling and others added 3 commits December 7, 2025 10:39
When using RDKit from spack with vcpkg toolchain, the Boost::system,
Boost::serialization, and Boost::iostreams targets must be explicitly
created before find_package(RDKit). vcpkg's BoostConfig only creates
targets for components explicitly requested in find_package(Boost).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add is_substruct tests to mol_search.test covering substructure
  searching with benzene, pyridine, and functional group patterns
- Add edge_cases.test with tests for NULL handling, invalid SMILES,
  stereochemistry, complex molecules (taxol), charged atoms, isotopes,
  and ring systems
- Document that is_exact_match treats stereoisomers as equivalent
- Add notes/ directory with RDKit vcpkg port documentation for future

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace ExtensionUtil with ExtensionLoader for all registration calls
- Update Extension::Load signature to use ExtensionLoader& instead of DuckDB&
- Use DUCKDB_CPP_EXTENSION_ENTRY macro for loadable extension entry point
- Update multi_file includes to new duckdb/common/multi_file/ path
- Convert between vector<string> and vector<OpenFileInfo> for file lists
- Update duckdb and extension-ci-tools submodules to v1.4

All 8 test cases pass (232 assertions).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Owner

@bodowd bodowd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @teaguesterling

I liked the additional tests and adapting for the new multi_file_list.hpp was very helpful.

I didn't understand the vcpkg related changes. Could you help me understand those better? Does it make building the extension easier? Or maybe it's useful for something else?


## Summary

RDKit is not available in vcpkg. Creating a C++ only port is feasible.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help me understand what this markdown file is for?


The project uses:
- **vcpkg toolchain** for DuckDB extension build system
- **Spack-installed RDKit** at `/mnt/aux-data/teague/Dev/spack/var/spack/environments/duckdb/.spack-env/view/`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path seems to be specific to your system. Is this intended? Also in other portions of this file, for example line 34

DUCKDB_EXTENSION_API void duckdb_rdkit_init(duckdb::DatabaseInstance &db) {
duckdb::DuckDB db_wrapper(db);
db_wrapper.LoadExtension<duckdb::DuckdbRdkitExtension>();
DUCKDB_CPP_EXTENSION_ENTRY(duckdb_rdkit, loader) {
Copy link
Owner

@bodowd bodowd Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the parquet extension uses this macro #ifdef DUCKDB_BUILD_LOADABLE_EXTENSION , but the extension template does not.

I'm not familiar with why to use one or the other. Is there an advantage of one approach over the other?

Comment on lines +139 to +149
//! get the files - convert string paths to OpenFileInfo
vector<OpenFileInfo> file_infos;
for (auto &file : bind_data->files) {
file_infos.emplace_back(file);
}
SimpleMultiFileList file_list(std::move(file_infos));
auto all_files = file_list.GetAllFiles();
bind_data->files.clear();
for (auto &file_info : all_files) {
bind_data->files.push_back(file_info.path);
}
Copy link
Owner

@bodowd bodowd Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No change requested, just making a note to remind when looking at this in the futur)

SimpleMultiFileList::GetAllFiles() from duckdb v1.4 now returns a vector of OpenFileInfo. bind_data is expecting vector<string> still. In the future, perhaps possible to change bind_data to be vector<OpenFileInfo> and pull out the path to the file when needed. It might allow this part of the code to be simpler

@teaguesterling
Copy link
Author

teaguesterling commented Dec 9, 2025

Hey! I should have waited a bit longer to open this. I was trying to get it to build with the standard duckdb extension tool chain and getting boost into vcpkg was a first step there. I'm currently getting it to also pull rdkit as a git submodule so it could be added to the overall extension ci tool chain.

I accidentally opened this pr to your repo instead of my fork.

@bodowd
Copy link
Owner

bodowd commented Dec 9, 2025

Hey! I should have waited a bit longer to open this. I was trying to get it to build with the standard duckdb extension tool chain and getting boost into vcpkg was a first step there. I'm currently getting it to also pull rdkit as a git submodule so it could be added to the overall extension ci tool chain.

I accidentally opened this pr to your repo instead of my fork.

Are the changes for bumping to duckdb v1.4, like the changes involved with #include "duckdb/common/multi_file/multi_file_reader.hpp" , independent from the vcpkg work?

If they are independent, perhaps the PR can be split.
Also the additional tests seem independent from the vcpkg work and could be a different PR from this current one as well.

Then we could merge the parts that are independent from the vcpkg work first while you are figuring the vcpkg issues out. What do you think?

@teaguesterling
Copy link
Author

teaguesterling commented Dec 9, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants