@m09526 m09526 commented Jul 9, 2025

Which issue does this PR close?

Closes #380.

Rationale for this change

This adds an ObjectStore wrapping implementation that logs all calls being made to the wrapped implementation. This is to aid in debugging. It is particularly useful when object stores are used by 3rd party code such as Apache DataFusion so the developer can determine what remote object calls the 3rd party code is making.
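As a rough illustration of the wrapping idea (not the code in this PR), the sketch below uses a hypothetical two-method `Store` trait in place of the real `ObjectStore`, a toy `MemoryStore` backend, and a hypothetical `RecordingStore` that notes each call before forwarding it. All of these names, and the message format, are assumptions made for the example; it collects messages in a `Vec` instead of emitting them through a logging or tracing crate so that it stays standard-library only.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for `ObjectStore` (assumption).
trait Store {
    fn head(&self, location: &str) -> Option<usize>;
    fn get(&self, location: &str) -> Option<Vec<u8>>;
}

/// Toy in-memory backend so the example is self-contained.
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl Store for MemoryStore {
    fn head(&self, location: &str) -> Option<usize> {
        self.objects.get(location).map(|bytes| bytes.len())
    }

    fn get(&self, location: &str) -> Option<Vec<u8>> {
        self.objects.get(location).cloned()
    }
}

/// Hypothetical wrapper: records each call, then forwards it unchanged.
struct RecordingStore<S: Store> {
    inner: S,
    prefix: String,
    calls: RefCell<Vec<String>>,
}

impl<S: Store> RecordingStore<S> {
    fn new(inner: S, prefix: &str) -> Self {
        Self {
            inner,
            prefix: prefix.to_string(),
            calls: RefCell::new(Vec::new()),
        }
    }

    fn record(&self, op: &str, location: &str) {
        self.calls
            .borrow_mut()
            .push(format!("{} {} request for {}", self.prefix, op, location));
    }
}

impl<S: Store> Store for RecordingStore<S> {
    fn head(&self, location: &str) -> Option<usize> {
        self.record("head", location);
        self.inner.head(location)
    }

    fn get(&self, location: &str) -> Option<Vec<u8>> {
        self.record("get", location);
        self.inner.get(location)
    }
}

/// Drives the wrapper the way third-party code would: through the trait only.
fn demo() -> Vec<String> {
    let mut objects = HashMap::new();
    objects.insert("data/a.parquet".to_string(), vec![1u8, 2, 3]);
    let store = RecordingStore::new(MemoryStore { objects }, "[s3]");

    assert_eq!(store.head("data/a.parquet"), Some(3));
    assert_eq!(store.get("missing"), None);

    store.calls.into_inner()
}

fn main() {
    for line in demo() {
        println!("{line}");
    }
}
```

Because the wrapper implements the same trait as the store it wraps, it can be handed to code that only knows about the trait, which is what makes the calls of third-party code visible.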

What changes are included in this PR?

TracingStore wrapper is added.

Are there any user-facing changes?

Yes, the new implementation is part of the public API.

Cargo.toml Outdated
http = "1.2.0"
humantime = "2.1"
itertools = "0.14.0"
log = "0.4.27"
Contributor

Can we use tracing instead given this crate already depends on it?

Author

@m09526 m09526 Jul 10, 2025

Thanks for the suggestion! The PR has been updated to reflect this.

@m09526 m09526 changed the title from "Add LoggingStore wrapper implementation" to "Add TracingStore wrapper implementation" Jul 10, 2025
src/trace.rs Outdated
Comment on lines 101 to 104
debug!(
"{} head request for {}/{}",
self.prefix, self.path_prefix, location
);
Member

Instead of debug level events, maybe we should use spans?

Author

I've made some changes to use spans instead.

@asubiotto
Contributor

asubiotto commented Jul 11, 2025

FWIW, https://github.com/datafusion-contrib/datafusion-tracing/tree/main/instrumented-object-store exists and seems to work well with our use of datafusion. Just wanted to point it out to reduce duplication of efforts. cc @geoffreyclaude

@geoffreyclaude

> FWIW, https://github.com/datafusion-contrib/datafusion-tracing/tree/main/instrumented-object-store exists and seems to work well with our use of datafusion. Just wanted to point it out to reduce duplication of efforts. cc @geoffreyclaude

👋 Hi!
instrumented-object-store uses the tracing crate to wrap object-store operations in spans, and is published as a standalone crate so as not to pull yet another dependency into arrow-rs-object-store or datafusion.
If you want just basic debug logging, I'd say this PR is complementary. All these wrappers are short and simple single-file code anyway, so it's probably fine to have multiple different versions!

@asubiotto
Contributor

asubiotto commented Jul 11, 2025

I'm happy with anything, but I feel like tracing is a better fit for instrumenting these kinds of requests and it's nice to have a canonical version. I would probably pull instrumented-object-store into this repo since it doesn't have anything to do with datafusion (I think?) and make it a feature of the object_store crate.

@geoffreyclaude

> I'm happy with anything, but I feel like tracing is a better fit for instrumenting these kinds of requests and it's nice to have a canonical version. I would probably pull instrumented-object-store into this repo since it doesn't have anything to do with datafusion (I think?) and make it a feature of the object_store crate.

Absolutely, that makes a lot of sense as well! Feel free to copy instrumented-object-store over here if that's what you prefer, and I'll stop publishing it once datafusion depends on an object-store version that incorporates it.

@m09526
Author

m09526 commented Jul 17, 2025

How does that version look? I've made some modifications to avoid the use of single events and moved to using spans.

Contributor

@alamb alamb left a comment

I think this looks good to me -- thank you @m09526

@geoffreyclaude and @asubiotto are you happy with this implementation as well?

@asubiotto
Contributor

AFAIU this PR reimplements functionality that already exists in instrumented-object-store (minus some missing span tags and hardcoding the debug level). I personally would prefer to see instrumented-object-store move into object_store and out of datafusion-contrib rather than have two implementations of the same thing in my dependencies but I don't care that much. Happy with whatever everyone else is happy with.

@geoffreyclaude

> I think this looks good to me -- thank you @m09526
>
> @geoffreyclaude and @asubiotto are you happy with this implementation as well?

@alamb I agree with @asubiotto that moving instrumented-object-store here as is would probably be simpler, if only because it is already validated in (our) production setup. Then if there are changes to make for better user experience or performance, we can iterate on the code.

Moreover, from a quick look at this PR it seems to be missing functionality of instrumented-object-store, e.g. tracing of operation results, and the get and put calls?

@m09526
Author

m09526 commented Aug 12, 2025

> Moreover, from a quick look at this PR it seems to be missing functionality of instrumented-object-store, e.g. tracing of operation results, and the get and put calls?

We deliberately only instrumented the non-provided trait functions since the plain versions of get and put are just wrappers around those functions.

We could add other things like tracing of operation results?

@geoffreyclaude

> the plain versions of get and put are just wrappers around those functions.

Not necessarily: you can very well have a particular implementation that overrides the Trait's get and put defaults without calling get_opts or put_opts. In which case it won't get traced?
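
To make the interaction between provided trait methods and wrappers concrete, here is a minimal standard-library-only sketch. It does not use the real `ObjectStore` trait or the code in this PR: `Store`, `OverridingStore`, `TraceRequired` and `ForwardGet` are hypothetical names, and recording into a `Vec` stands in for emitting a span. Which of the two wrapper shapes a given implementation matches depends on whether it overrides the provided method itself.

```rust
use std::cell::RefCell;

/// Hypothetical trait with one required method (`get_opts`) and one
/// provided method (`get`). Not the real `ObjectStore` trait.
trait Store {
    fn get_opts(&self, location: &str) -> String;

    /// Provided method: delegates to `get_opts` unless overridden.
    fn get(&self, location: &str) -> String {
        self.get_opts(location)
    }
}

/// An implementation that overrides `get` without calling `get_opts`.
struct OverridingStore;

impl Store for OverridingStore {
    fn get_opts(&self, location: &str) -> String {
        format!("opts:{location}")
    }

    fn get(&self, location: &str) -> String {
        format!("custom:{location}")
    }
}

/// Wrapper A: instruments only the required method, inherits the default `get`.
struct TraceRequired<S> {
    inner: S,
    seen: RefCell<Vec<String>>,
}

impl<S: Store> Store for TraceRequired<S> {
    fn get_opts(&self, location: &str) -> String {
        self.seen.borrow_mut().push(format!("get_opts {location}"));
        self.inner.get_opts(location)
    }
}

/// Wrapper B: same instrumentation, but `get` is forwarded to the inner
/// store without being recorded.
struct ForwardGet<S> {
    inner: S,
    seen: RefCell<Vec<String>>,
}

impl<S: Store> Store for ForwardGet<S> {
    fn get_opts(&self, location: &str) -> String {
        self.seen.borrow_mut().push(format!("get_opts {location}"));
        self.inner.get_opts(location)
    }

    fn get(&self, location: &str) -> String {
        self.inner.get(location)
    }
}

/// Calls `get` through wrapper A; returns the result and what was recorded.
fn run_a() -> (String, Vec<String>) {
    let w = TraceRequired { inner: OverridingStore, seen: RefCell::new(Vec::new()) };
    let out = w.get("a.parquet");
    (out, w.seen.into_inner())
}

/// Calls `get` through wrapper B; returns the result and what was recorded.
fn run_b() -> (String, Vec<String>) {
    let w = ForwardGet { inner: OverridingStore, seen: RefCell::new(Vec::new()) };
    let out = w.get("a.parquet");
    (out, w.seen.into_inner())
}

fn main() {
    println!("A: {:?}", run_a()); // ("opts:a.parquet", ["get_opts a.parquet"])
    println!("B: {:?}", run_b()); // ("custom:a.parquet", [])
}
```

In this toy model, wrapper A records the call, but the inner store's custom `get` is never reached, because the wrapper's inherited `get` routes through its own `get_opts`. Wrapper B reaches the custom `get` but records nothing. Instrumenting `get` explicitly in the wrapper, while still forwarding to `inner.get`, would avoid both effects.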

Development

Successfully merging this pull request may close these issues.

[Question] Would contribution of logging wrapper and a "readahead" wrapper for ObjectStore be wanted?

6 participants