Deduplicate PhysicalExpr on proto ser/de using Arc pointer addresses #18192

adriangb · 2025-10-21T03:33:13Z

This stemmed from wanting to deduplicate DynamicFilterPhysicalExpr, which is essential for dynamic filters to have a chance at working.

I started by thinking we should add an id field to DynamicFilterPhysicalExpr specifically. But then I had the thought: what if I used the pointer address of the Arc as the id? This has several advantages:

Contains all of the changes to the protobuf ser/de code, no changes necessary to expressions themselves.
Works for all expressions: it's very common to have duplicates in a plan (that's the whole reason they're Arc'ed!), and we currently use several times the memory when deserializing, say, an InList expression, because we make a deep copy for each Arc'ed reference.
Is very cheap to compute: I considered using Hash but not only is it a lot more code to derive it everywhere, it also requires that everything is hashable and could be very expensive to compute (again, imagine a huge InList expression).
Has the potential to reduce serialization cost: I did not make the change to keep the diff small so we still serialize duplicate data that we don't use. In theory we could keep some sort of seen: HashSet<usize> on the serialization side and only serialize each expression once.
If this is somehow buggy or fails, that's okay: the failure mode is the current behavior of duplicating expressions. I don't think there's any chance of some sort of collision (maybe another advantage over hashes?).
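The `seen: HashSet<usize>` idea from the list above could look roughly like the following. This is only a sketch: `Encoded`, `encode_all`, and the use of `Arc<String>` as a stand-in for an expression are hypothetical and not part of this PR or of DataFusion.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical wire form: a full payload, or a back-reference to an id already sent.
#[derive(Debug, PartialEq)]
pub enum Encoded {
    Full { id: usize, payload: String },
    Ref { id: usize },
}

/// Encode shared values, emitting the payload only the first time an `Arc` is seen.
/// `Arc<String>` stands in for `Arc<dyn PhysicalExpr>` (assumption for illustration).
pub fn encode_all(exprs: &[Arc<String>]) -> Vec<Encoded> {
    let mut seen: HashSet<usize> = HashSet::new();
    exprs
        .iter()
        .map(|e| {
            // The address is used purely as a number, never dereferenced.
            let id = Arc::as_ptr(e) as usize;
            // `insert` returns true only for the first occurrence of this address.
            if seen.insert(id) {
                Encoded::Full { id, payload: (**e).clone() }
            } else {
                Encoded::Ref { id }
            }
        })
        .collect()
}
```

With this shape the decoder would need to resolve `Ref` entries against what it has already decoded, which is the same cache this PR introduces on the deserialization side.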

From a high level point of view this also makes sense: if it's the same Arc on one end it should be on the other end, and this is just a mechanism to achieve that.

Use of pointers as numbers is... a bit scary, but this usage seems safe to me:

We are not accessing any memory or otherwise using the address as a pointer, we just use it as a read only id.
There is no unsafe code; these are all public, well-documented APIs.
Maybe there's some issue lurking with threads? Not sure but in any case the ser/de code should all be single threaded, non asynchronous code.
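For reference, deriving such an id needs only safe standard-library calls. A minimal sketch, where the `Expr` trait, `Column`, and `expr_id` are hypothetical stand-ins rather than the code in this PR:

```rust
use std::sync::Arc;

/// Stand-in for DataFusion's `PhysicalExpr` trait (hypothetical, illustration only).
pub trait Expr {
    fn name(&self) -> String;
}

pub struct Column(pub String);

impl Expr for Column {
    fn name(&self) -> String {
        self.0.clone()
    }
}

/// Derive an id from the address of the allocation behind the `Arc`.
/// The address is treated as an opaque number and is never dereferenced.
pub fn expr_id(expr: &Arc<dyn Expr>) -> u64 {
    // `Arc::as_ptr` yields a fat pointer for trait objects; casting to
    // `*const ()` drops the vtable part and keeps only the data address.
    Arc::as_ptr(expr) as *const () as usize as u64
}
```

Clones of the same `Arc` yield the same id, while two separately allocated expressions that are alive at the same time yield different ids, even if their contents are equal.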

98% of the diff is the change to introduce DecodeContext. I felt that was cleaner than trying to use some part of RuntimeEnv or something - that is really a different concern. But I'm open to ideas.

datafusion/proto/src/physical_plan/mod.rs

gabotechs · 2025-10-21T05:51:01Z

datafusion/proto/src/physical_plan/from_proto.rs

+    // Check cache first if an ID is present
+    if let Some(id) = proto.id {
+        if let Some(cached) = decode_ctx.get_cached_expr(id) {
+            return Ok(cached);
+        }
+    }
+
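To make the lookup in the diff above concrete, here is a minimal sketch of what a cache behind `get_cached_expr` might look like. `ExprCache`, its `decode` method, and the use of `Arc<String>` in place of a decoded expression are assumptions for illustration; the actual `DecodeContext` in the PR may be structured differently.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical per-plan decode cache keyed by the id carried in the proto message.
#[derive(Default)]
pub struct ExprCache {
    exprs: Mutex<HashMap<u64, Arc<String>>>,
}

impl ExprCache {
    /// Return the cached value for `id`, if any.
    pub fn get_cached_expr(&self, id: u64) -> Option<Arc<String>> {
        self.exprs.lock().unwrap().get(&id).cloned()
    }

    /// Decode with deduplication: messages sharing an id resolve to one `Arc`.
    pub fn decode(&self, id: Option<u64>, payload: &str) -> Arc<String> {
        match id {
            // No id on the message: always build a fresh value.
            None => Arc::new(payload.to_string()),
            // Look up or insert while holding the lock once, via the entry API.
            Some(id) => {
                let mut map = self.exprs.lock().unwrap();
                let entry = map
                    .entry(id)
                    .or_insert_with(|| Arc::new(payload.to_string()));
                entry.clone()
            }
        }
    }
}
```

Two messages carrying the same id come back as the same `Arc`, while a message without an id is always decoded into a fresh allocation.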


This looks like a very elegant solution!

It does solve the problem at hand, but I wonder how this could be extended to the case where the producer part of a dynamic filter is deserialized on one machine and the consumer part is deserialized on a different machine, which is almost always going to be the case in a distributed context.

For example, the idea that powered gabotechs#7, is that users can subscribe to changes to the dynamic filters in the global registry, and send/produce updates over the wire, something like:

let ctx = SessionContext::new();
let registry = ctx
    .task_ctx()
    .session_config()
    .get_extension::<DynamicFiltersRegistry>();
registry.subscribe_to_updates();
registry.push_updates();
let plan: Arc<dyn ExecutionPlan>; // <- plan with dynamic filters
execute_stream(plan, ctx.task_ctx());

I would really love to see something like this happen without resorting to a "global place" for reading/writing updates to dynamic filters, but I cannot come up with other ideas.

Do you think there's a chance we can do that with something simpler like what this PR proposes?

🤔 your idea for having callbacks in #17370 could be a good alternative, although I wonder how those can be set from the outside in an arbitrary Arc<dyn ExecutionPlan>.

If we had a way of getting all the dynamic filter expressions in a plan (fn(plan: &Arc<dyn ExecutionPlan>) -> Vec<&DynamicPhysicalExpr>) this would indeed be a very nice approach.

I was thinking that we set the callbacks when we deserialize in a custom codec or something. We might have to add a function to the codec trait along the lines of visit_physical_expr(&self, expr: Arc<dyn PhysicalExpr>) -> Arc<dyn PhysicalExpr>

Note that it can happen that under certain circumstances the producer part of a dynamic filter is never serialized/deserialized, as it might never get sent over the wire, but the consumer part does. I imagine in this scenario we will be left with an un-connected dynamic filter.

I guess we can inject the hooks when we serialize as well?

But it will not get serialized at all: any dynamic filter present above the first network boundary (reading from top to bottom) will never undergo any serialization or deserialization.

Ah I see your point now. So we still need some way to get a reference to all Arc<dyn PhysicalExpr> outside of serialization 🤔

I also played with the option of adding an expressions(&self) -> Vec<&dyn PhysicalExpr> or something similar to the ExecutionPlan trait, like the children() method, but it gets a bit messy as the relationship between expressions and an ExecutionPlan is a bit different.

It was not too bad though, maybe that approach could be revisited.

I think we may just need to go in that direction. Since it wouldn't be used for serialization it's okay if it's not implemented everywhere. As long as we implement it on the key nodes we know have dynamic filters it should work.
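A rough sketch of that direction, using toy traits: `Plan`, `Expr`, `Node`, and `collect_dynamic_filters` are all hypothetical stand-ins, not DataFusion's `ExecutionPlan` or `PhysicalExpr`. The default `expressions()` is empty, so only the key nodes need to override it.

```rust
use std::sync::Arc;

/// Toy stand-in for `PhysicalExpr` (hypothetical).
pub trait Expr {
    fn is_dynamic_filter(&self) -> bool {
        false
    }
}

/// Toy stand-in for `ExecutionPlan` (hypothetical).
pub trait Plan {
    fn children(&self) -> Vec<&Arc<dyn Plan>>;
    /// Opt-in: nodes without interesting expressions keep the empty default.
    fn expressions(&self) -> Vec<Arc<dyn Expr>> {
        Vec::new()
    }
}

pub struct DynamicFilter;
impl Expr for DynamicFilter {
    fn is_dynamic_filter(&self) -> bool {
        true
    }
}

pub struct Literal;
impl Expr for Literal {}

pub struct Node {
    pub exprs: Vec<Arc<dyn Expr>>,
    pub inputs: Vec<Arc<dyn Plan>>,
}

impl Plan for Node {
    fn children(&self) -> Vec<&Arc<dyn Plan>> {
        self.inputs.iter().collect()
    }
    fn expressions(&self) -> Vec<Arc<dyn Expr>> {
        self.exprs.clone()
    }
}

/// Walk the plan tree top-down and collect every dynamic filter expression.
pub fn collect_dynamic_filters(plan: &Arc<dyn Plan>) -> Vec<Arc<dyn Expr>> {
    let mut found: Vec<Arc<dyn Expr>> = plan
        .expressions()
        .into_iter()
        .filter(|e| e.is_dynamic_filter())
        .collect();
    for child in plan.children() {
        found.extend(collect_dynamic_filters(child));
    }
    found
}
```

Because the walk does not depend on serialization, it would also reach filters that sit above the first network boundary.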

gabotechs · 2025-10-21T06:29:19Z

datafusion/proto/proto/datafusion.proto

  }
+
+  // Optional ID for caching during deserialization.
+  // Set to the Arc pointer address during serialization to enable deduplication.


🤔 I'm not sure if this addresses the fact that serialization will most likely happen on one machine, but deserialization would happen in a different one.

What this accomplishes is that if you deserialize both a DataSourceExec and TopK execution node on the same machine at least those two are connected.

Connecting them between machines needs more work, e.g. #18192 (comment)

Makes sense. It still feels a bit weird to use a pointer address as an identifier when in most cases serialization and deserialization will happen on different machines, but I have to agree it gets the job done.

The thing is we're not really using the addresses as pointers, just as a way to identify which two expressions are the same expression.

Maybe we can change the documentation to something like:

// Optional ID for caching during deserialization. This is used for deduplication,
// so PhysicalExprs with the same ID will be deserialized as Arcs pointing to the
// same address (instead of distinct addresses) on the deserializing machine.
//
// We use the Arc pointer address during serialization as the ID, as this by default
// indicates if a PhysicalExpr is identical to another on the serializing machine.

Very nice suggestion, I'll commit it tomorrow :)

adriangb · 2025-10-21T15:24:45Z

@Jefffrey any chance you could give some input on this change?

Jefffrey

I think this makes sense to me. Main concern (other than the pointer bits) is the introduction of DecodeContext; I guess it wasn't easily possible to do via codec (I think you mentioned this already)?

Jefffrey · 2025-10-22T01:23:59Z

datafusion/proto/proto/datafusion.proto

  }
+
+  // Optional ID for caching during deserialization.
+  // Set to the Arc pointer address during serialization to enable deduplication.


Maybe we can change the documentation to something like:

// Optional ID for caching during deserialization. This is used for deduplication,
// so PhysicalExprs with the same ID will be deserialized as Arcs pointing to the
// same address (instead of distinct addresses) on the deserializing machine.
//
// We use the Arc pointer address during serialization as the ID, as this by default
// indicates if a PhysicalExpr is identical to another on the serializing machine.

adriangb · 2025-10-22T16:16:54Z

I think this makes sense to me. Main concern (other than the pointer bits) is the introduction of DecodeContext; I guess it wasn't easily possible to do via codec (I think you mentioned this already)?

I don't see a better way to add the mutable context necessary for this to work.
One olive branch I can extend to ease backwards compatibility is to impl From<&'a TaskContext> for DecodeContext<'a> and make DecodeContext<'a> Clone and then change the public signature of the functions that now require a DecodeContext to accept impl Into<DecodeContext<'a>>. But that is quite a bit of added complexity, personally I don't think it's worth it but I can implement that if reviewers feel it is required.
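A minimal sketch of that compatibility shim, with toy types: this `TaskContext`, the fields of `DecodeContext`, and `decode_session_id` are hypothetical and only illustrate the shape of the signatures.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Toy stand-in for DataFusion's `TaskContext` (hypothetical).
pub struct TaskContext {
    pub session_id: String,
}

/// Hypothetical decode context: borrows the task context and owns a per-plan cache.
pub struct DecodeContext<'a> {
    pub task_ctx: &'a TaskContext,
    pub cache: RefCell<HashMap<u64, String>>,
}

impl<'a> From<&'a TaskContext> for DecodeContext<'a> {
    fn from(task_ctx: &'a TaskContext) -> Self {
        Self {
            task_ctx,
            cache: RefCell::new(HashMap::new()),
        }
    }
}

/// Accepts either a `&TaskContext` (existing call sites) or a ready-made `DecodeContext`.
pub fn decode_session_id<'a>(ctx: impl Into<DecodeContext<'a>>) -> String {
    let ctx: DecodeContext<'a> = ctx.into();
    ctx.task_ctx.session_id.clone()
}
```

Note that every call made with a plain `&TaskContext` builds a fresh, empty cache, so deduplication would only span whatever a single call decodes.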

adriangb · 2025-10-22T16:32:30Z

I think this makes sense to me. Main concern (other than the pointer bits) is the introduction of DecodeContext; I guess it wasn't easily possible to do via codec (I think you mentioned this already)?

I don't see a better way to add the mutable context necessary for this to work. One olive branch I can extend to ease backwards compatibility is to impl From<&'a TaskContext> for DecodeContext<'a> and make DecodeContext<'a> Clone and then change the public signature of the functions that now require a DecodeContext to accept impl Into<DecodeContext<'a>>. But that is quite a bit of added complexity, personally I don't think it's worth it but I can implement that if reviewers feel it is required.

Something like this: https://github.com/pydantic/datafusion/pull/41/files

Jefffrey · 2025-10-23T01:59:14Z

I think this makes sense to me. Main concern (other than the pointer bits) is the introduction of DecodeContext; I guess it wasn't easily possible to do via codec (I think you mentioned this already)?

I don't see a better way to add the mutable context necessary for this to work. One olive branch I can extend to ease backwards compatibility is to impl From<&'a TaskContext> for DecodeContext<'a> and make DecodeContext<'a> Clone and then change the public signature of the functions that now require a DecodeContext to accept impl Into<DecodeContext<'a>>. But that is quite a bit of added complexity, personally I don't think it's worth it but I can implement that if reviewers feel it is required.

Something like this: https://github.com/pydantic/datafusion/pull/41/files

Hmm yeah, I don't think this is any better, since we might as well go all the way instead of a halfway solution 😅

I'll cc @timsaucer too as they also changed the signatures recently for proto physical plan in #18123

timsaucer · 2025-10-24T20:47:24Z

Thanks for the ping. One thing I was planning to do this weekend was to write up a PR to move from TaskContext to dyn FunctionRegistry, which TaskContext implements. I believe that the function registry is the only portion of the TaskContext we need. I believe this would simplify the code as we currently have two paths, some like parse_physical_exprs that take &TaskContext and some like parse_expr that take &dyn FunctionRegistry.

I'm happy to put that PR in, but since you're digging into this bit of the code maybe we can include it?

Also +1 on the impl Into<DecodeContext<'a>> on the signatures as I think that will make for a much more ergonomic experience.

I think the core idea is a good one.

adriangb · 2025-10-24T22:49:45Z

I'm not clear on what the pros/cons of &TaskContext vs. &dyn FunctionRegistry are. I fear that some folks doing distributed (cc @gabotechs) want "more" there not less while other folks doing FFI (I'm guessing that's your use case right @timsaucer?) want "less", those two things seem to be opposed with each other. I guess as long as downcast matching is possible it may be okay to have the &dyn FunctionRegistry in the API and have implementations that want a &TaskContext do some downcasting? Either way that seems like a bigger discussion to have on your PR / proposal / issue.

+1 on the impl Into<DecodeContext<'a>> on the signatures as I think that will make for a much more ergonomic experience

Hmm the only reason I see to do that is for backwards compatibility. The complexity and opacity introduced is not worth it otherwise IMO.

Jefffrey · 2025-10-25T03:01:48Z

Thanks for the ping. One thing I was planning to do this weekend was to write up a PR to move from TaskContext to dyn FunctionRegistry, which TaskContext implements. I believe that the function registry is the only portion of the TaskContext we need. I believe this would simplify the code as we currently have two paths, some like parse_physical_exprs that take &TaskContext and some like parse_expr that take &dyn FunctionRegistry.

I'm happy to put that PR in, but since you're digging into this bit of the code maybe we can include it?

Also +1 on the impl Into<DecodeContext<'a>> on the signatures as I think that will make for a much more ergonomic experience.

I think the core idea is a good one.

In regards to parse_expr, I actually have a PR that changes it to accept TaskContext in order to support subqueries, see #18167

gabotechs · 2025-10-25T05:05:50Z

I'm not clear on what the pros/cons of &TaskContext vs. &dyn FunctionRegistry are. I fear that some folks doing distributed (cc @gabotechs) want "more"

Note that having a DecodeContext or an impl Into<DecodeContext<'a>> for tracking ids of expressions derived from their pointer addresses still leaves part of the challenge unsolved, as we still won't be able to use that, as is, for communicating dynamic filter updates over the wire in a distributed context.

It might be worth at least having a plan for how to do that end to end before committing to an API change that might need to be revisited for a full solution.

milenkovicm · 2025-10-25T05:53:38Z

I'm not clear on what the pros/cons of &TaskContext vs. &dyn FunctionRegistry are. I fear that some folks doing distributed (cc @gabotechs) want "more" there not less while other folks doing FFI (I'm guessing that's your use case right @timsaucer?) want "less", those two things seem to be opposed with each other. I guess as long as downcast matching is possible it may be okay to have the &dyn FunctionRegistry in the API and have implementations that want a &TaskContext do some downcasting? Either way that seems like a bigger discussion to have on your PR / proposal / issue.

+1 on the impl Into<DecodeContext<'a>> on the signatures as I think that will make for a much more ergonomic experience

Hmm the only reason I see to do that is for backwards compatibility. The complexity and opacity introduced is not worth it otherwise IMO.

I can chime in here, TaskContext replaced SessionContext in #17601 , SessionContext was needed to provide RuntimeEnv

milenkovicm · 2025-10-25T05:55:18Z

I wonder in which cases decoding protobuf is a bottleneck? Do you have some flame graphs to show, or might this be a theoretical bottleneck?

milenkovicm · 2025-10-25T06:04:35Z

I'm not clear on what the pros/cons of &TaskContext vs. &dyn FunctionRegistry are. I fear that some folks doing distributed (cc @gabotechs) want "more"

Note that having a DecodeContext or an impl Into<DecodeContext<'a>> for tracking ids of expressions derived from their pointer addresses still leaves part of the challenge unsolved, as we still won't be able to use that, as is, for communicating dynamic filter updates over the wire in a distributed context.

It might be worth at least having a plan for how to do that end to end before committing to an API change that might need to be revisited for a full solution.

If we're talking about adding impl Into<DecodeContext<'a>>, I believe it makes more sense to add &dyn DecodeContext, with a non-caching implementation as the default.
At the moment the caching implementation may have benefits in very specific cases, so for the generic case having the cache disabled looks like the best approach.
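A sketch of what that could look like, assuming a trait named `DecodeContext` whose default methods do no caching. The trait, `NoCache`, `DedupCache`, and `decode` are all hypothetical, and `Arc<String>` stands in for a decoded expression.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical trait; the default methods give the non-caching behavior.
pub trait DecodeContext {
    fn get_cached(&self, _id: u64) -> Option<Arc<String>> {
        None
    }
    fn put_cached(&self, _id: u64, _expr: &Arc<String>) {}
}

/// Default implementation: never caches.
pub struct NoCache;
impl DecodeContext for NoCache {}

/// Opt-in implementation: deduplicates by id.
#[derive(Default)]
pub struct DedupCache {
    map: RefCell<HashMap<u64, Arc<String>>>,
}

impl DecodeContext for DedupCache {
    fn get_cached(&self, id: u64) -> Option<Arc<String>> {
        self.map.borrow().get(&id).cloned()
    }
    fn put_cached(&self, id: u64, expr: &Arc<String>) {
        self.map.borrow_mut().insert(id, Arc::clone(expr));
    }
}

/// Decoding code only sees `&dyn DecodeContext` and is unaware of which policy is active.
pub fn decode(ctx: &dyn DecodeContext, id: u64, payload: &str) -> Arc<String> {
    if let Some(hit) = ctx.get_cached(id) {
        return hit;
    }
    let expr = Arc::new(payload.to_string());
    ctx.put_cached(id, &expr);
    expr
}
```

Switching between the two policies is then a matter of which implementation the caller passes in.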

milenkovicm

If we want to proceed in this direction, making cache disabled by default would make sense, as the benefits of having it on are not really obvious and very use case specific.

adriangb · 2025-10-25T13:10:22Z

If we want to proceed in this direction, making cache disabled by default would make sense, as the benefits of having it on are not really obvious and very use case specific.

Isn't there a benefit for all users of reducing blowup of duplicate expressions? If duplicate expressions aren't a problem we wouldn't be Arcing them in the first place.

The cost is minuscule: a hashmap of integers and pointers.

timsaucer · 2025-10-25T13:16:29Z

I'm not clear on what the pros/cons of &TaskContext vs. &dyn FunctionRegistry are. I fear that some folks doing distributed (cc @gabotechs) want "more" there not less while other folks doing FFI (I'm guessing that's your use case right @timsaucer?) want "less", those two things seem to be opposed with each other. I guess as long as downcast matching is possible it may be okay to have the &dyn FunctionRegistry in the API and have implementations that want a &TaskContext do some downcasting? Either way that seems like a bigger discussion to have on your PR / proposal / issue.

+1 on the impl Into<DecodeContext<'a>> on the signatures as I think that will make for a much more ergonomic experience

Hmm the only reason I see to do that is for backwards compatibility. The complexity and opacity introduced is not worth it otherwise IMO.

I can chime in here, TaskContext replaced SessionContext in #17601 , SessionContext was needed to provide RuntimeEnv

This is good to know. Then, either in this PR or as a follow-on, it would be good to move the current &dyn FunctionRegistry on the logical side over to &TaskContext or DecodeContext.

milenkovicm · 2025-10-25T13:24:10Z

If we want to proceed in this direction, making cache disabled by default would make sense, as the benefits of having it on are not really obvious and very use case specific.

Isn't there a benefit for all users of reducing blowup of duplicate expressions? If duplicate expressions aren't a problem we wouldn't be Arcing them in the first place.

The cost is minuscule: a hashmap of integers and pointers.

Most users do not have this problem to start with.

The issue is not performance overhead; the issue is user-generated ids and the subtle bugs they can bring.

Also, I believe it's trivial to have two implementations, one which caches and one which doesn't, and switch between them as needed.

milenkovicm · 2025-10-25T16:09:50Z

datafusion/proto/src/bytes/mod.rs

    let protobuf = protobuf::PhysicalPlanNode::decode(bytes)
        .map_err(|e| plan_datafusion_err!("Error decoding expr as protobuf: {e}"))?;
-    protobuf.try_into_physical_plan(ctx, extension_codec)
+    let decode_ctx = DecodeContext::new(ctx);


Should decode_ctx be a method parameter rather than created here?

This comment is added for consistency with other public methods expecting DecodeContext, but I'm not sure we should expose &DecodeContext in public methods; explanation in the following comment.

milenkovicm · 2025-10-25T16:10:00Z

datafusion/proto/src/bytes/mod.rs

        .map_err(|e| plan_datafusion_err!("Error serializing plan: {e}"))?;
    let extension_codec = DefaultPhysicalExtensionCodec {};
-    back.try_into_physical_plan(&ctx, &extension_codec)
+    let decode_ctx = DecodeContext::new(ctx);


Should decode_ctx be a method parameter rather than created here?

This comment is added for consistency with other public methods expecting DecodeContext, but I'm not sure we should expose &DecodeContext in public methods; explanation in the following comment.

milenkovicm · 2025-10-25T17:22:57Z

One simple question: if the TaskContext and DecodeContext are reused to decode protos from multiple clients or even from one client sending plan after plan, is there a chance of id collision?

I believe the current implementation does not prevent the reuse of DecodeContext for two different plans.

let task_ctx = ctx.task_ctx();
    let decode_ctx = DecodeContext::new(&task_ctx);
    let result_exec_plan: Arc<dyn ExecutionPlan> = proto
        .try_into_physical_plan(&decode_ctx, codec)
        .expect("from proto");

Two subsequent encoded plans arriving at the same DecodeContext (one after the other, even from the same client) may have identical IDs for different expressions: we could be unlucky enough to have one Arc dropped and another Arc created at the same address for two different queries arriving one after the other. I know we would have to be very, very unlucky, but we have no guarantee it won't happen.
Hence, all public interfaces need to consume the DecodeContext after decoding to prevent its re-use for the next decode.

I can't claim this for certain, but the current use of the Arc address may be safe for referencing expressions within a single plan.

If DecodeContext can't be re-used, how many times will the cache be hit, and would it really save enough resources to justify adding another moving part and changing the public interface again?

adriangb · 2025-10-25T18:37:31Z

if the TaskContext and DecodeContext are reused to decode protos from multiple clients or even from one client sending plan after plan, is there a chance of id collision?

Yes, my intention was that you create a new DecodeContext per plan that you decode.

milenkovicm · 2025-10-25T19:22:12Z

if the TaskContext and DecodeContext are reused to decode protos from multiple clients or even from one client sending plan after plan, is there a chance of id collision?

Yes, my intention was that you create a new DecodeContext per plan that you decode.

Then public methods should consume (and invalidate) the DecodeContext instead of taking it as a reference.
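One way to express that in the type system is to take the context by value, so the caller cannot hand the same cache to a second decode. A sketch with hypothetical names (`DecodeContext`, `EncodedExpr`, and `decode_plan` here are illustrative, not the PR's API):

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical per-plan context; deliberately neither `Clone` nor `Copy`.
#[derive(Default)]
pub struct DecodeContext {
    cache: HashMap<u64, Arc<String>>,
}

/// One encoded expression: an id (the pointer address on the sender) plus a payload.
pub struct EncodedExpr {
    pub id: u64,
    pub payload: String,
}

/// Takes the context by value, so the cache cannot outlive this one plan.
pub fn decode_plan(mut ctx: DecodeContext, exprs: &[EncodedExpr]) -> Vec<Arc<String>> {
    exprs
        .iter()
        .map(|e| {
            ctx.cache
                .entry(e.id)
                .or_insert_with(|| Arc::new(e.payload.clone()))
                .clone()
        })
        .collect()
}
```

Two plans that happen to reuse id `1` for different expressions then cannot interfere, because each call owns its own cache; passing the same `DecodeContext` value twice is a compile error (use after move).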

adriangb · 2025-10-25T19:57:22Z

Yeah that makes sense! I'm not sure how to encode that into the APIs, but making it very explicit in the docs, etc. should be enough?

One thing I'm thinking is if there's a way to satisfy all of the input here by making a new high level API.
An annoyance I've had with the Codec system is that it's not clear how to inject extra behavior, it only functions as a fallback.

Something along the lines of:

pub trait Decoder {
    fn decode_plan(&self, plan: PhysicalPlanNode) -> Result<Arc<dyn ExecutionPlan>>;
    fn decode_expression(
        &self,
        expression: PhysicalExprNode,
        input_schema: &Schema,
    ) -> Result<Arc<dyn PhysicalExpr>>;
}

pub struct DefaultDecoder {
    ctx: TaskContext,
}

impl DefaultDecoder {
    pub fn new(ctx: TaskContext) -> Self {
        Self { ctx }
    }
}

impl Decoder for DefaultDecoder {
    fn decode_plan(&self, plan: PhysicalPlanNode) -> Result<Arc<dyn ExecutionPlan>> {
        // Essentially the code inside of `PhysicalPlanNode::try_into_physical_plan`,
        // but passing around a reference to ourselves as `&dyn Decoder` so that e.g. if we
        // have to decode predicates inside of a plan it calls back into `decode_expression`.
        // Maybe delegates to
        todo!()
    }

    fn decode_expression(
        &self,
        expression: PhysicalExprNode,
        input_schema: &Schema,
    ) -> Result<Arc<dyn PhysicalExpr>> {
        // Essentially the code inside of `parse_physical_expr`, but again passing around
        // a reference to ourselves.
        todo!()
    }
}

Then it's easy to make a custom Decoder that injects before/after behavior, e.g. caching or re-attaching custom bits to plans. I'd imagine we get rid of the Codec bit and have the defaults just error if nothing matches, so that you handle extension/custom types yourself.
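To illustrate the "inject before/after behavior" part, here is a self-contained toy version in which a caching decoder wraps the default one. `ExprNode`, `Expr`, `CachingDecoder`, and the explicit `root` parameter are assumptions made for the sketch, not a proposed final API.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Arc;

/// Toy proto node and decoded expression (hypothetical stand-ins).
pub struct ExprNode {
    pub id: Option<u64>,
    pub name: String,
    pub children: Vec<ExprNode>,
}

pub struct Expr {
    pub name: String,
    pub children: Vec<Arc<Expr>>,
}

pub trait Decoder {
    /// `root` is the outermost decoder, so nested nodes re-enter any wrapper.
    fn decode_expression(&self, root: &dyn Decoder, node: &ExprNode) -> Arc<Expr>;
}

#[derive(Default)]
pub struct DefaultDecoder;

impl Decoder for DefaultDecoder {
    fn decode_expression(&self, root: &dyn Decoder, node: &ExprNode) -> Arc<Expr> {
        // Children are decoded through `root`, not `self`, so wrappers see them too.
        let children = node
            .children
            .iter()
            .map(|c| root.decode_expression(root, c))
            .collect();
        Arc::new(Expr { name: node.name.clone(), children })
    }
}

/// Wrapper that adds "before" (cache lookup) and "after" (cache fill) behavior.
#[derive(Default)]
pub struct CachingDecoder {
    inner: DefaultDecoder,
    cache: RefCell<HashMap<u64, Arc<Expr>>>,
}

impl Decoder for CachingDecoder {
    fn decode_expression(&self, root: &dyn Decoder, node: &ExprNode) -> Arc<Expr> {
        if let Some(hit) = node.id.and_then(|id| self.cache.borrow().get(&id).cloned()) {
            return hit;
        }
        let expr = self.inner.decode_expression(root, node);
        if let Some(id) = node.id {
            self.cache.borrow_mut().insert(id, Arc::clone(&expr));
        }
        expr
    }
}
```

Calling `caching.decode_expression(&caching, &node)` deduplicates nested nodes that share an id, while `DefaultDecoder` on its own keeps today's behavior of one allocation per node.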

Anyway that's a half baked idea and that discussion may be a blocker for this PR but I think it is largely unrelated to "is the deduplication worth doing by default", I'll address that in my next comment.

adriangb · 2025-10-25T20:04:24Z

Most users do not have this problem to start with.

I'd argue that most users do have this problem. Consider a query like:

SELECT *
FROM 'file.parquet'
WHERE id IN (1, 2, 3, 4, 5...);

This PR improves memory usage for this query by avoiding duplicating the InList expression in a FilterExec and ParquetSource when deserializing.

If DecodeContext can't be re-used, how many times will the cache be hit, and would it really save enough resources to justify adding another moving part and changing the public interface again?

The idea is not to cache expressions across plans, rather within a plan.

Here are flame graphs from main and dedupe-expr respectively for ~ that query:

Raw data:
pprof.zip

I generated this by adding the following example to datafusion-examples:

use datafusion::{common::Result, prelude::*};
use datafusion_proto::bytes::{physical_plan_from_bytes, physical_plan_to_bytes};
use parquet::{arrow::ArrowWriter, file::properties::WriterProperties};


#[cfg(not(target_env = "msvc"))]
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

#[allow(non_upper_case_globals)]
#[export_name = "malloc_conf"]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:19\0";


#[tokio::main]
async fn main() -> Result<()> {
    let mut prof_ctl = jemalloc_pprof::PROF_CTL.as_ref().unwrap().lock().await;
    prof_ctl.activate().unwrap();

    let _plan = {
        let ctx = SessionContext::new();
        let batches = ctx.sql("SELECT c FROM generate_series(1, 1000000) t(c)").await?.collect().await?;
        let file = std::fs::File::create("test.parquet")?;
        let props = WriterProperties::builder()
            // limit batch sizes so that we have useful statistics
            .set_max_row_group_size(4096)
            .build();
        let mut writer = ArrowWriter::try_new(file, batches[0].schema(), Some(props))?;
        for batch in &batches {
            writer.write(batch)?;
        }
        writer.close()?;

        let mut df = ctx.read_parquet("test.parquet", ParquetReadOptions::default()).await?;
        df = df.filter(col("c").in_list((1_000..10_000).map(|v| lit(v)).collect(), false))?;
        let plan = df.create_physical_plan().await?;
        physical_plan_from_bytes(&physical_plan_to_bytes(plan)?, &ctx.task_ctx())?
    };

    let pprof = prof_ctl.dump_pprof().unwrap();
    std::fs::write("proto_memory.pprof", pprof).unwrap();

    Ok(())
}

Full diff:
diff.txt

It looks like this was able to deduplicate the InList expression, taking memory use after deserializing from 5.7MB to 4MB. Not every plan is going to have a 40% memory savings (I obviously chose a large expression on purpose, but you can imagine it's not unrealistic if there's a couple large strings in there like a list of 100 names), but I think many plans will have some small amount of memory saving, some will have even larger than 40%. This is also not accounting for the CPU cycles saved by not deserializing the same thing multiple times.

milenkovicm · 2025-10-25T21:09:44Z

5.7 to 4 is around 30% saved, if I'm not mistaken 😀

This optimisation is on the "control plane", not on the "data plane" (30% on the data plane would make a huge difference; we would not be having this discussion in that case). IMHO, small improvements on the "control plane" do not justify an additional moving part or an increase in interface/protocol complexity. If we put the current limitations in the API description, we have made the API more complex and provided an avenue for bugs whenever someone does not read the documentation.

I have never seen decoding have any significant impact; most tasks will take considerably more time crunching data than the few microseconds saved in decoding (on top of that, the data has already been moved over the network).

adriangb · 2025-10-25T21:45:08Z

It depends on how you do the math (I did (5.7-4)/4) but that's a detail.
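For reference, the two figures come from dividing the same 1.7MB difference by different baselines, the deduplicated size versus the original size (numbers taken from the measurements quoted above):

$$
\frac{5.7 - 4}{4} = 0.425 \approx 42.5\%
\qquad
\frac{5.7 - 4}{5.7} \approx 0.298 \approx 29.8\%
$$

The first reads as "main uses about 40% more memory than dedupe-expr", the second as "dedupe-expr saves about 30% relative to main".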

IMO "control plane" vs "data plane" can get a bit blurry, eg in the case of InList it's possible (and I think common) to have relatively large amounts of data in the "control plane".

Besides: all of this is to further enable an optimization (dynamic filters) that can make queries 25x faster. We are in fact discussing using InList to push down join hash tables so large InList expressions are an excellent example. So what I'm saying is "let's make the data plane 25x faster by at the same time making the control plane use 30% less memory, that requires some breaking API changes, we should figure out what those are". I think that's a pretty compelling story.

DataFusion makes plenty of breaking API changes, and I don't even think this is that egregious of one. Is it API changes in this part of the code in general that you're opposed to, or mainly the footgun of re-using a cache / context causing collisions? I'm sure the latter can be addressed in some way.

milenkovicm · 2025-10-25T22:21:41Z

It depends on how you do the math (I did (5.7-4)/4) but that's a detail.

lol, I'm not sure you can redefine maths, there are strict rules around it 😂

IMO "control plane" vs "data plane" can get a bit blurry, eg in the case of InList it's possible (and I think common) to have relatively large amounts of data in the "control plane".

Besides: all of this is to further enable an optimization (dynamic filters) that can make queries 25x faster. We are in fact discussing using InList to push down join hash tables so large InList expressions are an excellent example. So what I'm saying is "let's make the data plane 25x faster by at the same time making the control plane use 30% less memory, that requires some breaking API changes, we should figure out what those are". I think that's a pretty compelling story.

I'm not sure why this argument is valid in this context 😕 I said nothing related to dynamic filters.

DataFusion makes plenty of breaking API changes, I don't even think this is that egregious of one.

That's perfectly fine but let's have some kind of high bar when we do that.

Is it API changes in this part of the code in general that you're opposed to, or mainly the footgun of re-using a cache / context causing collisions? I'm sure the latter can be addressed in some way.

Saving 1.7MB on an executor which uses 8GB does not make a huge difference (that's about 0.02%), yet for that you have introduced a way to shoot yourself in the foot and made the interface more complex.

gabotechs · 2025-10-27T12:00:24Z

I think it might be worth having an end-to-end plan for bringing dynamic filters into distributed contexts before continuing this discussion, as having the full picture on the table can help us make better-informed decisions. I created a ticket to move the discussion there:

#18296

github-actions bot added the proto (Related to proto crate) and ffi (Changes to the ffi crate) labels · Oct 21, 2025

adriangb mentioned this pull request · Oct 21, 2025

add id and update callbacks to dynamic filters #17370 (Closed)

gabotechs reviewed · Oct 21, 2025

adriangb requested a review from Jefffrey · October 21, 2025 15:24

Jefffrey reviewed · Oct 22, 2025

adriangb added 4 commits · October 22, 2025 09:29

a014ab2 Dedplicate PhysicalExpr on proto ser/de using Arc pointer addresses
3776705 ensure safety of insert using entry api in single lock hold
f2aa11b remove clone
5a3fa78 fix

adriangb force-pushed the dedupe-expr branch from 0fd1d54 to 5a3fa78 · October 22, 2025 16:32

7edec6e update docstring with pr review rec
07fb426 Regen

milenkovicm reviewed · Oct 25, 2025

gabotechs mentioned this pull request · Oct 27, 2025

Access DynamicFilterPhysicalExpr expressions from outside the plan #18296 (Open)
