@waynexia
Member

@waynexia waynexia commented Aug 5, 2025

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Rationale for this change

#7303 implements the fundamental building blocks for tracking memory. This patch exposes those APIs at a higher level, on Array and ArrayData.

What changes are included in this PR?

New `claim` APIs for Array and ArrayData, and a new `pool` feature in arrow, arrow-array and arrow-data that gates the new API.

Are these changes tested?

Yes, and there is an example to demo basic usage.

Are there any user-facing changes?

Yes: the new `claim` API and the new `pool` feature.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 5, 2025
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@waynexia waynexia force-pushed the array-mem-tracking branch from 9ef341f to 4fa363c Compare August 5, 2025 02:49
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@waynexia waynexia force-pushed the array-mem-tracking branch from 31b8c72 to 27f8c46 Compare August 5, 2025 03:07
@alamb
Contributor

alamb commented Aug 6, 2025

Thanks @waynexia -- is there a high level description / "[EPIC]" style ticket of what we are doing with these APIs? I think it would be good to make an issue that describes the high level project / direction for wider visibility (and likely to get other people involved)

I found some related issues. Do any of them reflect your plan?

use arrow_schema::{DataType, Field};
use std::sync::Arc;

fn main() {

Could we write this as a (doc?)test instead (or as well)?


Yes, please -- I think it would be much easier to find as a doc test -- perhaps you could just move it to `Array::claim`

@Dandandan
Contributor

@alamb I think it's mostly #6439, so that consumers (like DataFusion) can better keep track of memory usage without over-counting memory after e.g. `.slice()`
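
To make the over-counting concrete, here is a minimal, self-contained sketch. It does not use the arrow crates: `ToyArray`, `backing_size`, `naive_total` and `deduplicated_total` are hypothetical names invented for illustration, and the toy assumes that a slice reports the size of the whole shared allocation. Under that assumption, summing per-array sizes counts the same bytes once per view, while counting each distinct allocation once does not.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical array view: a zero-copy window over a shared allocation.
/// This is a toy stand-in, not an arrow type.
#[derive(Clone)]
struct ToyArray {
    data: Rc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl ToyArray {
    fn new(bytes: usize) -> Self {
        ToyArray { data: Rc::new(vec![0u8; bytes]), offset: 0, len: bytes }
    }

    /// Zero-copy slice: the new view shares the parent's allocation.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ToyArray { data: Rc::clone(&self.data), offset: self.offset + offset, len }
    }

    /// Assumption of this toy: a view reports the size of the whole
    /// backing allocation, however small the view itself is.
    fn backing_size(&self) -> usize {
        self.data.len()
    }
}

/// Naive accounting: add up what every array reports.
fn naive_total(arrays: &[ToyArray]) -> usize {
    arrays.iter().map(ToyArray::backing_size).sum()
}

/// Deduplicated accounting: count each distinct allocation once,
/// identified here by the address of the shared allocation.
fn deduplicated_total(arrays: &[ToyArray]) -> usize {
    let mut seen = HashSet::new();
    arrays
        .iter()
        .filter(|a| seen.insert(Rc::as_ptr(&a.data)))
        .map(ToyArray::backing_size)
        .sum()
}

fn main() {
    let base = ToyArray::new(1024);
    let views = vec![base.clone(), base.slice(0, 10), base.slice(10, 10)];
    println!("naive: {} bytes", naive_total(&views));
    println!("deduplicated: {} bytes", deduplicated_total(&views));
}
```

With one 1024-byte allocation and two slices of it, the naive sum is three times the real footprint. The `claim` API in this PR targets the same problem, though going by the PR description it tracks usage through a memory pool rather than by comparing pointers as this toy does.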

@alamb
Contributor

alamb commented Aug 14, 2025

> @alamb I think it's mostly #6439, so that consumers (like DataFusion) can better keep track of memory usage without over-counting memory after e.g. `.slice()`

Sounds like a great idea -- thank you @waynexia and @Dandandan

I am quite backed up reviewing other projects and PRs, but will try to review this one over the next few days. Any help reviewing would be most appreciated.

I filed an epic ticket to track this project, and maybe we can use that to help organize our work a bit better

Contributor

@alamb alamb left a comment

Thank you @waynexia -- I think the code makes a lot of sense and we could merge it as is.

However, I feel strongly that without some additional documentation on how to use `claim`, downstream projects are not likely to be able to take advantage of this new API

I do think it would be ok to update the docs as a follow-on PR, especially if you wanted to get this PR into the next release (I plan to make an RC in the next few days)

/// let pool = TrackingMemoryPool::default();
///
/// // Claim the array's memory in the pool
/// array.claim(&pool);

Could you also add an example (either here or elsewhere) of how one would use claim?

For example, if we now did

let array2 = array1.slice(0, 1);

Is the idea that now array2.array_memory_size() would be zero?

slice2.claim(&pool);
let final_usage = pool.used();

println!("After claiming 2 slices: {final_usage} bytes");

I think these should actually test the bytes used (not just print them out)
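
A sketch of what asserting on usage could look like, as opposed to printing it. This is a toy model using only the standard library, not the code under review: `ToyPool`, `ToyAllocation`, `ToyArray` and `usage_after_claiming_two_slices` are hypothetical, and the rule that a shared allocation is charged only once is an assumption of the toy, not a confirmed property of the real `claim`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical pool that only counts claimed bytes.
#[derive(Default)]
struct ToyPool {
    used: Cell<usize>,
}

impl ToyPool {
    fn used(&self) -> usize {
        self.used.get()
    }
}

/// Hypothetical shared allocation that remembers whether it has
/// already been charged to a pool.
struct ToyAllocation {
    bytes: usize,
    claimed: Cell<bool>,
}

/// Hypothetical array view over a shared allocation.
struct ToyArray {
    data: Rc<ToyAllocation>,
}

impl ToyArray {
    fn new(bytes: usize) -> Self {
        let data = Rc::new(ToyAllocation { bytes, claimed: Cell::new(false) });
        ToyArray { data }
    }

    /// Zero-copy view; offsets are omitted because only sharing matters here.
    fn slice(&self) -> Self {
        ToyArray { data: Rc::clone(&self.data) }
    }

    /// Toy rule (an assumption): charge the allocation to the pool
    /// the first time any view of it is claimed, and never again.
    fn claim(&self, pool: &ToyPool) {
        if !self.data.claimed.replace(true) {
            pool.used.set(pool.used.get() + self.data.bytes);
        }
    }
}

/// Claims two slices of one allocation and returns the pool usage.
fn usage_after_claiming_two_slices(bytes: usize) -> usize {
    let pool = ToyPool::default();
    let array = ToyArray::new(bytes);
    let slice1 = array.slice();
    let slice2 = array.slice();
    slice1.claim(&pool);
    slice2.claim(&pool);
    pool.used()
}

fn main() {
    let final_usage = usage_after_claiming_two_slices(1024);
    // Assert on the number instead of only printing it.
    assert_eq!(final_usage, 1024, "shared allocation should be counted once");
}
```

An assertion like this fails loudly if the accounting regresses, which a `println!` cannot do. The expected value for the real example would have to come from the actual buffer sizes in the PR, which are not shown here.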

@alamb
Contributor

alamb commented Aug 20, 2025

@waynexia I am preparing for another arrow release; hopefully I'll make the RC tomorrow. Would you like to:

  1. Address comments on this PR before merge
  2. Merge as is (and maybe address comments later)
  3. Wait for the next release

Just let me know.

@alchemist51
Contributor

Is this currently being worked on? I can help move this PR forward @waynexia @Dandandan @alamb

notfilippo added a commit to notfilippo/arrow-rs that referenced this pull request Nov 24, 2025
New `claim` API for NullBuffer, ArrayData, and Array.
New `pool` feature-flag to arrow, arrow-array and arrow-data.

Part of apache#8137.
Replaces apache#8040.
@notfilippo
Contributor

👋 I've reheated this PR over at feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array. Let me know what you think! :)
