Add Raw Data Size Stat to Nimble Files #74

Closed

Conversation

phoenixawe
Contributor

Summary:

Context

Knowing the size of the raw data being written helps identify what caused a change in file size after encoding and compression.

Changes

  • Add the optional Stats section to Tablet.
  • Calculate the raw data size in bytes as a running sum of the memory used right before encoding and flushing, and write it to the tablet.
  • Add unit tests and fuzz tests.

Important Notes

  • VeloxWriter.cpp has special handling for chunked null streams, which is why materialize is called before one of the rawSize stat calculations and not the other.
  • The rawStripeSize name was updated for clarity, and its value now matches the name. It is calculated the same way as the raw file size.
    • Investigation showed that rawStripeSize in the stripe flush metrics is currently not used anywhere.
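
To illustrate the running-sum idea described above, here is a minimal sketch. `RawSizeTracker`, `add`, and `flushStripe` are hypothetical names chosen for this example, not the actual VeloxWriter code. The sketch assumes raw bytes are recorded before encoding and that the per-stripe total is folded into the per-file total at flush time.

```cpp
#include <cstdint>

// Hypothetical tracker for illustration; not the actual VeloxWriter state.
struct RawSizeTracker {
  std::uint64_t rawStripeSize = 0;  // raw bytes buffered since the last flush
  std::uint64_t rawFileSize = 0;    // raw bytes across all flushed stripes

  // Record the in-memory size of a batch before it is encoded.
  void add(std::uint64_t bytes) { rawStripeSize += bytes; }

  // At stripe flush: fold the stripe total into the file total, then reset.
  std::uint64_t flushStripe() {
    const std::uint64_t flushed = rawStripeSize;
    rawFileSize += flushed;
    rawStripeSize = 0;
    return flushed;
  }
};
```

Because both totals come from the same `add` calls, the file-level value is always the sum of the stripe-level values, which is the consistency property the note about rawStripeSize relies on.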

Differential Revision: D60534808

@facebook-github-bot added the CLA Signed label Aug 19, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D60534808

phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 20, 2024
Summary:
Pull Request resolved: facebookincubator#74

# Context
Knowing the size of the raw data being written is useful for identifying the cause of changes to file sizes after compressing and encoding when writing to files.

# Changes
- Adding the optional Stats section to Tablet
- Calculating the raw data size in bytes via running sum of memory used right before encoding and flushing, and writing this to tablet.
- Unit testing and fuzz testing

# Important Notes
- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called before one of the rawSize stat calculations and not the other.
- The `rawStripeSize` name was updated for clarity, and its value now matches the name. It is calculated the same way as the raw file size.
  - Some investigation determined that rawStripeSize in stripe flush metrics is also currently not used anywhere.

Differential Revision: D60534808
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 20, 2024
Summary:
Pull Request resolved: facebookincubator#74

# Context
Knowing the size of the raw data being written is useful for identifying the cause of changes to file sizes after compressing and encoding when writing to files.

# Changes
- Adding the optional Stats section to Tablet
- `VeloxWriter.cpp`: Calculating the raw data size in bytes via running sum of memory used right before encoding and flushing, and writing this to tablet.
  - `FieldWriter.h`: Added `rawSize()` virtual function to handle string and nullable cases.
- Unit testing and fuzz testing

# Important Notes

- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called before one of the rawSize stat calculations and not the other.
- The `rawStripeSize` name was updated for clarity, and its value now matches the name. It is calculated the same way as the raw file size.
  - Some investigation determined that rawStripeSize in stripe flush metrics is also currently not used anywhere.

Differential Revision: D60534808
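
As a rough illustration of why a virtual `rawSize()` is useful for string and nullable fields, consider the sketch below. The `Column`, `Int32Column`, and `StringColumn` classes are hypothetical and do not reflect the real `FieldWriter` hierarchy. The sketch also assumes, purely for the example, that a null costs one byte and that a string costs its character bytes.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; the real FieldWriter has a different shape.
class Column {
 public:
  virtual ~Column() = default;
  virtual std::uint64_t rawSize() const = 0;
};

// Assumption for this sketch only: each null contributes one byte.
constexpr std::uint64_t kNullSize = 1;

// Nullable fixed-width column: non-null values cost sizeof(int32_t).
class Int32Column final : public Column {
 public:
  explicit Int32Column(std::vector<std::optional<std::int32_t>> values)
      : values_(std::move(values)) {}

  std::uint64_t rawSize() const override {
    std::uint64_t bytes = 0;
    for (const auto& v : values_) {
      bytes += v.has_value() ? sizeof(std::int32_t) : kNullSize;
    }
    return bytes;
  }

 private:
  std::vector<std::optional<std::int32_t>> values_;
};

// Variable-width column: the size depends on the data, not only the row count.
class StringColumn final : public Column {
 public:
  explicit StringColumn(std::vector<std::string> values)
      : values_(std::move(values)) {}

  std::uint64_t rawSize() const override {
    std::uint64_t bytes = 0;
    for (const auto& s : values_) {
      bytes += s.size();
    }
    return bytes;
  }

 private:
  std::vector<std::string> values_;
};
```

A single `rowCount * sizeof(T)` formula cannot cover either case, so each column type supplies its own override.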
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 20, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 21, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 22, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 22, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 22, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 23, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Aug 23, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 19, 2024
Summary:
Pull Request resolved: facebookincubator#74

# Context
Knowing the size of the raw data being written is useful for identifying the cause of changes to file sizes after compressing and encoding when writing to files.
The calculation of the logical raw data size for the file will be implemented in a followup task.

# Changes
- Adding the optional Stats section to Tablet
- `VeloxWriter.cpp`: Calculating the raw data size in bytes via running sum of memory used right before encoding and flushing
  - temporarily writing 0 to rawSize to maintain the status quo while the final calculation is not yet implemented.
  - `FieldWriter.h`: Added `rawSize()` virtual function to handle string and nullable cases.

# Important Notes

- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called before one of the rawSize stat calculations and not the other.
- `rawStripeSize` name updated for clarity.
  - Some investigation determined that rawStripeSize in stripe flush metrics is also currently not used anywhere.

Differential Revision: D60534808
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 19, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 19, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 19, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 19, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 19, 2024
Summary:
Pull Request resolved: facebookincubator#74

Raw size will be implemented in a followup task, to better reflect logical data size that is file format agnostic.

# Changes
- Adding the optional Stats section to Tablet

- ~~`VeloxWriter.cpp`: Calculating the raw data size in bytes via running sum of memory used right before encoding and flushing~~
  - ~~`FieldWriter.h`: Added `rawSize()` virtual function to handle string and nullable cases.~~

# Important Notes

- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called in an if statement.
- `rawStripeSize` name updated for clarity.
  - Some investigation determined that rawStripeSize in stripe flush metrics is also currently not used anywhere.

Differential Revision: D60534808
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 20, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 20, 2024
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 20, 2024
Summary:
Pull Request resolved: facebookincubator#74

Raw size will be implemented in a followup task, to better reflect logical data size that is file format agnostic.

# Changes
- Adding the optional Stats section to Tablet

- ~~`VeloxWriter.cpp`: Calculating the raw data size in bytes via running sum of memory used right before encoding and flushing~~
  - ~~`FieldWriter.h`: Added `rawSize()` virtual function to handle string and nullable cases.~~

# Important Notes

- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called in an if statement.

Differential Revision: D60534808
Summary:
Pull Request resolved: facebookincubator#74

Raw size will be implemented in a followup task, to better reflect logical data size that is file format agnostic.

# Changes
- Adding the optional Stats section to Tablet `Constants.h`

- Updated some comments for clarity

Differential Revision: D60534808
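
For readers unfamiliar with the idea of an optional section, the sketch below shows one way a named section can be written by newer writers and safely skipped by readers when it is absent. Everything here is hypothetical: `OptionalSections`, its methods, and the section name `example.stats` are assumptions for illustration, and the actual constant added to `Constants.h` may differ.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical section name; the real constant in Constants.h may differ.
constexpr std::string_view kStatsSectionName = "example.stats";

// Hypothetical in-memory stand-in for a file's named optional sections.
class OptionalSections {
 public:
  // Store (or overwrite) a section payload under the given name.
  void write(std::string_view name, std::string payload) {
    sections_[std::string(name)] = std::move(payload);
  }

  // Return the payload if the section exists, or std::nullopt for files
  // that were written without it.
  std::optional<std::string> load(std::string_view name) const {
    const auto it = sections_.find(std::string(name));
    if (it == sections_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::unordered_map<std::string, std::string> sections_;
};
```

Returning `std::optional` forces callers to handle the missing case explicitly, which keeps files written before the section existed readable.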
phoenixawe added a commit to phoenixawe/nimble that referenced this pull request Sep 20, 2024
@facebook-github-bot
Contributor

This pull request has been merged in 9095413.

Labels: CLA Signed, fb-exported, Merged