Add Raw Data Size Stat to Nimble Files #74
This pull request was exported from Phabricator. Differential Revision: D60534808
Summary:
Pull Request resolved: facebookincubator#74

# Context
Knowing the size of the raw data being written is useful for identifying the cause of changes to file sizes after compressing and encoding when writing to files.

# Changes
- Adding the optional Stats section to Tablet
- Calculating the raw data size in bytes via a running sum of memory used right before encoding and flushing, and writing this to the tablet.
- Unit testing and fuzz testing

# Important Notes
- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called before one of the rawSize stat calculations and not the other.
- `rawStripeSize` name updated for clarity, and its value is now what the name implies, which I can guarantee is correct because it is calculated in the same way as the raw file size.
- Some investigation determined that `rawStripeSize` in stripe flush metrics is also currently not used anywhere.

Differential Revision: D60534808
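For readers following the "running sum" idea above, here is a minimal sketch of how a per-stripe total can be folded into a per-file total so that both are computed the same way. `BufferedStream` and `RawSizeTracker` are hypothetical names invented for illustration; they are not taken from Nimble or from this diff, and the actual code in `VeloxWriter.cpp` may be structured quite differently.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a stream whose data is buffered and not yet encoded.
struct BufferedStream {
  std::vector<std::string_view> chunks;
};

// Hypothetical accumulator (an assumption for illustration, not the Nimble API).
class RawSizeTracker {
 public:
  // Intended to be called right before a stream is encoded and flushed.
  void addStream(const BufferedStream& stream) {
    for (const auto& chunk : stream.chunks) {
      stripeBytes_ += chunk.size();
    }
  }

  // Intended to be called at a stripe boundary: returns the stripe total,
  // adds it to the file total, and resets the stripe counter.
  uint64_t finishStripe() {
    const uint64_t stripe = stripeBytes_;
    fileBytes_ += stripe;
    stripeBytes_ = 0;
    return stripe;
  }

  // Running total across all finished stripes.
  uint64_t fileBytes() const {
    return fileBytes_;
  }

 private:
  uint64_t stripeBytes_ = 0;
  uint64_t fileBytes_ = 0;
};
```

Because the file total is only ever updated from stripe totals, the two values cannot drift apart in this sketch, which is one way to read the claim that the stripe size is "calculated in the same way" as the file size.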
5a4891e to 1c71502
Summary:
Pull Request resolved: facebookincubator#74

# Context
Knowing the size of the raw data being written is useful for identifying the cause of changes to file sizes after compressing and encoding when writing to files.

# Changes
- Adding the optional Stats section to Tablet
- `VeloxWriter.cpp`: Calculating the raw data size in bytes via a running sum of memory used right before encoding and flushing, and writing this to the tablet.
- `FieldWriter.h`: Added `rawSize()` virtual function to handle string and nullable cases.
- Unit testing and fuzz testing

# Important Notes
- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called before one of the rawSize stat calculations and not the other.
- `rawStripeSize` name updated for clarity, and its value is now what the name implies, which I can guarantee is correct because it is calculated in the same way as the raw file size.
- Some investigation determined that `rawStripeSize` in stripe flush metrics is also currently not used anywhere.

Differential Revision: D60534808
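To illustrate why a virtual `rawSize()` is a natural fit for "string and nullable cases", here is a hedged sketch. Every class name below is hypothetical, and the rule that a null contributes zero bytes is an assumption made for this example only; the real `FieldWriter.h` interface and its accounting rules are not shown in this conversation and may differ.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface (assumption): each writer reports its own raw byte count.
class FieldWriterSketch {
 public:
  virtual ~FieldWriterSketch() = default;
  virtual uint64_t rawSize() const = 0;
};

// Nullable fixed-width case. Assumption: nulls contribute zero bytes.
class NullableInt64WriterSketch : public FieldWriterSketch {
 public:
  void write(std::optional<int64_t> value) {
    values_.push_back(value);
  }

  uint64_t rawSize() const override {
    uint64_t bytes = 0;
    for (const auto& value : values_) {
      if (value.has_value()) {
        bytes += sizeof(int64_t);
      }
    }
    return bytes;
  }

 private:
  std::vector<std::optional<int64_t>> values_;
};

// String case: size depends on the content, not on the element count.
class StringWriterSketch : public FieldWriterSketch {
 public:
  void write(std::string value) {
    values_.push_back(std::move(value));
  }

  uint64_t rawSize() const override {
    uint64_t bytes = 0;
    for (const auto& value : values_) {
      bytes += value.size();
    }
    return bytes;
  }

 private:
  std::vector<std::string> values_;
};
```

The point of the virtual call is that a caller can sum `rawSize()` over all writers without knowing which ones hold variable-length or nullable data.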
1c71502 to fa51648
fa51648 to 0f6d768
0f6d768 to ce44599
ce44599 to a234e45
a234e45 to 79c2944
79c2944 to 3f6df76
3f6df76 to 7e8e3eb
7e8e3eb to b5f4129
Summary:
Pull Request resolved: facebookincubator#74

# Context
Knowing the size of the raw data being written is useful for identifying the cause of changes to file sizes after compressing and encoding when writing to files. The calculation of the logical raw data size for the file will be implemented in a follow-up task.

# Changes
- Adding the optional Stats section to Tablet
- `VeloxWriter.cpp`: Calculating the raw data size in bytes via a running sum of memory used right before encoding and flushing - temporarily writing 0 to rawSize to maintain the status quo while the final calculation is not yet implemented.
- `FieldWriter.h`: Added `rawSize()` virtual function to handle string and nullable cases.

# Important Notes
- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called before one of the rawSize stat calculations and not the other.
- `rawStripeSize` name updated for clarity.
- Some investigation determined that `rawStripeSize` in stripe flush metrics is also currently not used anywhere.

Differential Revision: D60534808
0df7136 to 26d596f
26d596f to fc2a7d1
fc2a7d1 to b9da7db
b9da7db to 4a852c6
Summary:
Pull Request resolved: facebookincubator#74

Raw size will be implemented in a follow-up task, to better reflect logical data size that is file format agnostic.

# Changes
- Adding the optional Stats section to Tablet
- ~~`VeloxWriter.cpp`: Calculating the raw data size in bytes via running sum of memory used right before encoding and flushing~~
- ~~`FieldWriter.h`: Added `rawSize()` virtual function to handle string and nullable cases.~~

# Important Notes
- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called in an if statement.
- `rawStripeSize` name updated for clarity.
- Some investigation determined that `rawStripeSize` in stripe flush metrics is also currently not used anywhere.

Differential Revision: D60534808
4a852c6 to 033340a
033340a to 8ce0fcf
8ce0fcf to a19ba3b
Summary:
Pull Request resolved: facebookincubator#74

Raw size will be implemented in a follow-up task, to better reflect logical data size that is file format agnostic.

# Changes
- Adding the optional Stats section to Tablet
- ~~`VeloxWriter.cpp`: Calculating the raw data size in bytes via running sum of memory used right before encoding and flushing~~
- ~~`FieldWriter.h`: Added `rawSize()` virtual function to handle string and nullable cases.~~

# Important Notes
- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called in an if statement.

Differential Revision: D60534808
Summary:
Pull Request resolved: facebookincubator#74

Raw size will be implemented in a follow-up task, to better reflect logical data size that is file format agnostic.

# Changes
- Adding the optional Stats section to Tablet

`Constants.h`
- Updated some comments for clarity

Differential Revision: D60534808
a19ba3b to 6f0f259
This pull request has been merged in 9095413.
Summary:

# Context
Knowing the size of the raw data being written is useful for identifying the cause of changes to file sizes after compressing and encoding when writing to files.

# Changes

# Important Notes
- In `VeloxWriter.cpp` we have special handling for chunked null streams, which is why `materialize` is called before one of the rawSize stat calculations and not the other.
- `rawStripeSize` name updated for clarity, and its value is now what the name implies, which I can guarantee is correct because it is calculated in the same way as the raw file size.

Differential Revision: D60534808