From 8dff196c6b672430821c76801cdefd977ada1304 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Fri, 16 Feb 2024 18:28:11 +0200 Subject: [PATCH 01/15] Update wafermap API description --- .../src/wafer-map/specs/features/rendering.md | 40 +++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index effc10146d..6e4e23d2cd 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -82,6 +82,46 @@ The POC is found in this branch [Worker Rendering POC](https://github.com/ni/nim ### Data Structure and Interface +The best solution to solve teh API of the wafermap is to use both of the proposed methods. + +The Public API will be the following: + +```TS +public diesTable: Table<{ + colIndex: Int32, + rowIndex: Int32, + value: Float32, + tags: Uint32; + metadata: never; + }> +``` + +This will be the [Apache Arrow](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html) table schema. +The row and column indices will be `Int32` columns, the values will be `Float32` columns. +The tags for each die will be represented as a 32 bit mask stored in a `Uint32` column. +The metadata column will be stored in an wildcard typed column. + +This approach has the benefits of a row based format that aligns well with the existing public api, a nice public API to use and the ease of future improvements. + +The limits for this approach are the following: + +1. There seems to be no support for columns of lists of strings. We decided to overcome this using a bit mask of tags. Another possible solution can be a dynamic number of rows for storing tags, but the performance may suffer. +1. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). The possible solutions for this are searching by iterating over the whole table, which is not feasible (see 3.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/).The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. +1. The transfer method for arrow tables is cumbersome, we would have to use another higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). Fortunately we can skip over this problem by not transferring the tables. +1. The iteration over stored rows is very bad compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. The solution to this issue and the transferring issue is splitting the relevant columns from the table (rows, columns, values, tags mask) and messaging them to the worker separately. This can be done with a very small overhead using the [getChild](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html#getChild) method and calling [toArray](https://arrow.apache.org/docs/js/classes/Arrow_dom.Vector.html#toArray) on the resulting vector. After being transferred, The buffers can be cached to speed up value access and filtering. 
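As a rough illustration of the column-splitting approach described in the last point above, a minimal sketch is shown below. It is not part of the proposed API: it assumes a dedicated `Worker` instance, and it copies the column data before transferring, since transferred buffers are detached on the sending side.

```TS
import { Table } from 'apache-arrow';

// Sketch: pull the rendering-relevant columns out of the dies table as typed
// arrays and move their buffers to the worker instead of cloning the table.
function postDiesToWorker(diesTable: Table, worker: Worker): void {
    // copy each column into a fresh typed array so the table keeps its own memory
    const colIndexes = new Int32Array(diesTable.getChild('colIndex')!.toArray());
    const rowIndexes = new Int32Array(diesTable.getChild('rowIndex')!.toArray());
    const values = new Float32Array(diesTable.getChild('value')!.toArray());
    const tagsMask = new Uint32Array(diesTable.getChild('tags')!.toArray());
    worker.postMessage(
        { colIndexes, rowIndexes, values, tagsMask },
        // listing the buffers transfers ownership instead of structured cloning
        [colIndexes.buffer, rowIndexes.buffer, values.buffer, tagsMask.buffer]
    );
}
```

The worker can cache the received typed arrays and iterate them directly while rendering.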
+ +| name | duration (ms) [1] | duration (ms) [2] | detail | +| ------------------------- | ------------------ | ------------------ | --------------------------------------------------------------------------------------------------------- | +| typed iterate | 7.551699995994568 | 6.052600026130676 | iterating over two 1M typed arrays and calculating the sums | +| typed from table iterate | 6.4953999519348145 | 5.136900067329407 | iterating over two 1M typed arrays from Table columns and calculating the sums (time includes conversion) | +| vector iterate | 76.4708000421524 | 66.58230006694794 | iterating over two 1M Vectors and calculating the sums | +| table get() iterate | 1350.0404000282288 | 1030.582899928093 | iterating over the 1M Table using `table.get(rowIndex)` and calculating the sums | +| table [iterator] iterate | 1091.6706000566483 | 1011.069100022316 | iterating over the 1M Table using the [iterator] and calculating the sums | +| array from table iterate | 943.0076999664307 | 980.0875999927521 | iterating over the 1M Table after converting `toArray()` and calculating the sums | +| vector from table iterate | 965.2465000152588 | 1012.9023000001907 | iterating over the 1M Vector after converting the Table with `makeVector()` and calculating the sums | + +#### Previously proposed solutions + We have two possible solutions for representing the data in the memory. They will be decided with a spec update. The fist one is an in-house solution: ```TS From 314e4b14453cd5c86ccb8f4fc403f7a2f2ca32f9 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Fri, 16 Feb 2024 18:52:28 +0200 Subject: [PATCH 02/15] Change files --- ...le-components-c0f72f9f-b879-43ad-bc5b-ee8b74e2700f.json | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 change/@ni-nimble-components-c0f72f9f-b879-43ad-bc5b-ee8b74e2700f.json diff --git a/change/@ni-nimble-components-c0f72f9f-b879-43ad-bc5b-ee8b74e2700f.json b/change/@ni-nimble-components-c0f72f9f-b879-43ad-bc5b-ee8b74e2700f.json new file mode 100644 index 0000000000..3069713da1 --- /dev/null +++ b/change/@ni-nimble-components-c0f72f9f-b879-43ad-bc5b-ee8b74e2700f.json @@ -0,0 +1,7 @@ +{ + "type": "none", + "comment": "Update rendering spec for Wafer Map component with API changes", + "packageName": "@ni/nimble-components", + "email": "33986780+munteannatan@users.noreply.github.com", + "dependentChangeType": "none" +} From 345f3e27b23e98b23b6c0b2a5f8f85fbb4161e41 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Fri, 16 Feb 2024 19:01:35 +0200 Subject: [PATCH 03/15] updated typos and memory concerns --- .../src/wafer-map/specs/features/rendering.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index 6e4e23d2cd..4646ce2a7d 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -82,7 +82,7 @@ The POC is found in this branch [Worker Rendering POC](https://github.com/ni/nim ### Data Structure and Interface -The best solution to solve teh API of the wafermap is to use both of the proposed methods. +The best solution to solve the API of the wafermap is to use both of the proposed methods. The Public API will be the following: @@ -105,8 +105,8 @@ This approach has the benefits of a row based format that aligns well with the e The limits for this approach are the following: -1. 
There seems to be no support for columns of lists of strings. We decided to overcome this using a bit mask of tags. Another possible solution can be a dynamic number of rows for storing tags, but the performance may suffer. -1. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). The possible solutions for this are searching by iterating over the whole table, which is not feasible (see 3.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/).The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. +1. There seems to be no support for columns of lists of strings. We decided to overcome this using a bit mask of tags. Another possible solution can be a dynamic number of columns for storing tags, but the performance may suffer. +1. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). The possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/).The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. 1. The transfer method for arrow tables is cumbersome, we would have to use another higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). Fortunately we can skip over this problem by not transferring the tables. 1. The iteration over stored rows is very bad compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. The solution to this issue and the transferring issue is splitting the relevant columns from the table (rows, columns, values, tags mask) and messaging them to the worker separately. This can be done with a very small overhead using the [getChild](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html#getChild) method and calling [toArray](https://arrow.apache.org/docs/js/classes/Arrow_dom.Vector.html#toArray) on the resulting vector. After being transferred, The buffers can be cached to speed up value access and filtering. @@ -120,6 +120,8 @@ The limits for this approach are the following: | array from table iterate | 943.0076999664307 | 980.0875999927521 | iterating over the 1M Table after converting `toArray()` and calculating the sums | | vector from table iterate | 965.2465000152588 | 1012.9023000001907 | iterating over the 1M Vector after converting the Table with `makeVector()` and calculating the sums | +The memory impact is not very significant, amounting to 74.01MB for 1M dies compared with 44.65MB for the previously prototyped API. + #### Previously proposed solutions We have two possible solutions for representing the data in the memory. They will be decided with a spec update. 
The fist one is an in-house solution: From be7aac72dc8034f9bd01efb3ad9c1ba6e4781d90 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Mon, 19 Feb 2024 10:10:05 +0200 Subject: [PATCH 04/15] more clear section intro, fixed ordering --- .../src/wafer-map/specs/features/rendering.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index 4646ce2a7d..f58e1859fe 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -82,7 +82,7 @@ The POC is found in this branch [Worker Rendering POC](https://github.com/ni/nim ### Data Structure and Interface -The best solution to solve the API of the wafermap is to use both of the proposed methods. +The best solution to solve the API of the wafermap is to use parts of both of the proposed methods mentioned in the next section. This means using Apache Arrow as the wafer component API and Typed Arrays for their iterating performance and transferability to worker threads. The Public API will be the following: @@ -101,14 +101,14 @@ The row and column indices will be `Int32` columns, the values will be `Float32` The tags for each die will be represented as a 32 bit mask stored in a `Uint32` column. The metadata column will be stored in an wildcard typed column. -This approach has the benefits of a row based format that aligns well with the existing public api, a nice public API to use and the ease of future improvements. +This approach has the benefits of a row based format that aligns well with the existing public API, as well as a nice public API that easily allows future improvements. The limits for this approach are the following: 1. There seems to be no support for columns of lists of strings. We decided to overcome this using a bit mask of tags. Another possible solution can be a dynamic number of columns for storing tags, but the performance may suffer. -1. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). The possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/).The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. -1. The transfer method for arrow tables is cumbersome, we would have to use another higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). Fortunately we can skip over this problem by not transferring the tables. -1. The iteration over stored rows is very bad compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. The solution to this issue and the transferring issue is splitting the relevant columns from the table (rows, columns, values, tags mask) and messaging them to the worker separately. This can be done with a very small overhead using the [getChild](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html#getChild) method and calling [toArray](https://arrow.apache.org/docs/js/classes/Arrow_dom.Vector.html#toArray) on the resulting vector. 
After being transferred, The buffers can be cached to speed up value access and filtering. +2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). The possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/).The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. +3. The transfer method for arrow tables is cumbersome, we would have to use another higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). Fortunately we can skip over this problem by not transferring the tables. +4. The iteration over stored rows is very bad compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. The solution to this issue and the transferring issue is splitting the relevant columns from the table (rows, columns, values, tags mask) and messaging them to the worker separately. This can be done with a very small overhead using the [getChild](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html#getChild) method and calling [toArray](https://arrow.apache.org/docs/js/classes/Arrow_dom.Vector.html#toArray) on the resulting vector. After being transferred, The buffers can be cached to speed up value access and filtering. | name | duration (ms) [1] | duration (ms) [2] | detail | | ------------------------- | ------------------ | ------------------ | --------------------------------------------------------------------------------------------------------- | @@ -124,7 +124,7 @@ The memory impact is not very significant, amounting to 74.01MB for 1M dies comp #### Previously proposed solutions -We have two possible solutions for representing the data in the memory. They will be decided with a spec update. The fist one is an in-house solution: +These are the two possible solutions identified for representing the data in the memory. The fist one is an in-house solution: ```TS class WaferData { From 5a2fbb9ad7622f4b6ee131c0d94636cc67f8a2a0 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Tue, 20 Feb 2024 10:34:20 +0200 Subject: [PATCH 05/15] clarified transfer and search. added mention of partial render --- .../src/wafer-map/specs/features/rendering.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index f58e1859fe..0ff1637c78 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -87,6 +87,9 @@ The best solution to solve the API of the wafermap is to use parts of both of th The Public API will be the following: ```TS +import { Table } from 'apache-arrow'; +export class WaferMap extends FoundationElement { +... public diesTable: Table<{ colIndex: Int32, rowIndex: Int32, @@ -94,6 +97,8 @@ public diesTable: Table<{ tags: Uint32; metadata: never; }> +... +} ``` This will be the [Apache Arrow](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html) table schema. 
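For illustration, a client could assemble a table with this schema from plain typed arrays. This is only a sketch: it assumes the `tableFromArrays` helper exported by the `apache-arrow` package and a `waferMap` reference to the component instance, and it omits the `metadata` column.

```TS
import { tableFromArrays } from 'apache-arrow';

// Sketch: typed arrays let the library infer Int32 / Float32 / Uint32 columns.
const diesTable = tableFromArrays({
    colIndex: new Int32Array([0, 0, 1, 1]),
    rowIndex: new Int32Array([0, 1, 0, 1]),
    value: new Float32Array([78.5, 12.0, 99.9, 63.2]),
    // each die's tags packed into a 32 bit mask (bit positions are hypothetical)
    tags: new Uint32Array([0b01, 0b11, 0b00, 0b10])
});

// waferMap is assumed to be a reference to the wafer map element
waferMap.diesTable = diesTable;
```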
@@ -106,9 +111,9 @@ This approach has the benefits of a row based format that aligns well with the e The limits for this approach are the following: 1. There seems to be no support for columns of lists of strings. We decided to overcome this using a bit mask of tags. Another possible solution can be a dynamic number of columns for storing tags, but the performance may suffer. -2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). The possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/).The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. -3. The transfer method for arrow tables is cumbersome, we would have to use another higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). Fortunately we can skip over this problem by not transferring the tables. -4. The iteration over stored rows is very bad compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. The solution to this issue and the transferring issue is splitting the relevant columns from the table (rows, columns, values, tags mask) and messaging them to the worker separately. This can be done with a very small overhead using the [getChild](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html#getChild) method and calling [toArray](https://arrow.apache.org/docs/js/classes/Arrow_dom.Vector.html#toArray) on the resulting vector. After being transferred, The buffers can be cached to speed up value access and filtering. +2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). The possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/). Searching for dies based on their position is crucial for highlighting and sending the highlighted die metadata with the `die-hover` event. The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. +3. The transfer method between the main an worker thread for arrow tables is cumbersome, we would have to use another higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). Fortunately we can skip over this problem by not transferring the tables to the worker. +4. The iteration over stored rows is very slow compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. The solution to this issue and the transferring issue is splitting the relevant columns from the table (rows, columns, values, tags mask) and messaging them to the worker separately. This can be done with a very small overhead using the [getChild](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html#getChild) method and calling [toArray](https://arrow.apache.org/docs/js/classes/Arrow_dom.Vector.html#toArray) on the resulting vector. 
After being transferred, The buffers can be cached to speed up value access and filtering. | name | duration (ms) [1] | duration (ms) [2] | detail | | ------------------------- | ------------------ | ------------------ | --------------------------------------------------------------------------------------------------------- | @@ -277,6 +282,7 @@ User Indication for [interactions in progress (>200ms)](https://web.dev/articles - the wafer-map will use bitmap scaling in addition to a spinner - the wafer-map will immediately show the spinner / fire event or only after, for example 200ms - the renderer will report progress for larger wait times. +- the rendering will be done sequentially in animation frames so the user will see the progress at 60Hz A follow-on HLD update will specify the approved decision. From b39802eb21800f76671d704665405a9eec634ce2 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Tue, 20 Feb 2024 12:50:35 +0200 Subject: [PATCH 06/15] moved alternative API --- .../src/wafer-map/specs/features/rendering.md | 78 ++++++++----------- 1 file changed, 34 insertions(+), 44 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index 0ff1637c78..728cf1b8cc 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -82,7 +82,7 @@ The POC is found in this branch [Worker Rendering POC](https://github.com/ni/nim ### Data Structure and Interface -The best solution to solve the API of the wafermap is to use parts of both of the proposed methods mentioned in the next section. This means using Apache Arrow as the wafer component API and Typed Arrays for their iterating performance and transferability to worker threads. +The best solution to solve the API of the wafermap is to use Apache Arrow as the wafer component API, and Typed Arrays as teh worker API for their iterating performance and transferability to worker threads. The Public API will be the following: @@ -127,49 +127,6 @@ The limits for this approach are the following: The memory impact is not very significant, amounting to 74.01MB for 1M dies compared with 44.65MB for the previously prototyped API. -#### Previously proposed solutions - -These are the two possible solutions identified for representing the data in the memory. The fist one is an in-house solution: - -```TS -class WaferData { - // the x coordinates of each column of dies - dieColIndexArray: Int32Array; - // the lengths of each row of dies - rowLengthsArray: Int32Array; - // the y coordinates of each die as a matrix row by row - dieRowIndexLayer: Int32Array; - // the value of each die as a matrix row by row - dieValuesLayer: Float64Array; - // the highlight approach is still undecided, we have two options: - // the highlight state of each die as a matrix; user will have to pre-calculate tags into highlighted conditions. - dieHighlightsLayer: Int8Array; - // a 32 bitset array of tags for each die; aligns more closely to the existing public api but limits users to 32 tags. - dieHighlightsLayer: Int32Array; - // metadata array for each die; it will not be sent to the worker - metadata : unknown[] -} -``` - -Using TypedArrays has the benefit of direct transfer to web workers without structured cloning of the object by transferring the arrayBuffers and reconstructing the object. 
Other benefits of typedArrays include the low access time when iterating over the values, more memory efficiency and faster direct access to metadata layers values. The previous inputs can be adapted to this new structure to maintain backwards compatibility. - -This API will have [optimized byte-array interop from Blazor](https://learn.microsoft.com/en-us/dotnet/core/compatibility/aspnet-core/6.0/byte-array-interop) and should be supported by Angular as a [vanilla javascript feature](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer). - -The alternative to the above mentioned data structure is an [apache arrow](https://arrow.apache.org/docs/js/index.html) table with columns and metadata. - -Pros of using Apache Arrow: - -- A row based format that aligns well with the existing public api -- Well supported and tested format -- Nice public API to use, we don't have to invent a new format, just document our schema for the arrow tables -- Designed for large dataset visualizations - -In order to choose from these alternatives we will prototype and check: - -- Does it have comparable memory performance -- Does it perform well or have significant overhead -- Is it easy to divide and use in parallel - ### Rendering An alternate renderer inside a worker thread will be created to live in parallel in the wafer-map : @@ -257,6 +214,39 @@ The current expectation is for a singular wafer component to be displayed on the ### Alternative Data Structures and Interfaces +The alternative to using Apache Arrow tables is an in-house solution: + +```TS +class WaferData { + // the x coordinates of each column of dies + dieColIndexArray: Int32Array; + // the lengths of each row of dies + rowLengthsArray: Int32Array; + // the y coordinates of each die as a matrix row by row + dieRowIndexLayer: Int32Array; + // the value of each die as a matrix row by row + dieValuesLayer: Float64Array; + // the highlight approach is still undecided, we have two options: + // the highlight state of each die as a matrix; user will have to pre-calculate tags into highlighted conditions. + dieHighlightsLayer: Int8Array; + // a 32 bitset array of tags for each die; aligns more closely to the existing public api but limits users to 32 tags. + dieHighlightsLayer: Int32Array; + // metadata array for each die; it will not be sent to the worker + metadata : unknown[] +} +``` + +Using TypedArrays has the benefit of direct transfer to web workers without structured cloning of the object by transferring the arrayBuffers and reconstructing the object. Other benefits of typedArrays include the low access time when iterating over the values, more memory efficiency and faster direct access to metadata layers values. The previous inputs can be adapted to this new structure to maintain backwards compatibility. + +This API will have [optimized byte-array interop from Blazor](https://learn.microsoft.com/en-us/dotnet/core/compatibility/aspnet-core/6.0/byte-array-interop) and should be supported by Angular as a [vanilla javascript feature](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer). 
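The 32 bit tag mask mentioned in the comments above could be handled roughly as in the sketch below; the tag names and bit positions are invented for the example and are not part of the proposal.

```TS
// Sketch: pack string tags into a 32 bit mask and test dies against it.
const TAG_BITS: Record<string, number> = { pass: 0, fail: 1, reviewed: 2 };

function packTags(tags: string[]): number {
    return tags.reduce((mask, tag) => mask | (1 << TAG_BITS[tag]), 0);
}

function isHighlighted(dieTagMask: number, highlightedMask: number): boolean {
    // a die is highlighted when it carries at least one highlighted tag
    return (dieTagMask & highlightedMask) !== 0;
}

// e.g. highlight every die tagged "fail" or "reviewed"
const highlightedMask = packTags(['fail', 'reviewed']);
```

This keeps the per-die highlight check to a single bitwise operation, at the cost of capping the number of distinct tags at 32.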
+ +Pros of using Apache Arrow: + +- A row based format that aligns well with the existing public api +- Well supported and tested format +- Nice public API to use, we don't have to invent a new format, just document our schema for the arrow tables +- Designed for large dataset visualizations + Another option is to break each object property as a separate attribute for the wafer map component. This can also lead to increased complexity and confusion for the user which will need to pass several structured objects instead of a singular object. ### Alternative Rendering From b217790623d57c07132d71e0bc54345bc6670a56 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Fri, 23 Feb 2024 15:31:06 +0200 Subject: [PATCH 07/15] fixed typos and removed fractional ms --- .../src/wafer-map/specs/features/rendering.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index 728cf1b8cc..c1dc976931 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -82,7 +82,7 @@ The POC is found in this branch [Worker Rendering POC](https://github.com/ni/nim ### Data Structure and Interface -The best solution to solve the API of the wafermap is to use Apache Arrow as the wafer component API, and Typed Arrays as teh worker API for their iterating performance and transferability to worker threads. +The best solution to solve the API of the wafermap is to use Apache Arrow as the wafer component API, and Typed Arrays as the worker API for their iterating performance and transferability to worker threads. The Public API will be the following: @@ -111,19 +111,19 @@ This approach has the benefits of a row based format that aligns well with the e The limits for this approach are the following: 1. There seems to be no support for columns of lists of strings. We decided to overcome this using a bit mask of tags. Another possible solution can be a dynamic number of columns for storing tags, but the performance may suffer. -2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). The possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/). Searching for dies based on their position is crucial for highlighting and sending the highlighted die metadata with the `die-hover` event. The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. +2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). Searching for dies based on their position is crucial for highlighting and sending the highlighted die metadata with the `die-hover` event. The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. Other possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) 
or using a higher level library such as [aquero](https://uwdata.github.io/arquero/). 3. The transfer method between the main an worker thread for arrow tables is cumbersome, we would have to use another higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). Fortunately we can skip over this problem by not transferring the tables to the worker. 4. The iteration over stored rows is very slow compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. The solution to this issue and the transferring issue is splitting the relevant columns from the table (rows, columns, values, tags mask) and messaging them to the worker separately. This can be done with a very small overhead using the [getChild](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html#getChild) method and calling [toArray](https://arrow.apache.org/docs/js/classes/Arrow_dom.Vector.html#toArray) on the resulting vector. After being transferred, The buffers can be cached to speed up value access and filtering. -| name | duration (ms) [1] | duration (ms) [2] | detail | -| ------------------------- | ------------------ | ------------------ | --------------------------------------------------------------------------------------------------------- | -| typed iterate | 7.551699995994568 | 6.052600026130676 | iterating over two 1M typed arrays and calculating the sums | -| typed from table iterate | 6.4953999519348145 | 5.136900067329407 | iterating over two 1M typed arrays from Table columns and calculating the sums (time includes conversion) | -| vector iterate | 76.4708000421524 | 66.58230006694794 | iterating over two 1M Vectors and calculating the sums | -| table get() iterate | 1350.0404000282288 | 1030.582899928093 | iterating over the 1M Table using `table.get(rowIndex)` and calculating the sums | -| table [iterator] iterate | 1091.6706000566483 | 1011.069100022316 | iterating over the 1M Table using the [iterator] and calculating the sums | -| array from table iterate | 943.0076999664307 | 980.0875999927521 | iterating over the 1M Table after converting `toArray()` and calculating the sums | -| vector from table iterate | 965.2465000152588 | 1012.9023000001907 | iterating over the 1M Vector after converting the Table with `makeVector()` and calculating the sums | +| name | duration (ms) [1] | duration (ms) [2] | detail | +| ------------------------- | ----------------- | ----------------- | --------------------------------------------------------------------------------------------------------- | +| typed iterate | 7 | 6 | iterating over two 1M typed arrays and calculating the sums | +| typed from table iterate | 6 | 5 | iterating over two 1M typed arrays from Table columns and calculating the sums (time includes conversion) | +| vector iterate | 76 | 66 | iterating over two 1M Vectors and calculating the sums | +| table get() iterate | 1350 | 1030 | iterating over the 1M Table using `table.get(rowIndex)` and calculating the sums | +| table [iterator] iterate | 1091 | 1011 | iterating over the 1M Table using the [iterator] and calculating the sums | +| array from table iterate | 943 | 980 | iterating over the 1M Table after converting `toArray()` and calculating the sums | +| vector from table iterate | 965 | 1012 | iterating over the 1M Vector after converting the Table with `makeVector()` and calculating the sums | The memory impact is not very significant, amounting to 74.01MB for 1M dies compared with 44.65MB 
for the previously prototyped API. From f88db8739c007a05602dc69a6d3a8395901fd619 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Fri, 23 Feb 2024 16:00:47 +0200 Subject: [PATCH 08/15] moved alternatives and performance in the specific section --- .../src/wafer-map/specs/features/rendering.md | 53 +++++++++++++------ 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index c1dc976931..5eb6f53a1b 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -108,24 +108,15 @@ The metadata column will be stored in an wildcard typed column. This approach has the benefits of a row based format that aligns well with the existing public API, as well as a nice public API that easily allows future improvements. -The limits for this approach are the following: - -1. There seems to be no support for columns of lists of strings. We decided to overcome this using a bit mask of tags. Another possible solution can be a dynamic number of columns for storing tags, but the performance may suffer. -2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). Searching for dies based on their position is crucial for highlighting and sending the highlighted die metadata with the `die-hover` event. The solution we chose is using a custom method for finding rows based on column and row indexes cached as typed arrays. This method provides faster access to row values and metadata and does not induce additional dependencies. Other possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/). -3. The transfer method between the main an worker thread for arrow tables is cumbersome, we would have to use another higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). Fortunately we can skip over this problem by not transferring the tables to the worker. -4. The iteration over stored rows is very slow compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. The solution to this issue and the transferring issue is splitting the relevant columns from the table (rows, columns, values, tags mask) and messaging them to the worker separately. This can be done with a very small overhead using the [getChild](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html#getChild) method and calling [toArray](https://arrow.apache.org/docs/js/classes/Arrow_dom.Vector.html#toArray) on the resulting vector. After being transferred, The buffers can be cached to speed up value access and filtering. 
- -| name | duration (ms) [1] | duration (ms) [2] | detail | -| ------------------------- | ----------------- | ----------------- | --------------------------------------------------------------------------------------------------------- | -| typed iterate | 7 | 6 | iterating over two 1M typed arrays and calculating the sums | -| typed from table iterate | 6 | 5 | iterating over two 1M typed arrays from Table columns and calculating the sums (time includes conversion) | -| vector iterate | 76 | 66 | iterating over two 1M Vectors and calculating the sums | -| table get() iterate | 1350 | 1030 | iterating over the 1M Table using `table.get(rowIndex)` and calculating the sums | -| table [iterator] iterate | 1091 | 1011 | iterating over the 1M Table using the [iterator] and calculating the sums | -| array from table iterate | 943 | 980 | iterating over the 1M Table after converting `toArray()` and calculating the sums | -| vector from table iterate | 965 | 1012 | iterating over the 1M Vector after converting the Table with `makeVector()` and calculating the sums | +We are going to split the columns relevant to rendering from the table (rows, columns, values, tags mask) and transfer them to the worker separately. This can be done with a very small overhead using the method below on the resulting vector. After being transferred, the buffers can be cached to speed up value access and filtering. -The memory impact is not very significant, amounting to 74.01MB for 1M dies compared with 44.65MB for the previously prototyped API. +The same approach will be used when searching for the highlighted die metadata. + +```TS + const colIndex: Int32Array = diesTable.getChild('colIndex').toArray(); + const rowIndex: Int32Array = diesTable.getChild('rowIndex').toArray(); + ... +``` ### Rendering @@ -249,6 +240,34 @@ Pros of using Apache Arrow: Another option is to break each object property as a separate attribute for the wafer map component. This can also lead to increased complexity and confusion for the user which will need to pass several structured objects instead of a singular object. +#### Alternative Iteration and Filtering with Apache Arrow Table + +The limits for the apache arrow table approach are the following: + +1. There seems to be no support for columns of lists of strings. +2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). Searching for dies based on their position is crucial for highlighting and sending the highlighted die metadata with the `die-hover` event. +3. The transfer method between the main an worker thread for arrow tables is cumbersome. +4. The iteration over stored rows is very slow compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement. + +Alternatives for solving these problems are the following: + +1. A dynamic number of columns for storing tags, but the performance may suffer. +2. Possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/). +3. The use of a higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts) +4. 
In the following table are presented different iteration strategies over 1M long arrays, and how they compare with the chosen method and the basic typed array iteration: + +| name | duration (ms) [1] | duration (ms) [2] | detail | +| ------------------------- | ----------------- | ----------------- | --------------------------------------------------------------- | +| typed array | 7 | 6 | basic typed arrays iteration | +| typed array from table | 6 | 5 | typed arrays converted from Table columns | +| vector from typed array | 76 | 66 | arrow Vectors directly created from typed arrays | +| vector from table | 965 | 1012 | arrow Vector converted from the arrow Table with `makeVector()` | +| list array from table | 943 | 980 | list array converted from the arrow Table with `toArray()` | +| table get() | 1350 | 1030 | arrow Table using `table.get(rowIndex)` | +| table [iterator] | 1091 | 1011 | arrow Table using the [iterator] | + +The memory impact is not very significant, amounting to 74.01MB for 1M dies compared with 44.65MB for the previously prototyped API. + ### Alternative Rendering Alternatives to the described rendering are splitting the data and canvas and using multiple threads to enhance performance even more. This approach introduces the overhead of managing multiple canvases, splitting the dataset and handling any race conditions, which may not be needed if the single worker approach satisfies the performance requirements. From 1f61c923a881c4dc5618813ad2b97f4166a62fab Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Fri, 23 Feb 2024 16:26:10 +0200 Subject: [PATCH 09/15] lint --- .../src/wafer-map/specs/features/rendering.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index 5eb6f53a1b..df3743b0f8 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -256,15 +256,15 @@ Alternatives for solving these problems are the following: 3. The use of a higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts) 4. 
In the following table are presented different iteration strategies over 1M long arrays, and how they compare with the chosen method and the basic typed array iteration: -| name | duration (ms) [1] | duration (ms) [2] | detail | -| ------------------------- | ----------------- | ----------------- | --------------------------------------------------------------- | -| typed array | 7 | 6 | basic typed arrays iteration | -| typed array from table | 6 | 5 | typed arrays converted from Table columns | -| vector from typed array | 76 | 66 | arrow Vectors directly created from typed arrays | -| vector from table | 965 | 1012 | arrow Vector converted from the arrow Table with `makeVector()` | -| list array from table | 943 | 980 | list array converted from the arrow Table with `toArray()` | -| table get() | 1350 | 1030 | arrow Table using `table.get(rowIndex)` | -| table [iterator] | 1091 | 1011 | arrow Table using the [iterator] | +| name | duration (ms) [1] | duration (ms) [2] | detail | +| ----------------------- | ----------------- | ----------------- | --------------------------------------------------------------- | +| typed array | 7 | 6 | basic typed arrays iteration | +| typed array from table | 6 | 5 | typed arrays converted from Table columns | +| vector from typed array | 76 | 66 | arrow Vectors directly created from typed arrays | +| vector from table | 965 | 1012 | arrow Vector converted from the arrow Table with `makeVector()` | +| list array from table | 943 | 980 | list array converted from the arrow Table with `toArray()` | +| table get() | 1350 | 1030 | arrow Table using `table.get(rowIndex)` | +| table [iterator] | 1091 | 1011 | arrow Table using the [iterator] | The memory impact is not very significant, amounting to 74.01MB for 1M dies compared with 44.65MB for the previously prototyped API. From 542194cbfc83644a00b4fb9dc8b36da14c7a7534 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Fri, 23 Feb 2024 23:58:22 +0200 Subject: [PATCH 10/15] updated API with metadata and highlighted dies --- .../src/wafer-map/specs/features/rendering.md | 53 +++++++++++++------ 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index df3743b0f8..002e2a24ac 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -10,7 +10,6 @@ The proposed design should consider the following factors: - Minimize rendering time and improve overall performance - Measure and improve performance metrics - Maintain compatibility with existing design patterns and web standards -- Avoid introducing new requirements on clients or breaking any APIs - Address any potential impact on testing, documentation, security, and other relevant areas By addressing these challenges, we aim to enhance the rendering capabilities of our application and provide a smoother and more responsive user interface. @@ -87,30 +86,35 @@ The best solution to solve the API of the wafermap is to use Apache Arrow as the The Public API will be the following: ```TS -import { Table } from 'apache-arrow'; -export class WaferMap extends FoundationElement { +import { Table, TypeMap } from 'apache-arrow'; + +export interface WaferMapTableType extends TypeMap { + colIndex: Int32; + rowIndex: Int32; + value: Float32; +} + +export class WaferMap extends FoundationElement { ... 
-public diesTable: Table<{ - colIndex: Int32, - rowIndex: Int32, - value: Float32, - tags: Uint32; - metadata: never; - }> +public diesTable: Table | undefined; +public highlightedTable: Table | undefined; ... } ``` This will be the [Apache Arrow](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html) table schema. -The row and column indices will be `Int32` columns, the values will be `Float32` columns. -The tags for each die will be represented as a 32 bit mask stored in a `Uint32` column. -The metadata column will be stored in an wildcard typed column. +It will require at least three columns for the `diesTable`: -This approach has the benefits of a row based format that aligns well with the existing public API, as well as a nice public API that easily allows future improvements. +- The row and column indices will be `Int32` columns +- The values will be a `Float32` column. -We are going to split the columns relevant to rendering from the table (rows, columns, values, tags mask) and transfer them to the worker separately. This can be done with a very small overhead using the method below on the resulting vector. After being transferred, the buffers can be cached to speed up value access and filtering. +If there are more columns needed to store metadata or other values the schema will be extensible. -The same approach will be used when searching for the highlighted die metadata. +The `highlightedTable` will contain rows partially filled with values which will be used to filter the `diesTable` and enable highlighting. + +This approach has the benefits of a row based format that aligns well with the existing public API, as well as a nice public API that easily allows future improvements. It allows for more advanced filtering techniques such as using inner and outer joins for tables, slicing the tables to distribute values to separate workers and applying operations over whole columns. + +We are going to split the columns relevant to rendering from the table (rows, columns, values) and transfer them to the worker separately. This can be done with a very small overhead using the method below on the resulting vector. After being transferred, the buffers can be cached to speed up value access and filtering. ```TS const colIndex: Int32Array = diesTable.getChild('colIndex').toArray(); @@ -118,6 +122,8 @@ The same approach will be used when searching for the highlighted die metadata. ... ``` +When filtering the highlighted dies and searching for their metadata we will use [arquero](https://uwdata.github.io/arquero/) to perform joins and other operations involving the tables. + ### Rendering An alternate renderer inside a worker thread will be created to live in parallel in the wafer-map : @@ -252,7 +258,7 @@ The limits for the apache arrow table approach are the following: Alternatives for solving these problems are the following: 1. A dynamic number of columns for storing tags, but the performance may suffer. -2. Possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using a higher level library such as [aquero](https://uwdata.github.io/arquero/). +2. Possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using typed arrays and caching to speed up the search for the relevant columns. 3. The use of a higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts) 4. 
In the following table are presented different iteration strategies over 1M long arrays, and how they compare with the chosen method and the basic typed array iteration: @@ -284,6 +290,19 @@ We may also implement an external queue canceling functionality. ## Open Issues +### Rendering Iterating + +From preliminary tests it seems that typed array iteration is the most performant approach for rendering. +Further inquiries will be made of apache-arrow dev team to make sure the best approach. + +### Highlights and Metadata + +We decided to use [arquero](https://uwdata.github.io/arquero/) to filter highlighted dies and metadata. +This approach shows promise, but it may pose a risk. +If it will be apparent that it's not useful, we will resort to reusing and adapting the existing logic. + +### Progress Indicator + User Indication for [interactions in progress (>200ms)](https://web.dev/articles/inp) possibilities: - the wafer-map itself will show a spinner From 808184e300c9a44d427c5563c54708b8fca72012 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Mon, 26 Feb 2024 18:00:07 +0200 Subject: [PATCH 11/15] added angular/blazor support and detailed highlight --- .../src/wafer-map/specs/features/rendering.md | 25 +++++++++---------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index 002e2a24ac..e9e7613abc 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -86,31 +86,27 @@ The best solution to solve the API of the wafermap is to use Apache Arrow as the The Public API will be the following: ```TS -import { Table, TypeMap } from 'apache-arrow'; +import { Table } from 'apache-arrow'; -export interface WaferMapTableType extends TypeMap { - colIndex: Int32; - rowIndex: Int32; - value: Float32; -} - -export class WaferMap extends FoundationElement { +export class WaferMap extends FoundationElement { ... -public diesTable: Table | undefined; -public highlightedTable: Table | undefined; +public diesTable: Table | undefined; +public highlightedTable: Table | undefined; ... } ``` -This will be the [Apache Arrow](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html) table schema. +It will be using the [Apache Arrow Table](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html). It will require at least three columns for the `diesTable`: - The row and column indices will be `Int32` columns - The values will be a `Float32` column. -If there are more columns needed to store metadata or other values the schema will be extensible. +They will be checked at runtime and a `WaferMapValidity` flag will be raised signaling an `invalidTableInput`. -The `highlightedTable` will contain rows partially filled with values which will be used to filter the `diesTable` and enable highlighting. +If there are more columns needed to store metadata or other values the schema will be extensible. This will induce a breaking change in the API, as the metadata which was previously `unknown` will have to be recorded in table using the supported column types. + +The `highlightedTable` will have the same columns, but they will contain rows only partially filled with values, which will be used to filter the `diesTable` and enable highlighting. 
The values which are not empty on each individual row, including `colIndex`, `rowIndex`, `value` and others will be used to filter the table as an `AND` operation. Multiple rows will be used as filters with the `OR` operation. This approach has the benefits of a row based format that aligns well with the existing public API, as well as a nice public API that easily allows future improvements. It allows for more advanced filtering techniques such as using inner and outer joins for tables, slicing the tables to distribute values to separate workers and applying operations over whole columns. @@ -124,6 +120,9 @@ We are going to split the columns relevant to rendering from the table (rows, co When filtering the highlighted dies and searching for their metadata we will use [arquero](https://uwdata.github.io/arquero/) to perform joins and other operations involving the tables. +The [JavaScript implementation of Apache Arrow](https://arrow.apache.org/docs/js/index.html) provides TypeScript Types which will work in Angular applications. +The [C# implementation of Apache Arrow](https://github.com/apache/arrow/blob/main/csharp/README.md) is also providing support for reading Arrow IPC streams which can be used to convert inputs from Balzor. + ### Rendering An alternate renderer inside a worker thread will be created to live in parallel in the wafer-map : From aa46714eb79b06b674c64a78de9b2836e44b240a Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Tue, 27 Feb 2024 20:04:33 +0200 Subject: [PATCH 12/15] updated metadata and highlight --- .../src/wafer-map/specs/features/rendering.md | 49 ++++++++++++++++++- 1 file changed, 47 insertions(+), 2 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index e9e7613abc..0edaef64ac 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -104,9 +104,9 @@ It will require at least three columns for the `diesTable`: They will be checked at runtime and a `WaferMapValidity` flag will be raised signaling an `invalidTableInput`. -If there are more columns needed to store metadata or other values the schema will be extensible. This will induce a breaking change in the API, as the metadata which was previously `unknown` will have to be recorded in table using the supported column types. +The schema will be extensible. This will induce a breaking change in the API, as the metadata which was previously `unknown` will not recorded in table, but the hover event will reference an index which can e used by the client to select the metadata outside the component. -The `highlightedTable` will have the same columns, but they will contain rows only partially filled with values, which will be used to filter the `diesTable` and enable highlighting. The values which are not empty on each individual row, including `colIndex`, `rowIndex`, `value` and others will be used to filter the table as an `AND` operation. Multiple rows will be used as filters with the `OR` operation. +The `highlightedTable` will have the same columns, but they will contain rows only partially filled with values, which will be used to filter the `diesTable` and enable highlighting. The values which are not empty on each individual row, including `colIndex`, `rowIndex`, `value` and others will be used to filter the table as an `AND` operation. 
Multiple rows will be used as filters with the `OR` operation. More details regarding highlights will be discussed in an open issue. This approach has the benefits of a row based format that aligns well with the existing public API, as well as a nice public API that easily allows future improvements. It allows for more advanced filtering techniques such as using inner and outer joins for tables, slicing the tables to distribute values to separate workers and applying operations over whole columns. @@ -289,6 +289,51 @@ We may also implement an external queue canceling functionality. ## Open Issues +### Highlighting + +The current proposal is for the highlight table to be used as a filter for the main dies table. This can be realized by using the [`semijoin`](https://uwdata.github.io/arquero/api/verbs#semijoin) operation from the Arquero library. this will function as follows. + +The main table: + +| (index) | colIndex | rowIndex | value | firstTag | secondTag | +| ------- | -------- | -------- | ------------------ | -------- | --------- | +| 0 | 0 | 2 | 14.239999771118164 | a | b | +| 1 | 1 | 2 | 76.43000030517578 | b | c | +| 2 | 1 | 1 | 44.630001068115234 | g | null | +| 3 | 1 | 3 | 67.93000030517578 | a | null | +| 4 | 2 | 2 | 72.70999908447266 | h | e | +| 5 | 2 | 1 | 79.04000091552734 | b | null | +| 6 | 2 | 0 | 26.489999771118164 | c | null | +| 7 | 2 | 3 | 37.790000915527344 | null | null | +| 8 | 2 | 4 | 59.81999969482422 | null | null | +| 9 | 3 | 2 | 52.900001525878906 | null | null | +| 10 | 3 | 1 | 98.5 | g | null | +| 11 | 3 | 3 | 20.829999923706055 | c | null | +| 12 | 4 | 2 | 62.79999923706055 | g | null | + +The highlight table: + +| (index) | firstTag | +| ------- | -------- | +| 0 | a | +| 1 | b | +| 2 | c | + +The filtered table: + +| (index) | colIndex | rowIndex | value | firstTag | secondTag | +| ------- | -------- | -------- | ------------------ | -------- | --------- | +| 0 | 0 | 2 | 14.239999771118164 | a | b | +| 1 | 1 | 2 | 76.43000030517578 | b | c | +| 2 | 1 | 3 | 67.93000030517578 | a | null | +| 3 | 2 | 1 | 79.04000091552734 | b | null | +| 4 | 2 | 0 | 26.489999771118164 | c | null | +| 5 | 3 | 3 | 20.829999923706055 | c | null | + +The filter matched the rows with the same values from the highlight table. This can be used for tags when filtering value ranges, values themselves, column and row indexes or other types of supported data types. + +The details of the implementation and more refined filtering will be discussed. + ### Rendering Iterating From preliminary tests it seems that typed array iteration is the most performant approach for rendering. From 14e6e280d9bda0c4eb265f5a35ddaa449dd7ee15 Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Wed, 28 Feb 2024 14:35:52 +0200 Subject: [PATCH 13/15] updated open issues --- .../src/wafer-map/specs/features/rendering.md | 29 +++++++++++++------ 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index 0edaef64ac..b5c7792a87 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -89,10 +89,9 @@ The Public API will be the following: import { Table } from 'apache-arrow'; export class WaferMap extends FoundationElement { -... -public diesTable: Table | undefined; -public highlightedTable: Table | undefined; -... + ... + public diesTable: Table | undefined; + ... 
} ``` @@ -106,8 +105,6 @@ They will be checked at runtime and a `WaferMapValidity` flag will be raised sig The schema will be extensible. This will induce a breaking change in the API, as the metadata which was previously `unknown` will not recorded in table, but the hover event will reference an index which can e used by the client to select the metadata outside the component. -The `highlightedTable` will have the same columns, but they will contain rows only partially filled with values, which will be used to filter the `diesTable` and enable highlighting. The values which are not empty on each individual row, including `colIndex`, `rowIndex`, `value` and others will be used to filter the table as an `AND` operation. Multiple rows will be used as filters with the `OR` operation. More details regarding highlights will be discussed in an open issue. - This approach has the benefits of a row based format that aligns well with the existing public API, as well as a nice public API that easily allows future improvements. It allows for more advanced filtering techniques such as using inner and outer joins for tables, slicing the tables to distribute values to separate workers and applying operations over whole columns. We are going to split the columns relevant to rendering from the table (rows, columns, values) and transfer them to the worker separately. This can be done with a very small overhead using the method below on the resulting vector. After being transferred, the buffers can be cached to speed up value access and filtering. @@ -118,7 +115,7 @@ We are going to split the columns relevant to rendering from the table (rows, co ... ``` -When filtering the highlighted dies and searching for their metadata we will use [arquero](https://uwdata.github.io/arquero/) to perform joins and other operations involving the tables. +When filtering the highlighted dies and searching for their index we will use [arquero](https://uwdata.github.io/arquero/) to perform joins and other operations involving the tables. The [JavaScript implementation of Apache Arrow](https://arrow.apache.org/docs/js/index.html) provides TypeScript Types which will work in Angular applications. The [C# implementation of Apache Arrow](https://github.com/apache/arrow/blob/main/csharp/README.md) is also providing support for reading Arrow IPC streams which can be used to convert inputs from Balzor. @@ -222,7 +219,7 @@ class WaferData { dieRowIndexLayer: Int32Array; // the value of each die as a matrix row by row dieValuesLayer: Float64Array; - // the highlight approach is still undecided, we have two options: + // we have two options to highlight: // the highlight state of each die as a matrix; user will have to pre-calculate tags into highlighted conditions. dieHighlightsLayer: Int8Array; // a 32 bitset array of tags for each die; aligns more closely to the existing public api but limits users to 32 tags. @@ -289,7 +286,19 @@ We may also implement an external queue canceling functionality. ## Open Issues -### Highlighting +### Highlighting API + +```TS +import { Table } from 'apache-arrow'; + +export class WaferMap extends FoundationElement { + ... + public highlightedTable: Table | undefined; + ... +} +``` + +The `highlightedTable` will have the same columns as the `diesTable`, but they will contain rows only partially filled with values, which will be used to filter the `diesTable` and enable highlighting. 
The values which are not empty on each individual row, including `colIndex`, `rowIndex`, `value` and others will be used to filter the table as an `AND` operation. Multiple rows will be used as filters with the `OR` operation. More details regarding highlights will be discussed in an open issue. The current proposal is for the highlight table to be used as a filter for the main dies table. This can be realized by using the [`semijoin`](https://uwdata.github.io/arquero/api/verbs#semijoin) operation from the Arquero library. this will function as follows. @@ -334,6 +343,8 @@ The filter matched the rows with the same values from the highlight table. This The details of the implementation and more refined filtering will be discussed. +Anther option is using the existing `highlightedTags` API with a `List` column in the table [(listed as supported)](https://arrow.apache.org/docs/status.html). + ### Rendering Iterating From preliminary tests it seems that typed array iteration is the most performant approach for rendering. From e2e20be9f9e844a5363c5007cfded2ca0ebd0a7f Mon Sep 17 00:00:00 2001 From: Natan Muntean Date: Wed, 28 Feb 2024 21:53:01 +0200 Subject: [PATCH 14/15] applied suggestions --- .../src/wafer-map/specs/features/rendering.md | 33 ++++++++++++++----- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md index b5c7792a87..c208423fe0 100644 --- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md +++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md @@ -98,8 +98,8 @@ export class WaferMap extends FoundationElement { It will be using the [Apache Arrow Table](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html). It will require at least three columns for the `diesTable`: -- The row and column indices will be `Int32` columns -- The values will be a `Float32` column. +- The `rowIndex` and `colIndex` will be `Int32` columns +- The `value` will be a `Float64` column. They will be checked at runtime and a `WaferMapValidity` flag will be raised signaling an `invalidTableInput`. @@ -118,7 +118,7 @@ We are going to split the columns relevant to rendering from the table (rows, co When filtering the highlighted dies and searching for their index we will use [arquero](https://uwdata.github.io/arquero/) to perform joins and other operations involving the tables. The [JavaScript implementation of Apache Arrow](https://arrow.apache.org/docs/js/index.html) provides TypeScript Types which will work in Angular applications. -The [C# implementation of Apache Arrow](https://github.com/apache/arrow/blob/main/csharp/README.md) is also providing support for reading Arrow IPC streams which can be used to convert inputs from Balzor. +The [C# implementation of Apache Arrow](https://github.com/apache/arrow/blob/main/csharp/README.md) is also providing support for reading Arrow IPC streams which can be used to convert inputs from Blazor. ### Rendering @@ -246,16 +246,16 @@ Another option is to break each object property as a separate attribute for the The limits for the apache arrow table approach are the following: -1. There seems to be no support for columns of lists of strings. -2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). 
Searching for dies based on their position is crucial for highlighting and sending the highlighted die metadata with the `die-hover` event.
-3. The transfer method between the main an worker thread for arrow tables is cumbersome.
+1. Apache Arrow documentation shows [list types are supported](https://arrow.apache.org/docs/status.html) but more research is needed to understand their usage, whether they are useful in Tables, and their performance if used.
+2. There is no support currently for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). Searching for dies based on their position is crucial for highlighting and sending the highlighted die metadata with the `die-hover` event. More research is needed to see if alternative libraries can be used for filtering / data analysis. ([POC with arquero](https://stackblitz.com/edit/geoarrow-worker-arquero-demo?file=src%2Fmain.ts)).
+3. Apache Arrow does not yet have first-class support for efficiently transferring Arrow data structures to workers. When asked, they said they are [supportive of adding the APIs](https://github.com/apache/arrow/issues/39017#issuecomment-1955653556).
 4. The iteration over stored rows is very slow compared to typed arrays as seen in the table below. This impacts the goals we set for this rendering improvement.

 Alternatives for solving these problems are the following:

 1. A dynamic number of columns for storing tags, but the performance may suffer.
 2. Possible solutions for this are searching by iterating over the whole table, which is not feasible (see 4.) or using typed arrays and caching to speed up the search for the relevant columns.
-3. The use of a higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts)
+3. The use of a higher level library [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts). ([POC](https://stackblitz.com/edit/geoarrow-worker-arquero-demo?file=src%2Fmain.ts)).
 4. In the following table are presented different iteration strategies over 1M long arrays, and how they compare with the chosen method and the basic typed array iteration:

 | name | duration (ms) [1] | duration (ms) [2] | detail |
@@ -343,7 +343,24 @@ The filter matched the rows with the same values from the highlight table. This

 The details of the implementation and more refined filtering will be discussed.

-Anther option is using the existing `highlightedTags` API with a `List` column in the table [(listed as supported)](https://arrow.apache.org/docs/status.html).
+Another option is using the existing `highlightedTags` API with a `List` column in the table [(listed as supported)](https://arrow.apache.org/docs/status.html).
+
+Specific open questions:
+
+- For a highlightedTags table API:
+    - Should the API be constrained to the [supported arrow types](https://arrow.apache.org/docs/status.html)?
+    - Should just primitive types be supported?
+    - Should just types supported in the JavaScript and C# languages be supported?
+    - Should just types well-supported in all existing implementations be supported?
+    - Would certain types, e.g. strings, lead to poor performance and be discouraged?
+    - Are there implementation challenges transferring buffers of arbitrary types across the Web Worker boundary?
+    - Should the API be constrained to a set of columns that participate in highlighting instead of all columns? Maybe columns with a specific name prefix like `highlighted_`?
+    - Are there real known benefits for specifying per floating point `value`? Or specific `rowIndex` / `columnIndex` independently?
+    - Does the highlightedTable need to contain all columns used in diesTable? Can it just be a subset of columns?
+- For a tags columns API:
+    - Do columns of List work in tables?
+    - Do dictionary columns work in tables to improve efficiency compared to List?
+    - What is the performance of a List / Dictionary column API compared to the alternatives?

 ### Rendering Iterating

 From preliminary tests it seems that typed array iteration is the most performant approach for rendering.

From 477358e98ba641e21f030415305f3905ff4b65b0 Mon Sep 17 00:00:00 2001
From: Natan Muntean
Date: Thu, 29 Feb 2024 09:13:00 +0200
Subject: [PATCH 15/15] typo fix and removed mention

---
 .../src/wafer-map/specs/features/rendering.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/packages/nimble-components/src/wafer-map/specs/features/rendering.md b/packages/nimble-components/src/wafer-map/specs/features/rendering.md
index c208423fe0..517ba80f8a 100644
--- a/packages/nimble-components/src/wafer-map/specs/features/rendering.md
+++ b/packages/nimble-components/src/wafer-map/specs/features/rendering.md
@@ -103,7 +103,7 @@ It will require at least three columns for the `diesTable`:

 They will be checked at runtime and a `WaferMapValidity` flag will be raised signaling an `invalidTableInput`.

-The schema will be extensible. This will induce a breaking change in the API, as the metadata which was previously `unknown` will not recorded in table, but the hover event will reference an index which can e used by the client to select the metadata outside the component.
+The schema will be extensible. This will induce a breaking change in the API, as the metadata which was previously `unknown` will not be recorded in the table, but the hover event will reference an index which can be used by the client to select the metadata outside the component.

 This approach has the benefits of a row based format that aligns well with the existing public API, as well as a nice public API that easily allows future improvements. It allows for more advanced filtering techniques such as using inner and outer joins for tables, slicing the tables to distribute values to separate workers and applying operations over whole columns.

@@ -115,8 +115,6 @@ We are going to split the columns relevant to rendering from the table (rows, co

 ```
...
 ```

-When filtering the highlighted dies and searching for their index we will use [arquero](https://uwdata.github.io/arquero/) to perform joins and other operations involving the tables.
-
 The [JavaScript implementation of Apache Arrow](https://arrow.apache.org/docs/js/index.html) provides TypeScript Types which will work in Angular applications.
 The [C# implementation of Apache Arrow](https://github.com/apache/arrow/blob/main/csharp/README.md) is also providing support for reading Arrow IPC streams which can be used to convert inputs from Blazor.
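To make the proposed flow easier to picture, the sketch below shows how a client could assemble a `diesTable` with the Arrow JS helpers and how the rendering columns could be split into transferable typed arrays for the worker. It is a minimal sketch, not part of the spec: the sample values, the `wafer-worker.js` script name and the exact message shape are assumptions.

```TS
import { tableFromArrays } from 'apache-arrow';

// Build a diesTable with the three required columns; the column types are
// inferred from the typed arrays (Int32 for the indices, Float64 for the values).
const diesTable = tableFromArrays({
    colIndex: Int32Array.from([0, 1, 1, 2]),
    rowIndex: Int32Array.from([2, 2, 1, 2]),
    value: Float64Array.from([14.24, 76.43, 44.63, 72.71])
});

// Split the rendering-relevant columns into typed arrays and hand their buffers
// to the worker as transferables instead of posting the whole table.
const colIndexes = diesTable.getChild('colIndex')!.toArray();
const rowIndexes = diesTable.getChild('rowIndex')!.toArray();
const values = diesTable.getChild('value')!.toArray();

const worker = new Worker('wafer-worker.js'); // hypothetical worker script
worker.postMessage({ colIndexes, rowIndexes, values }, [
    colIndexes.buffer,
    rowIndexes.buffer,
    values.buffer
]);
```

For the highlighting open issue, a minimal Arquero sketch of the `semijoin` filtering, assuming the small example tables from the open issue above (in the component the Arrow tables could first be wrapped with Arquero's `fromArrow` before joining):

```TS
import { table } from 'arquero';

// Dies with their tags, mirroring the first rows of the example main table (values rounded).
const dies = table({
    colIndex: [0, 1, 1, 1],
    rowIndex: [2, 2, 1, 3],
    value: [14.24, 76.43, 44.63, 67.93],
    firstTag: ['a', 'b', 'g', 'a']
});

// Highlight table containing only the tag values to match.
const highlights = table({ firstTag: ['a', 'b', 'c'] });

// semijoin keeps the dies rows that have a match in the highlight table;
// with no explicit keys it joins on the shared column names, here `firstTag`.
const highlighted = dies.semijoin(highlights);
```

Because `semijoin` only keeps matching rows from the left table, the highlight table never needs to carry the full die data, only the columns used for matching.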