Update spec with wafer map API decision #1834

Merged (19 commits) on Feb 29, 2024
{
"type": "none",
"comment": "Update rendering spec for Wafer Map component with API changes",
"packageName": "@ni/nimble-components",
"email": "33986780+munteannatan@users.noreply.github.com",
"dependentChangeType": "none"
}
packages/nimble-components/src/wafer-map/specs/features/rendering.md

The proposed design should consider the following factors:
- Minimize rendering time and improve overall performance
- Measure and improve performance metrics
- Maintain compatibility with existing design patterns and web standards
- Address any potential impact on testing, documentation, security, and other relevant areas

By addressing these challenges, we aim to enhance the rendering capabilities of our application and provide a smoother and more responsive user interface.
The POC is found in this branch [Worker Rendering POC](https://github.com/ni/nim

### Data Structure and Interface

The chosen solution for the wafer map API is to use Apache Arrow tables as the component's public API and Typed Arrays as the worker API, for their iteration performance and transferability to worker threads.

The Public API will be the following:

```TS
import { Float32, Int32, Table, TypeMap } from 'apache-arrow';

export interface WaferMapTableType extends TypeMap {
    colIndex: Int32;
    rowIndex: Int32;
    value: Float32;
}

export class WaferMap<T extends WaferMapTableType> extends FoundationElement {
    ...
    public diesTable: Table<T> | undefined;
    public highlightedTable: Table<T> | undefined;
    ...
}
```

This will be the [Apache Arrow](https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html) table schema.
It will require at least three columns for the `diesTable`:

This API will have [optimized byte-array interop from Blazor](https://learn.microsoft.com/en-us/dotnet/core/compatibility/aspnet-core/6.0/byte-array-interop) and should be supported by Angular as a [vanilla javascript feature](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer).
- The row and column indices will be `Int32` columns
- The values will be a `Float32` column.

If there are more columns needed to store metadata or other values the schema will be extensible.

The `highlightedTable` will contain rows partially filled with values which will be used to filter the `diesTable` and enable highlighting.

This approach has the benefit of a row-based format that aligns well with the existing public API, and a pleasant public API that leaves room for future improvements. It also enables more advanced filtering techniques, such as inner and outer joins between tables, slicing tables to distribute values to separate workers, and applying operations over whole columns.

We are going to split the columns relevant to rendering from the table (rows, columns, values) and transfer them to the worker separately. This can be done with a very small overhead using the method below on the resulting vector. After being transferred, the buffers can be cached to speed up value access and filtering.

```TS
const colIndex: Int32Array = diesTable.getChild('colIndex')!.toArray();
const rowIndex: Int32Array = diesTable.getChild('rowIndex')!.toArray();
...
```

When filtering the highlighted dies and searching for their metadata we will use [arquero](https://uwdata.github.io/arquero/) to perform joins and other operations involving the tables.

### Rendering

The current expectation is for a singular wafer component to be displayed on the page.

### Alternative Data Structures and Interfaces

The alternative to using Apache Arrow tables is an in-house solution:

```TS
class WaferData {
    // the x coordinates of each column of dies
    dieColIndexArray: Int32Array;
    // the lengths of each row of dies
    rowLengthsArray: Int32Array;
    // the y coordinates of each die as a matrix row by row
    dieRowIndexLayer: Int32Array;
    // the value of each die as a matrix row by row
    dieValuesLayer: Float64Array;
    // the highlight approach is still undecided, we have two options:
    // option A: the highlight state of each die as a matrix; the user would pre-calculate tags into highlighted conditions
    dieHighlightsLayer: Int8Array;
    // option B: a 32-bit bitset of tags for each die; aligns more closely with the existing public API but limits users to 32 tags
    // dieHighlightsLayer: Int32Array;
    // metadata array for each die; it will not be sent to the worker
    metadata: unknown[];
}
```

Using TypedArrays has the benefit of direct transfer to web workers: the underlying ArrayBuffers are transferred without structured cloning, and the object is reconstructed on the worker side. Other benefits of TypedArrays include low access time when iterating over values, better memory efficiency, and faster direct access to metadata layer values. The previous inputs can be adapted to this new structure to maintain backwards compatibility.

This API will have [optimized byte-array interop from Blazor](https://learn.microsoft.com/en-us/dotnet/core/compatibility/aspnet-core/6.0/byte-array-interop) and should be supported by Angular as a [vanilla javascript feature](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer).
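The transfer-and-reconstruct flow described above could look like this sketch (the `worker` instance and the example values are hypothetical):

```TS
// Main thread: an in-house WaferData-style object backed by typed arrays.
const data = {
    dieColIndexArray: new Int32Array([0, 1]),
    rowLengthsArray: new Int32Array([2]),
    dieRowIndexLayer: new Int32Array([0, 0]),
    dieValuesLayer: new Float64Array([99.5, 80.5])
};

// Move the underlying buffers instead of structured-cloning them.
const buffers = [
    data.dieColIndexArray.buffer,
    data.rowLengthsArray.buffer,
    data.dieRowIndexLayer.buffer,
    data.dieValuesLayer.buffer
];
// worker.postMessage(data, buffers);

// Worker thread: rebuild typed array views over the received buffers, e.g.:
// self.onmessage = e => {
//     const dieValuesLayer = new Float64Array(e.data.dieValuesLayer.buffer);
//     ...
// };
```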

Pros of using Apache Arrow:

- A row-based format that aligns well with the existing public API
- A well supported and tested format
- A pleasant public API to use; we don't have to invent a new format, just document our schema for the Arrow tables
- Designed for large dataset visualizations

Another option is to break each object property into a separate attribute of the wafer map component. This can lead to increased complexity and confusion for the user, who would need to pass several structured objects instead of a single one.

#### Alternative Iteration and Filtering with Apache Arrow Table

The limits of the Apache Arrow table approach are the following:

1. There seems to be no support for columns of lists of strings.
2. There is currently no support for [searching or filtering the table](https://github.com/apache/arrow/issues/13233). Searching for dies based on their position is crucial for highlighting and for sending the highlighted die metadata with the `die-hover` event.
3. The transfer method between the main and worker threads for Arrow tables is cumbersome.
4. Iteration over stored rows is very slow compared to typed arrays, as seen in the table below. This impacts the goals we set for this rendering improvement.

Alternatives for solving these problems are the following:

1. A dynamic number of columns for storing tags, but performance may suffer.
2. Searching by iterating over the whole table, which is not feasible (see 4.), or using typed arrays and caching to speed up the search over the relevant columns.
3. Using a higher level library, [geoarrow](https://github.com/geoarrow/geoarrow-js/blob/main/src/worker/transferable.ts).
4. The following table presents different iteration strategies over 1M-long arrays and how they compare with the chosen method and basic typed array iteration:

| name | duration (ms) [1] | duration (ms) [2] | detail |
| ----------------------- | ----------------- | ----------------- | --------------------------------------------------------------- |
| typed array | 7 | 6 | basic typed arrays iteration |
| typed array from table | 6 | 5 | typed arrays converted from Table columns |
| vector from typed array | 76 | 66 | arrow Vectors directly created from typed arrays |
| vector from table | 965 | 1012 | arrow Vector converted from the arrow Table with `makeVector()` |
| list array from table | 943 | 980 | list array converted from the arrow Table with `toArray()` |
| table get() | 1350 | 1030 | arrow Table using `table.get(rowIndex)` |
| table [iterator] | 1091 | 1011 | arrow Table using the [iterator] |

The memory impact is not very significant, amounting to 74.01MB for 1M dies compared with 44.65MB for the previously prototyped API.

### Alternative Rendering

Alternatives to the described rendering are splitting the data and canvas and using multiple threads to enhance performance even more. This approach introduces the overhead of managing multiple canvases, splitting the dataset and handling any race conditions, which may not be needed if the single worker approach satisfies the performance requirements.
Expand All @@ -228,13 +290,27 @@ We may also implement an external queue canceling functionality.

## Open Issues

### Rendering Iteration

Preliminary tests indicate that typed array iteration is the most performant approach for rendering.
We will follow up with the apache-arrow dev team to confirm the best approach.

### Highlights and Metadata

We decided to use [arquero](https://uwdata.github.io/arquero/) to filter highlighted dies and metadata.
This approach shows promise, but it may pose a risk.
If it turns out not to be useful, we will resort to reusing and adapting the existing logic.

### Progress Indicator

Possibilities for indicating [interactions in progress (>200ms)](https://web.dev/articles/inp) to the user:

- the wafer-map itself will show a spinner
- the wafer-map will fire an event to notify the app to present an indication that work is in progress
- the wafer-map will use bitmap scaling in addition to a spinner
- the wafer-map will show the spinner / fire the event either immediately or only after a delay, for example 200ms
- the renderer will report progress for larger wait times
- the rendering will be done sequentially in animation frames so the user will see the progress at 60Hz
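The last option, sequential rendering in animation frames, could be sketched as follows. `renderInChunks` and its parameters are illustrative, not part of the spec; the frame scheduler is injected so that in the browser it would be `requestAnimationFrame`:

```TS
type ScheduleFrame = (callback: () => void) => void;

// Draw dies in fixed-size batches, yielding to the scheduler between batches
// so the page can repaint and report progress while rendering.
function renderInChunks(
    dieCount: number,
    drawDie: (index: number) => void,
    schedule: ScheduleFrame, // in the browser: cb => requestAnimationFrame(cb)
    chunkSize = 10_000,
    onProgress?: (done: number) => void
): void {
    let next = 0;
    const step = (): void => {
        const end = Math.min(next + chunkSize, dieCount);
        for (; next < end; next++) {
            drawDie(next);
        }
        onProgress?.(next);
        if (next < dieCount) {
            schedule(step);
        }
    };
    schedule(step);
}
```

In the browser this would be invoked as `renderInChunks(diesTable.numRows, drawDie, cb => requestAnimationFrame(cb))`, with `onProgress` driving the spinner or progress event.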

A follow-on HLD update will specify the approved decision.
