
Conversation

@mzient
Contributor

@mzient mzient commented Jul 6, 2021

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

Why do we need this PR?


  • It adds a new feature needed for sample masking/selection/conditional execution

What happened in this PR?


  • What solution was applied:
    • A simple operator which just copies one of the inputs to the output
  • Affected modules and functionalities:
    • None (it's new code)
  • Key points relevant for the review:
    • N/A
  • Validation and testing:
    • Python unit tests for success
    • Python unit tests for errors
    • Local tests with Valgrind
  • Documentation (including examples):
    • Docstrings

JIRA TASK: DALI-2202

@mzient mzient requested a review from a team July 6, 2021 23:26
@dali-automaton
Collaborator

CI MESSAGE: [2553368]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2553418]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2553418]: BUILD FAILED

@jantonguirao jantonguirao marked this pull request as draft July 7, 2021 10:25
namespace dali {

DALI_SCHEMA(Select)
.DocStr(R"(Builds a batch by selecting each sample from one of the input batches.
Contributor

Suggested change
.DocStr(R"(Builds a batch by selecting each sample from one of the input batches.
.DocStr(R"(Builds an output by selecting each sample from one of the inputs.

I don't think we need to stress that the input/output is a batch of samples.

.DocStr(R"(Builds a batch by selecting each sample from one of the input batches.
This operator is useful for conditionally selecting results of different operations.
The shapes of the corresponding samples in the inputs may differ, but the number of dimensions
Contributor

How about layouts?

Contributor Author

Done. See below.
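
For illustration, here is a minimal sketch of how the operator described by this docstring might be used from Python: each output sample is copied from one of the inputs, chosen by a run-time provided index. The sketch is modeled on the test code further down in this PR; fn.select, its input_idx argument, and the get_data helper are assumptions taken from this PR and may not match any released DALI version.

import numpy as np
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import Pipeline

def get_data():
    # hypothetical source callable: a batch of 4 random HWC samples
    return [np.random.rand(8, 8, 3).astype(np.float32) for _ in range(4)]

pipe = Pipeline(batch_size=4, num_threads=3, device_id=0)
data1 = fn.external_source(source=get_data, layout="HWC")
data2 = fn.external_source(source=get_data, layout="HWC")
# per-sample index choosing which input the output sample is copied from
idx = fn.random.uniform(range=[0, 1], dtype=types.INT32)
pipe.set_outputs(fn.select(data1, data2, input_idx=idx))
pipe.build()
out, = pipe.run()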

Comment on lines 33 to 34
sample is taken.
Providing a negative index will produce an empty tensor with the same number of dimensions as
Contributor

Suggested change
sample is taken.
Providing a negative index will produce an empty tensor with the same number of dimensions as
sample is taken.

Providing a negative index will produce an empty tensor with the same number of dimensions as

If you want to have a newline.

Comment on lines +63 to +72
char *out_ptr = static_cast<char*>(out.raw_mutable_tensor(i));
const char *in_ptr = static_cast<const char*>(inp.raw_tensor(i));
ptrdiff_t start = 0;
for (int block = 0; block < blocks; block++) {
  ptrdiff_t end = sample_size * (block + 1) / blocks;
  tp.AddWork([in_ptr, out_ptr, start, end](int) {
    memcpy(out_ptr + start, in_ptr + start, end - start);
  }, end - start);
  start = end;
}
Contributor

I think this is a similar pattern to the one used in the numpy reader. Maybe we can have a function for this (even MakeContiguous could use it in the future).

Contributor Author

@mzient mzient Jul 7, 2021

I guess we can factor it out when there are more usages of this kind of copy. NumpyReader is a bit more involved, though.
Regarding MakeContiguous - we won't have cpu2cpu (mixed) MakeContiguous when we unify workspaces and buffer objects.
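
As a side note, here is a minimal plain-Python sketch (not DALI code) of the block-partitioning arithmetic in the snippet above: taking end = sample_size * (block + 1) // blocks produces contiguous, near-equal chunks that always cover the whole sample, whether or not the size divides evenly.

def partition(sample_size, blocks):
    # mirrors the loop above: each chunk is [start, end) and the chunks tile [0, sample_size)
    chunks = []
    start = 0
    for block in range(blocks):
        end = sample_size * (block + 1) // blocks
        chunks.append((start, end))
        start = end
    return chunks

print(partition(10, 3))  # [(0, 3), (3, 6), (6, 10)]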

"All inputs must have the same type. "
"Got: ", inp0.type().id(), " and ", inp.type().id()));

DALI_ENFORCE(inp.sample_dim() == sample_dim, make_string(
Contributor

Shouldn't we check layouts as well?

Contributor Author

No; the semantics are that we pick the first non-empty layout - this sort-of implies that they can differ - and there are checks in the executor that require the non-empty input layouts to be of correct length.
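
A tiny sketch of the rule described above (a hypothetical helper for illustration, not actual DALI source): the output takes the first non-empty layout among the inputs, so inputs may carry differing or empty layout strings.

def pick_output_layout(input_layouts):
    # first non-empty layout wins; empty layout if none of the inputs has one
    for layout in input_layouts:
        if layout:
            return layout
    return ""

print(pick_output_layout(["", "HWC", "CHW"]))  # "HWC"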

return out
check_single_input(fn.squeeze, axis_names="YZ", get_data=get_data, input_layout="HWCYZ")

def test_select_cpu():
Contributor

I think you need to update test_dali_variable_batch_size.py as well.

Contributor Author

@mzient mzient Jul 7, 2021

The regular test uses variable batch size. I don't think we need to double that.

Contributor

We need to, as test_dali_variable_batch_size.py checks whether all operators are tested. There is no other way to enforce this kind of test.

data = fn.external_source(source=get_data, layout="HWC")
data2 = fn.external_source(source=get_data, layout="HWC")
data3 = fn.external_source(source=get_data, layout="HWC")
idx = fn.random.uniform(range=[0, 3], dtype=types.INT32)
Contributor

Suggested change
idx = fn.random.uniform(range=[0, 3], dtype=types.INT32)
idx = fn.random.uniform(range=[0, 2], dtype=types.INT32)

Contributor Author

Yup.


from numpy import random
from numpy.core.fromnumeric import shape
from nvidia.dali import Pipeline
Contributor

Why not use decorator?

Contributor Author

The decorator makes it easier to have a function that returns a pipeline (it does away with forwarding arguments, with pipe, set_outputs, etc.). Here there's no such function, and forwarding arguments would likely make the code more repetitive.
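
For context, a short sketch contrasting the two styles discussed here; fn.select is the operator proposed in this PR, the rest is standard DALI API:

import numpy as np
import nvidia.dali.fn as fn
from nvidia.dali import Pipeline, pipeline_def

# Decorator style: convenient when there is a reusable pipeline-building function.
@pipeline_def(batch_size=1, num_threads=3, device_id=0)
def select_pipe(idx):
    return fn.select(np.float32([0, 1]), np.float32([2, 3, 4]), input_idx=idx)

pipe_a = select_pipe(idx=1)

# Explicit style, as used in these tests: no wrapper function or argument forwarding needed.
pipe_b = Pipeline(1, 3, 0)
pipe_b.set_outputs(fn.select(np.float32([0, 1]), np.float32([2, 3, 4]), input_idx=1))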

def test_error_inconsistent_ndim():
    def check(device):
        pipe = Pipeline(1, 3, 0)
        pipe.set_outputs(fn.select(np.float32([0,1]), np.float32([[2,3,4]]), input_idx=2))
Contributor

Suggested change
pipe.set_outputs(fn.select(np.float32([0,1]), np.float32([[2,3,4]]), input_idx=2))
pipe.set_outputs(fn.select(np.float32([0,1]), np.float32([[2,3,4]]), input_idx=1))

To make sure that the dimension problem is the only one and the order of error reporting doesn't matter here.

Contributor Author

True.

def test_error_inconsistent_type():
    def check(device):
        pipe = Pipeline(1, 3, 0)
        pipe.set_outputs(fn.select(np.float32([0,1]), np.int32([2,3,4]), input_idx=2))
Contributor

Suggested change
pipe.set_outputs(fn.select(np.float32([0,1]), np.int32([2,3,4]), input_idx=2))
pipe.set_outputs(fn.select(np.float32([0,1]), np.float32([[2,3,4]]), input_idx=1))

To make sure that the dimension problem is the only one and the order of error reporting doesn't matter here.

Contributor Author

It's about type:

Suggested change
pipe.set_outputs(fn.select(np.float32([0,1]), np.int32([2,3,4]), input_idx=2))
pipe.set_outputs(fn.select(np.float32([0,1]), np.int32([2,3,4]), input_idx=1))

Contributor

Right. Just a copy paste from the previous example.

np.random.seed(1234)

def generate_data(ndim, ninp, type, max_batch_size):
    batch_size = np.random.randint(1, max_batch_size+1)
Contributor Author

Variable batch size - voila!
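
A hedged sketch (not the PR's actual test code) of the pattern: the source callable draws a new batch size on every call, so feeding it through external_source exercises variable batch sizes without a separate test.

import numpy as np

max_batch_size = 8

def get_data():
    # a different batch size on every iteration, never exceeding max_batch_size
    batch_size = np.random.randint(1, max_batch_size + 1)
    return [np.random.rand(4, 4, 3).astype(np.float32) for _ in range(batch_size)]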

@dali-automaton
Collaborator

CI MESSAGE: [2558517]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2558517]: BUILD FAILED

@NVIDIA NVIDIA deleted a comment from dali-automaton Jan 27, 2022
@dali-automaton
Collaborator

CI MESSAGE: [3840109]: BUILD STARTED
