Add operator select which selects a sample from one of the inputs (#3130)
base: main
Conversation
Selects a sample from one of the inputs, based on a run-time provided index.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
CI MESSAGE: [2553368]: BUILD STARTED

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

CI MESSAGE: [2553418]: BUILD STARTED

CI MESSAGE: [2553418]: BUILD FAILED
dali/operators/generic/select.cc
namespace dali {

DALI_SCHEMA(Select)
  .DocStr(R"(Builds a batch by selecting each sample from one of the input batches.
| .DocStr(R"(Builds a batch by selecting each sample from one of the input batches. | |
| .DocStr(R"(Builds an output by selecting each sample from one of the inputs. |
I don't think we need to stress out that the input/output is batch of samples.
| .DocStr(R"(Builds a batch by selecting each sample from one of the input batches. | ||
| This operator is useful for conditionally selecting results of different operations. | ||
| The shapes of the corresponding samples in the inputs may differ, but the number of dimensions |
How about layouts?
Done. See below.
sample is taken.
Providing a negative index will produce an empty tensor with the same number of dimensions as
Suggested change (insert a blank line between the two lines):
  sample is taken.

  Providing a negative index will produce an empty tensor with the same number of dimensions as

If you want to have a newline.
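Stepping back from the wording, here is a minimal sketch of how the operator discussed here could be used from Python. It assumes the PR exposes it as `fn.select` with a per-sample `input_idx` argument, as in the tests further down; the data and shapes are illustrative only.

```python
import numpy as np
from nvidia.dali import Pipeline
import nvidia.dali.fn as fn
import nvidia.dali.types as types

# Sketch only: fn.select and its input_idx argument come from this PR.
pipe = Pipeline(batch_size=4, num_threads=3, device_id=0)
with pipe:
    # Two inputs; corresponding samples may have different shapes,
    # but must have the same number of dimensions and the same type.
    a = fn.external_source(source=lambda: [np.float32([1, 2, 3])] * 4)
    b = fn.external_source(source=lambda: [np.float32([4, 5])] * 4)
    # Per-sample index: 0 selects from `a`, 1 selects from `b`;
    # a negative index would produce an empty tensor of the same dimensionality.
    idx = fn.random.uniform(range=[0, 2], dtype=types.INT32)
    pipe.set_outputs(fn.select(a, b, input_idx=idx))
```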
char *out_ptr = static_cast<char*>(out.raw_mutable_tensor(i));
const char *in_ptr = static_cast<const char*>(inp.raw_tensor(i));
ptrdiff_t start = 0;
for (int block = 0; block < blocks; block++) {
  ptrdiff_t end = sample_size * (block + 1) / blocks;
  tp.AddWork([in_ptr, out_ptr, start, end](int) {
    memcpy(out_ptr + start, in_ptr + start, end - start);
  }, end - start);
  start = end;
}
I think a similar pattern is used in the numpy reader. Maybe we can have a function for this (even MakeContiguous could use it in the future).
I guess we can factor it out when there are more usages of this kind of copy. NumpyReader is a bit more involved, though.
Regarding MakeContiguous: we won't have a cpu2cpu (mixed) MakeContiguous once we unify workspaces and buffer objects.
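If this ever does get factored out, one possible shape for such a helper, mirroring the AddWork call used above; the helper name and the template over the thread-pool type are assumptions, not code from this PR.

```cpp
#include <cstddef>
#include <cstring>

// Sketch only: split one contiguous copy into `blocks` chunks and schedule
// each chunk on the thread pool, exactly like the loop in the snippet above.
// TP is assumed to provide AddWork(callable, priority), as used in this PR.
template <typename TP>
void AddBlockedCopy(TP &tp, void *dst, const void *src,
                    ptrdiff_t size, int blocks) {
  char *out_ptr = static_cast<char *>(dst);
  const char *in_ptr = static_cast<const char *>(src);
  ptrdiff_t start = 0;
  for (int block = 0; block < blocks; block++) {
    ptrdiff_t end = size * (block + 1) / blocks;
    tp.AddWork([in_ptr, out_ptr, start, end](int) {
      std::memcpy(out_ptr + start, in_ptr + start, end - start);
    }, end - start);
    start = end;
  }
}
```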
| "All inputs must have the same type. " | ||
| "Got: ", inp0.type().id(), " and ", inp.type().id())); | ||
|
|
||
| DALI_ENFORCE(inp.sample_dim() == sample_dim, make_string( |
Shouldn't we check layouts as well?
No; the semantics are that we pick the first non-empty layout, which sort of implies that they can differ, and there are checks in the executor that require the non-empty input layouts to have the correct length.
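For reference, a short sketch of the "first non-empty layout" rule described here; `num_inputs` and `GetInputLayout(i)` are placeholders for however the operator actually queries its inputs, not the real accessors.

```cpp
// Sketch only: the output layout is the first non-empty input layout;
// inputs with empty layouts are allowed, and layouts may differ.
TensorLayout output_layout;  // empty by default
for (int i = 0; i < num_inputs; i++) {
  const TensorLayout &in_layout = GetInputLayout(i);  // placeholder accessor
  if (!in_layout.empty()) {
    output_layout = in_layout;
    break;
  }
}
```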
        return out
    check_single_input(fn.squeeze, axis_names="YZ", get_data=get_data, input_layout="HWCYZ")


def test_select_cpu():
I think you need to update test_dali_variable_batch_size.py as well.
The regular test uses variable batch size. I don't think we need to double that.
We do need it, as test_dali_variable_batch_size.py checks whether all operators are tested. There is no other way to enforce this kind of test.
    data = fn.external_source(source=get_data, layout="HWC")
    data2 = fn.external_source(source=get_data, layout="HWC")
    data3 = fn.external_source(source=get_data, layout="HWC")
    idx = fn.random.uniform(range=[0, 3], dtype=types.INT32)
Suggested change:
-    idx = fn.random.uniform(range=[0, 3], dtype=types.INT32)
+    idx = fn.random.uniform(range=[0, 2], dtype=types.INT32)
Yup.
from numpy import random
from numpy.core.fromnumeric import shape
from nvidia.dali import Pipeline
Why not use the decorator?
The decorator makes it easier to have a function that returns a pipeline (it does away with forwarding arguments, with pipe, set_outputs, etc.); here there's no such function, and forwarding arguments would likely make the code more repetitive.
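For comparison, roughly what the two styles look like for a test like the ones below; the test body is illustrative only, and `fn.select` is the operator from this PR.

```python
import numpy as np
from nvidia.dali import Pipeline, pipeline_def
import nvidia.dali.fn as fn

# Explicit style, as in this PR's tests:
pipe = Pipeline(1, 3, 0)
pipe.set_outputs(fn.select(np.float32([0, 1]), np.float32([2, 3]), input_idx=1))

# Decorator style: the decorated function returns the outputs,
# and pipeline parameters are forwarded through the call.
@pipeline_def(batch_size=1, num_threads=3, device_id=0)
def select_pipe():
    return fn.select(np.float32([0, 1]), np.float32([2, 3]), input_idx=1)

pipe2 = select_pipe()
```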
def test_error_inconsistent_ndim():
    def check(device):
        pipe = Pipeline(1, 3, 0)
        pipe.set_outputs(fn.select(np.float32([0,1]), np.float32([[2,3,4]]), input_idx=2))
Suggested change:
-        pipe.set_outputs(fn.select(np.float32([0,1]), np.float32([[2,3,4]]), input_idx=2))
+        pipe.set_outputs(fn.select(np.float32([0,1]), np.float32([[2,3,4]]), input_idx=1))

To make sure that the dimension problem is the only one and that the order of error reporting doesn't matter here.
True.
def test_error_inconsistent_type():
    def check(device):
        pipe = Pipeline(1, 3, 0)
        pipe.set_outputs(fn.select(np.float32([0,1]), np.int32([2,3,4]), input_idx=2))
Suggested change:
-        pipe.set_outputs(fn.select(np.float32([0,1]), np.int32([2,3,4]), input_idx=2))
+        pipe.set_outputs(fn.select(np.float32([0,1]), np.float32([[2,3,4]]), input_idx=1))

To make sure that the dimension problem is the only one and that the order of error reporting doesn't matter here.
It's about type:

Suggested change:
-        pipe.set_outputs(fn.select(np.float32([0,1]), np.int32([2,3,4]), input_idx=2))
+        pipe.set_outputs(fn.select(np.float32([0,1]), np.int32([2,3,4]), input_idx=1))
Right. Just a copy-paste from the previous example.
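For completeness, a sketch of how such a negative test could assert on the error; whether the mismatch is reported at build or at run time, and the exact exception type, are assumptions here rather than facts from this PR.

```python
import numpy as np
from nvidia.dali import Pipeline
import nvidia.dali.fn as fn

def check_inconsistent_type():
    pipe = Pipeline(1, 3, 0)
    pipe.set_outputs(fn.select(np.float32([0, 1]), np.int32([2, 3, 4]), input_idx=1))
    pipe.build()
    try:
        pipe.run()
    except RuntimeError as e:
        # Message text taken from the DALI_ENFORCE shown earlier.
        assert "All inputs must have the same type" in str(e)
    else:
        assert False, "expected a type-mismatch error"
```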
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
np.random.seed(1234)


def generate_data(ndim, ninp, type, max_batch_size):
    batch_size = np.random.randint(1, max_batch_size+1)
Variable batch size - voila!
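For readers following along, a sketch of what a generator like this typically yields: a randomly sized batch per call for each of the `ninp` inputs, plus a per-sample selection index. Only the signature and the random batch size above come from the PR; the body is illustrative.

```python
import numpy as np

np.random.seed(1234)

def generate_data(ndim, ninp, type, max_batch_size):
    # A different batch size on every call - this is what exercises
    # variable batch size in the regular test.
    batch_size = np.random.randint(1, max_batch_size + 1)
    inputs = []
    for _ in range(ninp):
        batch = []
        for _ in range(batch_size):
            # Corresponding samples may have different shapes across inputs,
            # but the number of dimensions and the type must match.
            shape = np.random.randint(1, 5, size=ndim)
            batch.append(np.random.uniform(0, 1, size=shape).astype(type))
        inputs.append(batch)
    # One selection index per sample; a negative index selects an empty sample.
    idx = [np.int32(np.random.randint(-1, ninp)) for _ in range(batch_size)]
    return inputs, idx
```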
CI MESSAGE: [2558517]: BUILD STARTED

CI MESSAGE: [2558517]: BUILD FAILED

CI MESSAGE: [3840109]: BUILD STARTED
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
JIRA TASK: DALI-2202