Propagate identity through NumpyArray::getitem. (#9)
`Identity` is correctly passed through `NumpyArray` slices and `__getitem__` uses `get`, `slice`, or the full `getitem`, depending on argument complexity.

* Identity must remain contiguous, so NumpyArray::getitem_next will need to handle the SliceAt case.

* Identity passes through slicing.

* Add tests for identity and slicing.

* Simple __getitem__ goes through ::get or ::slice.

* [skip ci] update README
jpivarski authored Sep 21, 2019
1 parent 5af7e6b commit 9fb0a9a
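The routing described in this commit message can be modeled with a small sketch. Everything in it is hypothetical: the `Slice` and `Complex` types, the `Arg` variant, the `route` function, and the rule that only a unit-step slice counts as "simple" are assumptions made for illustration. They are not the library's actual `__getitem__` logic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical slice argument start:stop:step (stop is exclusive).
struct Slice {
  int64_t start;
  int64_t stop;
  int64_t step;
};

// Hypothetical placeholder for any argument that is neither a single
// integer nor a plain slice (tuples, integer arrays, newaxis, ...).
struct Complex {};

using Arg = std::variant<int64_t, Slice, Complex>;

// Returns the name of the path a __getitem__-style call would take.
std::string route(const Arg& arg) {
  if (std::holds_alternative<int64_t>(arg)) {
    return "get";      // one integer: fetch a single element
  }
  if (const Slice* s = std::get_if<Slice>(&arg)) {
    assert(s->step != 0 && "slice step cannot be zero");
    if (s->step == 1) {
      return "slice";  // contiguous range: can be served by a view
    }
  }
  return "getitem";    // anything else: the general machinery
}
```

Here `route(Arg{int64_t{2}})` yields `"get"` and `route(Arg{Slice{1, 3, 1}})` yields `"slice"`. A step-2 slice or any other argument falls through to `"getitem"`. Sending the common cases down cheap paths presumably spares them the cost of the general machinery.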
Showing 13 changed files with 216 additions and 56 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -55,6 +55,7 @@ The following features of awkward 0.x will be features of awkward 1.x.
* 2019-08-30 (PR [#6](../../pull/6)): added iteration to both C++ and Numba, as well as the first "operation," `awkward1.tolist`, which turns an awkward array into Python lists (and eventually dicts, etc.).
* 2019-09-02 (PR [#7](../../pull/7)): refactored `Index`, `Identity`, and `ListOffsetArray` (and any other array types with `Index`, which is nearly all of them) to have a 32-bit and a 64-bit version. My original plan to only support 64-bit in "chunked arrays" with 32-bit everywhere else is hereby scrapped—both bit widths will be supported on all indexes. Non-native endian, non-trivial strides, and multidimensional `Index`/`Identity` are not supported, though all of these features are allowed for `NumpyArray` (which is _content_, not an _index_). The only limitation on `NumpyArray` is that data must be C-ordered, not Fortran-ordered.
* 2019-09-21 (PR [#8](../../pull/8)): C++ NumpyArray::getitem is done, setting the pattern for other classes (external C functions). The Numba and Identity extensions are not done, which would be necessary to fully set the pattern. This involved a lot of investigation (see [studies/getitem.py](https://github.com/jpivarski/awkward-1.0/blob/master/studies/getitem.py)).
+* 2019-09-21 (PR [#9](../../pull/9)): `Identity` is correctly passed through `NumpyArray` slices and `__getitem__` uses `get`, `slice`, or the full `getitem`, depending on argument complexity.

## Roadmap

5 changes: 4 additions & 1 deletion include/awkward/Identity.h
@@ -8,6 +8,7 @@
#include <memory>

#include "awkward/cpu-kernels/util.h"
+#include "awkward/Index.h"

namespace awkward {
class Identity {
@@ -34,8 +35,9 @@ namespace awkward {
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const = 0;
virtual const std::shared_ptr<Identity> slice(int64_t start, int64_t stop) const = 0;
virtual const std::shared_ptr<Identity> shallow_copy() const = 0;
+virtual const std::shared_ptr<Identity> getitem_carry_64(Index64& carry) const = 0;

-private:
+protected:
const Ref ref_;
const FieldLoc fieldloc_;
int64_t offset_;
@@ -58,6 +60,7 @@ namespace awkward {
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual const std::shared_ptr<Identity> slice(int64_t start, int64_t stop) const;
virtual const std::shared_ptr<Identity> shallow_copy() const;
+virtual const std::shared_ptr<Identity> getitem_carry_64(Index64& carry) const;

const std::string tostring() const;
const std::vector<T> get(int64_t at) const;
3 changes: 1 addition & 2 deletions include/awkward/RawArray.h
@@ -60,8 +60,7 @@ namespace awkward {
virtual void setid() {
Identity32* id32 = new Identity32(Identity::newref(), Identity::FieldLoc(), 1, length());
std::shared_ptr<Identity> newid(id32);
-Error err = awkward_identity_new32(length(), id32->ptr().get());
-HANDLE_ERROR(err);
+awkward_identity_new32(length(), id32->ptr().get());
setid(newid);
}
virtual void setid(const std::shared_ptr<Identity> id) { id_ = id; }
4 changes: 4 additions & 0 deletions include/awkward/cpu-kernels/getitem.h
@@ -11,10 +11,14 @@ extern "C" {

void awkward_slicearray_ravel_64(int64_t* toptr, const int64_t* fromptr, int64_t ndim, const int64_t* shape, const int64_t* strides);

+Error awkward_identity32_getitem_carry_64(int32_t* newidentityptr, const int32_t* identityptr, const int64_t* carryptr, int64_t lencarry, int64_t offset, int64_t width, int64_t length);
+Error awkward_identity64_getitem_carry_64(int64_t* newidentityptr, const int64_t* identityptr, const int64_t* carryptr, int64_t lencarry, int64_t offset, int64_t width, int64_t length);
+
void awkward_numpyarray_contiguous_init_64(int64_t* toptr, int64_t skip, int64_t stride);
void awkward_numpyarray_contiguous_copy_64(uint8_t* toptr, const uint8_t* fromptr, int64_t len, int64_t stride, int64_t offset, const int64_t* pos);
void awkward_numpyarray_contiguous_next_64(int64_t* topos, const int64_t* frompos, int64_t len, int64_t skip, int64_t stride);
void awkward_numpyarray_getitem_next_null_64(uint8_t* toptr, const uint8_t* fromptr, int64_t len, int64_t stride, int64_t offset, const int64_t* pos);
+void awkward_numpyarray_getitem_next_at_64(int64_t* nextcarryptr, const int64_t* carryptr, int64_t lencarry, int64_t skip, int64_t at);
void awkward_numpyarray_getitem_next_slice_64(int64_t* nextcarryptr, const int64_t* carryptr, int64_t lencarry, int64_t lenhead, int64_t skip, int64_t start, int64_t step);
void awkward_numpyarray_getitem_next_slice_advanced_64(int64_t* nextcarryptr, int64_t* nextadvancedptr, const int64_t* carryptr, const int64_t* advancedptr, int64_t lencarry, int64_t lenhead, int64_t skip, int64_t start, int64_t step);
void awkward_numpyarray_getitem_next_array_64(int64_t* nextcarryptr, int64_t* nextadvancedptr, const int64_t* carryptr, const int64_t* flatheadptr, int64_t lencarry, int64_t lenflathead, int64_t skip);
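The `awkward_identity32_getitem_carry_64` and `awkward_identity64_getitem_carry_64` declarations above gather rows of an identity buffer. Going by the kernel body this commit adds to `src/cpu-kernels/getitem.cpp`, each identity row has `width` entries, `offset` is the element position where the buffer's data starts, and every carry value must be below `length`. The sketch below restates that logic with `std::vector` and a `std::optional<std::string>` error in place of raw pointers and the `Error` return. The function name `identity_getitem_carry` and those container choices are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Output row i becomes a copy of input row carry[i] of a row-major identity
// buffer with `width` columns. Returns an error message, or nullopt on success.
// Unlike the commit's kernel, which only rejects carry[i] >= length, this
// sketch also rejects negative carry values.
std::optional<std::string> identity_getitem_carry(
    std::vector<int64_t>& out,
    const std::vector<int64_t>& identity,
    const std::vector<int64_t>& carry,
    int64_t offset, int64_t width, int64_t length) {
  const int64_t lencarry = static_cast<int64_t>(carry.size());
  out.assign(static_cast<std::size_t>(lencarry * width), 0);
  for (int64_t i = 0; i < lencarry; i++) {
    if (carry[i] < 0 || carry[i] >= length) {
      return "index out of range for identity";
    }
    for (int64_t j = 0; j < width; j++) {
      out[width * i + j] = identity[offset + width * carry[i] + j];
    }
  }
  return std::nullopt;
}
```

With `width = 2`, a carry of `{2, 0}` copies row 2 and then row 0, so the result follows the carry order. This is what lets an identity track the rows that a selection keeps.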
10 changes: 5 additions & 5 deletions include/awkward/cpu-kernels/identity.h
@@ -6,11 +6,11 @@
#include "awkward/cpu-kernels/util.h"

extern "C" {
-Error awkward_identity_new32(int64_t length, int32_t* to);
-Error awkward_identity_new64(int64_t length, int32_t* to);
-Error awkward_identity_32to64(int64_t length, int32_t* from, int64_t* to);
-Error awkward_identity_from_listfoffsets32(int64_t length, int64_t width, int32_t* offsets, int32_t* from, int64_t tolength, int32_t* to);
-Error awkward_identity_from_listfoffsets64(int64_t length, int64_t width, int64_t* offsets, int64_t* from, int64_t tolength, int64_t* to);
+void awkward_identity_new32(int64_t length, int32_t* to);
+void awkward_identity_new64(int64_t length, int32_t* to);
+void awkward_identity_32to64(int64_t length, int32_t* from, int64_t* to);
+void awkward_identity_from_listfoffsets32(int64_t length, int64_t width, int32_t* offsets, int32_t* from, int64_t tolength, int32_t* to);
+void awkward_identity_from_listfoffsets64(int64_t length, int64_t width, int64_t* offsets, int64_t* from, int64_t tolength, int64_t* to);
}

#endif // AWKWARDCPU_IDENTITY_H_
29 changes: 29 additions & 0 deletions src/cpu-kernels/getitem.cpp
@@ -65,6 +65,25 @@ void awkward_slicearray_ravel_64(int64_t* toptr, const int64_t* fromptr, int64_t
awkward_slicearray_ravel<int64_t>(toptr, fromptr, ndim, shape, strides);
}

+template <typename ID, typename T>
+Error awkward_identity_getitem_carry(ID* newidentityptr, const ID* identityptr, const T* carryptr, int64_t lencarry, int64_t offset, int64_t width, int64_t length) {
+for (int64_t i = 0; i < lencarry; i++) {
+if (carryptr[i] >= length) {
+return "index out of range for identity";
+}
+for (int64_t j = 0; j < width; j++) {
+newidentityptr[width*i + j] = identityptr[offset + width*carryptr[i] + j];
+}
+}
+return kNoError;
+}
+Error awkward_identity32_getitem_carry_64(int32_t* newidentityptr, const int32_t* identityptr, const int64_t* carryptr, int64_t lencarry, int64_t offset, int64_t width, int64_t length) {
+return awkward_identity_getitem_carry<int32_t, int64_t>(newidentityptr, identityptr, carryptr, lencarry, offset, width, length);
+}
+Error awkward_identity64_getitem_carry_64(int64_t* newidentityptr, const int64_t* identityptr, const int64_t* carryptr, int64_t lencarry, int64_t offset, int64_t width, int64_t length) {
+return awkward_identity_getitem_carry<int64_t, int64_t>(newidentityptr, identityptr, carryptr, lencarry, offset, width, length);
+}
+
template <typename T>
void awkward_numpyarray_contiguous_init(T* toptr, int64_t skip, int64_t stride) {
for (int64_t i = 0; i < skip; i++) {
@@ -107,6 +126,16 @@ void awkward_numpyarray_getitem_next_null_64(uint8_t* toptr, const uint8_t* from
awkward_numpyarray_getitem_next_null(toptr, fromptr, len, stride, offset, pos);
}

+template <typename T>
+void awkward_numpyarray_getitem_next_at(T* nextcarryptr, const T* carryptr, int64_t lencarry, int64_t skip, int64_t at) {
+for (int64_t i = 0; i < lencarry; i++) {
+nextcarryptr[i] = skip*carryptr[i] + at;
+}
+}
+void awkward_numpyarray_getitem_next_at_64(int64_t* nextcarryptr, const int64_t* carryptr, int64_t lencarry, int64_t skip, int64_t at) {
+awkward_numpyarray_getitem_next_at(nextcarryptr, carryptr, lencarry, skip, at);
+}
+
template <typename T>
void awkward_numpyarray_getitem_next_slice(T* nextcarryptr, const T* carryptr, int64_t lencarry, int64_t lenhead, int64_t skip, int64_t start, int64_t step) {
for (int64_t i = 0; i < lencarry; i++) {
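The new `awkward_numpyarray_getitem_next_at` kernel computes `nextcarry[i] = skip*carry[i] + at`. One way to read this: if each item selected so far (at position `carry[i]`) contains `skip` items in the next dimension, then choosing item `at` within it lands at flat position `skip*carry[i] + at`. The sketch below applies the same formula to a hypothetical 3×4 row-major array. The function name `next_at` and the range check on `at` are assumptions; only the arithmetic comes from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same arithmetic as the commit's awkward_numpyarray_getitem_next_at:
// each outer position carry[i] owns `skip` consecutive inner positions,
// and `at` selects one of them.
std::vector<int64_t> next_at(const std::vector<int64_t>& carry,
                             int64_t skip, int64_t at) {
  assert(0 <= at && at < skip);  // this sketch expects `at` already in range
  std::vector<int64_t> nextcarry(carry.size());
  for (std::size_t i = 0; i < carry.size(); i++) {
    nextcarry[i] = skip * carry[i] + at;
  }
  return nextcarry;
}
```

For `carry = {0, 1, 2}`, `skip = 4`, and `at = 1`, the result is `{1, 5, 9}`. These are the flat positions of column 1, which is what `array[:, 1]` selects.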
25 changes: 11 additions & 14 deletions src/cpu-kernels/identity.cpp
@@ -3,28 +3,26 @@
#include "awkward/cpu-kernels/identity.h"

template <typename T>
-Error awkward_identity_new(int64_t length, T* to) {
+void awkward_identity_new(int64_t length, T* to) {
for (T i = 0; i < length; i++) {
to[i] = i;
}
-return kNoError;
}
-Error awkward_identity_new32(int64_t length, int32_t* to) {
-return awkward_identity_new<int32_t>(length, to);
+void awkward_identity_new32(int64_t length, int32_t* to) {
+awkward_identity_new<int32_t>(length, to);
}
-Error awkward_identity_new64(int64_t length, int64_t* to) {
-return awkward_identity_new<int64_t>(length, to);
+void awkward_identity_new64(int64_t length, int64_t* to) {
+awkward_identity_new<int64_t>(length, to);
}

-Error awkward_identity_32to64(int64_t length, int32_t* from, int64_t* to) {
+void awkward_identity_32to64(int64_t length, int32_t* from, int64_t* to) {
for (int64_t i = 0; i < length; i++) {
to[i]= (int64_t)from[i];
}
-return kNoError;
}

template <typename T>
-Error awkward_identity_from_listfoffsets(int64_t length, int64_t width, T* offsets, T* from, int64_t tolength, T* to) {
+void awkward_identity_from_listfoffsets(int64_t length, int64_t width, T* offsets, T* from, int64_t tolength, T* to) {
int64_t k = 0;
for (int64_t i = 0; i < length; i++) {
for (T subi = 0; subi < offsets[i + 1] - offsets[i]; subi++) {
@@ -35,11 +33,10 @@ Error awkward_identity_from_listfoffsets(int64_t length, int64_t width, T* offse
k++;
}
}
-return kNoError;
}
-Error awkward_identity_from_listfoffsets32(int64_t length, int64_t width, int32_t* offsets, int32_t* from, int64_t tolength, int32_t* to) {
-return awkward_identity_from_listfoffsets<int32_t>(length, width, offsets, from, tolength, to);
+void awkward_identity_from_listfoffsets32(int64_t length, int64_t width, int32_t* offsets, int32_t* from, int64_t tolength, int32_t* to) {
+awkward_identity_from_listfoffsets<int32_t>(length, width, offsets, from, tolength, to);
}
-Error awkward_identity_from_listfoffsets64(int64_t length, int64_t width, int64_t* offsets, int64_t* from, int64_t tolength, int64_t* to) {
-return awkward_identity_from_listfoffsets<int64_t>(length, width, offsets, from, tolength, to);
+void awkward_identity_from_listfoffsets64(int64_t length, int64_t width, int64_t* offsets, int64_t* from, int64_t tolength, int64_t* to) {
+awkward_identity_from_listfoffsets<int64_t>(length, width, offsets, from, tolength, to);
}
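In `identity.cpp`, this commit changes the kernels that cannot fail from returning `Error` to returning `void`. Examples include filling a fresh identity with `0, 1, 2, ...` and widening 32-bit values to 64-bit. The new carry kernel in `getitem.cpp`, by contrast, returns `Error` because a carry value can be out of range. A minimal sketch of that split follows. The names `fill_identity`, `checked_carry`, and `handle_error` are hypothetical. `Error` is assumed to be a C string that is null on success, which is consistent with the kernels above returning either a string literal or `kNoError`. `handle_error` only stands in for `HANDLE_ERROR`, whose definition is not part of this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

typedef const char* Error;        // assumption: null means "no error"
const Error kNoError = nullptr;

// Infallible kernel: every input is valid, so there is nothing to report.
void fill_identity(int64_t length, int64_t* to) {
  for (int64_t i = 0; i < length; i++) {
    to[i] = i;
  }
}

// Fallible kernel: a carry value can point past the end of the input.
Error checked_carry(int64_t* to, const int64_t* from, int64_t length,
                    const int64_t* carry, int64_t lencarry) {
  for (int64_t i = 0; i < lencarry; i++) {
    if (carry[i] < 0 || carry[i] >= length) {
      return "index out of range";
    }
    to[i] = from[carry[i]];
  }
  return kNoError;
}

// Caller side: only a fallible kernel's result needs converting to an exception.
void handle_error(Error err) {
  if (err != kNoError) {
    throw std::invalid_argument(err);
  }
}
```

Dropping the `Error` return where failure is impossible is what allows the `setid` methods in `RawArray.h` and `ListOffsetArray.cpp` to drop their `HANDLE_ERROR(err)` calls in this commit.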
37 changes: 36 additions & 1 deletion src/libawkward/Identity.cpp
@@ -5,7 +5,8 @@
#include <iomanip>
#include <sstream>
#include <type_traits>
-// #include <utility>
+
+#include "awkward/cpu-kernels/getitem.h"

#include "awkward/Identity.h"

@@ -54,6 +55,40 @@ const std::shared_ptr<Identity> IdentityOf<T>::shallow_copy() const {
return std::shared_ptr<Identity>(new IdentityOf<T>(ref(), fieldloc(), offset(), width(), length(), ptr_));
}

+template <typename T>
+const std::shared_ptr<Identity> IdentityOf<T>::getitem_carry_64(Index64& carry) const {
+IdentityOf<T>* rawout = new IdentityOf<T>(ref_, fieldloc_, width_, carry.length());
+std::shared_ptr<Identity> out(rawout);
+
+Error assign_err = kNoError;
+if (std::is_same<T, int32_t>::value) {
+assign_err = awkward_identity32_getitem_carry_64(
+reinterpret_cast<int32_t*>(rawout->ptr().get()),
+reinterpret_cast<int32_t*>(ptr_.get()),
+carry.ptr().get(),
+carry.length(),
+offset_,
+width_,
+length_);
+}
+else if (std::is_same<T, int64_t>::value) {
+assign_err = awkward_identity64_getitem_carry_64(
+reinterpret_cast<int64_t*>(rawout->ptr().get()),
+reinterpret_cast<int64_t*>(ptr_.get()),
+carry.ptr().get(),
+carry.length(),
+offset_,
+width_,
+length_);
+}
+else {
+throw std::runtime_error("unrecognized identity");
+}
+HANDLE_ERROR(assign_err)
+
+return out;
+}
+
template <typename T>
const std::vector<T> IdentityOf<T>::get(int64_t at) const {
std::vector<T> out;
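`IdentityOf<T>::getitem_carry_64` does three things: it selects the 32-bit or 64-bit kernel with a run-time `std::is_same` test, allocates the output identity, and converts a kernel error into an exception through `HANDLE_ERROR`. The sketch below follows the same shape with hypothetical stand-ins: plain `std::vector` buffers instead of `IdentityOf<T>`, a width of 1, and kernels named `carry32` and `carry64`. It assumes `Error` is a C string that is null on success. It also throws `std::invalid_argument` where the real code uses `HANDLE_ERROR`, whose definition is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <type_traits>
#include <vector>

typedef const char* Error;  // assumption: null means success

// Shared body for the stand-in kernels (identity width fixed at 1 here).
template <typename ID>
Error carry_impl(ID* out, const ID* in, const int64_t* carry,
                 int64_t lencarry, int64_t length) {
  for (int64_t i = 0; i < lencarry; i++) {
    if (carry[i] < 0 || carry[i] >= length) {
      return "index out of range for identity";
    }
    out[i] = in[carry[i]];
  }
  return nullptr;
}

// Stand-ins for the fixed-type extern "C" kernels.
Error carry32(int32_t* out, const int32_t* in, const int64_t* carry,
              int64_t lencarry, int64_t length) {
  return carry_impl(out, in, carry, lencarry, length);
}
Error carry64(int64_t* out, const int64_t* in, const int64_t* carry,
              int64_t lencarry, int64_t length) {
  return carry_impl(out, in, carry, lencarry, length);
}

// Allocates the output, dispatches on T at run time, and turns an error
// into an exception, following the shape of IdentityOf<T>::getitem_carry_64.
template <typename T>
std::shared_ptr<std::vector<T>> getitem_carry(const std::vector<T>& data,
                                              const std::vector<int64_t>& carry) {
  auto out = std::make_shared<std::vector<T>>(carry.size());
  const int64_t lencarry = static_cast<int64_t>(carry.size());
  const int64_t length = static_cast<int64_t>(data.size());
  Error err = nullptr;
  // Both branches are compiled for every T, so the casts are required
  // even though only the branch matching T can run.
  if (std::is_same<T, int32_t>::value) {
    err = carry32(reinterpret_cast<int32_t*>(out->data()),
                  reinterpret_cast<const int32_t*>(data.data()),
                  carry.data(), lencarry, length);
  } else if (std::is_same<T, int64_t>::value) {
    err = carry64(reinterpret_cast<int64_t*>(out->data()),
                  reinterpret_cast<const int64_t*>(data.data()),
                  carry.data(), lencarry, length);
  } else {
    throw std::runtime_error("unrecognized identity");
  }
  if (err != nullptr) {
    throw std::invalid_argument(err);
  }
  return out;
}
```

An alternative, not used in the commit, is C++17 `if constexpr`. With it, the branch that does not match `T` is discarded at compile time, so each kernel can be called without `reinterpret_cast`.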
6 changes: 2 additions & 4 deletions src/libawkward/ListOffsetArray.cpp
@@ -14,8 +14,7 @@ template <typename T>
void ListOffsetArrayOf<T>::setid() {
Identity32* rawid = new Identity32(Identity::newref(), Identity::FieldLoc(), 1, length());
std::shared_ptr<Identity> newid(rawid);
-Error err = awkward_identity_new32(length(), rawid->ptr().get());
-HANDLE_ERROR(err);
+awkward_identity_new32(length(), rawid->ptr().get());
setid(newid);
}

@@ -30,8 +29,7 @@ void ListOffsetArrayOf<T>::setid(const std::shared_ptr<Identity> id) {
if (rawid32 && std::is_same<T, int32_t>::value) {
Identity32* rawsubid = new Identity32(Identity::newref(), rawid32->fieldloc(), rawid32->width() + 1, content_.get()->length());
std::shared_ptr<Identity> newsubid(rawsubid);
-Error err = awkward_identity_from_listfoffsets32(length(), rawid32->width(), reinterpret_cast<int32_t*>(offsets_.ptr().get()), rawid32->ptr().get(), content_.get()->length(), rawsubid->ptr().get());
-HANDLE_ERROR(err);
+awkward_identity_from_listfoffsets32(length(), rawid32->width(), reinterpret_cast<int32_t*>(offsets_.ptr().get()), rawid32->ptr().get(), content_.get()->length(), rawsubid->ptr().get());
content_.get()->setid(newsubid);
}
else {