support for structured data #40

Open
esheldon opened this issue Nov 19, 2015 · 2 comments

Comments

@esheldon

It is convenient to have data packed into structures. For example, if a
calculation requires a large number of pieces of information, it is preferable
to have the following (I realize this is a bit of a contrived example):

def func(sarray):
    for i in range(sarray.size):
        x = sarray['a'][i] + sarray['b'][i] + ... + sarray['z'][i]
        # do something with x

as opposed to

def func(a, b, c, d, ......, z):
    for i in xrange(a.size):
        x = a[i] + b[i] + ... + z[i]
        # do something with x

This could be solved by accepting structured arrays as input:

sarray = zeros(n, dtype=[('a','f8'),('b','f8'),....('z','f8')])

res=func(sarray)

(edited for bugs)
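
To make the request concrete, here is a minimal sketch of the structured-array version in plain NumPy. It is not HOPE code (HOPE does not accept structured arrays at this point), and the three field names, the fill values, and the `total` output array are assumptions chosen only so that the example runs.

```python
import numpy as np


def func(sarray):
    """Sum the fields 'a', 'b' and 'c' of each record, one record at a time."""
    total = np.empty(sarray.size, dtype='f8')
    for i in range(sarray.size):
        # Each field is looked up by name, then indexed by record number.
        total[i] = sarray['a'][i] + sarray['b'][i] + sarray['c'][i]
    return total


n = 4
# Hypothetical three-field layout standing in for the 'a' ... 'z' fields above.
sarray = np.zeros(n, dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
sarray['a'] = np.arange(n)
sarray['b'] = 10.0
sarray['c'] = 0.5

res = func(sarray)
```

The function signature stays at one argument no matter how many fields the record has, which is the convenience being asked for.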

@cosmo-ethz
Collaborator

Thanks for the input, @esheldon. In this particular example the number of parameters to pass is of course reduced, but on the other hand the equation becomes more difficult to read. Anyway, I do see cases where this could be convenient.

However, introducing structured arrays is a bit tricky:

  • HOPE doesn’t support string literals, which would be required to access the columns.
  • NumPy’s structured arrays allow the user to define arrays with a different data type per column, something that is not possible in pure C.

I’m personally not a big fan of structured arrays (I don’t like the syntax sarray["a"] and prefer the Pandas approach, sarray.a). Anyway, let me think about this; maybe there is a good solution.
J

@esheldon
Author

(sorry the formatting didn't go through in the email)

Structured arrays map directly to an array of C structures with the same
datatypes. The array can be created with or without alignment of the
structure:

import numpy

dt = [('ra','f8'),('dec','f8'),('index','i4')]

# maps to packed C structures, no alignment
a = numpy.zeros(n, dtype=dt)

# maps to normal, unpacked (aligned) C structures
dtype = numpy.dtype(dt, align=True)
a = numpy.zeros(n, dtype=dtype)

For the packed version you would need to make sure the struct in C is also
packed, but for the aligned version it is a direct map. For simplicity you
could demand only arrays created with align=True.

In C, the Python sarray['a'][35] maps to sarray[35].a.
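
The difference between the two layouts can be inspected from Python. The following is a small sketch using only NumPy's dtype attributes. The array length of 50 is an arbitrary assumption, and the aligned size depends on the platform's alignment rules, so no exact value is claimed for it.

```python
import numpy as np

dt = [('ra', 'f8'), ('dec', 'f8'), ('index', 'i4')]

packed = np.dtype(dt)               # fields laid out back to back
aligned = np.dtype(dt, align=True)  # padded the way a C compiler would pad

# Packed record size is the plain sum of the field sizes: 8 + 8 + 4 bytes.
packed_size = packed.itemsize
# The aligned record is rounded up to a multiple of the struct alignment,
# so it is at least as large as the packed one.
aligned_size = aligned.itemsize

# Byte offset of each field inside one aligned record.
offsets = {name: aligned.fields[name][1] for name in aligned.names}

# Field-then-index and index-then-field address the same element,
# mirroring sarray[35].a in C.
a = np.zeros(50, dtype=aligned)
a['ra'][35] = 1.5
same_element = a[35]['ra']
```

A wrapper that wanted to accept only aligned arrays could check `aligned.isalignedstruct`, which is True for dtypes created with align=True.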

notation:

Structured arrays are built into numpy, so they are in a sense fundamental.
Codes like pyfits and fitsio return structured arrays (although pyfits wraps
them).

Also, the sarray.a notation conflicts with Python attributes. For example, you
can't have a field called "size" because that is already used for the size of
the array.
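
The name clash can be reproduced with NumPy's own attribute-style wrapper. This sketch uses numpy.recarray as a stand-in for the sarray.a notation (an assumption; other libraries may resolve the clash differently), and the field names and values are made up for illustration.

```python
import numpy as np

# Hypothetical catalog with a field that happens to be called 'size'.
data = np.zeros(3, dtype=[('size', 'f8'), ('flux', 'f8')])
data['size'] = [1.5, 2.5, 3.5]

# View the same memory as a record array, which allows rec.fieldname access.
rec = data.view(np.recarray)

by_attribute = rec.size   # the ndarray attribute wins: number of records
by_key = rec['size']      # the field is still reachable by key
flux = rec.flux           # attribute access works for non-clashing names
```

Key-based access never collides with array attributes, whatever the field is called.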
