
library for working with tabular data in Julia


fpepin/JuliaData

 
 


JuliaData

Library for working with tabular data in Julia using DataFrames.

Features

  • DataFrame for efficient tabular storage of two-dimensional data
  • Minimized data copying
  • Default columns can handle missing values (NAs) of any type
  • PooledDataFrame for efficient storage of factor-like arrays for characters, integers, and other types
  • Flexible indexing
  • SubDataFrame for efficient subset referencing without copies
  • Grouping operations inspired by plyr, pandas, and data.table
  • Basic merge functionality
  • stack and unstack for long/wide conversions
  • Pipelining support (|) for many operations
  • Several typical R-style functions, including head, tail, summary, unique, duplicated, with, within, and more
  • Formula and design matrix implementation
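
To make the long/wide conversion mentioned above concrete, here is a minimal sketch in plain Julia of what "stacking" wide columns into long form means. It is written against current Julia Base only and does not use this library's API; `stack_columns` and its argument names are hypothetical, chosen for illustration.

```julia
# Hypothetical helper (not part of this library): turn wide columns into
# long-form (row, variable, value) records, one record per cell.
function stack_columns(names::Vector{String}, columns::Vector{Vector{Float64}})
    long = Tuple{Int,String,Float64}[]
    for (name, col) in zip(names, columns)
        for (row, value) in enumerate(col)
            push!(long, (row, name, value))
        end
    end
    return long
end

# Two wide columns "x" and "y", two rows each.
long = stack_columns(["x", "y"], [[1.0, 2.0], [3.0, 4.0]])
```

Unstacking is the inverse: records sharing a row index are spread back out into one column per variable name.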

Demos

Here's a minimal demo showing some grouping operations:

julia> d = DataFrame(quote     # expressions are one way to create a DataFrame
           x = randn(10)
           y = randn(10)
           i = randi(3,10)
           j = randi(3,10)
       end);

julia> dump(d)    # dump() is like R's str()
DataFrame  10 observations of 4 variables
  x: DataVec{Float64}(10) [-0.22496343871037897,-0.4033933555989207,0.6027847717547058,0.06671669747901597]
  y: DataVec{Float64}(10) [0.21904975091285417,-1.3275512477731726,2.266353546459277,-0.19840910239041679]
  i: DataVec{Int64}(10) [2,1,3,1]
  j: DataVec{Int64}(10) [3,2,1,2]

julia> head(d)
DataFrame  (6,4)
                x         y i j
[1,]    -0.224963   0.21905 2 3
[2,]    -0.403393  -1.32755 1 2
[3,]     0.602785   2.26635 3 1
[4,]    0.0667167 -0.198409 1 2
[5,]      1.68303  -1.11183 1 3
[6,]     0.346034   1.68227 2 1

julia> d[1:3, ["x","y"]]     # indexing is similar to R's
DataFrame  (3,2)
                x        y
[1,]    -0.224963  0.21905
[2,]    -0.403393 -1.32755
[3,]     0.602785  2.26635

julia> # Group on column i, and pipe (|) that result to an expression
julia> # that creates the column x_sum. 
julia> groupby(d, "i") | :(x_sum = sum(x))     
DataFrame  (3,2)
        i    x_sum
[1,]    1  2.06822
[2,]    2 -1.80867
[3,]    3 0.319517

julia> groupby(d, "i") | :sum   # Another way to operate on a grouping
DataFrame  (3,4)
        i    x_sum    y_sum j_sum
[1,]    1  2.06822 -2.73985     8
[2,]    2 -1.80867  1.83489     7
[3,]    3 0.319517  1.03072     2
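
For readers unfamiliar with split-apply-combine, the grouping step above can be sketched in plain Julia. This is only an illustration of the idea behind `groupby(d, "i") | :(x_sum = sum(x))`, written against current Julia Base; `group_sum` and the sample vectors are hypothetical and are not how the library implements grouping.

```julia
# Hypothetical helper (not part of this library): sum `vals` within each
# distinct value of `group_keys`, returning key => sum pairs ordered by key.
function group_sum(group_keys::Vector{Int}, vals::Vector{Float64})
    sums = Dict{Int,Float64}()
    for (k, v) in zip(group_keys, vals)
        sums[k] = get(sums, k, 0.0) + v   # accumulate per group
    end
    return sort(collect(sums); by = first)
end

# Illustrative data, not the random values from the demo above.
i = [2, 1, 3, 1, 1, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = group_sum(i, x)
```

Each pair in `result` corresponds to one row of the grouped output: the group key and the sum of `x` within that group.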

See demo/workflow_demo.jl for a basic demo of the parts of a Julian data workflow.

See demo/design_demo.jl for a more in-depth demo of DataFrame, its related types, and the rest of the library.

Documentation

Development work

The Issues list known problems and ideas for enhancements. Here are some particular enhancements under way or under discussion:

Possible changes to Julia

DataFrames fit well with Julia's syntax, but some features would improve the user experience, including keyword function arguments (Julia issue 485), "~" for easier expression syntax, and overloading "." for easier column access (df.colA). See here for a bit more information.

Current status

Please consider this a development preview. Many things work, but expect some rough edges. We hope that this can become a standard Julia package.
