interregna/JArrow

J language addon for Apache Arrow

Read (and eventually write) Apache Arrow and Parquet files to and from J. Uses the Arrow C (GLib) API.

Installation and Loading

  1. Install the Arrow GLib (C) packages for your OS. Instructions are at arrow.apache.org/install.

  2. From your J session:

   install 'github:interregna/JArrow@main'
   load 'data/arrow'

Usage

   install 'github:interregna/JArrow@main'

   load 'data/arrow'
   readParquetTable '~addons/data/arrow/test/test1.parquet'
┌─┬───────────────┐
│a│0 1 2 3 4 5 6 7│
├─┼───────────────┤
│b│8 7 6 5 4 3 2 1│
└─┴───────────────┘
   readsParquetTable '~addons/data/arrow/test/test2.parquet'
┌────────┬──────────┬────────┬─────────┬───────┬────────┬───────┬───────┬────────┬────────┬────────┬──────────┬──────────┬───────────┬────────────┬─────────┬─────────┬───────┬───────────────┐
│Column 1│Column Two│shortCol│ushortCol│intcCol│uintcCol│int_Col│uintCol│int16Col│int32Col│int64Col│float32Col│float64Col│longlongCol│ulonglongCol│DoubleCol│StringCol│boolCol│datetime64Col  │
├────────┼──────────┼────────┼─────────┼───────┼────────┼───────┼───────┼────────┼────────┼────────┼──────────┼──────────┼───────────┼────────────┼─────────┼─────────┼───────┼───────────────┤
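│0       │  100     │0       │0        │0      │100     │100    │100    │300     │500     │100     │   600    │   700    │100        │100         │  100    │This     │1      │946684800000000│
│1       │88.75     │1       │1        │1      │ 88     │ 90    │ 88    │263     │443     │ 88     │531.25    │613.75    │ 88        │ 88         │88.75    │ is      │0      │946771200000000│
│2       │ 77.5     │2       │2        │2      │ 77     │ 80    │ 77    │227     │387     │ 77     │ 462.5    │ 527.5    │ 77        │ 77         │ 77.5    │all      │0      │946857600000000│
│3       │66.25     │3       │3        │3      │ 66     │ 70    │ 66    │191     │331     │ 66     │393.75    │441.25    │ 66        │ 66         │66.25    │ valid   │0      │946944000000000│
│4       │   55     │4       │4        │4      │ 55     │ 60    │ 55    │155     │275     │ 55     │   325    │   355    │ 55        │ 55         │   55    │text     │1      │947030400000000│
│5       │43.75     │5       │5        │5      │ 43     │ 50    │ 43    │118     │218     │ 43     │256.25    │268.75    │ 43        │ 43         │43.75    │         │0      │947116800000000│
│6       │ 32.5     │6       │6        │6      │ 32     │ 40    │ 32    │ 82     │162     │ 32     │ 187.5    │ 182.5    │ 32        │ 32         │ 32.5    │data.    │0      │947203200000000│
│7       │21.25     │7       │7        │7      │ 21     │ 30    │ 21    │ 46     │106     │ 21     │118.75    │ 96.25    │ 21        │ 21         │21.25    │         │0      │947289600000000│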
└────────┴──────────┴────────┴─────────┴───────┴────────┴───────┴───────┴────────┴────────┴────────┴──────────┴──────────┴───────────┴────────────┴─────────┴─────────┴───────┴───────────────┘
   readCSVTable '~addons/data/arrow/test/test1.csv'
┌──┬───────────────────────────...
│ID│1 2 3 4 5 8 10 11 12 14 15 ...
├──┼───────────────────────────...
│y │100.669 100.669 100.669 100...
└──┴───────────────────────────...
   NB. Note: test1.json is in JSON Lines format, not standard JSON. See: https://jsonlines.org
   readsJsonTable '~addons/data/arrow/test/test1.json'
┌───────┬──────────┐
│name   │date      │
├───────┼──────────┤
│Gilbert│12-13-2014│
│Alexa  │09-04-1983│
│May    │01-01-1924│
│Deloise│04-25-1894│
└───────┴──────────┘
   readsFeatherTable '~addons/data/arrow/test/test1.feather'
┌────┬───┬──────┐
│team│pos│points│
├────┼───┼──────┤
│A   │G  │17    │
│A   │F  │17    │
│B   │G  │15    │
│B   │F  │ 5    │
│C   │G  │11    │
│C   │F  │10    │
│D   │G  │ 5    │
│D   │F  │14    │
└────┴───┴──────┘

The J foreigns (6!:16) and (6!:17) can be used to convert Arrow datetime64 values to and from ISO 8601 strings (e.g. 2000-01-11T22:58:04). fromdate32 converts Arrow date32 values to YYYY M D triples.
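A minimal J sketch of these conversions, under two assumptions worth checking against your setup: that 6!:16 expects nanoseconds since J's epoch of 2000-01-01T00:00:00 (as in J 9.x), and that datetime64 values arrive as microseconds since 1970-01-01. The datetime64Col values in test2.parquet start at 946684800000000 and step by 86400000000 per row, which is consistent with that unit: 946684800000000 µs is 2000-01-01T00:00:00Z. Passed to 6!:16 unchanged, that first value would read as 946684.8 seconds after 2000-01-01, i.e. 2000-01-11T22:58:04, so the sketch rebases the values first. The date32 part uses the standard `dates` library (`todate`, `todayno`; day 0 is 1800 1 1) as a stand-in for the addon's own fromdate32. The names `us`, `unix2000us`, `ns` and `d32` are illustrative only.

```j
NB. Sketch only. Assumptions (check against your J version and data):
NB.  - datetime64 values are microseconds since 1970-01-01 (as test2.parquet appears to be)
NB.  - 6!:16 expects nanoseconds since J's epoch, 2000-01-01T00:00:00 (J 9.x)
us =: 946684800000000 946771200000000 946857600000000  NB. first three datetime64Col values
unix2000us =: 946684800000000    NB. 2000-01-01T00:00:00Z in Unix microseconds
ns =: 1000 * us - unix2000us     NB. rebase to 2000 and scale microseconds to nanoseconds
6!:16"0 ns                       NB. one ISO 8601 string per value

NB. date32 values are days since 1970-01-01. The standard 'dates' library
NB. counts days from 1800 1 1, so shift by the day number of 1970 1 1.
require 'dates'
d32 =: 10957 10958               NB. example date32 values
todate d32 + todayno 1970 1 1    NB. one YYYY M D row per value
```

Under the same assumptions, going the other way means dividing the 6!:17 result by 1000 and adding `unix2000us`.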

Notes

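The reads… variants (readsParquetTable, readsJsonTable, readsFeatherTable) minimize display time in the UI but use more space.

The read… variants (readParquetTable, readCSVTable) minimize space but can take longer to display.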
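The readParquetTable output in Usage suggests that the read… functions return a two-column boxed table of column names and column values. Assuming that layout, the hypothetical helpers below (`getcol` and `vertical` are not part of the addon) pull one column out by name and rearrange the table as a header row over boxed column vectors, roughly the shape the reads… functions display. A literal table stands in for a file read so the sketch runs on its own.

```j
NB. Hypothetical helpers, not part of the JArrow addon.
NB. Assumed input: a 2-column boxed table, names in column 0, values in column 1.

NB. x getcol y: open the values of the column named y
getcol =: 4 : '> ({:"1 x) {~ (,&.> {."1 x) i. < , y'

NB. vertical y: header row of names over boxed column vectors
vertical =: 3 : '({."1 y) ,: ,.&.> {:"1 y'

NB. Stand-in for: t =: readParquetTable '~addons/data/arrow/test/test1.parquet'
t =: ((,'a');(,'b')) ,. (i.8);8-i.8
t getcol 'b'
vertical t
```

As written, `getcol` signals an index error when the name is absent; a real helper would probably check for that first.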

Development

  1. In Jqt, find your ~Projects path: jpath '~Projects'

  2. Clone the JArrow repo (git clone) into that ~Projects folder.

  3. Restart Jqt and open the project: Project > Open > Projects > jarrow

  4. Rebuild the addon: Ctrl + F9

  5. Run the addon: F9 (rebuilds the addon scripts, reloads them, and runs the tests)

Examples: see test/test1.ijs

TODO
  • Error catching for empty pointers, missing files, and general errors.
  • Dereference/clean up GObjects and allocated memory
  • Additional data types
    • Dictionaries (need to store lookup tables)
    • Lists
    • Maps
  • Tensors
  • Documentation (see: ~/addons/gui/cobrowser/scriptdoc.ijs)
  • CSV reader
  • JSONL reader
  • Arrow Feather (IPC v1) reader
  • IPC files (".arrow" files)
  • IPC streams (".arrows" files when stored on disk)
  • Flight client
  • Flight server
  • Non-local filesystems (S3)
  • IPC streaming with event-driven calls

About

J add-on for Apache Arrow, Parquet, CSV, & JSON

Contributors 4
