Add docs
jaakkor2 committed Sep 29, 2024
1 parent b92f6d1 commit de34ae5
Showing 5 changed files with 103 additions and 1 deletion.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "JMPReader"
uuid = "d9f7e686-cf87-4d12-8d7a-0e9b8c9fba29"
authors = ["Jaakko Ruohio <jaakkor2@gmail.com>"]
-version = "0.1.10"
+version = "0.1.11"

[deps]
CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
build/
site/
6 changes: 6 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,6 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
JMPReader = "d9f7e686-cf87-4d12-8d7a-0e9b8c9fba29"

[compat]
Documenter = "1"
51 changes: 51 additions & 0 deletions docs/src/dev.md
@@ -0,0 +1,51 @@
## Testing

Run the basic tests, which use a limited number of files:
```julia
using Pkg
Pkg.test("JMPReader")
```

The utility function `JMPReader.scandir` recursively scans the directory given as its argument and reads the JMP files it finds.
For example,
```julia
using JMPReader
JMPReader.scandir(joinpath(pkgdir(JMPReader), "test"))
```
reads the 12 JMP files in the package's `test` directory, and
```julia
JMPReader.scandir(raw"C:\Program Files\SAS\JMPPRO\17\Samples\Data")
```
successfully reads 605 JMP files from the JMP Pro 17 sample data.
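
The recursive part of such a scan can be sketched with Julia's `Base.walkdir`. This is only an illustration, assuming that JMP files are recognized by a `.jmp` extension compared case-insensitively; the helper name `find_jmp_files` is hypothetical, and this is not how `JMPReader.scandir` is implemented.

```julia
# Hypothetical helper, not part of JMPReader: collect the paths of all `.jmp` files below `dir`.
function find_jmp_files(dir::AbstractString)
    files = String[]
    # walkdir yields (root, dirs, files) for `dir` and every subdirectory
    for (root, _, names) in walkdir(dir)
        for name in names
            # assumption: JMP files are identified by their extension only
            if lowercase(splitext(name)[2]) == ".jmp"
                push!(files, joinpath(root, name))
            end
        end
    end
    return files
end
```

Each returned path could then be passed to `readjmp` inside a `try`/`catch` block to count how many files are read successfully.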

## Looking into the binary `.jmp` file

### Finding strings

The location of a string in a binary `.jmp` file can be found with a snippet like
```julia
using JMPReader
fn = joinpath(pkgdir(JMPReader), "test", "example1.jmp")
raw = read(fn)              # whole file as a Vector{UInt8}
seq = codeunits("jäääär")   # UTF-8 bytes of the string to search for
findall(seq, raw)           # byte ranges where the string occurs
```
returns
```
1-element Vector{UnitRange{Int64}}:
1986:1995
```
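
The match spans 10 bytes because UTF-8 stores `j` and `r` in one byte each and each `ä` (U+00E4) in two bytes (`0xc3 0xa4`), giving 1 + 4 × 2 + 1 = 10. The search itself can be sketched as a naive byte-by-byte scan; `find_bytes` is a hypothetical helper that only illustrates what the `findall` call computes and is not part of JMPReader.

```julia
s = "jäääär"
ncodeunits(s)          # 10: number of UTF-8 bytes
codeunits("ä")         # 0xc3, 0xa4: 'ä' takes two bytes

# Hypothetical helper, not part of JMPReader: naive search returning every
# byte range where `needle` occurs in `haystack`.
function find_bytes(needle::AbstractVector{UInt8}, haystack::AbstractVector{UInt8})
    n = length(needle)
    hits = UnitRange{Int}[]
    for i in 1:(length(haystack) - n + 1)
        if view(haystack, i:i+n-1) == needle
            push!(hits, i:i+n-1)
        end
    end
    return hits
end

# Synthetic buffer: two filler bytes, the string, one trailing byte
buf = vcat(UInt8[0x00, 0xff], codeunits(s), UInt8[0x00])
find_bytes(codeunits(s), buf)   # [3:12]
```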

A hex editor can be useful, for example [Hex Editor for VS Code](https://github.com/microsoft/vscode-hexeditor).

If a string is not found, the columns may be gzip-compressed. In that case, see the options in JMP under File > Preferences.
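
One quick check is to look for gzip headers in the raw bytes. The sketch below assumes that compressed columns are stored as standard gzip streams (RFC 1952), whose header starts with `0x1f 0x8b` followed by `0x08` for deflate compression; `gzip_offsets` is a hypothetical helper, not a JMPReader function. These three bytes can also occur by chance in uncompressed data, so a hit is only a hint.

```julia
# Hypothetical helper, not part of JMPReader: 1-based offsets where a gzip header
# (ID1 = 0x1f, ID2 = 0x8b, CM = 0x08 for deflate) begins in `bytes`.
function gzip_offsets(bytes::AbstractVector{UInt8})
    n = length(bytes)
    return filter(i -> bytes[i] == 0x1f && bytes[i+1] == 0x8b && bytes[i+2] == 0x08, 1:n-2)
end
```

For example, `gzip_offsets(read(fn))` lists candidate positions in the file `fn`.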

### Reading columns

This snippet reads the data of the fourth column:

```julia
using JMPReader
fn = joinpath(pkgdir(JMPReader), "test", "example1.jmp")
d = open(fn) do io                   # the file is closed automatically
    info = JMPReader.metadata(io)    # parse the file metadata
    JMPReader.column_data(io, info, 4, Vector{UInt8}())
end
```
43 changes: 43 additions & 0 deletions docs/src/index.md
@@ -0,0 +1,43 @@
# JMPReader.jl Documentation

[JMP](https://en.wikipedia.org/wiki/JMP_(statistical_software)) is commercial statistical software. This package provides an independent reader for `.jmp` files
implemented in Julia.

## Basic usage

The basic usage is
```julia
using JMPReader
fn = joinpath(pkgdir(JMPReader), "test", "example1.jmp")
df = readjmp(fn)
```
which reads the file `fn` and returns its contents as a Julia `DataFrame`. By default, all columns are included:
```
4×12 DataFrame
 Row │ ints  floats   charconstwidth  time                 date        duration              charconstwidth2  charvariable16                     formula  pressures          char utf8  charvariable8
     │ Int8  Float64  String          DateTime?            Date?       Millisec…             String           String                             String   Float64?           String     String
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │    1     11.1  a               1976-04-01T21:12:00  2024-01-13  2322000 milliseconds  a                aa                                 2            101.325        ꙮꙮꙮ        a
   2 │    2     22.2  b               1984-08-06T23:58:00  2024-01-14  364000 milliseconds   bb               bbbb                               4        missing            🚴💨       bb
   3 │    3     33.3  c               2003-06-02T17:00:00  missing     229000 milliseconds   ccc              cccccccc                           6              2.6          jäääär     cc
   4 │    4     44.4  d               missing              2032-02-12  0 milliseconds        dddd             abcdefghijabcdefghijabcdefghijab…  8              4.63309e110  辛口       abcdefghijkl
```

## Choosing columns

Two keyword arguments are available for choosing columns, `include_columns` and `exclude_columns`. For example,
```julia
df = readjmp(fn, include_columns=[2, "date", r"^char"], exclude_columns=[r"varia"])
```
returns the second column (`floats`), the column named `date`, and all columns whose names start with `char`,
except those whose names contain the string `varia`:
```
4×5 DataFrame
 Row │ floats   charconstwidth  date        charconstwidth2  char utf8
     │ Float64  String          Date?       String           String
─────┼─────────────────────────────────────────────────────────────────
   1 │    11.1  a               2024-01-13  a                ꙮꙮꙮ
   2 │    22.2  b               2024-01-14  bb               🚴💨
   3 │    33.3  c               missing     ccc              jäääär
   4 │    44.4  d               2032-02-12  dddd             辛口
```
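
The selection rule described above can be sketched in plain Julia. This is a hypothetical re-implementation for illustration only. It assumes that an integer selects a column by position, a string by exact name, and a `Regex` by `occursin`; that omitting `include_columns` keeps every column; and that the selected columns keep their order in the file, as in the output above. `select_columns` and `column_matches` are not JMPReader functions.

```julia
# Hypothetical sketch of the include/exclude rule; not JMPReader's implementation.
column_matches(sel::Integer, i, name) = sel == i                 # by position
column_matches(sel::AbstractString, i, name) = sel == name       # by exact name
column_matches(sel::Regex, i, name) = occursin(sel, name)        # by pattern

function select_columns(colnames::AbstractVector{<:AbstractString};
                        include_columns=nothing, exclude_columns=nothing)
    kept = eltype(colnames)[]
    for (i, name) in enumerate(colnames)
        included = include_columns === nothing ||
                   any(sel -> column_matches(sel, i, name), include_columns)
        excluded = exclude_columns !== nothing &&
                   any(sel -> column_matches(sel, i, name), exclude_columns)
        if included && !excluded
            push!(kept, name)
        end
    end
    return kept
end

cols = ["ints", "floats", "charconstwidth", "time", "date", "duration", "charconstwidth2",
        "charvariable16", "formula", "pressures", "char utf8", "charvariable8"]
select_columns(cols; include_columns=[2, "date", r"^char"], exclude_columns=[r"varia"])
# ["floats", "charconstwidth", "date", "charconstwidth2", "char utf8"]
```

Applied to the column names of `example1.jmp`, the sketch picks the same five columns as the `readjmp` call above.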
