diff --git a/Project.toml b/Project.toml
index 6e9b3ef..9c3fa29 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,7 +1,7 @@
 name = "JMPReader"
 uuid = "d9f7e686-cf87-4d12-8d7a-0e9b8c9fba29"
 authors = ["Jaakko Ruohio "]
-version = "0.1.10"
+version = "0.1.11"
 
 [deps]
 CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
diff --git a/docs/.gitignore b/docs/.gitignore
new file mode 100644
index 0000000..da3d337
--- /dev/null
+++ b/docs/.gitignore
@@ -0,0 +1,2 @@
+build/
+site/
\ No newline at end of file
diff --git a/docs/Project.toml b/docs/Project.toml
new file mode 100644
index 0000000..2460d58
--- /dev/null
+++ b/docs/Project.toml
@@ -0,0 +1,6 @@
+[deps]
+Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+JMPReader = "d9f7e686-cf87-4d12-8d7a-0e9b8c9fba29"
+
+[compat]
+Documenter = "1"
diff --git a/docs/src/dev.md b/docs/src/dev.md
new file mode 100644
index 0000000..60a3d42
--- /dev/null
+++ b/docs/src/dev.md
@@ -0,0 +1,51 @@
+## Testing
+
+Basic testing with limited number of files
+```julia
+using Pkg
+Pkg.test("JMPReader")
+```
+
+Utility function `JMPReader.scandir` is provided that scans recursively the argument directory.
+For example,
+```julia
+JMPReader.scandir(joinpath(pathof(JMPReader), "..", "..", "test"))
+```
+reads 12 JMP-files, and
+```julia
+JMPReader.scandir(raw"C:\Program Files\SAS\JMPPRO\17\Samples\Data")
+```
+reads successfully 605 JMP-files.
+
+## Looking into the binary .jmp file
+
+### Finding strings
+
+Location of strings in the binary `.jmp` can be found using a snippet like
+```julia
+fn = joinpath(pathof(JMPReader), "..", "..", "test", "example1.jmp")
+raw = read(fn)
+seq = reinterpret(UInt8, codeunits("jäääär"))
+findall(seq, raw)
+```
+returns
+```
+1-element Vector{UnitRange{Int64}}:
+ 1986:1995
+```
+
+Hex editor can be useful, for example [Hex Editor for VS Code](https://github.com/microsoft/vscode-hexeditor).
+
+If string is not found, columns could be GZ compressed. In that case, see options in JMP File->Preferences.
+
+### Reading columns
+
+This snippet reads the fourth column
+
+```julia
+fn = joinpath(pathof(JMPReader), "..", "..", "test", "example1.jmp")
+io = open(fn)
+info = JMPReader.metadata(io)
+d = JMPReader.column_data(io, info, 4, Vector{UInt8}())
+close(io)
+```
diff --git a/docs/src/index.md b/docs/src/index.md
new file mode 100644
index 0000000..2f3b24b
--- /dev/null
+++ b/docs/src/index.md
@@ -0,0 +1,43 @@
+# JMPReader.jl Documentation
+
+[JMP](https://en.wikipedia.org/wiki/JMP_(statistical_software)) is commercial statistical software. This package provides an independent reader for `.jmp` files
+implemented in Julia.
+
+## Basic usage
+
+Basic usage is
+```
+using JMPReader
+fn = joinpath(pathof(JMPReader), "..", "..", "test", "example1.jmp")
+df = readjmp(fn)
+```
+to read file `fn` and get the data as a Julia `DataFrame`. All columns are included
+```
+4×12 DataFrame
+ Row │ ints  floats   charconstwidth  time                 date        duration              charconstwidth2  charvariable16                     formula  pressures    char utf8  charvariable8
+     │ Int8  Float64  String          DateTime?            Date?       Millisec…             String           String                             String   Float64?     String     String
+─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+   1 │    1     11.1  a               1976-04-01T21:12:00  2024-01-13  2322000 milliseconds  a                aa                                 2            101.325  ꙮꙮꙮ        a
+   2 │    2     22.2  b               1984-08-06T23:58:00  2024-01-14  364000 milliseconds   bb               bbbb                               4            missing  🚴💨       bb
+   3 │    3     33.3  c               2003-06-02T17:00:00  missing     229000 milliseconds   ccc              cccccccc                           6                2.6  jäääär     cc
+   4 │    4     44.4  d               missing              2032-02-12  0 milliseconds        dddd             abcdefghijabcdefghijabcdefghijab…  8        4.63309e110  辛口       abcdefghijkl
+```
+
+## Choosing columns
+
+Two keyword arguments are available, `include_columns` and `exclude_columns`
+```
+df = readjmp(fn, include_columns=[2, "date", r"^char"], exclude_columns=[r"varia"])
+```
+returns the second column `floats`, a column named `date`, columns that start with `char`,
+but excluding columns whose name contain a string `varia`.
+```
+4×5 DataFrame
+ Row │ floats   charconstwidth  date        charconstwidth2  char utf8
+     │ Float64  String          Date?       String           String
+─────┼─────────────────────────────────────────────────────────────────
+   1 │    11.1  a               2024-01-13  a                ꙮꙮꙮ
+   2 │    22.2  b               2024-01-14  bb               🚴💨
+   3 │    33.3  c               missing     ccc              jäääär
+   4 │    44.4  d               2032-02-12  dddd             辛口
+```