This PowerShell module provides helper cmdlets for working with Stan from PowerShell.
This module is built on top of CmdStan v2.18.1.
The module is available from the PowerShell Gallery:
Install-Module -Scope CurrentUser psstan
- Download and install CmdStan according to the guidance on the official CmdStan site. The "CmdStan Interface User's Guide", available on the release page, contains step-by-step instructions for installing CmdStan on Windows.
- Define the variables $PSSTAN_PATH and $PSSTAN_TOOLS_PATHS (in your profile.ps1, for example). The former should be set to the directory where CmdStan is installed. The latter should be an array of the directories where g++ and make, which are needed to compile Stan models, are installed. For example:
Import-Module psstan
$PSSTAN_PATH = "C:\your_app_path\cmdstan"
$PSSTAN_TOOLS_PATHS = @(
"C:\RTools\bin"
"C:\RTools\mingw_64\bin"
)
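Before compiling a model, it can help to confirm that these paths point where you expect. The following is a minimal sketch, not part of psstan: the function name Test-PsstanSetup and the tool file names g++.exe and make.exe are assumptions made for illustration.

```powershell
# Hypothetical helper (not part of psstan): reports what is missing from a
# CmdStan directory / tool directory configuration.
function Test-PsstanSetup {
    param(
        [Parameter(Mandatory)] [string]   $CmdStanPath,
        [Parameter(Mandatory)] [string[]] $ToolsPaths,
        # Assumed executable names on Windows.
        [string[]] $Tools = @('g++.exe', 'make.exe')
    )

    $problems = @()

    # The CmdStan directory itself must exist.
    if (-not (Test-Path -LiteralPath $CmdStanPath -PathType Container)) {
        $problems += "CmdStan directory not found: $CmdStanPath"
    }

    # Each tool must be present in at least one of the tool directories.
    foreach ($tool in $Tools) {
        $found = $false
        foreach ($dir in $ToolsPaths) {
            $candidate = [IO.Path]::Combine($dir, $tool)
            if (Test-Path -LiteralPath $candidate -PathType Leaf) {
                $found = $true
                break
            }
        }
        if (-not $found) {
            $problems += "$tool not found in any tools directory"
        }
    }

    # The leading comma keeps the result an array even when it is empty.
    return ,$problems
}
```

Calling Test-PsstanSetup -CmdStanPath $PSSTAN_PATH -ToolsPaths $PSSTAN_TOOLS_PATHS after the assignments above returns an empty array when everything is found, and one message per missing item otherwise.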
New-StanExecutable [[-Path] <string>] [[-MakeOptions] <string>]
This cmdlet compiles a Stan model file (.stan) and builds an executable for training. Internally, this cmdlet uses the make build tool, which calls stanc and g++ in turn.
As the input file, you can specify a .stan model file or a target .exe executable.
Start-StanSampling [-ModelFile] <string> [-DataFile] <string> [[-ChainCount] <int>] [[-OutputFile] <string>] [[-CombinedFile] <string>] [[-ConsoleFile] <string>] [[-Parallel]] [[-NumSamples] <int>] [[-NumWarmup] <int>] [[-SaveWarmup] <bool>] [[-Thin] <int>] [[-RandomSeed] <int>] [[-Option] <string>]
This cmdlet starts sampling based on the model file specified by the -ModelFile parameter. When the executable corresponding to the model file does not exist or is older than the model file, the cmdlet compiles the model file first.
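The rebuild rule described here can be sketched as a timestamp comparison. This is an illustration only, not the module's actual code: the function name Test-StanExecutableStale is hypothetical, and the sketch assumes the executable sits next to the model file with the extension changed to .exe.

```powershell
# Hypothetical helper (not part of psstan): returns $true when the model
# would need to be compiled under the "missing or older" rule.
function Test-StanExecutableStale {
    param([Parameter(Mandatory)] [string] $ModelFile)

    # Assumed naming: model.stan -> model.exe in the same directory.
    $exe = [IO.Path]::ChangeExtension($ModelFile, '.exe')

    # No executable yet: a build is needed.
    if (-not (Test-Path -LiteralPath $exe -PathType Leaf)) {
        return $true
    }

    # Executable exists: rebuild only if it is older than the model file.
    $modelTime = (Get-Item -LiteralPath $ModelFile).LastWriteTimeUtc
    $exeTime   = (Get-Item -LiteralPath $exe).LastWriteTimeUtc
    return ($exeTime -lt $modelTime)
}
```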
The sampling results are written to the file specified by the -OutputFile parameter. The value of the -OutputFile parameter should contain '{0}' as the placeholder for the chain number. The default value of the -OutputFile parameter is output{0}.csv (so the output file names will be output1.csv, output2.csv and so on).
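To see how such a template turns into per-chain file names, the expansion can be reproduced with PowerShell's format operator. Whether the module uses -f internally is an assumption; the sketch only mirrors the documented result, with chains numbered from 1.

```powershell
# Template in the same form as the -OutputFile default.
$outputFile = 'output{0}.csv'
$chainCount = 3

# Substitute each chain number into the {0} placeholder.
$names = 1..$chainCount | ForEach-Object { $outputFile -f $_ }

$names   # output1.csv, output2.csv, output3.csv
```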
Additionally, stripped versions of the outputs (that is, without any diagnostic information in them) are saved. Their file names end with _stripped.
You can specify the number of sampling chains with the -ChainCount parameter. If you add the -Parallel switch parameter, the sampling chains run in parallel.
The outputs of all chains are combined into a single file and saved to the file specified by the -CombinedFile parameter. The default value of the -CombinedFile parameter is combined.csv.
The other parameters have the same effect as the corresponding arguments of the original CmdStan executable.
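With this many parameters, splatting keeps a call readable. The sketch below uses only parameter names from the syntax line above; the values and the file names run{0}.csv and run_combined.csv are illustrative, and bernoulli.data.R is assumed to sit next to bernoulli.stan in the CmdStan example directory.

```powershell
# Collect the arguments in a hashtable so they can be splatted.
$samplingArgs = @{
    ModelFile    = 'bernoulli.stan'
    DataFile     = 'bernoulli.data.R'   # assumed to exist in the current directory
    ChainCount   = 4
    Parallel     = $true                # switch parameters are splatted as booleans
    NumSamples   = 2000
    NumWarmup    = 500
    RandomSeed   = 1234
    OutputFile   = 'run{0}.csv'         # {0} is the chain placeholder
    CombinedFile = 'run_combined.csv'
}

# Run only when psstan is loaded, so the snippet is harmless elsewhere.
if (Get-Command Start-StanSampling -ErrorAction SilentlyContinue) {
    Start-StanSampling @samplingArgs
}
```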
Show-StanSummary [-Path] <string> [[-SigFig] <int>] [[-Autocorr] <int>] [[-CsvFile] <string>]
This cmdlet reads the output file generated by the CmdStan executable and displays its summary. Internally, this cmdlet calls the stansummary command and accepts the same optional arguments.
Get-StanSummary [-Path] <string> [[-Autocorr] <int>]
This cmdlet reads the output file generated by the CmdStan executable and returns its summary as PSObjects. Internally, this cmdlet calls the stansummary command.
ConvertTo-StanData [-InputObject] <psobject> [[-DataCountName] <string>] [[-AsString]]
This cmdlet takes objects from the input stream and converts them to StanData objects from which you can produce output in the R data format that CmdStan requires as training data.
New-StanData [-Name] <string> [-Data] <double[]> [[-Dimensions] <int[]>]
New-StanData [-Name] <string> [-First] <double> [-Last] <double> [[-Dimensions] <int[]>]
New-StanData [-Name] <string> [-Type] {integer | double} [-Count] <int> [[-Dimensions] <int[]>]
This cmdlet creates StanData objects directly to prepare data in the R data format. See the example section for more details.
The following session shows how to compile and train the bernoulli.stan model file included in the CmdStan source code.
PS> cd C:\your_app_path\cmdstan\examples\bernoulli
PS> Start-StanSampling bernoulli.stan -ChainCount 2 -Parallel
:
(snip)
:
PS> dir *.csv | fw
Directory: C:\your_app_path\cmdstan\examples\bernoulli
combined.csv output1.csv
output2.csv output_stripped1.csv
output_stripped2.csv
PS> Show-StanSummary output1.csv
Inference for Stan model: bernoulli_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.
Warmup took (0.011) seconds, 0.011 seconds total
Sampling took (0.043) seconds, 0.043 seconds total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -7.3 3.3e-002 7.3e-001 -8.8 -7.0 -6.8 4.8e+002 1.1e+004 1.0e+000
accept_stat__ 0.91 4.4e-003 1.4e-001 0.63 0.97 1.0 9.8e+002 2.3e+004 1.0e+000
stepsize__ 1.1 2.2e-015 1.6e-015 1.1 1.1 1.1 5.0e-001 1.2e+001 1.0e+000
treedepth__ 1.4 1.7e-002 4.9e-001 1.0 1.0 2.0 8.2e+002 1.9e+004 1.0e+000
n_leapfrog__ 2.3 3.2e-002 9.7e-001 1.0 3.0 3.0 9.2e+002 2.1e+004 1.0e+000
divergent__ 0.00 nan 0.0e+000 0.00 0.00 0.00 nan nan nan
energy__ 7.8 4.7e-002 9.7e-001 6.8 7.5 9.7 4.3e+002 1.0e+004 1.0e+000
theta 0.24 7.4e-003 1.2e-001 0.075 0.23 0.45 2.6e+002 6.1e+003 1.0e+000
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
PS> $params = Get-StanSummary output1.csv
PS> $params.theta
name : theta
Mean : 0.244519
MCSE : 0.00736044
StdDev : 0.119169
5% : 0.0748537
50% : 0.229183
95% : 0.45154
N_Eff : 262.133
N_Eff/s : 6096.12
R_hat : 1.00098
PS>
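Because the summary comes back as objects, simple checks can be scripted. The sketch below flags rows whose R_hat exceeds a threshold. The function name Get-UnconvergedParameter and the default threshold of 1.1 (a common rule of thumb) are assumptions, and the sample rows are built by hand to mimic the name and R_hat properties shown above; the "mu" row is hypothetical.

```powershell
# Hypothetical helper (not part of psstan): returns the names of summary
# rows whose R_hat is above the threshold.
function Get-UnconvergedParameter {
    param(
        [Parameter(Mandatory)] [psobject[]] $Summary,
        [double] $Threshold = 1.1
    )

    foreach ($row in $Summary) {
        # Parse defensively: the value may be a string, and may be "nan".
        $rhat = 0.0
        $ok = [double]::TryParse(
            [string]$row.R_hat,
            [Globalization.NumberStyles]::Float,
            [Globalization.CultureInfo]::InvariantCulture,
            [ref]$rhat)
        if (-not $ok -or [double]::IsNaN($rhat)) { continue }

        if ($rhat -gt $Threshold) { $row.name }
    }
}

# Hand-built rows: theta uses the R_hat from the session above,
# mu is a made-up row that fails the check.
$rows = @(
    [pscustomobject]@{ name = 'theta'; R_hat = '1.00098' }
    [pscustomobject]@{ name = 'mu';    R_hat = '1.35' }
)

$flagged = Get-UnconvergedParameter -Summary $rows
```

With real output, individual rows such as $params.theta from the session above could be passed in the same way, provided they expose the name and R_hat properties as shown.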
The following example shows how to prepare an R data format file with the ConvertTo-StanData cmdlet.
PS> Get-Content example.csv
age,income
21,413
34,599
40,779
PS> Import-Csv example.csv | ConvertTo-StanData -DataCountName N | Set-Content example.data.R
PS> Get-Content example.data.R
age <- c(21, 34, 40)
income <- c(413, 599, 779)
N <- 3
The following example shows how to generate data records in the R data format programmatically.
PS> New-StanData array 10, 20, 30, 40 | Set-Content example2.data.R
PS> New-StanData struct 1, 0, 0, 0, 1, 0, 0, 0, 1 -Dimensions 3, 3 | Add-Content example2.data.R
PS> New-StanData zero_values -Type double -Count 10 | Add-Content example2.data.R
PS> New-StanData range -First 100 -Last 200 | Add-Content example2.data.R
PS> Get-Content example2.data.R
array <- c(10, 20, 30, 40)
struct <- structure(c(1, 0, 0, 0, 1, 0, 0, 0, 1), .Dim = c(3, 3))
zero_values <- double(10)
range <- 100:200
- Documentation
- Jagged array support in New-StanData
This module is licensed under the MIT License. See LICENSE.txt for more information.