Update README.md
etwinn authored Nov 28, 2023
1 parent 6d6941e commit ef8c437
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions README.md
@@ -4,6 +4,8 @@ This repository contains an R package for generating synthetic alpha shapes from

## Introduction

Understanding morphological variation is an important task in many applications. Recent studies in computational biology have focused on developing computational tools for sub-image selection, which aims to identify the structural features that best describe the variation between classes of shapes. A major part of assessing the utility of these approaches is demonstrating their performance on both simulated and real datasets. However, when creating a model for shape statistics, real data can be difficult to access, and sample sizes are often small because the data are expensive to collect. Meanwhile, current shape simulation methods have mostly been limited to approaches that use black-box inference, making it difficult to systematically assess the power and calibration of sub-image models.

## The Method

The **ashapesampler** package supports two mechanisms for sampling shapes in two and three dimensions, which we outline below. The first, empirical sampling based on an existing data set, was highlighted in the original main text of the paper. The second, probabilistic sampling from a known distribution, is the computational implementation of the theory derived in that paper.
@@ -12,15 +14,21 @@ The **ashapesampler** package supports two mechanisms for sampling shapes in two

The pipeline consists of four key steps:
1. Input the aligned shapes as simplicial complexes. A simplicial complex object in this case is a list containing (a) the Euclidean coordinates of the vertices and (b) a list of all vertices, edges, faces, and tetrahedra.
2. Calculate the reach for each shape in the data set. The reach is estimated from the boundary points of the simplicial complex. Users can set the summary statistic used for the estimated reach to mean, median, or minimum; the default is mean. Once the reach of each shape is known, users can take a summary statistic (usually the mean) of the reaches of the `J` randomly chosen shapes used to produce the new shape.
3. Sample new points, using the combined point cloud of the randomly selected `J` shapes and the estimated reach `tau` derived from those `J` shapes. Users can adjust the parameters for rejection sampling, which are discussed further in the vignettes. This step is generally the most computationally expensive: if it becomes a bottleneck, check the value of `tau` relative to the area or volume of the combined point cloud. Parallelizing also speeds up the algorithm.
4. Output the new shape as an alpha shape object.
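
As a rough illustration of steps 2 and 3, the sketch below pools the vertices of `J` randomly chosen shapes, averages their reaches into `tau`, and then rejection-samples new points near the pooled cloud. It is written in Python with the standard library only and does not call **ashapesampler**. The function names, the `"vertices"` field, and the acceptance rule (keep a candidate if it lies within a given radius of some pooled vertex) are all assumptions made for illustration; the package's actual rejection-sampling parameters are described in the vignettes.

```python
import math
import random
import statistics


def combine_shapes(shapes, reaches, J, rng):
    """Pick J shapes at random, pool their vertices, and average their reaches."""
    chosen = rng.sample(range(len(shapes)), J)
    cloud = [p for i in chosen for p in shapes[i]["vertices"]]
    tau = statistics.mean(reaches[i] for i in chosen)
    return cloud, tau


def rejection_sample(cloud, radius, n, rng, max_tries=100_000):
    """Propose points uniformly in a padded bounding box of the cloud and
    keep those within `radius` of at least one cloud point (assumed rule)."""
    dim = len(cloud[0])
    lo = [min(p[d] for p in cloud) - radius for d in range(dim)]
    hi = [max(p[d] for p in cloud) + radius for d in range(dim)]
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        q = tuple(rng.uniform(lo[d], hi[d]) for d in range(dim))
        if any(math.dist(q, p) <= radius for p in cloud):
            accepted.append(q)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    # Three toy 2D "shapes" (hypothetical data) with made-up reach estimates.
    shapes = [
        {"vertices": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]},
        {"vertices": [(0.1, 0.0), (1.1, 0.1), (0.0, 0.9)]},
        {"vertices": [(0.0, 0.1), (0.9, 0.0), (0.1, 1.1)]},
    ]
    reaches = [0.4, 0.6, 0.5]
    cloud, tau = combine_shapes(shapes, reaches, J=2, rng=rng)
    points = rejection_sample(cloud, radius=tau, n=50, rng=rng)
    print(len(cloud), tau, len(points))
```

The sketch also shows why a small `tau` relative to the extent of the point cloud slows sampling: the accepted region shrinks relative to the bounding box, so more proposals are rejected.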

Users should note that it is critical to align shapes to maximize the pipeline's success, and that there may be some manual parameter tuning for the best results.

Demonstrations of the pipeline implementation are in the vignettes. Functions are broken into parts rather than integrated into a single call so that users can troubleshoot the pipeline at different stages.

### Sampling New Shapes from Probability Distribution

Users can also use our package to generate shapes in two and three dimensions from a probability distribution. The tool can prove particularly useful for simulating shapes and comparing the analyses of multiple methods. Here, we list the parameters for simulating new shapes in two and three dimensions. Options for user-adjusted parameters and defaults can be found in the vignettes. Users should keep in mind a few key points:
* The ``bound`` parameter is the manifold from which points are sampled. At this time, the package only supports a square, a circle (disk - function assumes it is filled in), and an annulus in two dimensions and a cube, sphere (ball - function assumes it is filled in), and torus in three dimensions. The size of these manifolds can be specified using the ``rmax`` parameter and the ``rmin`` parameter, where applicable. Adjusting the size may affect computational time if ``tau`` is not adjusted with it.
* The reach ``tau`` needs to be specified as a finite value in advance, as this hyperparameter affects the choice of ``alpha``. The default value of ``tau`` is 1, but it can be any finite value. Keep in mind that the smaller ``tau`` is relative to the area or volume of the manifold, the more detail the resulting shapes have, but the longer each shape takes to produce.
* By default, ``alpha`` will be as large as theoretically allowed. The smaller ``alpha`` is relative to ``tau``, the more points need to be sampled, and the longer each shape takes to produce. This is particularly true when the bounds on the number of points are tied to requiring connectivity (no isolated points) as well as preserving the homology.
* At this time, the package only supports the truncated normal distribution for randomly selecting ``alpha``. Bounds of this truncated normal can be adjusted by the user up to what is theoretically allowed. Keep in mind that the general bounds of this distribution should keep ``alpha`` as large as possible for best computational performance.
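
To make the parameters above concrete, here is a minimal sketch, in Python with the standard library only, of two of the ingredients: drawing points uniformly from an annulus with inner radius `rmin` and outer radius `rmax`, and drawing ``alpha`` from a truncated normal by simple rejection. It does not call **ashapesampler**, and the function names are hypothetical. The upper bound on ``alpha`` is passed in by the caller as a placeholder; the theoretically allowed bound used by the package is not reproduced here.

```python
import math
import random


def sample_annulus(n, rmin, rmax, rng):
    """Draw n points uniformly (by area) from the annulus rmin <= r <= rmax.
    Sampling r^2 uniformly gives a uniform density over the area."""
    points = []
    for _ in range(n):
        r = math.sqrt(rng.uniform(rmin ** 2, rmax ** 2))
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def truncated_normal(mean, sd, lower, upper, rng, max_tries=10_000):
    """Draw from a normal distribution restricted to [lower, upper] by rejection."""
    for _ in range(max_tries):
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x
    raise RuntimeError("no draw landed inside the truncation bounds")


if __name__ == "__main__":
    rng = random.Random(1)
    alpha_upper = 0.5  # placeholder only, not the package's theoretical bound
    alpha = truncated_normal(mean=0.4, sd=0.1, lower=0.1, upper=alpha_upper, rng=rng)
    points = sample_annulus(200, rmin=0.5, rmax=1.0, rng=rng)
    print(alpha, len(points))
```

Rejection is the simplest way to truncate a normal, but it becomes slow when the bounds sit far in the tails, which is one more reason to keep the bounds on ``alpha`` sensible.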

## R Packages for ashapesampler and Tutorials
