Commit

Docs
taehyounpark committed Oct 23, 2023
1 parent dfcf5f2 commit e062a40
Showing 6 changed files with 18 additions and 20 deletions.
10 changes: 4 additions & 6 deletions README.md
@@ -20,10 +20,10 @@ Its key features include:

## Design goals

-- **Clear interface.** Higher-level languages have a myriad of libraries available to do intuitive and efficient data analysis. The syntax here aims to achieve a similar level of abstraction in its own way.
-- **Interface-only.** No implementation of a data formats or aggregation output is provided out-of-the-box. Instead, the interface allows defining operations with arbitrary inputs, execution, and outputs as needed.
-- **Sensitivity analysis.** Often times, changes to an analysis need to be explored for sensitivity analysis. How many times has this required the dataset to be re-processed? With built-in handling of systematic variations, changes can their impacts retrieved all together.
-- **Computational efficiency.** All operations within the dataset processing is performed at most once per-entry, only when needed. All systematic variations are processed at once. The dataset processing is multithreaded for thread-safe plugins.
+- **Clear interface.** Higher-level languages have an abundance of available libraries to do intuitive and efficient data analysis. An interface with a similar level of abstraction with modern C++ syntax.
+- **Customizable plugins.** Arbitrary operations with custom input(s), execution, and output(s) receive first-class treatment. From non-trivial datasets to complex computations and aggregations, there is an ABC available for implementation.
+- **Sensitivity analysis.** With built-in handling of systematic variations, changes in operations can be processed *once* to retrieve all results under nominal and varied scenarios simultaneously.
+- **Computational efficiency.** Operations within the dataset processing are performed at most once per-entry and only when needed. If enabled, the processing is multithreaded.

## Documentation

@@ -32,8 +32,6 @@ Its key features include:

## Installation

-Requirements: Unix OS, C++17
-
### [Single-header](https://raw.githubusercontent.com/taehyounpark/analogical/master/analogical.h)
```cpp
#include "analogical.h"
6 changes: 3 additions & 3 deletions docs/features/basic.md
@@ -30,6 +30,6 @@ table th:nth-of-type(4) {
| `selection` | A boolean/floating-point decision | `filter()` | Apply a cut. |
| | | `weight()` | Apply a statistical significance. |
| | | `channel()` | Same as filter, but remember its "path". |
-| `aggregation` | Perform an action and output a result | `book()` | Book the creation of a result. |
-| | | `fill()` | Perform aggregation with column value(s). |
-| | | `at()` | Perform aggregation for entries passing the selection(s). |
+| `aggregation` | Perform an action and output a result | `agg()` | Create an aggregation. |
+| | | `fill()` | Fill with column value(s) of the entry. |
+| | | `book()` | Book execution for entries passing the selection(s). |
8 changes: 4 additions & 4 deletions docs/features/column/column.md
@@ -1,4 +1,4 @@
-## Reading from dataset
+## Read from dataset

Consider the following JSON data:
```json
@@ -84,7 +84,7 @@ Consider the following JSON data:
It can be opened by a dataflow:
```{ .cpp .annotate }
#include <nlohmann/json.hpp>
-#include "analogical"
+#include "analogical.h"

using dataflow = ana::dataflow;

@@ -109,7 +109,7 @@ auto [a, b, c] = df.open<ana::json>(data)\
1. Note the initializer braces around the column names.
!!! info "Arbitrary column types"
-The interface is agnostic (ignorant, to be more precise) to the underlying column data types.
+The interface is completely agnostic to the underlying column data types.
As long the `dataset::column` of a given arbitrary type is properly implemented, it can be used.
Even in the "worst" case, explicit template specialization can be used to cherry-pick how to read a specific data type.
```cpp
@@ -122,7 +122,7 @@ auto [a, b, c] = df.open<ana::json>(data)\
```cpp
auto x = ds.read<CustomData>("x"); // success!
```
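
The note above mentions explicit template specialization as a way to choose how one specific data type is read. A minimal standalone sketch of that idea follows. It does not use the `analogical` API: `Row`, `read_value`, and this `CustomData` layout are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for one dataset entry: column name -> raw text.
using Row = std::map<std::string, std::string>;

// Hypothetical custom type with no stream operator of its own.
struct CustomData {
  int id;
  std::string label;
};

// Generic reader: works for any type that supports operator>>.
template <typename T> T read_value(const Row &row, const std::string &name) {
  std::istringstream in(row.at(name));
  T value{};
  in >> value;
  return value;
}

// Explicit specialization: cherry-picks how CustomData is read.
// Assumes the raw text has the form "id:label".
template <>
CustomData read_value<CustomData>(const Row &row, const std::string &name) {
  const std::string &raw = row.at(name);
  const auto sep = raw.find(':');
  assert(sep != std::string::npos); // the assumed format requires a separator
  return CustomData{std::stoi(raw.substr(0, sep)), raw.substr(sep + 1)};
}
```

With this in place, `read_value<int>(row, "n")` takes the generic path, while `read_value<CustomData>(row, "x")` is routed to the specialization, so the generic body is never instantiated for a type it cannot stream.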
-## Computing from dataflow
+## Compute quantities
### Simple expressions
10 changes: 5 additions & 5 deletions docs/home/design.md
@@ -1,10 +1,10 @@
## Promises

-- **Clear interface.** Higher-level languages have an abundance of available libraries to do intuitive and efficient data analysis. The aim is to achieve a similar level of abstraction with modern C++ syntax.
-- **Customizable plugins.** Custom operations with arbitrary input(s), execution, and output(s) receive first-class treatment. From non-trivial datasets to complex computations and aggregations, there is an ABC that can be implemented.
-- **Sensitivity analysis.** When changes to select column(s) need to be explored for sensitivity analysis, they have often required the dataset to be re-processed each time. With built-in handling of systematic variations, the dataset is processed *once* to retrieve all results under nominal and varied scenarios together.
-- **Computational efficiency.** All operations within the dataset processing is performed at most once per-entry and only when needed. The dataset processing can be multithreaded for thread-safe operations.
+- **Clear interface.** Higher-level languages have an abundance of available libraries to do intuitive and efficient data analysis. An interface with a similar level of abstraction with modern C++ syntax.
+- **Customizable plugins.** Arbitrary operations with custom input(s), execution, and output(s) receive first-class treatment. From non-trivial datasets to complex computations and aggregations, there is an ABC available for implementation.
+- **Sensitivity analysis.** With built-in handling of systematic variations, changes in operations can be processed *once* to retrieve all results under nominal and varied scenarios simultaneously.
+- **Computational efficiency.** Operations within the dataset processing are performed at most once per-entry and only when needed. If enabled, the processing is multithreaded.
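
The "processed *once*" promise for systematic variations can be pictured with a toy loop that visits each entry a single time and updates every scenario during that visit. This is only a sketch of the idea, not the library's mechanism: `Scenario` and `sum_all_scenarios` are hypothetical names, and a real variation would act on columns rather than on a plain `double`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical: a scenario maps the nominal per-entry value to its varied value.
using Scenario = std::function<double(double)>;

// Sum a per-entry quantity under every named scenario in one pass over the data.
std::map<std::string, double>
sum_all_scenarios(const std::vector<double> &entries,
                  const std::map<std::string, Scenario> &scenarios) {
  std::map<std::string, double> sums;
  for (const auto &kv : scenarios) {
    sums[kv.first] = 0.0; // make sure every scenario has a result, even if empty
  }
  for (double x : entries) { // the dataset is traversed exactly once
    for (const auto &[name, vary] : scenarios) {
      sums[name] += vary(x); // nominal and varied results are filled together
    }
  }
  return sums;
}
```

The alternative, re-running the whole loop once per scenario, gives the same sums but reads the dataset as many times as there are scenarios, which is the cost the single-pass layout avoids.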

## What it is *not* suited for

-- Columnar analysis. `analogical` is **designed to handle non-trivial/highly-nested data types**, and the dataset processing is **inherently row-wise**. If an analysis can be expressed entirely in terms of by array(-esque) operations, e.g. [`awkward`](https://awkward-array.org/doc/main/), then those libraries with an indexing API and SIMD support will likely be cleaner and faster.
+- Columnar analysis. `analogical` is **designed to handle non-trivial/highly-nested data types**, and the dataset processing is **inherently row-wise**. If an analysis can be expressed entirely in terms of by array operations, then libraries with an index-based API (and SIMD support) will be cleaner (and faster).
2 changes: 1 addition & 1 deletion docs/index.md
@@ -10,7 +10,7 @@ _**Ana**lysis **Logic** **A**bstraction **L**ayer_
![Version](https://img.shields.io/badge/Version-0.1.0-blue.svg)
[![Ubuntu](https://github.com/taehyounpark/analogical/actions/workflows/ubuntu.yml/badge.svg?branch=master)](https://github.com/taehyounpark/analogical/actions/workflows/ubuntu.yml)
[![macOS](https://github.com/taehyounpark/analogical/actions/workflows/macos.yml/badge.svg?branch=master)](https://github.com/taehyounpark/analogical/actions/workflows/macos.yml)
-[![Documentation](https://img.shields.io/badge/mkdocs-Documentation-blue.svg)](https://opensource.org/licenses/MIT)
+[![Documentation](https://img.shields.io/badge/Documentation-mkdocs-blue.svg)](https://opensource.org/licenses/MIT)
[![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

`analogical` is a C++ library for dataset transformation.
2 changes: 1 addition & 1 deletion include/ana/interface/dataset_column.h
@@ -63,7 +63,7 @@ template <typename T> T const &ana::dataset::column<T>::value() const {
template <typename T>
void ana::dataset::column<T>::execute(const ana::dataset::range &part,
                                      unsigned long long entry) {
-  this->m_entry = entry;
  this->m_part = &part;
+  this->m_entry = entry;
  this->m_updated = false;
}
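
The `execute()` hunk above stores the current partition and entry and clears an `m_updated` flag. One plausible reading is that the flag lets `value()` compute lazily, at most once per entry. The sketch below illustrates that pattern in isolation. `Range` and `LazySquare` are hypothetical, and the body of `value()` is an assumption, since the diff shows only its signature.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a dataset partition; not ana::dataset::range.
struct Range {
  unsigned long long begin;
  unsigned long long end;
};

// Toy column whose value (the entry number squared) is computed lazily.
class LazySquare {
public:
  // Same shape as the hunk above: remember where we are, invalidate the cache.
  void execute(const Range &part, unsigned long long entry) {
    assert(entry >= part.begin && entry < part.end);
    m_part = &part;
    m_entry = entry;
    m_updated = false;
  }

  // Assumed behaviour: recompute only on the first access after execute().
  const unsigned long long &value() const {
    if (!m_updated) {
      m_value = m_entry * m_entry;
      ++m_evaluations;
      m_updated = true;
    }
    return m_value;
  }

  // Number of times the value was actually computed.
  std::size_t evaluations() const { return m_evaluations; }

private:
  const Range *m_part = nullptr;
  unsigned long long m_entry = 0;
  mutable unsigned long long m_value = 0;
  mutable bool m_updated = false;
  mutable std::size_t m_evaluations = 0;
};
```

Calling `value()` several times for the same entry triggers a single computation, and an entry whose value is never requested costs nothing beyond the bookkeeping in `execute()`.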
