From e062a40074803606ac92d3931d9607619b2fea44 Mon Sep 17 00:00:00 2001
From: taehyounpark <taehyounpark@icloud.com>
Date: Mon, 23 Oct 2023 11:26:29 -0400
Subject: [PATCH] Docs

---
 README.md                              | 10 ++++------
 docs/features/basic.md                 |  6 +++---
 docs/features/column/column.md         |  8 ++++----
 docs/home/design.md                    | 10 +++++-----
 docs/index.md                          |  2 +-
 include/ana/interface/dataset_column.h |  2 +-
 6 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index af7a624d..2882be1c 100644
--- a/README.md
+++ b/README.md
@@ -20,10 +20,10 @@ Its key features include:
 
 ## Design goals
 
-- **Clear interface.** Higher-level languages have a myriad of libraries available to do intuitive and efficient data analysis. The syntax here aims to achieve a similar level of abstraction in its own way.
-- **Interface-only.** No implementation of a data formats or aggregation output is provided out-of-the-box. Instead, the interface allows defining operations with arbitrary inputs, execution, and outputs as needed.
-- **Sensitivity analysis.** Often times, changes to an analysis need to be explored for sensitivity analysis. How many times has this required the dataset to be re-processed? With built-in handling of systematic variations, changes can their impacts retrieved all together.
-- **Computational efficiency.** All operations within the dataset processing is performed at most once per-entry, only when needed. All systematic variations are processed at once. The dataset processing is multithreaded for thread-safe plugins.
+- **Clear interface.** Higher-level languages have an abundance of available libraries to do intuitive and efficient data analysis. An interface with a similar level of abstraction with modern C++ syntax.
+- **Customizable plugins.** Arbitrary operations with custom input(s), execution, and output(s) receive first-class treatment. From non-trivial datasets to complex computations and aggregations, there is an ABC available for implementation.
+- **Sensitivity analysis.** With built-in handling of systematic variations, changes in operations can be processed *once* to retrieve all results under nominal and varied scenarios simultaneously.
+- **Computational efficiency.** Operations within the dataset processing are performed at most once per-entry and only when needed. If enabled, the processing is multithreaded.
 
 ## Documentation
 
@@ -32,8 +32,6 @@ Its key features include:
 
 ## Installation
 
-Requirements: Unix OS, C++17
-
 ### [Single-header](https://raw.githubusercontent.com/taehyounpark/analogical/master/analogical.h)
 ```cpp
 #include "analogical.h"
diff --git a/docs/features/basic.md b/docs/features/basic.md
index 8fbb33f9..bad8aee7 100644
--- a/docs/features/basic.md
+++ b/docs/features/basic.md
@@ -30,6 +30,6 @@ table th:nth-of-type(4) {
 | `selection` | A boolean/floating-point decision | `filter()` | Apply a cut. |
 | | | `weight()` | Apply a statistical significance. |
 | | | `channel()` | Same as filter, but remember its "path". |
-| `aggregation` | Perform an action and output a result | `book()` | Book the creation of a result. |
-| | | `fill()` | Perform aggregation with column value(s). |
-| | | `at()` | Perform aggregation for entries passing the selection(s). |
+| `aggregation` | Perform an action and output a result | `agg()` | Create an aggregation. |
+| | | `fill()` | Fill with column value(s) of the entry. |
+| | | `book()` | Book execution for entries passing the selection(s). |
diff --git a/docs/features/column/column.md b/docs/features/column/column.md
index 45343277..95416e5b 100644
--- a/docs/features/column/column.md
+++ b/docs/features/column/column.md
@@ -1,4 +1,4 @@
-## Reading from dataset
+## Read from dataset
 
 Consider the following JSON data:
 ```json
@@ -84,7 +84,7 @@ Consider the following JSON data:
 It can be opened by a dataflow:
 ```{ .cpp .annotate }
 #include <nlohmann/json.hpp>
-#include "analogical"
+#include "analogical.h"
 
 using dataflow = ana::dataflow;
 
@@ -109,7 +109,7 @@ auto [a, b, c] = df.open<ana::json>(data)\
 1. Note the initializer braces around the column names.
 
 !!! info "Arbitrary column types"
-The interface is agnostic (ignorant, to be more precise) to the underlying column data types.
+The interface is completely agnostic to the underlying column data types.
 As long the `dataset::column` of a given arbitrary type is properly implemented, it can be used.
 Even in the "worst" case, explicit template specialization can be used to cherry-pick how to read a specific data type.
 ```cpp
@@ -122,7 +122,7 @@ auto [a, b, c] = df.open<ana::json>(data)\
 ```cpp
 auto x = ds.read<CustomData>("x"); // success!
 ```
-## Computing from dataflow
+## Compute quantities
 
 ### Simple expressions
 
diff --git a/docs/home/design.md b/docs/home/design.md
index f0014fc6..98d3180a 100644
--- a/docs/home/design.md
+++ b/docs/home/design.md
@@ -1,10 +1,10 @@
 ## Promises
 
-- **Clear interface.** Higher-level languages have an abundance of available libraries to do intuitive and efficient data analysis. The aim is to achieve a similar level of abstraction with modern C++ syntax.
-- **Customizable plugins.** Custom operations with arbitrary input(s), execution, and output(s) receive first-class treatment. From non-trivial datasets to complex computations and aggregations, there is an ABC that can be implemented.
-- **Sensitivity analysis.** When changes to select column(s) need to be explored for sensitivity analysis, they have often required the dataset to be re-processed each time. With built-in handling of systematic variations, the dataset is processed *once* to retrieve all results under nominal and varied scenarios together.
-- **Computational efficiency.** All operations within the dataset processing is performed at most once per-entry and only when needed. The dataset processing can be multithreaded for thread-safe operations.
+- **Clear interface.** Higher-level languages have an abundance of available libraries to do intuitive and efficient data analysis. An interface with a similar level of abstraction with modern C++ syntax.
+- **Customizable plugins.** Arbitrary operations with custom input(s), execution, and output(s) receive first-class treatment. From non-trivial datasets to complex computations and aggregations, there is an ABC available for implementation.
+- **Sensitivity analysis.** With built-in handling of systematic variations, changes in operations can be processed *once* to retrieve all results under nominal and varied scenarios simultaneously.
+- **Computational efficiency.** Operations within the dataset processing are performed at most once per-entry and only when needed. If enabled, the processing is multithreaded.
 
 ## What it is *not* suited for
 
-- Columnar analysis. `analogical` is **designed to handle non-trivial/highly-nested data types**, and the dataset processing is **inherently row-wise**. If an analysis can be expressed entirely in terms of by array(-esque) operations, e.g. [`awkward`](https://awkward-array.org/doc/main/), then those libraries with an indexing API and SIMD support will likely be cleaner and faster.
\ No newline at end of file
+- Columnar analysis. `analogical` is **designed to handle non-trivial/highly-nested data types**, and the dataset processing is **inherently row-wise**. If an analysis can be expressed entirely in terms of by array operations, then libraries with an index-based API (and SIMD support) will be cleaner (and faster).
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index 0b504df7..957110bf 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -10,7 +10,7 @@ _**Ana**lysis **Logic** **A**bstraction **L**ayer_
 ![Version](https://img.shields.io/badge/Version-0.1.0-blue.svg)
 [![Ubuntu](https://github.com/taehyounpark/analogical/actions/workflows/ubuntu.yml/badge.svg?branch=master)](https://github.com/taehyounpark/analogical/actions/workflows/ubuntu.yml)
 [![macOS](https://github.com/taehyounpark/analogical/actions/workflows/macos.yml/badge.svg?branch=master)](https://github.com/taehyounpark/analogical/actions/workflows/macos.yml)
-[![Documentation](https://img.shields.io/badge/mkdocs-Documentation-blue.svg)](https://opensource.org/licenses/MIT)
+[![Documentation](https://img.shields.io/badge/Documentation-mkdocs-blue.svg)](https://opensource.org/licenses/MIT)
 [![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
 `analogical` is a C++ library for dataset transformation.
diff --git a/include/ana/interface/dataset_column.h b/include/ana/interface/dataset_column.h
index 815eeb34..04edb2ac 100644
--- a/include/ana/interface/dataset_column.h
+++ b/include/ana/interface/dataset_column.h
@@ -63,7 +63,7 @@ template <typename T> T const &ana::dataset::column<T>::value() const {
 template <typename T>
 void ana::dataset::column<T>::execute(const ana::dataset::range &part,
 unsigned long long entry) {
-this->m_entry = entry;
 this->m_part = &part;
+this->m_entry = entry;
 this->m_updated = false;
 }
\ No newline at end of file