
Commit
feat: Add allocation logging and visualization (#64)
* Basic Logging in binary format

* write correct Segment size

* minor

* Fix None unwrap

* First visualization of allocations

* Storage saving binary logging

* faster bitmap getting

* minor

* visualize bitmap by storage layer

* Speed up plotting by precalculating things

* New get with output parameter, initial timestep is 1

* Experimented with unpacked storage

* Enforce max timestep

* Remove simple plot

* visual improvements

* First export to mp4

* minor

* Color disks separately

* Validation helpers

* minor

* Calculate and Plot fragmentation

* Indicate Timestep, handle empty storage

* Type hints, formatting

* parallel fragmentation building

* Code reorganization

* parallel video exporting

* Rewrite of visualization script with better performance, memory usage and organization

* backend selection parameter for matplotlib

* Handle missing input file correctly

* Plot correct global fragmentation

* runtime plot adjustments

* remove dependency of slider for exporting

* Plot failed allocations

* CLI improvements

* plot free blocks

* use packed bits for faster plotting

* plot allocation sizes

* minor changes for visual clarity

* minor

* minor

* unaligned allocation tries

* lines to separate segments

* allocation_log feature flag

* more complete allocation_log feature flag

* runtime allocation_log path

* information on how to visualize allocations

* moved scripts to project root

* removed allocation tries, instead count cycles
Different allocators have a different notion of a "try", or none at all. So
instead, remove tries completely and introduce an allocator-agnostic
metric to measure how long an allocation takes.

* proportion of cycles spent in allocator
pzittlau authored Jan 13, 2025
1 parent 03effd6 commit 268a2ac
Showing 8 changed files with 1,518 additions and 15 deletions.
2 changes: 2 additions & 0 deletions betree/Cargo.toml
@@ -83,4 +83,6 @@ figment_config = ["figment"]
latency_metrics = []
experimental-api = []
nvm = ["pmdk"]
# Log the allocations and deallocations done for later analysis
allocation_log = []

5 changes: 3 additions & 2 deletions betree/src/allocator.rs
@@ -1,9 +1,10 @@
//! This module provides `SegmentAllocator` and `SegmentId` for bitmap
//! allocation of 1GiB segments.
use crate::{cow_bytes::CowBytes, storage_pool::DiskOffset, vdev::Block};
use crate::{cow_bytes::CowBytes, storage_pool::DiskOffset, vdev::Block, Error};
use bitvec::prelude::*;
use byteorder::{BigEndian, ByteOrder};
use std::io::Write;

/// 256KiB, so that `vdev::BLOCK_SIZE * SEGMENT_SIZE == 1GiB`
pub const SEGMENT_SIZE: usize = 1 << SEGMENT_SIZE_LOG_2;
@@ -55,7 +56,7 @@ impl SegmentAllocator {
}
};
self.mark(offset, size, Action::Allocate);
Some(offset)
return Some(offset);
}

/// Allocates a block of the given `size` at `offset`.
119 changes: 111 additions & 8 deletions betree/src/data_management/dmu.rs
@@ -6,7 +6,7 @@ use super::{
CopyOnWriteEvent, Dml, HasStoragePreference, Object, ObjectReference,
};
use crate::{
allocator::{Action, SegmentAllocator, SegmentId},
allocator::{Action, SegmentAllocator, SegmentId, SEGMENT_SIZE},
buffer::Buf,
cache::{Cache, ChangeKeyError, RemoveError},
checksum::{Builder, Checksum, State},
@@ -17,16 +17,21 @@ use crate::{
size::{Size, SizeMut, StaticSize},
storage_pool::{DiskOffset, StoragePoolLayer, NUM_STORAGE_CLASSES},
tree::{Node, PivotKey},
vdev::{Block, BLOCK_SIZE},
vdev::{Block, File, BLOCK_SIZE},
StoragePreference,
};
use byteorder::{LittleEndian, WriteBytesExt};
use crossbeam_channel::Sender;
use futures::{executor::block_on, future::ok, prelude::*};
use parking_lot::{Mutex, RwLock, RwLockReadGuard, RwLockWriteGuard};
use std::{
arch::x86_64::{__rdtscp, _rdtsc},
collections::HashMap,
fs::OpenOptions,
io::{BufWriter, Write},
mem::replace,
ops::DerefMut,
path::PathBuf,
pin::Pin,
sync::{
atomic::{AtomicU64, Ordering},
@@ -60,6 +65,8 @@
next_modified_node_id: AtomicU64,
next_disk_id: AtomicU64,
report_tx: Option<Sender<DmlMsg>>,
#[cfg(feature = "allocation_log")]
allocation_log_file: Mutex<BufWriter<std::fs::File>>,
}

impl<E, SPL> Dmu<E, SPL>
@@ -76,6 +83,7 @@
alloc_strategy: [[Option<u8>; NUM_STORAGE_CLASSES]; NUM_STORAGE_CLASSES],
cache: E,
handler: Handler<ObjRef<ObjectPointer<SPL::Checksum>>>,
#[cfg(feature = "allocation_log")] allocation_log_file_path: PathBuf,
) -> Self {
let allocation_data = (0..pool.storage_class_count())
.map(|class| {
@@ -87,6 +95,16 @@
.collect::<Vec<_>>()
.into_boxed_slice();

#[cfg(feature = "allocation_log")]
let allocation_log_file = Mutex::new(BufWriter::new(
OpenOptions::new()
.create(true)
.write(true)
.truncate(true)
.open(allocation_log_file_path)
.expect("Failed to create allocation log file"),
));

Dmu {
// default_compression_state: default_compression.new_compression().expect("Can't create compression state"),
default_compression,
@@ -103,6 +121,8 @@
next_modified_node_id: AtomicU64::new(1),
next_disk_id: AtomicU64::new(0),
report_tx: None,
#[cfg(feature = "allocation_log")]
allocation_log_file,
}
}

@@ -120,6 +140,36 @@
pub fn pool(&self) -> &SPL {
&self.pool
}

/// Writes the global header for the allocation logging.
pub fn write_global_header(&self) -> Result<(), Error> {
#[cfg(feature = "allocation_log")]
{
let mut file = self.allocation_log_file.lock();

// Number of storage classes
file.write_u8(self.pool.storage_class_count())?;

// Disks per class
for class in 0..self.pool.storage_class_count() {
let disk_count = self.pool.disk_count(class);
file.write_u16::<LittleEndian>(disk_count)?;
}

// Segments per disk
for class in 0..self.pool.storage_class_count() {
for disk in 0..self.pool.disk_count(class) {
let segment_count = self.pool.size_in_blocks(class, disk);
file.write_u64::<LittleEndian>(segment_count.as_u64())?;
}
}

// Blocks per segment (constant)
file.write_u64::<LittleEndian>(SEGMENT_SIZE.try_into().unwrap())?;
}

Ok(())
}
}
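The header layout serialized by `write_global_header` above is, in order: a `u8` storage-class count, one `u16` disk count per class, one `u64` size in blocks per disk, and a trailing `u64` with the blocks per segment, all little-endian. A minimal Python sketch of a matching reader (the function name is illustrative and not part of the project's scripts):

```python
import struct

def read_global_header(f):
    """Parse the allocation log's global header from a binary stream."""
    (num_classes,) = struct.unpack("<B", f.read(1))  # u8: storage classes
    # u16 per class: number of disks in that class
    disks_per_class = [struct.unpack("<H", f.read(2))[0] for _ in range(num_classes)]
    disk_blocks = {}
    for cls, disks in enumerate(disks_per_class):
        for disk in range(disks):
            (blocks,) = struct.unpack("<Q", f.read(8))  # u64: disk size in blocks
            disk_blocks[(cls, disk)] = blocks
    (blocks_per_segment,) = struct.unpack("<Q", f.read(8))  # u64: SEGMENT_SIZE
    return num_classes, disks_per_class, disk_blocks, blocks_per_segment
```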

impl<E, SPL> Dmu<E, SPL>
@@ -201,6 +251,15 @@
obj_ptr.offset().disk_id(),
obj_ptr.size(),
);
#[cfg(feature = "allocation_log")]
{
let mut file = self.allocation_log_file.lock();
let _ = file.write_u8(Action::Deallocate.as_bool() as u8);
let _ = file.write_u64::<LittleEndian>(obj_ptr.offset.as_u64());
let _ = file.write_u32::<LittleEndian>(obj_ptr.size.as_u32());
let _ = file.write_u64::<LittleEndian>(0);
let _ = file.write_u64::<LittleEndian>(0);
}
if let (CopyOnWriteEvent::Removed, Some(tx), CopyOnWriteReason::Remove) = (
self.handler.copy_on_write(
obj_ptr.offset(),
@@ -484,6 +543,16 @@

let strategy = self.alloc_strategy[storage_preference as usize];

// NOTE: Could we mark classes, disks and/or segments as full to prevent looping over them?
// We would then also need to handle this when deallocating.
// Would "full" mean completely full, or just not having enough contiguous memory of some
// size?
// Alternatively, save the largest contiguous free region as a value and compare against that.
// The allocator would need to support this, and we would have to 'bubble' the largest value up.
#[cfg(feature = "allocation_log")]
let mut start_cycles_global = get_cycles();
#[cfg(feature = "allocation_log")]
let mut total_cycles_local: u64 = 0;
'class: for &class in strategy.iter().flatten() {
let disks_in_class = self.pool.disk_count(class);
if disks_in_class == 0 {
@@ -536,14 +605,40 @@

let first_seen_segment_id = *segment_id;
loop {
if let Some(segment_offset) = self
.handler
.get_allocation_bitmap(*segment_id, self)?
.access()
.allocate(size.as_u32())
// Has to be split because otherwise the temporary value is dropped while still borrowed
let bitmap = self.handler.get_allocation_bitmap(*segment_id, self)?;
let mut allocator = bitmap.access();

#[cfg(not(feature = "allocation_log"))]
{
let allocation = allocator.allocate(size.as_u32());
if let Some(segment_offset) = allocation {
let disk_offset = segment_id.disk_offset(segment_offset);
break disk_offset;
}
}
#[cfg(feature = "allocation_log")]
{
break segment_id.disk_offset(segment_offset);
let start_cycles_allocation = get_cycles();
let allocation = allocator.allocate(size.as_u32());
let end_cycles_allocation = get_cycles();
total_cycles_local += end_cycles_allocation - start_cycles_allocation;

if let Some(segment_offset) = allocation {
let disk_offset = segment_id.disk_offset(segment_offset);
let total_cycles_global = end_cycles_allocation - start_cycles_global;

let mut file = self.allocation_log_file.lock();
file.write_u8(Action::Allocate.as_bool() as u8)?;
file.write_u64::<LittleEndian>(disk_offset.as_u64())?;
file.write_u32::<LittleEndian>(size.as_u32())?;
file.write_u64::<LittleEndian>(total_cycles_local)?;
file.write_u64::<LittleEndian>(total_cycles_global)?;

break disk_offset;
}
}

let next_segment_id = segment_id.next(disk_size);
trace!(
"Next allocator segment: {:?} -> {:?} ({:?})",
@@ -1031,3 +1126,11 @@
self.report_tx = Some(tx);
}
}

fn get_cycles() -> u64 {
    // `_rdtsc` reads the timestamp counter without serializing the pipeline;
    // the serializing variant `__rdtscp(&mut aux)` could be used instead if
    // stricter ordering of the measurement were required.
    unsafe { _rdtsc() }
}
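Both the deallocation and allocation paths in this file append the same fixed 29-byte record: a `u8` action, a `u64` disk offset, a `u32` size, and two `u64` cycle counters (written as zeros for deallocations). A hedged Python sketch of a record reader, assuming `Action::Allocate.as_bool()` serializes to 1:

```python
import struct

# action (u8), offset (u64), size (u32), cycles spent inside the allocator (u64),
# cycles for the whole allocation attempt (u64); little-endian, no padding.
RECORD = struct.Struct("<BQIQQ")

def read_records(f):
    """Yield (action, offset, size, local_cycles, global_cycles) tuples."""
    while len(chunk := f.read(RECORD.size)) == RECORD.size:
        yield RECORD.unpack(chunk)
```

With records in hand, the proportion of cycles spent inside the allocator (the metric named in the commit message) can be estimated as the sum of the local counters divided by the sum of the global counters over all allocation records.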
11 changes: 10 additions & 1 deletion betree/src/database/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ use serde::{de::DeserializeOwned, Deserialize, Serialize};
use std::{
collections::HashMap,
iter::FromIterator,
path::Path,
path::{Path, PathBuf},
sync::{
atomic::{AtomicU64, Ordering},
Arc,
@@ -147,6 +150,9 @@ pub struct DatabaseConfiguration {

/// If and how to log database metrics
pub metrics: Option<MetricsConfiguration>,

/// Where to log the allocations
pub allocation_log_file_path: PathBuf,
}

impl Default for DatabaseConfiguration {
@@ -162,6 +165,7 @@
sync_interval_ms: Some(DEFAULT_SYNC_INTERVAL_MS),
metrics: None,
migration_policy: None,
allocation_log_file_path: PathBuf::from("allocation_log.bin"),
}
}
}
@@ -237,6 +241,8 @@
strategy,
ClockCache::new(self.cache_size),
handler,
#[cfg(feature = "allocation_log")]
self.allocation_log_file_path.clone(),
)
}

@@ -432,6 +438,9 @@
dmu.set_report(tx.clone());
}

#[cfg(feature = "allocation_log")]
dmu.write_global_header()?;

let (tree, root_ptr) = builder.select_root_tree(Arc::new(dmu))?;

*tree.dmu().handler().current_generation.lock_write() = root_ptr.generation().next();
5 changes: 1 addition & 4 deletions betree/src/tree/imp/mod.rs
@@ -393,14 +393,11 @@
self.msg_action().apply(key, &msg, &mut tmp);
}

// This may never be false.
let data = tmp.unwrap();

drop(node);
if self.evict {
self.dml.evict()?;
}
Ok(Some((info, data)))
Ok(tmp.map(|data| (info, data)))
}
}
}
37 changes: 37 additions & 0 deletions scripts/README.md
@@ -0,0 +1,37 @@
# Allocation Log Visualization

This script visualizes the allocation and deallocation of blocks within the key-value database. It helps to understand how storage space is being used and identify potential optimization opportunities.

The allocation log visualization script is tested with Python 3.12.7 and the packages listed in `requirements.txt`.

The main dependencies are matplotlib, tqdm, and sortedcontainers.

## Setup

Run the following to create a working environment for the script:

```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r scripts/requirements.txt
```

## Generating the Allocation Log

To generate the `allocation_log.bin` file, you need to enable the `allocation_log` feature flag when compiling the `betree` crate, for instance by running
```bash
cargo build --features allocation_log
```
or by enabling it in the `Cargo.toml`.

The path where the log is saved can be set with the runtime configuration parameter `allocation_log_file_path`. The default is `$PWD/allocation_log.bin`.

## Using the Allocation Log

Once a log file has been obtained, simply run the following to visualize the recorded (de-)allocations.
```bash
./scripts/visualize_allocation_log allocation_log.bin
```

To get help and see the available options, run the script with the `-h` flag.

13 changes: 13 additions & 0 deletions scripts/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
contourpy==1.3.1
cycler==0.12.1
fonttools==4.55.3
kiwisolver==1.4.7
matplotlib==3.9.3
numpy==2.2.0
packaging==24.2
pillow==11.0.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
six==1.17.0
sortedcontainers==2.4.0
tqdm==4.67.1
