5 changes: 5 additions & 0 deletions AGENTS.md
@@ -95,3 +95,8 @@ When necessary, commands can be executed for the Rust code.
- `cargo install --path rust/rubydex-mcp`: installs the MCP server binary
- `bundle exec rake lint_rust`: lints the Rust code
- `bundle exec rake format_rust`: auto formats the Rust code

### Benchmarking

When verifying the performance of implementations, use the `utils/bench` script to get statistics. The user should have
configured a `DEFAULT_BENCH_WORKSPACE`. If not, prompt them to do so.
100 changes: 62 additions & 38 deletions utils/bench
@@ -4,14 +4,17 @@
# improvements or regressions
#
# Usage:
-# utils/bench # defaults to huge corpus
+# utils/bench # uses $DEFAULT_BENCH_WORKSPACE
# utils/bench tiny # uses tiny corpus (scale 0.1)
# utils/bench small # uses small corpus (scale 1)
# utils/bench medium # uses medium corpus (scale 10)
# utils/bench large # uses large corpus (scale 100)
# utils/bench huge # uses huge corpus (scale 1000)
# utils/bench /path/to/dir # uses existing directory
#
+# Environment variables:
+# DEFAULT_BENCH_WORKSPACE # path to a codebase used when no argument is given
+#
# What this script does:
# 1. Generate the corpus if it doesn't exist (for predefined sizes only)
# 2. Build rubydex_cli in release mode
@@ -20,45 +23,66 @@
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"

-# Get corpus size from argument or default to huge
-CORPUS_SIZE="${1:-huge}"
-
# Corpora base directory - outside project to avoid triggering LSP
CORPUS_BASE="$(dirname "$PROJECT_ROOT")/rubydex_corpora"

-# Determine corpus path and scale based on input
-case "$CORPUS_SIZE" in
-  tiny)
-    CORPUS_PATH="$CORPUS_BASE/tiny"
-    SCALE=0.1
-    ;;
-  small)
-    CORPUS_PATH="$CORPUS_BASE/small"
-    SCALE=1
-    ;;
-  medium)
-    CORPUS_PATH="$CORPUS_BASE/medium"
-    SCALE=10
-    ;;
-  large)
-    CORPUS_PATH="$CORPUS_BASE/large"
-    SCALE=100
-    ;;
-  huge)
-    CORPUS_PATH="$CORPUS_BASE/huge"
-    SCALE=1000
-    ;;
-  *)
-    # Assume it's a path to an existing directory
-    CORPUS_PATH="$CORPUS_SIZE"
-    if [ ! -d "$CORPUS_PATH" ]; then
-      echo "Error: Directory '$CORPUS_PATH' does not exist"
-      echo "Usage: $0 [tiny|small|medium|large|huge|/path/to/existing/dir]"
-      exit 1
-    fi
-    SCALE=""
-    ;;
-esac
+# If no argument given, use DEFAULT_BENCH_WORKSPACE
+if [ -z "$1" ]; then
+  if [ -z "$DEFAULT_BENCH_WORKSPACE" ]; then
+    echo "Error: No argument given and DEFAULT_BENCH_WORKSPACE is not set"
+    echo ""
+    echo "Either pass an argument:"
+    echo " $0 [tiny|small|medium|large|huge|/path/to/existing/dir]"
+    echo ""
+    echo "Or set DEFAULT_BENCH_WORKSPACE in your shell config:"
+    echo " export DEFAULT_BENCH_WORKSPACE=/path/to/your/codebase"
+    exit 1
+  fi
+
+  if [ ! -d "$DEFAULT_BENCH_WORKSPACE" ]; then
+    echo "Error: DEFAULT_BENCH_WORKSPACE directory '$DEFAULT_BENCH_WORKSPACE' does not exist"
+    exit 1
+  fi
+
+  CORPUS_PATH="$DEFAULT_BENCH_WORKSPACE"
+  SCALE=""
+else
+  CORPUS_SIZE="$1"
+
+  # Determine corpus path and scale based on input
+  case "$CORPUS_SIZE" in
+    tiny)
+      CORPUS_PATH="$CORPUS_BASE/tiny"
+      SCALE=0.1
+      ;;
+    small)
+      CORPUS_PATH="$CORPUS_BASE/small"
+      SCALE=1
+      ;;
+    medium)
+      CORPUS_PATH="$CORPUS_BASE/medium"
+      SCALE=10
+      ;;
+    large)
+      CORPUS_PATH="$CORPUS_BASE/large"
+      SCALE=100
+      ;;
+    huge)
+      CORPUS_PATH="$CORPUS_BASE/huge"
+      SCALE=1000
+      ;;
+    *)
+      # Assume it's a path to an existing directory
+      CORPUS_PATH="$CORPUS_SIZE"
+      if [ ! -d "$CORPUS_PATH" ]; then
+        echo "Error: Directory '$CORPUS_PATH' does not exist"
+        echo "Usage: $0 [tiny|small|medium|large|huge|/path/to/existing/dir]"
+        exit 1
+      fi
+      SCALE=""
+      ;;
+  esac
+fi

# Only generate corpus for predefined sizes
if [ -n "$SCALE" ] && [ ! -d "$CORPUS_PATH" ]; then
@@ -69,6 +93,6 @@ fi
echo "Benchmarking with corpus: $CORPUS_PATH"
echo "----------------------------------------"

-cd "$PROJECT_ROOT/rust" || exit 1
+cd "$PROJECT_ROOT/rust/rubydex" || exit 1
Member Author


This is just to speed up compilation. The MCP server takes a while to compile in release mode and it is not required for this bench script.

cargo build --release
$PROJECT_ROOT/utils/mem-use "$PROJECT_ROOT/rust/target/release/rubydex_cli" "$CORPUS_PATH" --stats
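
The argument handling introduced by this change can be summarized as a small function, which may make the three cases (no argument, predefined size, existing directory) easier to check in isolation. This is only a sketch: `resolve_corpus` is a hypothetical helper that does not exist in `utils/bench`, the `path|scale` output format is an illustration, and the `/tmp/rubydex_corpora` fallback is an assumption so the snippet runs standalone (the script derives `CORPUS_BASE` from the project root).

```shell
#!/usr/bin/env bash
# Sketch only: resolve_corpus is a hypothetical helper, not part of utils/bench.
# It prints "<corpus path>|<scale>"; the scale is empty for existing directories.
resolve_corpus() {
  local arg="${1:-}"
  # Assumed fallback so the sketch runs on its own; utils/bench computes
  # CORPUS_BASE relative to the project root instead.
  local base="${CORPUS_BASE:-/tmp/rubydex_corpora}"

  if [ -z "$arg" ]; then
    # No argument: fall back to DEFAULT_BENCH_WORKSPACE, which must be a directory.
    if [ -z "${DEFAULT_BENCH_WORKSPACE:-}" ]; then
      echo "Error: No argument given and DEFAULT_BENCH_WORKSPACE is not set" >&2
      return 1
    fi
    if [ ! -d "$DEFAULT_BENCH_WORKSPACE" ]; then
      echo "Error: DEFAULT_BENCH_WORKSPACE directory '$DEFAULT_BENCH_WORKSPACE' does not exist" >&2
      return 1
    fi
    printf '%s|\n' "$DEFAULT_BENCH_WORKSPACE"
    return 0
  fi

  case "$arg" in
    # Predefined sizes map to a generated corpus and its scale factor.
    tiny)   printf '%s|%s\n' "$base/tiny" "0.1" ;;
    small)  printf '%s|%s\n' "$base/small" "1" ;;
    medium) printf '%s|%s\n' "$base/medium" "10" ;;
    large)  printf '%s|%s\n' "$base/large" "100" ;;
    huge)   printf '%s|%s\n' "$base/huge" "1000" ;;
    *)
      # Anything else is treated as a path to an existing directory.
      if [ ! -d "$arg" ]; then
        echo "Error: Directory '$arg' does not exist" >&2
        return 1
      fi
      printf '%s|\n' "$arg"
      ;;
  esac
}
```

With `CORPUS_BASE` unset, `resolve_corpus small` prints `/tmp/rubydex_corpora/small|1`. An empty scale field corresponds to `SCALE=""` in the script, which is the case where corpus generation is skipped.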