This is an interview challenge that tests your ability to optimize a graph structure for a database system that uses random walks to find data.
BogoDB is a database where:
- Data is stored as nodes in a weighted graph
- Queries use random walks guided by edge weights to find target nodes
- Each query performs multiple random walks to find the target node
- The goal is to improve both success rate and path efficiency
Your task is to optimize the graph structure for better query performance:
- Fork this repository
- Implement your optimization strategy in
candidate_submission/optimize_graph.py - Document your approach directly in this README (see "Your Solution" section below)
- Submit your repository URL as your solution
# Clone your fork of the repository
git clone https://github.com/YOUR-USERNAME/bogoDB.git
cd bogoDB
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the full pipeline
./run_all.shThe system uses these constants (defined in scripts/constants.py):
| Parameter | Value | Description |
|---|---|---|
| NUM_NODES | 500 | Total number of nodes in the graph |
| MAX_EDGES_PER_NODE | 3 | Maximum outgoing edges per node |
| MAX_TOTAL_EDGES | 1000 | Maximum total edges in the graph |
| NUM_QUERIES | 200 | Number of queries in the test set |
| MAX_DEPTH | 10000 | Maximum steps in a random walk |
| NUM_WALKS_PER_QUERY | 10 | Number of random walks per query |
- Nodes: Simple integers from 0 to 499
- Edges: Weighted connections (weights control probability of traversal)
- Queries: Generated using an exponential distribution (some nodes queried more often)
- Random Walks: Start at random nodes and follow weighted edges
- Success: A query is successful if it reaches the target before max depth
Your task is to complete the optimize_graph function in candidate_submission/optimize_graph.py:
def optimize_graph(initial_graph, results, ...):
# TODO: Implement your optimization strategy here
# ...
return optimized_graphYour optimized graph must adhere to:
- Maximum total edges: 1000 (2 * number of nodes)
- Maximum outgoing edges per node: 3
- All 500 nodes must be present in the graph
- Edge weights: Must be positive and ≤ 10
- No self-loops are allowed (a node cannot connect to itself)
The evaluation script will reject graphs that don't meet these constraints.
Your solution will be evaluated based on:
-
Approach and Reasoning (primary)
- How you analyze the problem
- Your optimization strategies
- The trade-offs you consider
-
Performance (secondary)
- Success rate: % of queries that find their target
- Path length: Shorter paths are better
- A combined score that balances both metrics
- Analyze the query pattern in the initial results
- Consider which nodes are frequently queried
- Think about the balance between exploration and exploitation
- Random walks behave differently than shortest-path algorithms
- Both success rate and path length matter, but they often trade off
# Run the full pipeline
./run_all.sh
# Or run just your optimizer
python candidate_submission/optimize_graph.py
# Then evaluate your solution
python scripts/evaluate_graph.pyThe evaluation will compare your optimized graph against the initial random graph.
To submit your solution:
- Implement your optimization strategy in
optimize_graph.py - Run the full pipeline (
./run_all.sh) to test your solution - Update the "Your Solution" section below with your documentation
- Commit and push your changes to your forked repository
- Submit the URL to your forked repository
We expect you to iterate on multiple approaches before arriving at your final solution. In your documentation, include insights from your journey - what worked, what didn't, and how your understanding evolved.
Write your solution to solution.md.