Optimize search performance and add stress test
yifanfeng97 committed Dec 16, 2024
1 parent 8460405 commit c097f05
Showing 7 changed files with 263 additions and 22 deletions.
33 changes: 33 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,33 @@
name: Run Tests

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      # Step 1: Checkout the repository code
      - name: Checkout code
        uses: actions/checkout@v3

      # Step 2: Setup Python environment
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      # Step 3: Install dependencies from requirements.txt
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      # Step 4: Run unit tests using pytest
      - name: Run unit tests
        run: pytest tests
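A detail worth keeping in mind for `python-version` values in a workflow like this one: YAML resolves an unquoted scalar such as `3.10` as a floating-point number, so the trailing zero is lost and the value reads as `3.1`. Writing the version as a quoted string avoids this. The snippet below is only an emulation of that resolution step using Python's own float parsing; it does not invoke a YAML parser, and the variable names are illustrative.

```python
# Emulate how an unquoted YAML scalar is resolved: it is read as a float,
# so the trailing zero of "3.10" disappears.
unquoted = float("3.10")  # stands in for `python-version: 3.10`
quoted = "3.10"           # stands in for `python-version: "3.10"`

print(str(unquoted))  # 3.1
print(quoted)         # 3.10
```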
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@ __pycache__/
 *.py[cod]
 *$py.class
 *.DS_Store
+logs/


 # C extensions
60 changes: 53 additions & 7 deletions README.md
@@ -48,15 +48,61 @@

 <br>

-## :dart: About ##
+## :dart: About

Hypergraph-DB is a lightweight, flexible, and Python-based database designed to model and manage **hypergraphs**—a generalized graph structure where edges (hyperedges) can connect any number of vertices. This makes Hypergraph-DB an ideal solution for representing complex relationships between entities in various domains, such as knowledge graphs, social networks, and scientific data modeling.

Hypergraph-DB provides a high-level abstraction for working with vertices and hyperedges, making it easy to add, update, query, and manage hypergraph data. With built-in support for persistence, caching, and efficient operations, Hypergraph-DB simplifies the management of hypergraph data structures.

**:bar_chart: Performance Test Results**

To demonstrate the performance of **Hypergraph-DB**, let’s consider an example:

- Suppose we want to construct a **hypergraph** with **1,000,000 vertices** and **200,000 hyperedges**.
- Using Hypergraph-DB, it takes approximately:
  - **1.75 seconds** to add **1,000,000 vertices**.
  - **1.82 seconds** to add **200,000 hyperedges**.
- Querying this hypergraph:
  - Retrieving information for **400,000 vertices** takes **0.51 seconds**.
  - Retrieving information for **400,000 hyperedges** takes **2.52 seconds**.

This example demonstrates the efficiency of Hypergraph-DB, even when working with large-scale hypergraphs. Below is a detailed table showing how the performance scales as the size of the hypergraph increases.

**Detailed Performance Results**

The following table shows stress-test results for Hypergraph-DB at increasing scales. Each test measures the time taken to add vertices, add hyperedges, and query vertices and hyperedges. The two query columns report elapsed seconds / number of queries issued.

| **Number of Vertices** | **Number of Hyperedges** | **Add Vertices (s)** | **Add Edges (s)** | **Query Vertices (s/queries)** | **Query Edges (s/queries)** | **Total Time (s)** |
|-------------------------|--------------------------|-----------------------|-------------------|-------------------------------|----------------------------|--------------------|
| 5,000 | 1,000 | 0.01 | 0.01 | 0.00/2,000 | 0.01/2,000 | 0.02 |
| 10,000 | 2,000 | 0.01 | 0.01 | 0.00/4,000 | 0.02/4,000 | 0.05 |
| 25,000 | 5,000 | 0.03 | 0.04 | 0.01/10,000 | 0.05/10,000 | 0.13 |
| 50,000 | 10,000 | 0.06 | 0.07 | 0.02/20,000 | 0.12/20,000 | 0.26 |
| 100,000 | 20,000 | 0.12 | 0.17 | 0.04/40,000 | 0.24/40,000 | 0.58 |
| 250,000 | 50,000 | 0.35 | 0.40 | 0.11/100,000 | 0.61/100,000 | 1.47 |
| 500,000 | 100,000 | 0.85 | 1.07 | 0.22/200,000 | 1.20/200,000 | 3.34 |
| 1,000,000 | 200,000 | 1.75 | 1.82 | 0.51/400,000 | 2.52/400,000 | 6.60 |

---

**Key Observations:**

1. **Scalability**:
   Hypergraph-DB scales efficiently with the number of vertices and hyperedges. In the table above, the time to add vertices and hyperedges grows roughly linearly with the size of the hypergraph.

2. **Query Performance**:
   Querying vertices and hyperedges remains fast, even for large-scale hypergraphs. For instance, on the hypergraph with 500,000 vertices and 100,000 hyperedges:
   - Querying **200,000 vertices** takes only **0.22 seconds**.
   - Querying **200,000 hyperedges** takes only **1.20 seconds**.

3. **Total Time**:
   The total time to construct and query a hypergraph with **1,000,000 vertices** and **200,000 hyperedges** is only **6.60 seconds**, showcasing the overall efficiency of Hypergraph-DB.

This performance makes **Hypergraph-DB** a great choice for applications requiring fast and scalable hypergraph data management.
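One way to sanity-check the scaling claim is to turn the insertion timings into a per-item cost. The sketch below copies the add-vertex and add-hyperedge columns from the table and converts them to microseconds per item; the helper name `per_item_us` and the list names are illustrative and not part of Hypergraph-DB. The smallest rows are rounded to 0.01 s in the table, so their per-item figures are coarse.

```python
# (vertices, hyperedges, add-vertices seconds, add-hyperedges seconds),
# copied from the stress-test table.
rows = [
    (5_000, 1_000, 0.01, 0.01),
    (10_000, 2_000, 0.01, 0.01),
    (25_000, 5_000, 0.03, 0.04),
    (50_000, 10_000, 0.06, 0.07),
    (100_000, 20_000, 0.12, 0.17),
    (250_000, 50_000, 0.35, 0.40),
    (500_000, 100_000, 0.85, 1.07),
    (1_000_000, 200_000, 1.75, 1.82),
]


def per_item_us(count, seconds):
    """Average cost of one insertion, in microseconds."""
    return seconds / count * 1e6


vertex_costs = [per_item_us(n_v, add_v_s) for n_v, _, add_v_s, _ in rows]
edge_costs = [per_item_us(n_e, add_e_s) for _, n_e, _, add_e_s in rows]

for (n_v, n_e, _, _), v_cost, e_cost in zip(rows, vertex_costs, edge_costs):
    print(f"{n_v:>9,} vertices / {n_e:>7,} hyperedges: "
          f"{v_cost:.2f} us per vertex, {e_cost:.2f} us per hyperedge")
```

Across a 200-fold increase in size, the per-vertex cost stays between about 1 and 2 microseconds and the per-hyperedge cost between about 5 and 11 microseconds. A per-item cost that stays in a narrow band like this is what roughly linear growth in total time looks like.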

---

-## :sparkles: Features ##
+## :sparkles: Features

:heavy_check_mark: **Flexible Hypergraph Representation**
- Supports vertices (`v`) and hyperedges (`e`), where hyperedges can connect any number of vertices.
@@ -78,7 +124,7 @@ Hypergraph-DB provides a high-level abstraction for working with vertices and hy

 ---

-## :rocket: Installation ##
+## :rocket: Installation


Hypergraph-DB is a Python library. You can install it directly from PyPI using `pip`.
@@ -100,7 +146,7 @@ pip install -r requirements.txt

 ---

-## :checkered_flag: Starting ##
+## :checkered_flag: Starting

 This section provides a quick guide to getting started with Hypergraph-DB, including usage and running basic operations. Below is an example of how to use Hypergraph-DB, based on the provided test cases.

@@ -174,7 +220,7 @@ print(hg.nbr_v(1)) # Output: {3, 4}
print(hg.nbr_e_of_v(1)) # Output: {(1, 3, 4)}
```
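To make the neighbor queries above concrete, here is a minimal sketch of an incidence-based store. `TinyHypergraph` is a hypothetical stand-in, not the real `HypergraphDB` class; its attribute names (`_v_data`, `_e_data`, `_v_inci`) mirror the ones visible in the `hyperdb/hypergraph.py` changes of this commit, while the hyperedge key normalization (sorted, de-duplicated tuple) and the body of `nbr_v` are assumptions, since the full source is not shown here.

```python
class TinyHypergraph:
    """Illustrative stand-in; not the real HypergraphDB implementation."""

    def __init__(self):
        self._v_data = {}  # vertex id -> attribute dict
        self._e_data = {}  # hyperedge tuple -> attribute dict
        self._v_inci = {}  # vertex id -> set of incident hyperedge tuples

    def add_v(self, v_id, v_data=None):
        if v_id not in self._v_data:
            self._v_data[v_id] = dict(v_data or {})
            self._v_inci[v_id] = set()

    def add_e(self, e_tuple, e_data=None):
        # Assumption: a hyperedge is keyed by its sorted, de-duplicated vertex ids.
        key = tuple(sorted(set(e_tuple)))
        for v in key:
            assert v in self._v_data, f"The vertex {v} does not exist in the hypergraph."
        if key not in self._e_data:
            self._e_data[key] = dict(e_data or {})
            for v in key:
                self._v_inci[v].add(key)

    def nbr_e_of_v(self, v_id):
        # Hyperedges incident to the vertex.
        return set(self._v_inci[v_id])

    def nbr_v(self, v_id, exclude_self=True):
        # Union of all vertices that share a hyperedge with v_id.
        nbrs = set()
        for e_tuple in self._v_inci[v_id]:
            nbrs.update(e_tuple)
        if exclude_self:
            nbrs.discard(v_id)
        return nbrs


hg = TinyHypergraph()
for v in (1, 3, 4):
    hg.add_v(v)
hg.add_e((1, 3, 4))

print(hg.nbr_v(1))       # {3, 4}
print(hg.nbr_e_of_v(1))  # {(1, 3, 4)}
```

Keeping a per-vertex set of incident hyperedges means both queries only touch the hyperedges that contain the vertex, instead of scanning every hyperedge in the store.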

-#### **6. Persistence (Save and Load)**
+#### **6. Persistence (Save and Load)

```python
# Save the hypergraph to a file
@@ -190,14 +236,14 @@ print(hg2.all_e) # Output: {(1, 3, 4)}
 ---


-## :memo: License ##
+## :memo: License

Hypergraph-DB is open-source and licensed under the [Apache License 2.0](LICENSE). Feel free to use, modify, and distribute it as per the license terms.


---

-## :email: Contact ##
+## :email: Contact

Hypergraph-DB is maintained by [iMoon-Lab](http://moon-lab.tech/), Tsinghua University. If you have any questions, please feel free to contact us via email: [Yifan Feng](mailto:evanfeng97@gmail.com).

2 changes: 1 addition & 1 deletion hyperdb/__init__.py
@@ -3,6 +3,6 @@

 from ._global import AUTHOR_EMAIL

-__version__ = "0.1.0"
+__version__ = "0.1.1"

__all__ = {"AUTHOR_EMAIL", "BaseHypergraphDB", "HypergraphDB"}
28 changes: 14 additions & 14 deletions hyperdb/hypergraph.py
@@ -112,7 +112,7 @@ def encode_e(self, e_tuple: Union[List, Set, Tuple]) -> Tuple:
         for v_id in tmp:
             assert isinstance(v_id, Hashable), "The vertex id must be hashable."
             assert (
-                v_id in self.all_v
+                v_id in self._v_data
             ), f"The vertex {v_id} does not exist in the hypergraph."
         return tuple(tmp)

@@ -157,7 +157,7 @@ def add_v(self, v_id: Any, v_data: Optional[Dict] = None):
             assert isinstance(v_data, dict), "The vertex data must be a dictionary."
         else:
             v_data = {}
-        if v_id not in self.all_v:
+        if v_id not in self._v_data:
             self._v_data[v_id] = v_data
             self._v_inci[v_id] = set()
         else:
@@ -180,7 +180,7 @@ def add_e(self, e_tuple: Union[List, Set, Tuple], e_data: Optional[Dict] = None)
         else:
             e_data = {}
         e_tuple = self.encode_e(e_tuple)
-        if e_tuple not in self.all_e:
+        if e_tuple not in self._e_data:
             self._e_data[e_tuple] = e_data
             for v in e_tuple:
                 self._v_inci[v].add(e_tuple)
@@ -197,7 +197,7 @@ def remove_v(self, v_id: Any):
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         del self._v_data[v_id]
         for e_tuple in self._v_inci[v_id]:
@@ -220,7 +220,7 @@ def remove_e(self, e_tuple: Union[List, Set, Tuple]):
         ), "The hyperedge must be a list, set, or tuple of vertex ids."
         e_tuple = self.encode_e(e_tuple)
         assert (
-            e_tuple in self.all_e
+            e_tuple in self._e_data
         ), f"The hyperedge {e_tuple} does not exist in the hypergraph."
         for v in e_tuple:
             self._v_inci[v].remove(e_tuple)
@@ -238,7 +238,7 @@ def update_v(self, v_id: Any, v_data: dict):
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert isinstance(v_data, dict), "The vertex data must be a dictionary."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         self._v_data[v_id].update(v_data)
         self._clear_cache()
@@ -257,7 +257,7 @@ def update_e(self, e_tuple: Union[List, Set, Tuple], e_data: dict):
         assert isinstance(e_data, dict), "The hyperedge data must be a dictionary."
         e_tuple = self.encode_e(e_tuple)
         assert (
-            e_tuple in self.all_e
+            e_tuple in self._e_data
         ), f"The hyperedge {e_tuple} does not exist in the hypergraph."
         self._e_data[e_tuple].update(e_data)
         self._clear_cache()
@@ -270,7 +270,7 @@ def has_v(self, v_id: Any) -> bool:
             ``v_id`` (``Any``): The vertex id.
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
-        return v_id in self.all_v
+        return v_id in self._v_data

     def has_e(self, e_tuple: Union[List, Set, Tuple]) -> bool:
         r"""
@@ -286,7 +286,7 @@ def has_e(self, e_tuple: Union[List, Set, Tuple]) -> bool:
             e_tuple = self.encode_e(e_tuple)
         except AssertionError:
             return False
-        return e_tuple in self.all_e
+        return e_tuple in self._e_data

     def degree_v(self, v_id: Any) -> int:
         r"""
@@ -297,7 +297,7 @@ def degree_v(self, v_id: Any) -> int:
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         return len(self._v_inci[v_id])

@@ -313,7 +313,7 @@ def degree_e(self, e_tuple: Union[List, Set, Tuple]) -> int:
         ), "The hyperedge must be a list, set, or tuple of vertex ids."
         e_tuple = self.encode_e(e_tuple)
         assert (
-            e_tuple in self.all_e
+            e_tuple in self._e_data
         ), f"The hyperedge {e_tuple} does not exist in the hypergraph."
         return len(e_tuple)

@@ -326,7 +326,7 @@ def nbr_e_of_v(self, v_id: Any) -> list:
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         return set(self._v_inci[v_id])

@@ -342,7 +342,7 @@ def nbr_v_of_e(self, e_tuple: Union[List, Set, Tuple]) -> list:
         ), "The hyperedge must be a list, set, or tuple of vertex ids."
         e_tuple = self.encode_e(e_tuple)
         assert (
-            e_tuple in self.all_e
+            e_tuple in self._e_data
         ), f"The hyperedge {e_tuple} does not exist in the hypergraph."
         return set(e_tuple)

@@ -355,7 +355,7 @@ def nbr_v(self, v_id: Any, exclude_self=True) -> list:
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         nbrs = set()
         for e_tuple in self._v_inci[v_id]:
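The changes to `hyperdb/hypergraph.py` all follow one pattern: membership tests against `self.all_v` / `self.all_e` become tests against the underlying `_v_data` / `_e_data` dictionaries. How `all_v` is implemented is not visible in this diff. The sketch below assumes it is a cached property that materializes a set of ids and is invalidated by `_clear_cache()` after every mutation; under that assumption each insertion rebuilds the whole set, which makes bulk loading quadratic. `ToyStore`, `add_v_slow`, `add_v_fast`, and `time_inserts` are hypothetical names used only for this illustration.

```python
import time


class ToyStore:
    """Hypothetical stand-in used to illustrate the access pattern."""

    def __init__(self):
        self._v_data = {}
        self._all_v_cache = None

    @property
    def all_v(self):
        # Assumed behaviour: build (and cache) a set of every vertex id.
        if self._all_v_cache is None:
            self._all_v_cache = set(self._v_data)
        return self._all_v_cache

    def _clear_cache(self):
        self._all_v_cache = None

    def add_v_slow(self, v_id):
        # Old pattern: membership test through the derived set,
        # which is rebuilt after every cache invalidation.
        if v_id not in self.all_v:
            self._v_data[v_id] = {}
        self._clear_cache()

    def add_v_fast(self, v_id):
        # New pattern: constant-time membership test on the dict itself.
        if v_id not in self._v_data:
            self._v_data[v_id] = {}
        self._clear_cache()


def time_inserts(method_name, n):
    """Insert n vertices with the named method; return (seconds, store)."""
    store = ToyStore()
    add = getattr(store, method_name)
    start = time.perf_counter()
    for i in range(n):
        add(i)
    return time.perf_counter() - start, store


slow_s, slow_store = time_inserts("add_v_slow", 3000)
fast_s, fast_store = time_inserts("add_v_fast", 3000)
print(f"via all_v: {slow_s:.4f}s, via _v_data: {fast_s:.4f}s")
```

Both methods leave the store in the same state; only the cost of the membership test differs. Exact timings depend on the machine, so none are quoted here.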
Empty file added performance/__init__.py
Empty file.