Commit 757da80

Refactor README to enhance clarity and structure.
Simplified and restructured content for better readability and usability. Updated badges, added concise feature descriptions, and improved example code formatting. Introduced a detailed table of contents and clarified usage instructions.
1 parent 9f5a515 commit 757da80

1 file changed: +104 -40 lines changed

README.md

Lines changed: 104 additions & 40 deletions
@@ -1,12 +1,6 @@
1-
[!["Buy Me A Coffee"](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/pradolima)
2-
[![SonarCloud](https://sonarcloud.io/images/project_badges/sonarcloud-white.svg)](https://sonarcloud.io/summary/new_code?id=jacksonpradolima_gps-py)
3-
4-
5-
# GSP-Py
6-
A **Python implementation** of the Generalized Sequence Pattern (GSP) algorithm for mining sequential patterns in datasets. GSP is a powerful algorithm for discovering sequences of events or items that are frequently observed, making it suitable for a wide range of domains such as market basket analysis, web usage mining, and bioinformatics.
7-
81
[![PyPI License](https://img.shields.io/pypi/l/gsppy.svg?style=flat-square)]()
9-
![](https://img.shields.io/badge/python-3.11.4+-blue.svg)
2+
[![PyPI Downloads](https://img.shields.io/pypi/dm/gsppy.svg?style=flat-square)](https://pypi.org/project/gsppy/)
3+
![](https://img.shields.io/badge/python-3.11+-blue.svg)
104
[![DOI](https://zenodo.org/badge/108451832.svg)](https://zenodo.org/badge/latestdoi/108451832)
115

126
[![Bugs](https://sonarcloud.io/api/project_badges/measure?project=jacksonpradolima_gsp-py&metric=bugs)](https://sonarcloud.io/summary/new_code?id=jacksonpradolima_gsp-py)
@@ -15,37 +9,50 @@ A **Python implementation** of the Generalized Sequence Pattern (GSP) algorithm
159
[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=jacksonpradolima_gsp-py&metric=sqale_rating)](https://sonarcloud.io/summary/new_code?id=jacksonpradolima_gsp-py)
1610
[![codecov](https://codecov.io/github/jacksonpradolima/gsp-py/branch/main/graph/badge.svg?token=BW04LB0B5Y)](https://codecov.io/github/jacksonpradolima/gsp-py)
1711

12+
# GSP-Py
13+
14+
**GSP-Py**: A Python-powered library to mine sequential patterns in large datasets, based on the robust **Generalized
15+
Sequential Pattern (GSP)** algorithm. Ideal for market basket analysis, temporal mining, and user journey discovery.
16+
1817
---
1918

2019
## 📚 Table of Contents
21-
- [What is GSP?](#what-is-gsp)
22-
- [Requirements](#requirements)
23-
- [Installation](#installation)
24-
- [Developer Installation](#developer-installation)
25-
- [Usage](#usage)
26-
- [Planned Features](#planned-features)
27-
- [Contributing](#contributing)
28-
- [License](#license)
29-
- [Citation](#citation)
20+
21+
1. [🔍 What is GSP?](#what-is-gsp)
22+
2. [🔧 Requirements](#requirements)
23+
3. [🚀 Installation](#installation)
24+
- [❖ Clone Repository](#option-1-clone-the-repository)
25+
- [❖ Install via PyPI](#option-2-install-via-pip)
26+
4. [🛠️ Developer Installation](#developer-installation)
27+
5. [💡 Usage](#usage)
28+
- [✅ Example: Analyzing Sales Data](#example-analyzing-sales-data)
29+
- [📊 Explanation: Support and Results](#explanation-support-and-results)
30+
6. [🌟 Planned Features](#planned-features)
31+
7. [🤝 Contributing](#contributing)
32+
8. [📝 License](#license)
33+
9. [📖 Citation](#citation)
3034

3135
---
3236

3337
## 🔍 What is GSP?
3438

35-
The **Generalized Sequential Pattern (GSP)** algorithm is a sequential pattern mining technique based on **Apriori principles**. Using support thresholds, GSP identifies frequent sequences of items in transaction datasets.
39+
The **Generalized Sequential Pattern (GSP)** algorithm is a sequential pattern mining technique based on **Apriori
40+
principles**. Using support thresholds, GSP identifies frequent sequences of items in transaction datasets.
3641

3742
### Key Features:
43+
3844
- **Support-based pruning**: Only retains sequences that meet the minimum support threshold.
3945
- **Candidate generation**: Iteratively generates candidate sequences of increasing length.
4046
- **General-purpose**: Useful in retail, web analytics, social networks, temporal sequence mining, and more.
4147

4248
For example:
49+
4350
- In a shopping dataset, GSP can identify patterns like "Customers who buy bread and milk often purchase diapers next."
4451
- In a website clickstream, GSP might find patterns like "Users visit A, then go to B, and later proceed to C."
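
The two ideas under Key Features, support-based pruning and level-wise candidate generation, can be illustrated with a minimal sketch. This is not GSP-Py's implementation: `count_occurrences` and `mine_levelwise` are hypothetical helper names, and the sketch assumes that a transaction "contains" a pattern only when the pattern's items appear as a contiguous run.

```python
def count_occurrences(transactions, candidate):
    """Count transactions containing `candidate` as a contiguous run (assumption)."""
    m = len(candidate)
    return sum(
        any(tuple(t[i:i + m]) == candidate for i in range(len(t) - m + 1))
        for t in transactions
    )


def mine_levelwise(transactions, min_support):
    """Return one {pattern: count} dict per pattern length (hypothetical helper)."""
    min_count = min_support * len(transactions)
    # Level 1 candidates: every distinct item as a 1-tuple.
    candidates = {(item,) for t in transactions for item in t}
    levels = []
    while candidates:
        counted = {c: count_occurrences(transactions, c) for c in sorted(candidates)}
        # Support-based pruning: drop candidates below the threshold.
        frequent = {c: n for c, n in counted.items() if n >= min_count}
        if not frequent:
            break
        levels.append(frequent)
        # Candidate generation: join patterns a and b when a without its first
        # item equals b without its last item, producing a pattern one item longer.
        candidates = {a + b[-1:] for a in frequent for b in frequent if a[1:] == b[:-1]}
    return levels


transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diaper', 'Beer'],
    ['Bread', 'Milk', 'Diaper', 'Coke'],
]

for level in mine_levelwise(transactions, min_support=0.3):
    print(level)
```

Each pass only extends patterns that survived the previous pass, which is the Apriori property: a sequence cannot be frequent if one of its sub-sequences is not.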
4552

4653
---
4754

48-
## Requirements
55+
## 🔧 Requirements
4956

5057
You will need Python installed on your system. On most Linux systems, you can install Python with:
5158

@@ -55,22 +62,30 @@ sudo apt install python3
5562

5663
GSP-Py's package dependencies are installed automatically when you install it with `pip`.
5764

65+
> [!IMPORTANT]
66+
> GSP-Py is compatible with Python 3.11 and later versions.
67+
> It has not been tested on Python 3.10 or earlier versions.
68+
5869
---
5970

6071
## 🚀 Installation
6172

62-
GSP-Py can be easily installed either by cloning the repository or using pip.
73+
GSP-Py can be easily installed from either the **repository** or PyPI.
6374

6475
### Option 1: Clone the Repository
76+
6577
To manually clone the repository and install:
78+
6679
```bash
6780
git clone https://github.com/jacksonpradolima/gsp-py.git
6881
cd gsp-py
6982
python setup.py install
7083
```
7184

7285
### Option 2: Install via `pip`
86+
7387
Alternatively, install GSP-Py from PyPI with:
88+
7489
```bash
7590
pip install gsppy
7691
```
@@ -79,9 +94,11 @@ pip install gsppy
7994

8095
## 🛠️ Developer Installation
8196

82-
For contributors and developers, GSP-Py provides additional dependencies for development purposes (e.g., testing and linting).
97+
For contributors and developers, GSP-Py provides additional dependencies for development purposes (e.g., testing and
98+
linting).
8399

84100
To install the package along with development dependencies, use:
101+
85102
```bash
86103
pip install .[dev]
87104
```
@@ -90,10 +107,13 @@ The `dev` category includes tools such as `pytest`, `pylint`, and others to ensu
90107

91108
## 💡 Usage
92109

93-
The library is designed to be easy to use and integrate with your own projects. Below is an example of how you can configure and run GSP-Py.
110+
The library is designed to be easy to use and integrate with your own projects. Below is an example of how you can
111+
configure and run GSP-Py.
94112

95113
### Example Input Data
114+
96115
The input to the algorithm is a sequence of transactions, where each transaction contains a sequence of items:
116+
97117
```python
98118
transactions = [
99119
['Bread', 'Milk'],
@@ -105,20 +125,23 @@ transactions = [
105125
```
106126

107127
### Importing and Initializing the GSP Algorithm
108-
Import the `GSP` class from the `gsppy` package and call the `search` method to find frequent patterns with a support threshold (e.g., `0.3`):
128+
129+
Import the `GSP` class from the `gsppy` package and call the `search` method to find frequent patterns with a support
130+
threshold (e.g., `0.3`):
131+
109132
```python
110133
from gsppy.gsp import GSP
111134

112-
# Define the input data
135+
# Example transactions: customer purchases
113136
transactions = [
114-
['Bread', 'Milk'],
115-
['Bread', 'Diaper', 'Beer', 'Eggs'],
116-
['Milk', 'Diaper', 'Beer', 'Coke'],
117-
['Bread', 'Milk', 'Diaper', 'Beer'],
118-
['Bread', 'Milk', 'Diaper', 'Coke']
137+
['Bread', 'Milk'], # Transaction 1
138+
['Bread', 'Diaper', 'Beer', 'Eggs'], # Transaction 2
139+
['Milk', 'Diaper', 'Beer', 'Coke'], # Transaction 3
140+
['Bread', 'Milk', 'Diaper', 'Beer'], # Transaction 4
141+
['Bread', 'Milk', 'Diaper', 'Coke'] # Transaction 5
119142
]
120143

121-
# Minimum support set to 30%
144+
# Set minimum support threshold (30%)
122145
min_support = 0.3
123146

124147
# Find frequent patterns
@@ -129,12 +152,35 @@ print(result)
129152
```
130153

131154
### Output
155+
132156
The algorithm will return a list of patterns with their corresponding support.
133157

134-
### Understanding Support
135-
The **support** of a sequence is the fraction of total data-sequences that "contain" the sequence. For instance, if the pattern `[Bread, Milk]` appears in 3 out of 5 transactions, its support is `3 / 5 = 0.6`.
158+
Sample Output:
159+
160+
```python
161+
[
162+
{('Bread',): 4, ('Milk',): 4, ('Diaper',): 4, ('Beer',): 3, ('Coke',): 2},
163+
{('Bread', 'Milk'): 3, ('Milk', 'Diaper'): 3, ('Diaper', 'Beer'): 3},
164+
{('Bread', 'Milk', 'Diaper'): 2, ('Milk', 'Diaper', 'Beer'): 2}
165+
]
166+
```
136167

137-
For more complex examples, find example scripts in the [`gsppy/tests`](gsppy/tests) folder.
168+
- The **first dictionary** contains single-item sequences with their frequencies (e.g., `('Bread',): 4` means "Bread"
169+
appears in 4 transactions).
170+
- The **second dictionary** contains 2-item sequential patterns (e.g., `('Bread', 'Milk'): 3` means the sequence
171+
"Bread → Milk" appears in 3 transactions).
172+
- The **third dictionary** contains 3-item sequential patterns (e.g., `('Bread', 'Milk', 'Diaper'): 2` means the
173+
sequence "Bread → Milk → Diaper" appears in 2 transactions).
174+
175+
> [!NOTE]
176+
> The **support** of a sequence is calculated as the fraction of transactions containing the sequence, e.g.,
177+
> `[Bread, Milk]` appears in 3 out of 5 transactions → Support = `3 / 5 = 0.6` (60%).
178+
> This insight helps identify frequently occurring sequential patterns in datasets, such as shopping trends or user
179+
> behavior.
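
The support arithmetic in the note can be checked by hand with a short sketch. The `support` function below is a hypothetical helper, not part of the `gsppy` API, and it assumes a pattern counts as "contained" only when its items appear consecutively in a transaction.

```python
def support(transactions, pattern):
    """Fraction of transactions containing `pattern` as a contiguous run (assumption)."""
    pattern = tuple(pattern)
    m = len(pattern)
    hits = sum(
        any(tuple(t[i:i + m]) == pattern for i in range(len(t) - m + 1))
        for t in transactions
    )
    return hits / len(transactions)


transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diaper', 'Beer'],
    ['Bread', 'Milk', 'Diaper', 'Coke'],
]

print(support(transactions, ['Bread', 'Milk']))  # 3 of 5 transactions -> 0.6
```

A pattern is kept when this fraction is at least `min_support`, so with a threshold of `0.3` a pattern must appear in at least 2 of the 5 transactions.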
180+
181+
182+
> [!TIP]
183+
> For more complex examples, find example scripts in the [`gsppy/tests`](gsppy/tests) folder.
138184

139185
---
140186

@@ -143,29 +189,35 @@ For more complex examples, find example scripts in the [`gsppy/tests`](gsppy/tes
143189
We are actively working to improve GSP-Py. Here are some exciting features planned for future releases:
144190

145191
1. **Custom Filters for Candidate Pruning**:
146-
- Enable users to define their own pruning logic during the mining process.
192+
- Enable users to define their own pruning logic during the mining process.
147193

148194
2. **Support for Preprocessing and Postprocessing**:
149-
- Add hooks to allow users to transform datasets before mining and customize the output results.
195+
- Add hooks to allow users to transform datasets before mining and customize the output results.
150196

151197
3. **Support for Time-Constrained Pattern Mining**:
152-
- Extend GSP-Py to handle temporal datasets by allowing users to define time constraints (e.g., maximum time gaps between events, time windows) during the sequence mining process.
153-
- Enable candidate pruning and support calculations based on these temporal constraints.
198+
- Extend GSP-Py to handle temporal datasets by allowing users to define time constraints (e.g., maximum time gaps
199+
between events, time windows) during the sequence mining process.
200+
- Enable candidate pruning and support calculations based on these temporal constraints.
154201

155-
Want to contribute or suggest an improvement? [Open a discussion or issue!](https://github.com/jacksonpradolima/gsp-py/issues)
202+
Want to contribute or suggest an
203+
improvement? [Open a discussion or issue!](https://github.com/jacksonpradolima/gsp-py/issues)
156204

157205
---
158206

159207
## 🤝 Contributing
160208

161-
We welcome contributions from the community! If you'd like to help improve GSP-Py, read our [CONTRIBUTING.md](CONTRIBUTING.md) guide to get started.
209+
We welcome contributions from the community! If you'd like to help improve GSP-Py, read
210+
our [CONTRIBUTING.md](CONTRIBUTING.md) guide to get started.
211+
212+
Development dependencies (e.g., testing and linting tools) are included in the `dev` category in `setup.py`. To install
213+
these dependencies, run:
162214

163-
Development dependencies (e.g., testing and linting tools) are included in the `dev` category in `setup.py`. To install these dependencies, run:
164215
```bash
165216
pip install .[dev]
166217
```
167218

168219
### General Steps:
220+
169221
1. Fork the repository.
170222
2. Create a feature branch: `git checkout -b feature/my-feature`.
171223
3. Commit your changes: `git commit -m "Add my feature."`
@@ -177,10 +229,22 @@ Looking for ideas? Check out our [Planned Features](#planned-features) section.
177229
---
178230

179231
## 📝 License
232+
180233
This project is licensed under the terms of the **MIT License**. For more details, refer to the [LICENSE](LICENSE) file.
181234

182235
---
183236

184237
## 📖 Citation
185238

186239
If GSP-Py contributed to your research or project that led to a publication, we kindly ask that you cite it as follows:
240+
241+
```
242+
@misc{pradolima_gsppy,
243+
author = {Prado Lima, Jackson Antonio do},
244+
title = {{GSP-Py - Generalized Sequence Pattern algorithm in Python}},
245+
month = May,
246+
year = 2020,
247+
doi = {10.5281/zenodo.3333987},
248+
url = {https://doi.org/10.5281/zenodo.3333987}
249+
}
250+
```
