Simplified and restructured content for better readability and usability. Updated badges, added concise feature descriptions, and improved example code formatting. Introduced a detailed table of contents and clarified usage instructions.
A **Python implementation** of the Generalized Sequential Pattern (GSP) algorithm for mining sequential patterns in datasets. GSP discovers sequences of events or items that occur frequently, which makes it suitable for domains such as market basket analysis, web usage mining, and bioinformatics.
- [📊 Explanation: Support and Results](#explanation-support-and-results)
6. [🌟 Planned Features](#planned-features)
7. [🤝 Contributing](#contributing)
8. [📝 License](#license)
9. [📖 Citation](#citation)

---

## 🔍 What is GSP?

The **Generalized Sequential Pattern (GSP)** algorithm is a sequential pattern mining technique based on **Apriori principles**. Using support thresholds, GSP identifies frequent sequences of items in transaction datasets.

### Key Features:

- **Support-based pruning**: Only retains sequences that meet the minimum support threshold.
- **Candidate generation**: Iteratively generates candidate sequences of increasing length.
- **General-purpose**: Useful in retail, web analytics, social networks, temporal sequence mining, and more.

For example:

- In a shopping dataset, GSP can identify patterns like "Customers who buy bread and milk often purchase diapers next."
- In a website clickstream, GSP might find patterns like "Users visit A, then go to B, and later proceed to C."
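
The two mechanisms listed under Key Features can be sketched in a few lines. This is a minimal illustration, not GSP-Py's actual implementation: the function names (`contains`, `mine_sequences`), the `min_count` parameter, and the sample transactions are assumptions made for this example. Patterns are matched here as ordered subsequences with gaps allowed, which may differ from how GSP-Py matches them.

```python
from collections import Counter


def contains(transaction, pattern):
    """Return True if `pattern` occurs in `transaction` in order (gaps allowed)."""
    remaining = iter(transaction)
    # `item in remaining` consumes the iterator up to the first match,
    # so each pattern item must appear after the previous one.
    return all(item in remaining for item in pattern)


def mine_sequences(transactions, min_count):
    """Return one dict per pattern length, mapping pattern tuples to counts."""
    # Level 1: count each item once per transaction that contains it.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {(item,): n for item, n in item_counts.items() if n >= min_count}

    levels = []
    while frequent:
        levels.append(frequent)
        # Candidate generation: join two frequent patterns when the first
        # without its head equals the second without its tail.
        candidates = {
            a + b[-1:] for a in frequent for b in frequent if a[1:] == b[:-1]
        }
        # Support-based pruning: keep candidates found in enough transactions.
        counts = {
            c: sum(contains(t, c) for t in transactions) for c in candidates
        }
        frequent = {c: n for c, n in counts.items() if n >= min_count}
    return levels


# Hypothetical sample data, chosen only for this sketch.
transactions = [
    ["Bread", "Milk", "Diaper"],
    ["Bread", "Milk"],
    ["Milk", "Diaper"],
    ["Bread", "Milk", "Diaper"],
]
levels = mine_sequences(transactions, min_count=3)
for length, patterns in enumerate(levels, start=1):
    print(f"{length}-item patterns: {patterns}")
```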

---

## 🔧 Requirements

You will need Python installed on your system. On Debian- or Ubuntu-based Linux systems, you can install Python with `sudo apt install python3`.

GSP-Py's package dependencies are installed automatically when you install it with `pip`.

> [!IMPORTANT]
> GSP-Py is compatible with Python 3.11 and later versions.
> It has not been tested on Python 3.10 or earlier versions.

---
## 🚀 Installation

GSP-Py can be installed from either the **repository** or PyPI.

The algorithm will return a list of patterns with their corresponding support.

- The **first dictionary** contains single-item sequences with their frequencies (e.g., `('Bread',): 4` means "Bread" appears in 4 transactions).
- The **second dictionary** contains 2-item sequential patterns (e.g., `('Bread', 'Milk'): 3` means the sequence "Bread → Milk" appears in 3 transactions).
- The **third dictionary** contains 3-item sequential patterns (e.g., `('Bread', 'Milk', 'Diaper'): 2` means the sequence "Bread → Milk → Diaper" appears in 2 transactions).

> [!NOTE]
> The **support** of a sequence is the fraction of transactions that contain it. For example, if `[Bread, Milk]` appears in 3 out of 5 transactions, its support is `3 / 5 = 0.6` (60%).
> Support helps identify frequently occurring sequential patterns in datasets, such as shopping trends or user behavior.

> [!TIP]
> For more complex examples, find example scripts in the [`gsppy/tests`](gsppy/tests) folder.
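
To connect the result dictionaries with the support definition, the sketch below converts raw counts into support fractions. The `result` literal is only an illustration in the shape described above, containing just the three entries quoted in the text; the helper name `to_support` and the dataset size of 5 (taken from the "3 out of 5" example) are assumptions, not part of GSP-Py.

```python
# Illustrative result in the shape described above. Only the three entries
# quoted in the text are included; a real result would hold more patterns.
result = [
    {("Bread",): 4},
    {("Bread", "Milk"): 3},
    {("Bread", "Milk", "Diaper"): 2},
]
total_transactions = 5  # assumed dataset size, from the "3 out of 5" example


def to_support(result, total):
    """Convert per-level counts into (label, count, support) rows."""
    rows = []
    for level in result:
        for pattern, count in level.items():
            rows.append((" → ".join(pattern), count, count / total))
    return rows


for label, count, support in to_support(result, total_transactions):
    print(f"{label}: {count} transactions, support = {support:.1f}")
# Bread: 4 transactions, support = 0.8
# Bread → Milk: 3 transactions, support = 0.6
# Bread → Milk → Diaper: 2 transactions, support = 0.4
```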

---

## 🌟 Planned Features

We are actively working to improve GSP-Py. Here are some exciting features planned for future releases:

1. **Custom Filters for Candidate Pruning**:
   - Enable users to define their own pruning logic during the mining process.

2. **Support for Preprocessing and Postprocessing**:
   - Add hooks to allow users to transform datasets before mining and customize the output results.

3. **Support for Time-Constrained Pattern Mining**:
   - Extend GSP-Py to handle temporal datasets by allowing users to define time constraints (e.g., maximum time gaps between events, time windows) during the sequence mining process.
   - Enable candidate pruning and support calculations based on these temporal constraints.
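
To make the idea of a maximum time gap concrete, here is a rough sketch of what such a constraint could mean when checking whether one sequence contains a pattern. It is purely illustrative and not an existing or planned GSP-Py API: the function name `contains_with_max_gap`, the `(timestamp, item)` event format, and the `max_gap` parameter are all hypothetical.

```python
def contains_with_max_gap(events, pattern, max_gap):
    """Check whether `pattern` occurs in order within `events`.

    `events` is a list of (timestamp, item) pairs sorted by timestamp.
    Consecutive matched events may be at most `max_gap` time units apart.
    """

    def match(start, k, prev_time):
        if k == len(pattern):
            return True  # every pattern item has been matched
        for i in range(start, len(events)):
            time, item = events[i]
            if prev_time is not None and time - prev_time > max_gap:
                break  # events are sorted, so later ones are even further away
            # Try this event as the match for pattern[k]; backtrack on failure.
            if item == pattern[k] and match(i + 1, k + 1, time):
                return True
        return False

    return match(0, 0, None)


# Hypothetical clickstream: the first "A" is too far from any "C",
# but the second "A" starts a match that satisfies the gap limit.
events = [(0, "A"), (1, "B"), (10, "A"), (12, "B"), (13, "C")]
print(contains_with_max_gap(events, ("A", "B", "C"), max_gap=3))  # True
print(contains_with_max_gap(events, ("A", "C"), max_gap=2))       # False
```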

Want to contribute or suggest an improvement? [Open a discussion or issue!](https://github.com/jacksonpradolima/gsp-py/issues)

---

## 🤝 Contributing

We welcome contributions from the community! If you'd like to help improve GSP-Py, read our [CONTRIBUTING.md](CONTRIBUTING.md) guide to get started.

Development dependencies (e.g., testing and linting tools) are included in the `dev` category in `setup.py`. To install these dependencies, run:

```bash
pip install ".[dev]"
```

### General Steps:

1. Fork the repository.
2. Create a feature branch: `git checkout -b feature/my-feature`.
3. Commit your changes: `git commit -m "Add my feature."`

Looking for ideas? Check out our [Planned Features](#planned-features) section.

---

## 📝 License

This project is licensed under the terms of the **MIT License**. For more details, refer to the [LICENSE](LICENSE) file.

---

## 📖 Citation

If GSP-Py contributed to your research or project that led to a publication, we kindly ask that you cite it as follows:

```
@misc{pradolima_gsppy,
  author = {Prado Lima, Jackson Antonio do},
  title = {{GSP-Py - Generalized Sequence Pattern algorithm in Python}},
```