feat: optimizing the prune function at the apriori_algorithm.py archive #12992

joaoneto9 · 2025-09-24T18:48:08Z

Describe your change:

Added an optimized version of the prune function using Counter to improve performance
when checking candidate itemsets for frequent items.

As a test base, I used itemset lists of gradually increasing size to demonstrate
the inefficiency of the original algorithm, which has a complexity of O(n * c * i),
where n is the size of itemset, c is the number of candidates, and i is the number of
items in each candidate.

The new solution reduces the complexity to O(n + c * i). Previously, every time the algorithm
checked an item of a candidate, it scanned itemset once for the membership test (O(n)) and
again to count the item's occurrences (O(n)), repeating this costly work for every item of
every candidate.
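To make the cost model concrete, here is a sketch of the list-based check described above. It is reconstructed from this description and the `prune(itemset, candidates, 3)` doctest visible in the diff; the name `prune_original` and the example inputs are illustrative, and the function in `machine_learning/apriori_algorithm.py` may differ in details.

```python
def prune_original(itemset: list, candidates: list, length: int) -> list:
    """List-based pruning: every check rescans ``itemset``."""
    pruned = []
    for candidate in candidates:
        is_subsequence = True
        for item in candidate:
            # `in` is one linear scan of itemset (O(n)) and `.count` is a
            # second one (O(n)); both repeat for every item of every candidate.
            if item not in itemset or itemset.count(item) < length - 1:
                is_subsequence = False
                break
        if is_subsequence:
            pruned.append(candidate)
    return pruned


print(prune_original(["X", "Y", "Z"], [["X", "Y"], ["X", "Z"], ["Y", "Z"]], 2))
# [['X', 'Y'], ['X', 'Z'], ['Y', 'Z']]
print(prune_original(["1", "2", "3", "4"], ["1", "2", "4"], 3))
# []
```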

To optimize this, I used an auxiliary dictionary (a Counter) in which each key is an
item and its value is the number of occurrences of that item in itemset. This allows both
the membership check and the count lookup to be performed in average-case constant time, O(1).
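A minimal sketch of the Counter-based idea, assuming every element of `itemset` and every item of a candidate is hashable (for example, strings). The name `prune_counter` is hypothetical and this is not the exact code from the PR. Counter is a dict subclass, so `in` and `[]` are average-case O(1) hash lookups.

```python
from collections import Counter


def prune_counter(itemset: list, candidates: list, length: int) -> list:
    """Counter-based pruning for hashable items."""
    # One O(n) pass maps each item to its number of occurrences in itemset.
    itemset_counter = Counter(itemset)
    pruned = []
    for candidate in candidates:
        # all() short-circuits like the `break` in the list-based loop.
        if all(
            item in itemset_counter and itemset_counter[item] >= length - 1
            for item in candidate
        ):
            pruned.append(candidate)
    return pruned


print(prune_counter(["X", "Y", "Z"], [["X", "Y"], ["X", "Z"], ["Y", "Z"]], 2))
# [['X', 'Y'], ['X', 'Z'], ['Y', 'Z']]
print(prune_counter(["X"], [["X", "Q"]], 1))
# [] because 'Q' is absent from itemset
```

The explicit `in` test is kept even though a Counter returns 0 for missing keys: when `length` is 1 the threshold `length - 1` is 0, and dropping the membership test would wrongly keep candidates containing items absent from `itemset`.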

As a result, the performance improvement is significant, at the cost of O(n) additional
memory for the Counter, which is a worthwhile trade-off. The improvement can be observed by
comparing the execution times of both algorithms (see the attached graph).

Here is the graph comparing both functions:
pruneOptimized_prune_algoritm_results.pdf

I also ran unit tests on my local machine to confirm that both methods produce consistent results, but they are not included in this PR.
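A consistency check of that kind could look like the following sketch, which compares a list-based and a Counter-based prune on random inputs. The function names, the item pool, and the random workload are all hypothetical; this is not the author's test suite.

```python
import random
from collections import Counter


def prune_original(itemset, candidates, length):
    return [
        c
        for c in candidates
        if all(item in itemset and itemset.count(item) >= length - 1 for item in c)
    ]


def prune_counter(itemset, candidates, length):
    counts = Counter(itemset)
    return [
        c
        for c in candidates
        if all(item in counts and counts[item] >= length - 1 for item in c)
    ]


def check_consistency(trials: int = 500, seed: int = 0) -> bool:
    """Return True if both versions agree on every random input."""
    rng = random.Random(seed)
    pool = "ABCDEFG"
    for _ in range(trials):
        # Items may repeat in itemset, so the count threshold actually matters.
        itemset = [rng.choice(pool) for _ in range(rng.randint(0, 12))]
        candidates = [
            [rng.choice(pool) for _ in range(rng.randint(1, 4))]
            for _ in range(rng.randint(0, 8))
        ]
        length = rng.randint(1, 4)
        expected = prune_original(itemset, candidates, length)
        if prune_counter(itemset, candidates, length) != expected:
            return False
    return True


print(check_consistency())  # True
```

Items are drawn from a small pool so that duplicates occur, and `length` ranges from 1 to 4 so the case where `length - 1` is 0 is covered as well.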

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
Documentation change?

Checklist:


joaoneto9 · 2025-09-24T20:04:31Z

I hadn't realized that the itemset could be a list of lists. Lists are not hashable, so they cannot be used as Counter keys; I therefore switched to tuples, which are immutable and hashable. After this change, I noticed a slight overhead, since each item now has to be converted into a tuple before it can be looked up in the Counter. Nonetheless, there is a significant efficiency gain in the worst-case scenario, and I believe it will also improve performance in average cases, although I have not yet tested those scenarios or generated the corresponding graphs. Below is the graph reflecting the new modification.
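A sketch of the tuple-key approach, assuming each item is a list of hashable elements. It mirrors the `tuple(item)` lookup shown in the diff, but the name `prune_tuple_keys` and the example data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def prune_tuple_keys(itemset: list, candidates: list, length: int) -> list:
    """Counter-based pruning for itemsets whose items are lists."""
    # Lists are unhashable, so each one is converted to a tuple before it is
    # used as a Counter key.
    itemset_counter = Counter(tuple(x) for x in itemset)
    pruned = []
    for candidate in candidates:
        keep = True
        for item in candidate:
            key = tuple(item)  # same conversion as for the counter keys
            if key not in itemset_counter or itemset_counter[key] < length - 1:
                keep = False
                break
        if keep:
            pruned.append(candidate)
    return pruned


try:
    Counter([["milk"], ["bread"]])  # list keys cannot be hashed
except TypeError as err:
    print("TypeError:", err)

itemset = [["milk"], ["bread"], ["butter"]]
candidates = list(combinations(itemset, 2))
print(prune_tuple_keys(itemset, candidates, 2))
# [(['milk'], ['bread']), (['milk'], ['butter']), (['bread'], ['butter'])]
```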

pruneOptimized_prune_algoritm_results.pdf

Copilot

Pull Request Overview

This PR optimizes the prune function in the Apriori algorithm implementation to improve performance when checking candidate itemsets. The optimization uses Counter to precompute item frequencies instead of repeatedly counting occurrences during candidate validation.

Key changes:

Replaces linear search and counting with hash-based lookup using Counter
Reduces time complexity from O(n * c * i) to O(n + c * i)
Updates function documentation to reflect the optimization


Copilot · 2025-10-01T15:24:02Z

machine_learning/apriori_algorithm.py

    >>> prune(itemset, candidates, 3)
    []
    """
+    itemset_counter = Counter(tuple(x) for x in itemset)


The tuple conversion is performed twice for the same data - once when creating the Counter and again when checking each item. Consider converting items to tuples consistently or using a different approach to avoid this duplication.

Copilot · 2025-10-01T15:24:02Z

machine_learning/apriori_algorithm.py

+            tupla = tuple(item)
+            if tupla not in itemset_counter or itemset_counter[tupla] < length - 1:


The tuple conversion is performed twice for the same data - once when creating the Counter and again when checking each item. Consider converting items to tuples consistently or using a different approach to avoid this duplication.
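One way the suggestion could be addressed, sketched under assumptions and not code from the PR: route every item through a single hypothetical helper, `as_key`, that converts only lists. Hashable items such as strings or integers are then used directly (an unconditional `tuple()` call assumes iterable items, since `tuple(1)` raises `TypeError`), and both conversion sites share one definition. List items in candidates still need one conversion per lookup, because the lookup key must be hashable.

```python
from collections import Counter
from itertools import combinations


def as_key(item):
    """Hypothetical helper: convert lists to tuples, leave other items as-is."""
    return tuple(item) if isinstance(item, list) else item


def prune_with_keys(itemset: list, candidates: list, length: int) -> list:
    # Both sides of the lookup go through the same helper.
    counts = Counter(as_key(x) for x in itemset)
    return [
        candidate
        for candidate in candidates
        if all(
            key in counts and counts[key] >= length - 1
            for key in map(as_key, candidate)
        )
    ]


print(prune_with_keys(["X", "Y", "Z"], [["X", "Y"], ["X", "Z"], ["Y", "Z"]], 2))
# [['X', 'Y'], ['X', 'Z'], ['Y', 'Z']]
print(prune_with_keys([1, 2, 3], [[1, 2], [2, 4]], 2))
# [[1, 2]]
lists = [["a"], ["b"], ["c"]]
print(prune_with_keys(lists, list(combinations(lists, 2)), 2))
# [(['a'], ['b']), (['a'], ['c']), (['b'], ['c'])]
```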

joaoneto9 and others added 2 commits September 24, 2025 15:19

feat: optimizing the prune function at the apriori_algorithm.py archive

def174d

[pre-commit.ci] auto fixes from pre-commit.com hooks

c2d0613

for more information, see https://pre-commit.ci

algorithms-keeper bot added the tests are failing Do not merge until tests pass label Sep 24, 2025

joaoneto9 added 2 commits September 24, 2025 15:51

fix: fixing the unsorted importing statment

839c43a

Merge branch 'master' of https://github.com/joaoneto9/Python

81a9d8d

algorithms-keeper bot added the awaiting reviews This PR is ready to be reviewed label Sep 24, 2025

pre-commit-ci bot and others added 3 commits September 24, 2025 18:54

[pre-commit.ci] auto fixes from pre-commit.com hooks

38e849b

for more information, see https://pre-commit.ci

fix: fixing the key structure to a tuple that can be an hashable structure

789f76d

Merge branch 'master' of https://github.com/joaoneto9/Python

42fe4b6

algorithms-keeper bot removed tests are failing Do not merge until tests pass labels Sep 24, 2025

Merge branch 'master' into master

c88b71f

AnupKumarPanwar requested a review from Copilot October 1, 2025 15:23

AnupKumarPanwar approved these changes Oct 1, 2025


algorithms-keeper bot removed the awaiting reviews This PR is ready to be reviewed label Oct 1, 2025

Copilot AI reviewed Oct 1, 2025


Merge branch 'master' into master

30aa721

algorithms-keeper bot added the awaiting reviews This PR is ready to be reviewed label Oct 2, 2025
