tsvar

Association rules for TSV (tab-separated values) files.

Association rules describe relationships between categorical values. A rule is written as A -> B, where being/having A (the antecedent) increases the probability of being/having B (the consequent). A and B can be sets of values. For more details, see mlxtend's overview.
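To make the standard metrics concrete, the sketch below computes support, confidence, and lift for one rule over four made-up transactions. The data and the helper names (`support`, `confidence`, `lift`) are hypothetical and not part of tsvar. They follow the usual definitions that mlxtend also uses. "A increases the probability of B" corresponds to a lift above 1.

```python
from typing import FrozenSet, List

# Hypothetical toy data: each transaction is the set of values one respondent has.
transactions: List[FrozenSet[str]] = [
    frozenset({"Texas", "Male", "Europe"}),
    frozenset({"Texas", "Female", "Europe"}),
    frozenset({"Astesia", "Female", "East Asia"}),
    frozenset({"Texas", "Other", "East Asia"}),
]

def support(itemset: FrozenSet[str], data: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in data) / len(data)

def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               data: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent)."""
    return support(antecedent | consequent, data) / support(antecedent, data)

def lift(antecedent: FrozenSet[str], consequent: FrozenSet[str],
         data: List[FrozenSet[str]]) -> float:
    """Confidence relative to the consequent's baseline support (> 1 means A raises the odds of B)."""
    return confidence(antecedent, consequent, data) / support(consequent, data)

a, b = frozenset({"Europe"}), frozenset({"Texas"})
print(support(a | b, transactions))    # 0.5
print(confidence(a, b, transactions))  # 1.0
print(lift(a, b, transactions))        # 1.3333333333333333
```

Here every Europe respondent picked Texas (confidence 1.0), while only 75% of all respondents did, so the rule Europe -> Texas has a lift of about 1.33.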

Dependencies

Python 3.7+ (it may also work on earlier versions), pandas, and mlxtend.

Primer

TsvAR is a generalization of the association rule miner in bandori-2019-stats. It can create rules from survey response data, such as:

| Favorite Operators            | Gender | Region        |
| ----------------------------- | ------ | ------------- |
| Beanstalk,Blue Poison,Phantom | Male   | Europe        |
| Scene,Astesia,Texas,Hellagur  | Other  | East Asia     |
| Perfumer,Mountain,Blemishine  | Female | Latin America |

... or from really any data where each person (or thing) has a closed set of categorical properties (e.g. social media profiles, or networks in general), such as:

| Friends                  | Hobbies                      | Age Group | Invited |
| ------------------------ | ---------------------------- | --------- | ------- |
| Alice,Bob,Claris         | Gardening,Hiking,Board Games | 18-22     | No      |
| Derek,Ethan,Fumino,Alice | Reading                      | 26-30     | No      |
| Giovanni,Claris          | Movies,Soccer,Hiking         | 18-22     | Yes     |
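Cells like `Alice,Bob,Claris` hold several values at once, so each row has to become a set of items before rules can be mined. The sketch below shows one plausible encoding: split multi-valued cells on commas and prefix each value with its column name so that, for example, `Invited=Yes` cannot be confused with a "Yes" in another column. This encoding and the `row_to_items` helper are assumptions for illustration; tsvar's internal representation may differ.

```python
from typing import Dict, FrozenSet, List

Row = Dict[str, str]  # column name -> raw cell text

def row_to_items(row: Row, columns: List[str]) -> FrozenSet[str]:
    """Turn one row into a set of 'Column=value' items (hypothetical encoding).

    Multi-valued cells such as 'Alice,Bob,Claris' are split on commas so that
    each value becomes its own item.
    """
    items = set()
    for col in columns:
        for value in row[col].split(","):
            value = value.strip()
            if value:  # skip empty cells
                items.add(f"{col}={value}")
    return frozenset(items)

# The second example table, one dict per row.
rows: List[Row] = [
    {"Friends": "Alice,Bob,Claris", "Hobbies": "Gardening,Hiking,Board Games", "Age Group": "18-22", "Invited": "No"},
    {"Friends": "Derek,Ethan,Fumino,Alice", "Hobbies": "Reading", "Age Group": "26-30", "Invited": "No"},
    {"Friends": "Giovanni,Claris", "Hobbies": "Movies,Soccer,Hiking", "Age Group": "18-22", "Invited": "Yes"},
]
transactions = [row_to_items(r, ["Friends", "Hobbies", "Invited"]) for r in rows]
print(sorted(transactions[2]))
# ['Friends=Claris', 'Friends=Giovanni', 'Hobbies=Hiking', 'Hobbies=Movies', 'Hobbies=Soccer', 'Invited=Yes']
```

Only the selected columns are encoded, which mirrors how the usage example below passes a list of column names to mine.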

Usage

In path-to-file.tsv:

```
Favorite Operators	Gender	Region
Beanstalk,Blue Poison,Phantom	Male	Europe
Scene,Astesia,Texas,Hellagur	Other	East Asia
Perfumer,Mountain,Blemishine	Female	Latin America
```

In your code:

```python
from tsvar import AssociationMiner

miner = AssociationMiner("path-to-file.tsv", export_to_tsv=True)
rules = miner.mine(
    ["Favorite Operators", "Region"],
    min_frequency=0.01,
    metric="confidence",
    metric_threshold=0.3,
)
```

This returns association rules involving Favorite Operators and/or Region values that occur in at least 1% of all entries (min_frequency=0.01) and have a confidence of at least 30% (metric="confidence", metric_threshold=0.3). Because export_to_tsv=True, it also exports those rules as a TSV file, which you can open in Excel or another spreadsheet program.
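tsvar depends on mlxtend for mining, which uses frequent-itemset algorithms such as apriori. To show what a minimum frequency and a confidence threshold actually filter, here is a brute-force stand-in that enumerates every small itemset. The name `mine_rules`, the `max_len` cap, and the assumption that `min_frequency` behaves like a minimum support are ours, not tsvar's. This is only practical for tiny inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# (antecedent, consequent, support, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]

def mine_rules(
    transactions: List[FrozenSet[str]],
    min_support: float,
    min_confidence: float,
    max_len: int = 3,
) -> List[Rule]:
    """Brute-force association rule mining (a stand-in for apriori, small data only)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    # Step 1: keep every itemset (up to max_len items) whose support clears the threshold.
    frequent: Dict[FrozenSet[str], float] = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = sum(itemset <= t for t in transactions) / n
            if s >= min_support:
                frequent[itemset] = s

    # Step 2: split each frequent itemset into antecedent -> consequent, filter by confidence.
    rules: List[Rule] = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                antecedent = frozenset(ante)
                # Every subset of a frequent itemset is also frequent, so this lookup is safe.
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, s, conf))
    return rules

# Hypothetical encoded survey rows.
transactions = [
    frozenset({"Operator=Texas", "Region=Europe"}),
    frozenset({"Operator=Texas", "Region=Europe"}),
    frozenset({"Operator=Astesia", "Region=East Asia"}),
    frozenset({"Operator=Texas", "Region=East Asia"}),
]
for ante, cons, s, c in mine_rules(transactions, min_support=0.25, min_confidence=0.6):
    print(sorted(ante), "->", sorted(cons), f"support={s:.2f}", f"confidence={c:.2f}")
```

Raising `min_support` prunes rare combinations before any rule is formed, while `min_confidence` is applied per rule afterwards. That is why a rule such as Region=East Asia -> Operator=Astesia (confidence 0.5) is dropped even though its itemset is frequent.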

To search for rules based on specific value occurrences, or reorganize the rules, do something like this:

```python
resultant_rules_df = rules.search(["Blemishine"], location="antecedent")

rules.organize(sort_by="support")
organized_rules_df = rules.table_organized
```
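Conceptually, searching keeps the rules whose antecedent (or consequent) contains given values, and organizing sorts them by a metric. The plain-Python sketch below mimics that idea on a hypothetical list of rule dicts with made-up numbers. It does not reproduce tsvar's actual `search`/`organize` methods, which work on DataFrames, and the "must contain all given values" matching rule is an assumption.

```python
from typing import Dict, FrozenSet, List, Sequence, Union

RuleRow = Dict[str, Union[FrozenSet[str], float]]

# Hypothetical rules with made-up metrics, for illustration only.
rules: List[RuleRow] = [
    {"antecedent": frozenset({"Blemishine"}), "consequent": frozenset({"Latin America"}), "support": 0.02, "confidence": 0.40},
    {"antecedent": frozenset({"Texas"}), "consequent": frozenset({"Blemishine"}), "support": 0.05, "confidence": 0.35},
    {"antecedent": frozenset({"Blemishine", "Mountain"}), "consequent": frozenset({"Perfumer"}), "support": 0.03, "confidence": 0.50},
]

def search(rules: List[RuleRow], values: Sequence[str], location: str = "antecedent") -> List[RuleRow]:
    """Keep rules whose `location` side contains every value in `values`."""
    wanted = frozenset(values)
    return [r for r in rules if wanted <= r[location]]

def organize(rules: List[RuleRow], sort_by: str = "support") -> List[RuleRow]:
    """Return the rules sorted by a metric, highest first."""
    return sorted(rules, key=lambda r: r[sort_by], reverse=True)

for r in organize(search(rules, ["Blemishine"]), sort_by="confidence"):
    print(sorted(r["antecedent"]), "->", sorted(r["consequent"]), r["confidence"])
```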

To get a list of unique values for a category:

```python
import pandas
from tsvar import Helper

df = pandas.read_table("path-to-file.tsv")
answers = Helper.unique_answers(df, "Favorite Operators")
```
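For multi-valued columns, "unique values" most likely means the distinct comma-separated entries rather than the distinct raw cells. Here is a standard-library sketch of that idea, assuming comma-separated cells. The `unique_answers` function below is a hypothetical stand-in, not tsvar's `Helper.unique_answers`.

```python
import csv
import io
from typing import List

def unique_answers(tsv_text: str, column: str) -> List[str]:
    """Collect the distinct comma-separated values in one column of TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen = set()
    for row in reader:
        for value in (row[column] or "").split(","):
            value = value.strip()
            if value:  # ignore empty cells
                seen.add(value)
    return sorted(seen)

sample = (
    "Favorite Operators\tGender\tRegion\n"
    "Beanstalk,Blue Poison,Phantom\tMale\tEurope\n"
    "Scene,Astesia,Texas,Hellagur\tOther\tEast Asia\n"
    "Perfumer,Mountain,Blemishine\tFemale\tLatin America\n"
)
print(unique_answers(sample, "Region"))  # ['East Asia', 'Europe', 'Latin America']
```

To read a real file instead, pass the contents of `open("path-to-file.tsv", newline="").read()`.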

In general, see inline documentation for more options, examples, and explanations.

supreme-chocomint/tsvar

Folders and files

Latest commit

History

Repository files navigation

tsvar

Dependencies

Primer

Usage

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages