This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

Commit

Merge pull request #251 from CamDavidsonPilon/v0.11.0
v0.11.0
CamDavidsonPilon authored Mar 7, 2019
2 parents 5e316c0 + acbb9b5 commit 91e184e
Showing 32 changed files with 4,212 additions and 1,996 deletions.
16 changes: 16 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,16 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.0.0
hooks:
- id: trailing-whitespace
- id: check-ast
- id: check-yaml
- id: end-of-file-fixer
- id: fix-encoding-pragma
- id: mixed-line-ending
- id: trailing-whitespace
- repo: https://github.com/ambv/black
rev: stable
hooks:
- id: black
args: ["--line-length", "120"]
49 changes: 49 additions & 0 deletions .prospector.yaml
@@ -0,0 +1,49 @@
strictness: medium

pylint:
options:
bad-names: foo,baz,toto,tutu,tata
# max-args default = 5
max-args: 15
# max-locals default = 15
max-locals: 50
# max-branches default = 15
max-branches: 15
disable:
- line-too-long
- protected-access
- no-value-for-parameter
- assignment-from-no-return
- invalid-unary-operand-type
# remove if python2.7 support is dropped
- useless-object-inheritance
- old-style-class

pyflakes:
disable:
- F401
- F841
# let pylint used-before-assignment handle this
- F821

pep8:
options:
max-line-length: 120
disable:
- E501
- E241

mccabe:
options:
# max-complexity default = 10
max-complexity: 23

pyroma:
run: true

pep257:
run: false

ignore-paths:
- build
- benchmarks
19 changes: 15 additions & 4 deletions .travis.yml
@@ -1,17 +1,28 @@
language: python
cache: pip
dist: trusty
python:
- "2.7"
- "3.5"
- "3.6"
env:
- export PANDAS_VERSION=0.21.1
- export PANDAS_VERSION=0.22.0
- export PANDAS_VERSION=0.23.4
- export PANDAS_VERSION=0.24.1
# Enable newer 3.7 without globally enabling sudo and dist: xenial for other build jobs
matrix:
include:
- python: 3.7
dist: xenial
sudo: true
- python: 3.7
dist: xenial
sudo: true
env: export PANDAS_VERSION=0.24.1
- python: 3.7
dist: xenial
sudo: true
env: export PANDAS_VERSION=0.23.4
before_install:
- sudo apt-get update
- ls
install:
- "pip install -r dev_requirements.txt"
# command to run tests
28 changes: 18 additions & 10 deletions CHANGELOG.md
@@ -1,18 +1,26 @@
# Changelog

### 0.11.0
- Move most models (all but Pareto) to autograd for automatic differentiation of their likelihood. This results in faster (at least 3x) and more successful convergence, plus allows for some really exciting extensions (coming soon).
- `GammaGammaFitter`, `BetaGeoFitter`, `ModifiedBetaGeoFitter` and `BetaGeoBetaBinomFitter` have three new attributes: `confidence_interval_`, `variance_matrix_` and `standard_errors_`
- `params_` on fitted models is no longer an OrderedDict, but a Pandas Series
- `GammaGammaFitter` can accept a `weights` argument now.
- `customer_lifetime_value` in `GammaGamma` now accepts a frequency argument.
- fixed a bug that was causing `ParetoNBDFitter` to generate data incorrectly.
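
The switch of `params_` from an OrderedDict to a Pandas Series mostly matters for code that iterates over or converts the fitted parameters. The sketch below illustrates the difference without fitting anything: the values are made up, and the parameter names (`r`, `alpha`, `a`, `b`) are assumed to follow the usual BG/NBD naming.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical parameter values, for illustration only (not a real fit).
old_params = OrderedDict([("r", 0.24), ("alpha", 4.41), ("a", 0.79), ("b", 2.43)])

# From 0.11.0 on, params_ is described as a Pandas Series; build an equivalent one.
new_params = pd.Series(old_params)

# Key lookup and items() iteration behave the same on both containers.
assert old_params["alpha"] == new_params["alpha"]
assert list(old_params.items()) == list(new_params.items())

# A Series additionally supports vectorised operations directly.
rounded = new_params.round(1)

# Code that still needs a plain ordered mapping can convert back explicitly.
as_dict = new_params.to_dict(into=OrderedDict)
print(rounded)
print(as_dict)
```

Note that `.values` is a method on a dict but an array attribute on a Series, so callers using `params_.values()` would likely need updating.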

### 0.10.1
- performance improvements to `generate_data.py` for large datasets #195
- performance improvements to `summary_data_from_transaction_data`, thanks @MichaelSchreier
- Previously, `GammaGammaFitter` would have an infinite mean when its `q` parameter was less than 1. This was possible for some datasets. In 0.10.1, a new argument is added to `GammaGammaFitter` to constrain that `q` is greater than 1. This can be done with `q_constraint=True` in the call to `GammaGammaFitter.fit`. See issue #146. Thanks @vruvora
- Stop support of scipy < 1.0.
- Stop support of < Python 3.5.

### 0.10.0
- `BetaGeoBetaBinomFitter.fit` has replaced `n_custs` with the more appropriately named `weights` (to align with other statistical libraries). By default and if unspecified, `weights` is equal to an array of 1s.
- The `conditional_` methods on `BetaGeoBetaBinomFitter` have been updated to handle exogenously provided recency, frequency and periods.
- Performance improvements in `BetaGeoBetaBinomFitter`. `fit` takes about 50% less time than previously.
- `BetaGeoFitter`, `ParetoNBDFitter`, and `ModifiedBetaGeoFitter` all have a new `weights` argument in their `fit`. This can be used to reduce the size of the data (collapsing subjects with the same recency, frequency, T).

### 0.9.1
- Added a data generation method, `generate_new_data` to `BetaGeoBetaBinomFitter`. @zscore
- Fixed a bug in `summary_data_from_transaction_data` that was casting values to `int` prematurely. This was solved by including a new param `freq_multiplier` to be used to scale the resulting durations. See #100 for the original issue. @aprotopopov
@@ -27,7 +35,7 @@

### 0.8.1
- adding new `save_model` and `load_model` functions to all fitters. This will save the model locally as a pickle file.
- `observation_period_end` in `summary_data_from_transaction_data` and `calibration_and_holdout_data` now defaults to the max date in the dataset, instead of current time.
- improved stability of estimators.
- improve Runtime warnings.
- All fitters are now in a local file. This doesn't change the API however.
37 changes: 29 additions & 8 deletions Makefile
@@ -1,13 +1,34 @@
autopep8:
autopep8 --ignore E501,E241,W690 --in-place --recursive --aggressive lifetimes/
init:
ifeq ($(TRAVIS), true)
pip install -r reqs/travis-requirements.txt
pip install pandas==${PANDAS_VERSION}
pip list --local
else
pip install -r dev_requirements.txt
pre-commit install
endif

test:
py.test -rfs --cov=lifetimes --block=False --cov-report term-missing

lint:
flake8 lifetimes
ifeq ($(TRAVIS_PYTHON_VERSION), 2.7)
echo "Skip linting for Python2.7"
else
black lifetimes/ -l 120 --fast
black tests/ -l 120 --fast
prospector --output-format grouped
endif

autolint: autopep8 lint
format:
black . --line-length 120

pycodestyle:
pycodestyle lifetimes
check_format:
ifeq ($(TRAVIS_PYTHON_VERSION), 3.6)
black . --check --line-length 120
else
echo "Only check format on Python3.6"
endif

pydocstyle:
pydocstyle lifetimes
pre:
pre-commit run --all-files
2 changes: 0 additions & 2 deletions README.md
@@ -34,8 +34,6 @@ As emphasized by P. Fader and B. Hardie, understanding and acting on customer li

pip install lifetimes

Requirements are only Numpy, Scipy, Pandas, [Dill](https://github.com/uqfoundation/dill) (and optionally-but-seriously matplotlib).

## Documentation and tutorials
[Official documentation](http://lifetimes.readthedocs.io/en/latest/)

102 changes: 102 additions & 0 deletions docs/Changelog.rst
@@ -0,0 +1,102 @@
Changelog
=========

0.11.0
~~~~~~

- Move most models (all but Pareto) to autograd for automatic
differentiation of their likelihood. This results in faster (at least
3x) and more successful convergence, plus allows for some really
exciting extensions (coming soon).
- ``GammaGammaFitter``, ``BetaGeoFitter``, ``ModifiedBetaGeoFitter``
and ``BetaGeoBetaBinomFitter`` have three new attributes:
``confidence_interval_``, ``variance_matrix_`` and
``standard_errors_``
- ``params_`` on fitted models is no longer an OrderedDict, but a
Pandas Series
- ``GammaGammaFitter`` can accept a ``weights`` argument now.
- ``customer_lifetime_value`` in ``GammaGamma`` now accepts a frequency
argument.
- fixed a bug that was causing ``ParetoNBDFitter`` to generate data
incorrectly.
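
The three new attributes are related in the usual way for maximum-likelihood estimates. How lifetimes computes them internally is not shown on this page, so the sketch below assumes the standard construction: standard errors as the square roots of the variance matrix diagonal, and a 95% Wald interval around each estimate. The numbers and the Gamma-Gamma parameter names (``p``, ``q``, ``v``) are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical estimates and variance matrix, for illustration only.
names = ["p", "q", "v"]
estimates = pd.Series([6.25, 3.74, 15.44], index=names)
variance_matrix = pd.DataFrame(
    [[1.44, 0.10, 0.50], [0.10, 0.09, 0.20], [0.50, 0.20, 4.00]],
    index=names,
    columns=names,
)

# Standard errors: square roots of the diagonal of the variance matrix.
standard_errors = pd.Series(np.sqrt(np.diag(variance_matrix.values)), index=names)

# 95% Wald interval (an assumption): estimate +/- 1.96 standard errors.
z = 1.96
confidence_interval = pd.DataFrame(
    {
        "lower": estimates - z * standard_errors,
        "upper": estimates + z * standard_errors,
    }
)
print(standard_errors)
print(confidence_interval)
```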

.. _section-1:

0.10.1
~~~~~~

- performance improvements to ``generate_data.py`` for large datasets
#195
- performance improvements to ``summary_data_from_transaction_data``,
thanks @MichaelSchreier
- Previously, ``GammaGammaFitter`` would have an infinite mean when its
``q`` parameter was less than 1. This was possible for some datasets.
In 0.10.1, a new argument is added to ``GammaGammaFitter`` to
constrain that ``q`` is greater than 1. This can be done with
``q_constraint=True`` in the call to ``GammaGammaFitter.fit``. See
issue #146. Thanks @vruvora
- Stop support of scipy < 1.0.
- Stop support of < Python 3.5.

.. _section-2:

0.10.0
~~~~~~

- ``BetaGeoBetaBinomFitter.fit`` has replaced ``n_custs`` with the more
appropriately named ``weights`` (to align with other statistical
libraries). By default and if unspecified, ``weights`` is equal to an
array of 1s.
- The ``conditional_`` methods on ``BetaGeoBetaBinomFitter`` have been
updated to handle exogenously provided recency, frequency and
periods.
- Performance improvements in ``BetaGeoBetaBinomFitter``. ``fit`` takes
about 50% less time than previously.
- ``BetaGeoFitter``, ``ParetoNBDFitter``, and ``ModifiedBetaGeoFitter``
all have a new ``weights`` argument in their ``fit``. This can be
used to reduce the size of the data (collapsing subjects with the
same recency, frequency, T).
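
Collapsing subjects that share the same recency, frequency and T can be done with a plain pandas group-by. The sketch below uses a made-up summary table and hypothetical column names; the resulting ``weights`` column counts how many customers each unique row stands for.

```python
import pandas as pd

# Hypothetical per-customer summary, for illustration only.
summary = pd.DataFrame(
    {
        "frequency": [0, 0, 2, 2, 5],
        "recency": [0.0, 0.0, 30.0, 30.0, 12.0],
        "T": [38.0, 38.0, 38.0, 38.0, 40.0],
    }
)

# Collapse customers with an identical (frequency, recency, T) triple and
# record how many customers each unique row represents.
collapsed = (
    summary.groupby(["frequency", "recency", "T"])
    .size()
    .reset_index(name="weights")
)
print(collapsed)
```

The collapsed columns, together with ``weights``, are what one would then hand to ``fit``; the exact call is omitted here because its signature is not shown in this changelog.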

.. _section-3:

0.9.1
~~~~~

- Added a data generation method, ``generate_new_data`` to
``BetaGeoBetaBinomFitter``. @zscore
- Fixed a bug in ``summary_data_from_transaction_data`` that was
casting values to ``int`` prematurely. This was solved by including a
new param ``freq_multiplier`` to be used to scale the resulting
durations. See #100 for the original issue. @aprotopopov
- Performance and bug fixes in
``utils.expected_cumulative_transactions``. @aprotopopov
- Fixed a bug in ``utils.calculate_alive_path`` that was causing a
difference in values compared to ``summary_from_transaction_data``.
@DaniGate

.. _section-4:

0.9.0
~~~~~

- fixed many of the numpy warnings as the result of fitting
- added optional ``initial_params`` to all models
- Added ``conditional_probability_of_n_purchases_up_to_time`` to
``ParetoNBDFitter``
- Fixed a bug in ``expected_cumulative_transactions`` and
``plot_cumulative_transactions``

.. _section-5:

0.8.1
~~~~~

- adding new ``save_model`` and ``load_model`` functions to all
fitters. This will save the model locally as a pickle file.
- ``observation_period_end`` in ``summary_data_from_transaction_data``
and ``calibration_and_holdout_data`` now defaults to the max date in
the dataset, instead of current time.
- improved stability of estimators.
- improve Runtime warnings.
- All fitters are now in a local file. This doesn’t change the API
however.
6 changes: 3 additions & 3 deletions docs/Quickstart.md
@@ -21,7 +21,7 @@ ID
#### The shape of your data
For all models, the following nomenclature is used:

- `frequency` represents the number of *repeat* purchases the customer has made. This means that it's one less than the total number of purchases. This is actually slightly wrong. It's the count of time periods the customer had a purchase in. So if using days as units, then it's the count of days the customer had a purchase on.
- `T` represents the age of the customer in whatever time units chosen (weekly, in the above dataset). This is equal to the duration between a customer's first purchase and the end of the period under study.
- `recency` represents the age of the customer when they made their most recent purchases. This is equal to the duration between a customer's first purchase and their latest purchase. (Thus if they have made only 1 purchase, the recency is 0.)
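
The three definitions above can be made concrete with a small pandas sketch. It is not the implementation of `summary_data_from_transaction_data`; it only works through the definitions on a made-up transaction log, using days as the time unit and hypothetical column names.

```python
import pandas as pd

# Hypothetical transaction log, for illustration only.
transactions = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 3, 3],
        "date": pd.to_datetime(
            [
                "2014-01-01",
                "2014-01-01",
                "2014-01-15",
                "2014-01-10",
                "2014-01-05",
                "2014-01-25",
            ]
        ),
    }
)
observation_period_end = pd.Timestamp("2014-01-31")

# Several purchases on the same day count as a single purchase period.
periods = transactions.drop_duplicates(["id", "date"])
grouped = periods.groupby("id")["date"].agg(["min", "max", "count"])

summary = pd.DataFrame(
    {
        # Repeat purchase periods: distinct purchase days minus the first one.
        "frequency": grouped["count"] - 1,
        # Age of the customer at their latest purchase, in days.
        "recency": (grouped["max"] - grouped["min"]).dt.days,
        # Age of the customer at the end of the period under study, in days.
        "T": (observation_period_end - grouped["min"]).dt.days,
    }
)
print(summary)
```

Customer 1 bought twice on the same day and once two weeks later, so their frequency is 1 rather than 2, which is the "count of time periods" subtlety described in the first bullet.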

@@ -43,7 +43,7 @@ print(bgf)
"""
```

After fitting, we have lots of nice methods and properties attached to the fitter object.
After fitting, we have lots of nice methods and properties attached to the fitter object, like ``params_`` and ``summary``.

For small sample sizes, the parameters can get implausibly large, so by adding an l2 penalty to the likelihood, we can control how large these parameters can be. This is implemented by setting a positive `penalizer_coef` in the initialization of the model. In typical applications, penalizers on the order of 0.001 to 0.1 are effective.
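
To show what an l2 penalty does to an objective, here is a minimal sketch. The exact form of the penalty inside lifetimes is an assumption (a coefficient times the sum of squared parameters), and `toy_nll` is a stand-in function, not a real model likelihood.

```python
import numpy as np


def penalized_objective(params, negative_log_likelihood, penalizer_coef=0.0):
    """Negative log-likelihood plus an l2 penalty on the parameters (assumed form)."""
    params = np.asarray(params, dtype=float)
    return negative_log_likelihood(params) + penalizer_coef * np.sum(params ** 2)


def toy_nll(params):
    # Stand-in objective with its minimum at (3, 3); not a real likelihood.
    return np.sum((params - 3.0) ** 2)


# Without a penalty the objective at (3, 3) is just the toy likelihood value.
unpenalized = penalized_objective([3.0, 3.0], toy_nll)

# With a penalty, large parameter values cost extra, which pulls the
# minimiser towards zero.
penalized = penalized_objective([3.0, 3.0], toy_nll, penalizer_coef=0.1)
print(unpenalized, penalized)
```

The larger `penalizer_coef` is, the more strongly large parameter values are discouraged, which is why small coefficients in the range mentioned above are usually enough.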

@@ -153,7 +153,7 @@ from lifetimes.utils import calibration_and_holdout_data

summary_cal_holdout = calibration_and_holdout_data(transaction_data, 'id', 'date',
calibration_period_end='2014-09-01',
observation_period_end='2014-12-31')
print(summary_cal_holdout.head())
"""
frequency_cal recency_cal T_cal frequency_holdout duration_holdout
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -77,7 +77,7 @@
# built documents.
#
# The short X.Y version.
version = '0.10.1'
version = '0.11.0'
# The full version, including alpha/beta/rc tags.
release = version

1 change: 1 addition & 0 deletions docs/index.rst
@@ -12,6 +12,7 @@
Quickstart
Saving and loading model
More examples and recipes
Changelog


Indices and tables
4 changes: 0 additions & 4 deletions docs/intro.rst
@@ -51,10 +51,6 @@ Installation

pip install lifetimes

Requirements are only Numpy, Scipy, Pandas,
`Dill <https://github.com/uqfoundation/dill>`__ (and
optionally-but-seriously matplotlib).

Documentation and tutorials
---------------------------

18 changes: 16 additions & 2 deletions lifetimes/__init__.py
@@ -1,4 +1,18 @@
from .estimation import BetaGeoFitter, ParetoNBDFitter, GammaGammaFitter, ModifiedBetaGeoFitter, BetaGeoBetaBinomFitter
# -*- coding: utf-8 -*-
"""All fitters from fitters directory."""
from .version import __version__
from .fitters import BaseFitter
from .fitters.beta_geo_fitter import BetaGeoFitter
from .fitters.beta_geo_beta_binom_fitter import BetaGeoBetaBinomFitter
from .fitters.modified_beta_geo_fitter import ModifiedBetaGeoFitter
from .fitters.pareto_nbd_fitter import ParetoNBDFitter
from .fitters.gamma_gamma_fitter import GammaGammaFitter

__all__ = ['BetaGeoFitter', 'ParetoNBDFitter', 'GammaGammaFitter', 'ModifiedBetaGeoFitter', 'BetaGeoBetaBinomFitter']
__all__ = (
"__version__",
"BetaGeoFitter",
"ParetoNBDFitter",
"GammaGammaFitter",
"ModifiedBetaGeoFitter",
"BetaGeoBetaBinomFitter",
)