Merge pull request #251 from CamDavidsonPilon/v0.11.0

v0.11.0
CamDavidsonPilon · Mar 7, 2019 · 91e184e
2 parents 5e316c0 + acbb9b5
commit 91e184e
Showing 32 changed files with 4,212 additions and 1,996 deletions.
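
Note on the `params_` change: the 0.11.0 changelog entry in this merge says `params_` on fitted models is now a pandas Series rather than an OrderedDict. The sketch below is not taken from the library. It uses made-up parameter names and values to show roughly how calling code might be affected, on the assumption that the Series is indexed by parameter name.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical parameter values, for illustration only (not fitted output).
old_params = OrderedDict([("r", 0.24), ("alpha", 4.41), ("a", 0.79), ("b", 2.43)])

# Assumption: the new params_ behaves like a Series indexed by parameter name.
new_params = pd.Series(old_params)

# Label-based lookup reads the same for both containers.
assert old_params["alpha"] == new_params["alpha"]

# Code that needs a plain mapping can convert explicitly.
as_dict = new_params.to_dict()

# A Series also supports vectorized operations that a dict does not.
doubled = new_params * 2
```

Code that only reads parameters by name should keep working under this assumption. Code that relies on dict-specific behaviour can call `to_dict()` first.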
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,16 @@
+repos:
+-   repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.0.0
+    hooks:
+    -   id: trailing-whitespace
+    -   id: check-ast
+    -   id: check-yaml
+    -   id: end-of-file-fixer
+    -   id: fix-encoding-pragma
+    -   id: mixed-line-ending
+    -   id: trailing-whitespace
+-   repo: https://github.com/ambv/black
+    rev: stable
+    hooks:
+    - id: black
+      args: ["--line-length", "120"]
diff --git a/.prospector.yaml b/.prospector.yaml
@@ -0,0 +1,49 @@
+strictness: medium
+
+pylint:
+  options:
+    bad-names: foo,baz,toto,tutu,tata
+    # max-args default = 5
+    max-args: 15
+    # max-locals default = 15
+    max-locals: 50
+    # max-branches default = 15
+    max-branches: 15
+  disable:
+    - line-too-long
+    - protected-access
+    - no-value-for-parameter
+    - assignment-from-no-return
+    - invalid-unary-operand-type
+    # remove if python2.7 support is dropped
+    - useless-object-inheritance
+    - old-style-class
+
+pyflakes:
+  disable:
+    - F401
+    - F841
+    # let pylint used-before-assignment handle this
+    - F821
+
+pep8:
+  options:
+    max-line-length: 120
+  disable:
+    - E501
+    - E241
+
+mccabe:
+  options:
+    # max-complexity default = 10
+    max-complexity: 23
+
+pyroma:
+  run: true
+
+pep257:
+  run: false
+
+ignore-paths:
+  - build
+  - benchmarks
diff --git a/.travis.yml b/.travis.yml
@@ -1,17 +1,28 @@
 language: python
+cache: pip
 dist: trusty
 python:
    - "2.7"
    - "3.5"
    - "3.6"
+env:
+- export PANDAS_VERSION=0.21.1
+- export PANDAS_VERSION=0.22.0
+- export PANDAS_VERSION=0.23.4
+- export PANDAS_VERSION=0.24.1
 # Enable newer 3.7 without globally enabling sudo and dist: xenial for other build jobs
 matrix:
   include:
-    - python: 3.7
-      dist: xenial
-      sudo: true
+  - python: 3.7
+    dist: xenial
+    sudo: true
+    env: export PANDAS_VERSION=0.24.1
+  - python: 3.7
+    dist: xenial
+    sudo: true
+    env: export PANDAS_VERSION=0.23.4
 before_install:
-  - sudo apt-get update
+  - ls
 install:
   - "pip install -r dev_requirements.txt"
 # command to run tests

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,18 +1,26 @@
 # Changelog
 
+### 0.11.0
+ - Move most models (all but Pareto) to autograd for automatic differentiation of their likelihood. This results in faster (at least 3x) and more successful convergence, plus allows for some really exciting extensions (coming soon).
+ - `GammaGammaFitter`, `BetaGeoFitter`, `ModifiedBetaGeoFitter` and `BetaGeoBetaBinomFitter` have three new attributes: `confidence_interval_`, `variance_matrix_` and `standard_errors_`
+ - `params_` on fitted models is no longer an OrderedDict, but a Pandas Series
+ - `GammaGammaFitter` can accept a `weights` argument now.
+ - `customer_lifetime_value` in `GammaGamma` now accepts a frequency argument.
+ - fixed a bug that was causing `ParetoNBDFitter` to generate data incorrectly.
+
 ### 0.10.1
  - performance improvements to `generate_data.py` for large datasets #195
  - performance improvements to `summary_data_from_transaction_data`, thanks @MichaelSchreier
- - Previously, `GammaGammaFitter` would have an infinite mean when its `q` parameter was less than 1. This was possible for some datasets. In 0.10.1, a new argument is added to `GammaGammaFitter` to constrain that `q` is greater than 1. This can be done with `q_constraint=True` in the call to `GammaGammaFitter.fit`. See issue #146. Thanks @vruvora 
+ - Previously, `GammaGammaFitter` would have an infinite mean when its `q` parameter was less than 1. This was possible for some datasets. In 0.10.1, a new argument is added to `GammaGammaFitter` to constrain that `q` is greater than 1. This can be done with `q_constraint=True` in the call to `GammaGammaFitter.fit`. See issue #146. Thanks @vruvora
  - Stop support of scipy < 1.0.
  - Stop support of < Python 3.5.
 
 ### 0.10.0
- - `BetaGeoBetaBinomFitter.fit` has replaced `n_custs` with the more appropriately named `weights` (to align with other statisical libraries). By default and if unspecified, `weights` is equal to an array of 1s. 
- - The `conditional_` methods on `BetaGeoBetaBinomFitter` have been updated to handle exogenously provided recency, frequency and periods. 
- - Performance improvements in `BetaGeoBetaBinomFitter`. `fit` takes about 50% less time than previously. 
- - `BetaGeoFitter`, `ParetoNBDFitter`, and `ModifiedBetaGeoFitter` both have a new `weights` argument in their `fit`. This can be used to reduce the size of the data (collapsing subjects with the same recency, frequency, T). 
- 
+ - `BetaGeoBetaBinomFitter.fit` has replaced `n_custs` with the more appropriately named `weights` (to align with other statisical libraries). By default and if unspecified, `weights` is equal to an array of 1s.
+ - The `conditional_` methods on `BetaGeoBetaBinomFitter` have been updated to handle exogenously provided recency, frequency and periods.
+ - Performance improvements in `BetaGeoBetaBinomFitter`. `fit` takes about 50% less time than previously.
+ - `BetaGeoFitter`, `ParetoNBDFitter`, and `ModifiedBetaGeoFitter` both have a new `weights` argument in their `fit`. This can be used to reduce the size of the data (collapsing subjects with the same recency, frequency, T).
+
 ### 0.9.1
  - Added a data generation method, `generate_new_data` to `BetaGeoBetaBinomFitter`. @zscore
  - Fixed a bug in `summary_data_from_transaction_data` that was casting values to `int` prematurely. This was solved by including a new param `freq_multiplier` to be used to scale the resulting durations. See #100 for the original issue.  @aprotopopov
@@ -27,7 +35,7 @@
 
 ### 0.8.1
  - adding new `save_model` and `load_model` functions to all fitters. This will save the model locally as a pickle file.
- - `observation_period_end` in `summary_data_from_transaction_data` and `calibration_and_holdout_data` now defaults to the max date in the dataset, instead of current time. 
- - improved stability of estimators. 
- - improve Runtime warnings. 
- - All fitters are now in a local file. This doesn't change the API however. 
+ - `observation_period_end` in `summary_data_from_transaction_data` and `calibration_and_holdout_data` now defaults to the max date in the dataset, instead of current time.
+ - improved stability of estimators.
+ - improve Runtime warnings.
+ - All fitters are now in a local file. This doesn't change the API however.
diff --git a/Makefile b/Makefile
@@ -1,13 +1,34 @@
-autopep8:
-	autopep8 --ignore E501,E241,W690 --in-place --recursive --aggressive lifetimes/
+init:
+ifeq ($(TRAVIS), true)
+		pip install -r reqs/travis-requirements.txt
+		pip install pandas==${PANDAS_VERSION}
+		pip list --local
+else
+		pip install -r dev_requirements.txt
+		pre-commit install
+endif
+
+test:
+	py.test -rfs --cov=lifetimes --block=False --cov-report term-missing
 
 lint:
-	flake8 lifetimes
+ifeq ($(TRAVIS_PYTHON_VERSION), 2.7)
+		echo "Skip linting for Python2.7"
+else
+		black lifetimes/ -l 120 --fast
+		black tests/ -l 120 --fast
+		prospector --output-format grouped
+endif
 
-autolint: autopep8 lint
+format:
+	black . --line-length 120
 
-pycodestyle:
-	pycodestyle lifetimes
+check_format:
+ifeq ($(TRAVIS_PYTHON_VERSION), 3.6)
+		black . --check --line-length 120
+else
+		echo "Only check format on Python3.6"
+endif
 
-pydocstyle:
-	pydocstyle lifetimes
+pre:
+	pre-commit run --all-files
diff --git a/README.md b/README.md
@@ -34,8 +34,6 @@ As emphasized by P. Fader and B. Hardie, understanding and acting on customer li
 
     pip install lifetimes
 
-Requirements are only Numpy, Scipy, Pandas, [Dill](https://github.com/uqfoundation/dill) (and optionally-but-seriously matplotlib).
-
 ## Documentation and tutorials
 [Official documentation](http://lifetimes.readthedocs.io/en/latest/)
 

diff --git a/docs/Changelog.rst b/docs/Changelog.rst
@@ -0,0 +1,102 @@
+Changelog
+=========
+
+0.11.0
+~~~~~~
+
+-  Move most models (all but Pareto) to autograd for automatic
+   differentiation of their likelihood. This results in faster (at least
+   3x) and more successful convergence, plus allows for some really
+   exciting extensions (coming soon).
+-  ``GammaGammaFitter``, ``BetaGeoFitter``, ``ModifiedBetaGeoFitter``
+   and ``BetaGeoBetaBinomFitter`` have three new attributes:
+   ``confidence_interval_``, ``variance_matrix_`` and
+   ``standard_errors_``
+-  ``params_`` on fitted models is no longer an OrderedDict, but a
+   Pandas Series
+-  ``GammaGammaFitter`` can accept a ``weights`` argument now.
+-  ``customer_lifetime_value`` in ``GammaGamma`` now accepts a frequency
+   argument.
+-  fixed a bug that was causing ``ParetoNBDFitter`` to generate data
+   incorrectly.
+
+.. _section-1:
+
+0.10.1
+~~~~~~
+
+-  performance improvements to ``generate_data.py`` for large datasets
+   #195
+-  performance improvements to ``summary_data_from_transaction_data``,
+   thanks @MichaelSchreier
+-  Previously, ``GammaGammaFitter`` would have an infinite mean when its
+   ``q`` parameter was less than 1. This was possible for some datasets.
+   In 0.10.1, a new argument is added to ``GammaGammaFitter`` to
+   constrain that ``q`` is greater than 1. This can be done with
+   ``q_constraint=True`` in the call to ``GammaGammaFitter.fit``. See
+   issue #146. Thanks @vruvora
+-  Stop support of scipy < 1.0.
+-  Stop support of < Python 3.5.
+
+.. _section-2:
+
+0.10.0
+~~~~~~
+
+-  ``BetaGeoBetaBinomFitter.fit`` has replaced ``n_custs`` with the more
+   appropriately named ``weights`` (to align with other statistical
+   libraries). By default and if unspecified, ``weights`` is equal to an
+   array of 1s.
+-  The ``conditional_`` methods on ``BetaGeoBetaBinomFitter`` have been
+   updated to handle exogenously provided recency, frequency and
+   periods.
+-  Performance improvements in ``BetaGeoBetaBinomFitter``. ``fit`` takes
+   about 50% less time than previously.
+-  ``BetaGeoFitter``, ``ParetoNBDFitter``, and ``ModifiedBetaGeoFitter``
+   both have a new ``weights`` argument in their ``fit``. This can be
+   used to reduce the size of the data (collapsing subjects with the
+   same recency, frequency, T).
+
+.. _section-3:
+
+0.9.1
+~~~~~
+
+-  Added a data generation method, ``generate_new_data`` to
+   ``BetaGeoBetaBinomFitter``. @zscore
+-  Fixed a bug in ``summary_data_from_transaction_data`` that was
+   casting values to ``int`` prematurely. This was solved by including a
+   new param ``freq_multiplier`` to be used to scale the resulting
+   durations. See #100 for the original issue. @aprotopopov
+-  Performance and bug fixes in
+   ``utils.expected_cumulative_transactions``. @aprotopopov
+-  Fixed a bug in ``utils.calculate_alive_path`` that was causing a
+   difference in values compared to ``summary_from_transaction_data``.
+   @DaniGate
+
+.. _section-4:
+
+0.9.0
+~~~~~
+
+-  fixed many of the numpy warnings as the result of fitting
+-  added optional ``initial_params`` to all models
+-  Added ``conditional_probability_of_n_purchases_up_to_time`` to
+   ``ParetoNBDFitter``
+-  Fixed a bug in ``expected_cumulative_transactions`` and
+   ``plot_cumulative_transactions``
+
+.. _section-5:
+
+0.8.1
+~~~~~
+
+-  adding new ``save_model`` and ``load_model`` functions to all
+   fitters. This will save the model locally as a pickle file.
+-  ``observation_period_end`` in ``summary_data_from_transaction_data``
+   and ``calibration_and_holdout_data`` now defaults to the max date in
+   the dataset, instead of current time.
+-  improved stability of estimators.
+-  improve Runtime warnings.
+-  All fitters are now in a local file. This doesn’t change the API
+   however.
diff --git a/docs/Quickstart.md b/docs/Quickstart.md
@@ -21,7 +21,7 @@ ID
 #### The shape of your data
 For all models, the following nomenclature is used:
 
-- `frequency` represents the number of *repeat* purchases the customer has made. This means that it's one less than the total number of purchases. This is actually slightly wrong. It's the count of time periods the customer had a purchase in. So if using days as units, then it's the count of days the customer had a purchase on.   
+- `frequency` represents the number of *repeat* purchases the customer has made. This means that it's one less than the total number of purchases. This is actually slightly wrong. It's the count of time periods the customer had a purchase in. So if using days as units, then it's the count of days the customer had a purchase on.
 - `T` represents the age of the customer in whatever time units chosen (weekly, in the above dataset). This is equal to the duration between a customer's first purchase and the end of the period under study.
 - `recency` represents the age of the customer when they made their most recent purchases. This is equal to the duration between a customer's first purchase and their latest purchase. (Thus if they have made only 1 purchase, the recency is 0.)
 
@@ -43,7 +43,7 @@ print(bgf)
 """
 ```
 
-After fitting, we have lots of nice methods and properties attached to the fitter object.
+After fitting, we have lots of nice methods and properties attached to the fitter object, like ``params_`` and ``summary``.
 
 For small samples sizes, the parameters can get implausibly large, so by adding an l2 penalty the likelihood, we can control how large these parameters can be. This is implemented as setting as positive `penalizer_coef` in the initialization of the model. In typical applications, penalizers on the order of 0.001 to 0.1 are effective.
 
@@ -153,7 +153,7 @@ from lifetimes.utils import calibration_and_holdout_data
 
 summary_cal_holdout = calibration_and_holdout_data(transaction_data, 'id', 'date',
                                         calibration_period_end='2014-09-01',
-                                        observation_period_end='2014-12-31' )   
+                                        observation_period_end='2014-12-31' )
 print(summary_cal_holdout.head())
 """
     frequency_cal  recency_cal  T_cal  frequency_holdout  duration_holdout
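
Note on the Quickstart nomenclature touched in this file: the definitions of `frequency`, `recency` and `T` can be worked through by hand. The sketch below uses only the standard library, with invented purchase dates and days as the time unit. It is not the library's implementation (the docs use `summary_data_from_transaction_data` for that); it only mirrors the wording of the definitions.

```python
from datetime import date

# Hypothetical purchase dates for a single customer (illustrative data only).
purchases = [date(2014, 1, 1), date(2014, 1, 1), date(2014, 1, 15), date(2014, 3, 1)]
observation_period_end = date(2014, 3, 31)

# Count distinct days with a purchase, so two orders on one day count once.
purchase_days = sorted(set(purchases))
first, last = purchase_days[0], purchase_days[-1]

frequency = len(purchase_days) - 1          # periods with a repeat purchase
recency = (last - first).days               # age at the most recent purchase
T = (observation_period_end - first).days   # age at the end of the study period

print(frequency, recency, T)  # prints "2 59 89"
```

A customer with a single purchase day would get `frequency == 0` and `recency == 0`, matching the note in the definitions.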

diff --git a/docs/conf.py b/docs/conf.py
@@ -77,7 +77,7 @@
 # built documents.
 #
 # The short X.Y version.
-version = '0.10.1'
+version = '0.11.0'
 # The full version, including alpha/beta/rc tags.
 release = version
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -12,6 +12,7 @@
    Quickstart
    Saving and loading model
    More examples and recipes
+   Changelog
 
 
 Indices and tables

diff --git a/docs/intro.rst b/docs/intro.rst
@@ -51,10 +51,6 @@ Installation
 
    pip install lifetimes
 
-Requirements are only Numpy, Scipy, Pandas,
-`Dill <https://github.com/uqfoundation/dill>`__ (and
-optionally-but-seriously matplotlib).
-
 Documentation and tutorials
 ---------------------------
 

diff --git a/lifetimes/__init__.py b/lifetimes/__init__.py
@@ -1,4 +1,18 @@
-from .estimation import BetaGeoFitter, ParetoNBDFitter, GammaGammaFitter, ModifiedBetaGeoFitter, BetaGeoBetaBinomFitter
+# -*- coding: utf-8 -*-
+"""All fitters from fitters directory."""
 from .version import __version__
+from .fitters import BaseFitter
+from .fitters.beta_geo_fitter import BetaGeoFitter
+from .fitters.beta_geo_beta_binom_fitter import BetaGeoBetaBinomFitter
+from .fitters.modified_beta_geo_fitter import ModifiedBetaGeoFitter
+from .fitters.pareto_nbd_fitter import ParetoNBDFitter
+from .fitters.gamma_gamma_fitter import GammaGammaFitter
 
-__all__ = ['BetaGeoFitter', 'ParetoNBDFitter', 'GammaGammaFitter', 'ModifiedBetaGeoFitter', 'BetaGeoBetaBinomFitter']
+__all__ = (
+    "__version__",
+    "BetaGeoFitter",
+    "ParetoNBDFitter",
+    "GammaGammaFitter",
+    "ModifiedBetaGeoFitter",
+    "BetaGeoBetaBinomFitter",
+)