Merge branch 'release/0.4.4'
wmayner committed Jul 19, 2017
2 parents 284ea57 + d43104e commit ef08133
Showing 8 changed files with 150 additions and 95 deletions.
6 changes: 4 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
__pycache__
build
.cache
.tox
.env
.ropeproject
*.so
*.pyc
dist
MANIFEST
*.egg*
build
dist
pyemd/emd.cpp
16 changes: 6 additions & 10 deletions .travis.yml
@@ -1,16 +1,12 @@
sudo: false
language: python
python:
- '3.3'
- '3.4'
- '3.5'
- '3.6'
install:
- pip install Cython
- make
- pip install -e .
- pip install pytest
script: python -m pytest
- '2.7'
- '3.6'
install:
- pip install tox-travis
- pip install cython
script: tox
notifications:
email: false
slack:
21 changes: 21 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,21 @@
Installation issues
===================

Before opening an issue related to installation, please try to install PyEMD in
a fresh, empty Python 3 virtual environment and check that the problem
persists:

```shell
pip install virtualenvwrapper
mkvirtualenv -p `which python3` pyemd
# Now we're in an empty Python 3 virtual environment
pip install pyemd
```

PyEMD is not officially supported for (but may nonetheless work with) the following:

- Python 2
- Anaconda distributions
- Windows operating systems

However, if you need to use it in these cases, pull requests are welcome!
83 changes: 42 additions & 41 deletions README.rst
@@ -1,8 +1,8 @@
.. image:: https://travis-ci.org/wmayner/pyemd.svg?branch=develop
.. image:: https://img.shields.io/travis/wmayner/pyemd/develop.svg?style=flat-square&maxAge=3600
:target: https://travis-ci.org/wmayner/pyemd
.. image:: http://img.shields.io/badge/Python%203%20-compatible-brightgreen.svg
.. image:: https://img.shields.io/pypi/pyversions/pyemd.svg?style=flat-square&maxAge=86400
:target: https://wiki.python.org/moin/Python2orPython3
:alt: Python 3 compatible
:alt: Python versions badge

**************************
PyEMD: Fast EMD for Python
@@ -14,10 +14,6 @@ Distance <http://en.wikipedia.org/wiki/Earth_mover%27s_distance>`_ that allows
it to be used with NumPy. **If you use this code, please cite the papers listed
at the end of this document.**

This wrapper does not expose the full functionality of the underlying
implementation; it can only used be with the ``np.float`` data type, and with a
symmetric distance matrix that represents a true metric. See the documentation
for the original Pele and Werman library for the other options it provides.

Installation
~~~~~~~~~~~~
@@ -28,11 +24,10 @@ To install the latest release:
pip install pyemd
To install the latest development version:
Before opening an issue related to installation, please try to install PyEMD in
a fresh, empty Python 3 virtual environment and check that the problem
persists.

.. code:: bash
pip install "git+https://github.com/wmayner/pyemd@develop#egg=pyemd"

Usage
~~~~~
@@ -41,60 +36,60 @@ Usage
>>> from pyemd import emd
>>> import numpy as np
>>> first_signature = np.array([0.0, 1.0])
>>> second_signature = np.array([5.0, 3.0])
>>> distance_matrix = np.array([[0.0, 0.5], [0.5, 0.0]])
>>> emd(first_signature, second_signature, distance_matrix)
>>> first_histogram = np.array([0.0, 1.0])
>>> second_histogram = np.array([5.0, 3.0])
>>> distance_matrix = np.array([[0.0, 0.5],
... [0.5, 0.0]])
>>> emd(first_histogram, second_histogram, distance_matrix)
3.5
You can also get the associated minimum-cost flow:

.. code:: python
>>> from pyemd import emd_with_flow
>>> emd_with_flow(first_signature, second_signature, distance_matrix)
>>> emd_with_flow(first_histogram, second_histogram, distance_matrix)
(3.5, [[0.0, 0.0], [0.0, 1.0]])
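
The value 3.5 in the example above can be re-derived by hand. The following is a plain-Python sketch, not PyEMD's implementation: it assumes the total cost is the cost of the returned flow plus a penalty for each unit of unmatched mass, and that the default ``extra_mass_penalty`` of -1 means "use the largest entry of the distance matrix". The helper name ``emd_from_flow`` is hypothetical and exists only for this illustration.

```python
def emd_from_flow(first, second, distance, flow, extra_mass_penalty=-1.0):
    """Recompute an EMD value from a given flow (illustrative, not part of PyEMD)."""
    # Cost of the mass that is actually moved between bins.
    transport = sum(
        flow[i][j] * distance[i][j]
        for i in range(len(first))
        for j in range(len(second))
    )
    # Assumption: a negative penalty stands for the largest ground distance.
    if extra_mass_penalty < 0:
        extra_mass_penalty = max(max(row) for row in distance)
    # Mass left unmatched because the two histograms have different totals.
    extra_mass = abs(sum(first) - sum(second))
    return transport + extra_mass_penalty * extra_mass


first = [0.0, 1.0]
second = [5.0, 3.0]
distance = [[0.0, 0.5], [0.5, 0.0]]
flow = [[0.0, 0.0], [0.0, 1.0]]

print(emd_from_flow(first, second, distance, flow))  # 3.5
```

Here the flow itself costs nothing (one unit stays in the second bin), so the whole result comes from the 7.0 units of extra mass times the penalty of 0.5.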
API
~~~

.. code:: python
emd(first_signature, second_signature, distance_matrix)
- ``first_signature``: A 1-dimensional numpy array of ``np.float``, of size N.
- ``second_signature``: A 1-dimensional numpy array of ``np.float``, of size N.
- ``distance_matrix``: A 2-dimensional array of ``np.float``, of size NxN. Must
be symmetric and represent a metric.
emd(first_histogram, second_histogram, distance_matrix)
- ``first_histogram``: A 1-dimensional numpy array of type ``np.float64``, of
length :math:`N`.
- ``second_histogram``: A 1-dimensional numpy array of type ``np.float64``, of
length :math:`N`.
- ``distance_matrix``: A 2-dimensional array of type ``np.float64``, of size at
least :math:`N \times N`. This defines the underlying metric, or ground
distance, by giving the pairwise distances between the histogram bins. It
must represent a metric; there is no warning if it doesn't.

.. code:: python
emd, flow = emd_with_flow(first_signature, second_signature, distance_matrix)
- ``first_signature``: A 1-dimensional numpy array of ``np.float``, of size N.
- ``second_signature``: A 1-dimensional numpy array of ``np.float``, of size N.
- ``distance_matrix``: A 2-dimensional array of ``np.float``, of size NxN. Must
be symmetric and represent a metric.
The arguments to ``emd_with_flow`` are the same.


Limitations and Caveats
~~~~~~~~~~~~~~~~~~~~~~~

- ``distance_matrix`` must be symmetric.
- ``distance_matrix`` is assumed to represent a true metric. This must be
enforced by the user. See the documentation in ``pyemd/lib/emd_hat.hpp``.
- ``distance_matrix`` is assumed to represent a metric; there is no check to
ensure that this is true. See the documentation in ``pyemd/lib/emd_hat.hpp``
for more information.
- The flow matrix does not contain the flows to/from the extra mass bin.
- The signatures and distance matrix must be numpy arrays of ``np.float``. The
original C++ template function can accept any numerical C++ type, but this
wrapper only instantiates the template with ``double`` (Cython converts
``np.float`` to ``double``). If there's demand, I can add support for other
types.
- The histograms and distance matrix must be numpy arrays of type
``np.float64``. The original C++ template function can accept any numerical
C++ type, but this wrapper only instantiates the template with ``double``
(Cython converts ``np.float64`` to ``double``). If there's demand, I can add
support for other types.
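
Because the library does not check that ``distance_matrix`` is a metric, callers may want to check it themselves before calling ``emd``. The following is a minimal sketch under stated assumptions: ``is_metric`` is a hypothetical helper, not part of PyEMD, and it works on nested lists so that it runs without NumPy. It tests non-negativity, a zero diagonal, symmetry, and the triangle inequality, but it does not reject zero distances between distinct bins.

```python
def is_metric(distance, tol=1e-12):
    """Return True if a square matrix looks like a (pseudo)metric.

    Illustrative helper only; the cubic triangle-inequality loop is
    fine for small matrices but slow for large ones.
    """
    n = len(distance)
    # The matrix must be square.
    if any(len(row) != n for row in distance):
        return False
    for i in range(n):
        # Distance from a bin to itself must be zero.
        if abs(distance[i][i]) > tol:
            return False
        for j in range(n):
            # Distances must be non-negative and symmetric.
            if distance[i][j] < -tol:
                return False
            if abs(distance[i][j] - distance[j][i]) > tol:
                return False
    # Triangle inequality: a detour through k can never be shorter.
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if distance[i][j] > distance[i][k] + distance[k][j] + tol:
                    return False
    return True


print(is_metric([[0.0, 0.5], [0.5, 0.0]]))  # True
print(is_metric([[0.0, 1.0], [2.0, 0.0]]))  # False (not symmetric)
```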


Contributing
~~~~~~~~~~~~

To help develop PyEMD, fork the project on GitHub and install the requirements with ``pip``.
To help develop PyEMD, fork the project on GitHub and install the requirements
with ``pip``.

The ``Makefile`` defines some tasks to help with development:

@@ -104,6 +99,8 @@ The ``Makefile`` defines some tasks to help with development:
* ``clean``: remove the build directory and the compiled C++ extension
* ``test``: run unit tests with ``py.test``

Tests for different Python environments can be run by installing ``tox`` with
``pip install tox`` and running the ``tox`` command.

Credit
~~~~~~
@@ -118,7 +115,9 @@ Credit
Please cite these papers if you use this code:
``````````````````````````````````````````````

Ofir Pele and Michael Werman, "A linear time histogram metric for improved SIFT matching," in *Computer Vision - ECCV 2008*, Marseille, France, 2008, pp. 495-508.
Ofir Pele and Michael Werman, "A linear time histogram metric for improved SIFT
matching," in *Computer Vision - ECCV 2008*, Marseille, France, 2008, pp.
495-508.

.. code-block:: latex

@@ -132,7 +131,9 @@ Ofir Pele and Michael Werman, "A linear time histogram metric for improved SIFT
publisher={Springer}
}

Ofir Pele and Michael Werman, "Fast and robust earth mover's distances," in *Proc. 2009 IEEE 12th Int. Conf. on Computer Vision*, Kyoto, Japan, 2009, pp. 460-467.
Ofir Pele and Michael Werman, "Fast and robust earth mover's distances," in
*Proc. 2009 IEEE 12th Int. Conf. on Computer Vision*, Kyoto, Japan, 2009, pp.
460-467.

.. code-block:: latex

2 changes: 1 addition & 1 deletion pyemd/__about__.py
@@ -5,7 +5,7 @@
"""PyEMD metadata"""

__title__ = 'pyemd'
__version__ = '0.4.3'
__version__ = '0.4.4'
__description__ = ("A Python wrapper for Ofir Pele and Michael Werman's "
"implementation of the Earth Mover's Distance.")
__author__ = 'Will Mayner'
105 changes: 67 additions & 38 deletions pyemd/emd.pyx
@@ -36,30 +36,39 @@ cdef extern from "lib/emd_hat.hpp":
DEFAULT_EXTRA_MASS_PENALTY = -1.0


def validate(first_signature, second_signature, distance_matrix):
def validate(first_histogram, second_histogram, distance_matrix):
"""Validate input."""
if (first_signature.shape[0] > distance_matrix.shape[0] or
second_signature.shape[0] > distance_matrix.shape[0]):
raise ValueError('Signature dimension cannot be larger than '
'dimensions of distance matrix')
if (first_signature.shape[0] != second_signature.shape[0]):
raise ValueError('Signature dimensions must be equal')
if (first_histogram.shape[0] > distance_matrix.shape[0] or
second_histogram.shape[0] > distance_matrix.shape[0]):
raise ValueError('Histogram lengths cannot be greater than the '
'number of rows or columns of the distance matrix')
if (first_histogram.shape[0] != second_histogram.shape[0]):
raise ValueError('Histogram lengths must be equal')


def emd(np.ndarray[np.float64_t, ndim=1, mode="c"] first_signature,
np.ndarray[np.float64_t, ndim=1, mode="c"] second_signature,
def emd(np.ndarray[np.float64_t, ndim=1, mode="c"] first_histogram,
np.ndarray[np.float64_t, ndim=1, mode="c"] second_histogram,
np.ndarray[np.float64_t, ndim=2, mode="c"] distance_matrix,
extra_mass_penalty=DEFAULT_EXTRA_MASS_PENALTY):
"""
Compute the EMD between signatures with the given distance matrix.
Args:
first_signature (np.ndarray): A 1-dimensional array of type
``np.double``, of length :math:`N`.
second_signature (np.ndarray): A 1-dimensional array of ``np.double``,
also of length :math:`N`.
distance_matrix (np.ndarray): A 2-dimensional array of ``np.double``,
of size :math:`N \cross N`.
u"""Return the EMD between two histograms using the given distance matrix.
The Earth Mover's Distance is the minimal cost of turning one histogram
into another by moving around the “dirt” in the bins, where the cost of
moving dirt from one bin to another is given by the amount of dirt times
the “ground distance” between the bins.
Arguments:
first_histogram (np.ndarray): A 1-dimensional array of type np.float64,
of length N.
second_histogram (np.ndarray): A 1-dimensional array of np.float64,
also of length N.
distance_matrix (np.ndarray): A 2-dimensional array of np.float64, of
size at least N × N. This defines the underlying metric, or ground
distance, by giving the pairwise distances between the histogram
bins. It must represent a metric; there is no warning if it
doesn't.
Keyword Arguments:
extra_mass_penalty: The penalty for extra mass. If you want the
resulting distance to be a metric, it should be at least half the
diameter of the space (maximum possible distance between any two
@@ -70,28 +79,42 @@ def emd(np.ndarray[np.float64_t, ndim=1, mode="c"] first_signature,
Returns:
float: The EMD value.
Raises:
ValueError: If the length of either histogram is greater than the
number of rows or columns of the distance matrix, or if the histograms
aren't the same length.
"""
validate(first_signature, second_signature, distance_matrix)
return emd_hat_gd_metric_double(first_signature,
second_signature,
validate(first_histogram, second_histogram, distance_matrix)
return emd_hat_gd_metric_double(first_histogram,
second_histogram,
distance_matrix,
extra_mass_penalty)


def emd_with_flow(np.ndarray[np.float64_t, ndim=1, mode="c"] first_signature,
np.ndarray[np.float64_t, ndim=1, mode="c"] second_signature,
def emd_with_flow(np.ndarray[np.float64_t, ndim=1, mode="c"] first_histogram,
np.ndarray[np.float64_t, ndim=1, mode="c"] second_histogram,
np.ndarray[np.float64_t, ndim=2, mode="c"] distance_matrix,
extra_mass_penalty=DEFAULT_EXTRA_MASS_PENALTY):
"""
Compute the EMD between signatures with the given distance matrix.
Args:
first_signature (np.ndarray): A 1-dimensional array of type
``np.double``, of length :math:`N`.
second_signature (np.ndarray): A 1-dimensional array of ``np.double``,
also of length :math:`N`.
distance_matrix (np.ndarray): A 2-dimensional array of ``np.double``,
of size :math:`N \cross N`.
u"""Return the EMD between two histograms using the given distance matrix.
The Earth Mover's Distance is the minimal cost of turning one histogram
into another by moving around the “dirt” in the bins, where the cost of
moving dirt from one bin to another is given by the amount of dirt times
the “ground distance” between the bins.
Arguments:
first_histogram (np.ndarray): A 1-dimensional array of type np.float64,
of length N.
second_histogram (np.ndarray): A 1-dimensional array of np.float64,
also of length N.
distance_matrix (np.ndarray): A 2-dimensional array of np.float64, of
size at least N × N. This defines the underlying metric, or ground
distance, by giving the pairwise distances between the histogram
bins. It must represent a metric; there is no warning if it
doesn't.
Keyword Arguments:
extra_mass_penalty: The penalty for extra mass. If you want the
resulting distance to be a metric, it should be at least half the
diameter of the space (maximum possible distance between any two
@@ -101,10 +124,16 @@ def emd_with_flow(np.ndarray[np.float64_t, ndim=1, mode="c"] first_signature,
used.
Returns:
(float, list(float)): The EMD value and the associated minimum-cost flow.
(float, list(list(float))): The EMD value and the associated
minimum-cost flow.
Raises:
ValueError: If the length of either histogram is greater than the
number of rows or columns of the distance matrix, or if the histograms
aren't the same length.
"""
validate(first_signature, second_signature, distance_matrix)
return emd_hat_gd_metric_double_with_flow_wrapper(first_signature,
second_signature,
validate(first_histogram, second_histogram, distance_matrix)
return emd_hat_gd_metric_double_with_flow_wrapper(first_histogram,
second_histogram,
distance_matrix,
extra_mass_penalty)
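
The length checks performed by ``validate`` in this diff can be tried out without building the Cython extension. The sketch below is a plain-Python mirror of those checks, with the assumption that nested lists stand in for NumPy arrays; ``validate_lengths`` is a hypothetical name used only here. Like the code in the diff, it compares the histogram lengths against the number of rows only (``shape[0]``), even though the error message mentions rows or columns.

```python
def validate_lengths(first_histogram, second_histogram, distance_matrix):
    """Plain-Python mirror of the checks in ``validate`` (illustrative only)."""
    # len() of a nested list is its row count, matching ``shape[0]``.
    n_rows = len(distance_matrix)
    if len(first_histogram) > n_rows or len(second_histogram) > n_rows:
        raise ValueError('Histogram lengths cannot be greater than the '
                         'number of rows or columns of the distance matrix')
    if len(first_histogram) != len(second_histogram):
        raise ValueError('Histogram lengths must be equal')


distance = [[0.0, 0.5], [0.5, 0.0]]

# Passes silently: both histograms have length 2, and the matrix has 2 rows.
validate_lengths([0.0, 1.0], [5.0, 3.0], distance)

# Each of these raises ValueError, and the message says which check failed.
for first, second in [([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]), ([1.0], [1.0, 2.0])]:
    try:
        validate_lengths(first, second, distance)
    except ValueError as error:
        print(error)
```

The order of the checks matters for the message: a histogram that is both too long and mismatched in length is reported as too long.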
5 changes: 2 additions & 3 deletions setup.py
@@ -98,10 +98,9 @@ def no_cythonize(extensions, **_ignore):
'Natural Language :: English',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python',
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6'
],
)
7 changes: 7 additions & 0 deletions tox.ini
@@ -0,0 +1,7 @@
[tox]
envlist = py{27,36}

[testenv]
deps = pytest
commands = make test
whitelist_externals = make
