Skip to content

This is a python implementation of Query By Committee Made Real by Gilad-Bachrach et al.

Notifications You must be signed in to change notification settings

berryweinst/kqbc.python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kqbc.python

This is python implementation for this paper: https://dl.acm.org/citation.cfm?id=2976304 \

Package content:

  • test_synth.py - the test that runs the synthetic data experiment
  • kqbc.py - the python functions converted from matlab code
  • matlab/KQBC.m - matlab original code from the paper
  • matlab/hit_n_run.m - matlab original code from the paper

Data

Synthetic data

The synthtetic data test executes the KQBC algorithm for learning a linear classifier in a d-dimensional space. The target classifier is the vector w∗ = (1, 0, . . . , 0) thus the label of an instance x ∈ IRd is the sign of its first coordinate. The instances were normally distributed N (µ = 0, Σ = Id).

Mnist data

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. For out experiment I degenerated the dataset into 2 classes (1,-1), which are one of the figures and all the rest, and balanced between them by eliminating most of the other classes samples. The final data was a sub version of the original one, consinting of 1000 training examples for each class, so 2000 in total, and 200 for each test class in the same manner.

Running the test:

Flags:

  • steps - defines the number of repetitions of the experiment for statistical significance.
  • dim - the dimention of the samplse and classifiers
  • samples - a list of number of samples to train on
  • hnr_ - number of hit and run walks inside the polytope

Command line example:

python test_synth.py --steps 20 --dim 5 --hnr_iter 21 --plot

Dependencies:

Python
matlab
numpy
matplotlib

Results:

These plots describe the error rate on the test set vs the number of samples the classifier was trained on.

  • SVM - random k entries
  • KQBC - using the classifier (point in the version space) after T steps (using dot product)

Synthetic data (5/15 dimensions):

Log scale

Linear scale

Mnist data (only 2 catagories - 1 class vs all):

Log scale

Linear scale

It is clear that we get exponential improvenet when the classifier is extracted from KQBC algorithm.

About

This is a python implementation of Query By Committee Made Real by Gilad-Bachrach et al.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published