BigML makes machine learning easy by taking care of the details required to add data-driven decisions and predictive power to your company. Unlike other machine learning services, BigML creates beautiful predictive models that can be easily understood and interacted with.
These BigML Node.js bindings allow you to interact with BigML.io, the API for BigML. You can use it to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models and predictions).
This module is licensed under the Apache License, Version 2.0.
Please report problems and bugs to our BigML.io issue tracker.
Discussions about the different bindings take place in the general BigML mailing list. Or join us in our Campfire chatroom.
Node 0.10 is currently supported by these bindings.
The only mandatory third-party dependencies are the request, winston and form-data libraries.
The testing environment requires the additional mocha package that can be installed with the following command:
$ sudo npm install -g mocha
To install the latest stable release with npm:
$ npm install bigml
You can also install the development version of the bindings by cloning the Git repository to your local computer and issuing:
$ npm install .
The test suite is run automatically using mocha
as test framework. As all the
tested api objects perform one or more connections to the remote resources in
bigml.com, you may have to enlarge the default timeout used by mocha
in
each test. For instance:
$ mocha -t 20000
will set the timeout limit to 20 seconds. This limit should typically be enough, but you can change it to fit the latencies of your connection.
To use the library, import it with require
:
$ node
> bigml = require('bigml');
this will give you access to the following library structure:
- bigml.constants common constants
- bigml.BigML connection object
- bigml.Resource common API methods
- bigml.Source Source API methods
- bigml.Dataset Dataset API methods
- bigml.Model Model API methods
- bigml.Ensemble Ensemble API methods
- bigml.Prediction Prediction API methods
- bigml.BatchPrediction BatchPrediction API methods
- bigml.Evaluation Evaluation API methods
- bigml.Cluster Cluster API methods
- bigml.Centroid Centroid API methods
- bigml.BatchCentroid BatchCentroid API methods
- bigml.Anomaly Anomaly detector API methods
- bigml.AnomalyScore Anomaly score API methods
- bigml.BatchAnomalyScore BatchAnomalyScore API methods
- bigml.Project Project API methods
- bigml.Sample Sample API methods
- bigml.Correlation Correlation API methods
- bigml.StatisticalTests StatisticalTest API methods
- bigml.LogisticRegression LogisticRegression API methods
- bigml.Association Association API methods
- bigml.Script Script API methods
- bigml.Execution Execution API methods
- bigml.Library Library API methods
- bigml.LocalModel Model for local predictions
- bigml.LocalEnsemble Ensemble for local predictions
- bigml.LocalCluster Cluster for local centroids
- bigml.LocalAnomaly Anomaly detector for local anomaly scores
- bigml.LocalLogisticRegression Logistic regression model for local predictions
- bigml.LocalAssociation Association model for associaton rules
All the requests to BigML.io must be authenticated using your username and API key and are always transmitted over HTTPS.
This module will look for your username and API key in the environment
variables BIGML_USERNAME
and BIGML_API_KEY
respectively. You can
add the following lines to your .bashrc
or .bash_profile
to set
those variables automatically when you log in::
export BIGML_USERNAME=myusername
export BIGML_API_KEY=ae579e7e53fb9abd646a6ff8aa99d4afe83ac291
With that environment set up, connecting to BigML is a breeze::
connection = new bigml.BigML();
Otherwise, you can initialize directly when instantiating the BigML class as follows::
connection = new bigml.BigML('myusername',
'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291')
Also, you can initialize the library to work in the Sandbox environment by
setting the third parameter devMode
to true
::
connection = new bigml.BigML('myusername',
'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291',
true)
Let's see the steps that will lead you from this csv
file containing the Iris
flower dataset to
predicting the species of a flower whose sepal length
is 5
and
whose sepal width
is 2.5
. By default, BigML considers the last field
(species
) in the row as the
objective field (i.e., the field that you want to generate predictions
for). The csv structure is::
sepal length,sepal width,petal length,petal width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
...
The steps required to generate a prediction are creating a set of source, dataset and model objects::
var bigml = require('bigml');
var source = new bigml.Source();
source.create('./data/iris.csv', function(error, sourceInfo) {
if (!error && sourceInfo) {
var dataset = new bigml.Dataset();
dataset.create(sourceInfo, function(error, datasetInfo) {
if (!error && datasetInfo) {
var model = new bigml.Model();
model.create(datasetInfo, function (error, modelInfo) {
if (!error && modelInfo) {
var prediction = new bigml.Prediction();
prediction.create(modelInfo, {'petal length': 1})
}
});
}
});
}
});
Note that in our example the prediction.create
call has no associated
callback. All the CRUD methods of any resource allow assigning a callback as
the last parameter,
but if you don't the default action will be
printing the resulting resource or the error. For the create
method:
> result:
{ code: 201,
object:
{ category: 0,
code: 201,
content_type: 'text/csv',
created: '2013-06-08T15:22:36.834797',
credits: 0,
description: '',
fields_meta: { count: 0, limit: 1000, offset: 0, total: 0 },
file_name: 'iris.csv',
md5: 'd1175c032e1042bec7f974c91e4a65ae',
name: 'iris.csv',
number_of_datasets: 0,
number_of_ensembles: 0,
number_of_models: 0,
number_of_predictions: 0,
private: true,
resource: 'source/51b34c3c37203f4678000020',
size: 4608,
source_parser: {},
status:
{ code: 1,
message: 'The request has been queued and will be processed soon' },
subscription: false,
tags: [],
type: 0,
updated: '2013-06-08T15:22:36.834844' },
resource: 'source/51b34c3c37203f4678000020',
location: 'https://localhost:1026/andromeda/source/51b34c3c37203f4678000020',
error: null }
The generated objects can be retrieved, updated and deleted through the corresponding REST methods. For instance, in the previous example you would use:
bigml = require('bigml');
var source = new bigml.Source();
source.get('source/51b25fb237203f4410000010' function (error, resource) {
if (!error && resource) {
console.log(resource);
}
})
to recover and show the source information.
You can also generate local predictions using the information of your models::
bigml = require('bigml');
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
localModel.predict({'petal length': 1},
function(error, prediction) {console.log(prediction)});
And similarly, for your ensembles
bigml = require('bigml');
var localEnsemble = new bigml.LocalEnsemble('ensemble/51901f4337203f3a9a000215');
localEnsemble.predict({'petal length': 1}, 0,
function(error, prediction) {console.log(prediction)});
will generate a prediction by combining the predictions of each of the models
they enclose. The example uses the plurality
combination method (whose code
is 0
. Check the docs for more information about the available combination
methods).
We've just drawn a first sketch. For additional information, see the files included in the docs folder.
Please follow these steps:
- Fork the project on github.com.
- Create a new branch.
- Commit changes to the new branch.
- Send a pull request.
For details on the underlying API, see the BigML API documentation.