Trees
See also: examples/trees/id3/trees.go
ID3 is a decision tree induction algorithm which splits on the Attribute that gives the greatest Information Gain (reduction in entropy). It performs well on categorical data. Numeric datasets will need to be discretised before using ID3 (see Filtering).
CART is a decision tree algorithm that iteratively splits on the Threshold and Attribute which give the largest Reduction in Loss (Gini or Entropy). Categorical Attributes (including the target class) must be converted to Float Attributes before using the algorithm. The Tree can perform Regression or Classification, and the Loss function should be chosen accordingly (Gini or Entropy for Classification; MSE or MAE for Regression).
Example: examples/trees/cart/cart.go
Random Trees are structurally identical to those generated by ID3, but the split Attribute is chosen randomly. GoLearn's implementation allows you to choose up to k Attributes for consideration at each split.
Random Forests are a bagged ensemble technique which combines multiple Random Trees (ID3 algorithm) and outputs a classification via a majority vote.
Isolation Forest is an outlier detection algorithm that works by splitting the data with random Thresholds on random Attributes. Outliers should be isolated in fewer splits than normal data. This is an unsupervised learning algorithm, so all Class Attributes are treated as Data Attributes (used for training). GoLearn's implementation returns the Anomaly Score for each Instance when calling Predict.
Example: examples/trees/isolationForest/isolation_forest.go