LANDMarkClassifier Version 2.1.0

@jrudar jrudar released this 12 Jul 20:42
· 13 commits to main since this release
99f5b05
  • 'use_cascade' parameter is now active. This parameter extends X using the results of the decision function at each node. These new features are then used alongside the original features during the training (and prediction) steps within each node in the tree.
  • Updated API
  • Updated tests
  • Added notebooks on simple and advanced usage
  • Added notebooks demonstrating the effects of some important parameters
  • Changed code in linear models so that 'y_min' is equal to 80% of the minimum count of the bootstrapped resampled data
  • 5-Fold Stratified Cross-Validation is now explicitly stated to be used for linear models
  • Use of a neural network model to split a node now requires that more than 'minority_sz_nnet' samples of the minority class are present in the bootstrapped resampled data.
  • 'minority_sz_lm' controls how many samples must be present to split using a linear model
  • Autodetection of sparse matrices and conversion to CSR format if sparsity >= 90%
  • New way to create the high-dimensional embedding used to calculate dissimilarities with the proximity() function: "path". With this option, a binary matrix containing all nodes visited by each sample is created, rather than just the terminal nodes. The original method is now "terminal".
  • Initial layer of the neural network model now uses the 'mish' activation function, along with other small changes to the architecture of the network
  • Replaced the _get_node_ids() function in tree.py with _get_all_nodes(). This is done so that the new embedding approach can be implemented
  • Updated version (2.1.0) due to new non-breaking feature
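
As a rough illustration of what the 'use_cascade' item above describes, the sketch below appends decision-function outputs to X as extra columns. It is not LANDMark's code: the helper name `extend_with_decision_values`, the fixed linear weights, and the toy data are all assumptions made for the example, and the library's own per-node models and internals may differ.

```python
import numpy as np


def extend_with_decision_values(X, weights, bias):
    """Hypothetical helper: append linear decision values to X as new columns.

    Stands in for "the results of the decision function at a node".
    """
    scores = X @ weights + bias      # one score per sample (or per output)
    if scores.ndim == 1:
        scores = scores[:, None]     # make it a column so it can be stacked
    return np.hstack([X, scores])    # original features first, then new ones


# Toy data and a made-up linear decision function (assumptions for the demo).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
weights = np.array([0.5, -0.25])
bias = 0.1

X_ext = extend_with_decision_values(X, weights, bias)
print(X_ext.shape)  # the original 2 columns plus 1 decision-value column
```

The same extension would have to be applied at prediction time, so that each node sees the feature layout it was trained on.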

What's Changed

  • Preservation of Proximity Information Within LANDMark Trees by @jrudar in #13
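
To make the difference between the "path" and "terminal" embeddings concrete, here is a minimal sketch on a hand-written toy tree. The `embed` function, the node numbering, and the sample paths are hypothetical and are not taken from LANDMark; the sketch only assumes that each sample's route from the root to a leaf is known.

```python
import numpy as np

# Hypothetical routes: the node ids each sample visits, root (0) to leaf.
paths = [
    [0, 1, 3],  # sample 0
    [0, 1, 4],  # sample 1 shares nodes 0 and 1 with sample 0
    [0, 2],     # sample 2 leaves the shared route at the root
]
n_nodes = 5


def embed(paths, n_nodes, method="path"):
    """Build a binary sample-by-node matrix.

    "path" marks every node a sample visits; "terminal" marks only the leaf.
    """
    if method not in ("path", "terminal"):
        raise ValueError("method must be 'path' or 'terminal'")
    emb = np.zeros((len(paths), n_nodes), dtype=np.uint8)
    for i, path in enumerate(paths):
        nodes = path if method == "path" else path[-1:]
        emb[i, nodes] = 1
    return emb


path_emb = embed(paths, n_nodes, "path")
terminal_emb = embed(paths, n_nodes, "terminal")

# Number of nodes samples 0 and 1 have in common under each embedding.
shared_path = int((path_emb[0] & path_emb[1]).sum())
shared_terminal = int((terminal_emb[0] & terminal_emb[1]).sum())
print(shared_path, shared_terminal)
```

In this toy case samples 0 and 1 end in different leaves, so the "terminal" embedding records no overlap between them, while the "path" embedding keeps the two internal nodes they share.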

Full Changelog: LANDMarkClassifier-v.2.0.7...LANDMarkClassifier-v.2.1.0