API design #13

Open
Fil opened this issue Oct 21, 2020 · 5 comments
Comments

@Fil
Contributor

Fil commented Oct 21, 2020

With the current API, anyone who wants to project to d=3 has to know the exact number n of optional arguments, because 3 must be passed as the (n+1)th argument. This is awkward, and it means that we can't add a hyperparameter to any method without it being a breaking change.

It seems to me that it would be nice to rethink the API "à la D3", so that:

  • all the algorithms can be called interchangeably
  • we could separate the training and transform phases (learn then transform? #11)
  • we could specify hyperparameters individually
  • we could serialize the model (in and out : save and load)

I would imagine that this could be structured as:

  • new Druid([method or model]) — create a druid
  • druid.values([accessor]) — sets the values accessor if specified, and returns the druid; returns the values accessor if not specified
  • druid.dimensions([number]) — sets or returns the dimensions (default: 2)
  • druid.class([accessor]) — sets or returns the class accessor (for LDA)
  • druid.method([name or class]) — sets the current method (UMAP, FASTMAP, etc.) if specified and returns the druid; if not specified, returns the method (as a class or function)
  • druid.fit(data) — trains the model on the data and returns the druid
  • druid.transform([data]) — transforms the data if specified; if data is not specified, returns the transformed train set
  • druid.model([model]) — returns the serialized model (JSON) if a model is not specified, loads the model if specified

And for each hyperparameter, for example UMAP/min_dist

  • druid.min_dist([min_dist]) — if specified, sets the min_dist hyperparameter and returns the druid; otherwise returns its current value

With this we could say for example:

const dr = new Druid("LDA"); // dr
dr.dimensions(2).class(d => d.species).values(d => [+d.sepal_length, +d.petal_length]).fit(data); // dr
dr.transform(); // transformed data
const model = dr.model(); // JSON {}

const dr2 = new Druid(model); // dr2, restored from the serialized model
dr2.transform(newData); // apply the model to new data…
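
The getter/setter convention proposed above can be sketched in a few lines. This is only an illustration under stated assumptions: `DruidSketch` and `takeFirstColumns` are hypothetical names, the "method" is a stand-in that keeps the first d columns rather than a real DR algorithm, and the serialized model holds only the dimensions because accessor functions cannot be written to JSON.

```javascript
// Hypothetical sketch of the proposed convention; not the DruidJS API.

// Stand-in "method": keeps the first d columns of each row (not a real DR algorithm).
const takeFirstColumns = (rows, d) => rows.map(row => row.slice(0, d));

class DruidSketch {
  constructor(methodOrModel) {
    this._values = d => d;          // default accessor: the datum is already a row
    this._dimensions = 2;           // default output dimensionality
    this._method = takeFirstColumns;
    this._fitted = null;            // rows extracted during fit()
    if (typeof methodOrModel === "function") this._method = methodOrModel;
    else if (methodOrModel && typeof methodOrModel === "object") this.model(methodOrModel);
  }

  // Each accessor returns the current setting when called without arguments;
  // otherwise it stores the new setting and returns `this` for chaining.
  values(accessor) {
    if (arguments.length === 0) return this._values;
    this._values = accessor;
    return this;
  }

  dimensions(d) {
    if (arguments.length === 0) return this._dimensions;
    this._dimensions = d;
    return this;
  }

  // "Training" here only extracts and stores the rows.
  fit(data) {
    this._fitted = data.map(this._values);
    return this;
  }

  // With no argument, transforms the training rows; otherwise the given data.
  transform(data) {
    const rows = data === undefined ? this._fitted : data.map(this._values);
    if (rows === null) throw new Error("call fit(data) or pass data to transform()");
    return this._method(rows, this._dimensions);
  }

  // With no argument, returns a JSON-serializable model; otherwise loads one.
  model(model) {
    if (arguments.length === 0) return { dimensions: this._dimensions };
    this._dimensions = model.dimensions;
    return this;
  }
}
```

With this sketch, `new DruidSketch().dimensions(3).values(d => [d.a, d.b, d.c, d.e]).fit(data).transform()` returns three columns per row, and `new DruidSketch(sketch.model())` restores the dimensionality on a fresh instance. Because every setting has its own named accessor, a new hyperparameter can be added later without shifting any positional argument.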

I wonder what should be done about NaN. I suppose data points should be ignored automatically whenever the values accessor returns any NaN for them.
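
One possible reading of "automatically ignored" is to drop every data point whose accessor output contains a NaN, while remembering which input indices survived so the projection can be joined back to the original data. The helper below is a hypothetical sketch of that idea; `dropNaNRows` is not an existing function of the library.

```javascript
// Hypothetical helper: drop rows whose accessor output contains NaN.
// Returns the clean rows plus the indices (into `data`) of the rows kept.
function dropNaNRows(data, values) {
  const rows = [];
  const kept = [];
  data.forEach((d, i) => {
    const row = values(d);
    if (row.every(v => !Number.isNaN(v))) {
      rows.push(row);
      kept.push(i);
    }
  });
  return { rows, kept };
}
```

Returning `kept` alongside `rows` is a design choice: without it, the caller could not tell which of the original data points each projected row belongs to.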

Note also that some methods such as UMAP can accept a distance matrix instead of a data array.

PS: Sorry for spamming your project :) The potential is very exciting.

@Fil
Contributor Author

Fil commented Oct 22, 2020

Update: changed train to fit in order to match the sklearn API

@saehm
Owner

saehm commented Oct 23, 2020

You are right, with the current API you have to know the parameters. The idea was that if you want to change the dimensionality or the metric used, you have to know what you are doing anyways. However, a DR object already has a function druid.parameter("parameter_name", [parameter_value]) (with two aliases, "para" and "p") with which you can set a parameter; it is chainable, similar to d3's attr function. Still, it would be no problem to use getters and setters, and checking the parameters would probably be easier that way.
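
As a rough illustration of how named getters and setters could sit on top of a generic chainable parameter function, consider the sketch below. `MockDR` is a stand-in written for this example, and its parameter names, defaults, and validation are assumptions, not the DruidJS implementation; `addNamedAccessors` is likewise hypothetical.

```javascript
// Stand-in for a DR object with a generic, chainable parameter(name, [value]).
// The parameter names and defaults here are assumptions for illustration only.
class MockDR {
  constructor() {
    this._params = { min_dist: 1, d: 2 };
  }
  parameter(name, value) {
    // Reject unknown names so typos fail loudly instead of being silently stored.
    if (!(name in this._params)) throw new Error(`unknown parameter: ${name}`);
    if (value === undefined) return this._params[name]; // getter
    this._params[name] = value;                          // setter
    return this;                                         // chainable
  }
}

// Adds one named getter/setter per parameter, each delegating to parameter().
function addNamedAccessors(dr, names) {
  for (const name of names) {
    dr[name] = function (value) {
      return dr.parameter(name, value);
    };
  }
  return dr;
}
```

With these, `addNamedAccessors(new MockDR(), ["min_dist", "d"]).min_dist(0.5).d(3)` chains in the same way as repeated `parameter` calls, and the parameter check lives in a single place.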

Druid already has some other things implemented: clustering, k-NN, and linear algebra implementations, some of which don't work that well yet ;). Therefore we could maybe change the DR constructor to take a string with the name of the DR method, for example
const dr = new druid.DR("LDA"); (For now, it works if you use const dr = new druid["LDA"];)

I like the values function very much; we should add it :), along with a function to set the dimensionality and one for changing the metric function.

As I mentioned in issue #11, a fit or train method will not work with most of the DR methods. Maybe we could add it for those DR methods where it works?

@Fil
Contributor Author

Fil commented Oct 23, 2020

Ah I hadn't seen the chainable .parameter method, now I see it!

new Druid.UMAP(data).parameter("min_dist", 2).transform()

However, it feels a bit strange to parametrize after adding the values. And it seems you can't use .parameter("d", 3) to change the dimensionality?

you have to know what you are doing anyways

I disagree :) I love to learn by testing things out, and it's frustrating if they break for no apparent reason. You can see that in the "hello" notebook: it needs quite a bit of code to inject the default values. And if you're trying to go 3D, they are not optional.

PS: I admit I haven't paid attention yet to the clustering methods (and others); my comments so far are meant only for the DR methods of the API. But I'm curious about them and waiting for some examples or documentation to appear :)

@Fil
Contributor Author

Fil commented Oct 28, 2020

I would also very much like all DR methods to return a generator, even if it only yields one final result. Otherwise we have to do something like this in user space:

return typeof D.generator === "function" ? D.generator() : D.transform();

Ref. https://observablehq.com/@fil/druidjs-worker

EDIT: solved in 0.7.3
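
For versions before that fix, the user-space check could be wrapped once in a small helper so that calling code always consumes an iterator. This is a hypothetical sketch: `asGenerator` is not part of the library, and it relies only on the `generator` and `transform` members used in the one-liner above.

```javascript
// Hypothetical helper: always yields results, whatever the DR object supports.
// Iterative methods yield every intermediate step; the others yield once.
function* asGenerator(D) {
  if (typeof D.generator === "function") {
    yield* D.generator();
  } else {
    yield D.transform();
  }
}
```

A consumer can then write `for (const projection of asGenerator(D)) { ... }` regardless of the method, and the last value yielded is the final result.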

@hydrosquall

I would find a uniform way to set the dimensions parameter across all algorithms very helpful. I came to this issues page specifically to find out whether that was possible, because I wanted to try projecting to 3D.

I didn't see a d parameter available in the ObservableHQ notebook example - is this a capability that's currently available but not documented, or something new to implement?

@Fil Fil mentioned this issue Aug 9, 2023

3 participants