Dealing with missing data #12

LarsOL · 2016-12-16T01:34:27Z

I am contemplating using LSH in my application, but I am unsure how to deal with absent/missing data in a vector. Nearest-neighbor imputation suggests that this type of algorithm can handle this scenario, but how would I go about implementing it?

tdebatty · 2016-12-21T07:45:31Z

Hi! I never had to test this, but my guess would be to provide default values...

LarsOL · 2016-12-21T09:32:06Z

The main issue I see with providing a default value is this: wouldn't the vectors end up artificially clustered around those "default" values, which look like valid data to the algorithm? Random data may work, but then the result is not deterministic.
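To make the two options above concrete, here is a minimal sketch of both fills. It assumes missing entries are represented as `None`; the helper names `impute_mean` and `impute_seeded_random` are hypothetical and not part of any LSH library. The second helper shows one way random filling could be made reproducible: seed the generator from a stable per-record key.

```python
import random


def impute_mean(vectors):
    """Replace None entries with the mean of the observed values in that dimension.

    Dimensions with no observed value at all fall back to 0.0 (an arbitrary choice).
    """
    dims = len(vectors[0])
    means = []
    for d in range(dims):
        observed = [v[d] for v in vectors if v[d] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[d] if v[d] is None else v[d] for d in range(dims)]
            for v in vectors]


def impute_seeded_random(vector, key, low=0.0, high=1.0):
    """Fill None entries with pseudo-random values drawn from [low, high].

    The generator is seeded with `key` (e.g. a record id), so the same
    record always receives the same fill values across runs.
    """
    rng = random.Random(key)
    return [rng.uniform(low, high) if x is None else x for x in vector]
```

Mean filling still pulls incomplete vectors toward the same point in the filled dimensions, so the clustering concern is reduced rather than removed. Seeded random filling is deterministic, but the filled values carry no information about the record.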

Ideally, a dimension could simply be ignored when it has no value.
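One way this idea could look, as a rough sketch only: the code below assumes random hyperplane hashing (sign random projection) for cosine similarity, which is an assumption about the LSH family and not something stated in this thread. Missing entries are `None`, and all function names are hypothetical. A missing dimension is skipped in each dot product, and candidates can then be re-ranked with a cosine similarity computed only over the dimensions both vectors share.

```python
import math
import random


def make_hyperplanes(n_bits, dims, seed=0):
    """Draw n_bits random hyperplane normals with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """Return one bit per hyperplane; None entries are skipped in the dot product."""
    bits = []
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, vector) if x is not None)
        bits.append(1 if dot >= 0 else 0)
    return bits


def masked_cosine(a, b):
    """Cosine similarity over dimensions present in both vectors, or None if undefined."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    dot = sum(x * y for x, y in shared)
    norm_a = math.sqrt(sum(x * x for x, _ in shared))
    norm_b = math.sqrt(sum(y * y for _, y in shared))
    if norm_a == 0.0 or norm_b == 0.0:
        return None
    return dot / (norm_a * norm_b)
```

Skipping a dimension in the dot product is numerically the same as filling it with 0, so for this hash family it amounts to choosing a neutral default rather than a different algorithm. Whether that holds for other LSH families, such as MinHash, would need to be checked separately.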
