
Py DS_Engineer Lab Report #06

Amy Lin edited this page Jul 20, 2017 · 21 revisions

Python Programming for Data Scientists & Engineers Lab #06

Lab #06-1 Linear Regression in TensorFlow




Lab #06-1-2 K-Means Clustering



A small dataset (23 people) with names, heights, and weights is used in this case. For simplicity, since the dataset is fairly small, a single iteration of K-means clustering into 4 clusters was simulated. The resulting cluster labels are assigned back to the data, so each person knows which T-shirt size they get. The company, in turn, can determine the quantity and size range to produce based on customers' heights and weights.
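A single K-means iteration consists of an assignment step (each person goes to the nearest centroid) and an update step (each centroid moves to the mean of its members). The sketch below illustrates this in plain Python. The eight people, the hand-picked initial centroids, and the S/M/L/XL mapping are all made-up assumptions for illustration, not the lab's 23-person dataset.

```python
# Hypothetical (name, height in cm, weight in kg) records; NOT the lab data.
people = [
    ("p1", 150.0, 45.0), ("p2", 152.0, 48.0),
    ("p3", 162.0, 58.0), ("p4", 165.0, 60.0),
    ("p5", 174.0, 72.0), ("p6", 176.0, 75.0),
    ("p7", 186.0, 90.0), ("p8", 188.0, 95.0),
]
points = [(h, w) for _, h, w in people]

# Assumed initial centroids, ordered from smallest to largest build.
initial_centroids = [(150.0, 45.0), (163.0, 60.0), (175.0, 73.0), (187.0, 92.0)]
# Assumed mapping from cluster index to T-shirt size.
sizes = ["S", "M", "L", "XL"]


def sq_dist(a, b):
    """Squared Euclidean distance between two (height, weight) points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its members."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid as is
    return new_centroids


# One iteration: assign, then update.
labels = assign(points, initial_centroids)
centroids = update(points, labels, initial_centroids)

# Write the labels back so each person gets a T-shirt size.
shirt_size = {name: sizes[k] for (name, _, _), k in zip(people, labels)}
```

Because height and weight are on different scales, a real run would probably standardize both columns first so that neither dominates the distance.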


Lab #06-2 Spectral + Hierarchical Clustering

Spectral Clustering, a.k.a. Graph Clustering

For spatial data, a graph is induced from the distances between points. Spectral clustering then looks at the eigenvectors of the Laplacian of that graph to find a good (low-dimensional) embedding of the graph into Euclidean space.

In other words, the technique finds a transformation of the original space that better represents the manifold the data is assumed to lie on.
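The embedding step can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the lab's implementation: the Gaussian affinity, the `sigma` value, the unnormalized Laplacian, and the six toy points are all choices made here. For two clusters the sign of the second eigenvector is enough to split the data; with more clusters one would typically run K-means on the embedded coordinates.

```python
import numpy as np


def graph_laplacian(X, sigma):
    """Build a dense Gaussian-affinity graph and return L = D - W."""
    # Pairwise squared Euclidean distances between all points.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)          # no self-loops
    D = np.diag(W.sum(axis=1))        # degree matrix
    return D - W                      # unnormalized graph Laplacian


def spectral_embedding(X, n_components, sigma):
    """Embed points using the eigenvectors of the graph Laplacian."""
    L = graph_laplacian(X, sigma)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L)
    # Skip the first (constant) eigenvector, keep the next n_components.
    return eigvecs[:, 1:n_components + 1]


# Toy points (made up): two small groups in the plane.
X = np.array([[0, 0], [0, 1], [1, 0],
              [5, 5], [5, 6], [6, 5]], dtype=float)

embedding = spectral_embedding(X, n_components=1, sigma=2.0)

# Two clusters: split on the sign of the second eigenvector.
labels = (embedding[:, 0] > 0).astype(int)
```

Which group receives label 0 or 1 is arbitrary, because an eigenvector is only defined up to sign.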

* Weaknesses : The algorithm still partitions the data, so noise points are forced into clusters and pollute them.

* Intuitive Parameters : The number of clusters must be specified, or a 'suitable' one has to be found by trying a range of values.

* Stability : Slightly more stable than K-means because of the transformation, but it still suffers from some of the same issues.

* Performance : A slower algorithm, since spatial data does not come with a sparse graph (unless we prepare one ourselves).

Hierarchical Clustering
