In classification problems we split input examples into known categories based on their characteristics.
Usage examples: spam filters, language detection, finding similar documents, handwritten letter recognition, etc.
K-nearest neighbours (KNN) classifies an object into one of the pre-known categories: the algorithm looks at the K nearest objects and assigns the category held by the majority of those neighbours. It is important to select the right K. With 2 categories K should not be 2, because the vote can end in a tie; likewise, with 3 categories K should not be 3. A KNN model will not only output a classification, it can also report an overall accuracy for the model and a confidence for each individual classification. So with two categories A and B and K = 3, if the model tries to classify X and finds its 3 nearest neighbours are A, A, B, the classification is A with a confidence of about 66%. This is not the most efficient algorithm, because every classification recomputes the distances to all points in the dataset; SVM is much more scalable.
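A minimal sketch of that idea in plain NumPy (not any particular library's KNN): classify a point by majority vote among its K nearest neighbours and report the vote share as the confidence. The points and labels below are made up for illustration.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    # Euclidean distance from x to every training point (the costly part
    # mentioned above: every query rescans the whole dataset).
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]           # indices of the K closest points
    votes = Counter(y_train[i] for i in nearest)  # count labels among the neighbours
    label, count = votes.most_common(1)[0]
    confidence = count / k                        # e.g. A, A, B -> A with ~66%
    return label, confidence

# Toy data: two categories "A" and "B" in 2-D.
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
y_train = np.array(["A", "A", "B", "B"])

print(knn_classify(X_train, y_train, np.array([1.1, 1.0]), k=3))
# -> ('A', 0.666...)
```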
In regression problems we predict real values. Basically, we try to fit a line/plane/hyperplane through the training examples.
Usage examples: stock price forecasting, sales analysis, dependency of one quantity on another, etc.
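A minimal regression sketch with a single feature: fit a straight line through synthetic points using ordinary least squares (NumPy's `polyfit`). The numbers are made up purely for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # e.g. months
y = np.array([10.2, 12.1, 13.9, 16.2, 17.8])  # e.g. sales figures

slope, intercept = np.polyfit(x, y, deg=1)    # fit y ≈ slope * x + intercept
print(f"y ≈ {slope:.2f} * x + {intercept:.2f}")
print("forecast for x = 6:", slope * 6 + intercept)
```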
Examples of binary predictions: whether a person will buy a car (1) or not (0), whether a tumor is malignant (1) or benign (0).
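The text encodes these yes/no outcomes as 1 and 0; one common way to predict them (not necessarily the method the author has in mind) is logistic regression, sketched here with scikit-learn on made-up data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [age, income in $1000s]; label 1 = buys the car.
X = np.array([[22, 25], [25, 30], [47, 80], [52, 110], [46, 95], [56, 120]])
y = np.array([0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)
print(model.predict([[30, 40]]))        # predicted class: 0 or 1
print(model.predict_proba([[30, 40]]))  # probability of each class
```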
The algorithm finds the clusters (labels) on its own, without the data scientist feeding them to the model first.
Usage examples: market segmentation, social networks analysis, organize computing clusters, astronomical data analysis, image compression, etc.
[Flat clustering] providing the model with a dataset and asking it to separate the dataset into K groups
[Hierarchical clustering] providing the model with a dataset and asking it to separate the dataset into groups, with the model telling us how many groups there are and what they are
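A sketch contrasting the two flavours described above, assuming scikit-learn is available and using made-up 2-D points: KMeans needs K up front (flat clustering), while agglomerative clustering merges points bottom-up and, given a distance threshold (the 2.0 below is an arbitrary choice), reports how many groups it found (hierarchical clustering).

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

X = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1],   # one blob
              [8, 8], [8.1, 7.9], [7.8, 8.2]])  # another blob

# Flat clustering: we tell the model K = 2.
flat = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("KMeans labels:       ", flat.labels_)     # e.g. [0 0 0 1 1 1]

# Hierarchical clustering: the model decides the number of groups
# from the merge distances instead of being given K.
hier = AgglomerativeClustering(n_clusters=None, distance_threshold=2.0).fit(X)
print("groups found:        ", hier.n_clusters_)
print("Agglomerative labels:", hier.labels_)
```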