This project implements six models: Linear Regression, Polynomial Regression, Logistic Regression, K-Nearest Neighbors, K-Means Clustering, and Neural Networks.
I wrote my code in Google Colaboratory. The language used is Python, along with libraries such as NumPy, Pandas, and Matplotlib.
Linear regression is used to predict Y values from features (X1, X2, …). Here is what I did.
I first used the normal equation to predict the Y values, and then implemented gradient descent.
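As a minimal sketch of the normal-equation step (my own function and variable names, not the original code's):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^+ X^T y."""
    Xb = np.c_[np.ones(len(X)), X]                 # prepend a bias column of ones
    return np.linalg.pinv(Xb.T @ Xb) @ (Xb.T @ y)

# tiny synthetic check: recovers theta close to [3, 2, -1]
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 3 + 2 * X[:, 0] - X[:, 1] + 0.01 * rng.standard_normal(100)
theta = normal_equation(X, y)
```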
I first made a function to train my model and obtained the theta values. The cost function values were plotted against the number of iterations; the cost converged to about 1907 over 5000 iterations. Theta was then used to predict Y values for my testing data.
Then I calculated the Root Mean Squared Error (RMSE) by comparing the predictions with the given testing Y values.
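A sketch of what that training loop and the RMSE check might look like (hypothetical names; `X` is assumed to already carry a bias column of ones, and the learning rate `lr` is illustrative):

```python
import numpy as np

def train(X, y, lr=0.01, iters=5000):
    """Batch gradient descent on the mean-squared-error cost."""
    m, n = X.shape
    theta = np.zeros(n)
    costs = []
    for _ in range(iters):
        error = X @ theta - y
        theta -= lr * (X.T @ error) / m           # gradient step
        costs.append((error @ error) / (2 * m))   # J(theta), for the cost plot
    return theta, costs

def rmse(y_pred, y_true):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```

Plotting `costs` against the iteration index reproduces the cost-versus-iterations curve described above.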
Polynomial regression is used to predict Y values using the given features raised up to the nth degree. I used my knowledge of linear regression to write its training function. I started with degree n = 2 and tried up to n = 4; I got the best accuracy for n = 3. Then I made a function to train my model and get the theta values. The cost function values were plotted against the number of iterations; the cost came out to be around 59 after 90000 iterations.
Then I predicted Y for the testing data using theta, and calculated the Root Mean Squared Error as before.
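A sketch of how the polynomial feature matrix could be built before reusing the same gradient-descent loop (the `train` sketch above); the helper name is hypothetical and it is shown for a single input feature:

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1-D feature x to the columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

# e.g. degree 3, which gave the best accuracy here:
# X_poly = poly_features(x_train, 3)
# theta, costs = train(X_poly, y_train, iters=90000)
```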
Logistic regression models the probability of discrete outcomes, i.e. which category a given example belongs to, and it uses the sigmoid function for this. We have the MNIST dataset for this. My system was crashing again and again, so I used only 62000 training examples. First, we had to one-hot encode the train and test labels. I made a function to train my model and get the theta values, and plotted the cost function values against the number of iterations; the cost came out to be around 1.7 after 3000 iterations.
Then I multiplied that theta matrix with the testing data and took the sigmoid of the result to get the hypothesis values. The class with the maximum probability was taken as the predicted value, which I compared with the given labels to get the accuracy.
For accuracy, I compared the given and predicted values and counted the number of zeros in their difference. This gave me the number of correctly predicted values.
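A compact sketch of this pipeline, under my own naming (a one-vs-all formulation training all 10 digit classifiers at once; the learning rate is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hot(labels, num_classes=10):
    """MNIST labels of shape (m,) -> a (m, 10) one-hot matrix."""
    Y = np.zeros((len(labels), num_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

def train_logistic(X, Y, lr=0.5, iters=3000):
    """X: (m, features) with a bias column; Y: one-hot labels (m, 10)."""
    m = X.shape[0]
    theta = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        H = sigmoid(X @ theta)
        theta -= lr * (X.T @ (H - Y)) / m   # gradient of the cross-entropy cost
    return theta

def predict(X, theta):
    return np.argmax(sigmoid(X @ theta), axis=1)   # class with max probability

def accuracy(y_pred, y_true):
    return np.mean((y_pred - y_true) == 0)   # count zeros in the difference
```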
For K-Nearest Neighbors, we are given labelled data that classifies points into groups identified by an attribute. Given an unclassified point, we can assign it to a group by observing its nearest neighbors. We have the MNIST dataset for this.
It is used to predict the correct class for the test data. I used only 62000 training examples and 5000 testing examples, because my system could not support more than this and the computation time was already very high. My model first calculates the distance of each testing example to every training example, then checks the k nearest points to predict the class. The predicted class is compared with the given values to get the accuracy, using the same idea as in logistic regression.
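A sketch of that prediction step (hypothetical names; the report does not say which k was used, so k = 5 below is only a placeholder):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test point by majority vote among its k nearest neighbors."""
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # Euclidean distances
        nearest = y_train[np.argsort(dists)[:k]]             # labels of k closest points
        preds[i] = Counter(nearest).most_common(1)[0][0]     # majority vote
    return preds
```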
For K-Means clustering, we are given a data set of items with certain features. The algorithm groups them into k clusters based on the similarity between the items. We have the MNIST dataset for this, using only 60000 examples.
Firstly, I plotted a graph of WCSS versus k (WCSS, the within-cluster sum of squares, is the sum of squared distances between each point and the centroid of its cluster).
The curve showed a dip (an elbow) at k = 30.
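The quantity being plotted can be sketched as follows (this relies on the `kmeans` helper sketched a little further below):

```python
import numpy as np

def wcss(X, centers, labels):
    """Within-cluster sum of squares for one clustering of X."""
    return sum(((X[labels == j] - c) ** 2).sum() for j, c in enumerate(centers))

# elbow plot: run k-means over a range of k values and plot WCSS against k
# for k in candidate_ks: centers, labels = kmeans(X, k); plt.plot(...)
```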
So, I made 30 clusters for the dataset. First, we took random points as the centers of our clusters. Then the distance from each point to every center was calculated, and each point was assigned to the cluster of the center nearest to it. New centers were then computed as the mean of the points in each cluster. This process was repeated until the centers stopped changing.
Then I gave each cluster a label by using the given values Y for our data. K-Means is an unsupervised learning algorithm, so we are not really supposed to calculate accuracy; still, I wanted to check my code, so I calculated it. You can ignore that if you want.
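A sketch of this loop and of the labelling step, under assumed names (random initialisation from the data points, as described above; each cluster's label is a majority vote over the true labels of its members):

```python
import numpy as np
from collections import Counter

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(iters):
        # distance of every point to every center; assign each point to the nearest
        dists = np.stack([((X - c) ** 2).sum(axis=1) for c in centers], axis=1)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):           # centers stopped changing
            break
        centers = new_centers
    return centers, labels

def label_clusters(labels, y_true, k):
    """Give each cluster the most common true label among its members."""
    digit = {j: Counter(y_true[labels == j]).most_common(1)[0][0] for j in range(k)}
    return np.array([digit[j] for j in labels])
```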
A neural network involves neurons, connections, weights, biases, a propagation function, and a learning rule. The learning rule modifies the weights and biases of the network.
We have the MNIST dataset for this. The model tries to predict outputs for the given inputs, and it consists of 1 input layer, 1 hidden layer, and 1 output layer. My system was crashing again and again, so I used only 60000 training examples. I initialized the thetas (weights) and biases, then used forward and back propagation to get the optimum values. The cost came out to be around 87 after 5000 iterations. Those values of the thetas and biases were used to get predicted values for the testing data, and then I calculated the accuracy using the same idea as in logistic regression.
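A minimal sketch of such a network, with my own names and an assumed hidden-layer size of 64 (the report does not state it), sigmoid activations throughout, and batch gradient descent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_nn(X, Y, hidden=64, lr=0.5, iters=5000, seed=0):
    """One hidden layer; X is (m, 784), Y is the one-hot (m, 10) label matrix."""
    rng = np.random.default_rng(seed)
    m, n_in = X.shape
    n_out = Y.shape[1]
    W1 = 0.01 * rng.standard_normal((n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = 0.01 * rng.standard_normal((hidden, n_out))
    b2 = np.zeros(n_out)
    for _ in range(iters):
        # forward propagation
        A1 = sigmoid(X @ W1 + b1)
        A2 = sigmoid(A1 @ W2 + b2)
        # back propagation (cross-entropy cost with a sigmoid output layer)
        dZ2 = A2 - Y
        dZ1 = (dZ2 @ W2.T) * A1 * (1.0 - A1)
        W2 -= lr * (A1.T @ dZ2) / m
        b2 -= lr * dZ2.mean(axis=0)
        W1 -= lr * (X.T @ dZ1) / m
        b1 -= lr * dZ1.mean(axis=0)
    return W1, b1, W2, b2

def predict_nn(X, W1, b1, W2, b2):
    A2 = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return np.argmax(A2, axis=1)   # most probable digit
```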