My study & review notes on machine learning.
Code environment:
Python 3.6.5 | TensorFlow 1.13.1 | PyTorch 1.2.0 | scikit-learn 0.22.1 | Keras 2.2.4
20-02-29 updates:
- Logistic Regression algorithm & implementation with NumPy.
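A minimal NumPy sketch of binary logistic regression trained by batch gradient descent on the mean cross-entropy loss. The function names (`fit_logistic`, `predict_logistic`), the learning rate and the epoch count are illustrative assumptions, not this project's actual code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean binary cross-entropy.

    X: (n, d) features, y: (n,) labels in {0, 1}. Returns weights w and bias b.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)        # predicted P(y = 1 | x)
        w -= lr * X.T @ (p - y) / n   # gradient of the loss w.r.t. w
        b -= lr * np.mean(p - y)      # gradient of the loss w.r.t. b
    return w, b


def predict_logistic(X, w, b, threshold=0.5):
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Usage: two well-separated Gaussian blobs
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 3, rng.randn(50, 2) + 3])
y = np.array([0] * 50 + [1] * 50)
w, b = fit_logistic(X, y)
accuracy = np.mean(predict_logistic(X, w, b) == y)
```

The gradient `X.T @ (p - y) / n` is what makes the sigmoid and cross-entropy pairing convenient: the derivative of the loss with respect to the linear score is simply `p - y`.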
20-03-02 updates:
- Linear Discriminant Analysis & implementation with NumPy.
- Principal Component Analysis & implementation with NumPy.
- Both PCA and LDA are methods for reducing feature dimensionality.
- LDA is a supervised method, while PCA is unsupervised.
- LDA can also be used as a classification method.
- PCA keeps the directions of greatest variance in the data, while LDA looks for directions that best separate the classes.
- Both eigenvalue decomposition and singular value decomposition can be used in PCA or LDA.
- Center the data (subtract the mean) before applying PCA.
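A minimal NumPy sketch of both methods, assuming SVD-based PCA (including the centering step) and Fisher LDA solved as an eigenproblem of `pinv(Sw) @ Sb`, where `Sw` and `Sb` are the within-class and between-class scatter matrices. The helper names `pca` and `lda` and the toy data are hypothetical; the project's own implementation may use a different decomposition.

```python
import numpy as np


def pca(X, k):
    """Project X (n, d) onto its top-k principal components using SVD."""
    Xc = X - X.mean(axis=0)                     # center the data first
    # rows of Vt are the principal directions, sorted by singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                         # (k, d)
    explained_variance = S[:k] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, explained_variance


def lda(X, y, k):
    """Fisher LDA: directions maximizing between-class vs. within-class scatter."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)           # within-class scatter
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * diff @ diff.T           # between-class scatter
    # eigen-decomposition of pinv(Sw) @ Sb, largest eigenvalues first
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:k]].real
    return X @ W, W


rng = np.random.RandomState(0)
# PCA: points spread along the direction (1, 1) with a little noise
t = rng.randn(200)
X_line = np.column_stack([t, t]) + 0.01 * rng.randn(200, 2)
Z, components, var = pca(X_line, 1)
# LDA: two classes separated along x, with a large spread along y
X_lda = np.vstack([rng.randn(100, 2) * [1, 5], rng.randn(100, 2) * [1, 5] + [5, 0]])
y_lda = np.array([0] * 100 + [1] * 100)
_, W = lda(X_lda, y_lda, 1)
```

In the LDA example PCA would pick the y axis (largest variance), while LDA picks roughly the x axis (best class separation). `pinv` is used defensively in case `Sw` is singular; LDA yields at most (number of classes - 1) informative directions.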
20-03-04 updates:
- Decision Tree & implementation with NumPy.
- Implemented ID3.
- Information entropy, conditional entropy, information gain, information gain ratio.
- Recursively building the decision tree, and pruning.
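A minimal sketch of the ID3 quantities listed above and of the recursive tree construction, assuming discrete features stored in a NumPy array. The function names are hypothetical, `gain_ratio` is the C4.5-style criterion, and pruning is not shown.

```python
import numpy as np
from collections import Counter


def entropy(labels):
    """H(D) = -sum_k p_k * log2(p_k) over the empirical distribution of labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def conditional_entropy(feature, labels):
    """H(D|A) = sum_v |D_v| / |D| * H(D_v) for a discrete feature A."""
    return sum(np.mean(feature == v) * entropy(labels[feature == v])
               for v in np.unique(feature))


def information_gain(feature, labels):
    return entropy(labels) - conditional_entropy(feature, labels)


def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature's own values."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def id3(X, y, features):
    """Build an unpruned ID3 tree as nested dicts {feature: {value: subtree or label}}."""
    if len(set(y)) == 1:                        # pure node -> leaf
        return y[0]
    if not features:                            # no features left -> majority label
        return Counter(y).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(X[:, f], y))
    rest = [f for f in features if f != best]
    return {best: {v: id3(X[X[:, best] == v], y[X[:, best] == v], rest)
                   for v in np.unique(X[:, best])}}


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 1, 1])
tree = id3(X, y, [0, 1])   # splits on feature 0 only
```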
20-03-05 updates:
- Neural Network & implementation with NumPy.
- Implemented a basic fully connected neural network with NumPy.
- Sigmoid activation function only; more activations in subsequent versions.
- Updated model architecture: tanh -> tanh -> ... -> sigmoid.
- Basic backpropagation only; improvements in subsequent versions.
- The number of hidden layers, hidden units, epochs and the batch size are chosen manually before training starts.
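A minimal sketch of a fully connected network with tanh hidden layers and a sigmoid output, trained by backpropagation with mini-batch gradient descent on binary cross-entropy. All names, the initialization scale and the hyperparameter values are assumptions; the layer sizes, epochs and batch size are simply picked by hand.

```python
import numpy as np


def init_params(layer_sizes, rng):
    """One (W, b) pair per layer, e.g. layer_sizes = [n_inputs, 8, 8, 1]."""
    return [(rng.randn(m, n) * np.sqrt(1.0 / m), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(params, X):
    """tanh on hidden layers, sigmoid on the output layer; returns all activations."""
    activations = [X]
    for i, (W, b) in enumerate(params):
        z = activations[-1] @ W + b
        a = 1.0 / (1.0 + np.exp(-z)) if i == len(params) - 1 else np.tanh(z)
        activations.append(a)
    return activations


def backward(params, activations, y):
    """Backpropagation of the mean binary cross-entropy; returns (dW, db) per layer."""
    grads = [None] * len(params)
    delta = (activations[-1] - y) / len(y)      # dL/dz at the sigmoid output
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads[i] = (activations[i].T @ delta, delta.sum(axis=0))
        if i > 0:
            # chain rule through tanh: d tanh(z) / dz = 1 - tanh(z)^2
            delta = (delta @ W.T) * (1.0 - activations[i] ** 2)
    return grads


def train(X, y, hidden=(8,), epochs=200, batch_size=16, lr=0.5, seed=0):
    rng = np.random.RandomState(seed)
    params = init_params([X.shape[1], *hidden, 1], rng)
    for _ in range(epochs):
        order = rng.permutation(len(X))         # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            grads = backward(params, forward(params, X[idx]), y[idx])
            params = [(W - lr * dW, b - lr * db)
                      for (W, b), (dW, db) in zip(params, grads)]
    return params


rng = np.random.RandomState(1)
X = np.vstack([rng.randn(100, 2) - 2, rng.randn(100, 2) + 2])
y = np.array([0] * 100 + [1] * 100).reshape(-1, 1)   # column vector to match the output
params = train(X, y, hidden=(8, 8))
accuracy = np.mean((forward(params, X)[-1] >= 0.5) == y)
```

Keeping every activation from the forward pass lets `backward` reuse them: the tanh derivative is computed from the stored output instead of recomputing `z`.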
20-03-08 updates:
- Support Vector Machine & implementation with NumPy.
- Implemented an SVC with soft margin and kernel functions (linear kernel, RBF kernel).
- The SMO implementation in this project can be further optimized.
- Mathematical principles of SVM and derivation of the formulas.
- Naive Bayes Classifier & implementation with NumPy.
- Discrete features: probabilities are estimated from frequency counts.
- Continuous features: likelihoods are modeled with a Gaussian distribution.
- Predict a test sample by computing argmax_c p(c)∏_i p(x_i|c).
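A minimal sketch covering both topics of this update, under stated assumptions: a simplified SMO (the second multiplier is picked at random rather than by the usual selection heuristics, so it is not necessarily the variant used in this project) with linear and RBF kernels, and a Gaussian naive Bayes classifier for continuous features only; discrete features would use frequency counts instead. All names and the `gamma`, `C` and `tol` values are illustrative.

```python
import numpy as np


def linear_kernel(A, B):
    return A @ B.T


def rbf_kernel(A, B, gamma=0.5):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, for every pair of rows at once
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)


def smo(X, y, C=1.0, kernel=linear_kernel, tol=1e-3, max_passes=10, max_iter=1000, seed=0):
    """Simplified SMO for a soft-margin SVC; y must hold -1/+1 labels."""
    rng = np.random.RandomState(seed)
    n = len(X)
    K = kernel(X, X)                                    # Gram matrix
    alpha, b = np.zeros(n), 0.0
    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]       # f(x_i) - y_i
            # only update alpha_i if it violates the KKT conditions
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.randint(n - 1)
                j = j + 1 if j >= i else j              # random j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(aj - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] - y[j] * (alpha[j] - aj) * K[j, j]
                b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def svm_predict(X_train, y_train, alpha, b, X_new, kernel=linear_kernel):
    return np.sign(kernel(X_new, X_train) @ (alpha * y_train) + b)


def fit_gaussian_nb(X, y, eps=1e-9):
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])               # p(c)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) + eps for c in classes])
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    classes, priors, means, variances = model
    # argmax_c p(c) * prod_i p(x_i | c), evaluated in log space to avoid underflow
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * variances)[None] + diff2 / variances[None], axis=2)
    return classes[np.argmax(np.log(priors)[None, :] + log_lik, axis=1)]


rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 3, rng.randn(20, 2) + 3])
y = np.array([-1.0] * 20 + [1.0] * 20)
alpha, b = smo(X, y)
svm_acc = np.mean(svm_predict(X, y, alpha, b, X) == y)
nb_acc = np.mean(predict_gaussian_nb(fit_gaussian_nb(X, y), X) == y)
```

Passing `kernel=rbf_kernel` to both `smo` and `svm_predict` switches to the RBF kernel; only the Gram matrix changes, the SMO updates stay the same.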
20-03-09 updates:
- Clustering algorithm K-means & implementation with NumPy.
- Distance between two vectors: p-norm or cosine similarity.
- K-Nearest Neighbors & implementation with NumPy.
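A minimal sketch of the two distance measures, Lloyd's K-means, and a majority-vote KNN classifier. Random-sample initialization of the K-means centers, Euclidean distance inside both algorithms and the default `k=3` for KNN are assumptions; the function names are hypothetical.

```python
import numpy as np
from collections import Counter


def p_norm_distance(a, b, p=2):
    """Minkowski (p-norm) distance: p=1 Manhattan, p=2 Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)


def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with Euclidean distance, initialized from k random samples."""
    rng = np.random.RandomState(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # (n, k) distances from every sample to every center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # move each center to the mean of its cluster (keep it if the cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


def knn_predict(X_train, y_train, X_new, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    preds = []
    for x in X_new:
        nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])
    return np.array(preds)


rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 5, rng.randn(50, 2) + 5])
y = np.array([0] * 50 + [1] * 50)
centers, labels = kmeans(X, 2)
knn_labels = knn_predict(X, y, np.array([[5.0, 5.0], [-5.0, -5.0]]))
```

K-means is unsupervised (it never sees `y`), while KNN has no training step at all: it keeps the labelled samples and defers all work to prediction time.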
20-03-12 updates:
- EM Algorithm
- Gaussian Mixture Model (in EMAlgorithm.py)
- The mathematical principles are as follows:
- Updated EMAlgorithm.py.
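A minimal sketch of EM for a Gaussian mixture model, alternating the E-step (responsibilities) and the M-step (weights, means, covariances) until the log-likelihood stops improving. It does not reproduce `EMAlgorithm.py`; the farthest-point initialization of the means, the small ridge `reg` added to each covariance, and all names are assumptions.

```python
import numpy as np


def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x | mean, cov) for every row of X."""
    d = X.shape[1]
    diff = X - mean
    mahalanobis = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
    return np.exp(-0.5 * mahalanobis) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))


def gmm_em(X, k, n_iter=200, tol=1e-6, reg=1e-6, seed=0):
    """Fit a k-component Gaussian mixture to X (n, d) with the EM algorithm."""
    n, d = X.shape
    rng = np.random.RandomState(seed)
    # farthest-point initialization: each new mean is the sample farthest from the others
    means = [X[rng.randint(n)]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(d2)])
    means = np.array(means)
    covs = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, j] = P(z_i = j | x_i)
        dens = np.column_stack([weights[j] * gaussian_pdf(X, means[j], covs[j])
                                for j in range(k)])
        ll = np.sum(np.log(dens.sum(axis=1)))   # log-likelihood of the current parameters
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances from the soft counts
        nk = gamma.sum(axis=0)
        weights = nk / n
        means = gamma.T @ X / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (gamma[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs, gamma


rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) - 4, rng.randn(100, 2) + 4])
weights, means, covs, gamma = gmm_em(X, 2)
means_sorted = means[np.argsort(means[:, 0])]
```

EM never decreases the log-likelihood, but it can converge to a local optimum, which is why the initialization of the means matters in practice.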
20-03-15 updates:
- Hidden Markov Model
- Viterbi algorithm.
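A minimal log-space sketch of the Viterbi algorithm for a discrete HMM. The function name and array layout are assumptions; the usage lines use a common textbook toy example (Healthy/Fever states, normal/cold/dizzy observations), for which the most likely state path is Healthy, Healthy, Fever with probability 0.01512.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM, computed in log space.

    obs: observation indices; start_p: (N,); trans_p[i, j] = P(state j | state i);
    emit_p[i, o] = P(observation o | state i).
    """
    log_start, log_trans, log_emit = (np.log(p) for p in (start_p, trans_p, emit_p))
    T, N = len(obs), len(start_p)
    delta = np.zeros((T, N))                 # best log-prob of a path ending in each state
    psi = np.zeros((T, N), dtype=int)        # back-pointers to the best previous state
    delta[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # scores[i, j]: move from i to j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_emit[:, obs[t]]
    # backtrack from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(np.max(delta[-1]))


start_p = np.array([0.6, 0.4])                           # states: 0 Healthy, 1 Fever
trans_p = np.array([[0.7, 0.3], [0.4, 0.6]])
emit_p = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])    # obs: 0 normal, 1 cold, 2 dizzy
path, log_prob = viterbi([0, 1, 2], start_p, trans_p, emit_p)   # path == [0, 0, 1]
```

Working with log-probabilities turns the products along a path into sums, which avoids numerical underflow on long sequences such as full sentences.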
20-03-16 updates:
- Chinese part-of-speech tagging with HMM.
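For supervised POS tagging, the HMM parameters can be estimated by counting tag transitions and word emissions in a tagged corpus; decoding then uses the Viterbi algorithm. This sketch assumes whitespace-separated `word/tag` tokens and add-alpha smoothing; `parse_line` and `estimate_hmm` are hypothetical names, and corpus-specific markup is not handled.

```python
import numpy as np


def parse_line(line):
    """Split a line of whitespace-separated 'word/tag' tokens into (word, tag) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in line.split() if "/" in tok]


def estimate_hmm(sentences, alpha=1.0):
    """Add-alpha smoothed maximum-likelihood HMM parameters from tagged sentences."""
    tags = sorted({t for s in sentences for _, t in s})
    words = sorted({w for s in sentences for w, _ in s})
    tag_idx = {t: i for i, t in enumerate(tags)}
    word_idx = {w: i for i, w in enumerate(words)}
    start = np.full(len(tags), alpha)                  # counts of sentence-initial tags
    trans = np.full((len(tags), len(tags)), alpha)     # counts of tag -> next tag
    emit = np.full((len(tags), len(words)), alpha)     # counts of tag -> word
    for sentence in sentences:
        prev = None
        for word, tag in sentence:
            t = tag_idx[tag]
            emit[t, word_idx[word]] += 1
            if prev is None:
                start[t] += 1
            else:
                trans[prev, t] += 1
            prev = t
    # normalize the counts into probability distributions
    start /= start.sum()
    trans /= trans.sum(axis=1, keepdims=True)
    emit /= emit.sum(axis=1, keepdims=True)
    return tag_idx, word_idx, start, trans, emit


sentences = [parse_line("我/r 爱/v 北京/ns"), parse_line("他/r 爱/v 上海/ns")]
tag_idx, word_idx, start, trans, emit = estimate_hmm(sentences)
```

Words never seen in training would need separate handling at tagging time (for example a dedicated unknown-word emission), which is not shown here.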
20-03-18 updates:
- word2vec skip-gram & CBOW models
- skip-gram: given a center word, predict its context words;
- CBOW: given a set of context words, predict the center word;
- dataset: "词性标注@人民日报199801.txt" (POS-tagged People's Daily corpus, January 1998);
- the results below do not seem very accurate, probably due to a lack of high-quality training data or insufficient training;
- quality of word vectors is dependent on dataset, pre-processing and training setups.
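A minimal sketch of how skip-gram and CBOW training examples are generated from a token sequence, plus a tiny skip-gram trainer. It uses a full softmax over the vocabulary instead of negative sampling or hierarchical softmax, which is only practical for small vocabularies; the window size, vector dimension, learning rate and all names are assumptions.

```python
import numpy as np


def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: skip-gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=2):
    """(context words, center) pairs: CBOW predicts the center from its context."""
    return [([tokens[j] for j in range(max(0, i - window), min(len(tokens), i + window + 1))
              if j != i], center)
            for i, center in enumerate(tokens)]


def train_skipgram(pairs, vocab, dim=10, lr=0.1, epochs=100, seed=0):
    """Skip-gram with a full softmax over the vocabulary (no negative sampling)."""
    rng = np.random.RandomState(seed)
    idx = {w: i for i, w in enumerate(vocab)}
    W_in = rng.randn(len(vocab), dim) * 0.1     # center-word vectors (the embeddings)
    W_out = rng.randn(dim, len(vocab)) * 0.1    # context-word (output) vectors
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            c, o = idx[center], idx[context]
            h = W_in[c].copy()                  # hidden layer = center-word vector
            scores = h @ W_out
            p = np.exp(scores - scores.max())
            p /= p.sum()                        # softmax over the vocabulary
            total -= np.log(p[o])               # cross-entropy for the true context word
            grad = p.copy()
            grad[o] -= 1.0                      # d loss / d scores
            W_in[c] -= lr * (W_out @ grad)
            W_out -= lr * np.outer(h, grad)
        losses.append(total / len(pairs))
    return W_in, W_out, idx, losses


tokens = ["我", "爱", "北京"]
pairs = skipgram_pairs(tokens, window=1)
W_in, W_out, idx, losses = train_skipgram(pairs, tokens)
```

The rows of `W_in` are the word vectors one would keep after training; a CBOW trainer would differ mainly in averaging the context vectors into `h` before the same softmax step.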