Some collection of codes that are used in data mining and data science related fields, developed by me

dwipam/code

ReadMe for this Branch.

A collection of code used in data mining and data science related fields, developed by me (Data Science, Indiana University):

Artificial-Intelligence: This folder contains Python programs in which I implemented KNN, Neural Nets, BFS, DFS, A*, Naive Bayes, HMM Viterbi and MCMC Gibbs sampling. Each program is described in a comment at the top of its source file. See "File to run" below for the entry point of each program.

  1. Image Classifier -
    File to run - orient.py
    Models used - Neural Nets, KNN
    Train_data - train-data.txt
    test_data - test-data.txt

  2. Maps -
    File to run - route.py
    City Data - road-segments.txt
    A* data - city-gps.txt

  3. Parts Of Speech tagger -
    File to run - pos_solver.py
    Train_data - bc.train
    Test_data - bc.test

  4. Zacate_Auto_Player -
    File to run - zacate.py

  5. Solver_16 -
    File to run - solver16.py
    input_matrix_data - input

Algorithms:

  1. Selection Sort - selectionsort.java
  2. Quick sort - quicksort.java
  3. Merge Sort - mergersort.java
  4. Longest Common Subsequence - LCS.java
  5. Huffman coding - Huffman.py
  6. Heap Sort - HeapSort.java
  7. Dijkstra path finding - dijkstra.py
  8. DFS - dfs.py (recursion)
  9. Binary Search Tree - BinarySearchTree.java
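
As an illustration of one entry in this list, here is a minimal Python sketch of the longest common subsequence problem, assuming that is what LCS.java solves. It is not taken from the repository; the function name `lcs` and the bottom-up table are choices made for this example only.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (bottom-up DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs("AGGTAB", "GXTXAYB"))
```

The table costs O(len(a) * len(b)) time and memory, which is fine for short inputs; the Java version in the repository may be organized differently.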

Data Mining:

  1. Kmeans - kmean_test.R (Implementation of the K-means algorithm, with parameters for the number of clusters (k), tow and l, where l is the number of points the data is to be allocated to.)
    Data - http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
  2. K-L distance - kl.R (Calculates the KL distance)
  3. Data_mining/BUS_decoders/BUS_decoders/Code - contains all the code for the project, including cleaning and merging the data.
    Please check Readme_Data.txt, Readme_code.txt and Report.pdf
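
For readers unfamiliar with the K-L distance listed above, the following is a minimal Python sketch of the Kullback-Leibler divergence between two discrete distributions. kl.R is written in R and may handle its inputs differently; the function name `kl_divergence` and the zero-handling conventions here are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a term with p_i = 0 contributes nothing
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Note that the measure is not symmetric: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, so it is not a distance in the strict metric sense.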

Machine Learning (Self Implementations):

  1. Linear Regression -
    ml_assign_1.py
  2. Ridge regression -
    self_implement/rig_regression.py
  3. Lasso regression -
    lass.py
  4. Time series -
    predict_18april_2may.R
  5. Bagging and Boosting (AdaBoost) -
    mytree.py
  6. Decision Tree -
    mytree.py

The Practice folder holds code that I write in my spare time.

Exploratory Data Analysis :- In-depth analysis done before building a predictive model. To view a rendered report, click on the .html file and insert http://htmlpreview.github.com/? before the URL, for example http://htmlpreview.github.com/?https://github.com/dwipam/code/blob/master/EDA/s670-04.html
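
The URL rewriting described above can be scripted. This is a small sketch, not part of the repository; the helper name `preview_url` is hypothetical, and the prefix is the one quoted in this README.

```python
HTMLPREVIEW_PREFIX = "http://htmlpreview.github.com/?"


def preview_url(github_file_url: str) -> str:
    """Prefix a GitHub .html file URL so it opens through htmlpreview."""
    if github_file_url.startswith(HTMLPREVIEW_PREFIX):
        return github_file_url  # already prefixed, leave unchanged
    return HTMLPREVIEW_PREFIX + github_file_url


if __name__ == "__main__":
    print(preview_url("https://github.com/dwipam/code/blob/master/EDA/s670-04.html"))
```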

Bayesian A/B test :- Farm and multi-armed bandit problem simulation
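
To give a flavour of what a Bayesian multi-armed bandit simulation involves, here is a minimal Python sketch of Thompson sampling on Bernoulli arms with Beta(1, 1) priors. This is an assumption about the general technique, not a copy of the code in the repository, which may use a different strategy or reward model. The function name `thompson_sampling` and its parameters are hypothetical.

```python
import random


def thompson_sampling(true_rates, rounds=5000, seed=0):
    """Simulate a Bernoulli bandit; return (pulls per arm, total reward)."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins = [0] * k    # observed successes per arm
    losses = [0] * k  # observed failures per arm
    total_reward = 0
    for _ in range(rounds):
        # Draw one sample from each arm's Beta(1 + wins, 1 + losses) posterior
        # and play the arm whose sampled success rate is highest.
        samples = [rng.betavariate(1 + wins[a], 1 + losses[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_rates[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        total_reward += reward
    pulls = [wins[a] + losses[a] for a in range(k)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, reward = thompson_sampling([0.05, 0.5], rounds=2000, seed=1)
    print(pulls, reward)
```

Compared with a fixed 50/50 A/B split, this approach shifts traffic toward the better-performing arm as evidence accumulates.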

Distribution by Technologies:-
Python - See the Artificial-Intelligence folder, dijkstra.py, dfs.py and the practice folder
R - See the Data Mining folder
JAVA - See the Algorithms folder, BetterCode.java in Data Mining and the practice folder

Challenges:-
Noctober - Check model.ipnyb within the Noctober folder. Placed 3rd in the AnalyticsVidhya competition.
Telstra - Check Telstra.ipnyb within the Telstra challenge folder.
Attribution - http://htmlpreview.github.io/?https://github.com/dwipam/code/blob/master/AttributionChallenge/Model.html

If anything in this README is unclear, write to:
ddkatari@iu.edu
dwipam.katariya@gmail.com