Skip to content

Featurization

faridani edited this page Jan 24, 2012 · 1 revision
  • The code main.m generates a cell array for you and stores it in the "descriptions" variable
  • for each cell in descriptions do this
    • pass it through commentSanitizer
    • initialize global hashmap g = containers.Map()
    • for each word in the resulting string do this
      • pass the word to porterStemmer
      • if the word is not in your hashmap for that cell add it, if it is just add one to its value. Do the same thing for the global hashmap
  • Take the values in your g that have value>=n (say 3) and store those keys as a header
  • for each hashmap of a cell featurize it using headers (for example if header is {'book','note','I'} a comment like "I love my book" should be [1,0,1]
  • store the matrix of the oroginal featurized cell array in a csv file by using csvwrite
Clone this wiki locally