Looks like all the `https://t.co/xyz` tokens get caught, as well as non-significant words (e.g. ve, si, ch). The last items appear maybe once or twice, so I'm not sure they're significant either. <img width="276" alt="screen shot 2017-03-14 at 09 34 02" src="https://cloud.githubusercontent.com/assets/138627/23893897/81773d06-08a0-11e7-95d1-768c2c8291b5.png">
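
For illustration, here is a rough sketch of the kind of filtering that might address this. It is not the project's actual code: the function name `significant_words` and the parameters `min_length`, `min_count`, and `stopwords` are hypothetical, and the thresholds are guesses. It strips URLs before tokenizing, drops very short tokens, and ignores words seen fewer than `min_count` times.

```python
import re
from collections import Counter

# Matches http/https links such as https://t.co/xyz (assumed URL shape).
URL_RE = re.compile(r"https?://\S+")
# Runs of letters only; digits, underscores and punctuation split tokens.
WORD_RE = re.compile(r"[^\W\d_]+")


def significant_words(texts, min_length=3, min_count=3, stopwords=frozenset()):
    """Count words across texts, skipping URLs, short tokens and rare words.

    Hypothetical helper: names and default thresholds are assumptions.
    """
    counts = Counter()
    for text in texts:
        # Remove links first so their pieces (https, t, co, ...) are never counted.
        cleaned = URL_RE.sub(" ", text.lower())
        for word in WORD_RE.findall(cleaned):
            if len(word) >= min_length and word not in stopwords:
                counts[word] += 1
    # Keep only words that occur at least min_count times.
    return {word: n for word, n in counts.items() if n >= min_count}
```

With `min_length=3`, fragments like "ve", "si" and "ch" are dropped, and `min_count` handles the items that only show up once or twice. Both values would probably need tuning against real data.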