The documentation for itoken is silent about the data structure that is returned. It appears to be an R6 object with a few public functions and variables, but I cannot figure out what they are.
For context, I am trying to create one-hot encoded (long-vector) word embeddings for teaching/demonstration purposes. More specifically, I want to:
- load texts, create vocabulary
- transform words to the corresponding one-hot encoded vectors
- combine nearby words into corresponding word embeddings (using one-hot vectors).
In a sense, this is equivalent to working with a DTM where each document is an individual word. Since such a DTM easily gets large, I am trying to find a way to iterate over individual words.
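To make the goal concrete, the steps above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the toy texts, the whitespace tokenizer, the window size, and the helper names one_hot and context_vector are all made up for this example, and it does not use itoken at all.

```r
# Toy corpus (hypothetical example texts)
texts <- c("the cat sat on the mat", "the dog sat")

# Tokenize on whitespace and build a sorted vocabulary
tokens <- strsplit(tolower(texts), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# One-hot vector for a single word: 1 at the word's vocabulary index
# (hypothetical helper, not part of any package)
one_hot <- function(word, vocab) {
  v <- integer(length(vocab))
  names(v) <- vocab
  v[match(word, vocab)] <- 1L
  v
}

# Combine nearby words: sum of the one-hot vectors of the neighbours
# within `window` positions of word i (hypothetical helper)
context_vector <- function(doc_tokens, i, vocab, window = 1L) {
  lo <- max(1L, i - window)
  hi <- min(length(doc_tokens), i + window)
  idx <- setdiff(seq(lo, hi), i)
  v <- integer(length(vocab))
  names(v) <- vocab
  for (j in idx) v <- v + one_hot(doc_tokens[j], vocab)
  v
}

# Iterate word by word instead of materialising the full word-level DTM
for (doc in tokens) {
  for (i in seq_along(doc)) {
    target <- one_hot(doc[i], vocab)
    ctx <- context_vector(doc, i, vocab)
    # ... use `target` and `ctx` here
  }
}
```

Only one target vector and one context vector exist at a time here, which is the behaviour I would like to reproduce with an iterator over tokens.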