Description
The People & Parliament group would like a new visualisation to show collocates of the search term. Here is a description from Risto:
IMO, ideally, the collocation functionality would work like this: the user wants to see which words appear within a window of x words from the search term y more frequently than statistically expected. For example: the user searches for the term “democracy” in the UK parliamentary data from the years 1919-1939, using a window of 5 words to the left and right of the search word. The user can choose the metric for counting the collocates, e.g., raw co-occurrences OR mutual information. [...]
In Europe, CLARIN offers this functionality for modern parliamentary data in their NoSketchEngine, and I think they have done it very well (I do not know the technical details behind it, but it should be open source): https://www.clarin.si/ske/#open
Another well-known example of the collocation functionality is AntConc: https://www.youtube.com/watch?v=s0N-89xI23Y&list=PLcAJNy32_1Z8mH_g7LZ7YbGlhM3iMDU2u&index=8
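To make the two metrics in the description concrete, here is a minimal sketch of scoring collocates in a window around a search term, using either raw co-occurrence counts or mutual information. It assumes the corpus is already a flat list of lowercased tokens; the function name `collocates` and its parameters are hypothetical and not part of Textcavator. For mutual information it uses one common corpus-linguistics formulation, `log2(observed / expected)` with `expected = f(node) * f(collocate) * span / N`; tools such as AntConc or NoSketchEngine may use different variants.

```python
import math
from collections import Counter


def collocates(tokens, node, left=5, right=5, metric="count"):
    """Score words occurring within `left`/`right` tokens of `node`.

    Hypothetical helper, for illustration only.
    metric="count": raw co-occurrence frequency.
    metric="mi": log2(observed / expected), where
        expected = f(node) * f(collocate) * (left + right) / N
    (an assumed, commonly used formulation of mutual information).
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Context window around this occurrence, excluding the node itself.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        observed.update(window)

    if metric == "count":
        return dict(observed)
    if metric == "mi":
        span = left + right
        return {
            word: math.log2(obs / (freq[node] * freq[word] * span / n))
            for word, obs in observed.items()
        }
    raise ValueError(f"unknown metric: {metric}")


tokens = "the people want democracy and the parliament debates democracy today".split()
counts = collocates(tokens, "democracy", left=2, right=2, metric="count")
mi = collocates(tokens, "democracy", left=2, right=2, metric="mi")
```

With raw counts, "the" and "people" both co-occur once with "democracy", but mutual information ranks "people" higher because "the" is more frequent in the corpus overall, which is the "more frequently than statistically expected" behaviour described above.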
For Textcavator, my suggestion was to adapt the ngram visualisation. The logic would be much the same: both start from the context windows around the search term, but the n-gram counter selects sequences of words from each window, while the collocation counter would count single words. (In fact, the ngram visualisation was originally built for collocations.) The joyplot layout should also work fine here.
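A rough sketch of the shared logic, assuming tokenized documents: both counters iterate over the same context windows and differ only in what they count per window. The helper names (`context_windows`, `count_ngrams`, `count_collocates`) are hypothetical and do not reflect Textcavator's actual ngram code; in particular, counting every n-gram in the window (rather than only those containing the search term) is an assumption.

```python
from collections import Counter


def context_windows(tokens, node, size=5):
    """Yield the window of `size` tokens on each side of every occurrence
    of `node`, including the node itself (hypothetical helper)."""
    for i, tok in enumerate(tokens):
        if tok == node:
            yield tokens[max(0, i - size):i + 1 + size]


def count_ngrams(tokens, node, size=5, n=2):
    """N-gram counter: count word sequences of length n in each window."""
    counts = Counter()
    for window in context_windows(tokens, node, size):
        for j in range(len(window) - n + 1):
            counts[tuple(window[j:j + n])] += 1
    return counts


def count_collocates(tokens, node, size=5):
    """Collocation counter: same windows, but count single words,
    skipping the search term itself."""
    counts = Counter()
    for window in context_windows(tokens, node, size):
        counts.update(w for w in window if w != node)
    return counts
```

Since only the per-window counting step differs, a collocation option could plausibly reuse the existing window extraction and frequency-over-time aggregation. Note that when two occurrences of the search term are close together, their windows overlap and shared words are counted once per occurrence.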
Agreed to build a demo version based on the ngram visualisation. Not sure whether this should be presented as a new visualisation type or as an extra option in the ngram visualisation.