Collocation visualisation #1968

@lukavdplas

The People & Parliament group would like a new visualisation to show collocates of the search term. Here is a description from Risto:

IMO, ideally, the collocation functionality would work like this: the user wants to see which words appear in a window of x words from the search term y more frequently than statistically expected. For example: the user searches for the term “democracy” in the UK parliamentary data from the years 1919-1939, using the window of 5 words to the left and right from the search word. The user can choose the metric for counting the collocates, e.g., raw co-occurrences OR mutual information. [...]

In Europe, CLARIN is offering this functionality for modern parliamentary data in their NoSketchEngine, I think they have done it very well (I do not know the technical details behind, should be open source though): https://www.clarin.si/ske/#open

Another famous example for the collocation functionality is the AntConc: https://www.youtube.com/watch?v=s0N-89xI23Y&list=PLcAJNy32_1Z8mH_g7LZ7YbGlhM3iMDU2u&index=8

For Textcavator, my suggestion was to adapt the ngram visualisation. The logic would be pretty much the same: both start from the context window around each occurrence of the search term. The difference is that the n-gram counter selects sequences of words from each window, whereas the collocation counter would count single words. (In fact, the ngram visualisation was originally built for collocations.) The joyplot layout should also work fine here.
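To illustrate the difference between the two counters, here is a minimal sketch in plain Python. It is not the existing Textcavator code: the function names (`context_windows`, `count_collocates`, `count_ngrams`) and the whitespace tokenisation are assumptions made for the example, and the real implementation may obtain its windows quite differently.

```python
from collections import Counter


def context_windows(tokens, term, size=5):
    """Yield (window, position) for each occurrence of `term`.

    `window` holds up to `size` tokens on either side of the term,
    and `position` is the index of the term inside that window.
    """
    for i, token in enumerate(tokens):
        if token == term:
            start = max(0, i - size)
            yield tokens[start:i + size + 1], i - start


def count_collocates(tokens, term, size=5):
    """Count single words that appear within `size` tokens of `term`."""
    counts = Counter()
    for window, position in context_windows(tokens, term, size):
        # Everything in the window except the search term itself.
        counts.update(window[:position] + window[position + 1:])
    return counts


def count_ngrams(tokens, term, size=5, n=2):
    """Count sequences of `n` consecutive words inside each window."""
    counts = Counter()
    for window, _ in context_windows(tokens, term, size):
        for j in range(len(window) - n + 1):
            counts[tuple(window[j:j + n])] += 1
    return counts


# Hypothetical example text, tokenised naively on whitespace.
tokens = "the people want democracy and the people defend democracy".split()
collocates = count_collocates(tokens, "democracy", size=2)
bigrams = count_ngrams(tokens, "democracy", size=2, n=2)
```

Both functions share `context_windows`, which is the part the two visualisations would have in common. In this sketch, overlapping windows are counted independently, so a word that sits near two occurrences of the search term is counted twice.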

We agreed to build a demo version based on the ngram visualisation. It is not yet decided whether this should be presented as a new visualisation type or as an extra option in the ngram visualisation.
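For the choice of metric mentioned in Risto's description (raw co-occurrences or mutual information), the following sketch shows how the two rankings can differ. It uses one common formulation of pointwise mutual information, log2(f_xy * N / (f_x * f_y)). Whether NoSketchEngine or AntConc use exactly this variant is not something this issue establishes, and the function names and frequencies below are made up for illustration.

```python
import math


def pmi(cooccurrences, term_freq, collocate_freq, corpus_size):
    """Pointwise mutual information from raw frequencies.

    Assumes all arguments are positive counts over the same corpus.
    """
    return math.log2(cooccurrences * corpus_size / (term_freq * collocate_freq))


def rank_collocates(cooccurrences, frequencies, term, corpus_size, metric="raw"):
    """Return (word, score) pairs sorted from highest to lowest score."""
    if metric == "raw":
        scores = dict(cooccurrences)
    elif metric == "pmi":
        scores = {
            word: pmi(count, frequencies[term], frequencies[word], corpus_size)
            for word, count in cooccurrences.items()
        }
    else:
        raise ValueError(f"unknown metric: {metric}")
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts: how often each word occurs near "democracy",
# and how often each word occurs in the corpus overall.
cooccurrences = {"the": 8, "parliamentary": 2}
frequencies = {"democracy": 10, "the": 400, "parliamentary": 5}
corpus_size = 1000

by_raw = rank_collocates(cooccurrences, frequencies, "democracy", corpus_size, "raw")
by_pmi = rank_collocates(cooccurrences, frequencies, "democracy", corpus_size, "pmi")
```

With these made-up numbers, the raw count ranks "the" first, while PMI ranks "parliamentary" first because it co-occurs with the search term far more often than its overall frequency would predict. PMI tends to favour rare words, so a minimum co-occurrence threshold would probably be needed in practice.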

Labels: new feature (adds new functionality for users), visualisation (changes to visualisation features)