This is a repository for the paper 'Unsupervised Document Clustering with Cluster Topic Identification' to be published in the Office for National Statistics' working paper series.
The paper details a pipeline for clustering documents without supervision, together with a suggested automated procedure for identifying each cluster's topic.
An accompanying Jupyter notebook works through an example of the pipeline as set out in the paper.
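
For orientation, the general shape of such a pipeline can be sketched in a few lines of standard-library Python. This is only an illustrative sketch and not the method from the paper. The choice of TF-IDF vectors, a simple spherical k-means variant, labelling each cluster by its highest-weighted terms, the four toy documents, and all function names (`tokenize`, `tfidf`, `cluster`, `top_terms`) are assumptions made here for the example. The notebook remains the reference for the actual pipeline.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than three characters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]


def normalise(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf(docs):
    """Return one unit-length sparse TF-IDF vector per document."""
    tokens = [tokenize(d) for d in docs]
    doc_freq = Counter(t for toks in tokens for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokens:
        vec = {
            t: count * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
            for t, count in Counter(toks).items()
        }
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster(vectors, k, iterations=10):
    """Group vectors into k clusters by cosine similarity to cluster centroids."""
    # Deterministic farthest-first initialisation, starting from the first document.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        # Recompute each centroid as the normalised sum of its members.
        for j in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:
                centroids[j] = normalise(total)
    return labels, centroids


def top_terms(centroid, n=2):
    """Suggest a topic for a cluster: its n highest-weighted terms."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]


# Toy documents, invented for this example.
docs = [
    "inflation prices rose sharply across the economy",
    "football team won the league match",
    "consumer prices and inflation slowed",
    "the football match ended with a late goal",
]
labels, centroids = cluster(tfidf(docs), k=2)
topics = [top_terms(c) for c in centroids]
```

Here `labels` holds one cluster index per document and `topics` holds the suggested terms for each cluster. On real data the number of clusters, the text cleaning, and the topic-labelling rule would all need more care than this sketch gives them.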