This repository contains the datasets for the paper "Deep Code-Comment Understanding and Assessment".
The public dataset is from this work. It includes the results of a manual assessment of the coherence between comments and implementations for 3636 methods, gathered from three open-source software projects implemented in Java.
Our labeled dataset is drawn from Java projects uploaded to GitHub before October 2018.
Each method in our labeled dataset has the following structure:
- #No
- #File (the name of the source file)
- #Comment
- #Code