Data directory
The data directory of this repository contains seven files.
-
answer_comments.png and answer_edits.png
These two PNG files plot how many answers have a given number of comments or edits. They motivated the thresholds we chose when creating a sample of answers for the ground truth.
-
example_result.json
This file contains the comment-edit pairs of a single answer in JSON format (4 comment-edit pairs). This JSON object has the same keys as the columns of the results.csv that is generated by the program. It is just an example to show the JSON format.
-
general_precision.json
This JSON file contains the 1,910 comment-edit pairs that were used in the paper. These 1,910 pairs are the ones that were manually evaluated by the two authors, which is why the file contains values that are not automatically evaluated by the program (e.g., tangled, useful, etc.). This is a JSON packaging of the pairs that can be used in subsequent programs. The pairs were randomly sampled from the results of the program on each language tag: after running the program individually on the Java, JavaScript, Android, Python, and PHP tags, we randomly selected 382 comment-edit pairs from each result set.
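The per-tag sampling described above can be illustrated with a minimal sketch. The function name `sample_pairs`, the tag spellings, and the placeholder data are assumptions made for illustration; this is not the code used to build general_precision.json.

```python
import random

# Tag spellings here are illustrative; the paper's five language tags.
TAGS = ["java", "javascript", "android", "python", "php"]
PAIRS_PER_TAG = 382  # 5 tags x 382 pairs = 1,910 pairs in total


def sample_pairs(results_by_tag, per_tag=PAIRS_PER_TAG, seed=None):
    """Draw `per_tag` pairs without replacement from each tag's result set."""
    rng = random.Random(seed)
    sample = []
    for tag in TAGS:
        sample.extend(rng.sample(results_by_tag[tag], per_tag))
    return sample


# Placeholder result sets standing in for the program's per-tag output.
results_by_tag = {tag: [f"{tag}-pair-{i}" for i in range(1000)] for tag in TAGS}
sampled = sample_pairs(results_by_tag, seed=0)
print(len(sampled))  # 1910
```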
-
ground_truth.csv
-
This CSV file contains the 100 answers (20 each from Java, JavaScript, Android, PHP, and Python) used in the ground truth of the paper. However, this file has less information than the files in the ground_truth directory of the results zip, because only these columns are used by the eval command line option of the program. Note that the eval command line option only evaluates the results.csv based on the answer ids and comment ids in the ground_truth.csv.
-
The ground_truth.csv contains four columns: AnswerIds, CommentIds, EditIds, and Useful. AnswerIds are the post ids of the answers to evaluate. CommentIds are the ids of the comments on each answer. EditIds are the manual evaluations of which comments and edits are paired. Useful is a column containing the manual evaluations of whether the comment-edit pair is useful. If useful, this is denoted by the word yes.
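As a rough sketch of how the four columns might be consumed, the snippet below reads rows with Python's standard csv module and keeps the pairs marked useful. It assumes the file has a header row with exactly these column names and one comment per row; the function name `load_useful_pairs` and the in-memory example rows are made up for illustration and are not taken from the real file.

```python
import csv
import io


def load_useful_pairs(csv_file):
    """Return (answer id, comment id, edit id) for rows whose Useful value is 'yes'."""
    reader = csv.DictReader(csv_file)
    pairs = []
    for row in reader:
        if row["Useful"].strip().lower() == "yes":
            pairs.append((row["AnswerIds"], row["CommentIds"], row["EditIds"]))
    return pairs


# Hypothetical rows; the real file would be opened with open("ground_truth.csv", newline="").
example_rows = io.StringIO(
    "AnswerIds,CommentIds,EditIds,Useful\n"
    "101,5001,9001,yes\n"
    "101,5002,,\n"
    "102,5003,9002,\n"
)
print(load_useful_pairs(example_rows))
```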
-
-
pull_requests.csv
This CSV file contains information regarding the 15 pull requests described in the paper.
-
threshold_comparison.csv
This CSV file contains the comparisons of the different fuzzywuzzy thresholds. This threshold is the one that is used when determining if a comment and edit share code, which is how we connect comments and edits together. In the paper and in the config.ini this threshold is set to 90%. This CSV shows the results of the thresholds from 50% to 100% in 10% increments.
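To give a feel for what the threshold controls, here is a minimal sketch that sweeps the same 50% to 100% range over one pair of code snippets. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for fuzzywuzzy, since this README does not say which fuzzywuzzy scoring function the program calls; the helper names `similarity` and `shares_code` and the two snippets are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (difflib stand-in, not the program's scorer)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def shares_code(comment_code, edit_code, threshold=90):
    """Treat a comment and an edit as sharing code when the score meets the threshold."""
    return similarity(comment_code, edit_code) >= threshold


# Hypothetical code snippets from a comment and from an edit.
comment_code = "list.add(item)"
edit_code = "list.add(item);"

# Sweep the thresholds compared in threshold_comparison.csv: 50, 60, ..., 100.
for threshold in range(50, 101, 10):
    print(threshold, shares_code(comment_code, edit_code, threshold))
```

A lower threshold links more comments to edits at the cost of more false matches, while 100% requires the two snippets to be identical.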