Tracking Hackathon Code Usage

Replication Package for "The Secret Life of Hackathon Code: Where does it come from and where does it go?"

This replication package contains only the scripts. The study uses the World of Code (WoC) dataset, which is too large to share here. Please check the WoC Tutorial for details on how to access and use the dataset.

Data Processing Steps Using WoC

Commands for getting the required data from WoC

Get Project to Commit maps (p2c):

$ cat hack_projects | ~/lookup/getValues -f p2c > p2c.csv

Get Commit to Blob maps (c2b):

$ cat p2c.csv | cut -d\; -f2 | sort -u | ~/lookup/getValues -f c2b > c2b.csv
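The `cut -d\; -f2 | sort -u` stage recurs in several steps: it keeps the second `;`-separated field of each line and deduplicates it, so each key is looked up only once in the next step. A minimal sketch on made-up data follows; the project and commit identifiers and the helper names `sample_p2c` and `unique_commits` are hypothetical and not part of the replication scripts.

```shell
# Hypothetical sample of flattened p2c output, one "project;commit" pair per line.
sample_p2c() {
  printf '%s\n' \
    'projA;c2' \
    'projA;c1' \
    'projB;c2'
}

# Keep only the second ';'-separated field and remove duplicates.
unique_commits() {
  cut -d\; -f2 | sort -u
}

sample_p2c | unique_commits
# prints:
# c1
# c2
```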

Get Blob to Author maps (b2a):

$ cat c2b.csv | cut -d\; -f2 | sort -u > Bs
$ cat Bs | ~/lookup/getValues -f b2a | awk -F\; '{OFS=";"; $2=strftime("%Y-%m-%d %H:%M:%S", $2); print $0}' > b2a.csv

Get Project to Author maps (p2a):

$ cat hack_projects | ~/lookup/getValues p2a | sed -e 's/;/,/g' -e 's/,/;/1' > p2a.csv
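The two `sed` expressions in this command work as a pair: the first turns every `;` into `,`, and the second turns the first `,` back into `;`. Assuming that `getValues` without `-f` prints a project followed by all of its authors on one `;`-separated line, the result is a two-column file with the project in the first column and a comma-separated author list in the second. The sketch below uses a made-up input line, and the helper name `to_author_list` is hypothetical.

```shell
# Collapse "key;v1;v2;..." into "key;v1,v2,...".
to_author_list() {
  sed -e 's/;/,/g' -e 's/,/;/1'
}

# Hypothetical unflattened p2a line (real author strings are longer).
printf '%s\n' 'projA;alice;bob;carol' | to_author_list
# prints: projA;alice,bob,carol
```

Note that this relies on the project name itself containing no comma; otherwise the second expression would restore the wrong separator.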

Get Blob to Commit maps (b2c):

$ cat Bs | ~/lookup/getValues -f b2c > b2c.csv

Get Commit to Project map (c2P):

$ cat b2c.csv | cut -d\; -f2 | sort -u -T. > CsAll
$ cat CsAll | ~/lookup/getValues -f c2P > c2P.csv

Get Commit to Timestamp/Author map (c2ta):

$ cat CsAll | ~/lookup/getValues c2ta | awk -F\; '{OFS=";"; $2=strftime("%Y-%m-%d %H:%M:%S", $2); print $0}' > c2ta.csv
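The `awk` stage used for b2a.csv and c2ta.csv rewrites the second field from a Unix epoch timestamp into a readable date and leaves the other fields untouched. `strftime` is not part of POSIX awk, so this needs an implementation that provides it, such as gawk. The sketch below runs the same awk program on a made-up line; the commit and author values are hypothetical, and `TZ=UTC` is added here only to make the output reproducible (the original command uses the local time zone).

```shell
# Hypothetical c2ta line: commit;epoch seconds;author
printf '%s\n' 'c1;1500000000;alice' |
  TZ=UTC awk -F\; '{OFS=";"; $2=strftime("%Y-%m-%d %H:%M:%S", $2); print $0}'
# prints: c1;2017-07-14 02:40:00;alice
```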