├── README.md
├── run.sh
├── src
│ ├── average_degree.py
│ └── tweets_cleaned.py
├── tweet_input
│ └── tweets.txt
└── tweet_output
├── ft1.txt
└── ft2.txt
import json
for parsing jsonimport time
for parsing the timestampfrom datetime import datetime
for calculating the time difference between 2 timestamps
- Run the
run.sh
script with./run.sh
(may need to runchmod +x run.sh
) - Results will appear in
tweet_output
directory ft1.txt
is part 1, cleaned tweetsft2.txt
is part 2, average degree calculations for the tweets
Arguments | Description |
---|---|
input file |
The location of the input file. |
output file |
The location of the output file. |
- Part 1:
python tweets_cleaned.py <input file> <output file>
- Part 2:
python average_degree.py <input file> <output file>
python src/tweets_cleaned.py tweet_input/tweets.txt tweet_output/ft1.txt
- Go through all the tweets in
tweet_input/tweets.txt
- parse the json for
text
andcreated_at
- clean the unicode and remove
\n
and\r
- for each tweet with a unicode, increment the counter
- write the cleaned tweet with timestamp in format
tweet (timestamp)\n
into filetweet_output/ft1.txt
- at the end write
X tweets contained unicode.
, whereX
is the number of tweets that contains unicode
- Use the file
tweet_output/ft1.txt
created from part 1 to calculate the average degree - Go through each tweet, and parse the hashtags and the timestamp
- insert into the
DegreeGraph
Class, which handles the insertion of hashtags, evicting hashtags that are not in the 60 second window, and also does the average degree calculations for each tweet added - write the average degree for each tweet to the file
tweet_output/ft2.txt