jzaia18/DandyHacks2018

DandyHacks2018

Steven Li, Jeffrey Weng, Justin Yau, Jake Zaia

Inspiration

The Onion and "fake news" inspired this project. Our project name, The Orange, is a parody of "The Onion" and also comes from an old English phrase, "a clockwork orange". A clockwork orange appears normal and natural on the outside but is bizarre and mechanical on the inside. Similarly, our website appears to be news but is generated entirely at random by computers, using sources such as The Onion as training data.

What it Does

Using Markov chains, we train a model on large bodies of text. We then use the trained model to generate original news stories. Users can also vote and comment on the articles. All of our data is stored in MongoDB on a DigitalOcean droplet.
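As a rough illustration of the idea (not the project's actual code), a word-level Markov chain can be sketched in a few lines of Python. The function names `build_chain` and `generate`, the default order of 2, and the `seed` parameter are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=30, seed=None):
    """Walk the chain from a random starting state, emitting at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    order = len(state)
    output = list(state)
    for _ in range(length - order):
        followers = chain.get(state)
        if not followers:
            # Dead end: this state only appeared at the very end of the training text.
            break
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return " ".join(output)
```

Repeated followers are kept in the list on purpose, so more frequent continuations are picked more often. A higher `order` tends to give more coherent text but copies longer runs from the training data.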

How We Built It

A scraper written in Python uses several APIs and libraries to pull relevant data from different news sources. We then feed that information into our Markov chain algorithm (also written in Python) to help it "learn" how to write natural-sounding articles. The website itself runs on a Flask backend with a light Bootstrap and jQuery frontend, and is deployed with Apache. The DigitalOcean droplet hosts both the web server and the MongoDB database that stores our data.
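The text-extraction step of a scraper might look something like the sketch below. It is only an assumption about the general approach: it uses Python's standard `html.parser` module rather than whichever APIs and libraries the project actually used, and the names `ParagraphExtractor` and `extract_paragraphs` are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the visible text of every <p> element, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace and skip paragraphs that end up empty.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def extract_paragraphs(html):
    """Return the list of paragraph strings found in an HTML document."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

The resulting paragraphs could then be joined and passed to the Markov chain as training text. Real news pages usually need site-specific handling on top of this, since article bodies are not always marked up as plain `<p>` elements.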

Challenges

The scraper was challenging. Different websites structure their data differently, and building a scraper that consistently extracts relevant data from all of these formats is difficult. Working with large data sets was also hard, as our algorithm requires a decent amount of computational power and can produce very large files. Implementing Markov chains was also an arduous task, as no one on the team had any formal experience with them.

What We Learned

We learned how to use Markov chains to produce realistic-sounding text. We also learned how to use scrapers to collect information from the internet programmatically. Additionally, we learned how to deploy a Flask app using apache2 on a Linux machine.

What's Next

There's still plenty to do, such as:

  • Improving our scrapers to get more meaningful data
  • Training our algorithms on larger datasets to make better stories
  • Automatically generating stories on a set schedule
  • Expanding our website to have more functionality for users
