DW scraper Repo

Web scraper that gathers data from the German news website dw.com/dw. The project aims to build a database of:

articles
keywords
article meta: author, post date, recommendations
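
As a rough illustration of the list above, one scraped article could be held in a single record like the one sketched here. The class name `Article`, every field name, and the example URL are assumptions made for this sketch; none of them are taken from the project's source.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Article:
    """Hypothetical record for one scraped article (all field names are assumptions)."""
    url: str
    title: str
    body: str
    keywords: List[str] = field(default_factory=list)
    author: str = ""
    post_date: str = ""  # ISO date string such as "2020-01-31"
    recommendations: List[str] = field(default_factory=list)  # URLs of recommended articles

    def to_document(self) -> dict:
        """Return a plain dict that a document store can save as-is."""
        return asdict(self)


# Placeholder values only; this is not a real article.
example = Article(
    url="https://example.org/placeholder-article",
    title="Placeholder title",
    body="Placeholder body text.",
    keywords=["politics", "europe"],
    author="Jane Doe",
    post_date="2020-01-31",
)
document = example.to_document()
```

Keeping the record as a plain dict makes it usable with either database, since both store values behind a key without a fixed schema.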

Project structure

[Follows..]

Setup Databases

The scraper can use two kinds of databases: mongodb and redis. Both are NoSQL databases that run on different systems and can be used to store and query documents. They were chosen for this project because NoSQL is easy to use and leaves it flexible how much data is stored behind each key.

Several hardware architectures were available at the beginning of the project (Raspberry Pi ARM32 and a regular 64-bit Ubuntu server), so different implementations were needed.

Setting up the database structure

Setting up mongodb is straightforward:

Follow the steps to install and start a mongodb server on localhost:27017 (Link)
Run mongo.py within the src/db folder
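
To check that the server is reachable from Python, one option is to write a document and read it back. The following is a minimal sketch that assumes the `pymongo` package is installed. The database name `dw`, the collection name `articles`, and the helper names are hypothetical and do not describe what `mongo.py` does.

```python
def connect(uri="mongodb://localhost:27017", db_name="dw", collection_name="articles"):
    """Return a collection handle; the database and collection names are assumptions."""
    from pymongo import MongoClient  # imported lazily so the helpers below work without pymongo
    return MongoClient(uri)[db_name][collection_name]


def save_article(collection, document):
    """Insert the document, or overwrite an existing one with the same URL."""
    stored = {**document, "_id": document["url"]}  # use the URL as the primary key
    collection.replace_one({"_id": stored["_id"]}, stored, upsert=True)
    return stored["_id"]


def load_article(collection, url):
    """Return the stored document for this URL, or None if it is missing."""
    return collection.find_one({"_id": url})


def main():
    # Requires a running server on localhost:27017; not called automatically.
    articles = connect()
    save_article(articles, {"url": "https://example.org/placeholder", "title": "Placeholder"})
    print(load_article(articles, "https://example.org/placeholder"))
```

Calling `main()` while the server is running should print the stored placeholder document. Using the URL as `_id` means that scraping the same article twice overwrites the earlier copy instead of creating a duplicate.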

Setting up redis is a bit more tedious:

Install redis according to the project website
Install redis-py with pip (Link). If you use conda, follow Link.
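
After both installs, a short round trip through redis-py can confirm that the setup works. This is a minimal sketch that assumes a redis server listening on localhost at the default port 6379. The `article:` key prefix, the JSON encoding, and the helper names are assumptions for illustration and are not taken from the project.

```python
import json


def connect(host="localhost", port=6379):
    """Return a redis client; decode_responses=True makes get() return str instead of bytes."""
    import redis  # provided by the redis-py package, imported lazily
    return redis.Redis(host=host, port=port, decode_responses=True)


def article_key(url):
    """Build the key for one article (the 'article:' prefix is an assumption)."""
    return f"article:{url}"


def cache_article(client, document):
    """Store the whole document as one JSON string behind its key."""
    key = article_key(document["url"])
    client.set(key, json.dumps(document))
    return key


def fetch_article(client, url):
    """Return the decoded document, or None if the key does not exist."""
    raw = client.get(article_key(url))
    return json.loads(raw) if raw is not None else None


def main():
    # Requires a running redis server; not called automatically.
    client = connect()
    cache_article(client, {"url": "https://example.org/placeholder", "keywords": ["a", "b"]})
    print(fetch_article(client, "https://example.org/placeholder"))
```

Storing each article as a single JSON string keeps the sketch short. A redis hash per article would be an alternative if individual fields need to be read or updated separately.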

