wiki_philosophy

Wiki Crawler
Starting from a random Wikipedia article (example: http://en.wikipedia.org/wiki/Art), clicking
the first non-italicized link in the main text that is not surrounded by parentheses, and then repeating
the process for each subsequent article usually leads to http://en.wikipedia.org/wiki/Philosophy.
Please write a program that models this behavior and answers the following questions while
making as few HTTP requests as possible.
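
One way to keep the number of HTTP requests low is to remember, for every page already visited, how far it is from Philosophy, so that later walks stop as soon as they hit a known page. The sketch below illustrates that idea only; it is not the code in wiki-crawler.py. The names `path_length`, `first_link` and `cache` are assumptions made for this example, and `first_link` stands in for whatever function fetches a page and returns the title of its first qualifying link. The link table at the bottom is a toy graph, not real Wikipedia data.

```python
PHILOSOPHY = "Philosophy"


def path_length(start, first_link, cache):
    """Return the number of clicks from `start` to Philosophy, or None.

    `first_link(title)` is a hypothetical callable that fetches one page and
    returns the title of its first qualifying link (None if there is none).
    `cache` maps a title to its known distance to Philosophy (None means the
    page never reaches it), so each page is fetched at most once overall.
    """
    path = []     # titles visited during this walk, in order
    seen = set()  # the same titles, for fast loop detection
    title = start
    while title is not None and title not in cache and title not in seen:
        if title == PHILOSOPHY:
            cache[title] = 0
            break
        path.append(title)
        seen.add(title)
        title = first_link(title)  # the only place a request would be made

    # Distance of the page where the walk stopped. It is None for a dead end
    # (title is None) or a loop (title was already seen in this walk).
    tail = cache.get(title)

    # Record the result for every page on the path, walking backwards.
    for steps_back, visited in enumerate(reversed(path), start=1):
        cache[visited] = None if tail is None else tail + steps_back
    return cache.get(start)


# Toy link graph standing in for Wikipedia (made up for illustration).
toy_links = {"A": "B", "B": "C", "C": "Philosophy", "X": "Y", "Y": "X"}
requests_made = []


def toy_first_link(title):
    requests_made.append(title)  # count simulated requests
    return toy_links.get(title)


cache = {}
print(path_length("A", toy_first_link, cache))  # 3
print(path_length("B", toy_first_link, cache))  # 2, answered from the cache
print(path_length("X", toy_first_link, cache))  # None, X and Y form a loop
```

In this sketch the second call makes no simulated request at all, because the walk from "A" already recorded the distance for "B".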

# Questions:
## What percentage of pages lead to Philosophy?
## Using the random article link (found in the left sidebar of any Wikipedia article), what is the distribution of path lengths for 500 pages, discarding those paths that never reach the Philosophy page?
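
Both questions reduce to simple bookkeeping once each sampled page has a path length. The following is a minimal sketch, assuming each result is either an integer path length or `None` for a page that never reaches Philosophy; the function name `summarize` and that `None` convention are assumptions for this example, not necessarily what wiki-crawler.py does.

```python
from collections import Counter


def summarize(lengths):
    """Summarize crawl results.

    `lengths` holds one entry per sampled page: the path length to
    Philosophy, or None when the walk never got there.
    Returns (percentage that reached Philosophy, Counter of path lengths).
    """
    # Discard the paths that never reach Philosophy.
    reached = [n for n in lengths if n is not None]
    percentage = 100.0 * len(reached) / len(lengths) if lengths else 0.0
    return percentage, Counter(reached)


percentage, distribution = summarize([15, 15, None, 10])
print(percentage)    # 75.0
print(distribution)  # Counter({15: 2, 10: 1})
```

The `Counter` maps each path length to the number of sampled pages with that length.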

Dependencies

Python 2
BeautifulSoup

Running Program:

From a terminal, run: python wiki-crawler.py

The output will look something like:

percentage of page lead to philosophy: 100.0%
random percentage of page lead to philosophy: 80.0%
Counter({15: 3, 10: 1, 13: 1})

nabaz/python-experiments
