In this assignment we perform a clustering analysis of house listings in Rome. The data were scraped from immobiliare.it.
We are given the passwords2.txt file as input. Each row is a string of 20 characters. We define three hash functions that map each string to a value, and the goal is to check whether there are duplicate strings.
The first function does not take the order of the characters into account, so, e.g., "AAB" and "ABA" are considered duplicates.
The second and third functions do take the order of the characters into account, so, e.g., "AAB" and "ABA" are not considered duplicates.
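The difference between the two kinds of hash function can be sketched as follows. This is a minimal illustration, not the code in hw4_lib.py: `poly_hash`, `hash_ordered`, `hash_unordered`, `count_duplicates` and `load_passwords` are hypothetical names, and the polynomial hash (base 31, modulus 2^61 - 1) is an assumed choice. The order-insensitive variant simply hashes the sorted characters, so all anagrams share a value. Because duplicates are detected by equal hash values, a genuine collision between different strings would also be counted.

```python
from collections import Counter


def poly_hash(s: str, base: int = 31, mod: int = (1 << 61) - 1) -> int:
    # Hypothetical polynomial hash: each character's position affects the result.
    h = 0
    for c in s:
        h = (h * base + ord(c)) % mod
    return h


def hash_ordered(s: str) -> int:
    # Order-sensitive: "AAB" and "ABA" get different values.
    return poly_hash(s)


def hash_unordered(s: str) -> int:
    # Order-insensitive: sorting first makes every anagram hash to the same value.
    return poly_hash("".join(sorted(s)))


def count_duplicates(strings, hash_fn) -> int:
    # Count strings whose hash value was already produced by an earlier string.
    counts = Counter(hash_fn(s) for s in strings)
    return sum(n - 1 for n in counts.values() if n > 1)


def load_passwords(path: str = "passwords2.txt") -> list:
    # Assumes one password per line, as described for passwords2.txt.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

For example, with `words = ["AAB", "ABA", "BAA", "ABC"]`, `count_duplicates(words, hash_unordered)` gives 2 (three anagrams share one value), while `count_duplicates(words, hash_ordered)` gives 0.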
hw4_lib.py: this script contains all the functions needed to answer the requests, with detailed comments that make the logic behind them clear.
The goal of the Notebook is to provide a storytelling-friendly format that shows how to use the implemented code and presents the results, with explanations and examples.
We organized it so that the reader can follow our reasoning and conclusions.
It is split into two parts, one for each main topic of the assignment.