Calculate page similarity based on the structure of the page's UI tree.
The approach follows Li Jingyang and Zhang Bo, "Method and device for determining the similarity of web page structure".
The code is adapted from the HTMLSimilarity implementation.
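To show what "UI tree structure" can mean in practice, here is a minimal sketch that turns an XML UI tree into counts of root-to-node paths. The helper name tag_paths, its attr parameter, and the choice of path features are assumptions for illustration. They are not necessarily what this project or HTMLSimilarity does.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional


def tag_paths(xml_text: str, attr: Optional[str] = None) -> Counter:
    """Count root-to-node paths in an XML UI tree (hypothetical helper).

    Each path joins one label per level, e.g. "hierarchy/node/node".
    If `attr` is given (for example "class"), that attribute's value is
    used as the label when present; otherwise the element tag is used.
    """
    root = ET.fromstring(xml_text)
    paths: Counter = Counter()

    def label(elem: ET.Element) -> str:
        if attr is not None and attr in elem.attrib:
            return elem.attrib[attr]
        return elem.tag

    def walk(elem: ET.Element, prefix: str) -> None:
        # Extend the parent's path with this element's label and count it.
        path = f"{prefix}/{label(elem)}" if prefix else label(elem)
        paths[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths
```

For example, tag_paths('<hierarchy><node class="A"><node class="B"/></node></hierarchy>', attr="class") counts the paths hierarchy, hierarchy/A and hierarchy/A/B once each. Two pages with similar layouts yield similar path counts even when their text content differs.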
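```python
from utils import get_xml_similarity
from adapter import json2xml  # JSON-to-XML adapter

# doc1, doc2: the two documents to compare (see main.py for how they are loaded)
is_similarity, value = get_xml_similarity(doc1, doc2)
```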
See main.py for details on how to use it.
Three document types are supported:
- JSON files exported by Droidbot
- XML files exported by adb uiautomator
- HTML files
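Because the usage above imports a json2xml adapter, one plausible reading is that JSON input is converted to XML so all formats share one comparison path. The sketch below illustrates that idea with a hypothetical json_tree_to_xml helper and an assumed nested schema ({"class": ..., "children": [...]}). Droidbot's real export format may differ, and the repository's own adapter.json2xml remains the authoritative converter.

```python
import json
import xml.etree.ElementTree as ET


def json_tree_to_xml(node: dict) -> ET.Element:
    """Convert a nested JSON UI node into an XML <node> element.

    Assumed (hypothetical) schema: {"class": str, "children": [node, ...]}.
    """
    elem = ET.Element("node")
    elem.set("class", str(node.get("class", "")))
    # Recurse so the XML tree mirrors the JSON nesting one-to-one.
    for child in node.get("children", []):
        elem.append(json_tree_to_xml(child))
    return elem


if __name__ == "__main__":
    raw = '{"class": "FrameLayout", "children": [{"class": "Button"}]}'
    root = json_tree_to_xml(json.loads(raw))
    print(ET.tostring(root, encoding="unicode"))
```

Mapping every UI element to a <node> with a class attribute keeps the converted tree close in shape to a uiautomator dump, which is a design assumption rather than a documented requirement.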
Parameters:
- document 1
- document 2
- dimension after dimensionality reduction, default is 5000
- threshold, default is 0.1
- parser type, default is 'xml', optionally 'lxml'
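The dimension parameter fixes the length of the feature vector regardless of page size. One common way to achieve this, shown here as an assumed stand-in and not necessarily the method used in this repository, is feature hashing: each structural feature (for example a tag path string) is hashed into one of dimension buckets. The function name hash_features is hypothetical.

```python
import zlib
from typing import Dict, List


def hash_features(features: Dict[str, int], dimension: int = 5000) -> List[float]:
    """Fold any number of string features into a fixed-length vector.

    Feature hashing is an illustrative assumption for the
    'dimension after dimensionality reduction' parameter.
    """
    if dimension <= 0:
        raise ValueError("dimension must be positive")
    vec = [0.0] * dimension
    for feature, count in features.items():
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        bucket = zlib.crc32(feature.encode("utf-8")) % dimension
        vec[bucket] += count
    return vec
```

A larger dimension lowers the chance that two different features collide in one bucket, at the cost of longer vectors; 5000 is simply the documented default.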
Returns:
- whether or not the documents are similar
- the similarity value (the documents are similar if value < tol, where tol is the threshold, and not similar if value > tol)
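The return convention above treats value as a distance: smaller means more alike, and the result counts as similar when value < tol. A minimal sketch of that decision rule follows, using cosine distance between two feature vectors. Both the distance measure and the name judge_similarity are assumptions for illustration; the repository may compute value differently.

```python
import math
from typing import Sequence, Tuple


def judge_similarity(v1: Sequence[float], v2: Sequence[float],
                     tol: float = 0.1) -> Tuple[bool, float]:
    """Return (is_similar, value), with value a cosine distance.

    For non-negative vectors value lies in [0, 1]; similar means value < tol.
    """
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0 or n2 == 0:
        # Two empty trees count as identical; one empty tree as maximally different.
        value = 0.0 if n1 == n2 else 1.0
    else:
        # Clamp tiny negative results caused by floating-point rounding.
        value = max(0.0, 1.0 - dot / (n1 * n2))
    return value < tol, value
```

Cosine distance ignores overall page size, so a long list and a short list built from the same layout stay close. That is one reason to prefer it over a raw Euclidean distance in this sketch.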