Commit

update examples
guofei9987 committed Nov 25, 2023
1 parent beffdb5 commit f911418
Showing 3 changed files with 24 additions and 24 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -13,15 +13,15 @@ A fast Python implementation of locality sensitive hashing.



| Algorithm | Function | Application | Features |
|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------------------------------|
| fuzzy-hash| Map text, strings, or files to 64-bit (or other) hash values; similar contents yield similar hash values                                             | Fast comparison of similar contents | Suitable for text, strings, and files
| min-hash | Map sets to signature matrices and find similar sets by calculating Jaccard similarity | Similarity retrieval | Suitable for text, network, audio, and other data |
| SimHash | Convert high-dimensional data such as text and images into fixed-length vectors, and map similar vectors to the same bucket through hash functions | Text and image similarity retrieval | Suitable for high-dimensional data |
| aHash | Compress images to a fixed size and map similar images to the same bucket through hash functions | Similar image retrieval | Has some robustness to scaling and slight deformations |
| dHash | Convert images to grayscale and calculate difference values, then map similar images to the same bucket through hash functions | Similar image retrieval | Has some robustness to scaling and slight deformations |
| pHash | Convert images to DCT coefficients and map similar images to the same bucket through hash functions | Similar image retrieval | Has some robustness to scaling, brightness, translation, rotation, and noise addition |
| LSH | Map high-dimensional vectors to low-dimensional space and map similar vectors to the same bucket through hash functions | Fast search for approximate vectors | Suitable for large-scale high-dimensional data |
| Algorithm | Function | Application | Features |
|------------|----------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------------------------------|
| fuzzy-hash | Map text, strings, or files to 64-bit (or other) hash values; similar contents yield similar hash values                                           | Fast comparison of similar contents | Suitable for text, strings, and files                                                 |
| min-hash | Map sets to signature matrices and find similar sets by calculating Jaccard similarity | Similarity retrieval | Suitable for text, network, audio, and other data |
| SimHash | Convert high-dimensional data such as text and images into fixed-length vectors, and map similar vectors to the same bucket through hash functions | Text and image similarity retrieval | Suitable for high-dimensional data |
| aHash | Compress images to a fixed size and map similar images to the same bucket through hash functions | Similar image retrieval | Has some robustness to scaling and slight deformations |
| dHash | Convert images to grayscale and calculate difference values, then map similar images to the same bucket through hash functions | Similar image retrieval | Has some robustness to scaling and slight deformations |
| pHash | Convert images to DCT coefficients and map similar images to the same bucket through hash functions | Similar image retrieval | Has some robustness to scaling, brightness, translation, rotation, and noise addition |
| LSH | Map high-dimensional vectors to low-dimensional space and map similar vectors to the same bucket through hash functions | Fast search for approximate vectors | Suitable for large-scale high-dimensional data |
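
The aHash row above can be illustrated with a minimal sketch. This is not pyLSHash's implementation: the function name `a_hash_sketch`, the list-of-rows image format, and the block-averaging resize are all assumptions made for illustration. It shrinks a grayscale image to 8×8 cells, then sets one bit per cell depending on whether the cell is brighter than the mean.

```python
def a_hash_sketch(gray, hash_size=8):
    """Average-hash a grayscale image given as a list of equal-length rows.

    Hypothetical helper: assumes height and width are multiples of hash_size.
    """
    block_h = len(gray) // hash_size
    block_w = len(gray[0]) // hash_size

    # Shrink to hash_size x hash_size by averaging each block of pixels.
    small = []
    for i in range(hash_size):
        for j in range(hash_size):
            block = [
                gray[r][c]
                for r in range(i * block_h, (i + 1) * block_h)
                for c in range(j * block_w, (j + 1) * block_w)
            ]
            small.append(sum(block) / len(block))

    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        bits = (bits << 1) | int(value > mean)
    return bits


# A 16x16 gradient image and a uniformly brightened copy (clipped at 255).
img = [[16 * r + c for c in range(16)] for r in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in img]

print(hex(a_hash_sketch(img)), hex(a_hash_sketch(brighter)))
```

Because every cell is compared against the image's own mean, a uniform brightness shift leaves the bit pattern of this gradient unchanged.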



18 changes: 9 additions & 9 deletions README_cn.md
@@ -11,15 +11,15 @@



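| Algorithm | Function | Application | Features |
|----------|-------------------------------------|-----------------------|---------------------------|
| fuzzy-hash| Compute hash values for strings, binary data, and files, so that similar contents have similar hash values | Fast retrieval of similar documents and files | Suitable for detecting content with slight changes |
| LSH | Map real-valued vectors to hash values, so that similar vectors have similar hash values | Fast retrieval of the top-k similar vectors in O(N) time | |
| min-hash | Map sets to hash values, so that similar sets have similar hash values | Fast retrieval of similar sets and similar documents | The probability that two hash values are equal equals the Jaccard coefficient |
| SimHash | Map documents to hash values, so that similar documents have similar hash values | Fast retrieval of similar documents | |
| aHash | Map images to hash values, so that similar images have similar hash values | Similar image retrieval | Robust to scaling, brightness attacks, etc. |
| dHash | Map images to hash values, so that similar images have similar hash values | Similar image retrieval | Robust to scaling, brightness attacks, etc. |
| pHash | Map images to hash values, so that similar images have similar hash values | Similar image retrieval | Robust to scaling, brightness attacks, translation, and small content changes |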
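| Algorithm | Function | Application | Features |
|------------|----------------------------------------------------|-----------------------|---------------------------|
| fuzzy-hash | Compute hash values for strings, binary data, and files, so that similar contents have similar hash values | Fast retrieval of similar documents and files | Suitable for detecting content with slight changes |
| LSH | Map real-valued vectors to hash values, so that similar vectors have similar hash values | Fast retrieval of the top-k similar vectors in O(N) time | |
| min-hash | Map sets to hash values, so that similar sets have similar hash values | Fast retrieval of similar sets and similar documents | The probability that two hash values are equal equals the Jaccard coefficient |
| SimHash | Map documents (or document features such as TF-IDF) to hash values, so that similar documents have similar hash values | Fast retrieval of similar documents | |
| aHash | Map images to hash values, so that similar images have similar hash values | Similar image retrieval | Robust to scaling, brightness attacks, etc. |
| dHash | Map images to hash values, so that similar images have similar hash values | Similar image retrieval | Robust to scaling, brightness attacks, etc. |
| pHash | Map images to hash values, so that similar images have similar hash values | Similar image retrieval | Robust to scaling, brightness attacks, translation, and small content changes |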
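
The min-hash row states that the probability of two hash values being equal matches the Jaccard coefficient. The sketch below checks that property empirically. It is an illustration under stated assumptions, not pyLSHash's API: the helpers `_seeded_hash`, `minhash_signature`, and `estimate_jaccard` are hypothetical names, and salted BLAKE2b digests stand in for a family of independent hash functions.

```python
import hashlib


def _seeded_hash(seed, item):
    """Hypothetical helper: a 64-bit hash of `item`, salted with `seed`."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=256):
    """For each salted hash function, keep the minimum hash over the set."""
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


a = set(range(0, 60))
b = set(range(30, 90))

true_jaccard = len(a & b) / len(a | b)
estimated = estimate_jaccard(minhash_signature(a), minhash_signature(b))
print(f"true = {true_jaccard:.3f}, estimated = {estimated:.3f}")
```

With more hash functions the estimate tightens around the true coefficient, at the cost of a longer signature.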



12 changes: 6 additions & 6 deletions examples/example_img_hash.py
@@ -6,14 +6,15 @@
img1 = 'img.jpeg'
img2 = 'att_img.jpeg'
att.resize_att(input_filename=img1, output_file_name=img2, out_shape=(300, 500))
img1_att_noise = att.shelter_att(input_filename=img1, output_file_name='img_with_noise.jpeg')

# %% aHash
a_hash_img1 = img_hash.a_hash(PIL.Image.open(img1))
a_hash_img2 = img_hash.a_hash(PIL.Image.open(img2))

hamming_distance = hamming(a_hash_img1, a_hash_img2)
print('[aHash]: img1 = {}, img2 = {}'.format(hex(a_hash_img1), hex(a_hash_img2)))
print(f'hamming_distance = {hamming_distance}')
print(f'[aHash] hamming_distance = {hamming_distance}')
assert hamming_distance < 5

# %% dHash
@@ -22,7 +23,7 @@

hamming_distance = hamming(d_hash_img1, d_hash_img2)
print('[dHash]: img1 = {}, img2 = {}'.format(hex(d_hash_img1), hex(d_hash_img2)))
print(f'hamming_distance = {hamming_distance}')
print(f'[dHash] hamming_distance = {hamming_distance}')
assert hamming_distance < 5

# %% pHash
@@ -31,16 +32,15 @@

hamming_distance = hamming(p_hash_img1, p_hash_img2)
print('[pHash]: img1 = {}, img2 = {}'.format(hex(p_hash_img1), hex(p_hash_img2)))
print(f'hamming_distance = {hamming_distance}')
print(f'[pHash] hamming_distance = {hamming_distance}')
assert hamming_distance < 5

# %% SSIM
import cv2
from pyLSHash.img_ssim import SSIM

img1_att_noise = att.shelter_att(input_filename=img1)

ssim = SSIM()
ssim_score = ssim.cal_ssim(cv2.imread(img1), img1_att_noise)

ssim_score = ssim.cal_ssim(cv2.imread(img1), cv2.imread('img_with_noise.jpeg'))

print("SSIM after attack:", ssim_score)
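
The example above compares image hashes with `hamming(...)` and asserts a distance below 5. A minimal sketch of that comparison follows, assuming the hashes are plain Python integers (which the calls to `hex()` suggest). The names `hamming_int` and `is_similar` are hypothetical and are not taken from pyLSHash.

```python
def hamming_int(a: int, b: int) -> int:
    """Count the bit positions where two integer hashes differ."""
    return bin(a ^ b).count("1")


def is_similar(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """Mirror the example's check: distance strictly below the threshold."""
    return hamming_int(hash_a, hash_b) < max_distance


h1 = 0xFFFF0000FFFF0000
h2 = h1 ^ 0b101   # flip two bits
h3 = h1 ^ 0xFF    # flip eight bits

print(hamming_int(h1, h2), is_similar(h1, h2))
print(hamming_int(h1, h3), is_similar(h1, h3))
```

The threshold of 5 is taken from the assertions in the example; a suitable value depends on the hash length and on how much distortion should still count as a match.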
