Commit

new doc
digoal zhou authored and digoal zhou committed May 14, 2021
1 parent 43b91e6 commit 3281dbd
Showing 7 changed files with 226 additions and 0 deletions.
2 changes: 2 additions & 0 deletions 202105/20210511_01.md
@@ -12,6 +12,8 @@ PostgreSQL , security
----

## Background
![pic](20210511_01_pic_010.jpg)

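Searching a search engine for "telecom fraud" returns an estimated 1,870,000 results. Anti-fraud campaigns are now everywhere, and Alipay has even launched an anti-fraud exam: you only get a green code after passing it.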

Why are telecom scams such as campus loans, "campus loan cancellation", order brushing and "pig butchering" so rampant?
Binary file added 202105/20210511_01_pic_010.jpg
114 changes: 114 additions & 0 deletions 202105/20210514_03.md
@@ -0,0 +1,114 @@
## PostgreSQL open-source high-dimensional vector similarity search extension vector - related: Alibaba Cloud RDS PG pase, cube, face recognition

### Author
digoal

### Date
2021-05-14

### Tags
PostgreSQL , vector , vectors , similarity , cube , pase

----

## Background
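The cube extension that ships with PostgreSQL (contrib) supports vectors of up to 100 dimensions, at 8 bytes per dimension, and its performance is mediocre.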

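Alibaba Cloud RDS PG provides a high-dimensional vector extension, pase, for efficient high-precision vector search, for example face recognition.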

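pase also has a publicly published paper. Based on that paper, open-source developers outside China built an extension called vector, which supports the ivfflat index algorithm.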

vector supports three vector distance measures: L2 distance, inner product, and cosine distance.

Usage

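```
CREATE EXTENSION vector;
-- "table" and "column" are reserved words, so use real identifiers
CREATE TABLE items (embedding vector(3));
INSERT INTO items VALUES ('[1,2,3]'), ('[4,5,6]');
-- create the ivfflat index only after the table has data
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops);
SELECT * FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
```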

```
L2 distance (<->)
inner product (<#>)
cosine distance (<=>)
Note: <#> returns the negative inner product
since Postgres only supports ASC order index scans on operators
```
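The three operators can be mirrored in plain Python to show what each one ranks by. This is a sketch based on the definitions above, not the extension's C code. It assumes `<->` is Euclidean distance, `<#>` is the negated inner product, and `<=>` is 1 minus cosine similarity. The function names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: what <-> orders by."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Inner product with the sign flipped, mirroring <#>, so an ascending
    sort (the only direction an index scan supports) puts the largest
    inner product first."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, mirroring <=>."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    rows = [[1, 2, 3], [4, 5, 6]]
    query = [1, 2, 3]
    for op, fn in [("<->", l2_distance),
                   ("<#>", negative_inner_product),
                   ("<=>", cosine_distance)]:
        # ORDER BY <op> ASC, as in the SELECT above
        ranked = sorted(rows, key=lambda r: fn(r, query))
        print(op, ranked)
```

With the two rows from the usage example, `<->` and `<=>` rank `[1,2,3]` first. `<#>` ranks `[4,5,6]` first because its inner product with the query (32) is larger than 14. This is why each distance function needs its own index operator class.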

Speed up queries with an approximate index. Add an index for each distance function you want to use.

L2 distance:

```
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops);
```

Inner product:

```
CREATE INDEX ON items USING ivfflat (embedding vector_ip_ops);
```

Cosine distance:

```
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops);
```

Only build the index once the table already holds a reasonable amount of data; otherwise search performance will be poor.

Indexes should be created after the table has data for optimal clustering.
Also, unlike typical indexes which only affect performance, you may see different results for queries after adding an approximate index.

Index Options
`lists` is the number of centroids, i.e. buckets. For background, see my earlier explanation of how PASE works, or the Alibaba Cloud RDS PG pase documentation.
Specify the number of inverted lists (100 by default)

```
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
```
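Picking `lists` is a trade-off. More lists make each bucket smaller, but a query then needs more probes to reach the same recall. Later versions of the pgvector README suggest some starting points: about rows / 1000 lists up to 1M rows, about sqrt(rows) beyond that, and about sqrt(lists) probes. Treat these numbers as an assumption to tune against your own recall measurements, not a rule. The helper names below are hypothetical.

```python
import math


def suggested_lists(rows: int) -> int:
    """Starting point for the ivfflat `lists` option (heuristic only)."""
    if rows <= 1_000_000:
        return max(1, rows // 1000)   # roughly rows / 1000 up to 1M rows
    return math.isqrt(rows)           # roughly sqrt(rows) beyond 1M rows


def suggested_probes(lists: int) -> int:
    """Starting point for ivfflat.probes: roughly sqrt(lists)."""
    return max(1, math.isqrt(lists))


if __name__ == "__main__":
    for rows in (50_000, 1_000_000, 10_000_000):
        lists = suggested_lists(rows)
        print(f"rows={rows} lists={lists} probes={suggested_probes(lists)}")
```

For example, 1,000,000 rows gives `lists = 1000` and `probes = 31`, while 10,000,000 rows gives `lists = 3162` and `probes = 56`.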

Query Options
Specify the number of probes (1 by default)
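At search time, `probes` sets how many of the nearest centroids are searched, together with the points in their buckets. More probes give higher accuracy but lower performance.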

```
SET ivfflat.probes = 10;  -- default is 1; raise it to trade speed for recall
```

A higher value improves recall at the cost of speed.

Use SET LOCAL inside a transaction to set it for a single query

```
BEGIN;
SET LOCAL ivfflat.probes = 10;
SELECT ...;
COMMIT;
```
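The lists/probes mechanism can be seen in a toy, pure-Python model of the IVFFlat idea described above. This is not pgvector's implementation. Assumptions: Lloyd's k-means runs over all rows (the real index builds its centers from the table's data at build time), distances are plain Euclidean, and `ToyIVFFlat`, `kmeans` and `exact_search` are hypothetical names. The model shows why the index should be built after data is loaded (the centers come from the data) and why an approximate index can return different rows than an exact scan.

```python
import math
import random


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centers (one per list)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the previous center if a list ends up empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers


class ToyIVFFlat:
    def __init__(self, points, lists, seed=0):
        self.centers = kmeans(points, lists, seed=seed)
        self.buckets = [[] for _ in range(lists)]
        for p in points:  # each point goes to the list of its nearest center
            i = min(range(lists), key=lambda j: math.dist(p, self.centers[j]))
            self.buckets[i].append(p)

    def search(self, query, k, probes=1):
        # visit only the `probes` lists whose centers are closest to the query
        order = sorted(range(len(self.centers)),
                       key=lambda j: math.dist(query, self.centers[j]))
        candidates = [p for j in order[:probes] for p in self.buckets[j]]
        return sorted(candidates, key=lambda p: math.dist(query, p))[:k]


def exact_search(points, query, k):
    """Brute-force scan, the equivalent of running without the index."""
    return sorted(points, key=lambda p: math.dist(query, p))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(2000)]
    index = ToyIVFFlat(data, lists=20)
    q = [rng.random() for _ in range(8)]
    truth = exact_search(data, q, 10)
    for probes in (1, 5, 20):
        found = index.search(q, 10, probes)
        recall = sum(p in truth for p in found) / len(truth)
        print(f"probes={probes:2d} recall={recall:.2f}")
```

When `probes` equals `lists`, every bucket is scanned and the result matches the exact search. With `probes = 1`, a query near a bucket boundary can miss true neighbours, or return fewer than k rows if its bucket is small.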

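[《PostgreSQL design for resource search - pase, smlar, pg_trgm - tag + weight similarity ranking - ranking by tag hit rate》](../202009/20200930_01.md)
[《PostgreSQL vector similarity recommendation design - pase》](../202004/20200424_01.md)
[《Recommendation systems (similarity recommendation) for social, e-commerce, gaming, etc. - comparison of Alibaba Cloud pase and smlar index solutions》](../202004/20200421_01.md)
[《PostgreSQL: Alibaba Cloud RDS PG releases high-dimensional vector index supporting image and face recognition - pase extension》](../201912/20191219_02.md)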



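#### [PostgreSQL wish list](https://github.com/digoal/blog/issues/76 "269ac3d1c492e938c0191101c7238216")
Your wishes will be passed on to PG kernel hackers, database vendors and others to help improve database quality and features; the next PG release might even include the feature you proposed. Especially good suggestions are rewarded with limited-edition PG T-shirts, souvenirs, stickers, popular PG books and more. Come and make a wish. [Happy?](https://github.com/digoal/blog/issues/76 "269ac3d1c492e938c0191101c7238216").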


#### [Get a 3-month Alibaba Cloud RDS PostgreSQL instance for 9.9 CNY](https://www.aliyun.com/database/postgresqlactivity "57258f76c37864c6e6d23383d05714ea")


#### [PostgreSQL solutions collection](https://yq.aliyun.com/topic/118 "40cff096e9ed7122c512b35d8561d9c8")


#### [德哥 / digoal's github - public welfare is a lifelong endeavor.](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")


![digoal's wechat](../pic/digoal_weixin.jpg "f7ad92eeba24523fd47a6e1a0e691b59")

50 changes: 50 additions & 0 deletions 202105/20210514_04.md
@@ -0,0 +1,50 @@
## PostgreSQL - why like 'xxx%' or ~ '^xxx' doesn't use an index, and how to index it correctly

### Author
digoal

### Date
2021-05-14

### Tags
PostgreSQL , collate , pattern

----

## Background
https://www.postgresql.org/docs/current/indexes-opclass.html

```
select x from tbl where col like 'xxx%';
-- or, equivalently, with a left-anchored regular expression
select x from tbl where col ~ '^xxx';
```

Why can't these queries use the index on col even though one exists, and what is the fix? The documentation explains:

The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard “C” locale. As an example, you might index a varchar column like this:

```
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
```

Note that you should also create an index with the default operator class if you want queries involving ordinary `<`, `<=`, `>`, or `>=` comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes. (Ordinary equality comparisons can use these operator classes, however.) It is possible to create multiple indexes on the same column with different operator classes. If you do use the C locale, you do not need the xxx_pattern_ops operator classes, because an index with the default operator class is usable for pattern-matching queries in the C locale.

```
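-- alternative: give the indexed column the "C" collation
-- (distinct name so it does not clash with test_index above)
CREATE INDEX test_index_c ON test_table (col collate "C");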
```
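The underlying reason can be modelled in a few lines of Python. Under character-by-character (code-point) ordering, which is what `C` / `xxx_pattern_ops` give, every string starting with `abc` sits in one contiguous range `[abc, abd)`. A B-tree can read that range directly; PostgreSQL turns a left-anchored pattern into roughly this pair of range conditions. Under a linguistic collation, matching strings need not be contiguous, so no single range covers them. In this sketch, `locale_like_key` is a hypothetical simplification (it ignores hyphens and case at the first level), loosely modelled on how many locale collations treat punctuation. It is not a real collation. The real upper-bound computation in PostgreSQL is also more involved than `prefix_upper_bound`.

```python
import bisect


def prefix_upper_bound(prefix: str) -> str:
    """Smallest string above every string starting with `prefix` in
    code-point order: bump the last character by one."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    # Simplification: assumes the last character is not the maximum code point.
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def prefix_scan(sorted_values, prefix):
    """Emulate an index range scan: prefix <= v < prefix_upper_bound(prefix)."""
    lo = bisect.bisect_left(sorted_values, prefix)
    hi = bisect.bisect_left(sorted_values, prefix_upper_bound(prefix))
    return sorted_values[lo:hi]


def matches_are_contiguous(sorted_values, prefix):
    """True if all values starting with `prefix` form one run in this order."""
    idx = [i for i, v in enumerate(sorted_values) if v.startswith(prefix)]
    return not idx or idx == list(range(idx[0], idx[-1] + 1))


def locale_like_key(s: str):
    """Hypothetical linguistic ordering: compare with hyphens removed and
    case folded first, then break ties on the raw string."""
    return (s.replace("-", "").lower(), s)


if __name__ == "__main__":
    words = ["abca", "ab-cb", "abcc", "abd", "b"]
    c_order = sorted(words)  # Python str order = code-point order
    print(c_order, prefix_scan(c_order, "abc"))
    locale_order = sorted(words, key=locale_like_key)
    print(locale_order, matches_are_contiguous(locale_order, "abc"))
```

In code-point order, `ab-cb` sorts before `abca` because `-` comes before `c`, so `abca` and `abcc` are adjacent and one range scan finds both. In the hypothetical locale order, `ab-cb` lands between them, which is why a default-collation index cannot serve the prefix match.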


54 changes: 54 additions & 0 deletions 202105/20210514_05.md
@@ -0,0 +1,54 @@
## Is there a market for database-tool SaaS that replaces the DBA?

### Author
digoal

### Date
2021-05-14

### Tags
PostgreSQL , SaaS

----

## Background
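1. Douyin (抖音) and some short-video-creation tool SaaS
Consumer-facing; target users: middle-aged and older people; target need: nostalgia and the fun of making things themselves.
For example, applying effects to a few photos or self-recorded videos (revolutionary-song soundtracks, animated overlays, etc.).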

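2. 图怪兽 (Tuguaishou) SaaS, 创客贴 (Chuangketie) SaaS
Consumer-facing; target users: office workers; target need: features used only occasionally, where hiring a professional isn't cost-effective, or skilled people are scarce and expensive.
For example, making posters.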

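3. Is there a market for database-tool SaaS that replaces the DBA?
Is the database market big? Yes. Which company doesn't use a database?
Can every company afford a DBA, and does every company need one? On the user side, only large internet companies need professional DBAs; small companies don't.
Are DBAs expensive? Yes.
Are good DBAs hard to find? Increasingly so.
Can developers double as DBAs? Yes, but they need tools (solved by automation).
Who handles the problems the tools can't? Remote DBAs (solved by people).
Bought by the organization and shared by developers, or by developers and ops together, it serves several purposes at once.
Similar products have already appeared outside China, covering automatic SQL execution-plan analysis, SQL optimization, SQL formatting, automatic SQL generation, data dictionaries, monitoring, and parameter tuning: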
- https://pgdash.io/pricing
- https://pganalyze.com/
- Pocket tools: https://c.runoob.com/
- Pocket map (a bird's-eye view of PG dynamic statistics views): https://pgstats.dev/
- Automatic parameter tuning: https://github.com/timescale/timescaledb-tune




3 changes: 3 additions & 0 deletions 202105/readme.md
@@ -2,6 +2,9 @@

### Article list
----
##### 20210514_05.md [《Is there a market for database-tool SaaS that replaces the DBA?》](20210514_05.md)
##### 20210514_04.md [《PostgreSQL - why like 'xxx%' or ~ '^xxx' doesn't use an index, and how to index it correctly》](20210514_04.md)
##### 20210514_03.md [《PostgreSQL open-source high-dimensional vector similarity search extension vector - related: Alibaba Cloud RDS PG pase, cube, face recognition》](20210514_03.md)
##### 20210514_02.md [《PGSync - syncing PostgreSQL to ElasticSearch via logical subscription》](20210514_02.md)
##### 20210514_01.md [《PostgreSQL time-series extension timescaledb 2.2.1 implements index skip scan via the custom plan provider interface, speeding up distinct, last_value, first_value and other sparse-value searches on large tables, up to 10,000x+ faster》](20210514_01.md)
##### 20210513_02.md [《PostgreSQL 14 release notes: new features explained》](20210513_02.md)
3 changes: 3 additions & 0 deletions README.md
@@ -83,6 +83,9 @@ digoal's|PostgreSQL|articles|categories

### All documents
----
##### 202105/20210514_05.md [《Is there a market for database-tool SaaS that replaces the DBA?》](202105/20210514_05.md)
##### 202105/20210514_04.md [《PostgreSQL - why like 'xxx%' or ~ '^xxx' doesn't use an index, and how to index it correctly》](202105/20210514_04.md)
##### 202105/20210514_03.md [《PostgreSQL open-source high-dimensional vector similarity search extension vector - related: Alibaba Cloud RDS PG pase, cube, face recognition》](202105/20210514_03.md)
##### 202105/20210514_02.md [《PGSync - syncing PostgreSQL to ElasticSearch via logical subscription》](202105/20210514_02.md)
##### 202105/20210514_01.md [《PostgreSQL time-series extension timescaledb 2.2.1 implements index skip scan via the custom plan provider interface, speeding up distinct, last_value, first_value and other sparse-value searches on large tables, up to 10,000x+ faster》](202105/20210514_01.md)
##### 202105/20210513_02.md [《PostgreSQL 14 release notes: new features explained》](202105/20210513_02.md)
