First, here is the result returned without any custom dictionary entries. Note the "感恩" token in the output.
    psql (13.4)
    Type "help" for help.

    marketbet_crawler_development=# select * from zhparser.zhprs_custom_word;
     word | tf | idf | attr
    ------+----+-----+------
    (0 rows)

    marketbet_crawler_development=# SELECT ts_parse('zhparser','金市周评:FED加息预期升温且国际贸 易局势缓和,感恩节前金价回落');
      ts_parse
    ------------
     (110,金市)
     (110,市)
     (110,周评)
     (118,评)
     (117,:)
     (101,FED)
     (118,加息)
     (118,加)
     (110,息)
     (118,预期)
     (118,升温)
     (118,升)
     (118,温)
     (99,且)
     (110,国际)
     (110,国)
     (110,际)
     (110,贸)
     (97,易)
     (110,局势)
     (110,局)
     (110,势)
     (118,缓和)
     (118,缓)
     (117,,)
     (118,感恩)
     (110,恩)
     (116,节前)
     (110,节)
     (110,金价)
     (110,价)
     (118,回落)
     (118,回)
     (118,落)
    (34 rows)
Next, add "感恩节" to the custom dictionary (custom dictionary 2.1).
    marketbet_crawler_development=# INSERT INTO zhparser.zhprs_custom_word values('感恩节') ON CONFLICT DO NOTHING;
    INSERT 0 1
    marketbet_crawler_development=# select * from zhparser.zhprs_custom_word;
      word  | tf | idf | attr
    --------+----+-----+------
     感恩节 |  1 |   1 | @
    (1 row)
To make sure the change takes effect, quit psql, reconnect, and run `sync_zhprs_custom_word();`. You can see that "感恩节" is still in the table.
    marketbet_crawler_development=# SELECT sync_zhprs_custom_word();
     sync_zhprs_custom_word
    ------------------------

    (1 row)

    marketbet_crawler_development=# select * from zhparser.zhprs_custom_word;
      word  | tf | idf | attr
    --------+----+-----+------
     感恩节 |  1 |   1 | @
    (1 row)
Then run the query again, and here is the problem: "感恩节" does not appear as a token. In fact, the two results are identical, as if the word had never been added.
    marketbet_crawler_development=# SELECT ts_parse('zhparser','金市周评:FED加息预期升温且国际贸 易局势缓和,感恩节前金价回落');
      ts_parse
    ------------
     (110,金市)
     (110,市)
     (110,周评)
     (118,评)
     (117,:)
     (101,FED)
     (118,加息)
     (118,加)
     (110,息)
     (118,预期)
     (118,升温)
     (118,升)
     (118,温)
     (99,且)
     (110,国际)
     (110,国)
     (110,际)
     (110,贸)
     (97,易)
     (110,局势)
     (110,局)
     (110,势)
     (118,缓和)
     (118,缓)
     (117,,)
     (118,感恩)
     (110,恩)
     (116,节前)
     (110,节)
     (110,金价)
     (110,价)
     (118,回落)
     (118,回)
     (118,落)
    (34 rows)
     (116,感恩节)
     (118,是)
     (114,什么)
    (3 rows)

     (118,感恩)
     (116,节前)
     (110,金价)
     (118,回落)
    (4 rows)

    (1 row)

    postgres=# \q
    [lzzhang@lzzhang bin]$ ./psql -d postgres
    psql (15.0)
    Type "help" for help.

     (120,感恩节前)
     (110,金价)
     (118,回落)
    (3 rows)

     (118,感恩)
     (116,节假日)
     (118,来临)
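This seems fairly hard to handle.
You could try filing an issue with scws, but that project has not been maintained for a long time, so it may not be addressed.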
Personally, I think it is better to handle long words such as 感恩节前 in the application (business) layer.
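One possible reading of "handle it in the application layer", as a minimal sketch only: after fetching tokens from `ts_parse`, split any token that begins with a known custom word into that word plus the remainder. The function name `split_long_tokens` and the idea of keeping a copy of the custom words in the application are assumptions for illustration; neither is part of zhparser or scws.

```python
def split_long_tokens(tokens, custom_words):
    """Split tokens that start with a custom word into [word, remainder].

    Hypothetical helper for illustration; not a zhparser or scws API.
    """
    # Ignore empty entries and try longer custom words first,
    # so the longest matching prefix wins.
    ordered = sorted((w for w in custom_words if w), key=len, reverse=True)
    result = []
    for token in tokens:
        # A token that already is a custom word is kept unchanged.
        if token in custom_words:
            result.append(token)
            continue
        match = next((w for w in ordered if token.startswith(w)), None)
        if match is None:
            result.append(token)
        else:
            result.extend([match, token[len(match):]])
    return result


# Tokens as they appear in one of the ts_parse outputs in this thread.
tokens = ["感恩节前", "金价", "回落"]
print(split_long_tokens(tokens, {"感恩节"}))
```

With this approach the tokenizer output stays untouched and the application decides which long words to break apart, at the cost of maintaining the word list in two places.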