Commit

export jieba, fix prob files and remove deps of absl-py
Hai Liang Wang committed Sep 24, 2020
1 parent ea92e8f commit caf046f
Showing 12 changed files with 336 additions and 314 deletions.
2 changes: 1 addition & 1 deletion Requirements.txt
@@ -1 +1 @@
-synonyms>=3.12
+synonyms>=3.13
67 changes: 33 additions & 34 deletions VALUATION.md
@@ -1,34 +1,33 @@
-# synonyms score evaluation [(v3.12.0)](https://pypi.python.org/pypi/synonyms/3.12.0)
-
-| Word 1 | Word 2 | synonyms | Human rating |
-| ------ | -------- | -------- | -------- |
-| 轿车 | 汽车 | 0.892 | 0.98 |
-| 宝石 | 宝物 | 1.0 | 0.96 |
-| 旅游 | 游历 | 0.649 | 0.96 |
-| 男孩子 | 小伙子 | 0.77 | 0.94 |
-| 海岸 | 海滨 | 0.889 | 0.925 |
-| 庇护所 | 精神病院 | 0.211 | 0.9025 |
-| 魔术师 | 巫师 | 0.95 | 0.875 |
-| 中午 | 正午 | 0.9 | 0.855 |
-| 火炉 | 炉灶 | 0.889 | 0.7775 |
-| 食物 | 水果 | 0.363 | 0.77 |
-| 鸟 | 公鸡 | 0.895 | 0.7625 |
-| 鸟 | 鹤 | 1.0 | 0.7425 |
-| 工具 | 器械 | 0.881 | 0.7375 |
-| 兄弟 | 和尚 | 0.139 | 0.705 |
-| 起重机 | 器械 | 0.195 | 0.42 |
-| 小伙子 | 兄弟 | 0.703 | 0.415 |
-| 旅行 | 轿车 | 0.088 | 0.29 |
-| 和尚 | 圣贤 | 0.222 | 0.275 |
-| 墓地 | 林地 | 0.874 | 0.2375 |
-| 食物 | 公鸡 | 0.151 | 0.2225 |
-| 海岸 | 丘陵 | 0.248 | 0.2175 |
-| 森林 | 墓地 | 0.14 | 0.21 |
-| 岸边 | 林地 | 0.193 | 0.1575 |
-| 和尚 | 奴隶 | 0.059 | 0.1375 |
-| 海岸 | 森林 | 0.23 | 0.105 |
-| 小伙子 | 巫师 | 0.182 | 0.105 |
-| 琴弦 | 微笑 | 0.089 | 0.0325 |
-| 玻璃 | 魔术师 | 0.02 | 0.0275 |
-| 中午 | 绳子 | 0.049 | 0.02 |
-| 公鸡 | 航行 | 0.0 | 0.02 |
+# synonyms score evaluation [(v3.13.0)](https://pypi.python.org/pypi/synonyms/3.13.0)
+| Word 1 | Word 2 | synonyms | Human rating |
+| --- | --- | --- | --- |
+| 轿车 | 汽车 | 0.892 | 0.98 |
+| 宝石 | 宝物 | 1.0 | 0.96 |
+| 旅游 | 游历 | 0.649 | 0.96 |
+| 男孩子 | 小伙子 | 0.77 | 0.94 |
+| 海岸 | 海滨 | 0.889 | 0.925 |
+| 庇护所 | 精神病院 | 0.211 | 0.9025 |
+| 魔术师 | 巫师 | 0.95 | 0.875 |
+| 中午 | 正午 | 0.9 | 0.855 |
+| 火炉 | 炉灶 | 0.889 | 0.7775 |
+| 食物 | 水果 | 0.363 | 0.77 |
+| 鸟 | 公鸡 | 0.895 | 0.7625 |
+| 鸟 | 鹤 | 1.0 | 0.7425 |
+| 工具 | 器械 | 0.881 | 0.7375 |
+| 兄弟 | 和尚 | 0.139 | 0.705 |
+| 起重机 | 器械 | 0.195 | 0.42 |
+| 小伙子 | 兄弟 | 0.703 | 0.415 |
+| 旅行 | 轿车 | 0.088 | 0.29 |
+| 和尚 | 圣贤 | 0.222 | 0.275 |
+| 墓地 | 林地 | 0.874 | 0.2375 |
+| 食物 | 公鸡 | 0.151 | 0.2225 |
+| 海岸 | 丘陵 | 0.248 | 0.2175 |
+| 森林 | 墓地 | 0.14 | 0.21 |
+| 岸边 | 林地 | 0.193 | 0.1575 |
+| 和尚 | 奴隶 | 0.059 | 0.1375 |
+| 海岸 | 森林 | 0.23 | 0.105 |
+| 小伙子 | 巫师 | 0.182 | 0.105 |
+| 琴弦 | 微笑 | 0.089 | 0.0325 |
+| 玻璃 | 魔术师 | 0.02 | 0.0275 |
+| 中午 | 绳子 | 0.049 | 0.02 |
+| 公鸡 | 航行 | 0.0 | 0.02 |
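Review note: VALUATION.md lists the scores side by side but does not say how closely they agree. One way to summarize agreement is the Pearson correlation between the `synonyms` column and the human-rating column. The sketch below uses only the number pairs copied from the v3.13.0 table above; the names `PAIRS` and `pearson` are illustrative and do not come from the repository, and the choice of Pearson over a rank correlation is an assumption.

```python
from math import sqrt

# (synonyms score, human rating) pairs copied from the v3.13.0 table,
# in row order. The words themselves are not needed for the correlation.
PAIRS = [
    (0.892, 0.98), (1.0, 0.96), (0.649, 0.96), (0.77, 0.94), (0.889, 0.925),
    (0.211, 0.9025), (0.95, 0.875), (0.9, 0.855), (0.889, 0.7775), (0.363, 0.77),
    (0.895, 0.7625), (1.0, 0.7425), (0.881, 0.7375), (0.139, 0.705), (0.195, 0.42),
    (0.703, 0.415), (0.088, 0.29), (0.222, 0.275), (0.874, 0.2375), (0.151, 0.2225),
    (0.248, 0.2175), (0.14, 0.21), (0.193, 0.1575), (0.059, 0.1375), (0.23, 0.105),
    (0.182, 0.105), (0.089, 0.0325), (0.02, 0.0275), (0.049, 0.02), (0.0, 0.02),
]


def pearson(pairs):
    """Return the Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print("pairs: %d, Pearson r: %.3f" % (len(PAIRS), pearson(PAIRS)))
```

Running the same function on the scores of a later release would give a single number to compare across versions, which the table alone does not provide.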
16 changes: 11 additions & 5 deletions demo.py
@@ -33,10 +33,6 @@
 # raise "Must be using Python 3"
 #

-from absl import flags
-from absl import logging
-
-FLAGS = flags.FLAGS
 import synonyms # https://github.com/huyingxi/Synonyms
 import numpy
 import unittest
@@ -144,10 +140,20 @@ def test_basecase_2(self):
         sen2 = "今天天气怎么样"
         r = synonyms.compare(sen1, sen2, seg=True)

+
+    def test_analyse_extract_tags(self):
+        '''
+        Get keywords using the Tag approach
+        https://github.com/fxsjy/jieba/tree/v0.39
+        '''
+        from synonyms.jieba import analyse
+        sentence = "华为芯片被断供,源于美国关于华为的修订版禁令生效——9月15日以来,台积电、高通、三星等华为的重要合作伙伴,只要没有美国的相关许可证,都无法供应芯片给华为,而中芯国际等国产芯片企业,也因采用美国技术,而无法供货给华为。目前华为部分型号的手机产品出现货少的现象,若该形势持续下去,华为手机业务将遭受重创。"
+        keywords = analyse.extract_tags(sentence, topK=5, withWeight=False, allowPOS=())
+        print("[test_analyse_extract_tags] keywords %s" % keywords)
+
 def test():
     unittest.main()


 if __name__ == '__main__':
-    FLAGS([__file__, '--verbosity', '1'])
     test()
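Review note: the new test calls `analyse.extract_tags`, which jieba documents as TF-IDF keyword extraction. For readers unfamiliar with the idea, here is a minimal sketch of TF-IDF ranking over an already segmented sentence. It is not jieba's implementation: `extract_tags_sketch`, the token list, the IDF values, and `default_idf` are all hypothetical stand-ins for the segmenter output and the bundled `idf.txt`.

```python
from collections import Counter


def extract_tags_sketch(tokens, idf, default_idf, top_k=5):
    """Rank tokens by term frequency times inverse document frequency.

    tokens      -- words of one sentence, already segmented (assumed input)
    idf         -- mapping word -> IDF weight (hypothetical values here)
    default_idf -- weight used for words missing from the IDF table
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    scores = {
        word: (count / total) * idf.get(word, default_idf)
        for word, count in counts.items()
    }
    # Highest score first; ties are broken by the word itself for stable output.
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hand-segmented toy input; a real run would take tokens from the segmenter.
    tokens = ["华为", "芯片", "华为", "禁令", "的", "芯片", "华为", "的"]
    idf = {"华为": 8.0, "芯片": 7.0, "禁令": 9.0, "的": 0.1}
    print(extract_tags_sketch(tokens, idf, default_idf=5.0, top_k=3))
```

In this toy input the frequent function word "的" ranks last because its assumed IDF weight is small, which is the effect the IDF table is meant to have.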
10 changes: 10 additions & 0 deletions scripts/publish.sh
@@ -13,4 +13,14 @@ export PATH=/opt/miniconda3/envs/venv-py3/bin:$PATH
 # main
 [ -z "${BASH_SOURCE[0]}" -o "${BASH_SOURCE[0]}" = "$0" ] || return
 cd $baseDir/..
+
+if [ ! -d tmp ]; then
+    mkdir tmp
+fi
+
+if [ -f synonyms/data/words.vector.gz ]; then
+    mv synonyms/data/words.vector.gz tmp
+fi
+
 python setup.py sdist upload -r pypi
+mv tmp/words.vector.gz synonyms/data/words.vector.gz
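Review note: in this hunk the first `mv` is guarded by a file check, but the final `mv` that restores `words.vector.gz` is not, so it runs even when nothing was moved aside. The stash-and-restore pattern can be sketched in Python with a context manager that restores the file only if it was actually moved, and does so even when the build step raises. This is an illustration of the pattern, not a proposed replacement for the script; `stashed` and the temporary paths are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def stashed(path, stash_dir):
    """Temporarily move `path` into `stash_dir` and move it back on exit."""
    path, stash_dir = Path(path), Path(stash_dir)
    moved = None
    if path.is_file():
        stash_dir.mkdir(parents=True, exist_ok=True)
        moved = stash_dir / path.name
        shutil.move(str(path), str(moved))
    try:
        yield moved
    finally:
        # Restore only if the file was moved, including after an exception.
        if moved is not None:
            shutil.move(str(moved), str(path))


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than the real data file.
    with tempfile.TemporaryDirectory() as root:
        data = Path(root) / "words.vector.gz"
        data.write_bytes(b"placeholder")
        with stashed(data, Path(root) / "tmp"):
            print("during build, file present:", data.exists())
        print("after build, file present:", data.exists())
```

The same guard could be expressed in the shell script by repeating the `-f` test before the final `mv`; the Python form is shown only because it makes the restore-on-failure behavior explicit.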
2 changes: 1 addition & 1 deletion scripts/test.sh
@@ -15,7 +15,7 @@ export PATH=/opt/miniconda3/envs/venv-py3/bin:$PATH
 cd $baseDir/..
 if [ -f .env ]; then
     echo "load env with" `pwd`"/.env"
-    #source .env
+    source .env
 fi

 python demo.py
7 changes: 4 additions & 3 deletions setup.py
@@ -12,7 +12,7 @@

 setup(
     name='synonyms',
-    version='3.12.0',
+    version='3.13.0',
     description='中文近义词:聊天机器人,智能问答工具包;Chinese Synonyms for Natural Language Processing and Understanding',
     long_description=LONGDOC,
     author='Hai Liang Wang, Hu Ying Xi',
@@ -41,11 +41,12 @@
         'six>=1.11.0',
         'numpy>=1.13.1',
         'scipy>=1.0.0',
-        'scikit-learn>=0.19.1',
-        'absl-py>=0.4'
+        'scikit-learn>=0.19.1'
     ],
     package_data={
         'synonyms': [
+            '**/**/idf.txt',
+            '**/**/*.p',
             '**/*.gz',
             '**/*.txt',
             'LICENSE']})
1 change: 1 addition & 0 deletions synonyms/__init__.py
@@ -1,4 +1,5 @@
 __all__ = ["seg",
+           "jieba",
            "nearby",
            "compare",
            "display",