Plugins (Archived)
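Plugins let you add any extra meta information at indexing time. For example, suppose we have some news articles, and when indexing them we want to use existing machine-learning models to add "sentiment" or "topic" information. The plugin feature makes it easy to add such models to soco-search-worker, and plugins are highly flexible and customizable. For more about the plugin API, see the Plugin API page.
This section explains how to implement a sentiment-analysis plugin. Below are the source fields and the target fields produced by the plugin: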
Input format (source)
{
"article": "Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan...",
"date": "2020-08-31T16:03:28",
"mode": 2
}
Index format (target)
{
"content": "Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan...",
"sentiment": "positive",
"topic": "politics"
}
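The target stores a label such as "positive", whereas the plugin_method in the steps below returns the average of stanza's per-sentence scores, where stanza labels each sentence 0 (negative), 1 (neutral) or 2 (positive). One way to bridge the two is to round the average to the nearest class. This is a minimal sketch; the helper name `score_to_label` and the rounding rule are assumptions, not part of soco-search-worker:

```python
from typing import Optional

# Stanza's sentence-level sentiment classes.
LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def score_to_label(avg_score: Optional[float]) -> Optional[str]:
    """Map an averaged stanza sentiment score (0.0 to 2.0) to a label.

    Hypothetical helper: rounds to the nearest class (Python's round()
    sends exact halves to the even class). Returns None when there was
    no score, e.g. for empty text.
    """
    if avg_score is None:
        return None
    # Clamp into [0, 2] so out-of-range values still map to a valid class.
    clamped = min(max(avg_score, 0.0), 2.0)
    return LABELS[int(round(clamped))]


print(score_to_label(2.0))  # positive
print(score_to_label(1.4))  # neutral
print(score_to_label(0.2))  # negative
```

Whether to store the raw average, the label, or both is a design choice; the average keeps more information, while the label matches the "keyword" field type used in the mapping below.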
- Define the plugin class using plugin_helper
from worker.plugins.plugin_helper import PlugInHelper
class SentimentAnalysis(PlugInHelper):
    pass
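- Implement plugin_method, the abstract method that does the actual work
from worker.plugins.plugin_helper import PlugInHelper
import stanza
class SentimentAnalysis(PlugInHelper):
    def plugin_method(self, batch: list) -> list:
        outputs = []
        pipelines = {}  # build one stanza pipeline per language, not one per document
        for x in batch:
            text = x["value"]
            configs = x["configs"]
            lang = configs.get("lang", "en")
            if lang not in pipelines:
                pipelines[lang] = stanza.Pipeline(lang=lang, processors='tokenize,sentiment', dir=self.rsc_dir)
            doc = pipelines[lang](text)
            # stanza scores each sentence 0 (negative), 1 (neutral) or 2 (positive)
            scores = [sentence.sentiment for sentence in doc.sentences]
            # average over sentences; None for empty text instead of dividing by zero
            outputs.append(float(sum(scores)) / len(scores) if scores else None)
        return outputs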
- Register the new plugin in plugin_manager.py
from worker.plugins.sentiment_analysis import SentimentAnalysis  # 1. import the class you defined
class PluginManager():
    sentiment_analysis = SentimentAnalysis()  # 2. instantiate the plugin
    @classmethod
    def run_plugins(cls, docs, params):
        # 3. call the plugin's 'process' method
        docs = cls.sentiment_analysis.process(docs=docs, params=params, plugin_name='sentiment_analysis')
        return docs
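The three steps above rely on PlugInHelper, which is not reproduced on this page. Assuming it is an abstract base class that forces subclasses to implement plugin_method, and that plugin_method must return one output per batch item (as both examples on this page do), the pattern can be sketched with self-contained stand-ins. `PlugInHelperSketch`, `run_batch`, `TextLength` and `PluginManagerSketch` are hypothetical names, not the project's API:

```python
from abc import ABC, abstractmethod


class PlugInHelperSketch(ABC):
    """Hypothetical stand-in for worker.plugins.plugin_helper.PlugInHelper."""

    @abstractmethod
    def plugin_method(self, batch: list) -> list:
        """Return exactly one output per item in batch."""

    def run_batch(self, batch: list) -> list:
        # Hypothetical helper: check the one-output-per-input contract.
        outputs = self.plugin_method(batch)
        if len(outputs) != len(batch):
            raise ValueError("plugin_method must return one output per input item")
        return outputs


class TextLength(PlugInHelperSketch):
    """Toy plugin: the 'score' of each value is its character count."""

    def plugin_method(self, batch: list) -> list:
        return [len(x["value"]) for x in batch]


class PluginManagerSketch:
    # Like PluginManager: one shared plugin instance per class attribute.
    text_length = TextLength()


batch = [
    {"value": "hello", "configs": {"plugin": "text_length"}},
    {"value": "hi", "configs": {"plugin": "text_length"}},
]
print(PluginManagerSketch.text_length.run_batch(batch))  # [5, 2]
```

Because plugin_method is abstract, a subclass that forgets to implement it fails with TypeError when it is instantiated, which in this pattern happens as soon as the manager class is defined rather than in the middle of indexing.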
In configs, you only need to set the key to plugin and the value to the name of the plugin you defined (e.g. {"key":"plugin", "value":"sentiment_analysis"}).
Below is an example mapping that uses the plugin.
{
"mappings": {
"title": [
{
"targetFieldName": "title",
"type": "keyword"
}
],
"text": [
{
"targetFieldName": "text",
"type": "keyword"
},
{
"targetFieldName":"sentiment",
"type":"keyword",
"configs":[
{"key":"plugin", "value":"sentiment_analysis"},
{"key":"lang","value":"en"}]
}
]
},
"index_id":"plugin_test"
}
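Each entry in configs is a key/value pair. As the comment in the Recommendation example on this page states, the plugin receives these pairs merged into a single dict, e.g. {"plugin": "sentiment_analysis", "lang": "en"}. A minimal sketch of that conversion; the function name `configs_to_dict` is hypothetical:

```python
def configs_to_dict(configs: list) -> dict:
    """Merge [{"key": k, "value": v}, ...] into {k: v, ...}.

    Hypothetical helper; if a key repeats, the later entry wins.
    """
    return {item["key"]: item["value"] for item in configs}


mapping_configs = [
    {"key": "plugin", "value": "sentiment_analysis"},
    {"key": "lang", "value": "en"},
]
print(configs_to_dict(mapping_configs))
# {'plugin': 'sentiment_analysis', 'lang': 'en'}
```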
In this example, we add recommendation tags based on the 'text' data. Below are example source and target fields produced by the plugin:
Input format (source)
{
"text": "Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan..."
}
Index format (target)
{
"text": "Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan...",
"tag": {"yong_ppl": 1.1, "sports": 2.1, "football": 3.2}
}
The mapping can look like this:
{
"mappings": {
"text": [
{
"targetFieldName": "text",
"type": "keyword"
},
{
"targetFieldName":"tag",
"type":"term_score",
"configs":[
{"key":"plugin", "value":"recommendation"},
{"key":"lang","value":"en"}]
}
]
},
"index_id":"plugin_test_term_score"
}
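This means the source field 'text' is converted into two fields, 'text' and 'tag', where 'tag' is produced by the recommendation plugin with configs = {"lang": "en"}. For this example, we define a Recommendation class that inherits from PlugInHelper and implements its abstract method 'plugin_method':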
from worker.plugins.plugin_helper import PlugInHelper
class Recommendation(PlugInHelper):
    def plugin_method(self, batch: list) -> list:
        """
        :param batch: [{"value": "....", "configs": {...}}, ...]
        :return: list with one result per item in batch
        """
        outputs = []
        for x in batch:
            value = x["value"]  # value from the input
            configs = x["configs"]  # the configs you defined in the mapping
            # If the configs in the mapping are:
            # {"configs": [
            #     {"key": "plugin", "value": "recommendation"},
            #     {"key": "lang", "value": "en"}]
            # }
            # then 'configs' will be {"lang": "en", "plugin": "recommendation"}
            result = self._do_recommend(value, configs)
            outputs.append(result)
        return outputs
    def _do_recommend(self, value, configs):
        # Implement your recommendation logic here. To illustrate the
        # term_score type, the result is hardcoded.
        result = {'yong_ppl': 1.1, 'sports': 2.1, 'football': 3.2}
        return result
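_do_recommend is hardcoded above. As one illustrative way to fill it in (a simple keyword-lexicon technique, not the project's method), each tag can be scored by how many of its keywords occur in the text, which yields the same {term: score} shape that the term_score type stores. The lexicon `TAG_KEYWORDS` and the scoring rule are hypothetical:

```python
import re

# Hypothetical tag lexicon: tag -> keywords that vote for it.
TAG_KEYWORDS = {
    "politics": {"democrats", "republicans", "president", "senate"},
    "economy": {"trillion", "stimulus", "unemployment", "relief"},
    "sports": {"football", "match", "league"},
}


def do_recommend(value: str, configs: dict) -> dict:
    """Score each tag by the number of matching keyword occurrences.

    Tags with no matches are omitted. 'configs' mirrors the signature of
    _do_recommend (e.g. {"lang": "en", "plugin": "recommendation"}) but
    is unused in this English-only sketch.
    """
    tokens = re.findall(r"[a-z]+", value.lower())
    scores = {}
    for tag, keywords in TAG_KEYWORDS.items():
        hits = sum(1 for token in tokens if token in keywords)
        if hits:
            scores[tag] = float(hits)
    return scores


text = ("Democrats are racing to pass President Joe Biden's $1.9 trillion "
        "COVID-19 relief plan")
print(do_recommend(text, {"lang": "en"}))
# {'politics': 2.0, 'economy': 2.0}
```

A real plugin would more likely call a trained model here; the lexicon only shows the expected input and output shapes.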
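After implementing 'plugin_method', we can call the 'process' method, which Recommendation inherits from its parent class PlugInHelper. Include the following code in plugin_manager.py (with the plugin registered as a class attribute `recommendation = Recommendation()`, in the same way as `sentiment_analysis` above):
# Only process docs if the params include the 'recommendation' plugin.
if cls.is_plugin(params, "recommendation"):
    docs = cls.recommendation.process(
        docs=docs, params=params, plugin_name="recommendation"
    )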
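The snippet relies on cls.is_plugin, whose implementation is not shown. Assuming it checks the "_plugins" entry of the internal (machine-friendly) parameter format shown further down this page, it could look like the following. It is written as a plain function rather than a classmethod, and it is a hypothetical reconstruction, not the project's code:

```python
def is_plugin(params: dict, plugin_name: str) -> bool:
    """Return True if plugin_name is listed under mappings._plugins.

    Assumes the internal params layout; a missing "_plugins" entry is
    treated as "no plugins active".
    """
    plugins = params.get("mappings", {}).get("_plugins", {}).get("value", [])
    return plugin_name in plugins


params = {
    "index": "plugin_test_term_score",
    "mappings": {"_plugins": {"type": "meta", "value": ["recommendation"]}},
}
print(is_plugin(params, "recommendation"))      # True
print(is_plugin(params, "sentiment_analysis"))  # False
```

Checking first keeps plugins that are not used by an index from loading models or touching its documents.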
- Input data
docs = [{"text":"Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan, one that includes $1,400 stimulus checks, an extension of unemployment benefits and money for local and state governments. "}]
- The mapping is converted from the user-friendly format into the machine-friendly format. Input (user-friendly format):
{
"mappings": {
"text": [
{
"targetFieldName": "text",
"type": "keyword"
},
{
"targetFieldName":"tag",
"type":"term_score",
"configs":[
{"key":"plugin", "value":"recommendation"},
{"key":"lang","value":"en"}]
}
]
},
"index_id":"plugin_test_term_score"
}
Internal parameter format (machine-friendly format). The "_meta" entry holds each target field's configs; for the 'tag' field, this is exactly what plugin_method receives as configs:
{
"index": "plugin_test_term_score",
"mappings": {
"_uid": {
"type": "_uid"
},
"text": {
"type": "keyword"
},
"tag": {
"type": "term_score"
},
"_meta": {
"type": "meta",
"value": {
"tag": {
"plugin": "recommendation",
"lang": "en"
}
}
},
"_target": {
"type": "meta",
"value": {
"text": [
"text",
"tag"
]
}
},
"_doc_id": {
"type": "keyword"
},
"_plugins": {
"type": "meta",
"value": [
"recommendation"
]
}
}
}
- 'docs' is transformed according to the mapping: the source field 'text' produces two fields, 'text' and 'tag':
[{"text": "aaa"}] -> [{"text":"aaa", "tag":"aaa"}]
- plugin_manager passes the transformed docs and the params to the plugin:
docs = cls.recommendation.process(docs=docs, params=params, plugin_name='recommendation')
- The process method (plugin_helper.py) calls plugin_method internally.
- 'docs' is then transformed by plugin_helper:
[{"text":"aaa", "tag":"aaa"}] -> [{"text":"aaa", "tag":{'yong_ppl': 1.1, 'sports': 2.1, 'football': 3.2}}]
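To make the flow above concrete, here is a self-contained sketch of its three transformations: converting the user-friendly mapping into the internal params, copying each source value into its target fields, and a process method that sends the plugin's target fields through plugin_method in one batch and writes the results back. It is reconstructed from the examples on this page, not taken from plugin_helper.py; the names `to_internal_params`, `expand_docs`, `PluginSketch` and `RecommendationStub` are hypothetical, and details such as keeping unmapped source fields are assumptions.

```python
from abc import ABC, abstractmethod


def to_internal_params(user_mapping: dict) -> dict:
    """Hypothetical: user-friendly mapping -> machine-friendly params."""
    mappings = {"_uid": {"type": "_uid"}, "_doc_id": {"type": "keyword"}}
    meta, target, plugins = {}, {}, []
    for source, fields in user_mapping["mappings"].items():
        target[source] = []
        for field in fields:
            name = field["targetFieldName"]
            mappings[name] = {"type": field["type"]}
            target[source].append(name)
            # [{"key": k, "value": v}, ...] -> {k: v, ...}
            configs = {c["key"]: c["value"] for c in field.get("configs", [])}
            if configs:
                meta[name] = configs
                if "plugin" in configs and configs["plugin"] not in plugins:
                    plugins.append(configs["plugin"])
    mappings["_meta"] = {"type": "meta", "value": meta}
    mappings["_target"] = {"type": "meta", "value": target}
    mappings["_plugins"] = {"type": "meta", "value": plugins}
    return {"index": user_mapping["index_id"], "mappings": mappings}


def expand_docs(docs: list, params: dict) -> list:
    """Copy each source value into every target field listed in _target."""
    targets = params["mappings"]["_target"]["value"]
    expanded = []
    for doc in docs:
        new_doc = {}
        for source, value in doc.items():
            # Assumption: source fields without a mapping are kept as-is.
            for name in targets.get(source, [source]):
                new_doc[name] = value
        expanded.append(new_doc)
    return expanded


class PluginSketch(ABC):
    """Hypothetical stand-in for PlugInHelper."""

    @abstractmethod
    def plugin_method(self, batch: list) -> list:
        """Return one output per item in batch."""

    def process(self, docs: list, params: dict, plugin_name: str) -> list:
        meta = params["mappings"]["_meta"]["value"]
        # Target fields whose configs name this plugin.
        fields = [f for f, cfg in meta.items() if cfg.get("plugin") == plugin_name]
        for field in fields:
            rows = [doc for doc in docs if field in doc]
            batch = [{"value": doc[field], "configs": meta[field]} for doc in rows]
            for doc, result in zip(rows, self.plugin_method(batch)):
                doc[field] = result  # replace the copied text with the plugin output
        return docs


class RecommendationStub(PluginSketch):
    def plugin_method(self, batch: list) -> list:
        # Same hardcoded result as the Recommendation example.
        return [{"yong_ppl": 1.1, "sports": 2.1, "football": 3.2} for _ in batch]


user_mapping = {
    "mappings": {"text": [
        {"targetFieldName": "text", "type": "keyword"},
        {"targetFieldName": "tag", "type": "term_score",
         "configs": [{"key": "plugin", "value": "recommendation"},
                     {"key": "lang", "value": "en"}]},
    ]},
    "index_id": "plugin_test_term_score",
}
params = to_internal_params(user_mapping)
docs = expand_docs([{"text": "aaa"}], params)
docs = RecommendationStub().process(docs=docs, params=params, plugin_name="recommendation")
print(docs)
# [{'text': 'aaa', 'tag': {'yong_ppl': 1.1, 'sports': 2.1, 'football': 3.2}}]
```

Batching all values of a field into a single plugin_method call lets a model-backed plugin run one batched inference per field instead of one call per document.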
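To deploy the worker with the new plugin, rebuild and push the worker image, then start the services:
sudo docker build -f docker/Dockerfile.worker -t convmind/soco-search:0.1-worker .
sudo docker push convmind/soco-search:0.1-worker
sudo docker-compose -f docker/docker-compose.yml up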