
Plugins (Archived)

Yvette Han edited this page Jun 18, 2021 · 1 revision

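Plugins let you add extra meta information at indexing time. For example, suppose we have text data for a set of news articles; when indexing them, we want to use existing machine learning models to add "sentiment" or "topic" information. With plugins we can easily add these models to soco-search-worker, and the plugins themselves are highly flexible and customizable. To learn more about the plugin API, see the Plugin API page.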

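Examples

Sentiment analysis plugin

This section explains how to implement a sentiment analysis plugin. The source fields, and the target fields obtained after the plugin has processed them, are shown below:

Input format (source)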

{
  "article": "Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan...",
  "date": "2020-08-31T16:03:28",
  "mode": 2
}

Index format (target)

{
  "content": "Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan...",
  "sentiment": "positive",
  "topic": "politics"
}

Adding a plugin

  1. Define the plugin class using plugin_helper
from worker.plugins.plugin_helper import PlugInHelper

class SentimentAnalysis(PlugInHelper):
    pass
  1. Implement the abstract method plugin_method
from worker.plugins.plugin_helper import PlugInHelper
import stanza

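class SentimentAnalysis(PlugInHelper):
    def plugin_method(self, batch: list) -> list:
        outputs = []
        pipelines = {}  # one stanza pipeline per language, built once rather than per item
        for x in batch:
            text = x["value"]
            configs = x["configs"]
            lang = configs.get("lang", "en")
            if lang not in pipelines:
                pipelines[lang] = stanza.Pipeline(lang=lang, processors='tokenize,sentiment', dir=self.rsc_dir)
            doc = pipelines[lang](text)
            # stanza labels each sentence 0 (negative), 1 (neutral) or 2 (positive)
            scores = [sentence.sentiment for sentence in doc.sentences]
            # mean score over all sentences; None if the text contains no sentences
            outputs.append(float(sum(scores)) / len(scores) if scores else None)
        return outputs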
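
The plugin_method above returns the mean of stanza's per-sentence sentiment classes (0 negative, 1 neutral, 2 positive), whereas the target document shown earlier carries a label such as "positive". Below is a minimal sketch of one way to bridge the two. The helper names `mean_sentiment` and `score_to_label` and the cut-offs 0.5 and 1.5 are illustrative assumptions, not part of soco-search-worker.

```python
from typing import List, Optional


def mean_sentiment(sentence_scores: List[int]) -> Optional[float]:
    """Average per-sentence classes; None when there are no sentences."""
    if not sentence_scores:
        return None
    return sum(sentence_scores) / len(sentence_scores)


def score_to_label(score: Optional[float]) -> str:
    """Map an averaged score in [0, 2] to a label (thresholds are assumptions)."""
    if score is None:
        return "unknown"
    if score < 0.5:
        return "negative"
    if score <= 1.5:
        return "neutral"
    return "positive"


if __name__ == "__main__":
    # Three sentences classified as positive, positive, neutral.
    print(score_to_label(mean_sentiment([2, 2, 1])))
```

If you prefer to index the numeric score directly, you can skip the label step and keep the float.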
  1. Register the new plugin in plugin_manager.py
from worker.plugins.sentiment_analysis import SentimentAnalysis  # 1. import the class you defined


class PluginManager():
    sentiment_analysis = SentimentAnalysis()  # 2. instantiate the plugin

    @classmethod
    def run_plugins(cls, docs, params):
        # 3. call the 'process' method
        docs = cls.sentiment_analysis.process(docs=docs, params=params, plugin_name='sentiment_analysis')
        return docs

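Creating a mapping for the plugin

In configs, simply set the key to plugin and the value to the name of the plugin you defined (e.g. {"key":"plugin", "value":"sentiment_analysis"}).

The following is an example of a plugin mapping.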

{
  "mappings": {
    "title": [
      {
        "targetFieldName": "title",
        "type": "keyword"
      }
    ],
    "text": [
      {
        "targetFieldName": "text",
        "type": "keyword"
      },
      {
        "targetFieldName": "sentiment",
        "type": "keyword",
        "configs": [
          {"key": "plugin", "value": "sentiment_analysis"},
          {"key": "lang", "value": "en"}
        ]
      }
    ]
  },
  "index_id": "plugin_test"
}
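
Inside plugin_method, the configs entries from the mapping arrive as a flat dictionary, for example {"lang":"en", "plugin": "sentiment_analysis"}. The sketch below shows that flattening step in isolation. The function name `configs_to_dict` is hypothetical; the worker's own conversion code is not shown on this page.

```python
from typing import Dict, List


def configs_to_dict(configs: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten [{"key": k, "value": v}, ...] into {k: v, ...}."""
    return {item["key"]: item["value"] for item in configs}


mapping_configs = [
    {"key": "plugin", "value": "sentiment_analysis"},
    {"key": "lang", "value": "en"},
]
flat = configs_to_dict(mapping_configs)
print(flat["plugin"], flat["lang"])
```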

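Term Score plugin

In this example, we add recommended tags based on the "text" data. The source fields, and the target fields obtained after the plugin has processed them, are shown below:

Input format (source)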

{
  "text": "Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan..."
}

Index format (target)

{
  "text": "Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan...",
  "tag": {"yong_ppl": 1.1, "sports": 2.1, "football": 3.2}
}

The mapping can take the following form:

{
  "mappings": {
    "text": [
      {
        "targetFieldName": "text",
        "type": "keyword"
      },
      {
        "targetFieldName": "tag",
        "type": "term_score",
        "configs": [
          {"key": "plugin", "value": "recommendation"},
          {"key": "lang", "value": "en"}
        ]
      }
    ]
  },
  "index_id": "plugin_test_term_score"
}

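This means the source field 'text' is converted into the target fields 'text' and 'tag', using the recommendation plugin with configs = {"lang":"en"}. As in the example above, we derive the Recommendation class from PlugInHelper and implement "plugin_method" for the abstract class (PlugInHelper):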

from worker.plugins.plugin_helper import PlugInHelper

class Recommendation(PlugInHelper):
    def plugin_method(self, batch: list) -> list:
        """
        :param batch: [{"value":"....","configs":{...}},...]
        :return: list
        """
        outputs = []
        for x in batch:
            value = x["value"] # value from input
            configs = x["configs"] # the config you define in the mapping
            # if configs in the mapping is as follows:
            # {  "configs":[
            #              {"key":"plugin", "value":"recommendation"},
            #              {"key":"lang","value":"en"}]
            # }
            # the 'configs' will be {"lang":"en", "plugin": "recommendation"}
            # add your method
            result = self._do_recommend(value, configs)
            outputs.append(result)
        return outputs

    def _do_recommend(self, value, configs):
        # Complete your method here.
        # To illustrate term_score, the result is hardcoded; replace it with your own implementation.
        result = {'yong_ppl': 1.1, 'sports': 2.1, 'football': 3.2}
        return result
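
The body of _do_recommend is left to you. As a rough illustration of something that returns the {tag: score} shape expected by term_score, the sketch below scores tags by counting keyword hits. The `TAG_KEYWORDS` table and the scoring rule are purely hypothetical; a real plugin would more likely call a trained model.

```python
import re
from typing import Dict, List

# Hypothetical tag vocabulary, for illustration only.
TAG_KEYWORDS: Dict[str, List[str]] = {
    "politics": ["president", "democrats", "senate"],
    "economy": ["stimulus", "relief", "unemployment"],
    "sports": ["football", "match", "league"],
}


def do_recommend(value: str, configs: Dict[str, str]) -> Dict[str, float]:
    """Return {tag: score}, where score is the number of keyword hits.

    `configs` is accepted to mirror the plugin signature but is unused here.
    """
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    scores: Dict[str, float] = {}
    for tag, keywords in TAG_KEYWORDS.items():
        hits = sum(tokens.count(keyword) for keyword in keywords)
        if hits:  # omit tags with no evidence in the text
            scores[tag] = float(hits)
    return scores


if __name__ == "__main__":
    text = "Democrats are racing to pass President Joe Biden's relief plan"
    print(do_recommend(text, {"lang": "en"}))
```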

After implementing 'plugin_method', we can call the 'process' method, which is defined in the parent class (PlugInHelper), on the Recommendation class. Add the following code to plugin_manager.py:

# Only process the docs if params contains the 'recommendation' plugin.
if cls.is_plugin(params, "recommendation"):
    docs = cls.recommendation.process(
        docs=docs, params=params, plugin_name="recommendation"
    )
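
The is_plugin check itself is not shown on this page. The sketch below is one way such a check could work, assuming it looks the plugin name up in the `_plugins` entry of the internal params (the format listed under internal processing). The real implementation in plugin_manager.py may differ.

```python
from typing import Any, Dict


def is_plugin(params: Dict[str, Any], plugin_name: str) -> bool:
    """Return True if plugin_name is listed under mappings._plugins.value."""
    plugins = params.get("mappings", {}).get("_plugins", {}).get("value", [])
    return plugin_name in plugins


params = {
    "index": "plugin_test_term_score",
    "mappings": {"_plugins": {"type": "meta", "value": ["recommendation"]}},
}
print(is_plugin(params, "recommendation"))
print(is_plugin(params, "sentiment_analysis"))
```

Guarding each call this way means a plugin only runs for indexes whose mapping actually requests it.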

Internal processing

  1. Input data
    docs = [{"text":"Democrats are racing to pass President Joe Biden's $1.9 trillion COVID-19 relief plan, one that includes $1,400 stimulus checks, an extension of unemployment benefits and money for local and state governments."}]
  1. Convert the mapping from the user-friendly format to the machine-friendly format. Input (user-friendly format):
{
  "mappings": {
    "text": [
      {
        "targetFieldName": "text",
        "type": "keyword"
      },
      {
        "targetFieldName": "tag",
        "type": "term_score",
        "configs": [
          {"key": "plugin", "value": "recommendation"},
          {"key": "lang", "value": "en"}
        ]
      }
    ]
  },
  "index_id": "plugin_test_term_score"
}

Internal params format (machine-friendly format):

{
        "index": "plugin_test_term_score",
        "mappings": {
            "_uid": {
                "type": "_uid"
            },
            "text": {
                "type": "keyword"
            },
            "tag": {
                "type": "term_score"
            },
            "_meta": {
                "type": "meta",
                "value": {
                    "tag": {                         # This is the part you will get in the configs in the plugin_method
                        "plugin": "recommendation",  #
                        "lang": "en"                 #
                    }                                #
                }
            },
            "_target": {
                "type": "meta",
                "value": {
                    "text": [
                        "text",
                        "tag"
                    ]
                    ]
                }
            },
            "_doc_id": {
                "type": "keyword"
            },
            "_plugins": {
                "type": "meta",
                "value": [
                    "recommendation"
                ]
            }
        }
    }
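
The conversion from the user-friendly mapping to the internal params can be sketched as follows. This is a reconstruction from the two formats shown here, not the worker's actual code: the function name `to_internal_params` is hypothetical, and it assumes each internal key is the corresponding targetFieldName.

```python
from typing import Any, Dict, List


def to_internal_params(user_mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Convert the user-friendly mapping into the internal params layout."""
    mappings: Dict[str, Any] = {"_uid": {"type": "_uid"}}
    meta: Dict[str, Dict[str, str]] = {}   # target field -> flattened configs
    target: Dict[str, List[str]] = {}      # source field -> target fields
    plugins: List[str] = []                # every plugin named in the mapping

    for source_field, specs in user_mapping["mappings"].items():
        target[source_field] = []
        for spec in specs:
            name = spec["targetFieldName"]
            mappings[name] = {"type": spec["type"]}
            target[source_field].append(name)
            configs = {c["key"]: c["value"] for c in spec.get("configs", [])}
            if configs:
                meta[name] = configs
            plugin = configs.get("plugin")
            if plugin and plugin not in plugins:
                plugins.append(plugin)

    mappings["_meta"] = {"type": "meta", "value": meta}
    mappings["_target"] = {"type": "meta", "value": target}
    mappings["_doc_id"] = {"type": "keyword"}
    mappings["_plugins"] = {"type": "meta", "value": plugins}
    return {"index": user_mapping["index_id"], "mappings": mappings}


user_mapping = {
    "mappings": {
        "text": [
            {"targetFieldName": "text", "type": "keyword"},
            {
                "targetFieldName": "tag",
                "type": "term_score",
                "configs": [
                    {"key": "plugin", "value": "recommendation"},
                    {"key": "lang", "value": "en"},
                ],
            },
        ]
    },
    "index_id": "plugin_test_term_score",
}
internal = to_internal_params(user_mapping)
print(internal["mappings"]["_target"]["value"])
```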

'docs' is then transformed according to the mapping.

'text' will have two fields: 'text' and 'tag':
[{"text": "aaa"}] -> [{"text":"aaa", "tag":"aaa"}]
  1. plugin_manager passes the transformed docs and the params to the plugin
docs = cls.recommendation.process(docs=docs, params=params, plugin_name='recommendation')
  1. The process method calls plugin_method internally (plugin_helper.py)

  2. 'docs' is then transformed by plugin_helper

[{"text": "aaa"}] -> [{"text":"aaa", "tag":{'yong_ppl': 1.1, 'sports': 2.1, 'football': 3.2}}]
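
The steps above can be sketched end to end in a few lines. This is a simplified stand-in, not the code in plugin_helper.py: `PlugInHelperSketch` and `fan_out` are hypothetical names, and the sketch assumes that `_target` drives the field copy and that `_meta` identifies which target fields belong to which plugin.

```python
from typing import Any, Dict, List


def fan_out(docs: List[dict], params: Dict[str, Any]) -> List[dict]:
    """Copy each source field into all of its target fields."""
    target = params["mappings"]["_target"]["value"]
    out = []
    for doc in docs:
        new_doc = {}
        for source, value in doc.items():
            for name in target.get(source, [source]):
                new_doc[name] = value
        out.append(new_doc)
    return out


class PlugInHelperSketch:
    """Stand-in for PlugInHelper, reduced to the batch/result plumbing."""

    def plugin_method(self, batch: List[dict]) -> list:
        raise NotImplementedError

    def process(self, docs: List[dict], params: dict, plugin_name: str) -> List[dict]:
        meta = params["mappings"]["_meta"]["value"]
        for field, configs in meta.items():
            if configs.get("plugin") != plugin_name:
                continue  # this target field belongs to another plugin
            targets = [doc for doc in docs if field in doc]
            batch = [{"value": doc[field], "configs": configs} for doc in targets]
            for doc, result in zip(targets, self.plugin_method(batch)):
                doc[field] = result  # overwrite the copied value with the plugin output
        return docs


class Recommendation(PlugInHelperSketch):
    def plugin_method(self, batch: List[dict]) -> list:
        # Hardcoded result, matching the example on this page.
        return [{"yong_ppl": 1.1, "sports": 2.1, "football": 3.2} for _ in batch]


params = {
    "mappings": {
        "_meta": {"type": "meta", "value": {"tag": {"plugin": "recommendation", "lang": "en"}}},
        "_target": {"type": "meta", "value": {"text": ["text", "tag"]}},
    }
}
copied = fan_out([{"text": "aaa"}], params)
processed = Recommendation().process([dict(d) for d in copied], params, "recommendation")
print(copied)
print(processed)
```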

Plugin for image search

Update the soco-search-worker Docker image

sudo docker build -f docker/Dockerfile.worker -t convmind/soco-search:0.1-worker .
sudo docker push convmind/soco-search:0.1-worker
sudo docker-compose -f docker/docker-compose.yml up