
Mapping Examples

Yvette Han edited this page May 28, 2021 · 1 revision

Basic Mapping

The following example is a basic mapping. The input data has two fields: title and text. We want to index title as a keyword and index text with bm25.

{
  "mappings": {
      "title": [
        {
          "targetFieldName": "title",
          "type": "keyword",
          "configs": []
        }
      ],
      "text": [
        {
          "targetFieldName": "text",
          "type": "text",
          "configs": [
            {
              "key": "lang",
              "value": "zh"
            },
            {
              "key": "index",
              "value": "bm25"
            }
          ]
        }
      ]
    },
    "index_id":"media"
}
    

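However, when we index with the mapping above, a query returns both the bm25 field and title, and the result looks similar to the following. Note that the text field contains the tokenized text used for BM25 search.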

  "ranks": [
        {
            "_source": {
                "text": "尊敬/的/全国/国政/政协/全国政协/副/主席/刘新成/先生/,/\r\n/\r\n/ / / / /尊敬/的/江西/政府/江西省/省政府/江西省政府/易炼/红/省长/,/\r\n/\r\n/ / / / /尊敬/的/江西/江西省/政协/姚增科/主席/,/\r\n/\r\n/ / / / /尊敬/的/各位/来宾/,/女士/女士们/、/先生/们/、/朋友/们/:/\r\n/\r\n/ / / / /大家/上午/好/!/\r\n/\r\n/ / / / /今天/,/我们/又/一次/相聚/瓷都/,/举办/2020/中国/景德镇/国际/陶瓷/博览/博览会/,/这是/千年/瓷都/景德镇/重振/辉煌/的/金字/招牌/金字招牌/和/重大/平台/,/必将/为/景德镇/陶瓷/文化/传承/保护/、/创新/发展/注入/动力/新动力/、/续写/新篇/篇章/新篇章/。/本届/瓷/博会/一如/既往/一如既往/地/得到/国家/有关/国家有关/部委/、/社会/各界/社会各界/和/众多/品牌/企业/的/大力/支持/大力支持/。/尤其/是/全国/国政/政协/全国政协/副/主席/、/民进/中央/民进中央/常务/常务副/主席/刘新成/先生/亲临/今天/的/开幕/开幕式/,/让/我们/倍受/鼓舞/倍受鼓舞/。/在/此/,/让/我们/用/热烈/的/掌声/对/各位/领导/和/嘉宾/的/到来/,/表示/诚挚/的/欢迎/和/衷心/的/感谢/!",
                "title": "2020中国景德镇国际 陶瓷博览会开幕式主持词",
                "_uid": "7199693360468429045",
                "_doc_id": "60335d2d742df4bb8100bc36"
            },
            "score": 10.86544
        }, ...

To solve this problem, we can define the mapping as follows:

{
  "mappings": {
      "title": [
        {
          "targetFieldName": "title",
          "type": "keyword",
          "configs": []
        }
      ],
      "text": [
        {
          "targetFieldName": "text",
          "type": "keyword",
          "configs": [        
          ]
        },
        {
          "targetFieldName": "text_bm25",
          "type": "text",
          "configs": [
            {
              "key": "lang",
              "value": "zh"
            },
            {
              "key": "index",
              "value": "bm25"
            }
          ]
        }
      ]
    },
    "index_id":"media"
}

Here the source field has two target fields: text and text_bm25. One is used for keyword matching and the other for bm25 search. We can then run the following search:

POST http://{R2BASE_URL}/r2base/v1/search/{index_id}/query
BODY {"query":{"match": {"text_bm25": "hi"}}}
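
The search request above can be assembled in Python as a minimal sketch. The endpoint path and the query body come from this page; the server address `http://localhost:8000`, the helper name `build_search_request`, and the `Content-Type` header are assumptions made for illustration. The request is only built here, not sent.

```python
import json
import urllib.request


def build_search_request(base_url: str, index_id: str,
                         field: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a match query on one target field."""
    url = f"{base_url}/r2base/v1/search/{index_id}/query"
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the server expects a JSON body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address; replace it with your own R2BASE_URL.
request = build_search_request("http://localhost:8000", "media", "text_bm25", "hi")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Querying `text_bm25` rather than `text` targets the BM25-indexed field, while `text` keeps the original string as a keyword. To send the request, pass it to `urllib.request.urlopen`.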

Advanced Mapping

Text Sparta

{
  "mappings":
    {
      "title": [
        {
          "targetFieldName": "title",
          "type": "keyword",
          "configs": []
        }
      ],
      "text": [
        {
          "targetFieldName": "text",
          "type": "keyword",
          "configs": []
        },
        {
          "targetFieldName": "tss",
          "type": "term_score",
          "configs": [
            {
              "key": "encoder_id",
              "value": "bert-base-uncase-ti-log-max-320head-snm"
            }
          ]
        }
      ]
    },
    "index_id":"media"
}
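
Each source field in a mapping fans out into one or more target fields. The sketch below walks the Text Sparta mapping shown above and lists every (source field, target field, type) triple. The helper name `list_target_fields` is hypothetical and is not part of R2Base; it only illustrates how the mapping structure reads.

```python
from typing import Dict, List, Tuple

# The Text Sparta mapping from this page, written as a Python dict.
mapping = {
    "mappings": {
        "title": [{"targetFieldName": "title", "type": "keyword", "configs": []}],
        "text": [
            {"targetFieldName": "text", "type": "keyword", "configs": []},
            {
                "targetFieldName": "tss",
                "type": "term_score",
                "configs": [
                    {"key": "encoder_id",
                     "value": "bert-base-uncase-ti-log-max-320head-snm"}
                ],
            },
        ],
    },
    "index_id": "media",
}


def list_target_fields(mapping: Dict) -> List[Tuple[str, str, str]]:
    """Return (source field, target field, type) for every target in the mapping."""
    rows = []
    for source, targets in mapping["mappings"].items():
        for target in targets:
            rows.append((source, target["targetFieldName"], target["type"]))
    return rows


for row in list_target_fields(mapping):
    print(row)
```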

Sentence Chunking

{
  "mappings":
    {
      "title": [
        {
          "targetFieldName": "title",
          "type": "keyword",
          "configs": []
        }
      ],
      "text": [
        {
          "targetFieldName": "text",
          "type": "keyword",
          "configs": []
        },
        {
          "targetFieldName": "text_bm25",
          "type": "text",
          "configs": [
            {
              "key": "index",
              "value": "bm25"
            },
            {
              "key": "lang",
              "value": "en"
            },
            {
              "key": "chunk_mode",
              "value": {"lang":"en","size":4,"stride":2}
            }
          ]
        }
      ]
    },
    "index_id":"media"
}

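Currently, we use sentence-level chunking (word-level chunking will be provided later). size = 4 and stride = 2 mean that each chunk has 4 sentences and the window advances by 2 sentences, so consecutive chunks overlap by 2 sentences. For example,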

# Original Input:
[{'text': 'Acadia University is a predominantly undergraduate university '
          'located in Wolfville, Nova Scotia, Canada with some graduate '
          "programs at the master's level and one at the doctoral level. The "
          'enabling legislation consists of Acadia University Act and the '
          'Amended Acadia University Act 2000. The Wolfville Campus houses '
          'Acadia University Archives and the Acadia University Art Gallery. '
          'Acadia offers over 200 degree combinations in the Faculties of '
          'Arts, Pure and Applied Science, Professional Studies, and Theology. '
          'The student-faculty ratio is 15:1 and the average class size is 28. '
          'Open Acadia offers correspondence and distance education courses. '
          'Acadia began as an extension of Horton Academy (1828), which was '
          'founded in Horton, Nova Scotia, by Baptists from Nova Scotia and '
          "Queen's College (1838). The College was later named Acadia College. "
          'Acadia University, established at Wolfville, Nova Scotia in 1838 '
          'has a strong Baptist religious affiliation. It was designed to '
          'prepar'}]

After chunking with size = 4 and stride = 2, the document list becomes four times as large:

[{'text': 'Acadia University is a predominantly undergraduate university '
          'located in Wolfville, Nova Scotia, Canada with some graduate '
          "programs at the master's level and one at the doctoral level. The "
          'enabling legislation consists of Acadia University Act and the '
          'Amended Acadia University Act 2000. The Wolfville Campus houses '
          'Acadia University Archives and the Acadia University Art Gallery. '
          'Acadia offers over 200 degree combinations in the Faculties of '
          'Arts, Pure and Applied Science, Professional Studies, and '
          'Theology.'},
 {'text': 'The Wolfville Campus houses Acadia University Archives and the '
          'Acadia University Art Gallery. Acadia offers over 200 degree '
          'combinations in the Faculties of Arts, Pure and Applied Science, '
          'Professional Studies, and Theology. The student-faculty ratio is '
          '15:1 and the average class size is 28. Open Acadia offers '
          'correspondence and distance education courses.'},
 {'text': 'The student-faculty ratio is 15:1 and the average class size is 28. '
          'Open Acadia offers correspondence and distance education courses. '
          'Acadia began as an extension of Horton Academy (1828), which was '
          'founded in Horton, Nova Scotia, by Baptists from Nova Scotia and '
          "Queen's College (1838). The College was later named Acadia "
          'College.'},
 {'text': 'Acadia began as an extension of Horton Academy (1828), which was '
          'founded in Horton, Nova Scotia, by Baptists from Nova Scotia and '
          "Queen's College (1838). The College was later named Acadia College. "
          'Acadia University, established at Wolfville, Nova Scotia in 1838 '
          'has a strong Baptist religious affiliation. It was designed to '
          'prepar'}]

Here is another example (size = 4, stride = 2). When we have 10 sentences ['0 1 2 3 4 5 6 7 8 9'], they are split as follows: ['0 1 2 3','2 3 4 5','4 5 6 7','6 7 8 9']
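
The sliding window described above can be sketched in a few lines of Python. This is an illustration, not the R2Base implementation: the function name `chunk_sentences` is hypothetical, the input is assumed to be already split into sentences, and the handling of a trailing partial window (kept as a shorter chunk) is an assumption, since the examples on this page only cover inputs that divide evenly.

```python
from typing import List


def chunk_sentences(sentences: List[str], size: int = 4, stride: int = 2) -> List[str]:
    """Group sentences into overlapping chunks of `size`, advancing by `stride`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    chunks = []
    start = 0
    while start < len(sentences):
        chunks.append(" ".join(sentences[start:start + size]))
        # Stop once the current window has reached the last sentence.
        if start + size >= len(sentences):
            break
        start += stride
    return chunks


sentences = [str(i) for i in range(10)]
print(chunk_sentences(sentences, size=4, stride=2))
# ['0 1 2 3', '2 3 4 5', '4 5 6 7', '6 7 8 9']
```

With stride smaller than size, neighbouring chunks share size - stride sentences, which is why one input document expands into several overlapping ones.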

Image Search

{
  "mappings":
    {
      "url": [
        {
          "targetFieldName": "url",
          "type": "keyword",
          "configs": []
        },
        {
          "targetFieldName": "tss",
          "type": "term_score",
          "configs": [
            {
              "key": "encoder_id",
              "value": "visualsparta-mscoco-68.2"
            }
          ]
        }
      ]
    },
    "index_id":"media"
}

Visual Search

docs = [
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000407825.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000198043.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000058690.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000575882.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000188040.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000388258.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000236370.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000286119.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000007253.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000084762.jpg'},
]
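
Each document's keys should line up with the source fields named in the mapping (here, `url`). The following sketch is a local sanity check one might run before indexing. The helper `missing_source_fields` is hypothetical, and whether R2Base itself rejects documents that lack a source field is not stated on this page.

```python
from typing import Dict, List


def missing_source_fields(docs: List[Dict], mapping: Dict) -> List[int]:
    """Return indexes of documents that lack any source field named in the mapping."""
    required = set(mapping["mappings"])
    return [i for i, doc in enumerate(docs) if not required.issubset(doc)]


# Target definitions are omitted here; only the source field name matters for the check.
image_mapping = {"mappings": {"url": []}, "index_id": "media"}
sample_docs = [
    {"url": "https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000407825.jpg"},
    {"caption": "no url here"},  # made-up document that lacks the url field
]
print(missing_source_fields(sample_docs, image_mapping))  # [1]
```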