Mapping Examples
Yvette Han edited this page May 28, 2021
The following example is a basic mapping. The input data has two fields, title and text. We want to index title as a keyword and index text with BM25.
{
"mappings": {
"title": [
{
"targetFieldName": "title",
"type": "keyword",
"configs": []
}
],
"text": [
{
"targetFieldName": "text",
"type": "text",
"configs": [
{
"key": "lang",
"value": "zh"
},
{
"key": "index",
"value": "bm25"
}
]
}
]
},
"index_id":"media"
}
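The mapping above is plain JSON, so it can also be assembled in code. Below is a minimal sketch; `target` and `make_mapping` are hypothetical helpers written for illustration, not part of the R2Base API. Each keyword argument of `target` becomes one `{"key": ..., "value": ...}` pair in `configs`.

```python
import json

def target(name, field_type, **configs):
    # Hypothetical helper: each keyword argument becomes one {"key", "value"} entry in "configs"
    return {
        "targetFieldName": name,
        "type": field_type,
        "configs": [{"key": k, "value": v} for k, v in configs.items()],
    }

def make_mapping(index_id, fields):
    # Hypothetical helper: fields maps each source field name to a list of target-field entries
    return {"mappings": fields, "index_id": index_id}

basic = make_mapping("media", {
    "title": [target("title", "keyword")],
    "text": [target("text", "text", lang="zh", index="bm25")],
})
print(json.dumps(basic, ensure_ascii=False, indent=2))
```

Building the configs from keyword arguments keeps the key/value list format consistent and avoids hand-written JSON mistakes.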
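However, when documents are indexed with the mapping above, the query results contain the title and the BM25 field, and the text field holds the tokenized text used for BM25 search instead of the original text. The results look like the following: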
"ranks": [
{
"_source": {
"text": "尊敬/的/全国/国政/政协/全国政协/副/主席/刘新成/先生/,/\r\n/\r\n/ / / / /尊敬/的/江西/政府/江西省/省政府/江西省政府/易炼/红/省长/,/\r\n/\r\n/ / / / /尊敬/的/江西/江西省/政协/姚增科/主席/,/\r\n/\r\n/ / / / /尊敬/的/各位/来宾/,/女士/女士们/、/先生/们/、/朋友/们/:/\r\n/\r\n/ / / / /大家/上午/好/!/\r\n/\r\n/ / / / /今天/,/我们/又/一次/相聚/瓷都/,/举办/2020/中国/景德镇/国际/陶瓷/博览/博览会/,/这是/千年/瓷都/景德镇/重振/辉煌/的/金字/招牌/金字招牌/和/重大/平台/,/必将/为/景德镇/陶瓷/文化/传承/保护/、/创新/发展/注入/动力/新动力/、/续写/新篇/篇章/新篇章/。/本届/瓷/博会/一如/既往/一如既往/地/得到/国家/有关/国家有关/部委/、/社会/各界/社会各界/和/众多/品牌/企业/的/大力/支持/大力支持/。/尤其/是/全国/国政/政协/全国政协/副/主席/、/民进/中央/民进中央/常务/常务副/主席/刘新成/先生/亲临/今天/的/开幕/开幕式/,/让/我们/倍受/鼓舞/倍受鼓舞/。/在/此/,/让/我们/用/热烈/的/掌声/对/各位/领导/和/嘉宾/的/到来/,/表示/诚挚/的/欢迎/和/衷心/的/感谢/!",
"title": "2020中国景德镇国际 陶瓷博览会开幕式主持词",
"_uid": "7199693360468429045",
"_doc_id": "60335d2d742df4bb8100bc36"
},
"score": 10.86544
}, ...
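The tokenized value cannot simply be joined back into readable text, because the segmenter emits overlapping tokens: 全国, 国政, 政协 and 全国政协 all come from the same four characters. A small illustration using the first tokens of the result above; the `original` phrase is an assumption reconstructed from those tokens, since the untokenized text is not shown on this page.

```python
# First tokens of the BM25 "text" field in the result above
tokens = "尊敬/的/全国/国政/政协/全国政协/副/主席".split("/")

# Assumed original wording of this phrase (not shown on this page)
original = "尊敬的全国政协副主席"

naive = "".join(tokens)
print(naive)                               # 尊敬的全国国政政协全国政协副主席
print(naive == original)                   # False
print(all(t in original for t in tokens))  # True: each token is a substring of the original
```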
To solve this problem, we can define the mapping as follows:
{
"mappings": {
"title": [
{
"targetFieldName": "title",
"type": "keyword",
"configs": []
}
],
"text": [
{
"targetFieldName": "text",
"type": "keyword",
"configs": [
]
},
{
"targetFieldName": "text_bm25",
"type": "text",
"configs": [
{
"key": "lang",
"value": "zh"
},
{
"key": "index",
"value": "bm25"
}
]
}
]
},
"index_id":"media"
}
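With more than one target per source field, queries have to name the target field (for example text_bm25), not the source field. A minimal sketch of a hypothetical helper, not part of R2Base, that lists the target fields of each source field in a mapping:

```python
# The two-target mapping from the example above, as a Python dict
mapping = {
    "mappings": {
        "title": [{"targetFieldName": "title", "type": "keyword", "configs": []}],
        "text": [
            {"targetFieldName": "text", "type": "keyword", "configs": []},
            {"targetFieldName": "text_bm25", "type": "text",
             "configs": [{"key": "lang", "value": "zh"}, {"key": "index", "value": "bm25"}]},
        ],
    },
    "index_id": "media",
}

def target_fields(mapping):
    # Return {source field: [(target field name, type), ...]}
    return {
        source: [(t["targetFieldName"], t["type"]) for t in targets]
        for source, targets in mapping["mappings"].items()
    }

print(target_fields(mapping))
# {'title': [('title', 'keyword')], 'text': [('text', 'keyword'), ('text_bm25', 'text')]}
```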
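Here the source field text has two target fields, text and text_bm25: one keeps the original text as a keyword, the other is used for BM25 search. We can then search the BM25 field as follows: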
POST http://{R2BASE_URL}/r2base/v1/search/{index_id}/query
BODY {"query": {"match": {"text_bm25": "hi"}}}
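The same request can be built with the Python standard library. This is a sketch under assumptions: `localhost:8000` stands in for `{R2BASE_URL}`, the JSON `Content-Type` header is assumed, and `build_query_request` is a hypothetical helper. It only builds the request; sending it requires a running R2Base server.

```python
import json
import urllib.request

def build_query_request(base_url, index_id, field, text):
    # Build (but do not send) a POST request for a match query on one target field
    url = f"http://{base_url}/r2base/v1/search/{index_id}/query"
    body = json.dumps({"query": {"match": {field: text}}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},  # assumed content type
    )

req = build_query_request("localhost:8000", "media", "text_bm25", "hi")
print(req.full_url)
print(req.data.decode("utf-8"))
# To send it against a live server: urllib.request.urlopen(req)
```

Serializing the body with `json.dumps` guarantees balanced braces, which is easy to get wrong in a hand-written query string.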
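The next example keeps the original text as a keyword field and adds a term_score target field, tss, whose encoder is selected with the encoder_id config:
{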
"mappings":
{
"title": [
{
"targetFieldName": "title",
"type": "keyword",
"configs": []
}
],
"text": [
{
"targetFieldName": "text",
"type": "keyword",
"configs": []
},
{
"targetFieldName": "tss",
"type": "term_score",
"configs": [
{
"key": "encoder_id",
"value": "bert-base-uncase-ti-log-max-320head-snm"
}
]
}
]
},
"index_id":"media"
}
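Settings such as encoder_id are stored as a list of key/value pairs. A minimal sketch, using hypothetical helpers that are not part of R2Base, that turns that list into a dict and reports which encoder each term_score field uses:

```python
def configs_to_dict(configs):
    # Turn [{"key": k, "value": v}, ...] into {k: v}
    return {c["key"]: c["value"] for c in configs}

def term_score_encoders(mapping):
    # Return {target field name: encoder_id} for every term_score target
    encoders = {}
    for targets in mapping["mappings"].values():
        for t in targets:
            if t["type"] == "term_score":
                encoders[t["targetFieldName"]] = configs_to_dict(t["configs"]).get("encoder_id")
    return encoders

# The term_score mapping from the example above
mapping = {
    "mappings": {
        "title": [{"targetFieldName": "title", "type": "keyword", "configs": []}],
        "text": [
            {"targetFieldName": "text", "type": "keyword", "configs": []},
            {"targetFieldName": "tss", "type": "term_score",
             "configs": [{"key": "encoder_id", "value": "bert-base-uncase-ti-log-max-320head-snm"}]},
        ],
    },
    "index_id": "media",
}
print(term_score_encoders(mapping))
# {'tss': 'bert-base-uncase-ti-log-max-320head-snm'}
```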
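The following mapping indexes English text with BM25 and adds a chunk_mode config, which splits each document into overlapping chunks:
{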
"mappings":
{
"title": [
{
"targetFieldName": "title",
"type": "keyword",
"configs": []
}
],
"text": [
{
"targetFieldName": "text",
"type": "keyword",
"configs": []
},
{
"targetFieldName": "text_bm25",
"type": "text",
"configs": [
{
"key": "index",
"value": "bm25"
},
{
"key": "lang",
"value": "en"
},
{
"key": "chunk_mode",
"value": {"lang":"en","size":4,"stride":2}
}
]
}
]
},
"index_id":"media"
}
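Currently we use sentence-level chunking (word-level chunking will be supported later). With size = 4 and stride = 2, each chunk contains 4 sentences and each new chunk starts 2 sentences after the previous one, so consecutive chunks overlap by 2 sentences. For example: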
# Original Input:
[{'text': 'Acadia University is a predominantly undergraduate university '
'located in Wolfville, Nova Scotia, Canada with some graduate '
"programs at the master's level and one at the doctoral level. The "
'enabling legislation consists of Acadia University Act and the '
'Amended Acadia University Act 2000. The Wolfville Campus houses '
'Acadia University Archives and the Acadia University Art Gallery. '
'Acadia offers over 200 degree combinations in the Faculties of '
'Arts, Pure and Applied Science, Professional Studies, and Theology. '
'The student-faculty ratio is 15:1 and the average class size is 28. '
'Open Acadia offers correspondence and distance education courses. '
'Acadia began as an extension of Horton Academy (1828), which was '
'founded in Horton, Nova Scotia, by Baptists from Nova Scotia and '
"Queen's College (1838). The College was later named Acadia College. "
'Acadia University, established at Wolfville, Nova Scotia in 1838 '
'has a strong Baptist religious affiliation. It was designed to '
'prepar'}]
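After chunking with size = 4 and stride = 2, the single input document (10 sentences) is split into four overlapping chunks, so the document list becomes four times as long: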
[{'text': 'Acadia University is a predominantly undergraduate university '
'located in Wolfville, Nova Scotia, Canada with some graduate '
"programs at the master's level and one at the doctoral level. The "
'enabling legislation consists of Acadia University Act and the '
'Amended Acadia University Act 2000. The Wolfville Campus houses '
'Acadia University Archives and the Acadia University Art Gallery. '
'Acadia offers over 200 degree combinations in the Faculties of '
'Arts, Pure and Applied Science, Professional Studies, and '
'Theology.'},
{'text': 'The Wolfville Campus houses Acadia University Archives and the '
'Acadia University Art Gallery. Acadia offers over 200 degree '
'combinations in the Faculties of Arts, Pure and Applied Science, '
'Professional Studies, and Theology. The student-faculty ratio is '
'15:1 and the average class size is 28. Open Acadia offers '
'correspondence and distance education courses.'},
{'text': 'The student-faculty ratio is 15:1 and the average class size is 28. '
'Open Acadia offers correspondence and distance education courses. '
'Acadia began as an extension of Horton Academy (1828), which was '
'founded in Horton, Nova Scotia, by Baptists from Nova Scotia and '
"Queen's College (1838). The College was later named Acadia "
'College.'},
{'text': 'Acadia began as an extension of Horton Academy (1828), which was '
'founded in Horton, Nova Scotia, by Baptists from Nova Scotia and '
"Queen's College (1838). The College was later named Acadia College. "
'Acadia University, established at Wolfville, Nova Scotia in 1838 '
'has a strong Baptist religious affiliation. It was designed to '
'prepar'}]
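Sentence-level chunking needs the text split into sentences first. The splitter R2Base uses is not described on this page; the regex below is an assumption that ends a sentence at ".", "!" or "?" followed by whitespace, and it would mis-split abbreviations such as "Dr.".

```python
import re

def split_sentences(text):
    # Naive rule (assumption): split after ".", "!" or "?" when followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

text = (
    "The student-faculty ratio is 15:1 and the average class size is 28. "
    "Open Acadia offers correspondence and distance education courses. "
    "Acadia began as an extension of Horton Academy (1828), which was "
    "founded in Horton, Nova Scotia, by Baptists from Nova Scotia and "
    "Queen's College (1838). The College was later named Acadia College."
)
for s in split_sentences(text):
    print(s)
```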
Another example with size = 4 and stride = 2: given 10 sentences ['0 1 2 3 4 5 6 7 8 9'], they are divided into ['0 1 2 3', '2 3 4 5', '4 5 6 7', '6 7 8 9'].
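The sliding window described above can be sketched as follows. `chunk_sentences` is a hypothetical function that reproduces the documented examples; it is not the R2Base implementation.

```python
def chunk_sentences(sentences, size, stride):
    # Sliding-window chunking: each chunk holds up to `size` sentences,
    # and consecutive chunks start `stride` sentences apart.
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not sentences:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(sentences[start:start + size])
        if start + size >= len(sentences):  # this window already reaches the last sentence
            break
        start += stride
    return chunks

sentences = "0 1 2 3 4 5 6 7 8 9".split()
print([" ".join(c) for c in chunk_sentences(sentences, size=4, stride=2)])
# ['0 1 2 3', '2 3 4 5', '4 5 6 7', '6 7 8 9']
```

With 11 sentences this sketch emits a fifth, shorter chunk '8 9 10' so the tail is not dropped; how R2Base handles a tail that does not fill a window is not stated on this page.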
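The last example indexes image URLs: the url field is kept as a keyword field and also mapped to a term_score field, tss, using the visualsparta-mscoco-68.2 encoder:
{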
"mappings":
{
"url": [
{
"targetFieldName": "url",
"type": "keyword",
"configs": []
},
{
"targetFieldName": "tss",
"type": "term_score",
"configs": [
{
"key": "encoder_id",
"value": "visualsparta-mscoco-68.2"
}
]
}
]
},
"index_id":"media"
}
docs = [
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000407825.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000198043.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000058690.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000575882.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000188040.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000388258.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000236370.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000286119.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000007253.jpg'},
    {'url': 'https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000084762.jpg'},
]
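All ten example URLs share one pattern: the MSCOCO val2014 file name is `COCO_val2014_` followed by the image ID zero-padded to 12 digits. Assuming that pattern (inferred from these URLs only), the document list can be generated from image IDs:

```python
# URL pattern inferred from the ten example documents (assumption)
COCO_VAL2014_URL = (
    "https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/"
    "COCO_val2014_{:012d}.jpg"
)

image_ids = [407825, 198043, 58690, 575882, 188040, 388258, 236370, 286119, 7253, 84762]
docs = [{"url": COCO_VAL2014_URL.format(i)} for i in image_ids]
print(docs[0]["url"])
# https://vincent-soco.s3.amazonaws.com/data/mscoco/val2014/COCO_val2014_000000407825.jpg
```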