New language inheritance mechanism for opensearch ​
INFO
This document represents an architecture decision record (ADR) and has been mirrored from the ADR section in our Shopware 6 repository. You can find the original version here
Context ​
Currently, when using Elasticsearch for searching on storefront, we are creating multiple indexes of each language. This would be fine till now however there are a few problems with it:
- We need to manage multiple indexes, if the shop's using multilingual, we need to create several indexes for each language, this is a big problem on cloud especially
- "Indices and shards are therefore not free from a cluster perspective, as there is some level of resource overhead for each index and shard."
- Everytime a record is updated, we need to update that record in every language indexes
- There's currently no fallback mechanism when searching, therefor duplicating default language data for each index is needed, but not every field is translatable, this take more storage for each index
Decision ​
New feature flag ​
We introduce a new feature flag ES_MULTILINGUAL_INDEX
to allow people to opt in to the new multilingual ES index immediately.
New Elasticsearch data mapping structure ​
We changed the approach to Multilingual fields strategy following these criteria
- Each searchable entity now have only one index for all languages (e.g sw_product)
- Each translated field will be mapped as an
object field
, each language_id will be a key in the object - When searching for these fields, use multi-match search with <translated_field>.<context_lang_id>, <translated_field>.<parent_current_lang_id> and <translated_field>.<default_lang_id> as fallback, this way we have a fallback mechanism without needing duplicate data
- Same logic applied when sorting with the help of a painless script (see 3.Sorting below)
- When a new language is added or a record is update, we do a partial update instead of replacing the whole document, this will reduce the request update payload and thus improve indexing performance overall
Example:
1. Create mapping setting ​
OLD structure
// PUT /sw_product
{
"mappings": {
"properties": {
"productNumber": {
"type": "keyword"
},
"name": {
"type": "keyword",
"fields": {
"text": {
"type": "text"
},
"ngram": {
"type": "text",
"analyzer": "sw_ngram_analyzer"
}
}
}
}
}
}
NEW structure
// PUT /sw_product/_mapping
{
"mappings": {
"properties": {
"productNumber": {
"type": "keyword"
},
"name": {
"properties": {
"en": {
"type": "keyword",
"fields": {
"text": {
"type": "text",
"analyzer": "sw_english_analyzer"
},
"ngram": {
"type": "text",
"analyzer": "sw_ngram_analyzer"
}
}
},
"de": {
"type": "keyword",
"fields": {
"text": {
"type": "text",
"analyzer": "sw_german_analyzer"
},
"ngram": {
"type": "text",
"analyzer": "sw_ngram_analyzer"
}
}
}
}
}
}
}
}
2. Searching ​
Assume we're searching products in german
// GET /sw_product/_search
{
"query": {
"multi_match": {
"query": "some keyword",
"fields": [
"title.de.search", // context languge
"title.en.search" // fallback language
],
"type": "best_fields"
}
}
}
3. Sorting ​
We add new painless scripts in Framework/Indexing/Scripts/translated_field_sorting.groovy
and Framework/Indexing/Scripts/numeric_translated_field_sorting.groovy
, this script then will be used when sorting
Example: Sort products by name in DESC
// GET /sw_product/_search
{
"query": {
...
},
"sort": [
{
"_script": {
"type": "string",
"script": {
"id": "translated_field_sorting",
"params": {
"field": "name",
"languages": [
"119317f1d1d1417c9e6fb0059c31a448", // context language
"2fbb5fe2e29a4d70aa5854ce7ce3e20b" // fallback language
]
}
},
"order": "DESC"
}
}
]
}
Adding a new language ​
- When a new language is created, we perform this request to update mapping includes new added language
// PUT /sw_product/_mapping
{
"properties": {
"name": {
"properties": {
"<new_language_id>": {
"type": "keyword",
"fields": {
"text": {
"type": "text",
"analyzer": "<new_language_stop_words_analyzer>"
},
"ngram": {
"type": "text",
"analyzer": "sw_ngram_analyzer"
}
}
}
}
}
}
}
Consequences ​
- From the next major version, old language based indexes will not be used any longer thus could be removed on es cluster
- When the feature is activated, the shop must reindex using command
bin/console es:index
in the next update