Skip to content

New language inheritance mechanism for opensearch

You are viewing an outdated version of the documentation.
Click here to switch to the stable version (v6.6), or use the version switcher on the left to navigate between versions.

New language inheritance mechanism for opensearch

INFO

This document represents an architecture decision record (ADR) and has been mirrored from the ADR section in our Shopware 6 repository. You can find the original version here

Context

Currently, when using Elasticsearch for searching on storefront, we are creating multiple indexes of each language. This would be fine till now however there are a few problems with it:

  • We need to manage multiple indexes, if the shop's using multilingual, we need to create several indexes for each language, this is a big problem on cloud especially
  • "Indices and shards are therefore not free from a cluster perspective, as there is some level of resource overhead for each index and shard."
  • Everytime a record is updated, we need to update that record in every language indexes
  • There's currently no fallback mechanism when searching, therefor duplicating default language data for each index is needed, but not every field is translatable, this take more storage for each index

Decision

New feature flag

We introduce a new feature flag ES_MULTILINGUAL_INDEX to allow people to opt in to the new multilingual ES index immediately.

New Elasticsearch data mapping structure

We changed the approach to Multilingual fields strategy following these criteria

  1. Each searchable entity now have only one index for all languages (e.g sw_product)
  2. Each translated field will be mapped as an object field, each language_id will be a key in the object
  3. When searching for these fields, use multi-match search with <translated_field>.<context_lang_id>, <translated_field>.<parent_current_lang_id> and <translated_field>.<default_lang_id> as fallback, this way we have a fallback mechanism without needing duplicate data
  4. Same logic applied when sorting with the help of a painless script (see 3.Sorting below)
  5. When a new language is added or a record is update, we do a partial update instead of replacing the whole document, this will reduce the request update payload and thus improve indexing performance overall

Example:

1. Create mapping setting

OLD structure

json
// PUT /sw_product
{
    "mappings": {
        "properties": {
            "productNumber": {
                "type": "keyword"
            },
            "name": {
                "type": "keyword",
                "fields": {
                    "text": {
                        "type": "text"
                    },
                    "ngram": {
                        "type": "text",
                        "analyzer": "sw_ngram_analyzer"
                    }
                }
            }
        }
    }
}

NEW structure

json
// PUT /sw_product/_mapping
{
    "mappings": {
        "properties": {
            "productNumber": {
                "type": "keyword"
            },
            "name": {
                "properties": {
                    "en": {
                        "type": "keyword",
                        "fields": {
                            "text": {
                                "type": "text",
                                "analyzer": "sw_english_analyzer"
                            },
                            "ngram": {
                                "type": "text",
                                "analyzer": "sw_ngram_analyzer"
                            }
                        }
                    },
                    "de": {
                        "type": "keyword",
                        "fields": {
                            "text": {
                                "type": "text",
                                "analyzer": "sw_german_analyzer"
                            },
                            "ngram": {
                                "type": "text",
                                "analyzer": "sw_ngram_analyzer"
                            }
                        }
                    }
                }
            }
        }
    }
}

2. Searching

Assume we're searching products in german

json
// GET /sw_product/_search
{
  "query": {
    "multi_match": {
      "query": "some keyword",
      "fields": [
          "title.de.search", // context languge
          "title.en.search" // fallback language
      ],
      "type": "best_fields" 
    }
  }
}

3. Sorting

We add new painless scripts in Framework/Indexing/Scripts/translated_field_sorting.groovy and Framework/Indexing/Scripts/numeric_translated_field_sorting.groovy, this script then will be used when sorting

Example: Sort products by name in DESC

json
// GET /sw_product/_search
{
    "query": {
      ...
    },
    "sort": [
        {
            "_script": {
                "type": "string",
                "script": {
                    "id": "translated_field_sorting",
                    "params": {
                        "field": "name",
                        "languages": [
                            "119317f1d1d1417c9e6fb0059c31a448", // context language
                            "2fbb5fe2e29a4d70aa5854ce7ce3e20b" // fallback language
                        ]
                    }
                },
                "order": "DESC"
            }
        }
    ]
}

Adding a new language

  • When a new language is created, we perform this request to update mapping includes new added language
json
// PUT /sw_product/_mapping
{
    "properties": {
        "name": {
            "properties": {
                "<new_language_id>": {
                    "type": "keyword",
                    "fields": {
                        "text": {
                            "type": "text",
                            "analyzer": "<new_language_stop_words_analyzer>"
                        },
                        "ngram": {
                            "type": "text",
                            "analyzer": "sw_ngram_analyzer"
                        }
                    }
                }
            }
        }
    }
}

Consequences

  • From the next major version, old language based indexes will not be used any longer thus could be removed on es cluster
  • When the feature is activated, the shop must reindex using command bin/console es:index in the next update