You can select any of the following for the knowledge base’s language:
- Chinese (Cantonese) — Hong Kong
- Chinese (Simplified) — China
- Chinese (Traditional) — Taiwan
- Dutch — Netherlands
- English — Australia
- English — Canada
- English — Great Britain
- English — India
- English — United States
- French — Canada
- French — France
- German — Germany
- Italian — Italy
- Japanese — Japan
- Korean — Korea
- Portuguese — Brazil
- Portuguese — Portugal
- Spanish — Latin America
- Spanish — Mexico
- Spanish — Spain
- Turkish — Turkey
Specifying the knowledge base's language
When you create an internal or external knowledge base, you specify its language. Make an accurate selection.
In general, keep things consistent across integrations. The language of the knowledge base should be the same as the language of the bot in which the knowledge base is used.
Supporting cross-lingual queries and mixed-language knowledge bases
Many of our brands have a global presence or a presence in countries with multiple national languages. So naturally they want their Conversational AI solution to support multilingual scenarios. If this is you, now is the time to start exploring. KnowledgeAI offers experimental support for:
- Cross-lingual queries
- Mixed-language knowledge bases
Let’s define terms.
Cross-lingual queries are those where the consumer’s query is in one language, but the knowledge article is in another. For example:
- The consumer’s query is in Spanish, but the article is in English.
- The consumer’s query is in Italian, but the article is in German.
Mixed-language knowledge bases are those that contain content in two or more languages. By their nature, mixed-language knowledge bases can lead to cross-lingual queries.
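The two definitions above can be expressed as a pair of small checks. This is only an illustrative sketch: the function names and the use of short language codes such as "es" and "en" are assumptions made for the example, not part of KnowledgeAI.

```python
def is_cross_lingual(query_lang: str, article_lang: str) -> bool:
    """True when the consumer's query and the article are in different languages."""
    return query_lang.strip().lower() != article_lang.strip().lower()


def is_mixed_language(article_langs: list[str]) -> bool:
    """True when the knowledge base holds content in two or more languages."""
    distinct = {lang.strip().lower() for lang in article_langs}
    return len(distinct) >= 2


# Hypothetical knowledge base with articles in English and German
kb_article_langs = ["en", "de", "en"]

print(is_mixed_language(kb_article_langs))  # more than one language present
print(is_cross_lingual("es", "en"))         # Spanish query, English article
```

In a mixed-language knowledge base, any single-language query is cross-lingual with respect to at least some of the articles, which is why the two features go together.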
Both of these features are intended only for solutions that also use answers enriched via Generative AI. Why? Consider the following scenario:
- The consumer sends a query in Spanish.
- An English-language article in the knowledge base is matched and sent to the LLM service for enrichment.
- The enriched answer is returned from the LLM service.
In what language is the enriched answer? Our early testing indicates that the LLM service is likely to generate an answer in the same language as the consumer’s query. We’re researching this now and working to strengthen this outcome.
Again, support for cross-lingual queries and mixed-language knowledge bases is experimental. Explore these features in your demo Messaging bots (there's no support in Voicebots) and in your Conversation Assist solution. Learn alongside us. And share your feedback! As always, proceed with care: Test thoroughly before rolling out to Production.
Setup includes just one step: Configure the Language setting in the knowledge base as follows:
- If the content in the knowledge base is in a language other than English, the choice here is easy. Select the primary language of the content. Behind the scenes, a multilingual embedding model is used to search the knowledge base for articles.
- If the content in the knowledge base is in English, and you only need to support English-language consumer queries, the choice here is easy too. Select the variant of English from the dropdown. In this case, an English-language embedding model is used.
- If the content in the knowledge base is entirely in English and you need to support cross-lingual queries, or if it’s in English and one or more other languages (which also means you need to support cross-lingual queries), select “Other” from the dropdown. In this case, the multilingual embedding model is used. The multilingual model is required to support cross-lingual queries.
The multilingual embedding model performs very well. However, the performance of the English-language model is even better. So, if your content is mostly in English and the queries of your consumers are mostly in English, we recommend that you select a variant of English from the dropdown. Your choice will depend on your use case and priorities.
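The selection rules above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the "English variant" label, and the "english"/"multilingual" model labels are hypothetical names used for illustration; only the "Other" option and the decision logic come from the guidance above.

```python
def choose_language_setting(content_langs, primary_lang, needs_cross_lingual):
    """Return (language_setting, embedding_model) per the guidance above.

    content_langs: names of the languages present in the knowledge base
    primary_lang: the primary language of the content
    needs_cross_lingual: whether cross-lingual queries must be supported
    """
    langs = {lang.strip().lower() for lang in content_langs}

    # Content is in a language other than English: select the primary language.
    if "english" not in langs:
        return (primary_lang, "multilingual")

    # English-only content and English-only queries: select an English variant.
    if langs == {"english"} and not needs_cross_lingual:
        return ("English variant", "english")

    # English content with cross-lingual queries, or English plus other languages.
    return ("Other", "multilingual")


print(choose_language_setting(["German"], "German", False))
print(choose_language_setting(["English"], "English", False))
print(choose_language_setting(["English", "Spanish"], "English", False))
```

The sketch applies the rules strictly. It does not model the trade-off described above, where a mostly English knowledge base with mostly English queries may still be better served by an English variant.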
Mixed-language knowledge bases shouldn’t be common. Use them only when some of your content must be in one language and other content must be in another. Don’t include the same content in two or more languages. This is redundant, since the LLM service can translate the answer during enrichment.
Working with special language characters
If you need to support special language characters (e.g., ö, ü, ß), and you’re creating an internal knowledge base by importing a CSV file, ensure the import file is saved as a UTF-8 encoded CSV file beforehand.
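If you generate the import file programmatically, one way to guarantee the encoding is to write it with an explicit UTF-8 setting and verify it before importing. The sketch below uses Python's standard csv module. The file name and the column names ("title", "summary") are hypothetical placeholders, not the actual import format.

```python
import csv


def write_utf8_csv(path, rows):
    """Write rows to a CSV file that is explicitly UTF-8 encoded."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Hypothetical column names and content, for illustration only
    rows = [
        ["title", "summary"],
        ["Öffnungszeiten", "Unsere Filiale schließt um 18 Uhr."],
    ]
    write_utf8_csv("articles.csv", rows)
    print(is_valid_utf8("articles.csv"))
```

Running the check against an existing file is a quick way to catch a CSV that a spreadsheet application saved in a legacy encoding.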