
Guide to Locales - Why You Should Use UTF-8

Introduction

The locale system uses utf-8 as its character set.

While it is possible to use other character sets for the locale system, utf-8 offers many benefits that other character sets lack, and we know of no issues with it. For this reason, we recommend only utf-8 locales.

If you receive character encoding errors or other "garbled" text, read our Troubleshooting documentation.
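
Garbled text usually appears when bytes written in one character encoding are read back in another. As a minimal illustration, outside the locale system itself, the following Python sketch encodes a string as utf-8 and then decodes the same bytes as Latin-1 (ISO-8859-1). Latin-1 is an assumption chosen only as an example of a mismatched character set.

```python
# Illustrative sketch: reproduce "garbled" text by decoding utf-8 bytes
# with the wrong character set (Latin-1 is used here only as an example).
original = "Français"

# "ç" (U+00E7) becomes two bytes in utf-8: 0xC3 0xA7.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Latin-1 turns each byte into its own character.
garbled = utf8_bytes.decode("latin-1")

# Decoding with the encoding that was actually used gives back the original.
restored = utf8_bytes.decode("utf-8")

# Re-encoding the garbled text as Latin-1 recovers the original bytes,
# which can then be decoded correctly as utf-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)   # FranÃ§ais
print(restored)  # Français
print(repaired)  # Français
```

Because Latin-1 maps every possible byte to some character, this mismatch produces wrong characters rather than an error, which is why such problems can go unnoticed until someone reads the text.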

Why utf-8?

The locale system uses the utf-8 character encoding for several specific reasons:

Character sets and collations are complicated topics. A unified standard simplifies the localization process immensely.
utf-8 is supported almost universally by operating systems, text editors, web browsers, and databases.
utf-8 allows you to list multiple languages in a single interface or file (for example, to create a menu of available locales).
utf-8 ensures that the locale system can interact with external systems (for example, text editors and databases).
Languages like JavaScript and Perl can natively use utf-8 data.
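
To illustrate the point about listing multiple languages in a single file, the Python sketch below writes a small menu of locale names, each in its own script, to one utf-8 file and reads it back. The file name, the locale codes, and the helper functions (`write_menu`, `read_menu`, `fits_in`) are hypothetical and are not part of the locale system. The sketch also checks whether the same names fit in Latin-1 (ISO-8859-1), a single-byte character set used here only for comparison.

```python
# Illustrative sketch: one utf-8 file can hold locale names written in
# several scripts at once. Codes, names, and helpers are hypothetical.
import os
import tempfile

locale_menu = {
    "en": "English",
    "de": "Deutsch",
    "ja": "日本語",
    "ru": "Русский",
    "ar": "العربية",
}


def write_menu(path: str, menu: dict[str, str]) -> None:
    """Write one "code<TAB>name" line per locale, encoded as utf-8."""
    with open(path, "w", encoding="utf-8") as handle:
        for code, name in menu.items():
            handle.write(f"{code}\t{name}\n")


def read_menu(path: str) -> dict[str, str]:
    """Read the menu back, decoding the file as utf-8."""
    menu = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            code, name = line.rstrip("\n").split("\t")
            menu[code] = name
    return menu


def fits_in(encoding: str, menu: dict[str, str]) -> bool:
    """Return True only if every locale name can be encoded in `encoding`."""
    try:
        for name in menu.values():
            name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


path = os.path.join(tempfile.mkdtemp(), "locale_menu.txt")
write_menu(path, locale_menu)
print(read_menu(path) == locale_menu)   # True
print(fits_in("utf-8", locale_menu))    # True
print(fits_in("latin-1", locale_menu))  # False
```

The Japanese, Russian, and Arabic names cannot be represented in Latin-1, so a menu like this would need several legacy character sets, whereas a single utf-8 file holds all of them.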

While it is possible for a locale to use another character encoding, we have yet to find a good reason to do so. For this reason, we will not document how to use a different character set. If, however, you think that you have found a valid reason to use another character set, we would be happy to consider it. Reach out to us on Discord and explain the technical reasons why your project requires a locale in another encoding.

For more information about utf-8, we recommend that you watch Dan Muey's "I ♥ Unicode" presentation from OSCON 2014.
