Encoding Menu
These entries influence the file encoding of the active file – how the underlying bytes of the file are interpreted as glyphs, and how the characters you enter are saved as underlying bytes. The New Document preferences will influence which Encoding is selected for a new file, and the [MISC > Autodetect character encoding](../preferences/#misc) preference will affect which encoding is selected when the file is first read from disk.
These are the major encodings found at the beginning of the Encoding menu:
- ANSI: A family of encodings based on the active Windows Code Page – most “ANSI” codepages are sets of 256 characters (8 bits); but Windows also allows you to set the “ANSI” codepage to Japanese/Shift-JIS, Simplified Chinese/GBK, Korean Unified Hangul Code, or Traditional Chinese/Big5; and starting in recent Windows 11, also to set the codepage to Unicode UTF-8, which is described more below. Whatever code page your OS is set to use (and thus the one that shows up in the ?-menu’s Debug Info as Current ANSI codepage) is what the ANSI encoding refers to. (It was named generically, because historically people have thought of their default codepage as the “ANSI” codepage. In the US, that code page is usually Windows-1252, but it depends on your Windows settings.)
- UTF-8: This encoding uses variable-width multi-byte sequences to represent Unicode characters, either with or without the BOM character at the start of the file. (The BOM isn’t technically part of the UTF-8 spec, because there isn’t a Little Endian or Big Endian variant of UTF-8 – the bytes are always in a predefined order. However, many applications use the BOM codepoint to indicate that the file should be interpreted as UTF-8, and Notepad++ supports reading and writing files with the BOM sequence to accommodate those external applications’ file-format needs.)
- UTF-16: This encoding uses two-byte Big Endian or Little Endian sequences to represent Unicode characters.
- The various Character sets found in the sub-menus allow you to specify any of the various international sets of characters that provide a limited set of glyphs (most are 8-bit, and thus limited to 256 glyphs), rather than the full suite of Unicode characters. You can use one of these encodings to edit a file from one character set even if your Windows code page is a different code page. For example, this allows you to edit a file using the Eastern European ISO 8859-2 character set even if your copy of Windows is set up for Windows-1252.
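To see why the chosen character set matters, here is a small Python sketch (Python's codec names stand in for the menu entries; this is an illustration, not Notepad++'s own code) showing one byte interpreted under two different 8-bit character sets:

```python
# The same stored byte maps to different glyphs under different character sets.
raw = b"\xb3"

# Interpreted as Eastern European ISO 8859-2 (Latin-2):
print(raw.decode("iso8859-2"))   # ł (Latin small letter l with stroke)

# Interpreted as Western European Windows-1252:
print(raw.decode("cp1252"))      # ³ (superscript three)
```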
The ... with BOM entries indicate that the file uses the Unicode Byte Order Mark at the start of the file to indicate the correct byte order (big endian or little endian), and, in the case of UTF-8, to make it unambiguous that the file is meant to be a UTF-8 Unicode file rather than another 8-bit encoding.
The Convert to ... entries below the separator line will change the encoding (the underlying bytes stored on disk) of the active file, without changing the glyphs. So if you just have the Euro currency symbol € in your file, it will be stored as byte 0x80 if you Convert to ANSI (and are in a Western-European codepage in Windows), as the three-byte sequence 0xE2 0x82 0xAC if you Convert to UTF-8, and as the two-byte sequence 0x20 0xAC if you Convert to UTF-16 BE BOM.
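The byte sequences above can be checked with Python, whose codec names serve here as stand-ins for the Convert to... menu entries (this sketch encodes in memory rather than converting a file):

```python
# How the single Euro glyph € is stored under each conversion target.
euro = "€"

print(euro.encode("cp1252").hex(" "))     # 80        ("ANSI" on a Western European codepage)
print(euro.encode("utf-8").hex(" "))      # e2 82 ac  (UTF-8)
print(euro.encode("utf-16-be").hex(" "))  # 20 ac     (UTF-16 BE; on disk the file would
                                          # also begin with the BOM bytes fe ff)
```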
The entries above the separator line (without Convert to in the name) show the file’s active encoding or character set. If you change that setting manually, it will leave the bytes in the file the same and change the glyph or glyph sequence that is shown, based on the updated interpretation of the bytes. For example, if you enter the € in a UTF-8 encoded file, and then manually select Encoding > ANSI, suddenly those characters will look something like â‚¬ (depending on the active Windows code page); this is because UTF-8 € is the three bytes 0xE2 0x82 0xAC, and those three bytes represent three characters when interpreted as ANSI. Or, if you are starting with a character set of Western European > OEM-US (the old DOS box-drawing character set) with the ▓ grey box, and you change the character set to Western European > Windows-1252, it will become the ² superscript 2. (Technically, it doesn’t always just convert the interpretation: if you start with one of the 2-byte UTF-16 encodings, where ASCII characters have a 0x00 byte as the larger of the two bytes, and you switch the interpretation to ANSI, instead of showing all those 0x00 bytes as NUL characters, it will just not include those bytes in the new interpretation.)
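Both re-interpretation examples can be reproduced in Python (again using Python codec names as stand-ins for the menu's character-set entries):

```python
# Re-interpreting bytes, as the upper Encoding entries do, changes the glyphs
# shown without changing the bytes on disk.
utf8_bytes = "€".encode("utf-8")        # the three bytes e2 82 ac

# The same three bytes, read as Windows-1252 "ANSI": three unrelated glyphs.
print(utf8_bytes.decode("cp1252"))      # â‚¬

# Byte 0xB2 under OEM-US (DOS code page 437) versus Windows-1252:
print(b"\xb2".decode("cp437"))          # ▓ (grey box)
print(b"\xb2".decode("cp1252"))         # ² (superscript two)
```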
In general, if you want the glyph to stay the same and change the bytes on the disk, then use the Convert to... entries; whereas if the glyphs shown don’t match what you think the bytes of the data should represent, you probably need to use one of the upper entries to change the interpretation of the bytes.
Encoding Auto-Detection
If the file you open is encoded in UTF-16 (which always has the byte order mark “BOM” character), or in UTF-8 with the BOM, then Notepad++ will use the encoding based on the BOM.
If the file is an XML or HTML file, then if the encoding is defined in the declaration/prolog, Notepad++ will use that encoding for the file.
Failing that, if MISC > Autodetect character encoding is enabled, Notepad++ will also analyze some of the byte sequences in the file, and if they match patterns common to one of the character sets, then Notepad++ will use that encoding.
If it still doesn’t have an encoding, then Notepad++ will look to see if it’s 100% ASCII (in which case, it chooses “ANSI” or “UTF-8” depending on the Apply to opened ANSI files setting); or if all of the non-ASCII bytes follow the rules for valid UTF-8, it will use that encoding.
Finally, if the encoding has not yet been decided (regardless of the autodetection status), Notepad++ will choose the encoding based on the system locale or set it to “ANSI”.
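The detection order described above can be sketched in Python. This is a simplified illustration under assumptions stated in the comments, not Notepad++'s actual code: the XML/HTML declaration check and the statistical character-set analyzer are omitted.

```python
# Simplified sketch of the encoding auto-detection order described above.
def guess_encoding(data: bytes, autodetect: bool = True) -> str:
    # 1. BOM-based detection always wins.
    if data.startswith(b"\xff\xfe"):
        return "UTF-16 LE BOM"
    if data.startswith(b"\xfe\xff"):
        return "UTF-16 BE BOM"
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 BOM"
    # 2. (The XML/HTML declaration check and, when `autodetect` is on, the
    #    statistical character-set analysis would run here; both omitted.)
    # 3. Pure ASCII, or non-ASCII bytes that form valid UTF-8 sequences.
    if all(b < 0x80 for b in data):
        return "ASCII (shown as ANSI or UTF-8, per preferences)"
    try:
        data.decode("utf-8", errors="strict")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # 4. Fall back to the system locale / "ANSI".
    return "ANSI"

print(guess_encoding(b"\xff\xfeh\x00i\x00"))   # UTF-16 LE BOM
print(guess_encoding("€".encode("utf-8")))     # UTF-8
print(guess_encoding(b"\x80\x81"))             # ANSI (not valid UTF-8)
```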
If you find that your text with accented characters often gets misinterpreted by Notepad++ (Windows-1255 encoded Hebrew is a common incorrectly-chosen encoding), and if you always or usually just use files that are in the same localization as your Windows is set to, it’s generally recommended to turn off character-encoding autodetection, and Notepad++ will be able to use your system setting without incorrectly guessing some other encoding.
Encoding and Use Unicode UTF-8 for worldwide language support
As of Notepad++ version 8.8.8, the ANSI and Convert to ANSI entries on the Encoding menu are disabled when the Windows setting Use Unicode UTF-8 for worldwide language support is enabled. When that setting is in effect, the system default code page, which ordinarily defines “ANSI” in Windows, is UTF-8; attempting to treat UTF-8 as an ordinary code page does not work properly, which caused erratic behavior prior to version 8.8.8. Since the traditional concept of “ANSI” has no consistent meaning when that Windows setting is enabled, Notepad++ disables ANSI encoding. (But even with that OS option set, Notepad++ can still choose one of the Character Set encodings; it just manually selects that entry, not setting it to “ANSI”.)
Some Windows 11 installations come with that option turned on by default. If you need to be able to use the Convert to ANSI action, and you find it’s disabled in Notepad++ v8.8.8 or newer (or if that conversion doesn’t behave as expected on older versions of Notepad++), you can verify the cause in the ?-menu’s Debug Info: it will show Current ANSI codepage: 65001 if that Windows OS option is on. If you want to change that Windows OS setting, Microsoft provides multiple paths to it, but two of the common ways to find it are:
- Windows Control Panel > Clock & Region (or just Region), go to the Administrative tab on the dialog, using the Change System Locale button, and toggle the Use Unicode UTF-8 for worldwide language support checkmark.
- Windows Settings > Time & Language, in the Language (or Language & Region) section, find the Use Unicode UTF-8 for worldwide language support toggle (it may not be shown, in which case look under the Windows Display Language ▼ pulldown to reveal it).
Encoding During Editing
Notepad++ does not always edit a document in the same encoding used to store it in its file. This doesn’t affect most users, though some plugins may give you byte-level information about the internal representation of the document rather than the file itself, which has confused users of some plugins. When the encoding (shown in the Encoding menu and in the status bar) is ANSI or UTF-8, you are editing the document in the same encoding as the file; in all other cases (UTF-16 or anything from the Character sets sub-menus), you are editing the document as UTF-8, and Notepad++ converts from or to the chosen encoding when opening or saving the file.
Also, for encodings that have the BOM sequence, Notepad++ will not include the BOM character in the editor panel, even though it is in the file on disk during reads and writes; said another way, Notepad++ treats the BOM sequence as “metadata” and doesn’t include it in the text you are editing. This means you cannot use the editor panel in Notepad++ to look at, add, or remove the BOM character. If you want to change the BOM status for UTF-8, use Convert to UTF-8-BOM to add the BOM or Convert to UTF-8 to remove it; if the Encoding menu shows one of the UTF-16 options or UTF-8-BOM selected, you can be confident that Notepad++ wrote the BOM when the file was saved and will read (and hide) the BOM when the file is loaded.
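Python's `utf-8-sig` codec behaves analogously, and makes a convenient way to see this "BOM as metadata" behavior (an analogy, not Notepad++'s implementation):

```python
# The BOM is written to disk but stripped from the text you actually edit,
# much like Notepad++'s handling of UTF-8-BOM files.
text = "hello"

with_bom = text.encode("utf-8-sig")
print(with_bom.hex(" "))             # ef bb bf 68 65 6c 6c 6f  (BOM + "hello")

# Decoding strips the BOM back out: the editable text never contains it.
print(with_bom.decode("utf-8-sig"))  # hello
```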