ASCII-compatible variable-width encoding of Unicode
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation
UTF-8
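The variable-width scheme behind UTF-8 can be sketched in a few lines. This is a minimal illustrative encoder in Python, not taken from any library (the names `utf8_encode` and `utf8_encode_str` are hypothetical). It shows how the lead byte's high bits announce the sequence length, and how ASCII passes through unchanged.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as 1 to 4 UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                  # 0xxxxxxx: ASCII is stored unchanged
        return bytes([cp])
    if cp < 0x800:                 # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:               # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),          # 11110xxx, then three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode_str(s: str) -> bytes:
    """Encode a whole string by concatenating per-code-point encodings."""
    return b"".join(utf8_encode(ord(ch)) for ch in s)


# Usage: the sketch agrees with Python's built-in codec.
sample = "A\u00e9\u20ac\U0001F0A1"
assert utf8_encode_str(sample) == sample.encode("utf-8")
```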
Variable-width encoding of Unicode, using one or two 16-bit code units
UTF-16 (16-bit Unicode Transformation Format) is a character encoding that supports all 1,112,064 valid code points of Unicode. The encoding is variable-length
UTF-16
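How UTF-16 uses "one or two 16-bit code units" can be illustrated with a small sketch (function names are hypothetical). Code points in the BMP take one unit; supplementary code points are split into a high and a low surrogate, each carrying 10 bits of the offset from U+10000.

```python
def utf16_code_units(cp: int) -> list:
    """Return the 16-bit code units UTF-16 uses for one scalar value."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:             # BMP: one code unit equal to the code point
        return [cp]
    v = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 | (v >> 10)    # high (lead) surrogate: top 10 bits
    low = 0xDC00 | (v & 0x3FF)   # low (trail) surrogate: bottom 10 bits
    return [high, low]


def utf16_decode_pair(high: int, low: int) -> int:
    """Recombine a surrogate pair into the code point it represents."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For U+1F0A1 (PLAYING CARD ACE OF SPADES) this yields the pair D83C DCA1, and decoding that pair gives back U+1F0A1.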
Topics referred to by the same term
Look up UTF in Wiktionary, the free dictionary. UTF may refer to: Unicode Transformation Format UTF-1 UTF-7 UTF-8 UTF-16 UTF-32 U.T.F. (Undead Task Force)
UTF
Encoding Unicode characters as 4 bytes per code point
UTF-32 (32-bit Unicode Transformation Format), sometimes called UCS-4, is a fixed-length encoding used to encode Unicode code points that uses exactly
UTF-32
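Because UTF-32 stores every code point in exactly four bytes, an encoder reduces to packing integers. A minimal sketch (the function name is hypothetical), writing no byte order mark:

```python
import struct


def utf32_encode(s: str, big_endian: bool = True) -> bytes:
    """Encode s as UTF-32: one 4-byte unit per code point, no BOM."""
    fmt = ">I" if big_endian else "<I"
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:   # surrogates are not scalar values
            raise ValueError(f"surrogate code point {cp:#x} is not encodable")
        out += struct.pack(fmt, cp)
    return bytes(out)
```

The fixed width makes indexing by code point trivial, at the cost of four bytes even for ASCII.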
Character encoding standard
Unicode Standard itself defines three encodings: UTF-8, UTF-16, and UTF-32, though several others exist. UTF-8 is the most widely used by a large margin,
Unicode
Using numbers to represent text characters
8859, and Unicode encodings such as UTF-8 and UTF-16. The most popular character encoding on the World Wide Web is UTF-8, which is used in 98.9% of surveyed
Character encoding
Character encoding for Unicode compatible with EBCDIC
UTF-EBCDIC is a character encoding capable of encoding all 1,112,064 valid character code points in Unicode using 1 to 5 bytes (in contrast to a maximum
UTF-EBCDIC
Unicode character
- UTF-8, UTF-16, UTF-32 & BOM: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8
Byte order mark
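The question of whether a UTF-8 stream may start with a BOM suggests a common reader-side pattern: sniff a leading signature, pick the encoding, and strip it. This is a heuristic sketch (helper names are hypothetical); it uses the BOM constants from Python's `codecs` module and checks the 4-byte UTF-32 signatures first, because the UTF-32LE BOM begins with the UTF-16LE one.

```python
import codecs

# Longest signatures first: FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode data, honouring and stripping a leading BOM if one is present."""
    enc, n = sniff_bom(data)
    return data[n:].decode(enc or default)
```

The sniff is ambiguous in one corner case: UTF-16LE text whose first character after the BOM is U+0000 looks like a UTF-32LE BOM.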
Garbled text as a result of incorrect character encodings
8-bit encodings), or the use of variable length encodings (notably UTF-8 and UTF-16). Failed rendering of glyphs due to either missing fonts or missing
Mojibake
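Mojibake is easy to reproduce deliberately: encode text with one encoding and decode the bytes with another. A minimal sketch (the helper name is hypothetical), using UTF-8 bytes read as Windows-1252:

```python
def mojibake(text: str, written_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Encode text one way and (wrongly) decode the bytes another way."""
    # errors="replace" maps bytes undefined in the target code page to U+FFFD.
    return text.encode(written_as).decode(read_as, errors="replace")


garbled = mojibake("caf\u00e9")      # "café" becomes "cafÃ©"
# Reversing the two steps recovers the original when no byte was replaced.
assert garbled.encode("cp1252").decode("utf-8") == "caf\u00e9"
```

A no-break space (U+00A0, UTF-8 bytes C2 A0) read the same way turns into "Â" followed by a no-break space, a frequent source of stray "Â" characters.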
UTF-8 string because it only looks for the ASCII '%' character to define a formatting string. All other bytes are printed unchanged. UTF-16 and UTF-32
Comparison of Unicode encodings
Obsolete multibyte encoding for Unicode
UTF-1 is an obsolete method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes
UTF-1
Character encoding
UTF-7 (7-bit Unicode Transformation Format) is an obsolete variable-length character encoding for representing Unicode text using a stream of ASCII characters
UTF-7
Unicode block containing some special codepoints and two non-characters
assumes the input is UTF-8, the first and third bytes are valid UTF-8 encodings of ASCII, but the second byte (0xFC) is not valid in UTF-8. The text editor
Specials (Unicode block)
Term for computer data consisting only of unformatted characters of readable material
principle, plain text can be in any encoding, but today usually implies UTF-8. Plain text is different from formatted text, where style information is
Plain text
Encoding scheme for Unicode
The Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) is a variant of UTF-8 that is described in Unicode Technical Report #26. A Unicode code point
CESU-8
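The difference between CESU-8 and UTF-8 shows up only for supplementary characters: CESU-8 first splits them into UTF-16 surrogates and then encodes each surrogate as a 3-byte sequence. A sketch under that reading of the scheme (the function name is hypothetical):

```python
def cesu8_encode(s: str) -> bytes:
    """Encode s as CESU-8: supplementary characters become two 3-byte surrogate sequences."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP characters are byte-for-byte identical to UTF-8
            # (a lone surrogate in s raises UnicodeEncodeError here).
            out += ch.encode("utf-8")
        else:
            v = cp - 0x10000
            for unit in (0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)):
                # 3-byte UTF-8-style form of each 16-bit surrogate code unit
                out += bytes([0xE0 | (unit >> 12),
                              0x80 | ((unit >> 6) & 0x3F),
                              0x80 | (unit & 0x3F)])
    return bytes(out)
```

U+1F0A1 therefore takes six bytes (ED A0 BC ED B2 A1) in CESU-8 versus four in UTF-8, and a strict UTF-8 decoder rejects the result.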
Standard set of characters defined by ISO/IEC 10646
conflicts with other encoding forms. The original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP. A range
Universal Coded Character Set
Character encoding standard
points) and encoding (to 8-, 16-, or 32-bit binary formats, called UTF-8, UTF-16, and UTF-32, respectively). ASCII was incorporated into the Unicode (1991)
ASCII
Method of encoding characters in a URI
character. (A non-ASCII character is typically converted to its byte sequence in UTF-8, and then each byte value is represented as above.) The reserved character
Percent-encoding
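The two-step rule in the snippet (convert a non-ASCII character to its UTF-8 bytes, then write each byte as `%XX`) can be sketched as follows. This assumes the conservative policy of escaping everything outside the RFC 3986 unreserved set, which suits a single path segment or query value; the function name is hypothetical.

```python
# RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
_UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(s: str) -> str:
    """Percent-encode every UTF-8 byte of s except unreserved ASCII characters."""
    return "".join(
        chr(b) if b in _UNRESERVED else f"%{b:02X}"
        for b in s.encode("utf-8")
    )
```

For example, "café au lait" becomes `caf%C3%A9%20au%20lait`: the é is first converted to its UTF-8 bytes C3 A9.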
Encoding for a sequence of byte values using 64 printable characters
UVXYZ[`abcdefhijklmpqr". UTF-8 A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8#Self-synchronization. 8BITMIME
Base64
Overview on Unicode implementation in Microsoft Windows
explicitly to the UTF-16 encoding. Anything else, including UTF-8, is not "Unicode" in Microsoft's outdated language (while UTF-8 and UTF-16 are both Unicode
Unicode in Microsoft Windows
historically been used for storing text on the World Wide Web, though by now UTF-8 is dominant, with all languages at 95% use or higher by some estimates
Popularity of text encodings
Use of encoding systems for international characters in HTML
current Living Standard published by WHATWG, the only valid encoding is UTF-8. There are two general ways to specify which character encoding is used
Character encodings in HTML
Computer file containing plain text
Freytag, Asmus (2015-12-18). "FAQ – UTF-8, UTF-16, UTF-32 & BOM". The Unicode Consortium. Retrieved 2016-05-30. Yes, UTF-8 can contain a BOM. However, it
Text file
Bug in Microsoft Windows
Windows which causes text encoded in ASCII to be interpreted as if it were UTF-16LE, resulting in garbled text. When the string "Bush hid the facts", without
Bush hid the facts
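What the bug does to the bytes can be reproduced by decoding ASCII text as UTF-16LE: each pair of ASCII bytes becomes one 16-bit code unit, and for this sentence every unit lands among CJK ideographs. The sketch shows only the reinterpretation, not the Windows detection heuristic that chose UTF-16LE (the helper name is hypothetical).

```python
def misread_as_utf16le(ascii_text: str) -> str:
    """Reinterpret ASCII bytes as UTF-16LE, pairing each two bytes into one code unit."""
    data = ascii_text.encode("ascii")
    # Little-endian: the first byte of each pair is the low byte, so "Bu" -> U+7542.
    return data.decode("utf-16-le", errors="replace")


garbled = misread_as_utf16le("Bush hid the facts")   # 18 bytes -> 9 code units
```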
Configuration file for computer networking
Mozilla Firefox 66 and later additionally supports PAC scripts encoded as UTF-8. The function dnsResolve (and similar other functions) performs a DNS lookup
Proxy auto-config
Handling of strings in the C programming language
Unicode literals such as char foo[512] = "φωωβαρ"; (UTF-8) or wchar_t foo[512] = L"φωωβαρ"; (UTF-16 or UTF-32, depends on wchar_t) is implementation defined
C string handling
Parameters defining locale in computer
explicit UTF-8 encoding: $ locale LANG=cs_CZ.UTF-8 LC_CTYPE="cs_CZ.UTF-8" LC_NUMERIC="cs_CZ.UTF-8" LC_TIME="cs_CZ.UTF-8" LC_COLLATE="cs_CZ.UTF-8" LC_MONETARY="cs_CZ
Locale (computer software)
List of humorous technical standards proposals
Morality Sections in Routing Area Drafts," Informational. RFC 4042 – "UTF-9 and UTF-18 Efficient Transformation Formats of Unicode," Informational. Encodes
April Fools' Day Request for Comments
Data-interchange format
backslash-escaped. JSON exchange in an open ecosystem must be encoded in UTF-8. The encoding supports the full Unicode character set, including those
JSON
American computer scientist known for Unix (born 1943)
expressions and early computer text editors QED and ed, the definition of the UTF-8 encoding, and his work on computer chess that included the creation of
Ken Thompson
Email that contains non-ASCII characters in the header
characters (characters which do not exist in the ASCII character set), encoded as UTF-8, in the email header and in supporting mail transfer protocols. The most
International email
Process of determining content's charset
pass a UTF-8 validity test. However, badly written charset detection routines do not run the reliable UTF-8 test first, and may decide that UTF-8 is some
Charset detection
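The "reliable UTF-8 test" that detectors should run first is a strict well-formedness check. The sketch below follows the well-formed byte-sequence ranges given in the Unicode Standard; the function name is hypothetical. Besides rejecting stray continuation bytes and truncated sequences, it rejects overlong forms (such as C0 80 for NUL), encoded surrogates, and values above U+10FFFF.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Strict UTF-8 check: rejects overlongs, surrogates, values above U+10FFFF, truncation."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                          # ASCII
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif 0xE0 <= b <= 0xEF:
            length = 3
            lo = 0xA0 if b == 0xE0 else 0x80  # E0 80..9F would be overlong
            hi = 0x9F if b == 0xED else 0xBF  # ED A0..BF would encode surrogates
        elif 0xF0 <= b <= 0xF4:
            length = 4
            lo = 0x90 if b == 0xF0 else 0x80  # F0 80..8F would be overlong
            hi = 0x8F if b == 0xF4 else 0xBF  # F4 90.. would exceed U+10FFFF
        else:
            return False                      # 80..C1 or F5..FF cannot start a sequence
        if i + length > n:
            return False                      # truncated sequence
        if not lo <= data[i + 1] <= hi:       # second byte has the tightest range
            return False
        for k in range(2, length):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += length
    return True
```

Text in a legacy 8-bit encoding usually fails quickly: Latin-1 "café" ends in the lone byte E9, which starts a 3-byte sequence that never completes.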
Symbol "#!", used in computing
"FAQ UTF-8, UTF-16, UTF-32 & BOM: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8
Shebang (Unix)
Access control method for the HTTP network communication protocol
realm="User Visible Realm", charset="UTF-8" This parameter indicates that the server expects the client to use UTF-8 for encoding username and password
Basic access authentication
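When the server sends `charset="UTF-8"`, the client joins user-id and password with a colon, encodes the result as UTF-8, and Base64-encodes the bytes. A minimal sketch using the standard `base64` module (the function name is hypothetical):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an Authorization header value with credentials encoded as UTF-8."""
    if ":" in username:
        raise ValueError("user-id must not contain a colon")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```

With `username="test"` and `password="123£"`, the pound sign is sent as the two UTF-8 bytes C2 A3, giving `Basic dGVzdDoxMjPCow==`.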
Continuous group of 65536 Unicode code points
of 17 planes is due to UTF-16, which can encode 2²⁰ code points (16 planes) as pairs of words, plus the BMP as a single word. UTF-8 was designed with a
Plane (Unicode)
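The arithmetic behind "17 planes" can be checked directly: each plane holds 0x10000 code points, so the plane number is the code point shifted right by 16 bits, and the BMP plus 2²⁰ surrogate-pair values ends exactly at U+10FFFF. A small sketch (the function name is hypothetical):

```python
def plane_of(cp: int) -> int:
    """Return the Unicode plane (0 to 16) containing a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"outside the Unicode code space: {cp:#x}")
    return cp >> 16   # each plane spans 0x10000 = 65536 code points


# The code space ends where UTF-16 surrogate pairs run out:
# the BMP (plane 0) plus 2**20 supplementary code points.
assert 0x10000 + 2**20 - 1 == 0x10FFFF
assert (0x10FFFF + 1) // 0x10000 == 17
```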
Data structure
possible to store every possible ASCII or UTF-8 string. However, it is common to store the subset of ASCII or UTF-8 – every character except NUL – in null-terminated
Null-terminated string
Symbols encoded in computers to make text
system uses the 8-bit byte for each character. Today, the Unicode-based UTF-8 encoding uses a varying number of byte-sized code units to define a code
Character (computing)
File extension
default encoding specifically for property resource bundles is UTF-8, and if an invalid UTF-8 byte sequence is encountered it falls back to ISO-8859-1. Editing
.properties
Computer file format for a multimedia playlist
of UTF-8 encoding is mandatory in M3U playlists with the M3U8 file extension. The system codepage is usually assumed for .m3u but this is often UTF-8 as
M3U
Windows character set for Latin alphabet
static pages. Almost all websites now use the multi-byte character encoding UTF-8, another superset of ASCII. Some countries or languages show a higher usage
Windows-1252
Foreign function interface for the Java language
functions, which use UTF-16LE encoding on little-endian architectures and UTF-16BE on big-endian architectures, and then use a UTF-16 to UTF-8 conversion routine
Java Native Interface
Archived from the original on 2016-08-30. Retrieved 2016-08-29. "FAQ - UTF-8, UTF-16, UTF-32 & BOM". "How to: Load XML from File with Encoding Detection".
List of file signatures
Format for expressing RDF statements in HTML documents
relationships with other people and things: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3
RDFa
C programming language standard, current revision
c8rtomb() to convert a narrow multibyte character to UTF-8 encoding and a single code point from UTF-8 to a narrow multibyte character representation respectively
C23 (C standard revision)
Computer programmer and co-creator of Go
Unix Programming Environment. With Ken Thompson, he is the co-creator of UTF-8 character encoding. While at Bell Labs, Pike was also involved in the creation
Rob Pike
Linked hypertext system on the Internet
browser indicating success: HTTP/1.1 200 OK Content-Type: text/html; charset=UTF-8 Followed by the content of the requested page. Hypertext Markup Language
World Wide Web
Windows character set for Cyrillic alphabet
minority of Russian websites use it, with 94.6% of Russian (.ru) websites using UTF-8, and the legacy 8-bit encoding is distant second. In Linux, the encoding
Windows-1251
Esoteric programming language
symbols". utf-8.jp. Archived from the original on 2009-07-15. Retrieved 2017-10-25. Hasegawa, Yosuke (July 2009). "UTF-8.jp [2009-07-28]". utf-8.jp. Archived
JSFuck
Mail sent using electronic means
images. International email, with internationalized email addresses using UTF-8, is standardized but not widely adopted. The term electronic mail has been
The Unemployment Trust Fund (UTF) is composed of 59 accounts in the United States Treasury related to unemployment insurance program. Specifically, there
Unemployment Trust Fund
Relationship between Unicode characters and HTML
HTML document. For UTF-8, the BOM is optional, while it is a must for the UTF-16 and the UTF-32 encodings. (Note: UTF-16 and UTF-32 without the BOM are
Unicode and HTML
MIME compatible Unicode compression scheme
MIME-compatible Unicode compression scheme. BOCU-1 combines the wide applicability of UTF-8 with the compactness of Standard Compression Scheme for Unicode (SCSU)
Binary Ordered Compression for Unicode
Special character sequences in the C programming language
UTF-8, and UTF-16 for wchar_t: // A single byte with the value 0xC0; not valid UTF-8 char s1[] = "\xC0"; // Two bytes with values 0xC3, 0x80; the UTF-8
Escape sequences in C
Aspect of the Unicode standard
distinction has some semantic value and affects the rendering of the text. UTF-8 and UTF-16 (and also some other Unicode encodings) do not allow all possible
Unicode equivalence
Extracting/adding file and/or directory names into archive in either UTF-7, UTF-8 or UTF-16/UCS-2 encoding to support single file/directory name which contains
Comparison of file archivers
Encoding which maps information to a variable number of bits
intended role instead being taken by UTF-8, which does preserve ASCII compatibility. Crispin, M. (2005-04-01). UTF-9 and UTF-18 Efficient Transformation Formats
Variable-length encoding
Sets of characters used in the 1980s & 90s
Windows versions support Unicode, new Windows applications should use Unicode (UTF-8) and not 8-bit character encodings. There are two groups of system code
Windows code page
Identifier of the destination where email messages are delivered
above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531 when the EHLO specifies SMTPUTF8, though even
Email address
American screenwriter
Bruckheimer television series E-Ring. He has also created/written the comic book UTF (Undead Task Force) with Tone Rodriguez for APE comics. Reynolds worked as
Scott Reynolds (writer)
Application layer protocol
OK Date: Mon, 23 May 2005 22:38:34 GMT Content-Type: text/html; charset=UTF-8 Content-Length: 155 Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT Server:
HTTP
Software library
historically used UTF-16, and still does only for Java; while for C/C++ UTF-8 is supported, including the correct handling of "illegal UTF-8". ICU 73.2 has
International Components for Unicode
Complete list of the characters available on most computers
text is not likely to be encoded in UTF-8, since those bytes are invalid in UTF-8. It is also not likely to be UTF-16 in little-endian byte order because
Universal Character Set characters
Higher-level 7-bit and 8-bit character encoding system
(most UTFs, one exception being the obsolete UTF-1) Representing all characters, including control codes, with multiple bytes (e.g. UTF-16, UTF-32) Mixing
ISO/IEC 2022
Software library for interpreting regular expressions
with UTF support, the (*UTF) option at the beginning of a pattern can be used instead of setting an external option to invoke UTF-8, UTF-16, or UTF-32 mode
Perl Compatible Regular Expressions
Latin letter A with circumflex
encoded in UTF-8 and decoded using ISO 8859-1 or Windows-1252, two encodings which are commonly referred to as Western or Western European. In UTF-8, the
Â
Text editor forked from Pluma
tabs. It fully supports international text through its use of the Unicode UTF-8 encoding. As a general-purpose text editor, Xed supports most standard
Xed
Something that represents an idea, process, or physical entity
Unicode Standard itself defines three encodings: UTF-8, UTF-16, and UTF-32, though several others exist. UTF-8 is the most widely used by a large margin,
Symbol
Process for converting data into a "standard", "normal", or canonical form
standard, in particular UTF-8, may cause an additional need for canonicalization in some situations. Namely, by the standard, in UTF-8 there is only one valid
Canonicalization
World Wide Web Consortium recommendation
Language SSML. Here is an example PLS document: <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
Pronunciation Lexicon Specification
HEXAGRAM FOR THE CREATIVE HEAVEN. Encodings (decimal / hex): Unicode 19904 / U+4DC0; UTF-8 228 183 128 / E4 B7 80; numeric character reference &#19904; / &#x4DC0;
List of hexagrams of the I Ching
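The values in an "Encodings" box like the one above can be derived from the code point alone. A sketch that rebuilds them with Python's built-in codecs (the function name and dictionary keys are hypothetical):

```python
def encodings_row(ch: str) -> dict:
    """Collect decimal/hex encodings of a single character."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")   # big-endian, no BOM
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code point": (cp, f"U+{cp:04X}"),
        "UTF-8": (" ".join(str(b) for b in utf8), " ".join(f"{b:02X}" for b in utf8)),
        "UTF-16": (" ".join(str(u) for u in units), " ".join(f"{u:04X}" for u in units)),
        "numeric character reference": (f"&#{cp};", f"&#x{cp:X};"),
    }
```

For U+4DC0 this reproduces 19904, E4 B7 80 and `&#x4DC0;`; for a supplementary character such as U+1F0A1, the UTF-16 row shows a surrogate pair (D83C DCA1).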
Program that extracts subtitles from video
YouTube only supports UTF-8. The default encoding for subtitle files in FFmpeg is UTF-8. All text in a Matroska™ file is encoded in UTF-8. This means that
SubRip
Password-based key derivation function
specification was revised to specify that when hashing strings: the string must be UTF-8 encoded the null terminator must be included With this change, the version
Bcrypt
Czech physicist (1942–2024)
2023. Retrieved 17 November 2021. "Death notice". utf.mff.cuni. Retrieved 26 January 2024. http://utf.mff.cuni.cz/info/lide/bicak.html Jiří Bičák at IMDb
Jiří Bičák
Human-readable data serialization language
some control characters, and may be encoded in any one of UTF-8, UTF-16 or UTF-32. (Though UTF-32 is not mandatory, it is required for a parser to have
YAML
U+abcdeF). Computing – UTF-16/Unicode: There are 17 addressable planes in UTF-16, and, thus, as Unicode is limited to the UTF-16 code space, 17 valid
Orders of magnitude (numbers)
Specification for genealogical data
exporting to GEDCOM format. GEDCOM is defined as a plain text file, using UTF-8 encoding as of version 7.0. This file contains genealogical information
GEDCOM
Sequence of characters, data type
byte stream format UTF-8 is designed not to have the problems described above for older multibyte encodings. UTF-8, UTF-16 and UTF-32 require the programmer
String (computer science)
Tactical military truck
and an engine power output of 326 hp (243 kW). Until the Bundeswehr's WLS UTF/GTF awards these designations did not appear on the trucks themselves, and
RMMV HX range of tactical trucks
Executable Java file format
moniker "UTF-8 string", are not actually encoded according to the Unicode standard, although it is similar. There are two differences (see UTF-8 for a
Java class file
Purposely unassigned Unicode code points
Additionally, when UTF-16 codes are embedded in LMBCS, the UTF-16 codes corresponding to U+F601 through U+F6FF are substituted for UTF-16 codes which would
Private Use Areas
2011 American TV series or program
premiered on August 29, 2011. The series follows the Undead Task Force (UTF), a newly formed division of the LAPD, as they are filmed by a camera news
Death Valley (American TV series)
ConTEXT only supports converting text to UTF-16. Also, it can only use one type of new-line format if converting to UTF-16. Geany supports spell checking via
Comparison of text editors
User interface element
background color on hover: <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0">
Mouseover
Spell checker for complex languages
MySpell uses a single-byte character encoding, Hunspell can use Unicode UTF-8-encoded dictionaries. Software with Hunspell support: Hunspell is free
Hunspell
Character encoding in which characters are encoded in one or two bytes
and UTF-8 use more than two bytes for some characters, and they support one byte for other characters. Some people use DBCS to mean the UTF-16 and UTF-8
Double-byte character set
Unicode Technical Standard
at 2 bytes per symbol through non-locking shifts. SCSU can also switch to UTF-16 internally to handle non-alphabetic languages. Reuters originally developed
Standard Compression Scheme for Unicode
Set of rules defining correctly structured programs for the Rust programming language
C# syntax On Unix systems, this is often UTF-8 strings without an internal 0 byte. On Windows, this is UTF-16 strings without an internal 0 byte. Unlike
Rust syntax
Relationship between Unicode and email
non-ASCII characters in one of the Unicode transforms negotiating the use of UTF-8 encoding in email addresses and reply codes (SMTPUTF8) sending the information
Unicode and email
Character encodings standard
applications Unicode and UTF-8 are preferred; authors of new web pages and the designers of new protocols are instructed to use UTF-8 instead. Since 2023
ISO/IEC 8859-9
Metadata standard in digital images
0, was released in May 2023, and brings, among other things, support for UTF-8 to allow text data in non-ASCII encoding. The Exif tag structure is borrowed
Exif
Character in text processing
The Unicode Consortium. 2025-09-09. ISBN 978-1-936213-35-1. FAQ - UTF-8, UTF-16, UTF-32 & BOM, “What should I do with U+FEFF in the middle of a file?”
Word joiner
Collection of Japanese standards for digital character encoding
frameshifts of UTF-8-encoded text will produce invalid UTF-8, but it is possible to construct sequences of characters that remain valid UTF-8 even when frameshifted
JIS encoding
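A frameshift in UTF-8 is recoverable because continuation bytes always match the bit pattern 10xxxxxx, so a reader that lands mid-character can back up to the nearest lead byte. A minimal sketch of that self-synchronization step (the function name is hypothetical):

```python
def char_start(data: bytes, pos: int) -> int:
    """Move pos back to the first byte of the UTF-8 character containing it."""
    if not 0 <= pos < len(data):
        raise IndexError("pos out of range")
    while pos > 0 and (data[pos] & 0xC0) == 0x80:   # 10xxxxxx = continuation byte
        pos -= 1
    return pos


data = "a\u20acb".encode("utf-8")    # 61 E2 82 AC 62
start = char_start(data, 2)          # offset 2 is inside the euro sign; backs up to 1
assert data[start:].decode("utf-8") == "\u20acb"
```

Decoding from offset 2 directly would fail, since 82 cannot begin a sequence; at most three bytes ever need to be skipped.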
Playing card
OF SPADES. Encodings (decimal / hex): Unicode 127137 / U+1F0A1; UTF-8 240 159 130 161 / F0 9F 82 A1; UTF-16 55356 56481 / D83C DCA1; numeric character reference &#127137;
Ace of spades
Protocol for real-time Internet chat and messaging
ISO-2022-JP. With the common migration from ISO 8859 to UTF-8 on Linux and Unix platforms since about 2002, UTF-8 has become an increasingly popular substitute
IRC
be decoded through a two-stage recoding: first from utf-8 to latin-1, then from windows-1251 to utf-8 (assuming that one works in a Unicode environment)
Comparison of email clients
Hamilton Laboratories reported they are working on an update to support UTF-8 characters everywhere internally, allowing high-resolution international
Hamilton C shell
Numerical value representing a character in a coded character set
called a code unit. For the UTF-32 encoding, all code points are encoded as one four-byte (octet) binary number; for the UTF-16 encoding, different code
Code point
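The distinction between code points and code units can be made concrete by counting units per encoding form. UTF-32 always uses one unit per code point, while UTF-8 and UTF-16 vary. A sketch using Python's codecs (the function name is hypothetical):

```python
def code_unit_counts(s: str) -> dict:
    """Count code units (not bytes) needed by each encoding form for s."""
    return {
        "UTF-8": len(s.encode("utf-8")),             # 8-bit units
        "UTF-16": len(s.encode("utf-16-le")) // 2,   # 16-bit units
        "UTF-32": len(s.encode("utf-32-le")) // 4,   # 32-bit units, one per code point
    }
```

A single supplementary character such as U+1F0A1 needs four UTF-8 units, two UTF-16 units and one UTF-32 unit.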
C++ wrapper around SQLite 3.x
Since SQLite stores strings in UTF-8 encoding, the wxSQLite3 methods provide automatic conversion between wxStrings and UTF-8 strings. This works best for
WxSQLite3
QR code format
recognize it and treat it like a contact ready to import. MeCard is based in UTF-8 (which is ASCII compatible); the fields are separated with one semicolon
MeCard (QR code)
E-book format
specification. Unicode is required, and content producers must use either UTF-8 or UTF-16 encoding. This is to support international and multilingual books
EPUB