ASCII-compatible variable-width encoding of Unicode
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation
UTF-8
Variable-width encoding of Unicode, using one or two 16-bit code units
UTF-16 (16-bit Unicode Transformation Format) is a character encoding that supports all 1,112,064 valid code points of Unicode. The encoding is variable-length
UTF-16
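To make "one or two 16-bit code units" concrete, here is a minimal Python sketch that counts the code units UTF-16 needs for one character inside the Basic Multilingual Plane and one outside it. The helper name utf16_code_units is hypothetical, chosen only for this illustration.

```python
def utf16_code_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    # Encode with an explicit byte order so no BOM is prepended,
    # then divide the byte count by 2 to get 16-bit units.
    return len(ch.encode("utf-16-le")) // 2


bmp = utf16_code_units("A")                      # U+0041, inside the BMP
supplementary = utf16_code_units("\U0001F600")   # U+1F600, outside the BMP
print(bmp, supplementary)
```

Characters outside the BMP take two units because they are stored as a surrogate pair.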
Topics referred to by the same term
UTF may refer to: Unicode Transformation Format (UTF-1, UTF-7, UTF-8, UTF-16, UTF-32); U.T.F. (Undead Task Force)
UTF
Character encoding standard
Unicode Standard itself defines three encodings: UTF-8, UTF-16, and UTF-32, though several others exist. UTF-8 is the most widely used by a large margin,
Unicode
Character encoding for Unicode compatible with EBCDIC
UTF-EBCDIC is a character encoding capable of encoding all 1,112,064 valid character code points in Unicode using 1 to 5 bytes (in contrast to a maximum
UTF-EBCDIC
Using numbers to represent text characters
8859, and Unicode encodings such as UTF-8 and UTF-16. The most popular character encoding on the World Wide Web is UTF-8, which is used in 98.9% of surveyed
Character_encoding
Encoding Unicode characters as 4 bytes per code point
UTF-32 (32-bit Unicode Transformation Format), sometimes called UCS-4, is a fixed-length encoding used to encode Unicode code points that uses exactly
UTF-32
Unicode character
- UTF-8, UTF-16, UTF-32 & BOM: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8
Byte_order_mark
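The BOM question quoted above can be explored with a small Python sketch. It assumes nothing beyond the standard library: codecs.BOM_UTF8 is the UTF-8 form of U+FEFF, and the "utf-8-sig" codec strips a leading BOM if one is present. The sample payload "plain text" is made up for the example.

```python
import codecs

# A UTF-8 stream that starts with the BOM (bytes EF BB BF).
data = codecs.BOM_UTF8 + "plain text".encode("utf-8")

kept = data.decode("utf-8")          # plain UTF-8 decoding keeps U+FEFF
stripped = data.decode("utf-8-sig")  # this codec drops a leading BOM

print(repr(kept[0]), repr(stripped))
```

In both cases the bytes after the BOM are ordinary UTF-8.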
Garbled text as a result of incorrect character encodings
8-bit encodings), or the use of variable length encodings (notably UTF-8 and UTF-16). Failed rendering of glyphs due to either missing fonts or missing
Mojibake
UTF-8 string because it only looks for the ASCII '%' character to define a formatting string. All other bytes are printed unchanged. UTF-16 and UTF-32
Comparison of Unicode encodings
Comparison_of_Unicode_encodings
Term for computer data consisting only of unformatted characters of readable material
principle, plain text can be in any encoding, but today usually implies UTF-8. Plain text is different from formatted text, where style information is
Plain_text
Obsolete multibyte encoding for Unicode
UTF-1 is an obsolete method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes
UTF-1
Unicode block containing some special codepoints and two non-characters
assumes the input is UTF-8, the first and third bytes are valid UTF-8 encodings of ASCII, but the second byte (0xFC) is not valid in UTF-8. The text editor
Specials_(Unicode_block)
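The three-byte situation described above (valid ASCII, then 0xFC, then valid ASCII) can be reproduced in Python. This is only a sketch of how one decoder behaves: the sample bytes are assumed for illustration, and errors="replace" is Python's own option for substituting U+FFFD REPLACEMENT CHARACTER.

```python
data = b"f\xfcr"  # 'f', the byte 0xFC, 'r' (sample bytes, assumed)

# Strict decoding rejects the stray byte outright.
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes U+FFFD for the byte it cannot interpret.
lenient = data.decode("utf-8", errors="replace")
print(strict_ok, repr(lenient))
```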
Character encoding
UTF-7 (7-bit Unicode Transformation Format) is an obsolete variable-length character encoding for representing Unicode text using a stream of ASCII characters
UTF-7
Encoding scheme for Unicode
The Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) is a variant of UTF-8 that is described in Unicode Technical Report #26. A Unicode code point
CESU-8
Standard set of characters defined by ISO/IEC 10646
conflicts with other encoding forms. The original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP. A range
Universal_Coded_Character_Set
Overview on Unicode implementation in Microsoft Windows
explicitly to the UTF-16 encoding. Anything else, including UTF-8, is not "Unicode" in Microsoft's outdated language (while UTF-8 and UTF-16 are both Unicode
Unicode_in_Microsoft_Windows
Character encoding standard
points) and encoding (to 8-, 16-, or 32-bit binary formats, called UTF-8, UTF-16, and UTF-32, respectively). ASCII was incorporated into the Unicode (1991)
ASCII
Use of encoding systems for international characters in HTML
current Living Standard published by WHATWG, the only valid encoding is UTF-8. There are two general ways to specify which character encoding is used
Character_encodings_in_HTML
historically been used for storing text on the World Wide Web, though by now UTF-8 is dominant, with all languages at 95% use or higher by some estimates
Popularity_of_text_encodings
Method of encoding characters in a URI
character. (A non-ASCII character is typically converted to its byte sequence in UTF-8, and then each byte value is represented as above.) The reserved character
Percent-encoding
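As a short illustration of the conversion described in parentheses above, the following Python sketch percent-encodes one non-ASCII character with the standard library. The sample character is an arbitrary choice; urllib.parse.quote uses UTF-8 by default.

```python
from urllib.parse import quote, unquote

original = "é"                          # U+00E9, a non-ASCII character
utf8_bytes = original.encode("utf-8")   # two bytes: 0xC3 0xA9
encoded = quote(original)               # each byte becomes %XX
decoded = unquote(encoded)              # reverses the process

print(utf8_bytes, encoded, decoded)
```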
Encoding for a sequence of byte values using 64 printable characters
UVXYZ[`abcdefhijklmpqr". UTF-8 A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8#Self-synchronization. 8BITMIME
Base64
Bug in Microsoft Windows
Windows which causes text encoded in ASCII to be interpreted as if it were UTF-16LE, resulting in garbled text. When the string "Bush hid the facts", without
Bush_hid_the_facts
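The misinterpretation itself is easy to reproduce outside Windows. The Python sketch below only shows what happens when those ASCII bytes are read as UTF-16LE; it does not model the detection heuristic that triggered the bug.

```python
ascii_bytes = "Bush hid the facts".encode("ascii")   # 18 bytes

# Reading the same bytes as UTF-16LE pairs them up into 9 code units,
# each of which lands in the CJK range.
misread = ascii_bytes.decode("utf-16-le")

print(len(ascii_bytes), len(misread), hex(ord(misread[0])))
```

No data is lost: encoding the garbled string back to UTF-16LE yields the original ASCII bytes.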
Computer file containing plain text
Freytag, Asmus (2015-12-18). "FAQ – UTF-8, UTF-16, UTF-32 & BOM". The Unicode Consortium. Retrieved 2016-05-30. Yes, UTF-8 can contain a BOM. However, it
Text_file
Email that contains non-ASCII characters in the header
characters (characters which do not exist in the ASCII character set), encoded as UTF-8, in the email header and in supporting mail transfer protocols. The most
International_email
Handling of strings in the C programming language
Unicode literals such as char foo[512] = "φωωβαρ"; (UTF-8) or wchar_t foo[512] = L"φωωβαρ"; (UTF-16 or UTF-32, depends on wchar_t) is implementation defined
C_string_handling
List of humorous technical standards proposals
Morality Sections in Routing Area Drafts," Informational. RFC 4042 – "UTF-9 and UTF-18 Efficient Transformation Formats of Unicode," Informational. Encodes
April Fools' Day Request for Comments
April_Fools'_Day_Request_for_Comments
Parameters defining locale in computer
explicit UTF-8 encoding: $ locale LANG=cs_CZ.UTF-8 LC_CTYPE="cs_CZ.UTF-8" LC_NUMERIC="cs_CZ.UTF-8" LC_TIME="cs_CZ.UTF-8" LC_COLLATE="cs_CZ.UTF-8" LC_MONETARY="cs_CZ
Locale_(computer_software)
Symbol "#!", used in computing
"FAQ UTF-8, UTF-16, UTF-32 & BOM: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8
Shebang_(Unix)
American computer scientist known for Unix (born 1943)
expressions and early computer text editors QED and ed, the definition of the UTF-8 encoding, and his work on computer chess that included the creation of
Ken_Thompson
Access control method for the HTTP network communication protocol
realm="User Visible Realm", charset="UTF-8" This parameter indicates that the server expects the client to use UTF-8 for encoding username and password
Basic_access_authentication
Data structure
possible to store every possible ASCII or UTF-8 string. However, it is common to store the subset of ASCII or UTF-8 – every character except NUL – in null-terminated
Null-terminated_string
Process of determining content's charset
pass a UTF-8 validity test. However, badly written charset detection routines do not run the reliable UTF-8 test first, and may decide that UTF-8 is some
Charset_detection
Configuration file for computer networking
Mozilla Firefox 66 and later additionally supports PAC scripts encoded as UTF-8. The function dnsResolve (and similar other functions) performs a DNS lookup
Proxy_auto-config
Relationship between Unicode characters and HTML
HTML document. For UTF-8, the BOM is optional, while it is a must for the UTF-16 and the UTF-32 encodings. (Note: UTF-16 and UTF-32 without the BOM are
Unicode_and_HTML
Archived from the original on 2016-08-30. Retrieved 2016-08-29. "FAQ - UTF-8, UTF-16, UTF-32 & BOM". "How to : Load XML from File with Encoding Detection".
List_of_file_signatures
Data-interchange format
backslash-escaped. JSON exchange in an open ecosystem must be encoded in UTF-8. The encoding supports the full Unicode character set, including those
JSON
Continuous group of 65536 Unicode code points
of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of words, plus the BMP as a single word. UTF-8 was designed with a
Plane_(Unicode)
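The arithmetic behind the 17-plane limit can be checked in a few lines of Python. The variable names are illustrative. The final subtraction of the 2,048 surrogate code points (U+D800 to U+DFFF) gives the 1,112,064 figure quoted in the UTF-16 and UTF-EBCDIC entries.

```python
PLANE_SIZE = 2 ** 16                 # 65,536 code points per plane

bmp = PLANE_SIZE                     # plane 0, one 16-bit word each
via_surrogates = 2 ** 20             # 1,024 high x 1,024 low surrogates
total_code_points = bmp + via_surrogates
planes = total_code_points // PLANE_SIZE

surrogates = 0xDFFF - 0xD800 + 1     # reserved for UTF-16, not valid on their own
valid_code_points = total_code_points - surrogates

print(planes, total_code_points, valid_code_points)
```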
File extension
default encoding specifically for property resource bundles is UTF-8, and if an invalid UTF-8 byte sequence is encountered it falls back to ISO-8859-1. Editing
.properties
Computer programmer and co-creator of Go
Unix Programming Environment. With Ken Thompson, he is the co-creator of UTF-8 character encoding. While at Bell Labs, Pike was also involved in the creation
Rob_Pike
Computer file format for a multimedia playlist
of UTF-8 encoding is mandatory in M3U playlists with the M3U8 file extension. The system codepage is usually assumed for .m3u but this is often UTF-8 as
M3U
C programming language standard, current revision
c8rtomb() to convert a narrow multibyte character to UTF-8 encoding and a single code point from UTF-8 to a narrow multibyte character representation respectively
C23_(C_standard_revision)
Windows character set for Latin alphabet
static pages. Almost all websites now use the multi-byte character encoding UTF-8, another superset of ASCII. Some countries or languages show a higher usage
Windows-1252
Windows character set for Cyrillic alphabet
minority of Russian websites use it, with 94.6% of Russian (.ru) websites using UTF-8, and the legacy 8-bit encoding is distant second. In Linux, the encoding
Windows-1251
Relationship between Unicode and email
non-ASCII characters in one of the Unicode transforms; negotiating the use of UTF-8 encoding in email addresses and reply codes (SMTPUTF8); sending the information
Unicode_and_email
Aspect of the Unicode standard
distinction has some semantic value and affects the rendering of the text. UTF-8 and UTF-16 (and also some other Unicode encodings) do not allow all possible
Unicode_equivalence
Esoteric programming language
symbols". utf-8.jp. Archived from the original on 2009-07-15. Retrieved 2017-10-25. Hasegawa, Yosuke (July 2009). "UTF-8.jp [2009-07-28]". utf-8.jp. Archived
JSFuck
Purposely unassigned Unicode code points
Additionally, when UTF-16 codes are embedded in LMBCS, the UTF-16 codes corresponding to U+F601 through U+F6FF are substituted for UTF-16 codes which would
Private_Use_Areas
Mail sent using electronic means
images. International email, with internationalized email addresses using UTF-8, is standardized but not widely adopted. The term electronic mail has been
Software library for interpreting regular expressions
with UTF support, the (*UTF) option at the beginning of a pattern can be used instead of setting an external option to invoke UTF-8, UTF-16, or UTF-32 mode
Perl Compatible Regular Expressions
Perl_Compatible_Regular_Expressions
Identifier of the destination where email messages are delivered
above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531 when the EHLO specifies SMTPUTF8, though even
Email_address
American screenwriter
Bruckheimer television series E-Ring. He has also created/written the comic book UTF (Undead Task Force) with Tone Rodriguez for APE comics. Reynolds worked as
Scott_Reynolds_(writer)
Foreign function interface for the Java language
functions, which use UTF-16LE encoding on little-endian architectures and UTF-16BE on big-endian architectures, and then use a UTF-16 to UTF-8 conversion routine
Java_Native_Interface
Symbols encoded in computers to make text
system uses the 8-bit byte for each character. Today, the Unicode-based UTF-8 encoding uses a varying number of byte-sized code units to define a code
Character_(computing)
Tactical military truck
and an engine power output of 326 hp (243 kW). Until the Bundeswehr's WLS UTF/GTF awards these designations did not appear on the trucks themselves, and
RMMV HX range of tactical trucks
RMMV_HX_range_of_tactical_trucks
Sets of characters used in the 1980s & 90s
Windows versions support Unicode, new Windows applications should use Unicode (UTF-8) and not 8-bit character encodings. There are two groups of system code
Windows_code_page
Application layer protocol
OK Date: Mon, 23 May 2005 22:38:34 GMT Content-Type: text/html; charset=UTF-8 Content-Length: 155 Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT Server:
HTTP
Format for expressing RDF statements in HTML documents
relationships with other people and things: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3
RDFa
Special character sequences in the C programming language
UTF-8, and UTF-16 for wchar_t: // A single byte with the value 0xC0; not valid UTF-8 char s1[] = "\xC0"; // Two bytes with values 0xC3, 0x80; the UTF-8
Escape_sequences_in_C
Sequence of characters, data type
byte stream format UTF-8 is designed not to have the problems described above for older multibyte encodings. UTF-8, UTF-16 and UTF-32 require the programmer
String_(computer_science)
Complete list of the characters available on most computers
text is not likely to be encoded in UTF-8, since those bytes are invalid in UTF-8. It is also not likely to be UTF-16 in little-endian byte order because
Universal Character Set characters
Universal_Character_Set_characters
Encoding which maps information to a variable number of bits
intended role instead being taken by UTF-8, which does preserve ASCII compatibility. Crispin, M. (2005-04-01). UTF-9 and UTF-18 Efficient Transformation Formats
Variable-length_encoding
Higher-level 7-bit and 8-bit character encoding system
(most UTFs, one exception being the obsolete UTF-1) Representing all characters, including control codes, with multiple bytes (e.g. UTF-16, UTF-32) Mixing
ISO/IEC_2022
Software library
historically used UTF-16, and still does only for Java; while for C/C++ UTF-8 is supported, including the correct handling of "illegal UTF-8". ICU 73.2 has
International Components for Unicode
International_Components_for_Unicode
Latin letter A with circumflex
encoded in UTF-8 and decoded using ISO 8859-1 or Windows-1252, two encodings which are commonly referred to as Western or Western European. In UTF-8, the
Â
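The stray "Â" effect described above can be reproduced with a short Python sketch. The pound sign is an arbitrary sample character; any code point from U+0080 to U+00BF has 0xC2 as its first UTF-8 byte, and both legacy encodings display that byte as "Â".

```python
pound = "£"                               # U+00A3
utf8 = pound.encode("utf-8")              # two bytes: 0xC2 0xA3

as_latin1 = utf8.decode("iso-8859-1")     # wrong decoder: 'Â£'
as_cp1252 = utf8.decode("cp1252")         # same result for these bytes

# Reversing the wrong step recovers the original text.
repaired = as_cp1252.encode("cp1252").decode("utf-8")

print(as_latin1, as_cp1252, repaired)
```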
Process for converting data into a "standard", "normal", or canonical form
standard, in particular UTF-8, may cause an additional need for canonicalization in some situations. Namely, by the standard, in UTF-8 there is only one valid
Canonicalization
Executable Java file format
moniker "UTF-8 string", are not actually encoded according to the Unicode standard, although it is similar. There are two differences (see UTF-8 for a
Java_class_file
Identifier of a coded character set
encoding schemes (referred to as "transformation formats")—including UTF-8, UTF-16 and UTF-32—but which may or may not actually be accompanied by a CCSID number
CCSID
Program that extracts subtitles from video
YouTube only supports UTF-8. The default encoding for subtitle files in FFmpeg is UTF-8. All text in a Matroska™ file is encoded in UTF-8. This means that
SubRip
Something that represents an idea, process, or physical entity
Unicode Standard itself defines three encodings: UTF-8, UTF-16, and UTF-32, though several others exist. UTF-8 is the most widely used by a large margin,
Symbol
QR code format
recognize it and treat it like a contact ready to import. MeCard is based in UTF-8 (which is ASCII compatible); the fields are separated with one semicolon
MeCard_(QR_code)
Password-based key derivation function
specification was revised to specify that when hashing strings: the string must be UTF-8 encoded; the null terminator must be included. With this change, the version
Bcrypt
User interface element
background color on hover: <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0">
Mouseover
"Bundeswehr places second UTF order for 5-, 15-tonne trucks". 13 June 2019. ES&T Redaktion (8 January 2021). "Rahmenvertrag UTF-Logistikfahrzeuge stark
List of modern equipment of the German Army
List_of_modern_equipment_of_the_German_Army
Character encoding in which characters are encoded in one or two bytes
and UTF-8 use more than two bytes for some characters, and they support one byte for other characters. Some people use DBCS to mean the UTF-16 and UTF-8
Double-byte_character_set
ConTEXT only supports converting text to UTF-16. Also, it can only use one type of new-line format if converting to UTF-16. Geany supports spell checking via
Comparison_of_text_editors
Unicode Technical Standard
at 2 bytes per symbol through non-locking shifts. SCSU can also switch to UTF-16 internally to handle non-alphabetic languages. Reuters originally developed
Standard Compression Scheme for Unicode
Standard_Compression_Scheme_for_Unicode
Protocol for real-time Internet chat and messaging
ISO-2022-JP. With the common migration from ISO 8859 to UTF-8 on Linux and Unix platforms since about 2002, UTF-8 has become an increasingly popular substitute
IRC
Specification for genealogical data
exporting to GEDCOM format. GEDCOM is defined as a plain text file, using UTF-8 encoding as of version 7.0. This file contains genealogical information
GEDCOM
The Unemployment Trust Fund (UTF) is composed of 59 accounts in the United States Treasury related to the unemployment insurance program. Specifically, there
Unemployment_Trust_Fund
Extracting/adding file and/or directory names into archive in either UTF-7, UTF-8 or UTF-16/UCS-2 encoding to support single file/directory name which contains
Comparison_of_file_archivers
2011 American TV series or program
premiered on August 29, 2011. The series follows the Undead Task Force (UTF), a newly formed division of the LAPD, as they are filmed by a camera news
Death Valley (American TV series)
Death_Valley_(American_TV_series)
Basic word processor formerly included with Microsoft Windows
support, enabling WordPad to support multiple languages, but big endian UTF-16/UCS-2 is not supported. It can open Microsoft Word (versions 6.0–2003)
WordPad
Lightweight text editor forked from Pluma
tabs. It fully supports international text through its use of the Unicode UTF-8 encoding. As a general-purpose text editor, Xed supports most standard
Xed
World Wide Web Consortium recommendation
Language SSML. Here is an example PLS document: <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
Pronunciation Lexicon Specification
Pronunciation_Lexicon_Specification
MIME compatible Unicode compression scheme
MIME-compatible Unicode compression scheme. BOCU-1 combines the wide applicability of UTF-8 with the compactness of Standard Compression Scheme for Unicode (SCSU)
Binary Ordered Compression for Unicode
Binary_Ordered_Compression_for_Unicode
Character encodings standard
applications Unicode and UTF-8 are preferred; authors of new web pages and the designers of new protocols are instructed to use UTF-8 instead. Since 2023
ISO/IEC_8859-9
Human-readable data serialization language
some control characters, and may be encoded in any one of UTF-8, UTF-16 or UTF-32. (Though UTF-32 is not mandatory, it is required for a parser to have
YAML
Character in text processing
The Unicode Consortium. 2025-09-09. ISBN 978-1-936213-35-1. FAQ - UTF-8, UTF-16, UTF-32 & BOM, "What should I do with U+FEFF in the middle of a file?"
Word_joiner
Programming tool for Windows
This build added support for changing a text resource format: Unicode, UTF-8, ANSI. On October 14, 2016, version 4.5.28 was released. On March 28, 2018
Resource_Hacker
be decoded through a two-stage recoding: first from utf-8 to latin-1, then from windows-1251 to utf-8 (assuming that one works in a Unicode environment)
Comparison_of_email_clients
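The two-stage recoding mentioned above can be sketched in Python. The scenario is an assumption made for illustration: Cyrillic text stored as Windows-1251 was mislabeled as Latin-1 and then converted to UTF-8. The sample word is also made up.

```python
original = "Привет"

# How the damage happens: Windows-1251 bytes are read as Latin-1,
# and the resulting wrong characters are saved as UTF-8.
garbled_utf8 = original.encode("cp1251").decode("latin-1").encode("utf-8")

# Stage 1: UTF-8 back to the Latin-1 byte values.
raw = garbled_utf8.decode("utf-8").encode("latin-1")
# Stage 2: interpret those bytes as Windows-1251; the result is a
# Unicode string that can be written out as UTF-8.
recovered = raw.decode("cp1251")

print(garbled_utf8.decode("utf-8"), recovered)
```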
Digital data interchange format
unsigned); float, floating point numbers (IEEE single/double precision); str, UTF-8 string; bin, binary data (up to 2^32 − 1 bytes); array; map, an associative
MessagePack
Collection of Japanese standards for digital character encoding
frameshifts of UTF-8-encoded text will produce invalid UTF-8, but it is possible to construct sequences of characters that remain valid UTF-8 even when frameshifted
JIS_encoding
Markup language and file format
used. Encodings other than UTF-8 and UTF-16 are not necessarily recognized by every XML parser (and in some cases not even UTF-16, even though the standard
XML
HEXAGRAM FOR THE CREATIVE HEAVEN. Encodings (decimal, hex): Unicode 19904, U+4DC0; UTF-8 228 183 128, E4 B7 80; numeric character reference &#19904;, &#x4DC0;
List of hexagrams of the I Ching
List_of_hexagrams_of_the_I_Ching
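The encoding values listed for this hexagram can be cross-checked with a few lines of Python. This is only a verification sketch; the variable names are illustrative.

```python
ch = "\u4dc0"                        # HEXAGRAM FOR THE CREATIVE HEAVEN

code_point = ord(ch)                 # decimal value of U+4DC0
utf8 = ch.encode("utf-8")            # three bytes in UTF-8
decimal_bytes = list(utf8)
hex_bytes = utf8.hex(" ").upper()    # bytes.hex(sep) needs Python 3.8+

# Numeric character references, decimal and hexadecimal forms.
ncr_decimal = f"&#{code_point};"
ncr_hex = f"&#x{code_point:X};"

print(code_point, decimal_bytes, hex_bytes, ncr_decimal, ncr_hex)
```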
Windows character set for Hebrew
Windows-1255, especially on the Internet; meaning UTF-8, the dominant encoding for web pages, or UTF-16. Windows-1255 is used by less than 0.1% of websites
Windows-1255
Text string used to uniquely identify a computer file
of the filename, such as L"\x00C0.txt" (UTF-16, NFC) (Latin capital A with grave) and L"\x0041\x0300.txt" (UTF-16, NFD) (Latin capital A, grave combining)
Filename
E-book format
specification. Unicode is required, and content producers must use either UTF-8 or UTF-16 encoding. This is to support international and multilingual books
EPUB
Set of rules defining correctly structured programs for the Rust programming language
C# syntax On Unix systems, this is often UTF-8 strings without an internal 0 byte. On Windows, this is UTF-16 strings without an internal 0 byte. Unlike
Rust_syntax
Consonant in the Cyrillic alphabet, written as Н
LETTER EN. Encodings (decimal, hex): Unicode 1053, U+041D and 1085, U+043D; UTF-8 208 157, D0 9D and 208 189, D0 BD; numeric character reference &#1053;, &#x41D;
En_(Cyrillic)
Girl/Female
Arabic, Muslim
Rain; Blessing
Boy/Male
Hindu
Lustrous
Girl/Female
Indian
River Yamuna, Success
Girl/Female
Tamil
Karunamayee | கருநாமஈ
Merciful, Full of pity for others
Male
Greek
(Χρύσης) Greek myth name of a priest of Apollo, derived from the word khrysos; KHRYSES means "golden."
Boy/Male
English
Troy derives from the ancient Greek city of Troy; also from an Irish surname meaning "soldier."
Girl/Female
Sikh
Golden
Girl/Female
Muslim
Branch, Tributary, Happy, Lucky, Fem of Saeed, Most beautiful, Unmatched, Friendly
Boy/Male
Scottish American Gaelic Latin
From the river's mouth.
Male
Polish
Polish form of Greek Eustakhios; EUSTACHY means "fruitful."