An “encoding” describes the correspondence
 between CHARACTERs and raw bytes during input/output via
 STREAMs with STREAM-ELEMENT-TYPE CHARACTER.
An EXT:ENCODING is an object composed of the following facets:
character set
    The set of CHARACTERs that
    can be represented and passed through the I/O channel, and the way
    these characters translate into raw bytes, i.e., the map between
    sequences of CHARACTER and (UNSIGNED-BYTE 8) in the form of STRINGs
    and (VECTOR (UNSIGNED-BYTE 8)) as well as character and byte STREAMs.
    In this context, for example, CHARSET:UTF-8 and CHARSET:UCS-4
    are considered different, although they can represent the same set
    of characters.
line terminator mode
    The way newline characters are represented.
EXT:ENCODINGs are also TYPEs.  As such, they represent the set of
 characters encodable in the character set.  In this context, the way
 characters are translated into raw bytes is ignored, and the line
 terminator mode is ignored as well.  TYPEP and SUBTYPEP can be used
 on encodings:
(SUBTYPEP CHARSET:UTF-8 CHARSET:UTF-16)
⇒ T ;
⇒ T
(SUBTYPEP CHARSET:UTF-16 CHARSET:UTF-8)
⇒ T ;
⇒ T
(SUBTYPEP CHARSET:ASCII CHARSET:ISO-8859-1)
⇒ T ;
⇒ T
(SUBTYPEP CHARSET:ISO-8859-1 CHARSET:ASCII)
⇒ NIL ;
⇒ T
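Because encodings are types, TYPEP can also test whether a single character is encodable. The following is a minimal sketch, assuming a CLISP build with UNICODE support; the helper name ENCODABLE-P is hypothetical and not part of CLISP.

```lisp
;; ENCODABLE-P is a hypothetical helper, not a CLISP function.
(defun encodable-p (char encoding)
  "Return true if CHAR belongs to the character set of ENCODING."
  (typep char encoding))

(encodable-p #\a charset:ascii)                  ; an ASCII letter: true
(encodable-p (code-char 233) charset:ascii)      ; U+00E9 is outside ASCII: NIL
(encodable-p (code-char 233) charset:iso-8859-1) ; but inside ISO-8859-1: true
```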
“1:1” encodings. Encodings which define a bijection between character and byte
 sequences are called “1:1” encodings. CHARSET:ISO-8859-1 is an example of such an
 encoding: any byte sequence corresponds to some character sequence and
 vice versa.  ASCII, however, is not a “1:1” encoding: there are no
 characters for bytes in the range [128;255]. CHARSET:UTF-8 is not a
 “1:1” encoding either: some byte sequences do not correspond to any character
 sequence.
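The distinction can be probed by decoding a byte vector and encoding the result again. This is only a sketch: BYTE-VECTOR and ROUND-TRIPS-P are hypothetical helpers, and a conversion error is treated as "does not round-trip".

```lisp
;; BYTE-VECTOR and ROUND-TRIPS-P are hypothetical helpers, not CLISP functions.
(defun byte-vector (&rest bytes)
  "Build a (VECTOR (UNSIGNED-BYTE 8)) from the given integers."
  (make-array (length bytes) :element-type '(unsigned-byte 8)
                             :initial-contents bytes))

(defun round-trips-p (bytes encoding)
  "True if BYTES decode under ENCODING and re-encode to the same bytes.
Returns NIL when either conversion signals an error."
  (handler-case
      (equalp bytes
              (ext:convert-string-to-bytes
               (ext:convert-string-from-bytes bytes encoding)
               encoding))
    (error () nil)))

;; Bytes 200 and 255 are valid ISO-8859-1 but do not form valid UTF-8.
(round-trips-p (byte-vector 65 200 255) charset:iso-8859-1)
(round-trips-p (byte-vector 65 200 255) charset:utf-8)
```

The first call is expected to succeed and the second to return NIL, in line with the definition of “1:1” encodings above.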
The following character sets are supported, as values of the corresponding (constant) symbol in the “CHARSET” package:
Symbols in package “CHARSET”
UCS-2
   ≡ UNICODE-16
   ≡ UNICODE-16-BIG-ENDIAN,
   the 16-bit basic multilingual plane of the UNICODE character set.
   Every character is represented as two bytes.
UNICODE-16-LITTLE-ENDIAN
UCS-4
   ≡ UNICODE-32
   ≡ UNICODE-32-BIG-ENDIAN,
   the 21-bit UNICODE character set. Every character is represented as
   four bytes. This encoding is used by CLISP internally.
UNICODE-32-LITTLE-ENDIAN
UTF-8,
   the 21-bit UNICODE character set.
   Every character is represented as one to four bytes.
   ASCII characters represent themselves and need one byte per character.
   Most Latin/Greek/Cyrillic/Hebrew characters need two bytes per
   character. Most other characters need three bytes per character,
   and the rarely used remaining characters need four bytes per
   character. This is therefore, in general, the most space-efficient
   encoding of all of Unicode.
UTF-16,
   the 21-bit UNICODE character set. Every character in the 16-bit
   basic multilingual plane is represented as two bytes, and the
   rarely used remaining characters need four bytes per character.
   This character set is only available on platforms with GNU libc or GNU libiconv.
UTF-7,
   the 21-bit UNICODE character set. This is a stateful 7-bit encoding.
   Not all ASCII characters represent themselves.
   This character set is only available on platforms with GNU libc or GNU libiconv.
JAVA,
   the 21-bit UNICODE character set.
   ASCII characters represent themselves and need one byte per character.
   All other characters of the basic multilingual plane are represented
   by \unnnn sequences
   (nnnn a hexadecimal number)
   and need 6 bytes per character. The remaining characters are represented
   by \uxxxx\uyyyy
   and need 12 bytes per character. While this encoding is very comfortable
   for editing Unicode files using only ASCII-aware tools and editors, it
   cannot faithfully represent all UNICODE text. Only text which
   does not contain \u (backslash followed by
   lowercase Latin u) can be faithfully represented by this encoding.
 ASCII,
   the well-known US-centric 7-bit character set (American Standard
   Code for Information Interchange - ASCII).
ISO-8859-1,
   an extension of the ASCII character set, suitable for the Afrikaans, Albanian, Basque, Breton, Catalan,
   Cornish, Danish, Dutch, English, Færoese, Finnish, French,
   Frisian, Galician, German, Greenlandic, Icelandic, Irish, Italian,
   Latin, Luxemburgish, Norwegian, Portuguese, Ræto-Romanic,
   Scottish, Spanish, and Swedish languages.
This encoding has the nice property that
(LOOP :for i :from 0 :to CHAR-CODE-LIMIT
      :for c = (CODE-CHAR i)
      :always (OR (NOT (TYPEP c CHARSET:ISO-8859-1))
                  (EQUALP (EXT:CONVERT-STRING-TO-BYTES (STRING c) CHARSET:ISO-8859-1)
                          (VECTOR i))))
⇒ T
   i.e., it is compatible with CLISP CODE-CHAR/CHAR-CODE
   in its own domain.
ISO-8859-2,
   an extension of the ASCII character set, suitable for the Croatian, Czech, German, Hungarian, Polish,
   Slovak, Slovenian, and Sorbian languages.
ISO-8859-3,
   an extension of the ASCII character set, suitable for the Esperanto and Maltese languages.
ISO-8859-4,
   an extension of the ASCII character set, suitable for the Estonian, Latvian, Lithuanian and Sami (Lappish)
   languages.
ISO-8859-5,
   an extension of the ASCII character set, suitable for the Bulgarian, Byelorussian, Macedonian, Russian,
   Serbian, and Ukrainian languages.
ISO-8859-6,
   suitable for the Arabic language.
ISO-8859-7,
   an extension of the ASCII character set, suitable for the Greek language.
ISO-8859-8,
   an extension of the ASCII character set, suitable for the Hebrew language (without punctuation).
ISO-8859-9,
   an extension of the ASCII character set, suitable for the Turkish language.
ISO-8859-10,
   an extension of the ASCII character set, suitable for the Estonian, Icelandic, Inuit (Greenlandic), Latvian,
   Lithuanian, and Sami (Lappish) languages.
ISO-8859-13,
   an extension of the ASCII character set, suitable for the Estonian, Latvian, Lithuanian, Polish and Sami
   (Lappish) languages.
ISO-8859-14,
   an extension of the ASCII character set, suitable for the Irish Gælic, Manx Gælic, Scottish
   Gælic, and Welsh languages.
ISO-8859-15,
   an extension of the ASCII character set, suitable for the ISO-8859-1 languages, with improvements for
   French, Finnish and the Euro.
ISO-8859-16,
   an extension of the ASCII character set, suitable for the Rumanian language.
KOI8-R,
   an extension of the ASCII character set, suitable for the Russian language (very popular, especially on the
   internet).
KOI8-U,
   an extension of the ASCII character set, suitable for the Ukrainian language (very popular, especially on the
   internet).
KOI8-RU,
   an extension of the ASCII character set, suitable for the Russian language.
   This character set is only available on platforms with GNU libiconv.
JIS_X0201,
   a character set for the Japanese language.
MAC-ARABIC,
   a platform specific extension of the ASCII character set.
MAC-CENTRAL-EUROPE,
   a platform specific extension of the ASCII character set.
MAC-CROATIAN,
   a platform specific extension of the ASCII character set.
MAC-CYRILLIC,
   a platform specific extension of the ASCII character set.
MAC-DINGBAT,
   a platform specific character set.
MAC-GREEK,
   a platform specific extension of the ASCII character set.
MAC-HEBREW,
   a platform specific extension of the ASCII character set.
MAC-ICELAND,
   a platform specific extension of the ASCII character set.
MAC-ROMAN
   ≡ MACINTOSH,
   a platform specific extension of the ASCII character set.
MAC-ROMANIA,
   a platform specific extension of the ASCII character set.
MAC-SYMBOL,
   a platform specific character set.
MAC-THAI,
   a platform specific extension of the ASCII character set.
MAC-TURKISH,
   a platform specific extension of the ASCII character set.
MAC-UKRAINE,
   a platform specific extension of the ASCII character set.
CP437, a DOS oldie,
   a platform specific extension of the ASCII character set.
CP437-IBM,
   an IBM variant of CP437.
CP737, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Greek language.
CP775, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for some Baltic languages.
CP850, a DOS oldie,
   a platform specific extension of the ASCII character set.
CP852, a DOS oldie,
   a platform specific extension of the ASCII character set.
CP852-IBM,
   an IBM variant of CP852.
CP855, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Russian language.
CP857, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Turkish language.
CP860, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Portuguese language.
CP860-IBM,
   an IBM variant of CP860.
CP861, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Icelandic language.
CP861-IBM,
   an IBM variant of CP861.
CP862, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Hebrew language.
CP862-IBM,
   an IBM variant of CP862.
CP863, a DOS oldie,
   a platform specific extension of the ASCII character set.
CP863-IBM,
   an IBM variant of CP863.
CP864, a DOS oldie,
   meant to be suitable for the Arabic language.
CP864-IBM,
   an IBM variant of CP864.
CP865, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for some Nordic languages.
CP865-IBM,
   an IBM variant of CP865.
CP866, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Russian language.
CP869, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Greek language.
CP869-IBM,
   an IBM variant of CP869.
CP874, a DOS oldie,
   a platform specific extension of the ASCII character set, meant to be suitable for the Thai language.
CP874-IBM,
   an IBM variant of CP874.
WINDOWS-1250
   ≡ CP1250,
   a platform specific extension of the ASCII character set, heavily incompatible with ISO-8859-2.
WINDOWS-1251
   ≡ CP1251,
   a platform specific extension of the ASCII character set, heavily incompatible with ISO-8859-5,
   meant to be suitable for the Russian language.
WINDOWS-1252
   ≡ CP1252,
   a platform specific extension of the ISO-8859-1 character set.
WINDOWS-1253
   ≡ CP1253,
   a platform specific extension of the ASCII character set, gratuitously incompatible with ISO-8859-7,
   meant to be suitable for the Greek language.
WINDOWS-1254
   ≡ CP1254,
   a platform specific extension of the ISO-8859-9 character set.
WINDOWS-1255
   ≡ CP1255,
   a platform specific extension of the ASCII character set, gratuitously incompatible with ISO-8859-8,
   suitable for the Hebrew language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
WINDOWS-1256
   ≡ CP1256,
   a platform specific extension of the ASCII character set, meant to be suitable for the Arabic language.
WINDOWS-1257
   ≡ CP1257,
   a platform specific extension of the ASCII character set.
WINDOWS-1258
   ≡ CP1258,
   a platform specific extension of the ASCII character set, meant to be suitable for the
   Vietnamese language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
HP-ROMAN8,
   a platform specific extension of the ASCII character set.
NEXTSTEP,
   a platform specific extension of the ASCII character set.
EUC-JP,
   a multibyte character set for the Japanese language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
SHIFT-JIS,
   a multibyte character set for the Japanese language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
CP932,
   a Microsoft variant of SHIFT-JIS.
   This character set is only available on platforms with GNU libc or GNU libiconv.
ISO-2022-JP,
   a stateful 7-bit multibyte character set for the Japanese language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
ISO-2022-JP-2,
   a stateful 7-bit multibyte character set for the Japanese language.
   This character set is only available on platforms with GNU libc 2.3
   or newer or GNU libiconv.
ISO-2022-JP-1,
   a stateful 7-bit multibyte character set for the Japanese language.
   This character set is only available on platforms with GNU libiconv.
EUC-CN,
   a multibyte character set for simplified Chinese.
   This character set is only available on platforms with GNU libc or GNU libiconv.
HZ,
   a stateful 7-bit multibyte character set for simplified Chinese.
   This character set is only available on platforms with GNU libiconv.
GBK,
   a multibyte character set for Chinese.
   This character set is only available on platforms with GNU libc or GNU libiconv.
CP936,
   a Microsoft variant of GBK.
   This character set is only available on platforms with GNU libc or GNU libiconv.
GB18030,
   a multibyte character set for Chinese.
   This character set is only available on platforms with GNU libc or GNU libiconv.
EUC-TW,
   a multibyte character set for traditional Chinese.
   This character set is only available on platforms with GNU libc or GNU libiconv.
BIG5,
   a multibyte character set for traditional Chinese.
   This character set is only available on platforms with GNU libc or GNU libiconv.
CP950,
   a Microsoft variant of BIG5.
   This character set is only available on platforms with GNU libc or GNU libiconv.
BIG5-HKSCS,
   a multibyte character set for traditional Chinese.
   This character set is only available on platforms with GNU libc or GNU libiconv.
ISO-2022-CN,
   a stateful 7-bit multibyte character set for Chinese.
   This character set is only available on platforms with GNU libc or GNU libiconv.
ISO-2022-CN-EXT,
   a stateful 7-bit multibyte character set for Chinese.
   This character set is only available on platforms with GNU libc or GNU libiconv.
EUC-KR,
   a multibyte character set for Korean.
   This character set is only available on platforms with GNU libc or GNU libiconv.
CP949,
   a Microsoft variant of EUC-KR.
   This character set is only available on platforms with GNU libc or GNU libiconv.
ISO-2022-KR,
   a stateful 7-bit multibyte character set for Korean.
   This character set is only available on platforms with GNU libc or GNU libiconv.
JOHAB,
   a multibyte character set for Korean used mostly on DOS.
   This character set is only available on platforms with GNU libc or GNU libiconv.
ARMSCII-8,
   an extension of the ASCII character set, suitable for the Armenian language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
GEORGIAN-ACADEMY,
   an extension of the ASCII character set, suitable for the Georgian language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
GEORGIAN-PS,
   an extension of the ASCII character set, suitable for the Georgian language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
TIS-620,
   an extension of the ASCII character set, suitable for the Thai language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
MULELAO-1,
   an extension of the ASCII character set, suitable for the Laotian language.
   This character set is only available on platforms with GNU libiconv.
CP1133,
   an extension of the ASCII character set, suitable for the Laotian language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
VISCII,
   an extension of the ASCII character set, suitable for the Vietnamese language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
TCVN,
   an extension of the ASCII character set, suitable for the Vietnamese language.
   This character set is only available on platforms with GNU libc or GNU libiconv.
BASE64, encodes
   arbitrary byte sequences with 64 ASCII characters
   ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
   as specified by MIME; 3 bytes are encoded with 4
   characters, and line breaks are inserted after every 76 characters.
While this is not a traditional character set (i.e., it does
  not map a set of characters in a natural language into bytes), it
  does define a map between arbitrary byte sequences and certain
  character sequences, so it falls naturally into the EXT:ENCODING class.
The character sets provided by the library function
 iconv can also be used as encodings.  To create such an encoding,
 call EXT:MAKE-ENCODING with the character set name (a string) as the
 :CHARSET argument.
When an EXT:ENCODING is available both as a built-in and
 through iconv, the built-in is used, because it is more
 efficient and available across all platforms.
These encodings are not assigned to global variables, since
 there is no portable way to get the list of all character sets
 supported by iconv.
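Since the set of iconv character sets varies by platform, it can be convenient to probe for a name before relying on it. The sketch below assumes that EXT:MAKE-ENCODING signals an ERROR for a charset name it cannot resolve; FIND-ENCODING is a hypothetical helper, not a CLISP function.

```lisp
;; FIND-ENCODING is a hypothetical helper, not a CLISP function.
;; Assumption: EXT:MAKE-ENCODING signals an ERROR for an unknown charset name.
(defun find-encoding (name)
  "Return an EXT:ENCODING for the charset NAME (a string), or NIL if
EXT:MAKE-ENCODING rejects NAME on this platform."
  (handler-case (ext:make-encoding :charset name)
    (error () nil)))

(find-encoding "UTF-8")            ; also a built-in, so the built-in is used
(find-encoding "NO-SUCH-CHARSET")  ; a made-up name
```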
On standard-compliant UNIX systems (e.g., GNU systems, such as GNU/Linux and GNU/Hurd) and on systems with GNU libiconv you get this list by calling the program: iconv -l.
The reason we use only GNU libc 2.2 or GNU libiconv is
 that the other iconv implementations are broken in various ways and
 we do not want to deal with random CLISP crashes caused by those bugs.
 If your system supplies an iconv implementation which passes the
 GNU libiconv's test suite, please report that
 to clisp-list and a
 future CLISP version will use iconv on your system.
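The line terminator mode can be one of the following three keywords:
:UNIX
   Newline is represented by the ASCII LF character (U+000A).
:MAC
   Newline is represented by the ASCII CR character (U+000D).
:DOS
   Newline is represented by the ASCII CR character followed by the ASCII LF character.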
Windows programs typically use the :DOS line terminator,
 sometimes they also accept :UNIX line terminators or produce
 :MAC line terminators.
The HTTP protocol also requires :DOS line terminators.
The line terminator mode is relevant only for output (writing to a
 file/pipe/socket STREAM).  During input, all three kinds of line terminators
 are recognized.  See also Section 13.11, “Treatment of Newline during Input and Output”.
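The effect on output can be observed by writing a short text with a given line terminator and reading the file back as raw bytes. This is a sketch only: WRITTEN-BYTES is a hypothetical helper, and the file name is an arbitrary scratch path that is assumed to be writable.

```lisp
;; WRITTEN-BYTES is a hypothetical helper; PATH is any writable scratch file.
(defun written-bytes (text line-terminator path)
  "Write TEXT to PATH as ASCII with LINE-TERMINATOR, then return the raw bytes."
  (with-open-file (out path :direction :output :if-exists :supersede
                            :external-format
                            (ext:make-encoding :charset charset:ascii
                                               :line-terminator line-terminator))
    (write-string text out))
  (with-open-file (in path :direction :input :element-type '(unsigned-byte 8))
    (let ((bytes (make-array (file-length in) :element-type '(unsigned-byte 8))))
      (read-sequence bytes in)
      bytes)))

;; "a" followed by a newline, written with the :DOS line terminator:
;; the file should contain the bytes 97 13 10.
(written-bytes (format nil "a~%") :dos "/tmp/line-terminator-demo.txt")
```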
EXT:MAKE-ENCODING. The function (EXT:MAKE-ENCODING &KEY :CHARSET
     :LINE-TERMINATOR :INPUT-ERROR-ACTION :OUTPUT-ERROR-ACTION)
 returns an EXT:ENCODING. The :CHARSET argument may be
 an encoding, a string, or :DEFAULT.
 The possible values for the line terminator argument are the
 keywords :UNIX, :MAC, :DOS.
The :INPUT-ERROR-ACTION argument specifies
 what happens when an invalid byte sequence is encountered while
 converting bytes to characters.  Its value can be :ERROR, :IGNORE
 or a character to be used instead.  The UNICODE character
 #\uFFFD is typically used to indicate an error in the
 input sequence.
The :OUTPUT-ERROR-ACTION argument specifies
 what happens when an invalid character is encountered while converting
 characters to bytes.  Its value can be :ERROR, :IGNORE, a byte to
 be used instead, or a character to be used instead.  The UNICODE
 character #\uFFFD can be used here only if it is
 encodable in the character set.
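As an illustration of the error actions, the sketch below builds a UTF-8 encoding that substitutes #\uFFFD for invalid input instead of signalling an error. The names *LENIENT-UTF-8* and DECODE-LENIENTLY are hypothetical.

```lisp
;; *LENIENT-UTF-8* and DECODE-LENIENTLY are hypothetical names.
(defparameter *lenient-utf-8*
  (ext:make-encoding :charset charset:utf-8
                     :line-terminator :unix
                     ;; replace invalid byte sequences with U+FFFD
                     :input-error-action (code-char #xFFFD)
                     ;; silently drop unencodable characters on output
                     :output-error-action :ignore))

(defun decode-leniently (byte-list)
  "Decode a list of byte values as UTF-8, replacing invalid input."
  (ext:convert-string-from-bytes
   (coerce byte-list '(vector (unsigned-byte 8)))
   *lenient-utf-8*))

;; 72 and 105 are "H" and "i"; 255 can never occur in valid UTF-8.
(decode-leniently '(72 105 255))
```

The result should start with "Hi" and contain the replacement character in place of the invalid byte.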
EXT:ENCODING-CHARSET. The function (EXT:ENCODING-CHARSET encoding) returns the
 charset of the encoding, as a SYMBOL or a STRING.
(STRING (EXT:ENCODING-CHARSET encoding)) is
  not necessarily a valid MIME name.
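A small sketch of reading the charset back from an encoding; CHARSET-LABEL is a hypothetical helper that normalizes the SYMBOL-or-STRING result to a STRING for display purposes.

```lisp
;; CHARSET-LABEL is a hypothetical helper, not a CLISP function.
(defun charset-label (encoding)
  "Return the charset of ENCODING as a string (not necessarily a MIME name)."
  (string (ext:encoding-charset encoding)))

(charset-label charset:utf-8)
(charset-label custom:*default-file-encoding*)
```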
Besides every file/pipe/socket STREAM containing an encoding,
 the following SYMBOL-MACRO places contain global EXT:ENCODINGs:
SYMBOL-MACRO CUSTOM:*DEFAULT-FILE-ENCODING*. The SYMBOL-MACRO place CUSTOM:*DEFAULT-FILE-ENCODING* is the encoding used for
 new file/pipe/socket STREAMs when no :EXTERNAL-FORMAT argument is specified.
The following are SYMBOL-MACRO places.
CUSTOM:*PATHNAME-ENCODING*
   is the encoding used for converting filenames in the
   file system (represented with byte sequences by the OS) to lisp
   PATHNAME components (STRINGs).
   If this encoding is incompatible with some file names on your system,
   file system access (e.g., DIRECTORY) may SIGNAL ERRORs,
   thus extreme caution is recommended if this is not a “1:1” encoding.
   Sometimes it may not be obvious that the encoding is involved at all.
   E.g., on Win32: 
(PARSE-NAMESTRING (STRING #\ARMENIAN_SMALL_LETTER_RA))
*** - PARSE-NAMESTRING: syntax error in filename "ռ" at position 0
when CUSTOM:*PATHNAME-ENCODING* is CHARSET:UTF-16 because then
   #\ARMENIAN_SMALL_LETTER_RA corresponds
   to the 4 bytes #(255 254 124 5)
   and the byte 124 is not a valid
   byte for a Win32 file name because it
   means | in ASCII.
The set of valid pathname bytes is
   determined by the GNU autoconf test
   src/m4/filecharset.m4
   at configure time. While rather stable for the first 127 bytes,
   on Win32 it varies wildly for the bytes 128-255, depending on the
   OS version and the file system.
The line terminator mode of CUSTOM:*PATHNAME-ENCODING* is ignored.
Platform Dependent: Mac OS X platform only:
   Mac OS X pathnames are actually UNICODE STRINGs, so
   CUSTOM:*PATHNAME-ENCODING* is a constant with value CHARSET:UTF-8.
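CUSTOM:*TERMINAL-ENCODING*
   is the encoding used for communication with the terminal,
   in particular by *TERMINAL-IO*.
CUSTOM:*MISC-ENCODING*
   is the encoding used for access to environment variables,
   command line options, and the like.
CUSTOM:*FOREIGN-ENCODING*
   is the encoding used for strings passed through the “FFI”
   (some platforms only).
The default encoding objects are initialized according to the
 -Edomain encoding command line option.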
You have to use EXT:LETF/EXT:LETF*
  for SYMBOL-MACROs; LET/LET* will not work!
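A minimal sketch of temporarily rebinding one of these places with EXT:LETF; CALL-WITH-FILE-ENCODING is a hypothetical helper, and the previous value is restored when the body exits.

```lisp
;; CALL-WITH-FILE-ENCODING is a hypothetical helper, not a CLISP function.
(defun call-with-file-encoding (encoding thunk)
  "Call THUNK with CUSTOM:*DEFAULT-FILE-ENCODING* temporarily set to ENCODING."
  (ext:letf ((custom:*default-file-encoding* encoding))
    (funcall thunk)))

;; Inside the thunk, streams opened without :EXTERNAL-FORMAT use ISO-8859-1.
(call-with-file-encoding
 (ext:make-encoding :charset charset:iso-8859-1 :line-terminator :unix)
 (lambda () (ext:encoding-charset custom:*default-file-encoding*)))
```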
The line terminator facet of the above EXT:ENCODINGs is determined by
 the following logic: since CLISP understands all possible
 line terminators on input (see
 Section 13.11, “Treatment of Newline during Input and Output”), all that matters is which line terminator
 most other programs expect.
If a non-zero O_BINARY cpp
   constant is defined, we assume that the OS distinguishes between text
   and binary files, and, since the encodings are relevant only for text
   files, we thus use :DOS; otherwise the default is :UNIX.
On Win32, the default is :DOS.
This boils down to the following code
 in src/encoding.d:
#if defined(WIN32) || (defined(UNIX) && (O_BINARY != 0))
Both of the above tests
  pass on Cygwin, so the default line terminator is :DOS.
  If you so desire, you can change it in your RC file.
Encodings can also be used to convert directly between strings and their corresponding byte vector representation according to that encoding.
(EXT:CONVERT-STRING-FROM-BYTES vector encoding &KEY :START :END)
   converts the subsequence of vector (a (VECTOR (UNSIGNED-BYTE 8)))
   from start to end to a STRING, according to the given
   encoding, and returns the resulting string.
(EXT:CONVERT-STRING-TO-BYTES string encoding &KEY :START :END)
   converts the subsequence of string from
   start to end to a (VECTOR (UNSIGNED-BYTE 8)), according to the given
   encoding, and returns the resulting byte vector.
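The two functions can be combined into a simple round trip. The sketch below uses CHARSET:UTF-8 and CHARSET:ASCII; STRING-TO-UTF-8 and UTF-8-TO-STRING are hypothetical wrapper names.

```lisp
;; STRING-TO-UTF-8 and UTF-8-TO-STRING are hypothetical wrappers.
(defun string-to-utf-8 (string)
  "Encode STRING as a UTF-8 byte vector."
  (ext:convert-string-to-bytes string charset:utf-8))

(defun utf-8-to-string (bytes)
  "Decode a UTF-8 byte vector into a string."
  (ext:convert-string-from-bytes bytes charset:utf-8))

(string-to-utf-8 (string (code-char 233)))   ; U+00E9 encodes as bytes 195 169
(utf-8-to-string (string-to-utf-8 "hello"))  ; round-trips to "hello"

;; :START and :END select a subsequence: here "el", i.e. bytes 101 108.
(ext:convert-string-to-bytes "hello" charset:ascii :start 1 :end 3)
```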
| These notes document CLISP version 2.49.93+ | Last modified: 2018-02-19 |