Codecs library
Importing
|#| 'stdcodecs.nest' = cc
Nest encodings
Note
To avoid mistakes or confusion, it is recommended to use the constants defined in this library rather than writing encoding names by hand.
Nest supports the following encodings:
| Encoding | Aliases | Link |
|---|---|---|
| ascii | us-ascii | ASCII |
| cp1250 | cp-1250, windows[-]1250 | CP1250 |
| cp1251 | cp-1251, windows[-]1251 | CP1251 |
| cp1252 | cp-1252, windows[-]1252 | CP1252 |
| cp1253 | cp-1253, windows[-]1253 | CP1253 |
| cp1254 | cp-1254, windows[-]1254 | CP1254 |
| cp1255 | cp-1255, windows[-]1255 | CP1255 |
| cp1256 | cp-1256, windows[-]1256 | CP1256 |
| cp1257 | cp-1257, windows[-]1257 | CP1257 |
| cp1258 | cp-1258, windows[-]1258 | CP1258 |
| latin-1 | latin1, l1, latin, iso[-]8859-1 | latin1 |
| utf8 | utf-8 | UTF-8 |
| ext-utf8 | ext[-]utf[-]8 | - |
| utf16le | utf-16le, utf[-]16 | UTF-16LE |
| utf16be | utf-16be | UTF-16BE |
| ext-utf16le | ext[-]utf[-]16le, ext[-]utf[-]16 | - |
| ext-utf16be | ext[-]utf[-]16be | - |
| utf32le | utf-32le, utf[-]32 | UTF-32LE |
| utf32be | utf-32be | UTF-32BE |
Note
[-] means that the hyphen is optional, for example windows1252 and
windows-1252 are both accepted.
The name of the encoding is case-insensitive. Underscores (_), hyphens (-)
and spaces ( ) are interchangeable. This means that utf8, UTF-8, uTf_8
and UtF 8 are all valid ways of specifying the UTF-8 encoding.
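As an illustration of the matching rules above, the sketch below passes several spellings of the same name to encoding_info (documented under Functions). It assumes the library was imported as cc, as in the Importing section; by the rules above, each call should describe the same UTF-8 encoding.

```nest
|#| 'stdcodecs.nest' = cc

-- The four names differ only in case and in the separator used
'utf8' @cc.encoding_info
'UTF-8' @cc.encoding_info
'uTf_8' @cc.encoding_info
'UtF 8' @cc.encoding_info
```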
Functions
@cp_is_valid
Synopsis:
[cp: Int|Byte] @cp_is_valid -> Bool
Returns:
true if cp is a valid Unicode code point and false otherwise.
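A short sketch of how the check might be used, assuming the library was imported as cc. The values are chosen for illustration: 65 is U+0041, and 1114112 is one past U+10FFFF, the last Unicode code point.

```nest
|#| 'stdcodecs.nest' = cc

65 @cc.cp_is_valid      --> true
65b @cc.cp_is_valid     --> true, a Byte is accepted as well
1114112 @cc.cp_is_valid --> false
```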
@encoding_info
Synopsis:
[encoding: Str] @encoding_info -> Map
Returns:
A new map describing the given encoding. The map contains the following keys:
| Key | Type | Value |
|---|---|---|
| name | Str | The name of the encoding. |
| min_len | Int | The minimum length of a code point (character) in bytes. |
| max_len | Int | The maximum length of a code point (character) in bytes. |
| bom | Array?.Byte | The Byte Order Mark, an array of bytes if it exists for the encoding and null if it doesn't. |
Example:
|#| 'stdcodecs.nest' = cc
'utf16' @cc.encoding_info --> {'name': 'UTF-16LE', 'min_len': 2, 'max_len': 4, 'bom': {255b, 254b}}
'latin1' @cc.encoding_info --> {'name': 'ISO-8859-1', 'min_len': 1, 'max_len': 1, 'bom': null}
@from_cp
Synopsis:
[cp: Int|Byte] @from_cp -> Str
Returns:
A new string containing the character associated with the given code point. If
cp is not a valid code point (this can be checked with cp_is_valid), the
function throws an error.
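For illustration, a minimal sketch assuming the library was imported as cc. The code points are arbitrary examples: 97 is U+0061 and 8364 is U+20AC.

```nest
|#| 'stdcodecs.nest' = cc

97 @cc.from_cp   --> 'a'
8364 @cc.from_cp --> the euro sign (U+20AC)
```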
@to_cp
Synopsis:
[char: Str] @to_cp -> Int
Returns:
The code point associated with the character in char. If char does not
contain exactly one character, an error is thrown.
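For illustration, a minimal sketch assuming the library was imported as cc. Each argument is a string of exactly one character, as the function requires.

```nest
|#| 'stdcodecs.nest' = cc

'a' @cc.to_cp --> 97
'A' @cc.to_cp --> 65
```

Passing the resulting code point to from_cp should give back the original character.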
Constants
ASCII
ASCII (a.k.a. US-ASCII) encoding name.
UTF_8
UTF-8 encoding name.
EXT_UTF_8
extUTF-8 encoding name. This encoding is Nest-specific: it is a variant of UTF-8 that accepts unpaired surrogates.
UTF_16
UTF-16 encoding name.
UTF_16LE
UTF-16LE encoding name.
UTF_16BE
UTF-16BE encoding name.
EXT_UTF_16
extUTF-16 encoding name. This encoding is Nest-specific: it is a variant of UTF-16 that accepts unpaired surrogates, with one exception: the last character must not be a high surrogate.
EXT_UTF_16LE
extUTF-16LE encoding name. Little-endian version of extUTF-16.
EXT_UTF_16BE
extUTF-16BE encoding name. Big-endian version of extUTF-16.
UTF_32
UTF-32 encoding name.
UTF_32LE
UTF-32LE encoding name.
UTF_32BE
UTF-32BE encoding name.
CP1250
CP1250 (a.k.a. Windows-1250) encoding name.
CP1251
CP1251 (a.k.a. Windows-1251) encoding name.
CP1252
CP1252 (a.k.a. Windows-1252) encoding name.
CP1253
CP1253 (a.k.a. Windows-1253) encoding name.
CP1254
CP1254 (a.k.a. Windows-1254) encoding name.
CP1255
CP1255 (a.k.a. Windows-1255) encoding name.
CP1256
CP1256 (a.k.a. Windows-1256) encoding name.
CP1257
CP1257 (a.k.a. Windows-1257) encoding name.
CP1258
CP1258 (a.k.a. Windows-1258) encoding name.
LATIN_1
Latin-1 (a.k.a. ISO/IEC 8859-1) encoding name.
ISO_8859_1
ISO/IEC 8859-1 (a.k.a. latin-1) encoding name. This is the same as LATIN_1.
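As a rough sketch of how these constants might be used in place of hand-written names, the example below passes them to encoding_info. It assumes the library was imported as cc, that constants are reached through the module alias in the same way as functions (cc.UTF_8), and that each constant is a Str accepted wherever an encoding name is expected.

```nest
|#| 'stdcodecs.nest' = cc

cc.UTF_8 @cc.encoding_info
cc.LATIN_1 @cc.encoding_info
cc.ISO_8859_1 @cc.encoding_info -- describes the same encoding as LATIN_1
```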