Codecs library#

Importing#

|#| 'stdcodecs.nest' = cc

Nest encodings#

Note

To avoid mistakes or confusion, use the constants defined in this library instead of writing encoding names by hand.

Nest supports the following encodings:

Encoding	Aliases	Link
`ascii`	`us-ascii`	ASCII
`cp1250`	`cp-1250`, `windows[-]1250`	CP1250
`cp1251`	`cp-1251`, `windows[-]1251`	CP1251
`cp1252`	`cp-1252`, `windows[-]1252`	CP1252
`cp1253`	`cp-1253`, `windows[-]1253`	CP1253
`cp1254`	`cp-1254`, `windows[-]1254`	CP1254
`cp1255`	`cp-1255`, `windows[-]1255`	CP1255
`cp1256`	`cp-1256`, `windows[-]1256`	CP1256
`cp1257`	`cp-1257`, `windows[-]1257`	CP1257
`cp1258`	`cp-1258`, `windows[-]1258`	CP1258
`latin-1`	`latin1`, `l1`, `latin`, `iso[-]8859-1`	latin1
`utf8`	`utf-8`	UTF-8
`ext-utf8`	`ext[-]utf[-]8`	-
`utf16le`	`utf-16le`, `utf[-]16`	UTF-16LE
`utf16be`	`utf-16be`	UTF-16BE
`ext-utf16le`	`ext[-]utf[-]16le`, `ext[-]utf[-]16`	-
`ext-utf16be`	`ext[-]utf[-]16be`	-
`utf32le`	`utf-32le`, `utf[-]32`	UTF-32LE
`utf32be`	`utf-32be`	UTF-32BE

Note

[-] means that the hyphen is optional; for example, windows1252 and windows-1252 are both accepted.

The name of the encoding is case-insensitive. Underscores (_), hyphens (-) and spaces ( ) are interchangeable. This means that utf8, UTF-8, uTf_8 and UtF 8 are all valid ways of specifying the UTF-8 encoding.
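
The matching rule can be pictured with a short Python sketch. This is an illustration only, not Nest's implementation: the helper `normalize_encoding_name` is hypothetical, and it only covers the two rules stated above (case-insensitivity and interchangeable separators). The optional hyphens marked with [-] still come from the alias table.

```python
def normalize_encoding_name(name: str) -> str:
    """Hypothetical helper: lowercase the name and map '_' and ' ' to '-'."""
    return name.lower().replace("_", "-").replace(" ", "-")


# The four spellings of UTF-8 listed in the paragraph above.
spellings = ["utf8", "UTF-8", "uTf_8", "UtF 8"]
normalized = {normalize_encoding_name(s) for s in spellings}

# Every spelling collapses to the primary name or its listed alias.
print(sorted(normalized))  # ['utf-8', 'utf8']
```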

Functions#

`@cp_is_valid`#

Synopsis:

[cp: Int|Byte] @cp_is_valid -> Bool

Returns:

true if cp is a valid Unicode code point and false otherwise.
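
For background, Unicode code points run from U+0000 to U+10FFFF, and the range U+D800 to U+DFFF is reserved for surrogates. This section does not say whether `@cp_is_valid` treats surrogates as valid, so the Python sketch below reports the two properties separately instead of guessing. The function names are hypothetical and are not part of the library.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode


def in_unicode_range(cp: int) -> bool:
    """True if cp lies between U+0000 and U+10FFFF inclusive."""
    return 0 <= cp <= MAX_CODE_POINT


def is_surrogate(cp: int) -> bool:
    """True if cp lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


for cp in (0x41, 0xD800, 0x10FFFF, 0x110000, -1):
    print(hex(cp), in_unicode_range(cp), is_surrogate(cp))
```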

`@encoding_info`#

Synopsis:

[encoding: Str] @encoding_info -> Map

Returns:

A new map containing various information about a particular encoding. The keys in the map are the following:

Key	Type	Value
`name`	`Str`	The name of the encoding.
`min_len`	`Int`	The minimum length of a code point (character) in bytes.
`max_len`	`Int`	The maximum length of a code point (character) in bytes.
`bom`	`Array?.Byte`	The Byte Order Mark: an array of bytes if the encoding has one, `null` otherwise.

Example:

|#| 'stdcodecs.nest' = cc

'utf16'  @cc.encoding_info --> {'name': 'UTF-16LE', 'min_len': 2, 'max_len': 4, 'bom': {255b, 254b}}
'latin1' @cc.encoding_info --> {'name': 'ISO-8859-1', 'min_len': 1, 'max_len': 1, 'bom': null}
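
The values in the example can be cross-checked against Python's standard codecs, which serves here only as an independent reference and not as Nest code. The sketch assumes Python's `utf-16-le` and `latin-1` codecs correspond to the Nest encodings of the same name.

```python
import codecs

# The UTF-16LE byte order mark is the byte pair FF FE, i.e. {255b, 254b}.
bom = codecs.BOM_UTF16_LE
print(list(bom))  # [255, 254]

# A BMP character takes one 16-bit unit; a supplementary character takes
# a surrogate pair, which matches min_len 2 and max_len 4.
min_len = len("A".encode("utf-16-le"))
max_len = len("\U0001F600".encode("utf-16-le"))
print(min_len, max_len)  # 2 4

# Latin-1 is a single-byte encoding, which matches min_len = max_len = 1.
latin1_len = len("\u00e9".encode("latin-1"))
print(latin1_len)  # 1
```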

`@from_cp`#

Synopsis:

[cp: Int|Byte] @from_cp -> Str

Returns:

A new string containing the character associated with the given code point. If cp is not valid (this can be checked with cp_is_valid), the function throws an error.

`@to_cp`#

Synopsis:

[char: Str] @to_cp -> Int

Returns:

The code point associated with the character in char. If char does not contain exactly one character, an error is thrown.
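
`@from_cp` and `@to_cp` are inverses of each other. Python's built-in `chr` and `ord` behave in a comparable way, so the sketch below uses them as an analogy. It is not Nest code, and the exact error types Nest throws are not specified in this section.

```python
# Code point -> character, comparable to @from_cp.
euro = chr(0x20AC)

# Character -> code point, comparable to @to_cp.
cp = ord(euro)

# The two operations round-trip.
print(hex(cp), chr(cp) == euro)  # 0x20ac True

# A string with more than one character is rejected.
try:
    ord("ab")
    multi_char_rejected = False
except TypeError:
    multi_char_rejected = True

# A value above U+10FFFF is rejected.
try:
    chr(0x110000)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

print(multi_char_rejected, out_of_range_rejected)  # True True
```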

Constants#

`ASCII`#

ASCII (a.k.a. US-ASCII) encoding name.

`UTF_8`#

UTF-8 encoding name.

`EXT_UTF_8`#

extUTF-8 encoding name. This encoding is Nest-specific and is UTF-8 that accepts unpaired surrogates.
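
To show what accepting unpaired surrogates means in practice, the Python sketch below encodes a lone surrogate with and without Python's `surrogatepass` error handler. This is an analogy only: it is an assumption, not something stated in this section, that Nest's extUTF-8 produces the same bytes.

```python
lone = "\ud800"  # an unpaired high surrogate

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A relaxed variant encodes it with the ordinary three-byte pattern.
relaxed = lone.encode("utf-8", "surrogatepass")
print(strict_ok, relaxed)  # False b'\xed\xa0\x80'

# Decoding with the same handler restores the original string.
round_trip = relaxed.decode("utf-8", "surrogatepass")
print(round_trip == lone)  # True
```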

`UTF_16`#

UTF-16 encoding name.

`UTF_16LE`#

UTF-16LE encoding name.

`UTF_16BE`#

UTF-16BE encoding name.

`EXT_UTF_16`#

extUTF-16 encoding name. This encoding is Nest-specific and is UTF-16 that accepts unpaired surrogates. The only exception is the last character that must not be a high surrogate.
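
The exception for the last character can be expressed as a check on a sequence of 16-bit code units. In UTF-16, high surrogates occupy 0xD800 to 0xDBFF and low surrogates 0xDC00 to 0xDFFF. The Python helper below is hypothetical and only illustrates the rule as worded above. It is not how Nest implements it.

```python
from typing import Sequence


def ends_with_high_surrogate(units: Sequence[int]) -> bool:
    """True if the last 16-bit unit is a high surrogate (0xD800..0xDBFF)."""
    return len(units) > 0 and 0xD800 <= units[-1] <= 0xDBFF


# Unpaired high surrogate in the middle: allowed by the rule above.
print(ends_with_high_surrogate([0xD800, 0x0041]))  # False

# Unpaired low surrogate at the end: also allowed.
print(ends_with_high_surrogate([0x0041, 0xDC00]))  # False

# High surrogate as the last unit: the one case the rule excludes.
print(ends_with_high_surrogate([0x0041, 0xD800]))  # True
```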

`EXT_UTF_16LE`#

extUTF-16LE encoding name. Little-endian version of extUTF-16.

`EXT_UTF_16BE`#

extUTF-16BE encoding name. Big-endian version of extUTF-16.

`UTF_32`#

UTF-32 encoding name.

`UTF_32LE`#

UTF-32LE encoding name.

`UTF_32BE`#

UTF-32BE encoding name.

`CP1250`#

CP1250 (a.k.a. Windows-1250) encoding name.

`CP1251`#

CP1251 (a.k.a. Windows-1251) encoding name.

`CP1252`#

CP1252 (a.k.a. Windows-1252) encoding name.

`CP1253`#

CP1253 (a.k.a. Windows-1253) encoding name.

`CP1254`#

CP1254 (a.k.a. Windows-1254) encoding name.

`CP1255`#

CP1255 (a.k.a. Windows-1255) encoding name.

`CP1256`#

CP1256 (a.k.a. Windows-1256) encoding name.

`CP1257`#

CP1257 (a.k.a. Windows-1257) encoding name.

`CP1258`#

CP1258 (a.k.a. Windows-1258) encoding name.

`LATIN_1`#

Latin-1 (a.k.a. ISO/IEC 8859-1) encoding name.

`ISO_8859_1`#

ISO/IEC 8859-1 (a.k.a. Latin-1) encoding name. This is the same as LATIN_1.
