3.4 Text Encoding

The various string fields bring up the question: what text encoding is used? There are actually three text encodings in play:

  1. the encoding in use in your Scheme source files
  2. the encoding in use within the Guile interpreter
  3. the encoding in use in libscribbu

The first is documented in the Guile manual under “Character Encoding of Source Files” (see Character Encoding of Source Files in The Guile Reference Manual). The upshot is this: UTF-8 is assumed by default, but the author may tell Guile which encoding is actually in use through a coding hint in a comment:

;;; coding: iso-8859-1

The set of recognized encoding names is defined by IANA, per RFC 2978.

The second is also documented in the Guile manual, under “String Internals” (see String Internals in The Guile Reference Manual):

Guile stores each string in memory as a contiguous array of Unicode code points along with an associated set of attributes. If all of the code points of a string have an integer range between 0 and 255 inclusive, the code point array is stored as one byte per code point: it is stored as an ISO-8859-1 (aka Latin-1) string. If any of the code points of the string has an integer value greater than 255, the code point array is stored as four bytes per code point: it is stored as a UTF-32 string.

Conversion between the one-byte-per-code-point and four-bytes-per-code-point representations happens automatically as necessary.
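
As a rough illustration of why the internal representation rarely matters to Scheme code, the following sketch works purely in terms of code points. The variable names are made up for this example, and the string literals use \u escapes so that the result does not depend on the encoding of the source file.

```scheme
(use-modules (rnrs bytevectors))

;; Every code point is <= 255, so this string is eligible for the
;; one-byte-per-code-point (Latin-1) representation.
(define narrow "caf\u00e9")

;; U+03BB (955) is > 255, so this string needs the
;; four-bytes-per-code-point (UTF-32) representation.
(define wide "\u03bb-calculus")

;; Scheme code sees code points either way.
(display (string-length narrow)) (newline)                ; 4
(display (char->integer (string-ref narrow 3))) (newline) ; 233
(display (char->integer (string-ref wide 0))) (newline)   ; 955

;; Mixing the two kinds of string needs no special handling.
(define mixed (string-append narrow " " wide))
(display (string-length mixed)) (newline)                 ; 15

;; Byte counts only appear once a string is explicitly encoded.
(display (bytevector-length (string->utf8 narrow))) (newline) ; 5
(display (bytevector-length (string->utf8 wide))) (newline)   ; 11
```

Note that string-length counts code points, not bytes; the byte counts in the last two lines are properties of the UTF-8 encoding, not of Guile's in-memory storage.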

That just leaves libscribbu. On read (that is, when the library reads text from tags on disk), the encoding is either specified by the tag itself, specified by the caller, or guessed. The text is then converted to a Guile string. On write, text is converted from the internal Guile representation to the desired on-disk encoding, which is deduced from either caller preferences or the frame settings themselves.
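
The same kind of conversion can be sketched in plain Guile with the (ice-9 iconv) module. This is not libscribbu's API, only an analogy for the read and write directions described above; the byte values and variable names are invented for the example.

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Hypothetical tag text as it might sit on disk: "café" in ISO-8859-1.
(define raw-latin1 (u8-list->bytevector '(99 97 102 233)))

;; Read direction: bytes in a known encoding become a Guile string.
(define text (bytevector->string raw-latin1 "ISO-8859-1"))

;; Write direction: the Guile string is encoded into the target encoding.
(define as-utf8 (string->bytevector text "UTF-8"))

(display (string-length text)) (newline)          ; 4
(display (bytevector->u8-list as-utf8)) (newline) ; (99 97 102 195 169)
```

The intermediate value text is an ordinary Guile string, so whatever encoding the bytes arrived in, Scheme code in between deals only with code points.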