scribbu

The extensible tool for tagging your music collection.

This manual corresponds to scribbu version 0.7.0.

Copyright © 2018-2024 Michael Herstine <sp1ff@pobox.com>

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

A copy of the license is also available from the Free Software Foundation Web site at https://www.gnu.org/licenses/fdl.html.

This document was typeset with GNU Texinfo.


1 Introduction

scribbu is a C++ library & associated command-line tool for working with ID3 tags (see ID3). It was born when I retired my last Windows machine & could no longer use Winamp (see Winamp) to manage my library of digital music. The scribbu library offers classes & methods for reading, modifying & writing ID3v1 & ID3v2 tags. The scribbu program provides assorted sub-commands for working with ID3-tagged files (e.g. re-naming files based on their tags), but its real power lies in its embedded Scheme interpreter (see The Guile Reference Manual) in which scribbu library features are exported as a Scheme module (on which more below).


1.1 scribbu

The scribbu project has a few components. The first is a program that provides assorted sub-commands, a few of which are:

  • scribbu dump will write the contents of any & all ID3 tags found in one or more files to stdout. See Invoking scribbu dump.
  • scribbu report will generate a report listing ID3 attributes on one or more files on stdout. CSV, TDF & ASCII-delimited formats are supported currently. See Invoking scribbu report.
  • scribbu rename will rename one or more files based on the contents of their ID3 tags; e.g. scribbu rename -t "%A-%T.mp3" *.mp3 will rename all the files matching “*.mp3” to “<artist>-<title>.mp3” where "artist" and "title" are derived from their ID3 tags (if any). See Invoking scribbu rename.
  • scribbu popm will update ID3v2 play count & popularimeter tags. For instance, scribbu popm foo.mp3 will increment the play counts in “foo.mp3”. See Invoking scribbu popm.
  • scribbu text maintains assorted ID3v2 text frames; for instance, scribbu text --artist='Roxy Music' *.mp3 will set the artist frame to “Roxy Music” in all ID3v2 tags in all files matching “*.mp3”. See Invoking scribbu text.

Any sub-command can be invoked with --help or -h for more information. Use the --info option to display the command’s node in this manual.

The scribbu program also exports functions & GOOPS (see GOOPS) classes to a Scheme interpreter, so scribbu can also be invoked...

  • with a Scheme expression (-e, --expression) or Scheme file (-f, --file). In this case, scribbu will evaluate the given program & exit.
  • with no arguments at all. In this case, scribbu will drop into a Scheme REPL (Read Evaluate Print Loop) in which the user can evaluate arbitrary Scheme expressions.
  • as a script:
    #!/usr/local/scribbu \
    -e main -s
    #!
    (define (main args)
        ...
    

Finally, scribbu contains a C++ library (libscribbu, see Using libscribbu) against which one can build C++ libraries & programs.


1.2 Project Background

Some background on MP3, ID3, Winamp & the genesis of this project.


1.2.1 MP3

Widespread digital encoding of music arrived with the introduction of the compact disk in 1982. However, the size of the resulting digital representation was large: the standard Compact Disk stored about one hour & twenty minutes worth of music in about seven hundred MiB (at a time when the typical hard drive could hold ten MiB). In 1989 the relevant standards body (the Moving Picture Experts Group, or MPEG) called for proposals for lossy audio compression algorithms. The fourteen proposals they received were eventually combined into three “layers”, each with a different set of trade-offs between quality, space, and computational complexity. “MPEG Audio Layer I” was the simplest, designed to enable real-time encoding on the hardware of the day. “MPEG Audio Layer II” provides higher quality than Layer I but offers computationally simpler decoding than Layer III. “MPEG Audio Layer III” (or MP3) provides good quality at lower bitrates than Layer II, albeit at the cost of greater computational complexity.

Layer three was primarily developed by the German company Fraunhofer IIS. The file extension .mp3 was selected as a result of an internal survey of researchers at Fraunhofer. At a bit rate of 128 kbit/s, MP3 needed about a megabyte per minute of music encoded; roughly one-tenth the size of CD audio.
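
The "megabyte per minute" and "one-tenth" figures can be checked with a little arithmetic. The following Python sketch is only an illustration; the CD audio parameters (44.1 kHz sampling, 16-bit samples, two channels) are the usual Red Book values and are not taken from this manual.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumption: CD audio is 44,100 samples/s, 16 bits per sample, 2 channels.

def bytes_per_minute(bits_per_second):
    """Convert a bit rate into bytes needed for one minute of audio."""
    return bits_per_second / 8 * 60

mp3_bytes = bytes_per_minute(128_000)          # 128 kbit/s MP3
cd_bytes = bytes_per_minute(44_100 * 16 * 2)   # uncompressed CD audio

print(f"MP3: {mp3_bytes / 1e6:.2f} MB per minute")
print(f"CD:  {cd_bytes / 1e6:.2f} MB per minute")
print(f"CD audio is about {cd_bytes / mp3_bytes:.1f} times larger")
```

Under these assumptions a minute of 128 kbit/s MP3 occupies a little under one megabyte, against a bit over ten megabytes for the same minute of CD audio.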

At one MB per minute, given the size of consumer hard drives in the nineties, home users could easily store many MP3 tracks. The format found such universal application in the portable digital music players becoming available that they came to be known as “mp3 players”. With the network bandwidths available at the time, one could conveniently transmit MP3-encoded files across the internet, and even stream them.

As is typical of technological history, the application responsible for the widespread adoption of MP3 was not the application for which it was designed. Applications for audio encoded by MP3 were initially thought to be “musical transmission over ISDN telephone lines” and “voice announcement systems for local public transport”. Instead, the medium of choice for digital music became the ‘.mp3’ file.


1.2.2 ID3

A problem quickly emerged: the MP3 standard included no provision for metadata; no way to “tag” an .mp3 file with information such as title, artist, et cetera. NamkraD (AKA Eric Kemp) is credited with the idea of attaching such a tag to .mp3 files in 1996. Presumably to make it easy to detect & parse, while not interfering with existing decoders, it had a fixed size of one hundred twenty-eight bytes, and was attached to the end of the file (if a player that was unaware of the tag played the enclosing file, at worst the user would hear a bit of static at the end). It provided for a thirty-byte title, artist & album along with year, comment & a one-byte genre field. The original proposal defined eighty genres, extended to 148 by the 1.91 release of Winamp (see Winamp) in June 1998 and to 192 by the 5.6 release of Winamp in November 2010.
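
To make the layout concrete, here is a minimal Python sketch of decoding such a 128-byte block. It is not scribbu code. The field widths follow the description above; the leading "TAG" marker, the four-byte year and the ID3v1.1 convention of storing a track number in the last comment byte come from the common ID3v1 descriptions, and decoding text as CP1252 is an assumption.

```python
import struct

# "TAG" marker, title, artist, album, year, comment, genre: 128 bytes in all.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")

def _text(raw, encoding="cp1252"):
    """Cut a fixed-width field at the first NUL and decode it."""
    return raw.split(b"\x00", 1)[0].decode(encoding, errors="replace").rstrip()

def parse_id3v1(block):
    """Return the fields of a 128-byte ID3v1 block, or None if it is not one."""
    if len(block) != ID3V1.size:
        return None
    magic, title, artist, album, year, comment, genre = ID3V1.unpack(block)
    if magic != b"TAG":
        return None
    track = None
    # ID3v1.1: a zero byte followed by a non-zero byte ends the comment field.
    if comment[28] == 0 and comment[29] != 0:
        track, comment = comment[29], comment[:28]
    return {
        "title": _text(title), "artist": _text(artist), "album": _text(album),
        "year": _text(year), "comment": _text(comment),
        "track": track, "genre": genre,
    }

# A hand-built sample block; struct pads short fields with NUL bytes.
sample = ID3V1.pack(b"TAG", b"Lorca's Novena", b"The Pogues",
                    b"Hell's Ditch [Expanded] (US Ve", b"1990",
                    b"Amazon.com Song ID: 20355825\x00\x05", 255)
print(parse_id3v1(sample))
```

To read the tag from a real file, one would seek to 128 bytes before the end of the file and pass those bytes to parse_id3v1.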

The limitations of this format quickly became apparent, leading to the proposal in 1998 of ID3v2 by Martin Nilsson and several other contributors. Although it shared a name, ID3v2 was a completely different approach to tagging music: it was prepended to the audio data (making it suitable for streaming media) and it was variable-length; ID3v2 tags are composed of multiple frames, each containing one piece of information about the music (title, artist &c).
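
Because an ID3v2 tag is variable-length, a reader must first learn how long it is. The Python sketch below decodes the ten-byte tag header as laid out in the published ID3v2 specifications (not in this manual): the marker "ID3", a version byte, a revision byte, a flags byte, and a four-byte "synchsafe" size in which only the low seven bits of each byte are used. It is an illustration, not scribbu's implementation.

```python
def parse_id3v2_header(header):
    """Decode a 10-byte ID3v2 header; return None if it does not look like one."""
    if len(header) != 10 or header[:3] != b"ID3":
        return None
    major, revision, flags = header[3], header[4], header[5]
    size_bytes = header[6:10]
    # In a synchsafe integer the top bit of every byte must be clear.
    if any(b & 0x80 for b in size_bytes):
        return None
    size = 0
    for b in size_bytes:
        size = (size << 7) | b   # seven payload bits per byte
    # By the specification, size counts the bytes after this header.
    return {"version": (major, revision), "flags": flags, "size": size}

# A made-up header: ID3v2.3.0, no flags, 257 bytes of tag data to follow.
print(parse_id3v2_header(b"ID3\x03\x00\x00\x00\x00\x02\x01"))
```

Once the size is known, a reader can step over the tag to reach the audio data, or walk the frames inside it.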


1.2.3 Winamp

Space-efficient, high-quality, tagged audio was no good without a ready means of listening to it. The then-existing Windows Media Player and Real Networks’ Real Player never found widespread adoption. In April 1997 Justin Frankel and Dmitry Boldyrev released Winamp, a small, performant Windows MP3 player. Frankel formed Nullsoft in January 1998. With version 1.5, Winamp changed from freeware to shareware & charged a ten dollar registration fee; far from dampening uptake, this brought in $100,000 a month from $10 paper checks in the mail from paying users. Winamp 2.0 was released in September 1998 & became one of the most downloaded Windows programs ever.

One of the things that endeared Winamp to its users was its plugin architecture. Nullsoft provided several plugins as part of the standard distribution, one of which was the Music Library. Using this, one could manage, organize, search & play a personal library of thousands of MP3 files, all based on ID3 tags (see ID3).

Nullsoft was (in)famously acquired by AOL in 1999. By 2000 Winamp had been registered twenty-five million times, but Nullsoft began to struggle with the problems of so many AOL acquisitions. 2002 saw the misbegotten release of Winamp 3, a complete re-write that broke with the prior ethos of tight, lightweight code. Widespread incidence of users (including the author) reverting to Winamp 2 in response to poor performance & high resource demands of Winamp 3 led to Nullsoft continuing 2.x development, and eventually the release of Winamp 5 (2+3) late in 2003. From version 5.2, Winamp provided the ability to sync the user’s library with iPods, which led to many iPod owners (again including the author) choosing to use Winamp instead of iTunes to manage their devices.

The original Winamp team quit AOL in 2004 & development moved to Dulles (VA). Work continued, albeit at a slower pace. With the release of Winamp 5.66 in late 2013, AOL announced that winamp.com would be shut down later that year and that the software would no longer be available for download. It was later announced that Nullsoft (along with Shoutcast, an MP3 streaming platform) had been sold to the Belgian company Radionomy. As of the time of this writing, winamp.com is up, and offering a download of Winamp 5.8 (beta) from Radionomy.


1.2.4 Today

It is a credit to Winamp that it remained usable well into the twenty-teens as a way to manage large libraries of .mp3 files. Winamp is not quite dead, but it is stranded on an operating system that I have left behind (along, I suspect, with many other technically-inclined music aficionados today). The MP3 format itself is showing its age; Fraunhofer IIS announced in 2017 that it was ending its licensing programs for MP3. AAC is now the standard for digital music.

And yet, I have several thousand .mp3 files in my personal library. Since both MP3 and AAC are lossy formats, transcoding them to AAC would not lead to good results even if I were inclined to do the work. The original sources of many of the .mp3s have been lost, so re-encoding to AAC is not possible.

Perhaps scribbu (see The scribbu Program) will support AAC in the future, but it seems that MP3 & ID3 will be relevant to my musical life for some time. I wrote this tool to help me manage them, and I offer it to anyone else in the same position: if you need to manage ID3-tagged .mp3 files, and especially if you enjoy hacking in LISP and/or C++, I hope you find scribbu useful and enjoyable.


2 The scribbu Program

The simplest way to use scribbu is through the command-line tool. For the scribbu command itself, as well as all scribbu subcommands, the -h flag will produce a brief help message on stdout, and the --help flag will display the corresponding man page. You can get a list of all the sub-commands scribbu provides by saying scribbu -h. You can display a given sub-command’s node in this Info manual by saying scribbu CMD --info.

Display this manual by saying scribbu --info.


2.1 Example

Let us suppose we have a few ‘.mp3’ files which we have just downloaded, or have encoded some time ago & forgotten about. Regardless, we want to examine & update their tags before adding them to our library. The following sections demonstrate this using scribbu sub-commands.


2.1.1 scribbu dump

The simplest place to start is scribbu dump. This will show us what is in the tags:

$>: scribbu dump *.mp3
"lorca.mp3":
ID3v2.3(.0) Tag:
452951 bytes, synchronised
flags: 0x00
The Pogues - Lorca's Novena
Hell's Ditch [Expanded] (US Version) (track 5), 1990
Content-type Pop
TIT2: Lorca's Novena
TPE1: The Pogues
TALB: Hell's Ditch [Expanded] (US Version)
TCON: Pop
TCOM:
TPE3:
TRCK: 5
TYER: 1990
TPE2: The Pogues
COMM (<no description>):
Amazon.com Song ID: 203558254
TCOP: 2004 Warner Music UK Ltd.
TPOS: 1
frame APIC (115554 bytes)
frame PRIV (1122 bytes)
335921 bytes of padding
9425708 bytes of track data:
MD5: 48ff9cadea7d842e9059db25159d2daa
ID3v1.1: The Pogues - Lorca's Novena
Hell's Ditch [Expanded] (US Ve (track 5), 1990
Amazon.com Song ID: 20355825
unknown genre 255

"opium.mp3":
ID3v2.3(.0) Tag:
2038 bytes, synchronised
flags: 0x00
Stephan Luke - Opium Chant Intro
Opium Gardens (track 1), 2003
Content-type General Club Dance
Encoded by Winamp 5.552
TENC: Winamp 5.552
TRCK: 1
COMM (<no description>):
Ripped by Winamp on Pimpernel
TPUB: Opium Music
TPOS: 1/1
TYER: 2003
TCON: General Club Dance
TALB: Opium Gardens
TPE2: Opium Garden
TPE1: Stephan Luke
UFID: http://www.cddb.com/id3/taginfo1.html
      334344334e33395235383037313937335532343836364232394336314239364139333341424332363945364531454642444233445032
TIT2: Opium Chant Intro
1522 bytes of padding
1191874 bytes of track data:
MD5: 690194f49592c7d8ccfbfe8a157d4c1e
ID3v1.1: Stephan Luke - Opium Chant Intro
Opium Gardens (track 1), 2003
Ripped by Winamp on Pimperne
unknown genre 255

"orlando.mp3":
ID3v2.3(.0) Tag:
607 bytes, synchronised
flags: 0x40
Bill LeFaive - Orlando
http://music.download.com (track 1), <no year>
TALB: http://music.download.com
TIT2: Orlando
TIT3: http://music.download.com/
TPE1: Bill LeFaive
TCOM: Bill LeFaive
frame WOAF (1 bytes)
frame WPUB (27 bytes)
frame WXXX (54 bytes)
TRCK: 1
TCOP: 2006 Bill LeFaive
TPUB: http://music.download.com
256 bytes of padding
6549838 bytes of track data:
MD5: dd0c70e13d4aeec676f8d7a7bda622b0
ID3v1.1: Bill LeFaive - Orlando
http://music.download.com (track 1), 2004
http://music.download.com/
unknown genre 255

We see we have three files, all of which have both ID3v1 & ID3v2 tags. The output also contains some basic information on the MP3 data in between the tags.

scribbu dump takes as arguments one or more files and/or directories, and prints information about all files listed, or in the directories named, recursively. With no options, scribbu dump will dump everything it understands. With options, the output can be scoped in various ways (e.g. ID3v2 tags only, ID3v1 tags only, track data only, among other options). The output format can also be controlled; refer to the man page for a complete list (or see Invoking scribbu dump).


2.1.2 scribbu report

Another way to investigate files, especially large numbers of files, is scribbu report:

$>: scribbu report -o myfiles.csv *.mp3
$>: cat myfiles.csv
directory,file,file size(MB),ID3v2 version,ID3v2 revision,ID3v2 size(bytes),ID3v2 flags,ID3v2 unsync,ID3v2 Artist,ID3v2 Title,ID3v2 Album,ID3v2 Content Type,ID3v2 Encoded By,ID3v2 Year,ID3v2 Langauges,# ID3v2 play count frames,Play Count,# ID3v2 comment frames,comment #0 text,comment #1 text,comment #2 text,comment #3 text,comment #4 text,comment #5 text,size (bytes),MD5,has ID3v1.1,has ID3v1 extended,ID3v1 Artist,ID3v1 Title,ID3v1 Album,ID3v1 Year,ID3v1 Comment,ID3v1 Genrre
"/tmp/tut","lorca.mp3",9.421,3,0,452951,0x00,0,The Pogues,Lorca's Novena,Hell's Ditch [Expanded] (US Version),Pop,,1990,,0,,1,Amazon.com Song ID: 203558254,,,,,,9425708,48ff9cadea7d842e9059db25159d2daa,1,0,The Pogues,Lorca's Novena,Hell's Ditch [Expanded] (US Ve,1990,Amazon.com Song ID: 20355825,255
"/tmp/tut","opium.mp3",1.139,3,0,2038,0x00,0,Stephan Luke,Opium Chant Intro,Opium Gardens,General Club Dance,Winamp 5.552,2003,,0,,1,Ripped by Winamp on Pimpernel,,,,,,1191874,690194f49592c7d8ccfbfe8a157d4c1e,1,0,Stephan Luke,Opium Chant Intro,Opium Gardens,2003,Ripped by Winamp on Pimperne,255
"/tmp/tut","orlando.mp3",6.247,3,0,607,0x40,0,Bill LeFaive,Orlando,http://music.download.com,,,,,0,,0,,,,,,,6549838,dd0c70e13d4aeec676f8d7a7bda622b0,1,0,Bill LeFaive,Orlando,http://music.download.com,2004,http://music.download.com/,255

Running scribbu report (see Invoking scribbu report) with an option of -o <name>.csv will produce an RFC 4180-compliant comma-separated values (CSV) file reporting on the files given on the command line. Option -t will instead produce tab-delimited data. scribbu itself provides little beyond this in terms of reporting, the idea being that CSV or TDF output can be readily imported into other programs better suited to that task.

That said, one can do some basic querying at the command line, for which tab-delimited format can be convenient. For example, this little awk program will show the ID3v2 version for each file:

$>: scribbu report -t -o myfiles.tdf *.mp3
$>: cat myfiles.tdf | awk 'BEGIN {FS="\t"}; {print $2, $4}'
file ID3v2 version
"lorca.mp3" 3
"opium.mp3" 3
"orlando.mp3" 3

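The same query can be written in a general-purpose language once the report has been saved. The Python sketch below reads tab-delimited text with the standard csv module and prints the file name and ID3v2 version of each row. The inline SAMPLE is a hand-made, four-column excerpt standing in for a real report; it assumes the tab-delimited output quotes strings the same way the CSV output above does.

```python
import csv
import io

# Hypothetical excerpt of a tab-delimited report (only four of the columns).
SAMPLE = (
    'directory\tfile\tfile size(MB)\tID3v2 version\n'
    '"/tmp/tut"\t"lorca.mp3"\t9.421\t3\n'
    '"/tmp/tut"\t"opium.mp3"\t1.139\t3\n'
    '"/tmp/tut"\t"orlando.mp3"\t6.247\t3\n'
)

def versions(stream):
    """Return (file, ID3v2 version) pairs from a tab-delimited report."""
    reader = csv.DictReader(stream, delimiter="\t")
    return [(row["file"], row["ID3v2 version"]) for row in reader]

for name, version in versions(io.StringIO(SAMPLE)):
    print(name, version)
```

With a real report, replace io.StringIO(SAMPLE) with open("myfiles.tdf", newline="").
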
2.1.3 scribbu popm

We notice that none of these files contain a PCNT or a POPM frame; let’s add them now:

$>: scribbu popm -a -o foo@bar.com -r oooo *.mp3
$>: scribbu dump *.mp3
...
PCNT: 0
POPM: foo@bar.com
rating: 179
counter: 00
...

The popm command can be used to manage both PCNT & POPM frames (see Invoking scribbu popm). The -a flag indicates that we want to create the relevant frames, if they don’t already exist. -r sets the rating for the POPM frame; you can provide an integer between 0 and 255, or use the “star” system. In this case, I’ve given all the files four stars. “****” would be more mnemonic, but inconvenient in the shell, so scribbu will recognize almost any character, repeated one to five times, as a “star”.

Having created the PCNT and/or POPM frames, one can update the play counts with a simple command; e.g.

scribbu popm opium.mp3

will increment the play count by one in any PCNT or POPM frames it finds. The operation can be scoped or modified in a number of ways, such as limiting it to only one or the other, or only to POPM frames with a certain owner; see the man page or Invoking scribbu popm for full details. The intent of the command is to enable players that don’t update the play count or set ratings themselves, but which can be scripted or extended in some way, to do so.


2.1.4 scribbu rename

Finally, let us re-name the files based on their ID3 tags:

scribbu rename *.mp3

will by default rename each file to “<artist> - <title>.mp3” with <artist> & <title> each derived from the corresponding ID3v2 frame (see Invoking scribbu rename). This can be customized by providing a “template”: text interspersed with replacement parameters to be filled in with tag contents. The parameters begin with a ‘%’, and each parameter has a one-character “short” form and a more descriptive “long” form. For instance, “artist” can be represented as either %A or %(artist). So the default template could be expressed as “%A - %T.mp3” or “%(artist) - %(title).mp3”.

If the long form is used, the action of the replacement parameter may optionally be modified by options given after the parameter name and a colon in “query-style”: opt0&opt1&opt2... where opti is in the form name=value or just name. For instance, if we wanted the artist to always be taken from the ID3v1 tag, and that field happens to use the ISO-8859-1 character encoding, we could say:

%(artist:v1-only&v1-encoding=iso-8859-1)
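
The “query-style” syntax is easy to take apart mechanically. The following Python sketch splits one long-form parameter into its name and a dictionary of options; it only illustrates the syntax described above and is not how scribbu itself parses templates. The function name parse_parameter is invented for the example.

```python
import re

# %(name) or %(name:opt0&opt1&...), where each option is "name" or "name=value".
LONG_FORM = re.compile(r"%\((?P<name>[^:)]+)(?::(?P<opts>[^)]*))?\)")

def parse_parameter(text):
    """Split a long-form replacement parameter into (name, options)."""
    match = LONG_FORM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a long-form parameter: {text!r}")
    options = {}
    for item in filter(None, (match.group("opts") or "").split("&")):
        key, sep, value = item.partition("=")
        # A bare option such as "v1-only" is recorded as True.
        options[key] = value if sep else True
    return match.group("name"), options

print(parse_parameter("%(artist:v1-only&v1-encoding=iso-8859-1)"))
print(parse_parameter("%(title)"))
```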

Let us accept the default settings, but see what would happen without actually re-naming anything:

scribbu rename -n *.mp3
"lorca.mp3" => "Pogues, The - Lorca's Novena.mp3"
"opium.mp3" => "Stephan Luke - Opium Chant Intro.mp3"
"orlando.mp3" => "Bill LeFaive - Orlando.mp3"

Before we rename the files, there is a lot more hygiene that could be carried out. “lorca.mp3” has a number of empty text frames that should be removed, “opium.mp3” has a comment frame with no owner, and the ID3v1 genre in all three is set to “255”.

As I developed scribbu, and began using it to manage my personal music collection, it became clear that providing a sub-command for every conceivable operation was not feasible. Furthermore, many of the things I wanted it to do were one-off tasks pertinent to a single file, or a handful of files, that weren’t worth formally coding up as sub-commands. What I really wanted was a way to “script” libscribbu (see Using libscribbu). I found my solution in Guile (see The Guile Reference Manual.), which is the topic of the next chapter.


2.2 Invoking scribbu dump

scribbu dump will walk each file and/or directory specified (recursing directories), read each file found, look for ID3 tags therein, and pretty-print what it finds to stdout.


2.2.1 scribbu dump Options

scribbu dump accepts the following options:

  • -1|--id3v1-tags display only ID3v1 tags
  • -D|--track-data display track data only
  • -2|--id3v2-tags display only ID3v2 tags
  • -i arg|--indent=arg indent all output arg spaces
  • -g|--no-expand-genre don’t attempt to expand the genre when specified as a numeric constant
  • -e regex|--expression=regex define a regular expression for filtering the files to be pretty-printed. For each file, its entire path will be matched against this regular expression before being pretty-printed (non-matches will be ignored).
  • -f FMT|--format=FMT specify the desired output format; at present, only two values are supported for this option: standard, which is the default, pretty-prints the selected portions of each file to stdout with one attribute (artist, title &c) per line. A format of csv will print one line in CSV format per file.

    The precise columns output in csv format will depend on the other options, but for ID3v2 tags include:

    version, revision, size, flags, unsynchronised, artist, title, album, genre, encoded by, year, languages, play count, comments

    for track data: size & MD5 checksum

    for ID3v1 tags: v1.1, extended, artist, title, album, year, comment & genre.

  • -c ENC|--v1-encoding=ENC specify an encoding for ID3v1 tags (CP1252 by default). See Character Encodings for a complete list of the encodings supported and their identifiers.

2.3 Invoking scribbu encodings

Several scribbu commands accept character encodings as options. It is not always clear what encodings are supported or how to specify them textually. This sub-command will print the list of supported character encodings along with their names for the user’s convenience.

That list is reproduced here:

  • ASCII
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-3
  • ISO-8859-4
  • ISO-8859-5
  • ISO-8859-7
  • ISO-8859-9
  • ISO-8859-10
  • ISO-8859-13
  • ISO-8859-14
  • ISO-8859-15
  • ISO-8859-16
  • KOI8-R
  • KOI8-U
  • KOI8-RU
  • CP1250
  • CP1251
  • CP1252
  • CP1253
  • CP1254
  • CP1257
  • CP850
  • CP866
  • CP1131
  • MacRoman
  • MacCentralEurope
  • MacIceland
  • MacCroatian
  • MacRomania
  • MacCyrillic
  • MacUkraine
  • MacGreek
  • MacTurkish
  • Macintosh
  • ISO-8859-6
  • ISO-8859-8
  • CP1255
  • CP1256
  • CP862
  • MacHebrew
  • MacArabic
  • EUC-JP
  • SHIFT-JIS
  • CP932
  • ISO-2022-JP
  • ISO-2022-JP-2
  • ISO-2022-JP-1
  • ISO-2022-JP-MS
  • EUC-CN
  • HZ
  • GBK
  • CP936
  • GB18030
  • EUC-TW
  • BIG5
  • CP950
  • BIG5-HKSCS
  • BIG5-HKSCS:2004
  • BIG5-HKSCS:2001
  • BIG5-HKSCS:1999
  • ISO-2022-CN
  • ISO-2022-CN-EXT
  • EUC-KR
  • CP949
  • ISO-2022-KR
  • JOHAB
  • ARMSCII-8
  • Georgian-Academy
  • Georgian-PS
  • KOI8-T
  • PT154
  • RK1048
  • TIS-620
  • CP874
  • MacThai
  • MuleLao-1
  • CP1133
  • VISCII
  • TCVN
  • CP1258
  • HP-ROMAN8
  • NEXTSTEP
  • UTF-8
  • UCS-2
  • UCS-2BE
  • UCS-2LE
  • UCS-4
  • UCS-4BE
  • UCS-4LE
  • UTF-16
  • UTF-16BE
  • UTF-16LE
  • UTF-32
  • UTF-32BE
  • UTF-32LE
  • UTF-7
  • C99
  • JAVA

2.4 Invoking scribbu genre

scribbu genre will set the genre (i.e. the TCON frame for ID3v2 tags and the genre byte for ID3v1) for all tags in all files named on the command line. If an argument is a file, operate on the tags in that file. If the argument is a directory, operate recursively on all files containing ID3 tags therein.

The genre can be specified in a few ways:

scribbu genre -w N

will interpret N as one of the genres defined by Winamp, specified as an integer between 0 & 191 (inclusive). Run scribbu genre -W to print a list of the Winamp genres.

scribbu genre -g GENRE

will attempt to map the string GENRE to one of the Winamp genres using Damerau-Levenshtein distance, but disregarding case. For instance scribbu genre -g rok will be interpreted as Winamp genre number seventeen "Rock".
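
As an illustration of this kind of fuzzy matching, the Python sketch below computes the "optimal string alignment" distance, a common restricted variant of Damerau-Levenshtein distance, and picks the closest entry from a small sample of the genre list. Which variant scribbu uses, and how it breaks ties, is not stated here, so treat both as assumptions; SAMPLE_GENRES is only a handful of the 192 genres.

```python
def osa_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions and
    transpositions of adjacent characters (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

# A few entries from the standard genre list, keyed by genre number.
SAMPLE_GENRES = {0: "Blues", 8: "Jazz", 9: "Metal", 13: "Pop", 15: "Rap", 17: "Rock"}

def closest_genre(text, genres=SAMPLE_GENRES):
    """Return (number, name) of the genre nearest to text, ignoring case."""
    wanted = text.lower()
    number = min(genres, key=lambda n: osa_distance(wanted, genres[n].lower()))
    return number, genres[number]

print(closest_genre("rok"))
```

Here "rok" is one insertion away from "rock" and at least two edits away from every other sample entry, so genre seventeen is chosen.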

scribbu genre -G GENRE

will accept GENRE uncritically as the TCON to be used for ID3v2 tags. ID3v1 tags, if present, will have their genre field mapped to one of the Winamp values again by case-insensitive Damerau-Levenshtein distance (or just set to 255 if that fails). To explicitly set the ID3v1 genre when specifying genre in this way, add the --v1 flag (e.g. scribbu genre -G foo --v1 17).

This brings up the question of what to do if there is no ID3v1 and/or no ID3v2 tag(s) in a given file. By default, in the absence of a tag, nothing will be done (so if invoked, for instance, on a file with neither an ID3v1 nor an ID3v2 tag, this sub-command would do nothing). This behavior can be customized in two ways. If --create-v2 or --create-v1 is given, an ID3v2 (ID3v1, resp.) tag will be created for any file which already possesses an ID3v1 (ID3v2, resp.) tag. If --always-create-v2 or --always-create-v1 is given, an ID3v2 (ID3v1, resp.) tag will always be created if it doesn’t exist.
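
These rules amount to a small decision function, sketched here in Python for one kind of tag at a time. The function and parameter names are invented for the illustration; the rule that the two flags for the same tag version cannot be combined is taken from the option descriptions for this sub-command.

```python
def should_create(tag_missing, other_tag_present, create=False, always_create=False):
    """Decide whether to create a missing tag of one kind (v1 or v2).

    create        models --create-v1 / --create-v2
    always_create models --always-create-v1 / --always-create-v2
    """
    if create and always_create:
        raise ValueError("--create-* and --always-create-* are mutually exclusive")
    if not tag_missing:
        return False                 # nothing to create
    if always_create:
        return True                  # create regardless of the other tag
    return create and other_tag_present

# A file with an ID3v1 tag only: is an ID3v2 tag created?
for flags in ({}, {"create": True}, {"always_create": True}):
    print(flags, should_create(True, True, **flags))
```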


2.4.1 scribbu genre Behavior on Write

If the command didn’t change the tagset for a given file, that file will not be modified. Typically however, the tagset will have been changed, and we need to write out the new tagset. Since ID3v1 tags are fixed-size blocks appended to the file, writing them out is trivial.

The default behavior for ID3v2 tags is to first try to emplace them; that is, write them over the current set of ID3v2 tags without touching track data, adjusting the padding if needed. If that is impossible (i.e. the new tagset can’t be fit into the space occupied by the old, even when adjusting padding) a full copy will be made by writing the new tagset to a temporary file, copying the track data from the extant file to the temporary file, and finally appending the ID3v1 tag, if any, to the temporary file. Only then is the temporary file renamed to the original (which is hopefully atomic).
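
The general emplace-or-copy pattern can be sketched in a few lines of Python. This is not scribbu's implementation: the tag is treated as an opaque byte string, padding is modeled as trailing zero bytes, and the ".bak" backup name is hypothetical. A real writer would also have to update the size recorded in the tag header.

```python
import os
import shutil
import tempfile

def write_v2_tag(path, old_size, new_tag, create_backup=False):
    """Replace the first old_size bytes of path (old tag plus padding) with new_tag."""
    if create_backup:
        shutil.copy2(path, path + ".bak")   # hypothetical backup name
    if len(new_tag) <= old_size:
        # Emplace: overwrite in place and zero-fill the rest as padding.
        with open(path, "r+b") as f:
            f.write(new_tag + b"\x00" * (old_size - len(new_tag)))
        return "emplaced"
    # Full copy: new tag, then everything after the old tag (track data and
    # any trailing ID3v1 block), written to a temporary file alongside.
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src, \
         tempfile.NamedTemporaryFile(dir=directory, delete=False) as tmp:
        src.seek(old_size)
        tmp.write(new_tag)
        shutil.copyfileobj(src, tmp)
    os.replace(tmp.name, path)   # rename over the original
    return "copied"
```

Creating the temporary file in the same directory keeps the final rename on one filesystem, which is what makes an atomic rename possible on POSIX systems.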

This behavior can be modified by the --create-backups option (see scribbu genre Options), which will create a backup of the original file before renaming the temporary file.


2.4.2 scribbu genre Options

  • -n|--dry-run dry-run mode; only print what would happen
  • -u|--adjust-unsync adjust each ID3v2 tag’s use of the unsynchronisation scheme on write (by default, it’s never used)
  • -a|--always-create-v2 always create an ID3v2 tag with a TCON frame for any file that does not possess one (whether or not there’s an ID3v1 tag present); may be combined with --always-create-v1 or --create-v1, but not with --create-v2.
  • -A|--always-create-v1 always create an ID3v1 tag with the genre field set appropriately for any file that does not possess one (whether or not there’s an ID3v2 tag present); may be combined with --always-create-v2 or --create-v2, but not with --create-v1.
  • -b|--create-backups by default, the new tagset will be written in-place (emplacing ID3v2 tags, if feasible); this option will cause a backup file to be made before changing the original (see scribbu genre Behavior on Write)
  • -c|--create-v2 create an ID3v2 tag with a TCON frame for any file that has an ID3v1 tag but does not have an ID3v2 tag (fields from the ID3v1 tag will be copied over); may be combined with --always-create-v1 or --create-v1, but not with --always-create-v2
  • -C|--create-v1 create an ID3v1 tag with the genre field set appropriately for any file that has an ID3v2 tag, but no ID3v1 tag; may be combined with --always-create-v2 or --create-v2, but not with --always-create-v1
  • -g GENRE|--genre=GENRE specify GENRE as the textual name of one of the 192 Winamp-defined genres ("Rock", e.g.); if GENRE doesn’t exactly match case-insensitively one of the Winamp genres, it will be matched to the official list by minimal Damerau-Levenshtein distance (again without regard to case). The genre field for ID3v1 tags will be the corresponding numeric value

    See also --list-winamp-genres.

  • -G GENRE|--Genre=GENRE specify GENRE as the verbatim text to be used for ID3v2 TCON frames (i.e. no matching to the Winamp-defined list will be done). The value for the genre field in ID3v1 tags will however be determined by the closest match to the Winamp-defined list. See also --v1 below for how to turn off that behavior.
  • -W,--list-winamp-genres list the Winamp-defined genres, piped through your pager if scribbu can determine that (the environment variables SCRIBBU_PAGER & then PAGER are checked first, then any program named less on PATH will be used). See also --no-pager.
  • -P,--no-pager do not use any pager when printing the Winamp genre list; just print to stdout
  • -t INDEX,--tag=INDEX specify a zero-based index describing which ID3v2 tag to alter, in the case of multiple ID3v2 tags in a given file; may be given more than once to select multiple tags; if not given, all tags present will be modified
  • -v N,--v1=N numeric genre to use for ID3v1 tags when -G is given
  • -1,--v1-only only update ID3v1 tags; ignore any ID3v2 tags found
  • -2,--v2-only only update ID3v2 tags; ignore any ID3v1 tags found.
  • -w N,--winamp=N specify the genre numerically in terms of the 192 Winamp-defined genres.

2.5 Invoking scribbu m3u

scribbu m3u will walk each file and/or directory specified (recursing directories) and print extended M3U entries for each. By default, it will print EXTM3U entries to stdout for all files named (directly or indirectly) on the command line. If an argument is a directory, this command will operate recursively on all files therein (the order of traversal is unspecified).

Each extended M3U entry takes the form:

    #EXTINF:<duration-in-seconds>,<display title>
    <path-to-file>

The display title will be "Artist - Title" if those two items can be derived from ID3 tags; otherwise the file basename will be used. The text forming the artist & title tags will be assumed to be in the system locale’s encoding. To override this, specify the -s flag (for source encoding).
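
The entry format and the display-title rule can be sketched as follows in Python. The function names and the sample duration are invented for the illustration, and the conventional #EXTINF spelling of the directive is used; whether scribbu keeps the file extension when it falls back to the basename is an assumption.

```python
import os

def extinf_entry(path, duration_seconds, artist=None, title=None):
    """Format one extended M3U entry for path."""
    if artist and title:
        display = f"{artist} - {title}"
    else:
        display = os.path.basename(path)   # fall back to the file name
    return f"#EXTINF:{duration_seconds},{display}\n{path}"

def playlist(entries):
    """Join entries under the #EXTM3U header line."""
    return "\n".join(["#EXTM3U", *entries]) + "\n"

print(playlist([
    extinf_entry("music/opium.mp3", 74, "Stephan Luke", "Opium Chant Intro"),
    extinf_entry("music/orlando.mp3", 410),   # no usable tags
]))
```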

The entry’s path will be relative or absolute, according to the argument (i.e. specifying an absolute path to a file or directory will produce absolute paths, a relative path to a file or directory relative paths in the output).

When writing entries to stdout, all text will be written in the system locale’s text encoding.

If the -o option is given, the output will be written to the file named as the option’s value. By convention, files ending in .m3u8 are UTF-8 encoded and files ending in .m3u are written in an unspecified encoding. Given that M3U is a de facto standard, scribbu does not enforce this (or any other naming convention).

In this case, a new file will be created with the #EXTM3U header line, unless the -a (append) flag is given, in which case the output will be appended to the (presumably existing) file.

By default, output will be in the system locale’s text encoding. To force UTF-8 output, specify the -8 option.

So, for instance, if your system locale’s encoding is ISO-8859-1, and your tags are written in, say, Windows Code Page 1251, but you would like an M3U playlist in UTF-8 format, say:

scribbu m3u -s CP1251 -o test.m3u8 -8 some-directory/
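
What this conversion does to the text can be shown with Python's built-in codecs. The sketch below is only an analogy for the -s and -8 options: the same four bytes are read once as CP1251 (correct) and once as ISO-8859-1 (what a mismatched source encoding would produce), and the correct reading is then written out as UTF-8. The function name recode is made up, and Python's "strict" and "ignore" error handlers only loosely correspond to scribbu's failure modes.

```python
def recode(raw, source_encoding, target_encoding="utf-8", errors="strict"):
    """Decode raw bytes from source_encoding and re-encode them."""
    return raw.decode(source_encoding).encode(target_encoding, errors=errors)

raw = b"\xca\xe8\xed\xee"            # a Cyrillic word stored as CP1251 bytes

print(raw.decode("cp1251"))          # the intended text
print(raw.decode("iso-8859-1"))      # mojibake from assuming the wrong source
print(recode(raw, "cp1251"))         # the same text as UTF-8 bytes

# Cyrillic has no representation in ISO-8859-1: "strict" would raise
# UnicodeEncodeError here, while "ignore" silently drops the characters.
print(recode(raw, "cp1251", "iso-8859-1", errors="ignore"))
```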

2.5.1 scribbu m3u Options

  • -v|--verbose Produce more verbose output; this really only makes sense when printing entries to file (otherwise the informational messages will be intermingled with the entries printed to stdout).
  • -s ARG|--source-encoding=ARG Specify the text encoding in which the textual tags artist & title are written in the files to be processed; e.g. "ASCII" or "UTF-8". Say scribbu encodings for a list of names for all supported encodings.
  • -o ARG|--output=ARG This option indicates that the EXTM3U entries shall be written to the given file rather than stdout.
  • -a|--append When writing to file, this option indicates that the named file should be appended to rather than overwritten.
  • -8|--use-utf-8 When writing to file, this option indicates that the output shall be encoded as UTF-8 (rather than the system locale’s text encoding, which is the default behavior).
  • -e|--on-encoding-failure Specify handling for encoding failures; may be one of fail (the default), transliterate or ignore

2.6 Invoking scribbu report

scribbu report will walk each file and/or directory specified (recursing directories), read each file found, look for ID3 tags therein, and generate a report on their contents. The idea is to use scribbu to do the work of scanning the tags in combination with some other tool better suited to querying & reporting. Consequently, filtering mechanisms are minimal, and the output formats (CSV or TDF) are chosen to facilitate transformation to other formats as well as import by other tools.


2.6.1 scribbu report Options

  • -c ARG|--num-comments=ARG number of comments to be reported; the total number of comment frames is always reported, but this governs the number of comment frames (owner, text &c) to be included in the report (default six)
  • -o ARG|--output=ARG the file to which the report shall be written
  • -1 ENC|--v1-encoding=ENC specify the encoding to be used to read ID3v1 tags (defaults to CP1252); see Character Encodings.
  • -t|--tsv-format select tab-delimited format instead of comma-separated values (the default).
  • -a|--ascii-delimited if using tab-delimited format, output ASCII-delimited text by using 0x1f (the ASCII unit separator) to delimit fields rather than TABs.

2.7 Invoking scribbu rename

scribbu rename will walk each file and/or directory specified (recursively, in the case of directories) and rename each ID3-tagged file found according to its tag(s). By default, each ID3-tagged file will be renamed to “<artist> - <title>.<extension>” (where <artist> & <title> are derived from the file’s ID3 tags), but this can be heavily customized by specifying a naming “template” made up of a mixture of text and replacement parameters (such as artist, title, album &c).

Replacement parameters begin with a % character (percent characters that do not begin a replacement parameter may be escaped with a backslash). Each replacement parameter has a one-character “short” form as well as a “long-form” name. For example, the artist replacement can be represented as either %A or as %(artist).

When the long form is used, the action of replacement may optionally be modified by giving options after a colon. The options take the form opt0&opt1&opt2&... where each opt is of the form name=value, or just name. So to continue the above example, if we wanted the artist name to instead be derived from the ID3v1 tag, and that field was encoded as ISO-8859-1, we would say:

%(artist:v1-only&v1-encoding=iso-8859-1)

See Tag-Based Replacements for a complete list of replacement parameters & their options.
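The shape of a long-form replacement can be made concrete with a few lines of plain Guile. This is only a sketch of the grammar described above, not scribbu's own parser; the function name parse-replacement is hypothetical, and its argument is assumed to be the text between "%(" and ")".

```scheme
;; Split "name:opt0&opt1&..." into (name (opt . value) ...).
;; Options given without a value are paired with #t.
(define (parse-replacement spec)
  (let* ((colon (string-index spec #\:))
         (name  (if colon (substring spec 0 colon) spec))
         (opts  (if colon (substring spec (+ colon 1)) "")))
    (cons name
          (map (lambda (opt)
                 (let ((pos (string-index opt #\=)))
                   (if pos
                       (cons (substring opt 0 pos)
                             (substring opt (+ pos 1)))
                       (cons opt #t))))
               (if (string-null? opts)
                   '()
                   (string-split opts #\&))))))

;; The example from the text:
(parse-replacement "artist:v1-only&v1-encoding=iso-8859-1")
```

Representing the options as an association list keeps the bare-name and name=value forms uniform for whatever code consumes them.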


2.7.1 scribbu rename Options

  • -h,--help Display help & exit
  • -n,--dry-run Dry-run; only print what would happen
  • -o ARG,--output=ARG If specified, copy the output files to this directory, rather than renaming in-place.
  • -r,--rename Remove the source file (ignored if --dry-run is given)
  • -t TEMPLATE,--template=TEMPLATE The template by which to rename ID3-tagged files in the arguments (defaults to “%A - %T.mp3”)
  • -v,--verbose Produce verbose output.

2.7.2 Tag-Based Replacements

Tag-based replacement parameters:

Content        Short-Form   Long-Form
album          L            album
artist         A            artist
content type   G            content-type, genre
encoded by     e            encoded-by
title          T            title
year           Y            year

Tag-based replacement parameters take the following options:

  • Source of the replacement text:
    • prefer-v2
    • prefer-v1
    • v2-only
    • v1-only
  • character encoding when the ID3v1 tag is used: v1-encoding=...
    • auto
    • iso-8859-1
    • ascii
    • cp1252
    • utf-8
    • utf-16-be
    • utf-16-le
    • utf-32
  • Handling “The...”: the=...
    • suffix (i.e. “The Pogues” will be changed to “Pogues, The”)
    • prefix
  • capitalization: cap=...
    • all-upper
    • all-lower
  • handling whitespace: either compress can be given (to merge space between words to a single space) or ws=TEXT can be given to replace whitespace (e.g. if ws=_ were given, “a b” would become “a_b”).

Lastly, the year can be formatted as two digits or four by giving “yy” or “yyyy” in the options for %(year).

E.g. %(artist:prefer-v2&v1-encoding=cp1252&the=suffix&compress) applied to a file whose ID3v2 tag had an artist frame of "The Pogues" would produce "Pogues, The".
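To illustrate what the the=suffix and compress options do to a string, here is a sketch in plain Guile. It is not scribbu's implementation: the function names are hypothetical, and details such as case-insensitive matching of “The” and the trimming of leading & trailing whitespace are assumptions.

```scheme
;; Move a leading "The " to the end: "The Pogues" => "Pogues, The".
(define (the->suffix s)
  (if (string-prefix-ci? "the " s)
      (string-append (substring s 4) ", " (substring s 0 3))
      s))

;; Merge each run of whitespace into a single space. As a side
;; effect, leading & trailing whitespace is dropped altogether.
(define (compress-whitespace s)
  (string-join
   (filter (lambda (word) (not (string-null? word)))
           (string-split s char-set:whitespace))
   " "))

(the->suffix (compress-whitespace "The   Pogues"))
```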


2.7.3 File-based Replacements

There are a few more replacement parameters based on the file itself:

  • b,basename: The file basename
  • E,extension: The file extension (including the dot)

Both of these take the same “The”, capitalization & whitespace options as Tag-Based Replacements.

  • 5,md5: the MD5 checksum of the file’s audio data
  • S,size: the file size, in bytes

Both of these take the following options:

  • base=(decimal|hex): specify the radix for the numbers
  • hex-case=(U|L): case to use for hexadecimal numbers

2.8 Invoking scribbu popm

scribbu popm creates or updates play count and/or popularimeter frames. With no options, it increments by one the counter field in every play count and/or popularimeter frame in every tag. With the --create-frame flag, it will also create the relevant frames in any tag lacking them. Popularimeter frames will not be created in the absence of the --owner option. Creation of play count & popularimeter frames can be inhibited via the --popularimeter-only and --playcount-only flags, respectively.

The popularimeter rating field can be set using the --rating option. Ratings can be specified explicitly as an integer between 0 & 255, or as one-to-five stars. “Stars” would most naturally be expressed as *s (asterisks), but since this will often be inconvenient in the shell, scribbu will accept almost any character, repeated one-to-five times.
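The two ways of writing a rating can be told apart mechanically. The sketch below, in plain Guile, classifies a RATING argument as either an explicit value or a number of stars, using the character class [a-zA-Z@#%*+] given in the option list for --rating. The function name is hypothetical, and how scribbu maps a star count onto the 0 to 255 range is not covered here.

```scheme
;; Characters accepted as "stars": ASCII letters plus @ # % * +
(define star-chars
  (char-set-union
   (string->char-set "@#%*+")
   (char-set-intersection char-set:letter char-set:ascii)))

;; Return (value . N) for an integer between 0 & 255, (stars . N)
;; for one to five star characters, or #f for anything else.
(define (classify-rating text)
  (let ((n (string->number text)))
    (cond ((and n (integer? n) (exact? n) (<= 0 n 255))
           (cons 'value n))
          ((and (<= 1 (string-length text) 5)
                (string-every star-chars text))
           (cons 'stars (string-length text)))
          (else #f))))

(classify-rating "***")
```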


2.8.1 scribbu popm Options

  • -h,--help display help & exit with status zero
  • -n,--dry-run don’t modify the files named in the arguments; just print what would have been done
  • -f,--create-frame create playcount and/or popularimeter frames in any tags that are missing them. This can be modified by the --popularimeter-only and --playcount-only flags, respectively. Popularimeter frames will only be created if the --owner flag is given, as well.
  • -c|--create-v2 create an ID3v2 tag with a POPM and/or PCNT frames for any file that has an ID3v1 tag but does not have an ID3v2 tag; may not be combined with --always-create-v2
  • -a|--always-create-v2 always create an ID3v2 tag with a POPM and/or PCNT frame for any file that does not possess one (whether or not there’s an ID3v1 tag present); may not be combined with --create-v2.
  • -b,--create-backups by default, the new tagset will be written in-place (emplacing, if possible); this option will cause a backup file to be made first.
  • -C COUNT,--count=COUNT set all counter fields to COUNT instead of incrementing
  • -i INCR,--increment=INCR increment all counter fields by INCR, instead of by one.
  • -o OWNER,--owner=OWNER Specify the owner field for popularimeter frames. If incrementing count fields, only popularimeter frames with an owner of OWNER will be updated. When creating popularimeter frames, the owner field will be set to OWNER.
  • -p,--playcount-only if present, this switch will limit operations to playcount frames only
  • -m,--popularimeter-only if present, this switch will limit operations to popularimeter frames only
  • -r RATING,--rating=RATING specify the rating for use in popularimeter tags. RATING may be given either as an integer between 0 & 255 (inclusive) or as one-to-five “stars”, given as [a-zA-Z@#%*+]{1,5} e.g. three stars could be expressed as “xxx” or “###” or “***”.
  • -t INDEX,--tag=INDEX specify a zero-based index describing which tag to alter, in case of multiple ID3v2 tags in a single file. This option may be given more than once to indicate multiple tags. If not given, all tags will be modified.
  • -u,--adjust-unsync adjust each tag’s use of the unsynchronisation scheme on write (by default, it’s never used)

2.9 Invoking scribbu text

scribbu text will create, update & delete various ID3v2 text frames & ID3v1 tag fields.


2.9.1 scribbu text Options

  • -c|--create-v2 create an ID3v2 tag with the specified text frames for any file that has an ID3v1 tag but does not have an ID3v2 tag; may not be combined with --always-create-v2
  • -a|--always-create-v2 always create an ID3v2 tag with the specified text frames for any file that does not possess one (whether or not there’s an ID3v1 tag present); may not be combined with --create-v2.
  • -a ALBUM,--album=ALBUM set the TALB, or Album/Movie/Show Title frame
  • -A ARTIST,--artist=ARTIST Set the TPE1, or Lead artist(s)/Lead performer(s)/Soloist(s)/Performing group frame
  • -e ENC,--encoded-by=ENC Set the TENC, or Encoded By frame
  • -g GENRE,--genre=GENRE Set the TCON, or Content type frame
  • -T TITLE,--title=TITLE Set the TIT2, or Title/Songname/Content description frame
  • -k TRACK,--track=TRACK Set the TRCK, or Track number/Position in set frame
  • -y YEAR,--year=YEAR Set the TYER, or Year frame
  • -d FRAME,--delete=FRAME Specify a frame to remove, if present; this option may be given more than once to delete multiple frames. Frames may be named by either their option name (e.g. ‘artist’) or by their ID3v2.3 frame ID (e.g. TPE1).
  • -E ENC,--encoding=ENC Specify the character encoding used in the input strings using the iconv name (e.g. ‘ISO-8859-1’). If not given, the system locale’s encoding will be assumed (see Character Encodings for a complete list of the encodings supported and their identifiers).
  • -t INDEX,--tag=INDEX Zero-based index of the tag on which to operate; may be given more than once to select multiple tags
  • -u,--adjust-unsync Update the unsynchronisation flag as needed on write (default is to never use it).
  • -b,--create-backups Create backup copies of all files before modifying them.

2.10 Invoking scribbu xtag

The author has defined a new ID3v2 frame representing a tag cloud. Tags may have zero or more values associated with them, e.g. “hopeful” (zero values), or “decade=90s” (one value) or “sub-genres=smooth-jazz,bossa-nova” (two values) & so on.

The frame identifier is XTAG and a given ID3v2 tag may have multiple XTAG frames, each distinguished by a different owner: a null-terminated string with a URL containing an email address, or a link to a location where an email address can be found, that belongs to the organisation responsible for the frame.

scribbu xtag will create, update & delete an experimental tag cloud frame.


2.10.1 scribbu xtag Options

  • -u,--adjust-unsync Update the unsynchronisation flag as needed on write (default is to never use it).
  • -b,--create-backups Create backup copies of all files before modifying them.
  • -c,--create-v2 Create an ID3v2 tag with a XTAG frame for any file that has an ID3v1 tag but does not have an ID3v2 tag (fields from the ID3v1 tag will be copied over).
  • -C,--always-create-v2 Always create an ID3v2 tag with the XTAG frame set appropriately for any file that does not possess one.
  • -f,--create-frame Create a new XTAG frame if not present.
  • -m,--merge Merge the given tags, don’t overwrite
  • -n,--dry-run Don’t do anything; just print what would be done.
  • -g,--get Print the existing tag cloud, if any; don’t set or update
  • -o OWNER,--owner=OWNER Operate only on XTAG frames with this owner, or specify the owner in case an XTAG frame is being created.
  • -t INDEX,--tag=INDEX Zero-based index of the tag on which to operate; may be given more than once to select multiple tags.
  • -T TAG-CLOUD,--xtags=TAG-CLOUD Tags to be set or merged expressed in HTTP query parameter style using URL-encoding, e.g. “foo&bar=has%20%2c&splat=a,b,c”
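The TAG-CLOUD syntax can be unpacked with Guile's (web uri) module. The following is a sketch of one plausible reading of the format, not scribbu's own parser: it assumes that values are split on literal commas before URL-decoding, so that an encoded comma (%2c) remains part of a value. The function name parse-tag-cloud is hypothetical.

```scheme
(use-modules (web uri))

;; Turn "foo&bar=x,y" into (("foo") ("bar" "x" "y")): one list per
;; tag, with the tag name first followed by its values (if any).
(define (parse-tag-cloud text)
  (map (lambda (term)
         (let ((pos (string-index term #\=)))
           (if pos
               (cons (uri-decode (substring term 0 pos))
                     (map uri-decode
                          (string-split (substring term (+ pos 1)) #\,)))
               (list (uri-decode term)))))
       (string-split text #\&)))

;; The example from the option description:
(parse-tag-cloud "foo&bar=has%20%2c&splat=a,b,c")
```

Under these assumptions “foo” has no values, “bar” has the single value “has ,” and “splat” has three values.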

3 Scripting scribbu

The set of sub-commands scribbu offers, or could offer, is small in comparison to the number of operations one could possibly hope to carry out in managing ID3 tags. Sooner or later (likely sooner) you will want to do something you can’t accomplish via a sub-command.

For that reason, the bulk of the work on scribbu has been exposing the library’s functionality to a first-class language like LISP (see The Guile Reference Manual), to enable scribbu users to build their own solutions.


3.1 Worked Example

This chapter begins by demonstrating how to use the interactive Scheme REPL to explore solutions, then demonstrates building Scheme programs using scribbu, and finishes with some references.


3.1.1 The Scheme REPL

At the end of “scribbu rename” (see scribbu rename) there were a number of tag hygiene issues to be cleaned up. Let us begin experimenting with solutions. Invoking scribbu with no arguments at all will start the Scheme REPL:

$>: scribbu
scribbu 0.6.23
Copyright (C) 2017-2022 Michael Herstine <sp1ff@pobox.com>

You are in the Guile REPL; in your shell, type `info scribbu' for documentation.

GNU Guile 3.0.9
Copyright (C) 1995-2023 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)>

You are now at the Scheme prompt (“scheme” refers to the language currently in use and “guile-user” refers to the current module). You can type Scheme expressions & have them evaluated:

scheme@(guile-user)> (format #t "Hello, world!")
Hello, world!$1 = #t
scheme@(guile-user)> (define x 1)
scheme@(guile-user)> (set! x (+ x 1))
scheme@(guile-user)> x
$2 = 2
scheme@(guile-user)> (if (> x 1) (format #t "Yes!\n"))
Yes!
$3 = #t
scheme@(guile-user)>

scribbu exports assorted types & functions for working with ID3 tags to the Guile interpreter. Let’s take a look at that ownerless comment frame. We begin by reading in the ID3v2 tagset:

scheme@(guile-user)> (use-modules (oop goops) (scribbu))
scheme@(guile-user)> (define tags (read-tagset "opium.mp3"))
scheme@(guile-user)> tags
$4 = ((#<<id3v2-tag> 1ccf780> 3 #f))

read-tagset returns a list of three-tuples, one for each ID3v2 tag present in its argument. Since “opium.mp3” has only one ID3v2 tag, the list has only one element.

Function: read-tagset file

Read all ID3v2 tags from the beginning of file. Return a list of three-tuples, one for each tag. Each three-tuple consists of an <id3v2-tag> instance, the ID3v2 version (“3” in this case) and a boolean indicating whether the unsynchronisation flag is set.
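Because each element of the returned list is itself a three-element list, Guile's (ice-9 match) module offers a readable way to take it apart. The sketch below is illustrative only: summarise-tagset is a hypothetical helper, and the symbol fake-tag stands in for a real <id3v2-tag> instance so that the code runs in plain guile.

```scheme
(use-modules (ice-9 match))

;; Produce one descriptive string per tag in TAGSET, a list of
;; (tag version unsync?) three-tuples as returned by read-tagset.
(define (summarise-tagset tagset)
  (map (match-lambda
         ((tag version unsync?)
          (format #f "ID3v2.~a tag, unsynchronised: ~a"
                  version (if unsync? "yes" "no"))))
       tagset))

;; With scribbu loaded: (summarise-tagset (read-tagset "opium.mp3"))
(summarise-tagset '((fake-tag 3 #f)))
```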

Let’s examine the tag:

scheme@(guile-user)> (define tag (caar tags))
scheme@(guile-user)> tag
$5 = #<<id3v2-tag> 1ccf780>
scheme@(guile-user)> (let ((frames (slot-ref tag 'frames)) (i 0)) (while (> (length frames) 0) (format #t "~d: ~a\n" i (slot-ref (car frames) 'id)) (set! i (+ i 1)) (set! frames (cdr frames))))
0: encoded-by-frame
1: track-frame
2: comment-frame
3: publisher-frame
4: part-of-a-set-frame
5: year-frame
6: genre-frame
7: album-frame
8: band-frame
9: artist-frame
10: unknown-frame
11: title-frame
12: play-count-frame
13: pop-frame
$6 = #f

We see that tag is an instance of the GOOPS class <id3v2-tag>, and that it has 14 frames. Frame two (counting from zero) is that comment frame:

scheme@(guile-user)> (slot-ref (list-ref (slot-ref tag 'frames) 2) 'dsc)
$7 = ""

As expected, the description field is an empty string– let’s fix that:

scheme@(guile-user)> (slot-set! (list-ref (slot-ref tag 'frames) 2) 'dsc "sp1ff@pobox.com")
$8 = "sp1ff@pobox.com"
# check
scheme@(guile-user)> (slot-ref (list-ref (slot-ref tag 'frames) 2) 'dsc)
$9 = "sp1ff@pobox.com"
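Addressing the comment frame by position works at the REPL, but a script is better off locating frames by their id slot. Here is a sketch of that idea. So that it runs in plain guile, it declares two stand-in classes (<demo-frame> & <demo-tag>, both hypothetical) carrying only the slots used here; with scribbu loaded the same functions would be applied to a real <id3v2-tag>. The helper names are likewise hypothetical.

```scheme
(use-modules (oop goops))

;; Stand-ins for scribbu's frame & tag classes (assumptions made
;; for the sake of a self-contained example).
(define-class <demo-frame> ()
  (id  #:init-keyword #:id)
  (dsc #:init-value "" #:init-keyword #:dsc))
(define-class <demo-tag> ()
  (frames #:init-value '() #:init-keyword #:frames))

;; Return every frame in TAG whose id slot is the symbol ID.
(define (frames-with-id tag id)
  (filter (lambda (frame) (eq? (slot-ref frame 'id) id))
          (slot-ref tag 'frames)))

;; Give OWNER as the description of each comment frame lacking one.
(define (claim-orphaned-comments! tag owner)
  (for-each (lambda (frame)
              (when (string-null? (slot-ref frame 'dsc))
                (slot-set! frame 'dsc owner)))
            (frames-with-id tag 'comment-frame)))
```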

Now what about that ID3v1 genre?

scheme@(guile-user)> (define v1 (read-id3v1-tag "opium.mp3"))
scheme@(guile-user)> v1
$10 = #<<id3v1-tag> 18fb8c0>
scheme@(guile-user)> (slot-ref v1 'genre)
$11 = 255

Function: read-id3v1-tag file

Reads the ID3v1 tag, if any, from file.

Let’s set that to “Lounge”; the Winamp genre list assigns it the value 171:

scheme@(guile-user)> (slot-set! v1 'genre 171)
$12 = 171

What remains is writing out our modifications to their respective tags. We could do this directly in the REPL, but let’s capture our work in the form of a program.


3.1.2 Writing Scheme Programs with scribbu

scribbu understands both its own command-line parameters as well as those understood by the guile command. When it sees parameters applicable to guile, it will collect them and pass them on to the Scheme interpreter (when this makes sense, of course; supplying guile options while invoking a scribbu sub-command, for instance, would make no sense & results in an error). This means that scribbu can take advantage of the guile scripting options (see Guile Scripting in The Guile Reference Manual).

Continuing our example, let us capture our work so far:

#!/usr/local/bin/scribbu -e main -s
!#
(use-modules (oop goops) (scribbu))

;; `-e main' applies main to the list of command-line arguments
(define (main args)
    (let* ((tags (read-tagset "opium.mp3"))
           (v1   (read-id3v1-tag "opium.mp3"))
           (tag  (caar tags)))
        (slot-set! (list-ref (slot-ref tag 'frames) 2) 'dsc "sp1ff@pobox.com")
        (slot-set! v1 'genre 171)))

This Scheme program of course accomplishes nothing lasting; it corrects the orphaned comment frame as well as the ID3v1 genre, but only in-memory. Let us write these out to disk. Writing out the ID3v1 tag is simpler since it’s a fixed size, so we’ll start with that:

#!/usr/local/bin/scribbu -e main -s
!#
(use-modules (oop goops) (scribbu))

;; `-e main' applies main to the list of command-line arguments
(define (main args)
    (let* ((tags (read-tagset "opium.mp3"))
           (v1   (read-id3v1-tag "opium.mp3"))
           (tag  (caar tags)))
        (slot-set! (list-ref (slot-ref tag 'frames) 2) 'dsc "sp1ff@pobox.com")
        (slot-set! v1 'genre 171)
        ;; write the corrected ID3v1 tag back to the file it came from
        (write-id3v1-tag v1 "opium.mp3")))

Writing an ID3v1 tag is also easier because it is appended to the file.

Function: write-id3v1-tag tag file

Write the ID3v1 tag tag to file, overwriting any (ID3v1) tag that was there previously. Nb. tag may be written as an ID3v1, ID3v1.1 and/or an ID3v1 enhanced tag, depending on the precise contents of tag.

Writing ID3v2 tagsets is more complicated, since their size can vary. write-tagset can either make a wholesale copy of the file, or attempt to emplace the new tagset at the beginning of the extant file (which is the default):

#!/usr/local/bin/scribbu -e main -s
!#
(use-modules (oop goops) (scribbu))

;; `-e main' applies main to the list of command-line arguments
(define (main args)
    (let* ((tags (read-tagset "opium.mp3"))
           (v1   (read-id3v1-tag "opium.mp3"))
           (tag  (caar tags)))
        (slot-set! (list-ref (slot-ref tag 'frames) 2) 'dsc "sp1ff@pobox.com")
        (slot-set! v1 'genre 171)
        (write-id3v1-tag v1 "opium.mp3")
        ;; write the tag back as ID3v2.3, emplacing it if possible
        (write-tagset (list (list tag 3)) "opium.mp3")))

Function: write-tagset tagset file #:copy copy #:apply-unsync apply-unsync

Write a list of <id3v2-tag> instances to file. tagset is a list of two-element lists; the first in each is the <id3v2-tag> instance to be written, the second is an integer designating the ID3v2 version to use (i.e. 2, 3 or 4).

If keyword argument copy is #t, first write the new tagset to a new file, then append file’s contents after the existing tagset (if any) to that new file, and then rename the new file over the original (making a backup copy first). Otherwise (if copy is #f), attempt to emplace the new tagset, perhaps by adjusting padding, if possible (and fall back to copying if not).

Keyword argument apply-unsync controls whether the unsynchronisation scheme is applied to each tag. Set this to #f (the default) to never do so, #t to always do so, and 'as-needed to do so only if the tag needs unsynchronisation.


3.1.3 Getting More Information

References:

  1. See Introduction in The Guile Reference Manual.
  2. Scheme versus Common Lisp https://www.cs.utexas.edu/~novak/schemevscl.html
  3. Schemers.org https://schemers.org

3.2 ID3v1 tags

The original ID3v1 tag contained title, artist, album, year, comment & genre. The fields are fixed-size (30, 30, 30, 4, 30 & 1 byte, respectively). The original proposal called for filling out the fields with nil (zero) values, but that is not universally implemented (Winamp, for instance, pads fields out with ASCII spaces, i.e. 32 = 0x20).

Michael Mutschler observed that if the fields were zero-padded, an implementation will likely stop reading on encountering the first nil. Therefore, if the second-to-last byte of a field is nil, a one-byte value could be stored in the last byte of that field. He proposed storing the track number in the last byte of the comment field. This became known as ID3v1.1.
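The fixed layout lends itself to a short illustration. The following sketch, in plain Guile, decodes an ID3v1 or ID3v1.1 block held in a bytevector. It is not scribbu's reader (use read-id3v1-tag for real work) and it rests on assumptions not stated above: that the block is 128 bytes long and begins with the three-byte marker “TAG”, and that text is ISO-8859-1. The function names are hypothetical.

```scheme
(use-modules (rnrs bytevectors))

;; Decode up to LEN bytes of BV beginning at START as ISO-8859-1,
;; stopping at the first nil & trimming trailing whitespace.
(define (field bv start len)
  (let loop ((i 0) (chars '()))
    (if (or (= i len) (zero? (bytevector-u8-ref bv (+ start i))))
        (string-trim-right (list->string (reverse chars)))
        (loop (+ i 1)
              (cons (integer->char (bytevector-u8-ref bv (+ start i)))
                    chars)))))

;; Return an association list of fields, or #f if BV is not a tag.
(define (parse-id3v1 bv)
  (and (= (bytevector-length bv) 128)
       (string=? (field bv 0 3) "TAG")
       ;; ID3v1.1: second-to-last comment byte nil, last byte non-nil
       (let ((v1-1? (and (zero? (bytevector-u8-ref bv 125))
                         (not (zero? (bytevector-u8-ref bv 126))))))
         `((title    . ,(field bv 3 30))
           (artist   . ,(field bv 33 30))
           (album    . ,(field bv 63 30))
           (year     . ,(field bv 93 4))
           (comment  . ,(field bv 97 (if v1-1? 28 30)))
           (track-no . ,(if v1-1? (bytevector-u8-ref bv 126) '()))
           (genre    . ,(bytevector-u8-ref bv 127))))))
```

Following the convention of <id3v1-tag>, the track number is '() when the block is a plain ID3v1 tag.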

A thirty-byte limit quickly became constraining, leading to the ID3v1 “enhanced” specification. The origins of the proposal are unclear to me, but the proposal itself involves prepending a second two-hundred twenty-seven byte block to the ID3v1 block. This extends the title, artist & album fields by sixty bytes each, adds a thirty-byte free-form genre field, and introduces start-time, end-time, and “speed” fields.

scribbu represents the ID3v1 tag by the GOOPS class <id3v1-tag>:

(define-class <id3v1-tag> ()
  (title      #:init-value ""  #:accessor title       #:init-keyword #:title)
  (artist     #:init-value ""  #:accessor artist      #:init-keyword #:artist)
  (album      #:init-value ""  #:accessor album       #:init-keyword #:album)
  (year       #:init-value '() #:accessor year        #:init-keyword #:year)
  (comment    #:init-value ""  #:accessor comment     #:init-keyword #:comment)
  (genre      #:init-value 255 #:accessor genre       #:init-keyword #:genre)
  (track-no   #:init-value '() #:accessor track-no    #:init-keyword #:track-no)
  (enh-genre  #:init-value '() #:accessor enh-genre  #:init-keyword #:enh-genre)
  (speed      #:init-value '() #:accessor speed       #:init-keyword #:speed)
  (start-time #:init-value '() #:accessor start-time #:init-keyword #:start-time)
  (end-time   #:init-value '() #:accessor end-time   #:init-keyword #:end-time))

The class’ fields include the union of all ID3v1, ID3v1.1 and ID3v1 enhanced fields. All fields above & beyond those present in ID3v1, however, have a default value of '() (or nil, in Scheme). Whether a given <id3v1-tag> instance is ID3v1, ID3v1.1, and/or ID3v1 enhanced is implicitly determined by whether any of these fields are non-nil.

One can create an <id3v1-tag> instance directly, like any GOOPS class:

(use-modules (oop goops) (scribbu))
(define tag (make <id3v1-tag> #:title  "The Body of an American"
                             #:artist "Pogues, The"
                             #:album  "Poguetry in Motion"
                             #:year   "1986"
                             #:genre  88))

One can also create an instance from an existing tag on disk:

(use-modules (oop goops) (scribbu))
(define tag (read-id3v1-tag "foo.mp3"))
;; slot-ref takes the slot name as a symbol
(format #t "~s - ~s\n" (slot-ref tag 'artist) (slot-ref tag 'title))

<id3v1-tag> instances can be written to disk via write-id3v1-tag: (write-id3v1-tag tag "bar.mp3"). If any of the title, artist or album slots are longer than thirty characters, or any of the new fields (enhanced genre, speed, start-time or end-time) are non-nil, it will be written as an ID3v1 enhanced tag.


3.3 ID3v2 tags

The various flavors of ID3v1 tags had obvious limitations, leading to the introduction in 1998 of ID3v2 (by Martin Nilsson, Michael Mutschler et al.). Despite the name, this format has nothing to do with ID3v1. ID3v2 tags are much more complex. The tags are prepended to the files they describe. They consist of one or more frames, each of which contains one piece of information. There is provision for padding appended to the tag, to permit subsequent augmentation of the tag without having to re-write the entire file. ID3v1 tags suffered from the fact that they encoded text as ASCII (ISO-8859-1, at most); ID3v2 carries the encoding scheme along with textual information.

Furthermore, there are three versions of the ID3v2 spec that saw general use:

  • ID3v2.2 was the first public version; it used three-character frame identifiers instead of four; it is generally considered obsolete.
  • ID3v2.3 introduced four-character frame identifiers as well as adding a number of new frames along with a second, extended header. This is the version most frequently encountered in the wild.
  • ID3v2.4 was the last version published, but never saw widespread adoption.

3.3.1 The Unsynchronisation Scheme

MPEG decoding software uses a two-byte sentinel value in the input stream to detect the beginning of the audio. MPEG decoding software that is not ID3-aware could mistakenly interpret that value as the beginning of the audio should it happen to occur in an ID3v2 tag. Unsynchronisation is an optional encoding scheme for the ID3v2 tag to prevent that. "Unsynchronisation may only be made with MPEG 2 layer I, II and III and MPEG 2.5 files" (http://id3.org/id3v2-00).

More specifically, whenever a two byte combination of the form:

11111111 111xxxxx

(i.e. 0xFF 0xEx or 0xFF 0xFx) is encountered in an ID3v2 tag to be written to disk, it is replaced with:

11111111 00000000 111xxxxx

and the unsynchronisation flag will be set.

This leaves us with an ambiguous situation on read: if we encounter a bit pattern

11111111 00000000 111xxxxx

when reading a tag with the unsynchronisation flag set, we have no way to know whether that was a false sync that was unsynchronised (and so the three bytes should be interpreted as 11111111 111xxxxx) or whether those three bytes had occurred naturally in the tag when it was written. To resolve this, on applying unsynchronisation all two-byte sequences of the form $FF 00 should also be written as $FF 00 00.

ID3v2.4 introduced unsynchronisation at a frame level; the unsynchronisation flag in the header being set indicates that all frames are unsynchronised; unset in the header means that at least one frame is *not* unsynchronised.

Note that since the point of unsynchronisation is to avoid presenting a false sync point to the MPEG decoding software, unsynchronisation should be employed last, after any compression or encryption.
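The write-side and read-side rules described in this section can be sketched in plain Guile, operating on bytevectors. This is an illustration of the scheme as described here, not scribbu's implementation; the names unsynchronise & resynchronise are hypothetical, and a tag whose final byte is $FF receives no special treatment.

```scheme
(use-modules (rnrs bytevectors))

;; Insert a zero byte after every $FF that is followed either by a
;; byte with its top three bits set (a false sync) or by $00.
(define (unsynchronise bv)
  (let loop ((in (bytevector->u8-list bv)) (out '()))
    (cond ((null? in)
           (u8-list->bytevector (reverse out)))
          ((and (= (car in) #xFF)
                (pair? (cdr in))
                (or (>= (cadr in) #xE0) (zero? (cadr in))))
           ;; OUT is built in reverse: this emits $FF then $00
           (loop (cdr in) (cons* 0 #xFF out)))
          (else
           (loop (cdr in) (cons (car in) out))))))

;; Reverse the transformation: drop the zero byte after each $FF.
(define (resynchronise bv)
  (let loop ((in (bytevector->u8-list bv)) (out '()))
    (cond ((null? in)
           (u8-list->bytevector (reverse out)))
          ((and (= (car in) #xFF) (pair? (cdr in)) (zero? (cadr in)))
           (loop (cddr in) (cons #xFF out)))
          (else
           (loop (cdr in) (cons (car in) out))))))

;; A false sync ($FF $E3) & a natural $FF $00 both survive a round trip:
(resynchronise (unsynchronise #vu8(255 227 1 255 0 2)))
```

Writing $FF 00 as $FF 00 00 is what makes the read side unambiguous: after unsynchronisation every $FF that is followed by a zero byte is known to have had that byte inserted.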


3.3.2 ID3v2 Frames

All ID3v2 frames subclass GOOPS class <id3v2-frame>:

(define-class <id3v2-frame> ()
  (id     #:init-value 'unknown-frame #:accessor id     #:init-keyword #:id)
  (tap    #:init-value '()            #:accessor tap    #:init-keyword #:tap)
  (fap    #:init-value '()            #:accessor fap    #:init-keyword #:fap)
  (ro     #:init-value '()            #:accessor ro     #:init-keyword #:ro)
  (unsync #:init-value '()            #:accessor unsync #:init-keyword #:unsync))

id is a symbol naming the frame (see Frame Identifiers). The remaining four fields are frame flags that can be either true (#t), false (#f) or just left undefined ('()):

  1. tap Tag Alter Preserve “This flag tells the software what to do with this frame if it is unknown and the tag is altered in any way. This applies to all kinds of alterations, including adding more padding and reordering the frames.” Sec 3.3.1
  2. fap File Alter Preserve “This flag tells the software what to do with this frame if it is unknown and the file, excluding the tag, is altered. This does not apply when the audio is completely replaced with other audio data.” Sec 3.3.1
  3. ro Read Only “This flag, if set, tells the software that the contents of this frame is intended to be read only. Changing the contents might break something, e.g. a signature. If the contents are changed, without knowledge in why the frame was flagged read only and without taking the proper means to compensate, e.g. recalculating the signature, the bit should be cleared.” Sec 3.3.1
  4. unsync Unsynchronisation In ID3v2.2 & ID3v2.3, a value of #t for this flag indicates that the unsynchronisation scheme (see The Unsynchronisation Scheme) has been applied to this tag. In ID3v2.4, it indicates that it has been applied to all frames.

Module scribbu defines a few <id3v2-frame> sub-classes.


3.3.2.1 <text-frame>

A great many ID3v2 frames represent textual information (title, artist &c) and are represented in a uniform way, distinguished only by frame identifier. scribbu represents such frames as instances of <text-frame>:

(define-class <text-frame> (<id3v2-frame>)
  (text #:init-value "" #:accessor text #:init-keyword #:text))

3.3.2.2 <comment-frame>

<comment-frame> encodes the COM & COMM (comment) frames. #:lang is a three-letter ISO-639-2 language code. The #:dsc field is described in the specification as a “short content description”.

(define-class <comment-frame> (<id3v2-frame>)
  (lang  #:init-value "eng" #:accessor lang #:init-keyword #:lang)
  (dsc   #:init-value ""    #:accessor dsc  #:init-keyword #:dsc)
  (text  #:init-value ""    #:accessor text #:init-keyword #:text))

3.3.2.3 <user-defined-text-frame>

<user-defined-text-frame> encodes the TXX & TXXX (user-defined text) frames. The #:dsc field is a description of the textual information & #:text is the information itself. There may be multiple user-defined text frames in a tag, but only one with a given description. Cf. section 4.2.2 of the ID3v2 spec.

(define-class <user-defined-text-frame> (<id3v2-frame>)
  (dsc   #:init-value "" #:accessor dsc  #:init-keyword #:dsc)
  (text  #:init-value "" #:accessor text #:init-keyword #:text))

3.3.2.4 <play-count-frame>

<play-count-frame> encodes the CNT & PCNT (play count) frames. The #:count field is simply a counter recording the number of times the file has been played (see scribbu popm). There may be only one <play-count-frame> frame in a tag. Cf. section 4.17 of the ID3v2 spec.

(define-class <play-count-frame> (<id3v2-frame>)
  (count #:init-value 0 #:accessor count #:init-keyword #:count))

3.3.2.5 <popm-frame>

<popm-frame> encodes the POP & POPM (popularimeter) frames. <popm-frame> combines an eight-bit rating field with a <play-count-frame>-style play count. Unlike <play-count-frame>, there may be multiple <popm-frame> frames because each is tagged with the e-mail address of the author.

(define-class <pop-frame> (<id3v2-frame>)
  (e-mail #:init-value "" #:accessor e-mail #:init-keyword #:e-mail)
  (rating #:init-value 0  #:accessor rating #:init-keyword #:rating)
  (count  #:init-value 0  #:accessor count  #:init-keyword #:count))

3.3.2.6 <tag-cloud-frame>

<tag-cloud-frame> represents the author’s “tag cloud” (XTAG) frame. Like <popm-frame>, there may be multiple <tag-cloud-frame> frames because each is tagged with the e-mail address of the author. The tag cloud itself (field tags) is represented by a URL-encoded string.

(define-class <tag-cloud-frame> (<id3v2-frame>)
  (owner #:init-value ""  #:accessor owner #:init-keyword #:owner)
  (tags  #:init-value '() #:accessor tags  #:init-keyword #:tags))

3.3.2.7 <unk-frame>

Frames about which scribbu does not know may be encoded as <unk-frame> instances:

(define-class <unk-frame> (<id3v2-frame>)
  (id-text #:init-value ""     #:accessor frameid #:init-keyword #:frameid)
  (data    #:init-value #vu8() #:accessor data    #:init-keyword #:data))

The data field will contain everything beyond the ID3v2 header; i.e. the frame identifier & flags will have been parsed out.


3.3.3 <id3v2-tag>

The scribbu ID3v2 tag abstraction doesn’t try to model the various versions of the ID3v2 spec. Rather, it encodes a “generic” ID3v2 tag; the version to which it shall be serialized is specified at write time, and the version from which it was deserialized is returned at read time (see ID3v2 Serialization).

(define-class <id3v2-tag> ()
  (experimental #:init-value '() #:accessor experimental
                #:init-keyword #:experimental)
  (frames       #:init-value '() #:accessor frames  #:init-keyword #:frames)
  (padding      #:init-value   0 #:accessor padding #:init-keyword #:padding))

3.3.3.1 ID3v2 Serialization

While you can of course create an <id3v2-tag> instance “from scratch” (in-memory, as a result of a call to (make <id3v2-tag> ...)), you will more frequently be reading them from files on disk.

The function for doing this is read-tagset. The name is intended as a reminder that a file can have multiple ID3v2 tags, so you are in general reading a tag set, not just a tag.

scheme@(guile-user)> (define tags (read-tagset "opium.mp3"))
scheme@(guile-user)> tags
$1 = ((#<<id3v2-tag> 56188c2a3d80> 3 #f))

read-tagset returns a list of three-tuples, one tuple for each tag (so it could return '(), if the file contained no ID3v2 tags). Each three-tuple contains:

  1. an <id3v2-tag> instance, representing the tag
  2. the ID3v2 version as which the tag was serialized (i.e. 2, 3 or 4)
  3. a boolean indicating whether the unsynchronisation bit was set (see The Unsynchronisation Scheme).

Once you’ve created or updated your ID3v2 tag(s), you will presumably want to write it (them) to disk, most likely in place of an existing tagset. This is done via write-tagset(tags, file, ...). tags is a list of two-tuples: the first element of each is an <id3v2-tag> instance to be written to disk & the second is the ID3v2 version under which it shall be serialized (i.e. an integer, either 2, 3 or 4). file is the file into which the new tagset shall be written, replacing any tagset present therein.

write-tagset takes a few optional parameters:

  1. #:apply-unsync governs whether the unsynchronisation scheme (see The Unsynchronisation Scheme) should be applied when writing out the given tags: #f (the default) means never, #t means always, and 'as-needed means that it will be applied to any tag whose serialization would contain false syncs.
  2. #:copy governs whether a backup copy of the target file will be made: a value of #f (the default) means that the new tagset will be written in place (moving the audio data & ID3v1 tag, if any, if needed) and a value of #t means that the target file will be copied to a backup, the new tagset will be written, and then the track data & ID3v1 tag (if any) will be copied over to the new file.

3.3.3.2 Miscellaneous Functions

Module (scribbu) provides a few other functions that can be useful for working with ID3v2 tags & frames.

Function: with-track-in directory fn

with-track-in(directory, fn) is a convenience function; it will iterate over all filesystem entities in directory and apply fn to them. fn shall be a function taking three parameters:

  1. a tagset, such as what is returned from read-tagset
  2. a string naming the filesystem entity
  3. an ID3v1 tag (nil if none exists)

Example:

scheme@(guile-user)> (with-track-in "." (lambda (tags pth v1) (format #t "~s has ~d ID3v2 tags\n" pth (length tags))))
"./track.dat" has 0 ID3v2 tags
"./id3v22-tda.mp3" has 1 ID3v2 tags
...

Function: has-frame? tag id

Returns #t if tag has a frame with identifier id.

Function: get-frames tag id

Returns a (possibly empty) list of frames in tag with identifier id.


3.4 Text Encoding

The various string fields bring up the question: what text encoding is used? There are actually three text encodings in play:

  1. the encoding in use in your Scheme source files
  2. the encoding in use within the Guile interpreter
  3. the encoding in use in libscribbu

The first is documented in the Guile manual under Character Encoding of Source Files in The Guile Reference Manual. The upshot is this: UTF-8 is assumed, but the author may tell Guile what is being used through a coding hint:

;;; coding: iso-8859-1

The set of encodings recognized is defined by IANA in RFC2978.

The second is also documented in the Guile manual, under String Internals in The Guile Reference Manual:

Guile stores each string in memory as a contiguous array of Unicode code points along with an associated set of attributes. If all of the code points of a string have an integer range between 0 and 255 inclusive, the code point array is stored as one byte per code point: it is stored as an ISO-8859-1 (aka Latin-1) string. If any of the code points of the string has an integer value greater than 255, the code point array is stored as four bytes per code point: it is stored as a UTF-32 string.

Conversion between the one-byte-per-code-point and four-bytes-per-code-point representations happens automatically as necessary.

That just leaves libscribbu. On read (that is, when the library reads text from tags on disk), the encoding is sometimes specified by the tag itself, or is specified by the caller, or is guessed. From there, it will be converted to a Guile string. On write, text will be converted from the internal Guile representation to the desired text encoding on disk (deduced from either caller preferences or the frame settings themselves).


4 Using libscribbu

The third way in which to use scribbu is to link against the library libscribbu. Detailed documentation can be found in the libscribbu source itself (Doxygen documentation can be produced by doing cd doc && make doxygen-doc).

While detailed documentation on individual classes, free functions, and sub-systems may one day make its way into this manual, for now this chapter will describe using the library through a worked example. This example can be found in the examples/az-tags sub-directory of the scribbu source distribution.


4.1 Building a libscribbu program

Let us write a small C++ program using libscribbu to clean up Amazon.com Song IDs. The ID3v2 tags in .mp3 files downloaded from Amazon.com contain non-compliant comment frames (in that they have no description) holding the Song ID. Amazon also tries to cram the Song ID into the comment field of the ID3v1 tag, even though that field is generally too small to contain the entire string. We will call this program az-tags.


4.1.1 Implementing az-tags

The complete source for the program can be found in examples/az-tags/main.cc in the source distribution. The logic is simple enough to fit completely in main. The usage is:

az-tags [-h] [-v] file [file...]

Skipping command line parsing, we begin by initializing the library:

// ...
#include <scribbu/scribbu.hh>
// ...
int
main(int argc, char * argv[])
{
    // Parse command-line options...
    scribbu::static_initialize();

libscribbu needs to carry out assorted initialization; rather than deal with the static initialization problem, it just depends on the caller to explicitly initialize the library.

At this point, the first filename is waiting in argv[optind], so we can set the basic structure of the program:

    for (int i = optind; i < argc; ++i) {
        // ...
    }

For each file, we will open it & parse it into its ID3v2 tags, track data and ID3v1 tag:

  for (int i = optind; i < argc; ++i) {

    fs::ifstream ifs(argv[i], ios_base::binary); // 1

    vector<unique_ptr<scribbu::id3v2_tag>> id3v2;
    scribbu::read_all_id3v2(ifs, back_inserter(id3v2)); // 2
    scribbu::track_data td((istream&)ifs); // 3
    unique_ptr<scribbu::id3v1_tag> pid3v1 = scribbu::process_id3v1(ifs); // 4

At 1, we open the file, taking care to use binary mode so as to avoid newline translation. At 2 we ask libscribbu to read any and all ID3v2 tags into id3v2. We’ve used a vector here, but any container for which an output iterator is available will do.

At this point, the file pointer is pointing just past the last ID3v2 tag (if any; there may be none, in which case the file pointer remains at the beginning of the file and id3v2 is empty). The easiest way to consume the track data is to construct a track_data instance from the stream, as at 3. This will collect some data about the track and advance the file pointer to the one-past-the-end point.

There may or may not be some kind of ID3v1 tag waiting for us. That is why process_id3v1, called at 4, returns a unique_ptr: if there is no ID3v1 tag, a null pointer will be returned.

We now have zero or more ID3v2 tags to be processed in id3v2:

    for (auto &ptag: id3v2) {
        // `ptag' is a reference to a unique_ptr<id3v2_tag>
        // how to get at its frames?
    }

It is at this point that the libscribbu API turns out to be less than ergonomic. The issue is that read_all_id3v2 returns the tags typed as pointers to id3v2_tag; this is a base class providing a “generic” interface supported by all ID3v2 tags, but the API for iterating over frames is provided individually by each sub-class (id3v2_2_tag, id3v2_3_tag & id3v2_4_tag).

Perhaps it would be worth it to provide an interface on the base class to do this, but for now, I simply dynamic_cast & dispatch to a template function process_tag:

    for (auto &ptag: id3v2) {
      switch (ptag->version()) {
      case 2: { // ID3v2.2 tag
        scribbu::id3v2_2_tag &p = dynamic_cast<scribbu::id3v2_2_tag&>(*ptag);
        process_tag(p);
        break;
      }
      case 3: { // ID3v2.3 tag
        scribbu::id3v2_3_tag &p = dynamic_cast<scribbu::id3v2_3_tag&>(*ptag);
        process_tag(p);
        break;
      }
      case 4: { // ID3v2.4 tag
        scribbu::id3v2_4_tag &p = dynamic_cast<scribbu::id3v2_4_tag&>(*ptag);
        process_tag(p);
        break;
      }
      default:
        cerr << "Unknown ID3v2 revision " << ptag->version() << endl;
        abort();
      }
    }

The template parameter is the id3v2_tag sub-class. Since there are only three, I can factor out the ID3v2-version-specific logic into a traits class:

template <class tag_type>
void
process_tag(tag_type &T)
{
 ...

  for (auto fp: T) { // 1
    if (traits_type::COMMID == fp->id()) { // 2

      id3v2_frame &F = fp;
      comm_type &C = dynamic_cast<comm_type&>(F); // 3

      string dsc = C.template description<string>();
      if (dsc.empty()) {
        string txt = C.template text<string>();
        if ("Amazon.com Song ID" == txt.substr(0, 18)) {
          cout << "updating the comment frame containing " << txt << endl;
          fp = traits_type::replace(C);
        }
      }
    }
  }
}

Each concrete id3v2_tag subclass implements begin & end, so we can use instances thereof as the range in range-based for loops like the one at 1. fp is actually a mutable proxy for an ID3v2-version-specific id3v2_frame subclass. At 2 we have factored out the precise frame ID that selects comment frames.

Each ID3v2 version has a concrete comment frame type, to which we again dynamically cast (I really need to re-evaluate this interface) at 3.

The rest of the logic is straightforward: if there is no description field in the comments frame, and the comment text begins with “Amazon.com Song ID”, replace the frame.


4.1.2 Building az-tags

The next step is to compile the program. We shall use Autotools, beginning with the simplest configure.ac we can:

AC_PREREQ([2.69])
AC_INIT([az-tags], [0.1], [sp1ff@pobox.com])
AC_CONFIG_MACRO_DIR([macros])
AC_CONFIG_SRCDIR([src/main.cc])
AC_CONFIG_AUX_DIR([build-aux])
AC_CONFIG_HEADERS([config.h])
AM_INIT_AUTOMAKE([-Wall -Werror])
LT_INIT
AC_PROG_CXX
AC_CONFIG_FILES([Makefile src/Makefile])
AC_OUTPUT

AC_PREREQ asserts that Autoconf 2.69 or later is required to build a configure script from this template. AC_INIT is the Autoconf initialization macro. We’re going to need some custom macros for this project, so AC_CONFIG_MACRO_DIR tells Autoconf where to find them. AC_CONFIG_SRCDIR is just a sanity check: when running configure, users will sometimes pass an incorrect value for --srcdir, and this macro equips the generated configure script to catch that. AC_CONFIG_AUX_DIR tells Autoconf to place auxiliary scripts (missing & install-sh, e.g.) in a sub-directory named build-aux.

AC_CONFIG_HEADERS tells Autoconf to generate a header file named config.h containing C preprocessor #defines for the project. Note that we need to generate a template file config.h.in via autoheader.

Finally, we initialize Automake & libtool, check for a C++ compiler & produce Makefile templates.

The Automake template for the root makefile is trivial:

SUBDIRS = src

Let us begin the Makefile template in src:

bin_PROGRAMS = az-tags
az_tags_SOURCES = main.cc
AM_CXXFLAGS = -std=c++17

We will need to perform some one-time setup:

mkdir build-aux
touch NEWS README AUTHORS ChangeLog
aclocal
autoheader
autoconf
automake --add-missing

At this point, we can run ./configure, but make will fail miserably. Our program needs to be able to find scribbu, openssl and boost includes, along with the corresponding libraries. All the required libraries other than libscribbu provide pre-built macros which we can copy from the scribbu source distro into macros. Let us add the following lines to configure.ac, just before the call to AC_CONFIG_FILES:

PKG_CHECK_MODULES([GUILE], [guile-2.2])
AX_BOOST_BASE([1.58], [],
    [AC_MSG_ERROR([Scribbu requires boost_base 1.58 or later.])])
echo "Checkpoint 3: BOOST_LDFLAGS is $BOOST_LDFLAGS;" >&AS_MESSAGE_LOG_FD

AX_BOOST_IOSTREAMS
AX_BOOST_FILESYSTEM
AX_BOOST_SYSTEM
AX_CHECK_OPENSSL([],[AC_MSG_ERROR([Scribbu requires openssl.])])

Each of these will define Automake variables describing where we can find headers & libraries which we can add to src/Makefile.am, which now reads:

bin_PROGRAMS = az-tags
az_tags_SOURCES = main.cc
AM_CPPFLAGS = $(BOOST_CPPFLAGS)
AM_CXXFLAGS = -std=c++17 $(GUILE_CFLAGS)
AM_LDFLAGS = $(BOOST_LDFLAGS)
LDADD = $(GUILE_LIBS)           \
	$(BOOST_SYSTEM_LIB)     \
	$(BOOST_FILESYSTEM_LIB) \
	$(BOOST_IOSTREAMS_LIB)  \
	$(OPENSSL_LIBS)

This just leaves the question of where to find libscribbu. scribbu, at the time of this writing, provides no Autoconf macros (however, this sample provided the author the opportunity to prototype one).

We add the following code to configure.ac, just after the call to AC_PROG_CXX (it’s a lot of code; step-by-step explanation to follow):

AC_ARG_WITH([scribbu],
    AS_HELP_STRING([--with-scribbu=DIR],
                   [root directory of scribbu installation]),
    [
        case "$withval" in
	"" | y | ye | yes | n | no)
	    AC_MSG_ERROR([--with-scribbu takes a root directory]);;
	*)
	    scribbu_dirs="$withval";;
	esac
    ],
    [
        # Just use the defaults
	scribbu_dirs="/usr/local /usr /opt/local /sw"
    ])

dnl One way or another, we have one or more candidates in ${scribbu_dirs}
found=no
for scribbu_home in ${scribbu_dirs}; do
    AC_MSG_CHECKING([for scribbu/scribbu.hh under ${scribbu_home}])
    if test -f "${scribbu_home}/include/scribbu/scribbu.hh"; then
        SCRIBBU_CPPFLAGS="-I${scribbu_home}/include"
	SCRIBBU_LDFLAGS="-L${scribbu_home}/lib"
	SCRIBBU_LIBS="-lscribbu"
	found=yes
	AC_MSG_RESULT([yes])
	break
    else
        AC_MSG_RESULT([no])
    fi
done

if test "$found" != "yes"; then
    AC_MSG_ERROR([couldn't find scribbu])
fi

# try the preprocessor and linker with our new flags,
# being careful not to pollute the global LIBS, LDFLAGS, and CPPFLAGS
AC_MSG_CHECKING([whether compiling and linking against scribbu will work])

save_LIBS="$LIBS"
save_LDFLAGS="$LDFLAGS"
save_CPPFLAGS="$CPPFLAGS"
LIBS="$SCRIBBU_LIBS $LIBS"
LDFLAGS="$SCRIBBU_LDFLAGS $LDFLAGS"
CPPFLAGS="$SCRIBBU_CPPFLAGS $CPPFLAGS"

AC_LANG_PUSH([C++])
AC_CHECK_HEADER([scribbu/scribbu.hh], [scribbu_hh=yes], [scribbu_hh=no])
# I'd like to do AC_CHECK_LIB here, but I can't link against libscribbu
# in a test because it, in turn depends on a bunch of other libs
AC_CHECK_FILE([${scribbu_home}/lib/libscribbu.la],
    [scribbu_la=yes], [scribbu_la=no])
AC_LANG_POP([C++])

LIBS="$save_LIBS"
LDFLAGS="$save_LDFLAGS"
CPPFLAGS="$save_CPPFLAGS"

if test "yes" = "$scribbu_hh" && test "yes" = "$scribbu_la"; then
    AC_DEFINE([HAVE_SCRIBBU], [1], [Define to 1 if you have libscribbu])
else
    AC_MSG_ERROR([az-tags requires scribbu])
fi

AC_SUBST([SCRIBBU_CPPFLAGS])
AC_SUBST([SCRIBBU_LIBS])
AC_SUBST([SCRIBBU_LDFLAGS])

The first step is to locate libscribbu. We will form the variable scribbu_dirs containing one or more directories to check. Now, the user could always just tell us where it is. That is the reason we begin with AC_ARG_WITH: if the user invokes configure with --with-scribbu=... we will just use that. Otherwise, we will examine a default set of locations.

That’s what the for loop does; for each location in scribbu_dirs, it checks for scribbu.hh in a sub-directory named include/scribbu of the current location. On success, we set a few variables recording that result & break. If we check all locations without success, then we fail.

Now, just because we found a header file at a given place doesn’t mean we can build against it or its associated library. The typical idiom is to execute the macros AC_CHECK_HEADER and AC_CHECK_LIB to make sure we can include the header and link against the library, respectively.

The problem in my case is that AC_CHECK_LIB will fail, not through any fault of libscribbu, but because it depends on a number of other libraries; the test will fail with unresolved externals & I can’t see how to add the relevant link flags in the macro. Instead, I settle for AC_CHECK_FILE.

If both of these pass, we know we’re good to go; the question remains: how to record the information we’ve just discovered? The Autoconf manual states that one should never add options to user variables such as CPPFLAGS. The idiom seems to be to define new variables that the Automake author can add to their rules. In this case, we create three new variables:

  1. SCRIBBU_CPPFLAGS to hold the -I option that will enable the build to find the libscribbu headers
  2. SCRIBBU_LIBS to hold the -l option naming libscribbu itself
  3. SCRIBBU_LDFLAGS to hold the -L option telling the linker where to look for libscribbu

This lets us augment src/Makefile.am to:

bin_PROGRAMS = az-tags
az_tags_SOURCES = main.cc
AM_CPPFLAGS = $(BOOST_CPPFLAGS) $(SCRIBBU_CPPFLAGS)
AM_CXXFLAGS = -std=c++17 $(GUILE_CFLAGS)
AM_LDFLAGS = $(SCRIBBU_LDFLAGS) $(BOOST_LDFLAGS)
LDADD = $(SCRIBBU_LIBS)         \
        $(GUILE_LIBS)           \
	$(BOOST_SYSTEM_LIB)     \
	$(BOOST_FILESYSTEM_LIB) \
	$(BOOST_IOSTREAMS_LIB)  \
	$(OPENSSL_LIBS)

With that, we can configure:

$>: autoreconf -vfi
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force
...
$>: ./configure --prefix=$HOME
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
...
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing libtool commands
$>: make
make  all-recursive
make[1]: Entering directory '/tmp/az-tags'
Making all in src
make[2]: Entering directory '/tmp/az-tags/src'
g++ -DHAVE_CONFIG_H -I. -I/home/mgh/doc/code/projects/az-tags/src -I..  -I/usr/include   -std=c++17 -pthread -I/usr/local/include/guile/2.2 -g -O2 -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o /home/mgh/doc/code/projects/az-tags/src/main.cc
...

We have a build! Let us take a look at a file downloaded from Amazon.com:

$>: scribbu dump lorca.mp3
"lorca.mp3":
ID3v2.3(.0) Tag:
452951 bytes, synchronised
...
COMM (<no description>):
Amazon.com Song ID: 203558254
...
9425708 bytes of track data:
MD5: 48ff9cadea7d842e9059db25159d2daa
ID3v1.1: The Pogues - Lorca's Novena
Hell's Ditch [Expanded] (US Ve (track 5), 1990
Amazon.com Song ID: 20355825
unknown genre 255

$>: src/az-tags lorca.mp3
lorca.mp3 has 1 ID3v2 tags, and an ID3v1 tag
updating the comment frame containing Amazon.com Song ID: 203558254
all tags processed; emplacing new tagset...
emplacing new tagset...done.
clearing ID3v1 comment
$>: scribbu dump lorca.mp3
"lorca.mp3":
ID3v2.3(.0) Tag:
452951 bytes, synchronised
...
COMM (amazon.com song id):
Amazon.com Song ID: 203558254
...
9425708 bytes of track data:
MD5: 48ff9cadea7d842e9059db25159d2daa
ID3v1.1: The Pogues - Lorca's Novena
Hell's Ditch [Expanded] (US Ve (track 5), 1990

unknown genre 255

Frame Identifiers

Symbol                        2.2  2.3+
'album-frame                  TAL  TALB
'artist-frame                 TP1  TPE1
'band-frame                   TP2  TPE2
'bpm-frame                    TBP  TBPM
'comment-frame                COM  COMM
'composer-frame               TCM  TCOM
'conductor-frame              TP3  TPE3
'content-group-frame          TT1  TIT1
'copyright-frame              TCR  TCOP
'date-frame                   TDA  TDAT
'encoded-by-frame             TEN  TENC
'file-owner-frame             N/A  TOWN
'file-type-frame              TFT  TFLT
'genre-frame                  TCO  TCON
'initial-key-frame            TKE  TKEY
'interpreted-by-frame         TP4  TPE4
'isrc-frame                   TRC  TSRC
'langs-frame                  TLA  TLAN
'length-frame                 TLE  TLEN
'lyricist-frame               TXT  TEXT
'media-type-frame             TMT  TMED
'original-album-frame         TOT  TOAL
'original-artist-frame        TOA  TOPE
'original-filename-frame      TOF  TOFN
'original-lyricist-frame      TOL  TOLY
'original-release-year-frame  TOR  TORY
'part-of-a-set-frame          TPA  TPOS
'play-count-frame             CNT  PCNT
'playlist-delay-frame         TDY  TDLY
'pop-frame                    POP  POPM
'publisher-frame              TPB  TPUB
'recording-dates-frame        TRD  TRDA
'settings-frame               TSS  TSSE
'size-frame                   TSI  TSIZ
'station-name-frame           N/A  TRSN
'station-owner-frame          N/A  TRSO
'subtitle-frame               TT3  TIT3
'tag-cloud-frame              XTG  XTAG
'time-frame                   TIM  TIME
'title-frame                  TT2  TIT2
'track-frame                  TRK  TRCK
'udt-frame                    TXX  TXXX
'year-frame                   TYE  TYER

Character Encodings

scribbu uses iconv for character encoding conversion. For convenience, here is the list of identifiers scribbu uses to name the supported encodings:

  1. European & Russian languages: ASCII, ISO_8859_1, ISO_8859_2, ISO_8859_3, ISO_8859_4, ISO_8859_5, ISO_8859_7, ISO_8859_9, ISO_8859_10, ISO_8859_13, ISO_8859_14, ISO_8859_15, ISO_8859_16, KOI8_R, KOI8_U, KOI8_RU, CP1250, CP1251, CP1252, CP1253, CP1254, CP1257, CP850, CP866, CP1131, MacRoman, MacCentralEurope, MacIceland, MacCroatian, MacRomania, MacCyrillic, MacUkraine, MacGreek, MacTurkish, Macintosh
  2. Semitic languages: ISO_8859_6, ISO_8859_8, CP1255, CP1256, CP862, MacHebrew, MacArabic
  3. Japanese: EUC_JP, SHIFT_JIS, CP932, ISO_2022_JP, ISO_2022_JP_2, ISO_2022_JP_1, ISO_2022_JP_MS
  4. Chinese: EUC_CN, HZ, GBK, CP936, GB18030, EUC_TW, BIG5, CP950, BIG5_HKSCS, BIG5_HKSCS_2004, BIG5_HKSCS_2001, BIG5_HKSCS_1999, ISO_2022_CN, ISO_2022_CN_EXT
  5. Korean: EUC_KR, CP949, ISO_2022_KR, JOHAB
  6. Armenian: ARMSCII_8
  7. Georgian: Georgian_Academy, Georgian_PS
  8. Tajik: KOI8_T
  9. Kazakh: PT154, RK1048
  10. Thai: TIS_620, CP874, MacThai
  11. Laotian: MuleLao_1, CP1133
  12. Vietnamese: VISCII, TCVN, CP1258
  13. Platform specifics: HP_ROMAN8, NEXTSTEP
  14. Full Unicode: UTF_8, UCS_2, UCS_2BE, UCS_2LE, UCS_4, UCS_4BE, UCS_4LE, UTF_16, UTF_16BE, UTF_16LE, UTF_32, UTF_32BE, UTF_32LE, UTF_7, C99, JAVA

Function Index


Index

<
<comment-frame>
<id3v2-tag>
<play-count-frame>
<popm-frame>
<tag-cloud-frame>
<text-frame>
<unk-frame>
<user-defined-text-frame>

B
Building a libscribbu program
Building az-tags

F
File-based Replacements

I
ID3v2 Frames
ID3v2 Serialization
ID3v2 tags
Implementing az-tags
Introduction
Invoking scribbu dump
Invoking scribbu encodings
Invoking scribbu genre
Invoking scribbu m3u
Invoking scribbu popm
Invoking scribbu rename
Invoking scribbu report
Invoking scribbu text
Invoking scribbu xtag

M
Miscellaneous Functions

S
scribbu
scribbu dump Options
scribbu genre Behavior on Write
scribbu genre Options
scribbu m3u Options
scribbu popm Options
scribbu rename Options
scribbu report Options
scribbu text Options
scribbu xtag Options
Scripting scribbu

T
Tag-Based Replacements
Text Encoding
The Scheme REPL
The scribbu Program
The Unsynchronisation Scheme

U
Using libscribbu