The extensible tool for tagging your music collection.
This manual corresponds to scribbu version 0.6.22.
Copyright © 2018-2022 Michael Herstine <sp1ff@pobox.com>
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.
A copy of the license is also available from the Free Software Foundation Web site at https://www.gnu.org/licenses/fdl.html.
This document was typeset with GNU Texinfo.
scribbu is a C++ library & associated command-line tool for working with ID3 tags (See ID3.) It was born when I retired my last Windows machine & could no longer use Winamp (See Winamp.) to manage my library of digital music. The scribbu library offers classes & methods for reading, modifying & writing ID3v1 & ID3v2 tags. The scribbu program provides assorted sub-commands for working with ID3-tagged files (e.g. re-naming files based on their tags), but its real power lies in its embedded Scheme interpreter (See The Guile Reference Manual.) in which scribbu library features are exported as a Scheme module (on which more below).
The scribbu project has a few components. The first is a program that provides assorted sub-commands, a few of which are:
scribbu dump (See Invoking scribbu dump.) will write the contents of any & all ID3 tags found in one or more files to stdout.
scribbu report (See Invoking scribbu report.) will generate a report listing ID3 attributes on one or more files on stdout. CSV, TDF & ASCII-delimited formats are supported currently.
scribbu rename (See Invoking scribbu rename.) will rename one or more files based on the contents of their ID3 tags; e.g. scribbu rename -t "%A-%T.mp3" *.mp3 will rename all the files matching “*.mp3” to “<artist>-<title>.mp3” where "artist" and "title" are derived from their ID3 tags (if any).
scribbu popm (See Invoking scribbu popm.) will update ID3v2 play count & popularimeter tags. For instance, scribbu popm foo.mp3 will increment the play counts in “foo.mp3”.
scribbu text (See Invoking scribbu text.) maintains assorted ID3v2 text frames; for instance, scribbu text --artist='Roxy Music' *.mp3 will set the artist frame to “Roxy Music” in all ID3v2 tags in all files matching “*.mp3”.
Any sub-command can be invoked with --help or -h for more information. Use the --info option to display the command’s node in this manual.
The scribbu program also exports functions & GOOPS (See GOOPS.) classes to a Scheme interpreter, so scribbu can also be invoked as the interpreter for a Scheme script:

    #!/usr/local/scribbu \
    -e main -s
    !#
    (define (main args)
    ...

Finally, scribbu contains a C++ library, libscribbu (See Using libscribbu.), against which one can build C++ libraries & programs.
Some background on MP3, ID3, Winamp & the genesis of this project.
Widespread digital encoding of music arrived with the introduction of the compact disk in 1982. However, the size of the resulting digital representation was large: the standard Compact Disk stored about one hour & twenty minutes worth of music in about seven hundred MiB (at a time when the typical hard drive could hold ten MiB). In 1989 the relevant standards body (the Moving Picture Experts Group, or MPEG) called for proposals for lossy audio compression algorithms. The fourteen proposals they received were eventually combined into three “layers”, each with a different set of trade-offs between quality, space, and computational complexity. “MPEG Audio Layer I” was the simplest, designed to enable real-time encoding on the hardware of the day. “MPEG Audio Layer II” provides higher quality than Layer I but offers computationally simpler decoding than Layer III. “MPEG Audio Layer III” (or MP3) provides good quality at lower bitrates than Layer II, albeit at the cost of greater computational complexity.
Layer three was primarily developed by the German company Fraunhofer IIS. The file extension .mp3 was selected as a result of an internal survey of researchers at Fraunhofer. At a bit rate of 128 kbit/s, MP3 needed about a megabyte per minute of music encoded; roughly one-tenth the size of CD audio.
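The "megabyte per minute" and "one-tenth" figures can be checked with a little arithmetic. The sketch below assumes standard CD audio parameters (44.1 kHz, 16 bits per sample, two channels), which are not stated in the text above.

```python
# Back-of-the-envelope check of the storage figures quoted above.
MP3_BITRATE = 128_000        # bits per second (128 kbit/s)
CD_SAMPLE_RATE = 44_100      # samples per second per channel (assumed)
CD_BITS_PER_SAMPLE = 16      # assumed
CD_CHANNELS = 2              # assumed


def bytes_per_minute(bits_per_second):
    """Convert a bit rate into the number of bytes needed for one minute."""
    return bits_per_second * 60 // 8


mp3_bytes = bytes_per_minute(MP3_BITRATE)
cd_bytes = bytes_per_minute(CD_SAMPLE_RATE * CD_BITS_PER_SAMPLE * CD_CHANNELS)

print(mp3_bytes)             # 960000, i.e. a little under one megabyte
print(cd_bytes)              # 10584000
print(cd_bytes / mp3_bytes)  # about 11, so MP3 is roughly one-tenth the size
```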
At one MB per minute, given the size of consumer hard drives in the nineties, home users could easily store many MP3 tracks. The format found such universal application in the portable digital music players becoming available that they came to be known as “mp3 players”. With the network bandwidths available at the time, one could conveniently transmit MP3-encoded files across the internet, and even stream them.
As is typical of technological history, the application responsible for the widespread adoption of MP3 was not the application for which it was designed. Applications for audio encoded by MP3 were initially thought to be “musical transmission over ISDN telephone lines” and “voice announcement systems for local public transport”. Instead, the medium of choice for digital music became the ‘.mp3’ file.
A problem quickly emerged: the MP3 standard included no provision for metadata; no way to “tag” an .mp3 file with information such as title, artist, et cetera. NamkraD (AKA Eric Kemp) is credited with the idea of attaching such a tag to .mp3 files in 1996. Presumably to make it easy to detect & parse while not interfering with existing decoders, the tag had a fixed size of one hundred twenty-eight bytes and was attached to the end of the file (if a player that was unaware of the tag played the enclosing file, at worst the user would hear a bit of static at the end). It provided for a thirty-byte title, artist & album along with year, comment & a one-byte genre field. The original proposal defined eighty genres, extended to 148 by the 1.91 release of Winamp (See Winamp.) in June 1998 and to 192 by the 5.6 release of Winamp in November 2010.
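To make the fixed layout concrete, here is a minimal Python sketch of reading such a tag. It is an illustration only, not scribbu's implementation. The field widths follow the description above plus the usual ID3v1 conventions (a leading "TAG" marker, a four-byte year, and the ID3v1.1 trick of storing a track number in the last comment byte). Decoding text as Latin-1 is an assumption.

```python
import struct

# 3-byte "TAG" marker (skipped), then title, artist, album (30 bytes each),
# year (4), comment (30) and a one-byte genre: 128 bytes in all.
_ID3V1 = struct.Struct("=3x30s30s30s4s30sB")


def _text(raw, encoding="latin-1"):
    """Cut a fixed-width field at its first NUL and strip trailing spaces."""
    return raw.split(b"\x00", 1)[0].decode(encoding).rstrip()


def parse_id3v1(block):
    """Parse a 128-byte block; return a dict, or None if it is not a tag."""
    if len(block) != 128 or block[:3] != b"TAG":
        return None
    title, artist, album, year, comment, genre = _ID3V1.unpack(block)
    track = None
    # ID3v1.1: a zero byte followed by a non-zero byte ends the comment.
    if comment[28] == 0 and comment[29] != 0:
        track, comment = comment[29], comment[:28]
    return {"title": _text(title), "artist": _text(artist),
            "album": _text(album), "year": _text(year),
            "comment": _text(comment), "track": track, "genre": genre}


def read_id3v1(path):
    """Read the last 128 bytes of a file and try to parse them as a tag."""
    with open(path, "rb") as f:
        f.seek(0, 2)                 # seek to end to learn the file size
        if f.tell() < 128:
            return None
        f.seek(-128, 2)
        return parse_id3v1(f.read(128))
```

Because the block sits at a known offset from the end of the file, detection needs only one seek and a three-byte comparison.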
The limitations of this format quickly became apparent, leading to the proposal in 1998 of ID3v2 by Martin Nilsson and several other contributors. Although it shared a name, ID3v2 was a completely different approach to tagging music: it was prepended to the audio data (making it suitable for streaming media) and it was variable-length; ID3v2 tags are composed of multiple frames, each containing one piece of information about the music (title, artist &c).
Space-efficient, high-quality, tagged audio was no good without a ready means of listening to it. The then-existing Windows Media Player and Real Networks’ Real Player never found widespread adoption. In April 1997 Justin Frankel and Dmitry Boldyrev released Winamp, a small, performant Windows MP3 player. Frankel formed Nullsoft in January 1998. With version 1.5, Winamp changed from freeware to shareware & charged a ten dollar registration fee; far from dampening uptake, this brought in $100,000 a month from $10 paper checks in the mail from paying users. Winamp 2.0 was released in September 1998 & became one of the most downloaded Windows programs ever.
One of the things that endeared Winamp to its users was its plugin architecture. Nullsoft provided several plugins as part of the standard distribution, one of which was the Music Library. Using this, one could manage, organize, search & play a personal library of thousands of MP3 files, all based on ID3 tags (See ID3.)
Nullsoft was (in)famously acquired by AOL in 1999. By 2000 Winamp had been registered twenty-five million times, but Nullsoft began to struggle with the problems of so many AOL acquisitions. 2002 saw the misbegotten release of Winamp 3, a complete re-write that broke with the prior ethos of tight, lightweight code. Widespread incidence of users (including the author) reverting to Winamp 2 in response to poor performance & high resource demands of Winamp 3 led to Nullsoft continuing 2.x development, and eventually the release of Winamp 5 (2+3) late in 2003. From version 5.2, Winamp provided the ability to sync the user’s library with iPods, which led many iPod owners (again including the author) to choose Winamp instead of iTunes to manage their devices.
The original Winamp team quit AOL in 2004 & development moved to Dulles (VA). Work continued, albeit at a slower pace. With the release of Winamp 5.66 in late 2013, AOL announced that winamp.com would be shut down later that year and that the software would no longer be available for download. It was later announced that Nullsoft (along with Shoutcast, an MP3 streaming platform) had been sold to the Belgian company Radionomy. As of the time of this writing, winamp.com is up, and offering a download of Winamp 5.8 (beta) from Radionomy.
It is a credit to Winamp that it remained usable well into the twenty-teens as a way to manage large libraries of .mp3 files. Winamp is not quite dead, but it is stranded on an operating system that I have left behind (along, I suspect, with many other technically-inclined music aficionados today). The MP3 format itself is showing its age; Fraunhofer IIS announced in 2017 that it was ending its licensing programs for MP3. AAC is now the standard for digital music.
And yet, I have several thousand .mp3 files in my personal library. Since both MP3 and AAC are lossy formats, transcoding them to AAC would not lead to good results even if I were inclined to do the work. The original sources of many of the .mp3s have been lost, so re-encoding to AAC is not possible.
Perhaps scribbu (See The scribbu Program.) will support AAC in the future, but it seems that MP3 & ID3 will be relevant to my musical life for some time. I wrote this tool to help me manage them, and I offer it to anyone else in the same position: if you need to manage ID3-tagged .mp3 files, and especially if you enjoy hacking in LISP and/or C++, I hope you find scribbu useful and enjoyable.
The scribbu Program

The simplest way to use scribbu is through the command-line tool. For the scribbu command itself, as well as all scribbu sub-commands, the -h flag will produce a brief help message on stdout, and the --help flag will display the corresponding man page. You can get a list of all the sub-commands scribbu provides by saying scribbu -h. You can display a given sub-command’s node in this Info manual by saying scribbu CMD --info. Display this manual by saying scribbu --info.
scribbu dump
scribbu encodings
scribbu genre
scribbu m3u
scribbu report
scribbu rename
scribbu popm
scribbu text
Let us suppose we have a few ‘.mp3’ files which we have just downloaded, or have encoded some time ago & forgotten about. Regardless, we want to examine & update their tags before adding them to our library. The following chapters demonstrate this using scribbu sub-commands.
The simplest place to start is scribbu dump. This will show us what is in the tags:
    $>: scribbu dump *.mp3
    "lorca.mp3":
    ID3v2.3(.0) Tag:
    452951 bytes, synchronised
    flags: 0x00
    The Pogues - Lorca's Novena
    Hell's Ditch [Expanded] (US Version) (track 5), 1990
    Content-type Pop
    TIT2: Lorca's Novena
    TPE1: The Pogues
    TALB: Hell's Ditch [Expanded] (US Version)
    TCON: Pop
    TCOM:
    TPE3:
    TRCK: 5
    TYER: 1990
    TPE2: The Pogues
    COMM (<no description>): Amazon.com Song ID: 203558254
    TCOP: 2004 Warner Music UK Ltd.
    TPOS: 1
    frame APIC (115554 bytes)
    frame PRIV (1122 bytes)
    335921 bytes of padding
    9425708 bytes of track data:
    MD5: 48ff9cadea7d842e9059db25159d2daa
    ID3v1.1: The Pogues - Lorca's Novena
    Hell's Ditch [Expanded] (US Ve (track 5), 1990
    Amazon.com Song ID: 20355825
    unknown genre 255
    "opium.mp3":
    ID3v2.3(.0) Tag:
    2038 bytes, synchronised
    flags: 0x00
    Stephan Luke - Opium Chant Intro
    Opium Gardens (track 1), 2003
    Content-type General Club Dance
    Encoded by Winamp 5.552
    TENC: Winamp 5.552
    TRCK: 1
    COMM (<no description>): Ripped by Winamp on Pimpernel
    TPUB: Opium Music
    TPOS: 1/1
    TYER: 2003
    TCON: General Club Dance
    TALB: Opium Gardens
    TPE2: Opium Garden
    TPE1: Stephan Luke
    UFID: http://www.cddb.com/id3/taginfo1.html 334344334e33395235383037313937335532343836364232394336314239364139333341424332363945364531454642444233445032
    TIT2: Opium Chant Intro
    1522 bytes of padding
    1191874 bytes of track data:
    MD5: 690194f49592c7d8ccfbfe8a157d4c1e
    ID3v1.1: Stephan Luke - Opium Chant Intro
    Opium Gardens (track 1), 2003
    Ripped by Winamp on Pimperne
    unknown genre 255
    "orlando.mp3":
    ID3v2.3(.0) Tag:
    607 bytes, synchronised
    flags: 0x40
    Bill LeFaive - Orlando
    http://music.download.com (track 1), <no year>
    TALB: http://music.download.com
    TIT2: Orlando
    TIT3: http://music.download.com/
    TPE1: Bill LeFaive
    TCOM: Bill LeFaive
    frame WOAF (1 bytes)
    frame WPUB (27 bytes)
    frame WXXX (54 bytes)
    TRCK: 1
    TCOP: 2006 Bill LeFaive
    TPUB: http://music.download.com
    256 bytes of padding
    6549838 bytes of track data:
    MD5: dd0c70e13d4aeec676f8d7a7bda622b0
    ID3v1.1: Bill LeFaive - Orlando
    http://music.download.com (track 1), 2004
    http://music.download.com/
    unknown genre 255
We see we have three files, all of which have both ID3v1 & ID3v2 tags. The output also contains some basic information on the MP3 data in between the tags.
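The version, flags and size reported for each ID3v2 tag are carried in the tag's ten-byte header. The following Python sketch decodes such a header according to the ID3v2 specification. It is an illustration, not scribbu's code, and whether scribbu's reported byte count includes the header itself is not something this sketch settles.

```python
def parse_id3v2_header(header):
    """Decode a 10-byte ID3v2 header; return a dict, or None if invalid.

    Layout: b"ID3", major version, revision, flags, then a four-byte
    "syncsafe" size in which only the low seven bits of each byte are used.
    """
    if len(header) != 10 or header[:3] != b"ID3":
        return None
    major, revision, flags = header[3], header[4], header[5]
    size_bytes = header[6:10]
    if any(b & 0x80 for b in size_bytes):
        return None                      # high bit set: not syncsafe
    size = 0
    for b in size_bytes:
        size = (size << 7) | b           # seven payload bits per byte
    return {"version": major, "revision": revision,
            "flags": flags, "size": size}


# A made-up header for an ID3v2.3(.0) tag with no flags set.
example = b"ID3" + bytes([3, 0, 0x00]) + bytes([0x00, 0x00, 0x0F, 0x76])
print(parse_id3v2_header(example))
```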
scribbu dump takes as arguments one or more files and/or directories, and prints information about all files listed, or in the directories named, recursively. With no options, scribbu dump will dump everything it understands. With options, the output can be scoped in various ways (e.g. ID3v2 tags only, ID3v1 tags only, track data only, among other options); refer to the man page for a complete list (or See Invoking scribbu dump.)
Another way to investigate files, especially large numbers of files, is scribbu report:

    $>: scribbu report -o myfiles.csv *.mp3
    $>: cat myfiles.csv
    directory,file,file size(MB),ID3v2 version,ID3v2 revision,ID3v2 size(bytes),ID3v2 flags,ID3v2 unsync,ID3v2 Artist,ID3v2 Title,ID3v2 Album,ID3v2 Content Type,ID3v2 Encoded By,ID3v2 Year,ID3v2 Langauges,# ID3v2 play count frames,Play Count,# ID3v2 comment frames,comment #0 text,comment #1 text,comment #2 text,comment #3 text,comment #4 text,comment #5 text,size (bytes),MD5,has ID3v1.1,has ID3v1 extended,ID3v1 Artist,ID3v1 Title,ID3v1 Album,ID3v1 Year,ID3v1 Comment,ID3v1 Genrre
    "/tmp/tut","lorca.mp3",9.421,3,0,452951,0x00,0,The Pogues,Lorca's Novena,Hell's Ditch [Expanded] (US Version),Pop,,1990,,0,,1,Amazon.com Song ID: 203558254,,,,,,9425708,48ff9cadea7d842e9059db25159d2daa,1,0,The Pogues,Lorca's Novena,Hell's Ditch [Expanded] (US Ve,1990,Amazon.com Song ID: 20355825,255
    "/tmp/tut","opium.mp3",1.139,3,0,2038,0x00,0,Stephan Luke,Opium Chant Intro,Opium Gardens,General Club Dance,Winamp 5.552,2003,,0,,1,Ripped by Winamp on Pimpernel,,,,,,1191874,690194f49592c7d8ccfbfe8a157d4c1e,1,0,Stephan Luke,Opium Chant Intro,Opium Gardens,2003,Ripped by Winamp on Pimperne,255
    "/tmp/tut","orlando.mp3",6.247,3,0,607,0x40,0,Bill LeFaive,Orlando,http://music.download.com,,,,,0,,0,,,,,,,6549838,dd0c70e13d4aeec676f8d7a7bda622b0,1,0,Bill LeFaive,Orlando,http://music.download.com,2004,http://music.download.com/,255
Running scribbu report (See Invoking scribbu report.) with an option of -o <name>.csv will produce an RFC 4180-compliant comma-separated values file reporting on the files given on the command line. Option -t will instead produce tab-delimited data. scribbu itself provides little beyond this in terms of reporting, the idea being that CSV or TDF output can be readily imported into other programs better suited to that task.
That said, one can do some basic querying at the command line, for which tab-delimited format can be convenient. For example, this little awk program will show the ID3v2 version for each file:

    $>: scribbu report -t -o myfiles.tdf *.mp3
    $>: cat myfiles.tdf | awk 'BEGIN {FS="\t"}; {print $2, $4}'
    file ID3v2 version
    "lorca.mp3" 3
    "opium.mp3" 3
    "orlando.mp3" 3
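The same query can be written in a general-purpose language once the report is on disk. The following Python sketch reads a tab-delimited report with the standard csv module and selects columns by name rather than by position. It assumes the TDF header uses the same column names as the CSV header shown earlier ("file" and "ID3v2 version") and that the file is UTF-8 encoded.

```python
import csv


def id3v2_versions(tdf_path, encoding="utf-8"):
    """Return (file, ID3v2 version) pairs from a tab-delimited report."""
    with open(tdf_path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f, delimiter="\t")
        # Column names are taken from the report's own header line.
        return [(row["file"], row["ID3v2 version"]) for row in reader]
```

Calling id3v2_versions("myfiles.tdf") would then yield one pair per file. Unlike the awk one-liner, the csv module strips the quotes around quoted fields.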
We notice that none of these files contain a PCNT or a POPM frame; let’s add them now:

    $>: scribbu popm -a -o foo@bar.com -r oooo *.mp3
    $>: scribbu dump *.mp3
    ...
    PCNT: 0
    POPM: foo@bar.com
    rating: 179
    counter: 00
    ...
The popm command can be used to manage both PCNT & POPM frames (See Invoking scribbu popm.) The -a flag indicates that we want to create the relevant frames, if they don’t already exist. -r sets the rating for the POPM frame; you can provide an integer between 0 and 255, or use the “star” system. In this case, I’ve given all the files four stars. “****” would be more mnemonic, but inconvenient in the shell, so scribbu will recognize almost any character, repeated one to five times, as a “star”.
Having created the PCNT and/or POPM frames, one can update the play counts with a simple command; e.g.

    scribbu popm opium.mp3

will increment the play count by one in any PCNT or POPM frames it finds. The operation can be scoped or modified in a number of ways, such as limiting it to only one or the other, or only to POPM frames with a certain owner; see the man page or Invoking scribbu popm for full details. The intent of the command is to enable players that don’t update the play count or set ratings themselves, but which can be scripted or extended in some way, to do so.
Finally, let us re-name the files based on their ID3 tags:

    scribbu rename *.mp3

will by default rename each file to “<artist> - <title>.mp3” with <artist> & <title> each derived from the corresponding ID3v2 frame (See Invoking scribbu rename.) This can be customized by providing a “template”: text interspersed with replacement parameters to be filled in with tag contents. The parameters begin with a ‘%’, and each parameter has a one-character “short” form and a more descriptive “long” form. For instance, “artist” can be represented as either %A or %(artist). So the default template could be expressed as “%A - %T.mp3” or “%(artist) - %(title).mp3”.
If the long form is used, the action of the replacement parameter may optionally be modified by options given after the parameter name and a colon in “query-style”: opt0&opt1&opt2... where opti is in the form name=value or just name. For instance, if we wanted the artist to always be taken from the ID3v1 tag, and that field happens to use the ISO-8859-1 character encoding, we could say:

    %(artist:v1-only&v1-encoding=iso-8859-1)
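The template grammar just described is small enough to sketch in a few lines of Python. This is an illustration of the syntax only, not scribbu's parser. The short-form letters come from the replacement table later in this manual. The function names are hypothetical, and the options are parsed but deliberately not applied.

```python
import re

# Short-form letters mapped to long-form names.
SHORT = {"A": "artist", "T": "title", "L": "album", "Y": "year"}

# Either %(name) / %(name:options) or a single-letter %X.
_PARAM = re.compile(
    r"%(?:\((?P<long>[^):]+)(?::(?P<opts>[^)]*))?\)|(?P<short>[A-Za-z]))")


def parse_options(text):
    """Split "opt0&opt1=val" into {"opt0": True, "opt1": "val"}."""
    options = {}
    for item in filter(None, (text or "").split("&")):
        name, _, value = item.partition("=")
        options[name] = value or True
    return options


def expand(template, tags):
    """Fill each replacement parameter in `template` from the `tags` dict."""
    def replace(match):
        name = match.group("long") or SHORT[match.group("short")]
        return str(tags[name])
    return _PARAM.sub(replace, template)


tags = {"artist": "Stephan Luke", "title": "Opium Chant Intro"}
print(expand("%A - %T.mp3", tags))
print(expand("%(artist) - %(title).mp3", tags))
print(parse_options("v1-only&v1-encoding=iso-8859-1"))
```

Both templates expand to the same file name, which is the equivalence between short and long forms described above.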
Let us accept the default settings, but see what would happen without actually re-naming anything:
    scribbu rename -n *.mp3
    "lorca.mp3" => "Pogues, The - Lorca's Novena.mp3"
    "opium.mp3" => "Stephan Luke - Opium Chant Intro.mp3"
    "orlando.mp3" => "Bill LeFaive - Orlando.mp3"
Before we rename the files, there is a lot more hygiene that could be carried out. “lorca.mp3” has a number of empty text frames that should be removed, “opium.mp3” has a comment frame with no owner, and the ID3v1 genre in all three is set to “255”.
As I developed scribbu, and began using it to manage my personal music collection, it became clear that providing a sub-command for every conceivable operation was not feasible. Furthermore, many of the things I wanted it to do were one-off tasks pertinent to a single file, or a handful of files, that weren’t worth formally coding up as sub-commands. What I really wanted was a way to “script” libscribbu (See Using libscribbu.) I found my solution in Guile (See The Guile Reference Manual.), which is the topic of the next chapter.
scribbu dump

scribbu dump will walk each file and/or directory specified (recursing directories), read each file found, look for ID3 tags therein, and pretty-print what it finds to stdout.

scribbu dump Options

scribbu dump accepts the following options:
arg
spaces
standard, which is the default, pretty-prints the selected portions of each file to stdout with one attribute (artist, title &c) per line. A format of csv will print one line in CSV format per file.
The precise columns output in csv format will depend on the other options, but include:
for ID3v2 tags: version, revision, size, flags, unsynchronised, artist, title, album, genre, encoded by, year, languages, play count, comments
for track data: size & MD5 checksum
for ID3v1 tags: v1.1, extended, artist, title, album, year, comment & genre.
CP1252 by default). See Character Encodings for a complete list of the encodings supported and their identifiers.
scribbu encodings

Several scribbu commands accept character encodings as options. It is not always clear what encodings are supported or how to specify them textually. This sub-command will print the list of supported character encodings along with their names for the user’s convenience.
That list is reproduced here:
scribbu genre

scribbu genre will set the genre (i.e. the TCON frame for ID3v2 tags and the genre byte for ID3v1) for all tags in all files named on the command line. If an argument is a file, operate on the tags in that file. If the argument is a directory, operate recursively on all files containing ID3 tags therein.
The genre can be specified in a few ways:
scribbu genre -w N will interpret N as one of the genres defined by Winamp (See Winamp.), specified as an integer between 0 & 191 (inclusive). Run scribbu genre -W to print a list of the Winamp genres.
scribbu genre -g GENRE will attempt to map the string GENRE to one of the Winamp genres using Damerau-Levenshtein distance, disregarding case. For instance, scribbu genre -g rok will be interpreted as Winamp genre number seventeen, "Rock".
scribbu genre -G GENRE will accept GENRE uncritically as the TCON to be used for ID3v2 tags. ID3v1 tags, if present, will have their genre field mapped to one of the Winamp values, again by case-insensitive Damerau-Levenshtein distance (or just set to 255 if that fails). To explicitly set the ID3v1 genre when specifying the genre in this way, add the --v1 flag (e.g. scribbu genre -G foo --v1 17).
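To illustrate how a misspelling such as "rok" can be resolved, here is a small Python sketch of case-insensitive matching by edit distance. It uses the "optimal string alignment" variant of Damerau-Levenshtein distance and only a handful of genres. Which variant scribbu implements, and how it breaks ties, are not specified here, so treat this as an assumption-laden illustration.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete everything from a
    for j in range(cols):
        d[0][j] = j                      # insert everything from b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


# A small subset of the genre list, keyed by ID3v1 genre number.
GENRES = {0: "Blues", 8: "Jazz", 9: "Metal", 13: "Pop", 15: "Rap", 17: "Rock"}


def closest_genre(text, genres=GENRES):
    """Return the (number, name) pair whose name is nearest to `text`."""
    return min(genres.items(),
               key=lambda item: osa_distance(text.lower(), item[1].lower()))


print(closest_genre("rok"))
```

With this subset, "rok" is one insertion away from "rock" and at least two edits away from every other entry, so genre seventeen is chosen.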
This brings up the question of what to do if there is no ID3v1 and/or no ID3v2 tag in a given file. By default, in the absence of a tag, nothing will be done (so if invoked, for instance, on a file with neither an ID3v1 nor an ID3v2 tag, this sub-command would do nothing). This behavior can be customized in two ways. If --create-v2 or --create-v1 is given, an ID3v2 (ID3v1, resp.) tag will be created for any file which already possesses an ID3v1 (ID3v2, resp.) tag. If --always-create-v2 or --always-create-v1 is given, an ID3v2 (ID3v1, resp.) tag will always be created if it doesn’t exist.
scribbu genre Behavior on Write

If the command didn’t change the tagset for a given file, that file will not be modified. Typically however, the tagset will have been changed, and we need to write out the new tagset. Since ID3v1 tags are fixed-size blocks appended to the file, writing them out is trivial.
The default behavior for ID3v2 tags is to first try to emplace them; that is, write them over the current set of ID3v2 tags without touching track data, adjusting the padding if needed. If that is impossible (i.e. the new tagset can’t be fit into the space occupied by the old, even when adjusting padding) a full copy will be made by writing the new tagset to a temporary file, copying the track data from the extant file to the temporary file, and finally appending the ID3v1 tag, if any, to the temporary file. Only then is the temporary file renamed to the original (which is hopefully atomic).
This behavior can be modified by the --create-backups option (See scribbu genre Options.), which will create a backup of the original file before renaming the temporary file.
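The emplace-or-copy strategy can be sketched as follows. This Python outline is not scribbu's implementation. It treats the leading tag as an opaque byte string, assumes the caller has already padded new_tag when it is meant to fit in place, and uses a hypothetical ".bak" suffix for the backup.

```python
import os
import shutil
import tempfile


def replace_leading_tag(path, old_tag_size, new_tag, backup=False):
    """Make the first `old_tag_size` bytes of `path` become `new_tag`."""
    if len(new_tag) == old_tag_size:
        # Emplace: overwrite the old tag without touching the track data.
        with open(path, "r+b") as f:
            f.write(new_tag)
        return "emplaced"

    # Full copy: new tag, then everything after the old tag (track data
    # plus any trailing ID3v1 tag), written to a temporary file created
    # in the same directory so that the final rename stays on one volume.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            dst.write(new_tag)
            src.seek(old_tag_size)
            shutil.copyfileobj(src, dst)
        if backup:
            shutil.copy2(path, path + ".bak")   # hypothetical backup name
        os.replace(tmp_path, path)              # rename over the original
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return "copied"
```

Keeping the temporary file next to the original matters because a rename across file systems is a copy rather than a single directory operation. A production version would also preserve the original file's permissions.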
scribbu genre Options

scribbu genre Behavior on Write
GENRE as the textual name of one of the 192 Winamp-defined genres ("Rock", e.g.); if GENRE doesn’t exactly match, case-insensitively, one of the Winamp genres, it will be matched to the official list by minimal Damerau-Levenshtein distance (again without regard to case). The genre field for ID3v1 tags will be the corresponding numeric value. See also --list-winamp-genres.
GENRE as the verbatim text to be used for ID3v2 TCON frames (i.e. no matching to the Winamp-defined list will be done). The value for the genre field in ID3v1 tags will however be determined by the closest match to the Winamp-defined list. See also --v1 below for how to turn off that behavior.
scribbu can determine that (the environment variables SCRIBBU_PAGER & then PAGER are checked first, then any program named less on PATH will be used). See also --no-pager.
stdout
scribbu m3u

scribbu m3u will walk each file and/or directory specified (recursing directories) and print an extended M3U (EXTM3U) entry for each file named, directly or indirectly, on the command line. By default, the entries are written to stdout. The order in which directories are traversed is unspecified.
Each extended M3U entry takes the form:

    #EXTINF:<duration-in-seconds>,<display title>
    <path-to-file>
The display title will be "Artist - Title" if those two items can be derived from ID3 tags; otherwise the file basename will be used. The text forming the artist & title tags will be assumed to be in the system locale’s encoding. To override this, specify the -s flag (for source encoding).
The entry’s path will be relative or absolute, according to the argument (i.e. specifying an absolute path to a file or directory will produce absolute paths, a relative path to a file or directory relative paths in the output).
When writing entries to stdout, all text will be written in the system locale’s text encoding.
If the -o option is given, the output will be written to the file named as the option’s value. By convention, files ending in .m3u8 are UTF-8 encoded and files ending in .m3u are written in an unspecified encoding. Given that M3U is a de facto standard, scribbu does not enforce this (or any other naming convention).
In this case, a new file will be created with the #EXTM3U header line, unless the -a (append) flag is given, in which case the output will be appended to the (presumably existing) file.
By default, output will be in the system locale’s text encoding. To force UTF-8 output, specify the -8 option.
So, for instance, if your system locale’s encoding is ISO-8859-1, and your tags are written in, say, Windows Code Page 1251, but you would like an M3U playlist in UTF-8 format, say:
scribbu m3u -s CP1251 -o test.m3u8 -8 some-directory/
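For readers unfamiliar with the playlist format, the Python sketch below writes entries using the conventional #EXTM3U header and #EXTINF directive. The function names are hypothetical. The fallback title uses the file name including its extension, which may differ from what scribbu means by "basename". Durations are supplied by the caller, since computing them from track data is out of scope here.

```python
import os


def extm3u_entry(path, duration_seconds, artist=None, title=None):
    """Format one extended M3U entry (directive line plus path line)."""
    if artist and title:
        display = f"{artist} - {title}"
    else:
        display = os.path.basename(path)   # fallback when tags are missing
    return f"#EXTINF:{duration_seconds},{display}\n{path}\n"


def write_playlist(out_path, entries, append=False, encoding="utf-8"):
    """Write entries to a playlist, creating or appending as requested."""
    mode = "a" if append else "w"
    with open(out_path, mode, encoding=encoding, newline="\n") as f:
        if not append:
            f.write("#EXTM3U\n")           # header only for a new file
        f.writelines(entries)


if __name__ == "__main__":
    entry = extm3u_entry("some-directory/opium.mp3", 74,
                         "Stephan Luke", "Opium Chant Intro")
    write_playlist("test.m3u8", [entry])
```

The append flag mirrors the behavior described for -a: the header line is written only when a new file is created.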
scribbu m3u Options

scribbu encodings for a list of names for all supported encodings.
scribbu report

scribbu report will walk each file and/or directory specified (recursing directories), read each file found, look for ID3 tags therein, and generate a report on their contents. The idea is to use scribbu to do the work of scanning the tags in combination with some other tool better suited to querying & reporting. Consequently, filtering mechanisms are minimal, and the output formats (CSV or TDF) are chosen to facilitate transformation to other formats as well as import by other tools.

scribbu report Options

CP1252
scribbu rename

scribbu rename will walk each file and/or directory specified (recursively, in the case of directories) and rename each ID3-tagged file found according to its tag(s). By default, each ID3-tagged file will be renamed to “<artist> - <title>.<extension>” (where <artist> & <title> are derived from the file’s ID3 tags), but this can be heavily customized by specifying a naming “template” made up of a mixture of text and replacement parameters (such as artist, title, album &c).
Replacement parameters begin with a % character (percent characters that do not begin a replacement parameter may be escaped with a backslash). Each replacement parameter has a one-character “short” form as well as a “long-form” name. For example, the artist replacement can be represented as either %A or as %(artist).
When the long form is used, the action of replacement may optionally be modified by giving options after a colon. The options take the form opt0&opt1&opt2&... where opti is of the form name=value, or just name. So to continue the above example, if we wanted the artist name to instead be derived from the ID3v1 tag, and that field was encoded as ISO-8859-1, we would say:

    %(artist:v1-only&v1-encoding=iso-8859-1)

See Tag-Based Replacements for a complete list of replacement parameters & their options.
scribbu rename Options

Tag-based replacement parameters:

Content | Short-Form | Long-Form |
---|---|---|
album | L | album |
artist | A | artist |
content type | G | content-type,genre |
encoded by | e | encoded-by |
title | T | title |
year | Y | year |
Tag-based replacement parameters take the following options:
v1-encoding=
...
the=
...
cap=
...
compress can be given (to merge space between words to a single space) or ws=TEXT can be given to replace whitespace (e.g. if ws=_ were given, “a b” would become “a_b”).
Lastly, the year can be formatted as two digits or four by giving “yy” or “yyyy” in the options for %(year).
E.g. %(artist:prefer-v2&v1-encoding=cp1252&the=suffix&compress) applied to a file whose ID3v2 tag had an artist frame of "The Pogues" would produce "Pogues, The".
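The text transformations named above are simple string operations. The Python sketch below shows one plausible reading of the=suffix, compress and ws=TEXT. The function names are hypothetical, and details such as how scribbu treats a leading "the" in other capitalizations are assumptions.

```python
import re


def the_suffix(text):
    """Move a leading "The" to the end: "The Pogues" -> "Pogues, The"."""
    match = re.fullmatch(r"(?i)(the)\s+(.+)", text)
    return f"{match.group(2)}, {match.group(1)}" if match else text


def compress_ws(text):
    """Merge each run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text)


def replace_ws(text, replacement):
    """Replace every whitespace character with `replacement`."""
    return re.sub(r"\s", replacement, text)


print(the_suffix("The Pogues"))   # Pogues, The
print(compress_ws("a   b"))       # a b
print(replace_ws("a b", "_"))     # a_b
```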
There are a few more replacement parameters based on the file itself:
b, basename: the file basename
E, extension: the file extension (including the dot)
Both of these take the same “The,”, capitalization & whitespace options as the tag-based replacements (See Tag-Based Replacements.)
5, md5: the MD5 checksum of the file’s audio data
S, size: the file size, in bytes
Both of these take the following options:
base=(decimal|hex): specify the radix for the numbers
hex-case=(U|L): case to use for hexadecimal numbers
scribbu popm

scribbu popm creates or updates play count and/or popularimeter frames. With no options, it increments the counter fields in every play count and/or popularimeter frame in every tag by one. With the --create-frame flag, it creates the relevant frames in each tag. Popularimeter frames will not be created in the absence of the --owner option. Play count & popularimeter frame creation can be inhibited via the --popularimeter-only and --playcount-only flags, respectively.
The popularimeter rating field can be set using the --rating option. Ratings can be specified explicitly as an integer between 0 & 255, or as one-to-five stars. “Stars” would most naturally be expressed as *s (asterisks), but since this will often be inconvenient in the shell, scribbu will accept almost any character, repeated one-to-five times.
scribbu popm Options

COUNT instead of incrementing
INCR, instead of by one.
OWNER will be updated. When creating popularimeter frames, the owner field will be set to OWNER.
RATING may be given either as an integer between 0 & 255 (inclusive) or as one-to-five “stars”, given as [a-zA-Z@#%*+]{1,5}; e.g. three stars could be expressed as “xxx” or “###” or “***”.
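The two accepted spellings of a rating can be told apart mechanically. The Python sketch below classifies a RATING argument using the character class quoted above. It additionally requires the star character to be repeated, which is one reading of "repeated one-to-five times". It does not convert stars to a 0..255 value, because that mapping is not given here (the tutorial shows only that four stars produced a rating of 179).

```python
import re

_STARS = re.compile(r"[a-zA-Z@#%*+]{1,5}")


def parse_rating(text):
    """Return ("value", n) for an integer 0..255 or ("stars", n) for 1..5."""
    if re.fullmatch(r"[0-9]+", text):
        value = int(text)
        if value > 255:
            raise ValueError(f"rating out of range: {text}")
        return ("value", value)
    # Assumption: a star rating is one character repeated, e.g. "xxx".
    if _STARS.fullmatch(text) and len(set(text)) == 1:
        return ("stars", len(text))
    raise ValueError(f"unrecognized rating: {text!r}")


print(parse_rating("oooo"))   # ('stars', 4)
print(parse_rating("179"))    # ('value', 179)
```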
scribbu text

scribbu text will create, update & delete various ID3v2 text frames & ID3v1 tag fields.
scribbu text Options

scribbu

The set of sub-commands scribbu offers, or could offer, is small in comparison to the number of operations one could possibly hope to carry out in managing ID3 tags. Sooner or later (likely sooner) you will want to do something you can’t accomplish via a sub-command. For that reason, the bulk of the work on scribbu has been exposing the library’s functionality to a first-class language like LISP (See The Guile Reference Manual.), to enable scribbu users to build their own solutions.
This chapter begins by demonstrating how to use the interactive Scheme REPL to explore solutions, then demonstrates building Scheme programs using scribbu, and finishes with some references.
At the end of “scribbu rename” (See scribbu rename.) there were a number of tag hygiene issues to be cleaned up. Let us begin experimenting with solutions. Invoking scribbu with no arguments at all will start the Scheme REPL:
    $>: scribbu
    scribbu 0.5
    Copyright (C) 2017-2019 Michael Herstine <sp1ff@pobox.com>
    You are in the Guile REPL; in your shell, type `info scribbu' for documentation.

    GNU Guile 2.2.2
    Copyright (C) 1995-2017 Free Software Foundation, Inc.

    Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
    This program is free software, and you are welcome to redistribute it
    under certain conditions; type `,show c' for details.

    Enter `,help' for help.
    scheme@(guile-user)>
You are now at the Scheme prompt (“scheme” refers to the language currently in use and “guile-user” refers to the current module). You can type Scheme statements & have your statements evaluated:
scheme@(guile-user)> (format #t "Hello, world!")
Hello, world!$1 = #t
scheme@(guile-user)> (define x 1)
scheme@(guile-user)> (set! x (+ x 1))
scheme@(guile-user)> x
$2 = 2
scheme@(guile-user)> (if (> x 1) (format #t "Yes!"))
Yes!$3 = #t
scheme@(guile-user)>
scribbu
exports assorted types & functions for working with ID3
tags to the Guile interpreter. Let’s take a look at that owner-less
comment frame. We begin by reading in the ID3v2 tagset:
scheme@(guile-user)> (use-modules (oop goops) (scribbu))
scheme@(guile-user)> (define tags (read-tagset "opium.mp3"))
scheme@(guile-user)> tags
$4 = ((#<<id3v2-tag> 1ccf780> 3 #f))
read-tagset
returns a list of three-tuples, one for each ID3v2
tag present in its argument. Since “opium.mp3” has only one ID3v2
tag, the list has only one element. The triplet consists of an
<id3v2-tag>
instance, the ID3v2 version (“3” in this case)
and a boolean indicating whether the unsynchronisation flag is set (it
is not). Let’s examine the tag:
scheme@(guile-user)> (define tag (caar tags))
scheme@(guile-user)> tag
$5 = #<<id3v2-tag> 1ccf780>
scheme@(guile-user)> (let ((frames (slot-ref tag 'frames))
                           (i 0))
                       (while (> (length frames) 0)
                         (format #t "~d: ~a\n" i (slot-ref (car frames) 'id))
                         (set! i (+ i 1))
                         (set! frames (cdr frames))))
0: encoded-by-frame
1: track-frame
2: comment-frame
3: publisher-frame
4: part-of-a-set-frame
5: year-frame
6: genre-frame
7: album-frame
8: band-frame
9: artist-frame
10: unknown-frame
11: title-frame
12: play-count-frame
13: pop-frame
$6 = #f
We see that tag
is an instance of the GOOPS class
<id3v2-tag>
, and that it has 14 frames. Frame two (counting
from zero) is that comment frame:
scheme@(guile-user)> (slot-ref (list-ref (slot-ref tag 'frames) 2) 'dsc)
$7 = ""
As expected, the description field is an empty string– let’s fix that:
scheme@(guile-user)> (slot-set! (list-ref (slot-ref tag 'frames) 2) 'dsc "sp1ff@pobox.com")
$8 = "sp1ff@pobox.com"
# check
scheme@(guile-user)> (slot-ref (list-ref (slot-ref tag 'frames) 2) 'dsc)
$9 = "sp1ff@pobox.com"
Now what about that ID3v1 genre?
scheme@(guile-user)> (define v1 (read-id3v1-tag "opium.mp3"))
scheme@(guile-user)> v1
$10 = #<<id3v1-tag> 18fb8c0>
scheme@(guile-user)> (slot-ref v1 'genre)
$11 = 255
Let’s set that to “Lounge”, to which the Winamp genre list assigns the value 171:
scheme@(guile-user)> (slot-set! v1 'genre 171)
$12 = 171
What remains is writing out our modifications to their respective
tags. We could do this directly in the REPL, but let’s capture
our work in the form of a program (See Writing Scheme Programs with scribbu
.)
scribbu
scribbu
understands both its own command-line parameters as
well as those understood by the guile
command. When it sees
parameters applicable to guile
, it will collect them and
pass them on to the Scheme interpreter (when this makes sense, of
course; supplying guile
options while invoking a
scribbu
sub-command, for instance, would make no sense &
results in an error). This means that scribbu
can take
advantage of the guile
scripting options (See Guile
Scripting in The Guile Reference Manual.)
Continuing our example, let us capture our work so far:
#!/usr/local/bin/scribbu -e main -s
!#
(use-modules (oop goops) (scribbu))
(define (main)
  (let* ((tags (read-tagset "opium.mp3"))
         (v1 (read-id3v1-tag "opium.mp3"))
         (tag (caar tags)))
    (slot-set! (list-ref (slot-ref tag 'frames) 2) 'dsc "sp1ff@pobox.com")
    (slot-set! v1 'genre 171)))
This Scheme program has no lasting effect, of course: it corrects the orphaned comment frame as well as the ID3v1 genre, but only in memory. Let us write these changes out to disk. Writing out the ID3v1 tag is simpler since it’s a fixed size, so we’ll start with that:
#!/usr/local/bin/scribbu -e main -s
!#
(use-modules (oop goops) (scribbu))
(define (main)
  (let* ((tags (read-tagset "opium.mp3"))
         (v1 (read-id3v1-tag "opium.mp3"))
         (tag (caar tags)))
    (slot-set! (list-ref (slot-ref tag 'frames) 2) 'dsc "sp1ff@pobox.com")
    (slot-set! v1 'genre 171)
    (write-id3v1-tag v1 "opium.mp3")))
Writing an ID3v1 tag is also easier because it is appended to the
file. NB. v1
may be written as an ID3v1, ID3v1.1 and/or an
ID3v1 enhanced tag, depending on the precise contents of v1
(See ID3v1 tags.)
Writing ID3v2 tagsets is more complicated, since their size can
vary. write-tagset
can either make a wholesale copy of the
file, or attempt to emplace the new tagset at the beginning of the
extant file (which is the default):
#!/usr/local/bin/scribbu -e main -s
!#
(use-modules (oop goops) (scribbu))
(define (main)
  (let* ((tags (read-tagset "opium.mp3"))
         (v1 (read-id3v1-tag "opium.mp3"))
         (tag (caar tags)))
    (slot-set! (list-ref (slot-ref tag 'frames) 2) 'dsc "sp1ff@pobox.com")
    (slot-set! v1 'genre 171)
    (write-id3v1-tag v1 "opium.mp3")
    (write-tagset (list (list tag 3)) "opium.mp3")))
References:
The original ID3v1 tag contained title, artist, album, year, comment & genre. The fields are fixed-size (30, 30, 30, 4, 30 & 1 byte, respectively). The original proposal called for filling out the fields with nil (zero) values, but that is not universally implemented: Winamp (See Winamp.), for instance, pads fields out with ASCII spaces (i.e. 32 = 0x20).
Michael Mutschler observed that if the fields were zero-padded, an implementation will likely stop on reading the first nil. Therefore, if the second-to-last byte of a field is nil, a one-byte value could be stored in the last byte of that field. He proposed storing the track number in the last byte of the comment field. This became known as ID3v1.1.
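The layout just described can be sketched in a few lines of C++. This is an illustration only, not libscribbu code: `id3v1_fields`, `field` and `parse_id3v1` are hypothetical names. It assumes the usual 128-byte block introduced by the three ASCII bytes “TAG” (3 + 30 + 30 + 30 + 4 + 30 + 1 = 128), and it treats a nil second-to-last comment byte followed by a non-nil last byte as an ID3v1.1 track number.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical plain-data view of an ID3v1 / ID3v1.1 tag.
struct id3v1_fields {
  std::string title, artist, album, year, comment;
  unsigned char genre = 255;
  std::optional<unsigned char> track;  // present only for ID3v1.1
};

// Read a fixed-size field: stop at the first nil, then strip the
// trailing spaces some writers use for padding.
static std::string field(const unsigned char *p, std::size_t n) {
  std::size_t len = 0;
  while (len < n && p[len] != 0) ++len;
  while (len > 0 && p[len - 1] == ' ') --len;
  return std::string(reinterpret_cast<const char *>(p), len);
}

std::optional<id3v1_fields> parse_id3v1(const std::array<unsigned char, 128> &b) {
  if (b[0] != 'T' || b[1] != 'A' || b[2] != 'G') return std::nullopt;
  id3v1_fields f;
  f.title  = field(&b[3], 30);
  f.artist = field(&b[33], 30);
  f.album  = field(&b[63], 30);
  f.year   = field(&b[93], 4);
  // The comment occupies bytes 97..126; byte 125 is its second-to-last byte.
  if (b[125] == 0 && b[126] != 0) {
    f.comment = field(&b[97], 28);
    f.track = b[126];                 // ID3v1.1 track number
  } else {
    f.comment = field(&b[97], 30);
  }
  f.genre = b[127];
  return f;
}
```

The caller is expected to have read the last 128 bytes of the file into the array beforehand.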
A thirty-byte limit quickly became constraining, leading to the ID3v1 “enhanced” specification. The origins of the proposal are unclear to me, but the proposal itself involves prepending a second two-hundred twenty-seven byte block to the ID3v1 block. This extends the title, artist & album fields by sixty bytes each, adds a thirty-byte free-form genre field, and introduces start-time, end-time, and “speed” fields.
scribbu
represents the ID3v1 tag by the GOOPS (See GOOPS.) class <id3v1-tag>
:
(define-class <id3v1-tag> ()
  (title      #:init-value ""  #:accessor title      #:init-keyword #:title)
  (artist     #:init-value ""  #:accessor artist     #:init-keyword #:artist)
  (album      #:init-value ""  #:accessor album      #:init-keyword #:album)
  (year       #:init-value '() #:accessor year       #:init-keyword #:year)
  (comment    #:init-value ""  #:accessor comment    #:init-keyword #:comment)
  (genre      #:init-value 255 #:accessor genre      #:init-keyword #:genre)
  (track-no   #:init-value '() #:accessor track-no   #:init-keyword #:track-no)
  (enh-genre  #:init-value '() #:accessor enh-genre  #:init-keyword #:enh-genre)
  (speed      #:init-value '() #:accessor speed      #:init-keyword #:speed)
  (start-time #:init-value '() #:accessor start-time #:init-keyword #:start-time)
  (end-time   #:init-value '() #:accessor end-time   #:init-keyword #:end-time))
The class’ fields include the union of all ID3v1, ID3v1.1 and ID3v1
enhanced fields. All fields above & beyond those present in ID3v1, however, have a default value of '() (or nil, in Scheme). Whether a given <id3v1-tag> instance is ID3v1, ID3v1.1, and/or ID3v1 enhanced is implicitly determined by whether any of these fields are non-nil.
One can create an <id3v1-tag>
instance directly, like
any GOOPS class:
(use-modules (oop goops) (scribbu))
(define tag (make <id3v1-tag>
              #:title "The Body of an American"
              #:artist "Pogues, The"
              #:album "Poguetry in Motion"
              #:year "1986"
              #:genre 88))
One can also create an instance from an existing tag on disk:
(use-modules (oop goops) (scribbu))
(define tag (read-id3v1-tag "foo.mp3"))
(format #t "~s - ~s\n" (slot-ref tag 'artist) (slot-ref tag 'title))
<id3v1-tag> instances can be written to disk via write-id3v1-tag: (write-id3v1-tag tag "bar.mp3"). The format in which an <id3v1-tag> will be written depends upon the optional fields. If #:track-no is non-nil (i.e. not equal to '()) it will be written as an ID3v1.1 tag. If any of the title, artist or album slots are longer than thirty characters, or any of the new fields (enhanced genre, speed, start-time or end-time) are non-nil, it will be written as an ID3v1 enhanced tag.
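The selection rule above can be restated as a small C++ sketch. None of these names come from libscribbu: `v1_slots`, `v1_flavor` and `flavor_of` are hypothetical, `std::nullopt` stands in for Scheme’s '(), and the field types are guesses made only for the sake of illustration. The two flags are independent, matching the “and/or” wording used earlier.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the <id3v1-tag> slots that influence the format.
struct v1_slots {
  std::string title, artist, album;
  std::optional<int> track_no;                       // '() -> std::nullopt
  std::optional<std::string> enh_genre;
  std::optional<int> speed;
  std::optional<std::string> start_time, end_time;
};

// Which on-disk flavors the tag calls for.
struct v1_flavor {
  bool v1_1;      // a track number is present
  bool enhanced;  // long text fields or any enhanced-only field
};

v1_flavor flavor_of(const v1_slots &t) {
  const bool long_text = t.title.size() > 30 ||
                         t.artist.size() > 30 ||
                         t.album.size() > 30;
  const bool new_fields = t.enh_genre.has_value() || t.speed.has_value() ||
                          t.start_time.has_value() || t.end_time.has_value();
  return v1_flavor{t.track_no.has_value(), long_text || new_fields};
}
```

A tag with only short title, artist and album fields yields neither flag, i.e. a plain ID3v1 tag.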
The various flavors of ID3v1 tags (See ID3v1 tags.) had obvious limitations, leading to the introduction in 1998 of ID3v2 (by Martin Nilsson, Michael Mutschler et al.). Despite the name, this format has nothing to do with ID3v1. ID3v2 tags are much more complex. The tags are prepended to the files they describe. They are comprised of one or more frames, each of which contains one piece of information. There is provision for padding appended to the tag, to permit subsequent augmentation of the tag without having to re-write the entire file. ID3v1 tags suffered from the fact that they encoded text as ASCII (ISO-8859-1, at most); ID3v2 carries the encoding scheme along with the textual information.
Furthermore, there are three versions of the ID3v2 spec that saw general use:
MPEG decoding software uses a two-byte sentinel value in the input stream to detect the beginning of the audio. MPEG decoding software that is not ID3-aware could mistakenly interpret that value as the beginning of the audio should it happen to occur in an ID3v2 tag. Unsynchronisation is an optional encoding scheme for the ID3v2 tag to prevent that. "Unsynchronisation may only be made with MPEG 2 layer I, II and III and MPEG 2.5 files." http://id3.org/id3v2-00
More specifically, whenever a two byte combination of the form:
11111111 111xxxxx
(i.e. 0xFF 0xEx
or 0xFF 0xFx
) is encountered in an ID3v2
tag to be written to disk, it is replaced with:
11111111 00000000 111xxxxx
and the unsynchronisation
flag will be set.
This leaves us with an ambiguous situation on read: if we encounter a bit pattern
11111111 00000000 111xxxxx
when reading a tag with the unsynchronisation flag set, we have no way to
know whether that was a false sync that was unsynchronised (and so the three
bytes should be interpreted as 11111111 111xxxxx
) or whether those three
bytes had occurred naturally in the tag when it was written. To resolve
this, on applying unsynchronisation all two-byte sequences of the form
$FF 00
should also be written as $FF 00 00
.
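The byte-stuffing rules above are easy to get subtly wrong, so here is a minimal C++ sketch of both directions. It is not the libscribbu implementation; `bytes`, `unsynchronise` and `resynchronise` are hypothetical names. The sketch also leaves a trailing 0xFF untouched, which a real writer may want to handle separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Apply the scheme: insert 0x00 after any 0xFF that is followed either by
// a byte of the form 111xxxxx (a false sync) or by 0x00 (to keep the
// encoding unambiguous on read).
bytes unsynchronise(const bytes &in) {
  bytes out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out.push_back(in[i]);
    if (in[i] == 0xFF && i + 1 < in.size()) {
      const std::uint8_t next = in[i + 1];
      if ((next & 0xE0) == 0xE0 || next == 0x00) out.push_back(0x00);
    }
  }
  return out;
}

// Reverse the scheme: drop the single 0x00 that follows each 0xFF.
bytes resynchronise(const bytes &in) {
  bytes out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out.push_back(in[i]);
    if (in[i] == 0xFF && i + 1 < in.size() && in[i + 1] == 0x00) ++i;
  }
  return out;
}
```

With these definitions the false sync 0xFF 0xE3 is written as 0xFF 0x00 0xE3, a naturally occurring 0xFF 0x00 is written as 0xFF 0x00 0x00, and `resynchronise(unsynchronise(x))` returns `x` for any input.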
ID3v2.4 introduced unsynchronisation at a frame level; the unsynchronisation flag in the header being set indicates that all frames are unsynchronised; unset in the header means that at least one frame is *not* unsynchronised.
Note that since the point of unsynchronisation is to avoid presenting a false sync point to the MPEG decoding software, unsynchronisation should be employed last, after any compression or encryption.
All ID3v2 frames subclass GOOPS class <id3v2-frame>
:
(define-class <id3v2-frame> ()
  (id     #:init-value 'unknown-frame #:accessor id     #:init-keyword #:id)
  (tap    #:init-value '()            #:accessor tap    #:init-keyword #:tap)
  (fap    #:init-value '()            #:accessor fap    #:init-keyword #:fap)
  (ro     #:init-value '()            #:accessor ro     #:init-keyword #:ro)
  (unsync #:init-value '()            #:accessor unsync #:init-keyword #:unsync))
id
is a symbol naming the frame (See Frame Identifiers.) The remaining
four fields are frame flags that can be either true (#t
), false
(#f
) or just left undefined ('()
):
tap
Tag Alter Preserve “This flag tells the software what to
do with this frame if it is unknown and the tag is altered in any
way. This applies to all kinds of alterations, including adding more
padding and reordering the frames.” Sec 3.3.1
fap
File Alter Preserve “This flag tells the software what to
do with this frame if it is unknown and the file, excluding the tag,
is altered. This does not apply when the audio is completely replaced
with other audio data.” Sec 3.3.1
ro
Read Only “This flag, if set, tells the software that the
contents of this frame is intended to be read only. Changing the
contents might break something, e.g. a signature. If the contents are
changed, without knowledge in why the frame was flagged read only and
without taking the proper means to compensate, e.g. recalculating the
signature, the bit should be cleared.” Sec 3.3.1
unsync
Unsynchronisation
In ID3v2.2 & ID3v2.3, a value of #t
for this flag indicates that
the unsynchronisation scheme (See The Unsynchronisation Scheme.) has been
applied to this tag. In ID3v2.4, it indicates that it has been applied to
all frames.
Module scribbu
defines a few <id3v2-frame>
sub-classes.
<text-frame>
A great many ID3v2 frames represent textual information (title, artist &c) and are represented in a uniform way, distinguished only by frame identifier. scribbu represents such frames as instances of <text-frame>:
(define-class <text-frame> (<id3v2-frame>)
  (text #:init-value "" #:accessor text #:init-keyword #:text))
<comment-frame>
<comment-frame> encodes the COM & COMM (comment) frames. #:lang is a three-letter ISO-639-2 language code. The #:dsc field is described in the specification as a “short content description”.
(define-class <comment-frame> (<id3v2-frame>)
  (lang #:init-value "eng" #:accessor lang #:init-keyword #:lang)
  (dsc  #:init-value ""    #:accessor dsc  #:init-keyword #:dsc)
  (text #:init-value ""    #:accessor text #:init-keyword #:text))
<user-defined-text-frame>
<user-defined-text-frame> encodes the TXX & TXXX (user-defined text) frames. The #:dsc field is a description of the textual information & #:text is the information itself. There may be multiple user-defined text frames in a tag, but only one with a given description. Cf. section 4.2.2 of the ID3v2 spec.
(define-class <user-defined-text-frame> (<id3v2-frame>)
  (dsc  #:init-value "" #:accessor dsc  #:init-keyword #:dsc)
  (text #:init-value "" #:accessor text #:init-keyword #:text))
<play-count-frame>
<play-count-frame> encodes the CNT & PCNT (play count) frames. The #:count field is simply a counter recording the number of times the file has been played (See scribbu popm.) There may be only one <play-count-frame> frame in a tag. Cf. section 4.17 of the ID3v2 spec.
(define-class <play-count-frame> (<id3v2-frame>)
  (count #:init-value 0 #:accessor count #:init-keyword #:count))
<popm-frame>
<pop-frame>
encodes the POP
& POPM
(popularimeter)
frames. <pop-frame>
combines an eight-bit rating field with a
<play-count-frame>
-style play count. Unlike <play-count-frame>
,
there may be multiple <pop-frame>
frames because each is
tagged with the e-mail address of the author.
(define-class <pop-frame> (<id3v2-frame>)
  (e-mail #:init-value "" #:accessor e-mail #:init-keyword #:e-mail)
  (rating #:init-value 0  #:accessor rating #:init-keyword #:rating)
  (count  #:init-value 0  #:accessor count  #:init-keyword #:count))
<unk-frame>
Frames about which scribbu does not know may be encoded as <unk-frame> instances:
(define-class <unk-frame> (<id3v2-frame>)
  (id-text #:init-value ""      #:accessor frameid #:init-keyword #:frameid)
  (data    #:init-value #vu8()  #:accessor data    #:init-keyword #:data))
The data
field will contain everything beyond the ID3v2 header;
i.e. the frame identifier & flags will have been parsed out.
<id3v2-tag>
The scribbu
ID3v2 tag abstraction doesn’t try to model the various
versions of the ID3v2 spec. Rather, it encodes a “generic” ID3v2 tag; the
version to which it shall be serialized is specified at write time, and
the version from which it was deserialized is returned at read time
(See ID3v2 Serialization.)
(define-class <id3v2-tag> ()
  (experimental #:init-value '() #:accessor experimental #:init-keyword #:experimental)
  (frames       #:init-value '() #:accessor frames       #:init-keyword #:frames)
  (padding      #:init-value 0   #:accessor padding      #:init-keyword #:padding))
While you can of course create an <id3v2-tag>
instance “from
scratch” (in-memory, as a result of a call to (make <id3v2-tag> ...)),
you will more frequently be reading them from files on
disk.
The function for doing this is read-tagset
. The name is intended
as a reminder that a file can have multiple ID3v2 tags, so you are in
general reading a tag set, not just a tag.
scheme@(guile-user)> (define tags (read-tagset "opium.mp3"))
scheme@(guile-user)> tags
$1 = ((#<<id3v2-tag> 56188c2a3d80> 3 #f))
read-tagset
returns a list of three-tuples, one tuple for
each tag (so it could return '()
, if the file contained no
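ID3v2 tags). Each three-tuple contains:
an <id3v2-tag> instance, representing the tag
the ID3v2 version from which the tag was deserialized (e.g. 3)
a boolean indicating whether the unsynchronisation flag was set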
Once you’ve created or updated your ID3v2 tag(s), you will presumably
want to write it (them) to disk, presumably in place of an existing
tagset. This is done via write-tagset(tags, file,
...)
. tags
is a list of two-tuples: the first element is
always an <id3v2-tag>
instance to be written to disk & the
second is the ID3v2 version under which it shall be serialized
(i.e. an int, either 2, 3 or 4). file
is the file into which
the new tagset shall be written, replacing any tagset present therein.
write-tagset
takes a few optional parameters:
#:apply-unsync
governs whether the unsynchronisation scheme
(See The Unsynchronisation Scheme.) should be applied when writing out
the given tags: #f
(the default) means never, #t
means
it will always be applied and 'as-needed
means that it will be
applied to any tag whose serialization would contain false syncs.
#:copy
governs whether a backup copy of the target file will be
made: a value of #f
(the default) means that the new tagset
will be written in place (moving the audio data & ID3v1 tag, if any,
if needed) and a value of #t
means that the target file will be
copied to a backup, the new tagset will be written, and then the track
data & ID3v1 tag (if any) will be copied over to the new file.
with-track-in
with-track-in(directory, fn)
is a convenience function; it will
iterate over all filesystem entities in directory
and apply
fn
to them. fn
shall be a function taking three parameters:
read-tagset
scheme@(guile-user)> (with-track-in "."
                       (lambda (tags pth v1)
                         (format #t "~s has ~d ID3v2 tags\n" pth (length tags))))
"./track.dat" has 0 ID3v2 tags
"./id3v22-tda.mp3" has 1 ID3v2 tags
...
The various string fields bring up the question: what text encoding is used? There are actually three text encodings in play:
libscribbu
The first is documented in the Guile manual under “Character Encoding of Source Files” (See Character Encoding of Source Files in The Guile Reference Manual.) The upshot is this: UTF-8 is assumed, but the author may tell Guile what is being used through a coding hint:
;;; coding: iso-8859-1
The set of encodings recognized is defined by IANA in RFC2978.
The second is also documented in the Guile manual, under “String Internals” (See String Internals in The Guile Reference Manual.):
Guile stores each string in memory as a contiguous array of Unicode code points along with an associated set of attributes. If all of the code points of a string have an integer range between 0 and 255 inclusive, the code point array is stored as one byte per code point: it is stored as an ISO-8859-1 (aka Latin-1) string. If any of the code points of the string has an integer value greater that 255, the code point array is stored as four bytes per code point: it is stored as a UTF-32 string.
Conversion between the one-byte-per-code-point and four-bytes-per-code-point representations happens automatically as necessary.
That just leaves libscribbu
. On read (that is, when the library
reads text from tags on disk), the encoding is sometimes specified by
the tag itself, or is specified by the caller, or is guessed. From
there, it will be converted to a Guile string. On write, text will
be converted from the internal Guile representation to the desired
text encoding on disk (deduced from either caller preferences or the
frame settings themselves).
libscribbu
The third way in which to use scribbu is to link against the library
libscribbu
. Detailed documentation can be found in the
libscribbu
source itself (Doxygen documentation can be
produced by doing cd doc && make doxygen-doc
).
While detailed documentation on individual classes, free functions, and sub-systems may one day make its way into this manual, for now this chapter will describe using the library through a worked example. This example can be found in the examples/az-tags sub-directory of the scribbu source distribution.
libscribbu
program
Let us write a small C++ program using libscribbu to clean up Amazon.com Song IDs. When .mp3s are downloaded from Amazon.com, their ID3v2 tags contain non-compliant comment frames (in that they have no description) holding the Song ID. Amazon also tries to cram the Song ID into the comment field of the ID3v1 tag, even though that field is generally too small to contain the entire string. We will call this program az-tags.
az-tags
The complete source for the program can be found in
examples/az-tags/main.cc in the source distribution. The logic
is simple enough to fit completely in main
. The usage is:
az-tags [-h] [-v] file [file...]
Skipping command line parsing, we begin by initializing the library:
// ...
#include <scribbu/scribbu.hh>
// ...
int main(int argc, char * argv[])
{
  // Parse command-line options...
  scribbu::static_initialize();
libscribbu
needs to carry out assorted initialization; rather
than deal with the static initialization problem, it just depends on the
caller to explicitly initialize the library.
At this point, the first filename is waiting in argv[optind]
, so
we can set the basic structure of the program:
for (int i = optind; i < argc; ++i) { // ... }
For each file, we will open it & parse it into its ID3v2 tags, track data and ID3v1 tag:
for (int i = optind; i < argc; ++i) {
  fs::ifstream ifs(argv[i], ios_base::binary); // 1
  vector<unique_ptr<scribbu::id3v2_tag>> id3v2;
  scribbu::read_all_id3v2(ifs, back_inserter(id3v2)); // 2
  scribbu::track_data td((istream&)ifs); // 3
  unique_ptr<scribbu::id3v1_tag> pid3v1 = scribbu::process_id3v1(ifs); // 4
At 1
, we open the file, taking care to use binary mode so as to avoid
newline translation. At 2
we ask libscribbu
to read any and
all ID3v2 tags into id3v2
. We’ve used a vector
here, but we
can use any container providing a forward output iterator.
At this point, the file pointer is pointing just past the last ID3v2
tag (if any; there may be none, in which case the file pointer remains
at the beginning of the file and id3v2
is empty). The easiest
way to consume the track data is to construct a track_data
instance with it. This will collect some data about the track and
advance the file pointer to the one-past-the-end point.
There may or may not be some kind of ID3v1 tag waiting for us. That
is why process_id3v1
returns a unique_ptr
– if there
is no ID3v1 tag, a null pointer will be returned.
We now have zero or more ID3v2 tags to be processed in id3v2
:
for (auto &ptag: id3v2) {
  // `ptag' is a reference to a unique_ptr<id3v2_tag>
  // how to get at its frames?
}
It is at this point that the libscribbu
API turns out to be
less than ergonomic. The issue is that read_all_id3v2
returns
the tags typed as pointers to id3v2_tag
; this is a base class
providing a “generic” interface supported by all ID3v2 tags, but the
API for iterating over frames is provided individually by each
sub-class (id3v2_2_tag
, id3v2_3_tag
&
id3v2_4_tag
).
Perhaps it would be worth it to provide an interface on the base class
to do this, but for now, I simply dynamic_cast
& dispatch to a
template function process_tag
:
for (auto &ptag: id3v2) {
  switch (ptag->version()) {
  case 2: {
    // ID3v2.2 tag
    scribbu::id3v2_2_tag &p = dynamic_cast<scribbu::id3v2_2_tag&>(*ptag);
    process_tag(p);
    break;
  }
  case 3: {
    // ID3v2.3 tag
    scribbu::id3v2_3_tag &p = dynamic_cast<scribbu::id3v2_3_tag&>(*ptag);
    process_tag(p);
    break;
  }
  case 4: {
    // ID3v2.4 tag
    scribbu::id3v2_4_tag &p = dynamic_cast<scribbu::id3v2_4_tag&>(*ptag);
    process_tag(p);
    break;
  }
  default:
    cerr << "Unknown ID3v2 revision " << ptag->version() << endl;
    abort();
  }
}
The template parameter is the id3v2_tag
sub-class. Since there
are only three, I can factor out the ID3v2-version-specific logic
into a traits class:
template <class tag_type>
void process_tag(tag_type &T)
{
  ...
  for (auto fp: T) { // 1
    if (traits_type::COMMID == fp->id()) { // 2
      id3v2_frame &F = fp;
      comm_type &C = dynamic_cast<comm_type&>(F); // 3
      string dsc = C.template description<string>();
      if (dsc.empty()) {
        string txt = C.template text<string>();
        if ("Amazon.com Song ID" == txt.substr(0, 18)) {
          cout << "updating the comment frame containing " << txt << endl;
          fp = traits_type::replace(C);
        }
      }
    }
  }
}
Each concrete id3v2_tag subclass implements begin & end, so we can use instances thereof as targets in range-based for loops like 1. fp
is actually a mutable proxy for an
ID3v2-version-specific id3v2_frame
subclass. At 2 we have
factored out the precise frame ID to select for comments frames.
Each ID3v2 version has a concrete comment frame type, to which we again dynamically cast (I really need to re-evaluate this interface) at 3.
The rest of the logic is straightforward– if there is no description field in the comments frame, and the comment text begins with “Amazon.com Song ID”, replace the frame.
az-tags
The next step is to compile the program. We shall use Autotools, beginning
with the simplest configure.ac
we can:
AC_PREREQ([2.69])
AC_INIT([az-tags], [0.1], [sp1ff@pobox.com])
AC_CONFIG_MACRO_DIR([macros])
AC_CONFIG_SRCDIR([src/main.cc])
AC_CONFIG_AUX_DIR([build-aux])
AC_CONFIG_HEADERS([config.h])
AM_INIT_AUTOMAKE([-Wall -Werror])
LT_INIT
AC_PROG_CXX
AC_CONFIG_FILES([Makefile src/Makefile])
AC_PREREQ
just asserts that Autoconf 2.69 is required to build
a configure
script from this template. AC_INIT
is the
Autoconf initialization macro. We’re going to need some custom macros
for this project, so AC_CONFIG_MACRO_DIR
tells Autoconf where
to find them. AC_CONFIG_SRCDIR
is just a sanity check– when
running configure
users will sometimes pass an incorrect value
for --srcdir
– this macro equips the generated configure
script to catch that. AC_CONFIG_AUX_DIR
tells Autoconf to place
auxiliary scripts (missing
& install-sh
, e.g.) in a
sub-directory named build-aux.
AC_CONFIG_HEADERS
tells Autoconf to generate a header file
named config.h containing C preprocessor #define
s
for the project. Note that we need to generate a template file
config.h.in via autoheader
.
Finally, we initialize Automake, libtool, check for a C++ compiler & produce Makefile templates.
The Automake template for the root makefile is trivial:
SUBDIRS = src
Let us begin the Makefile template in src:
bin_PROGRAMS = az-tags
az_tags_SOURCES = main.cc
AM_CXXFLAGS = -std=c++17
We will need to perform some one-time setup:
mkdir build-aux
touch NEWS README AUTHORS ChangeLog
autoheader
aclocal
autoconf
automake --add-missing
At this point, we can run ./configure
, but make
will
fail miserably. Our program needs to be able to find scribbu
,
openssl
and boost
includes, along with the corresponding
libraries. All the required libraries other than libscribbu
provide pre-built macros which we can copy from the scribbu source
distro into macros. Let us add the following lines to
configure.ac, just before the call to AC_CONFIG_FILES
:
PKG_CHECK_MODULES([GUILE], [guile-2.2])
AX_BOOST_BASE([1.58], [], [AC_MSG_ERROR([Scribbu requires boost_base 1.58 or later.])])
echo "Checkpoint 3: BOOST_LDFLAGS is $BOOST_LDFLAGS;" >&AS_MESSAGE_LOG_FD
AX_BOOST_IOSTREAMS
AX_BOOST_FILESYSTEM
AX_BOOST_SYSTEM
AX_CHECK_OPENSSL([],[AC_MSG_ERROR([Scribbu requires openssl.])])
Each of these will define Automake variables describing where we can find headers & libraries which we can add to src/Makefile.am, which now reads:
bin_PROGRAMS = az-tags
az_tags_SOURCES = main.cc
AM_CPPFLAGS = $(BOOST_CPPFLAGS)
AM_CXXFLAGS = -std=c++17 $(GUILE_CFLAGS)
AM_LDFLAGS = $(BOOST_LDFLAGS)
LDADD = $(GUILE_LIBS) \
        $(BOOST_SYSTEM_LIB) \
        $(BOOST_FILESYSTEM_LIB) \
        $(BOOST_IOSTREAMS_LIB) \
        $(OPENSSL_LIBS)
This just leaves the question of where to find libscribbu
. scribbu,
at the time of this writing, provides no Autoconf macros (however, this
sample provided the author the opportunity to prototype one).
We add the following code to configure.ac, just after the call
to AC_PROG_CXX
(it’s a lot of code; step-by-step explanation to
follow):
AC_ARG_WITH([scribbu],
  AS_HELP_STRING([--with-scribbu=DIR], [root directory of scribbu installation]),
  [
    case "$withval" in
    "" | y | ye | yes | n | no)
      AC_MSG_ERROR([--with-scribbu takes a root directory]);;
    *)
      scribbu_dirs="$withval";;
    esac
  ],
  [
    # Just use the defaults
    scribbu_dirs="/usr/local /usr /opt/local /sw"
  ])
dnl One way or another, we have one or more candidates in ${scribbu_dirs}
found=no
for scribbu_home in ${scribbu_dirs}; do
  AC_MSG_CHECKING([for scribbu/scribbu.hh under ${scribbu_home}])
  if test -f "${scribbu_home}/include/scribbu/scribbu.hh"; then
    SCRIBBU_CPPFLAGS="-I${scribbu_home}/include/scribbu"
    SCRIBBU_LDFLAGS="-L${scribbu_home}/lib"
    SCRIBBU_LIBS="-lscribbu"
    found=yes
    AC_MSG_RESULT([yes])
    break
  else
    AC_MSG_RESULT([no])
  fi
done
if test "$found" != "yes"; then
  AC_MSG_ERROR([couldn't find scribbu])
fi
# try the preprocessor and linker with our new flags,
# being careful not to pollute the global LIBS, LDFLAGS, and CPPFLAGS
AC_MSG_CHECKING([whether compiling and linking against scribbu will work])
save_LIBS="$LIBS"
save_LDFLAGS="$LDFLAGS"
save_CPPFLAGS="$CPPFLAGS"
LIBS="$SCRIBBU_LIBS $LIBS"
LDFLAGS="$SCRIBBU_LDFLAGS $LDFLAGS"
CPPFLAGS="$SCRIBBU_CPPFLAGS $CPPFLAGS"
AC_LANG_PUSH([C++])
AC_CHECK_HEADER([scribbu/scribbu.hh], [scribbu_hh=yes], [scribbu_hh=no])
# I'd like to do AC_CHECK_LIB here, but I can't link against libscribbu
# in a test because it, in turn depends on a bunch of other libs
AC_CHECK_FILE([${scribbu_home}/lib/libscribbu.la], [scribbu_la=yes], [scribbu_la=no])
AC_LANG_POP([C++])
LIBS="$save_LIBS"
LDFLAGS="$save_LDFLAGS"
CPPFLAGS="$save_CPPFLAGS"
if test "yes" = "$scribbu_hh" && test "yes" = "$scribbu_la"; then
  AC_DEFINE([HAVE_SCRIBBU], [1], [Define to 1 if you have libscribbu])
else
  AC_MSG_ERROR([az-tags requires scribbu])
fi
AC_SUBST([SCRIBBU_CPPFLAGS])
AC_SUBST([SCRIBBU_LIBS])
AC_SUBST([SCRIBBU_LDFLAGS])
The first step is to locate libscribbu
. We will form the
variable scribbu_dirs
containing one or more directories to
check. Now, the user could always just tell us where it is. That is
the reason we begin with AC_ARG_WITH
: if the user invokes
configure
with --with-scribbu=...
we will just use
that. Otherwise, we will examine a default set of locations.
That’s what the for loop does; for each location in
scribbu_dirs
, it checks for scribbu.hh in a
sub-directory named include/scribbu of the current location. On
success, we set a few variables recording that result & break. If we
check all locations without success, then we fail.
Now, just because we found a header file at a given place doesn’t mean
we can build against it or its associated library. The typical idiom
is to execute the macros AC_CHECK_HEADER
and
AC_CHECK_LIB
to make sure we can include the header and link
against the library, respectively.
The problem in my case is that AC_CHECK_LIB
will fail, not
through any fault of libscribbu
, but because it depends on a
number of other libraries; the test will fail with unresolved
externals & I can’t see how to add the relevant link flags in the macro.
Instead, I settle for AC_CHECK_FILE
.
If both these pass, we know we’re good to go; the question remains: how
to record the information we’ve just discovered? The Autoconf manual
states that one should never add options to user variables such as
CPPFLAGS
. The idiom seems to be to define new variables that
the Automake author can add to their rules. In this case, we create
three new variables:
SCRIBBU_CPPFLAGS to hold the -I option that will enable the build to find the libscribbu headers
SCRIBBU_LIBS to hold the -l option that will enable the build to link against libscribbu
SCRIBBU_LDFLAGS to hold the -L option telling the linker where to find libscribbu
This lets us augment src/Makefile.am to:
bin_PROGRAMS = az-tags
az_tags_SOURCES = main.cc
AM_CPPFLAGS = $(BOOST_CPPFLAGS) $(SCRIBBU_CPPFLAGS)
AM_CXXFLAGS = -std=c++17 $(GUILE_CFLAGS)
AM_LDFLAGS = $(SCRIBBU_LDFLAGS) $(BOOST_LDFLAGS)
LDADD = $(SCRIBBU_LIBS) \
        $(GUILE_LIBS) \
        $(BOOST_SYSTEM_LIB) \
        $(BOOST_FILESYSTEM_LIB) \
        $(BOOST_IOSTREAMS_LIB) \
        $(OPENSSL_LIBS)
With that, we can configure:
$>: autoreconf -vfi
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force
...
$>: ./configure --prefix=$HOME
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
...
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing libtool commands
$>: make
make all-recursive
make[1]: Entering directory '/tmp/az-tags'
Making all in src
make[2]: Entering directory '/tmp/az-tags/src'
g++ -DHAVE_CONFIG_H -I. -I/home/mgh/doc/code/projects/az-tags/src -I.. -I/usr/include -std=c++17 -pthread -I/usr/local/include/guile/2.2 -g -O2 -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o /home/mgh/doc/code/projects/az-tags/src/main.cc
...
We have a build! Let us take a look at a file downloaded from Amazon.com:
$>: scribbu dump lorca.mp3
"lorca.mp3":
ID3v2.3(.0) Tag:
452951 bytes, synchronised
...
COMM (<no description>):
Amazon.com Song ID: 203558254
...
9425708 bytes of track data:
MD5: 48ff9cadea7d842e9059db25159d2daa
ID3v1.1: The Pogues - Lorca's Novena
Hell's Ditch [Expanded] (US Ve (track 5), 1990
Amazon.com Song ID: 20355825
unknown genre 255
$>: src/az-tags lorca.mp3
lorca.mp3 has 1 ID3v2 tags, and an ID3v1 tag
updating the comment frame containing Amazon.com Song ID: 203558254
all tags processed; emplacing new tagset...
emplacing new tagset...done.
clearing ID3v1 comment
$>: scribbu dump lorca.mp3
"lorca.mp3":
ID3v2.3(.0) Tag:
452951 bytes, synchronised
...
COMM (amazon.com song id):
Amazon.com Song ID: 203558254
...
9425708 bytes of track data:
MD5: 48ff9cadea7d842e9059db25159d2daa
ID3v1.1: The Pogues - Lorca's Novena
Hell's Ditch [Expanded] (US Ve (track 5), 1990
unknown genre 255
Symbol | 2.2 | 2.3+ |
---|---|---|
'album-frame | TAL | TALB |
'artist-frame | TP1 | TPE1 |
'band-frame | TP2 | TPE2 |
'bpm-frame | TBP | TBPM |
'comment-frame | COM | COMM |
'composer-frame | TCM | TCOM |
'conductor-frame | TP3 | TPE3 |
'content-group-frame | TT1 | TIT1 |
'copyright-frame | TCR | TCOP |
'date-frame | TDA | TDAT |
'encoded-by-frame | TEN | TENC |
'file-owner-frame | N/A | TOWN |
'file-type-frame | TFT | TFLT |
'genre-frame | TCO | TCON |
'initial-key-frame | TKE | TKEY |
'interpreted-by-frame | TP4 | TPE4 |
'isrc-frame | TRC | TSRC |
'langs-frame | TLA | TLAN |
'length-frame | TLE | TLEN |
'lyricist-frame | TXT | TEXT |
'media-type-frame | TMT | TMED |
'original-album-frame | TOT | TOAL |
'original-artist-frame | TOA | TOPE |
'original-filename-frame | TOF | TOFN |
'original-lyricist-frame | TOL | TOLY |
'original-release-year-frame | TOR | TORY |
'part-of-a-set-frame | TPA | TPOS |
'play-count-frame | CNT | PCNT |
'playlist-delay-frame | TDY | TDLY |
'pop-frame | POP | POPM |
'publisher-frame | TPB | TPUB |
'recording-dates-frame | TRD | TRDA |
'settings-frame | TSS | TSSE |
'size-frame | TSI | TSIZ |
'station-name-frame | N/A | TRSN |
'station-owner-frame | N/A | TRSO |
'subtitle-frame | TT3 | TIT3 |
'tag-cloud-frame | XTG | XTAG |
'time-frame | TIM | TIME |
'title-frame | TT2 | TIT2 |
'track-frame | TRK | TRCK |
'udt-frame | TXX | TXXX |
'year-frame | TYE | TYER |
scribbu uses iconv for character encoding conversion. For
convenience, here is the list of identifiers used to name the
encodings:
ASCII, ISO_8859_1, ISO_8859_2, ISO_8859_3, ISO_8859_4, ISO_8859_5, ISO_8859_7, ISO_8859_9, ISO_8859_10, ISO_8859_13, ISO_8859_14, ISO_8859_15, ISO_8859_16, KOI8_R, KOI8_U, KOI8_RU, CP1250, CP1251, CP1252, CP1253, CP1254, CP1257, CP850, CP866, CP1131, MacRoman, MacCentralEurope, MacIceland, MacCroatian, MacRomania, MacCyrillic, MacUkraine, MacGreek, MacTurkish, Macintosh
ISO_8859_6, ISO_8859_8, CP1255, CP1256, CP862, MacHebrew, MacArabic
EUC_JP, SHIFT_JIS, CP932, ISO_2022_JP, ISO_2022_JP_2, ISO_2022_JP_1, ISO_2022_JP_MS
EUC_CN, HZ, GBK, CP936, GB18030, EUC_TW, BIG5, CP950, BIG5_HKSCS, BIG5_HKSCS_2004, BIG5_HKSCS_2001, BIG5_HKSCS_1999, ISO_2022_CN, ISO_2022_CN_EXT
EUC_KR, CP949, ISO_2022_KR, JOHAB
ARMSCII_8
Georgian_Academy, Georgian_PS
KOI8_T
PT154, RK1048
TIS_620, CP874, MacThai
MuleLao_1, CP1133
VISCII, TCVN, CP1258
HP_ROMAN8, NEXTSTEP
UTF_8, UCS_2, UCS_2BE, UCS_2LE, UCS_4, UCS_4BE, UCS_4LE, UTF_16, UTF_16BE, UTF_16LE, UTF_32, UTF_32BE, UTF_32LE, UTF_7, C99, JAVA
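For a feel of what naming an encoding buys you, here is a small sketch using the iconv command-line tool rather than scribbu itself. The function name convert_encoding is hypothetical, and the spellings ISO-8859-1 and UTF-8 are the ones the command-line tool accepts; no claim is made here about how the identifiers listed above map onto them.

```shell
# Hypothetical wrapper: convert stdin from one named encoding to another.
#   $1    source encoding name
#   $2    target encoding name
convert_encoding() {
    iconv -f "$1" -t "$2"
}

# The single byte 0xE9 (octal 351) is "é" in ISO-8859-1; converted to
# UTF-8 it becomes two bytes, which od shows in hex as c3 a9.
printf '\351' | convert_encoding ISO-8859-1 UTF-8 | od -An -tx1
```

The same text is thus represented by different bytes depending on the encoding, which is why the encoding has to be named whenever tag text is read or written.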