A more compact ERC mode line

I use ERC to read IRC in Emacs (as an aside, I recently had to explain to the intern on my team what IRC is). ERC comes with a nice feature called Channel Tracking. What this does is notice when there's activity in IRC channels which are not visible and update your mode line to give you a visual indication that there's new traffic waiting to be read.

Of course, just adding a list of channel names would quickly take up the entire mode line, so there is a need to produce short but meaningful abbreviations of some sort. The variable erc-track-shorten-function is expected to contain a function that ERC will use for this purpose. The function takes a list of strings naming the active channels & returns the abbreviations to be used in the mode line.

By default, this is erc-track-shorten-names, it is attributed to Alex Schroeder, and it works by looking for the shortest prefix of each active channel still unique among all channels. So, for instance, if my active channels are '("#emacs" "#indieweb" "#indieweb-dev"), erc-track-shorten-names returns '("#e" "#indieweb" "#indieweb-") (#emacs is the only channel I have joined whose name begins with e).

That is certainly better, but it bothered me that "#indeweb" & "#indieweb-" shared so much text, even though "#in" would be enough to distinguish both of them among my channels. I sat down yesterday evening to see what I could do.

I first thought to post-process the output of erc-track-shorten-names, looking for common prefixes. I did this by inserting the active channels into a trie, and looking for nodes that both had a value and had children. While playing around with that, I realized there was a much simpler way:

Let ALL-CHANNELS be all my IRC channels, and ACTIVE-CHANNELS those that are active.

1. insert ALL-CHANNELS into a trie T

2. for each CHAN in ACTIVE-CHANNELS DO

  2.1 let AUX be the empty string

  2.2. search T for CHAN; at each step, let I be the number of
       children for the current node;

       A. if I > 1, the current character is necessary to distinguish
          CHAN among all others; append the current character to AUX

       B. if the current node has a value, and that value isn't equal
          to CHAN, then this node is a prefix of CHAN; again append the
          current character to AUX in order to distinguish CHAN

       at the completion of the search, AUX will contain a minimal
       substring of CHAN that is still sufficient to distinguish it
       from all other channels

Let's look at a quick example: suppose I'm in the channels "#i3", "#indieweb", "#indieweb-dev", "#scheme" and "irc.freednode.net". This is a very crude sketch of the resulting trie; I'm indicating nodes with a single child by just putting them adjacent to one another, so for instance "n d" means the node corresponding to "n" has a single child: that corresponding to "d".

<root> --> # -> i -> 3 ("#i3")
 |         |    |
 |         |    +--> n d i e w e b ("#indieweb")
 |         |                       |
 |         |                       +-> - d e  v ("#indieweb-dev")
 |         |
 |         +--> s c h e m e ("#scheme")
 |
 +--> i r c . f r e e n o d e . n e t ("irc.freenode.net")

The root node has two children: "#" and "i". "#" has two children "i", and "s". "#i" has a two children "3" and "n", and "#i3" has no children along with a node value of "#i3".

Applying the above algorithm to "#indieweb-dev" means we're going to walk through this trie down to the node "#indieweb-dev". AUX starts as "", but at the nodes <root>, "#", & "i" we see multiple children, and so we append the current character at each step, giving "#in" by the time we make it onto that long run of single children. We append nothing more to AUX until we reach the node "#indieweb" where the second rule applies: we've found a channel name that's a prefix of the active channel for which we are searching. That means we append the current character to AUX to get "#in-". The rest of the nodes all have one child until we complete our search.

This algorithm yields the abbreviations '("#i3" "#in", "#in-", "#s" and "i")– much tighter (thirteen characters in total, as opposed to twenty-five).

The implementation was pretty straightforward. My trie implementation is just a struct with an optional value & children. I use an association list for the children:

(cl-defstruct trie-node value children)

(defun sp1ff/trie-create ()
  (make-trie-node :value nil :children nil))

I'm not a good enough LISP programmer to assess the efficiency of this implementation. Maybe a hash table would be faster? Also, tries are notorious for using space inefficiently when naively implemented, and this implementation is truly naive. See here for what's involved in coding up a space-efficient implementation. Given my application, I suspect I'll be fine.

String insertion is the obvious algorithm; iterate over the string to be inserted, descending into the trie for each character. At the end, set the node's value to the string:

(defun sp1ff/trie-insert (node text)
  "Insert TEXT into the trie rooted at NODE."
  (cl-loop for c in (append text nil) do
           (let* ((child (assq c (trie-node-children node))))
             (if child
                 (setq node (cdr child))
               (let ((new-node (make-trie-node :value nil :children nil)))
                 (setf (trie-node-children node)
                       (append (list (cons c new-node)) (trie-node-children node)))
                 (setq node new-node)))))
  (setf (trie-node-value node) text))

Traversal is similarly simple; walk the tree as far as you can based on the search string, return what you've got.

(defun sp1ff/trie-find (node text &optional f)
  "Search the trie rooted at NODE for TEXT; if given invoke F at each step."
  (cl-loop for c in (append text nil) do
           (let* ((val (trie-node-value node))
                  (children (trie-node-children node))
                  (child (and children (assq c children))))
             ;; NB `child' is the node corresponding to `c'
             (if (functionp f) (funcall f c val children))
             (if child
                 (setq node (cdr child))
               (cl-return val))))
  (if node (trie-node-value node) nil))

With those building blocks, I can now implement by own shortening function:

(defun sp1ff/erc/shorten-names (channel-names)
  "Given a list of strings CHANNEL-NAMES, produce a corresponding
  set of sub-strings, each sufficient to distinguish it's
  corresponding channel among all channels."
  (let ((trie (sp1ff/trie-create)))
    (mapcar (lambda (x) (sp1ff/trie-insert trie x)) (erc-all-buffer-names))
    (mapcar
     (lambda (chan)
       (let ((aux ""))
         (sp1ff/trie-find
          trie
          chan
          (lambda (c val children)
            (let ((i (length children)))
              (if (or (> i 1)
                      (and val (not (string= val chan))))
                  (setq aux (concat aux (string c)))))))
         aux))
     channel-names))

  (setq erc-track-shorten-function 'sp1ff/erc/shorten-names)

The result:

07/24/20 14:27

Unwound Stack

A more compact ERC mode line