Moving from Feedly to Elfeed

Introduction

I recently changed my RSS reader from Feedly to Elfeed, and decided to write about the process. If you just want the steps involved, start here. If you're only interested in how I configured Elfeed, go here. Otherwise, please read on.

RSS

I prefer to curate my own newsfeed via RSS. For years, I've started my day with my RSS newsreader of choice. For a long time, that was Google Reader. When Google shuttered that app in 2013 (one of a few lessons in the dangers of depending upon Google), I moved to Feedly. It worked, it was easy to import my feeds from Google Reader, and I used it on both desktop & mobile for several years.

Of late, I've been moving to applications that give me more control over both their code and my data. Emacs has featured prominently in that, and I stumbled upon Elfeed, a package for reading your RSS feeds in Emacs.

Background

RSS stands for Really Simple Syndication, or Rich Site Summary… or RDF Site Summary. The confused naming speaks to the gloriously muddled origins & evolution of this convention. I say "convention" because it's not so much a specification as a collection of specifications for syndication dating back to the late nineties. I'm giving here as brief a summary as I can while still hitting the salient points, based on the book "Developing Feeds with RSS and Atom" by Ben Hammersley and the article The Rise & Demise of RSS by Sinclair Target.

Hammersley traces the origins of RSS back to 1995, specifically Ramanathan Guha's MCF work at Apple. This eventually became RDF (Resource Description Framework), albeit at Netscape, where Guha went after Apple. There was other work going on in this area as well (I'm old enough to remember Microsoft's CDF), but even at this early date, there was a tension between featureful representations of websites (like RDF & CDF) and bloggers looking for a simple way in which to syndicate their sites. Dave Winer (of UserLand & Scripting News) was in the latter camp; in late 1997 he published an announcement that he would be making his site, Scripting News, available in an XML format he called scriptingNews as well as HTML; the idea was that the former would be consumed by programs & the latter by humans (via web browsers).

The browser wars raged through 1998, during which Netscape released the My Netscape Network, one of the "portals" of that era that presaged today's social media silos. Netscape needed a way for third-party sites to integrate with their portal, and after a few preliminary sallies, RSS 0.90 (authored by Dan Libby, Eckart Walther, and Guha) was published on March 15, 1999. This was an RDF-based format ("RSS", at this point, stood for RDF Site Summary) & representative of the "featureful" school of thought.

Much debate ensued, and in July 1999 Netscape released RSS 0.91. Despite the incremental version change, it was a major update, removing the RDF elements and adding elements of Winer's format. The name changed from RDF Site Summary to Rich Site Summary, and the update was significant enough to get Winer to abandon his standard in favor of it. Perhaps because of that, RSS 0.91 saw wider adoption over the next year. Typically, the greater exposure revealed shortcomings, but there was no consensus on how to address them. The two camps were again Winer on the one hand, who favored keeping the specification simple, and on the other Rael Dornfest, Ian Davis & Aaron Swartz, who favored RDF support & providing for extensibility via XML namespaces. The split became formal with the formation of the RSS-DEV working group by the latter, which published a specification they called RSS 1.0 (RDF Site Summary) in December 2000. Winer responded by publishing his own specification, named RSS 0.92, which included much more modest changes such as enclosures. In September 2002, he released RSS 2.0 (Really Simple Syndication) with support for namespaces, along with provisions for backward compatibility.

Yet another variant appeared in 2003, when the Atom format was created: an entirely new format for the same purpose. It has been adopted as IETF Proposed Standard RFC 4287, and per Chris Wellons (Elfeed's author) is far superior to the RSS family.

For all the ferment behind its development, interest in RSS has been trending downward since the mid-aughts. Citing this, Google discontinued its reader in 2013, and several other prominent RSS clients have shut down as well. There's no consensus on why, but the rise of social media is frequently cited.

Target summarizes the current state of affairs as follows: "stubbornly adding an RSS feed to your blog, even in 2019, is a political statement. That little tangerine bubble has become a wistful symbol of defiance against a centralized web increasingly controlled by a handful of corporations…" Perhaps. You can find my RSS feed here.

Getting My Feedly Feeds

They don't make it easy, but Feedly does offer OPML export. I did this by scrolling down to the bottom of my feeds in the web app and selecting "Integrations". This brought up a menu of apps with which Feedly integrates, and when I clicked the "X" in the upper left to close that list, I was left at "Manage Account". From there I selected "Privacy & Personal Data", which brought up another tab. I had to scroll down a bit to "Organize Sources" & then click the arrow at the top-right to download an OPML export (feedly-{really-long-hex-string}.opml).

Getting them into Elfeed

The complete source for this is available here.

Peeking into the OPML file, I saw that the export is structured by tag:

<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
    <head>
        <title>...subscriptions in feedly Cloud</title>
    </head>
    <body>
        <outline text="irish" title="irish">
            <outline type="rss" text="North Atlantic Skyline" title="North Atlantic Skyline" xmlUrl="http://johnsmyth.ie/blog/?feed=rss2" htmlUrl="http://johnsmyth.ie/blog"/>
        </outline>
        <outline text="geeking" title="geeking">
            <outline type="rss" text="TechCrunch" title="TechCrunch" xmlUrl="http://feeds.feedburner.com/Techcrunch" htmlUrl="https://techcrunch.com"/>
            <outline type="rss" text="Mx plan :: sachachua's blog" title="Mx plan :: sachachua's blog" xmlUrl="http://feeds.feedburner.com/sachac" htmlUrl="https://sachachua.com/blog"/>
            <outline type="rss" text="genehack.org" title="genehack.org" xmlUrl="http://www.genehack.org/feed/rss/" htmlUrl="http://genehack.org/"/>
            ...

in which a given feed could appear multiple times (under different tags). By contrast, Elfeed takes a list of feeds with tags; you define your feeds like this:

(setq elfeed-feeds
      '(("http://nullprogram.com/feed/" blog emacs)
        "http://www.50ply.com/atom.xml"  ; no autotagging
        ("http://nedroid.com/feed/" webcomic)))

It seemed I was going to have to pivot my data: Feedly gave me a mapping from tag to feeds, whereas what I needed was a mapping from feed to tags.

The Pivot

I fired up Emacs & loaded the xml package. xml-parse-file takes the name of the XML file in which you are interested, and produces an association list mapping tags to values, structured in a fairly natural way according to the XML. In particular, I needed to first extract the 'opml element from the returned alist, and then in turn the 'body element from that:

(let* ((opml-file "~/tmp/feedly.opml")
       (body (assq 'body (assq 'opml (xml-parse-file opml-file)))))
  body)
(body
 nil
 (outline
  ((text . "irish")
   (title . "irish"))
  (outline
   ((type . "rss")
    (text . "North Atlantic Skyline")
    (title . "North Atlantic Skyline")
    (xmlUrl . "http://johnsmyth.ie/blog/?feed=rss2")
    (htmlUrl . "http://johnsmyth.ie/blog"))))
 (outline
  ((text . "geeking")
   (title . "geeking"))
  (outline
   ((type . "rss")
    (text . "TechCrunch")
    (title . "TechCrunch")
    (xmlUrl . "http://feeds.feedburner.com/Techcrunch")
    (htmlUrl . "https://techcrunch.com")))
  (outline
   ((type . "rss")
    (text . "Mx plan :: sachachua's blog")
    (title . "Mx plan :: sachachua's blog")
    (xmlUrl . "http://feeds.feedburner.com/sachac")
    (htmlUrl . "https://sachachua.com/blog")))
  (outline
   ((type . "rss")
    (text . "genehack.org")
    (title . "genehack.org")
    (xmlUrl . "http://www.genehack.org/feed/rss/")
    (htmlUrl . "http://genehack.org/")))
  ...

(I've cleaned up the output somewhat).

So the body element contains the nested structure for which we're looking. After a little experimentation, I found xml-get-children yielded the goods:

(let* ((opml-file "~/tmp/feedly.opml")
       (body (assq 'body (assq 'opml (xml-parse-file opml-file)))))
  (xml-get-children body 'outline))
((outline
  ((text . "irish")
   (title . "irish"))
  (outline
   ((type . "rss") (text . "North Atlantic Skyline") (title . "North Atlantic Skyline") (xmlUrl . "http://johnsmyth.ie/blog/?feed=rss2") (htmlUrl . "http://johnsmyth.ie/blog"))))
 (outline
  ((text . "geeking") (title . "geeking"))
  (outline ((type . "rss") (text . "TechCrunch") (title . "TechCrunch") (xmlUrl . "http://feeds.feedburner.com/Techcrunch") (htmlUrl . "https://techcrunch.com")))
  (outline ((type . "rss") (text . "Mx plan :: sachachua's blog") (title . "Mx plan :: sachachua's blog") (xmlUrl . "http://feeds.feedburner.com/sachac") (htmlUrl . "https://sachachua.com/blog")))
  (outline ((type . "rss") (text . "genehack.org") (title . "genehack.org") (xmlUrl . "http://www.genehack.org/feed/rss/") (htmlUrl . "http://genehack.org/")))
  ...

At this point, the pivot is straightforward Lisp: cdr down the list of "outer" outline elements (which correspond to tags); for each, cdr down the nested list of outline elements (which correspond to feeds). While doing so, we maintain an alist mapping feed to tags; for each feed, we either create a new entry (if this is the first time we're seeing this feed), or update the existing entry's list of tags with the current tag. This is function feedly--outline-to-alist in feedly-to-elfeed.el.
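
For readers who want to try the pivot without a Feedly export to hand, here is a minimal sketch of the idea. It is not the code from feedly-to-elfeed.el: the names demo-pivot-outlines and demo-pivot-opml-string and the example.com URLs are made up for illustration, and it tracks only URL and tags (the real function presumably carries the title along too).

```emacs-lisp
(require 'xml)

(defun demo-pivot-outlines (tag-nodes)
  "Return an alist mapping each feed URL in TAG-NODES to a list of tag names.
TAG-NODES is a list of outer `outline' nodes, as returned by
`xml-get-children'.  Hypothetical helper, for illustration only."
  (let (feeds)
    (dolist (tag-node tag-nodes)
      (let ((tag (xml-get-attribute tag-node 'text)))
        (dolist (feed-node (xml-get-children tag-node 'outline))
          (let* ((url (xml-get-attribute feed-node 'xmlUrl))
                 (entry (assoc url feeds)))
            (if entry
                ;; Seen this feed already: add the current tag to its entry.
                (setcdr entry (append (cdr entry) (list tag)))
              ;; First sighting: create a new entry.
              (push (list url tag) feeds))))))
    (nreverse feeds)))

(defun demo-pivot-opml-string (opml)
  "Parse the OPML text in string OPML and pivot it with `demo-pivot-outlines'."
  (with-temp-buffer
    (insert opml)
    (let* ((root (xml-parse-region (point-min) (point-max)))
           (body (assq 'body (assq 'opml root))))
      (demo-pivot-outlines (xml-get-children body 'outline)))))

;; A made-up export in which one feed appears under two tags.
(demo-pivot-opml-string
 (concat "<opml version=\"1.0\"><body>"
         "<outline text=\"irish\">"
         "<outline type=\"rss\" text=\"One\" xmlUrl=\"http://example.com/one\"/>"
         "</outline>"
         "<outline text=\"photo\">"
         "<outline type=\"rss\" text=\"One\" xmlUrl=\"http://example.com/one\"/>"
         "<outline type=\"rss\" text=\"Two\" xmlUrl=\"http://example.com/two\"/>"
         "</outline>"
         "</body></opml>"))
;; => (("http://example.com/one" "irish" "photo") ("http://example.com/two" "photo"))
```

The sketch uses dolist rather than explicit cdr-ing, and keys the alist on the feed URL, so a feed listed under several tags collapses into a single entry.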

Getting the Information to Elfeed

Since I already use Org-mode to build my .emacs, the most natural output format for me was a table:

  #+NAME: feeds-table
  | Title                                             | Tags             | Notes | URL                                                 |
  |---------------------------------------------------+------------------+-------+-----------------------------------------------------|
  | North Atlantic Skyline                            | irish photo      |       | http://johnsmyth.ie/blog/?feed=rss2                 |
  | TechCrunch                                        | geeking @daily   |       | http://feeds.feedburner.com/Techcrunch              |
  | emacs-news – sacha chua :: living an awesome life | @emacs @dev      |       | http://sachachua.com/blog/category/emacs-news/feed/ |
  | genehack.org                                      | geeking planning |       | http://www.genehack.org/feed/rss/                   |
...

After parsing the OPML, and writing a little Elisp to format the output, I had:

(defun feedly-to-elfeed (opml-file)
  "Convert Feedly OPML to elfeed configuration."

  (let* ((body (assq 'body (assq 'opml (xml-parse-file opml-file))))
         (tags (xml-get-children body 'outline)))
    ;; `tags' will be the list of "outer" `outline' tags
    (feedly-to-elfeed--alist-to-org-table (feedly--outline-to-alist tags))))
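
The formatting helper isn't shown here, so as a rough idea of what such a step can look like, here is a sketch that turns a list of feeds into Org table text using the column order of the table shown earlier. The function name demo-feeds-to-org-table and the (TITLE URL TAG...) entry layout are assumptions made for this example, not the actual code in feedly-to-elfeed.el.

```emacs-lisp
(defun demo-feeds-to-org-table (feeds)
  "Return Org table text for FEEDS as a string.
Each element of FEEDS is assumed to look like (TITLE URL TAG...).
Hypothetical helper, for illustration only."
  (concat
   "| Title | Tags | Notes | URL |\n"
   ;; A row starting with `|-' is a horizontal rule in Org tables.
   "|-\n"
   (mapconcat
    (lambda (feed)
      (format "| %s | %s | | %s |"
              (nth 0 feed)
              ;; Tags are joined with spaces into a single cell.
              (mapconcat #'identity (nthcdr 2 feed) " ")
              (nth 1 feed)))
    feeds
    "\n")
   "\n"))

(demo-feeds-to-org-table '(("One" "http://example.com/one" "irish" "photo")))
;; => "| Title | Tags | Notes | URL |\n|-\n| One | irish photo | | http://example.com/one |\n"
```

The output is not aligned; pressing TAB inside the table after pasting it into an Org buffer realigns the columns. A title containing a "|" character would need escaping, which this sketch does not attempt.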

I copied the resulting table by hand into the Org file from which I generate my .emacs, and added a little Lisp after the table to set up elfeed-feeds:

#+NAME: make-feeds
#+BEGIN_SRC emacs-lisp :tangle yes :results output silent :comments org :var feeds=feeds-table
  (setq elfeed-feeds
        (mapcar
         (lambda (feed)
           ;; Column 3 is the URL; column 1 holds space-separated tags.
           (append (list (nth 3 feed))
                   (mapcar #'intern (split-string (nth 1 feed)))))
         feeds))
#+END_SRC
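
To see what that block computes, here is the same mapping applied to two rows from the table, outside of Org. The assumption is that Org hands the table to the block as a list of rows, each row a list of four strings with the header already stripped; depending on your Org version and header arguments you may need to adjust for that. The function name demo-rows-to-elfeed-feeds is made up for this example.

```emacs-lisp
(defun demo-rows-to-elfeed-feeds (rows)
  "Convert table ROWS of (TITLE TAGS NOTES URL) strings to `elfeed-feeds' form.
Hypothetical helper, for illustration only."
  (mapcar
   (lambda (row)
     ;; URL first, then each space-separated tag as a symbol.
     (append (list (nth 3 row))
             (mapcar #'intern (split-string (nth 1 row)))))
   rows))

(demo-rows-to-elfeed-feeds
 '(("North Atlantic Skyline" "irish photo" "" "http://johnsmyth.ie/blog/?feed=rss2")
   ("TechCrunch" "geeking @daily" "" "http://feeds.feedburner.com/Techcrunch")))
;; => (("http://johnsmyth.ie/blog/?feed=rss2" irish photo)
;;     ("http://feeds.feedburner.com/Techcrunch" geeking @daily))
```

Each result has the (URL TAG...) shape that elfeed-feeds expects; a row with an empty Tags cell yields a one-element list holding just the URL.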

Customizing Elfeed to Discard Certain Entries

This set up my Elfeed feeds, but there was one bit of non-obvious logic: there is a particular feed of mine which contains entries highlighting goods at Amazon (they're an Amazon Affiliate). I love the site, but didn't feel like skipping through those entries. I thought it would be nice to automatically mark such entries as "read" so they wouldn't even show up.

A little grep'ing through the Elfeed docs turned up 'elfeed-new-entry-hook as the place to process new entries. The Elfeed author's own article showed me that marking an entry as "read" amounted to removing the 'unread tag:

(defun sp1ff/filter-elfeed-entry (entry)
  "Mark certain new entries as read.

At present this just takes any new entry whose title begins with
\"AT AMAZON\" (compared case-insensitively) and marks it read."
  (let ((title (elfeed-entry-title entry)))
    ;; Guard against entries with no title before testing the prefix.
    (when (and (stringp title)
               (string-prefix-p "at amazon" title t))
      (message "marking %s as read" title)
      (elfeed-untag entry 'unread))))
(add-hook 'elfeed-new-entry-hook #'sp1ff/filter-elfeed-entry)
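
The title test itself can be tried without Elfeed loaded. The third argument to string-prefix-p asks for a case-insensitive comparison, which is why a lowercase prefix matches titles written in capitals. The helper name demo-amazon-title-p and the sample titles are made up for this example.

```emacs-lisp
(defun demo-amazon-title-p (title)
  "Return non-nil if TITLE starts with \"at amazon\", ignoring case.
Hypothetical helper, for illustration only."
  (and (stringp title)
       (string-prefix-p "at amazon" title t)))

(demo-amazon-title-p "AT AMAZON: weekend deals")   ; => t
(demo-amazon-title-p "Deals at Amazon this week")  ; => nil
```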

Where Am I?

I am now using Elfeed day-to-day to read my newsfeed, and curate my feeds directly in the Org-mode table I created above. I did find that I wanted a way to surface particularly interesting entries, which led me to write my own scoring facility (which will be the subject of a future post).

01/04/20 11:48