Where Should elfeed-score State Be Persisted?

The Problem

There's been a subtle bug in elfeed-score for some time now, described in issue 13:

Presently, rules & their state (last match time, # hits, &c) are flushed to disk when:

  • a scoring operation is performed (elfeed-score-score or elfeed-score-search)
  • it is explicitly requested (elfeed-score-serde-write-score-file)
  • elfeed-score is unloaded (elfeed-score-unload)

Otherwise, if you simply leave elfeed-score in place, day after day, reading news & relying on the Elfeed new entry hook to score entries, a lot of state accumulates in-memory that won't be written to disk. Worse, if you update your score file by hand (to add a new rule, say), and then carry out one of the above three operations, your changes will be overwritten.

I see this as a particular instance of the split-brain problem: we have two datasets representing the same thing (the state of your scoring rules) maintained in two different places that have become inconsistent. In the current state of affairs, this inconsistency is resolved in favor of the in-memory copy, at the expense of the on-disk copy.

How to Mitigate the Problem

I first thought to at least warn the user when this was about to happen; elfeed-score could detect that the score file had changed since it was last read & alert the user that their changes were about to be overwritten. I also thought to write out score state more frequently (e.g. blast the in-memory state out to disk every n seconds, or every m scoring operations). While these would have been improvements, they were only that: mitigations of the essential problem, not a fix for it.
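The warning idea can be sketched roughly as follows. This is an illustration in Python, not elfeed-score's Emacs Lisp, and every name in it (ScoreFileWatcher, ScoreFileChangedError) is hypothetical. It assumes that comparing the file's modification time against the one recorded at the last read is a good enough signal that someone edited the file by hand.

```python
import os


class ScoreFileChangedError(RuntimeError):
    """Raised when the on-disk score file changed after we last read it."""


class ScoreFileWatcher:
    """Hypothetical helper: remembers the score file's mtime at last read."""

    def __init__(self, path):
        self.path = path
        self.mtime_at_read = None

    def read(self):
        # Record the mtime *before* reading, so an edit that lands during
        # the read is reported as a change rather than silently missed.
        self.mtime_at_read = os.stat(self.path).st_mtime_ns
        with open(self.path, encoding="utf-8") as f:
            return f.read()

    def changed_since_read(self):
        return os.stat(self.path).st_mtime_ns != self.mtime_at_read

    def write(self, text, force=False):
        # Refuse to clobber a hand-edited file unless the caller insists.
        if not force and self.changed_since_read():
            raise ScoreFileChangedError(
                f"{self.path} was modified since it was last read"
            )
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(text)
        self.mtime_at_read = os.stat(self.path).st_mtime_ns
```

Note that this only turns silent data loss into a loud refusal; the two copies of the state can still diverge.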

@firmart had the right idea: split off the state that gets updated in-memory into a separate file. This didn't appeal to me immediately; it was only when I conceptualized this problem as split-brain that it clicked: we'll solve the split-brain by splitting the state. We'll let the score file be the controlling authority when it comes to rule "structure" and the in-memory representation be the controlling authority with respect to statistics (the second file would just be an on-disk cache mirroring the in-memory state).
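To make the division of authority concrete, here is a minimal sketch in Python (again, not elfeed-score's actual code). It assumes, purely for illustration, that both files are JSON and that a rule's statistics can be keyed by the rule's own structure; the function names, the file layout, and the "hits" and "last_match" fields are all hypothetical.

```python
import json


def rule_key(rule):
    """Hypothetical identity for a rule: its structure, serialized canonically."""
    return json.dumps(rule, sort_keys=True)


def load(rules_path, stats_path):
    """The rules file decides which rules exist; the stats file is only a cache."""
    with open(rules_path, encoding="utf-8") as f:
        rules = json.load(f)
    try:
        with open(stats_path, encoding="utf-8") as f:
            cached = json.load(f)
    except FileNotFoundError:
        cached = {}
    # Keep cached stats for rules that still exist; start new rules at zero.
    # Stats for rules no longer in the rules file are simply dropped.
    stats = {}
    for rule in rules:
        key = rule_key(rule)
        stats[key] = cached.get(key, {"hits": 0, "last_match": None})
    return rules, stats


def record_hit(stats, rule, when):
    """Update in-memory statistics; the rules themselves are never touched."""
    entry = stats[rule_key(rule)]
    entry["hits"] += 1
    entry["last_match"] = when


def flush_stats(stats_path, stats):
    """Write only the statistics; the rules file is never written from here."""
    with open(stats_path, "w", encoding="utf-8") as f:
        json.dump(stats, f)
```

Because flush_stats never opens the rules file, a hand edit to the rules can no longer be overwritten by a flush, and the next load picks the edit up while keeping the accumulated statistics for every rule that is still present.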

But Wait, What About Adding Rules Interactively?

I've been planning for some time a new feature in which one could add rules interactively. For instance, while reading an elfeed entry, one might highlight some text in the title & say "Give me a rule that increments the scores of entries with these words in the title by 100." Now we're back to updating rule structure in memory, trying to write it out to the score file, and risking the score file having been modified since our last read.

Even so, this isn't so bad: the nature of the in-memory state change is different. Rather than updating stats for every rule, we are adding a single rule. On detecting such a change to the score file, we could resolve the split-brain without losing any state by simply re-loading the score file, adding our new rule, and writing the whole thing back out to disk.
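That reload-then-append resolution might look something like the sketch below. It is written in Python for illustration only; the JSON rule list, the mtime check, and the function name add_rule are assumptions of the sketch rather than anything elfeed-score does today.

```python
import json
import os


def add_rule(path, mtime_at_read, rules_in_memory, new_rule):
    """Append one rule to the score file without losing hand edits.

    Returns the resulting rule list and the file's new mtime, so the
    caller can refresh its in-memory copy and its recorded read time.
    """
    if os.stat(path).st_mtime_ns != mtime_at_read:
        # The file changed behind our back: it is the authority on rule
        # structure, so discard our copy of the rules and re-load it.
        with open(path, encoding="utf-8") as f:
            rules = json.load(f)
    else:
        rules = list(rules_in_memory)
    # The only in-memory change we need to preserve is the one new rule.
    rules.append(new_rule)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rules, f, indent=2)
    return rules, os.stat(path).st_mtime_ns
```

This works only because the pending in-memory change is a single, self-contained addition; it would not have been possible while per-rule statistics lived in the same file.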

05/24/21 07:43

Have a comment on this post? Start a discussion in my public inbox by sending an email to ~sp1ff/public-inbox@lists.src.ht (mailing list etiquette), or see existing discussions.