Atom Heart Mother (beta) – More, Larger Feeds for blogs.sun.com

Exactly how many articles should be held in the feed which a website syndicates to its readership via RSS, ATOM and the like?

It’s a simple question, but actually rather harder to answer without succumbing to the risk of Worse Is Better thinking[1]. Below, I attempt to answer the above question, and I announce some software which I created because the answer bugged me enough to do something about it.

The problem is: I like my coffee in the morning.

I like to drink my coffee in the quiet period when I am still drying off after the shower and waking up, when I can sit down in front of my Mac and use NetNewsWire to flick through all the feeds and articles that have been updated in the past 24 hours.

The issue with feeds that come from blogs.sun.com is that the feed (currently) reflects only the 30 most recent articles written by Sun bloggers. My quick analysis suggested that blogs.sun.com was syndicating upwards of 75 articles per day, so if I read the feeds only once per day, I would miss more than 50% of the traffic.
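The arithmetic behind that estimate can be made concrete with a short Python sketch. It is illustrative only: it assumes postings arrive at a steady rate through the day (real posting is burstier), and the function names fraction_missed and max_poll_interval_hours are hypothetical. The first computes the share of postings a reader never sees; the second gives the longest gap between fetches that a fixed-size feed tolerates before entries start falling off the end.

```python
def fraction_missed(window: int, posts_per_day: float, reads_per_day: int = 1) -> float:
    """Share of postings never seen by a reader who fetches a feed of `window`
    entries `reads_per_day` times a day, assuming a steady posting rate."""
    posts_between_reads = posts_per_day / reads_per_day
    if posts_between_reads <= window:
        return 0.0  # the feed still holds everything posted since the last read
    return (posts_between_reads - window) / posts_between_reads


def max_poll_interval_hours(window: int, posts_per_day: float) -> float:
    """Longest gap between fetches that still sees every posting (same assumption)."""
    return 24.0 * window / posts_per_day


print(fraction_missed(30, 75))                    # 0.6, i.e. 60% missed
print(fraction_missed(30, 75, reads_per_day=2))   # 0.2, even twice a day misses some
print(max_poll_interval_hours(30, 75))            # 9.6 (hours)
```

Plugging in the 50-entry feed and the estimate of roughly 87 postings per day from the exchange below gives a tolerable polling interval of only about 14 hours, which is the nub of the disagreement.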

I raised this matter with the blog engineering team, and – heavily edited into a conversational style for the purposes of this posting – the conversation went something like this, with fair points being raised on both sides:

Alec: Given the speed at which the blogs.sun.com feed changes, would it be possible to support parameterising the feed like:

http://…/entries/rss?count=100

…to get the last 100, rather than 30, entries?


Engineering: We’re not sure that we want to parameterize that URL, but we do think we should consider upping the number of entries a bit, although 75 and 100 seem like a bit much.

Let’s update it to 50 entries, though.


Alec: I checked the BSC Roller at about 8am this morning local time, noted the topmost posting was dated “1108pm”, and then scrolled back three and a half pages before I found a posting from 11pm the previous day.

Not very scientific, but at 25 postings per page that’s 3.5 * 25 ≈ 87 postings per day.

I want to use NetNewsWire to read all Sun blog postings during my early-morning wake-up coffee break – or at least I want to skim the headlines to pick interesting postings to read later – and I am pretty sure from the above that limiting the feed to 50 articles will just not deal with the scale of the readership problem.


Engineering: We actually had a small discussion about this some time ago and the consensus was that people should increase their feed reader polling times rather than expect BSC to constantly be increasing feed sizes. Part of the problem is that there has to be a reasonable limit to how big the BSC feed gets and IMO 100 entries already exceeds that.

If we were to increase it to 100 then what? Maybe 2 years down the line when even more people are blogging it will take 200 entries to guarantee you see a full day’s worth of content, and eventually 400 or 800? Where do we draw the line?

If I miss a show on TV one day I don’t start complaining to my cable company that they need to rerun all shows more frequently because I can’t be home to watch them all the time; instead I get a TiVo or set up some way to record the show.


Alec: So I need to be reading my feeds 24×7, or at least twice a day and not just in one batch in the morning when it’s actually convenient for me to do so?


Engineering: You should have some software that can grab the feed more regularly so that you can read the entries whenever you feel like it without missing anything. Feeds are like news tickers; they are just there to give a constant view of the most recent news. If you aren’t monitoring the ticker often enough, then you are going to miss some news.

Why is it our fault that your reader applications aren’t properly setup to keep up with the speed that things are coming? But that is exactly what was discussed last time this issue came up.

Also: too many entries create the opposite problem for other folks – they retrieve more info than they care for.

Why not just use a more intelligent feed reader, that keeps state?


Alec: Because although I am an open-source developer and am still an open-source evangelist, I also enjoy using attractive, well-designed and functional MacOS software? 🙂

I consider blaming the client application to be a non-inclusive answer, since not all clients want to maintain a cache – some, e.g. those on PDAs, might not have the space to implement that – whereas the point of a feed in my worldview is not to be a “ticker” with only the latest things on it, but to be a list of the latest updates within a given interval.

I believe the latter is a more elegant and more user-focused worldview; I don’t know how often the BSC front page feed is generated (it might be done in real time, but I suspect that it is actually quantised as an hourly cronjob, or something?) – but imagine that the feed-size-window is 50, yet at some future product launch, 51 people post near-simultaneous articles within that time quantum.

If the feed-size-window is overflowed, then people’s articles will be dropped. Perhaps it’ll be one of Jonathan’s articles that is dropped? It’s quite likely that he’d be the first of the flood, so it could happen.

Wouldn’t that be jolly? 🙂

Simply defining the feed as containing all the articles generated within a given time quantum (and telling people how big it is) would obviate this problem.

Of course it is possible to argue “we can tune the parameters so that this will practically never happen, regenerating the feed every 5 minutes, nobody can post 50 articles in 5 minutes” – but that’s just an arms-race/escalation argument, and still assumes caching clients.

It strikes me that posting the last <TIMEQUANTUM>’s worth of articles is an elegant algorithm which is inclusive of the greatest number of people, whereas merely posting the last <N> articles is not inclusive. To allow for weekends and other low-posting-frequency times, I believe the proper algorithm should be:

In your syndication feed, post the last <TIMEQUANTUM>’s worth of articles, or post the last <N> articles, whichever is the larger number of postings.

…and I would suggest TIMEQUANTUM=24h and N=32 as sensible basic values.
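Expressed as code, the rule might look like the Python sketch below. It is an illustration, not anything Roller or blogs.sun.com actually runs; the Entry type and the select_feed_entries function are hypothetical, and TIMEQUANTUM and N appear as the time_quantum and min_count parameters, defaulting to the values suggested above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical minimal feed entry: a unique id and a publication time."""
    entry_id: str
    published: datetime


def select_feed_entries(entries: Iterable[Entry], now: datetime,
                        time_quantum: timedelta = timedelta(hours=24),
                        min_count: int = 32) -> List[Entry]:
    """Return every entry published within `time_quantum` of `now`, or the
    newest `min_count` entries, whichever is the larger set (newest first)."""
    newest_first = sorted(entries, key=lambda e: e.published, reverse=True)
    cutoff = now - time_quantum
    # Because the list is sorted, the in-window entries form its leading run.
    in_window = [e for e in newest_first if e.published >= cutoff]
    if len(in_window) >= min_count:
        return in_window
    return newest_first[:min_count]


now = datetime(2006, 5, 1, 8, 0, tzinfo=timezone.utc)
# Busy day: a burst of 87 postings within the last hour and a half, all kept.
busy = [Entry(f"busy-{i}", now - timedelta(minutes=i)) for i in range(87)]
print(len(select_feed_entries(busy, now)))   # 87
# Quiet weekend: one posting every 6 hours, so the 32-entry floor applies.
quiet = [Entry(f"quiet-{i}", now - timedelta(hours=6 * i)) for i in range(40)]
print(len(select_feed_entries(quiet, now)))  # 32
```

On a busy day the time window dominates, so a flood of near-simultaneous postings cannot push anything out of the feed; over a quiet weekend the N-entry floor stops the feed shrinking to a handful of items.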

Summarising: my gut responses to the challenges raised in the exchange above were:

  1. Feeds are routes of communication
  2. Fewer feed articles yields less communication
  3. Blithe feed implementation yields lost communication
  4. Complete and comprehensive communication is the speaker’s / publisher’s burden
  5. The reader is free to use any standards-compliant software
  6. The reader should expect to see a complete and comprehensive feed using that software
  7. Why should the reader need to read multiple times daily in order to remain current?

There are several arguments on the opposing side, and good analogies such as TiVo, etc., but the latter is a subscription service for which the consumer has paid – whereas in blogging we are trying to communicate with the largest set of people possible, where they are choosing to take an interest in us… an interest worth fostering.

The two opposing arguments I see most clearly are:

  1. Why is it the speaker’s / publisher’s burden to work around client limitations?
  2. Why should more capable readers/clients be burdened with traffic they have already seen?

My response to the first is: because the speaker finds value in being inclusive and maximising the number of readers to whom they communicate. I’ve already explained that one enough.

My response to the second is: to ensure the reader is unlikely to miss traffic; of course the ideal solution to this would be ultra-smart parameterisation of the form:

http://site/…/feed?lastvisit=yyyy-mm-dd+hh:mm:ssZ or
http://site/…/feed?since=24h or
http://site/…/feed?count=200

…which doubtless requires marginally more effort, standards adoption, and resources to deploy and get right; on the other hand it’s not at all far from the sort of capabilities where your feed selects by tag-query:

http://site/…/feed?tags=solaris+security&mode=and
http://site/…/feed?tags=music+art+theatre&mode=or

…which functionality surely ought to be somewhere in the pipeline?
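Such parameters need surprisingly little server-side logic. The Python sketch below applies them to an in-memory list of entries. The parameter names follow the example URLs above, but their exact semantics are assumptions: since accepts a number of hours (h) or days (d), mode defaults to and, and count is capped by a server-side max_count. The Entry type, the filter_feed function and the http://site/feed test URL are likewise hypothetical; none of this is an existing Roller API.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Iterable, List, NamedTuple
from urllib.parse import parse_qs, urlsplit


class Entry(NamedTuple):
    """Hypothetical feed entry with a publication time and a set of tags."""
    entry_id: str
    published: datetime
    tags: FrozenSet[str]


_UNITS = {"h": "hours", "d": "days"}  # assumed units for the since parameter


def filter_feed(url: str, entries: Iterable[Entry], now: datetime,
                max_count: int = 500) -> List[Entry]:
    """Filter entries (newest first) by hypothetical lastvisit, since,
    tags/mode and count query parameters."""
    params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    result = sorted(entries, key=lambda e: e.published, reverse=True)

    if "lastvisit" in params:
        # parse_qs decodes the "+" in yyyy-mm-dd+hh:mm:ssZ to a space.
        visited = datetime.strptime(params["lastvisit"], "%Y-%m-%d %H:%M:%SZ")
        visited = visited.replace(tzinfo=timezone.utc)
        result = [e for e in result if e.published > visited]

    if "since" in params:
        match = re.fullmatch(r"(\d+)([hd])", params["since"])
        if match is None:
            raise ValueError(f"unsupported since value: {params['since']!r}")
        span = timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})
        result = [e for e in result if e.published >= now - span]

    if "tags" in params:
        wanted = set(params["tags"].split())  # "+" separators also decode to spaces
        mode = params.get("mode", "and")
        if mode == "and":
            result = [e for e in result if wanted <= e.tags]
        elif mode == "or":
            result = [e for e in result if wanted & e.tags]
        else:
            raise ValueError(f"unsupported mode: {mode!r}")

    # Honour count, but never go beyond the server-side ceiling.
    count = max(0, min(int(params.get("count", max_count)), max_count))
    return result[:count]
```

The max_count ceiling is one possible answer to the “where do we draw the line?” question: the server still bounds the cost of any single request, while readers who want a full day’s backlog can ask for one.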

Anyway: none of this actually solved the problem of my being able to sit down with coffee in the morning and read all the BSC feeds, and so I wrote Atom Heart Mother:-

atom ♥ mother

more, larger, older feeds for blogs.sun.com

updated twice per hour.

currently available in 6-, 24- and 48-hour, and 7- and 14-day versions; the 14-day feed is an experiment and may be deleted if it gets too big.

http://www.crypticide.com/ahm/

It’s not beyond “beta” yet, but I intend to keep it up for a while, until a better solution presents itself.
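How Atom Heart Mother works internally isn't described here, but the general idea of turning one short feed into several longer ones can be sketched. The Python below is a hypothetical illustration, not AHM's code. It assumes a poller that fetches the upstream feed often enough (every 30 minutes, matching the twice-hourly update) that nothing scrolls off between polls, remembers every entry by its id, prunes anything older than the longest window, and slices out the 6-, 24- and 48-hour and 7- and 14-day views. Fetching, parsing and serialising Atom or RSS are left out, and the Entry, FeedAccumulator and WINDOWS names are invented for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical upstream entry: unique id, publication time and title."""
    entry_id: str
    published: datetime
    title: str


# Window sizes mirroring the 6-, 24- and 48-hour and 7- and 14-day feeds.
WINDOWS: Dict[str, timedelta] = {
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
    "48h": timedelta(hours=48),
    "7d": timedelta(days=7),
    "14d": timedelta(days=14),
}


class FeedAccumulator:
    """Remembers every entry seen across repeated polls of a short upstream feed."""

    def __init__(self, retention: timedelta = max(WINDOWS.values())) -> None:
        self.retention = retention
        self.seen: Dict[str, Entry] = {}

    def absorb(self, snapshot: Iterable[Entry], now: datetime) -> None:
        """Merge one polled snapshot, e.g. the upstream feed's 30 newest entries."""
        for entry in snapshot:
            self.seen[entry.entry_id] = entry  # a re-polled, edited entry replaces the old copy
        cutoff = now - self.retention
        self.seen = {k: e for k, e in self.seen.items() if e.published >= cutoff}

    def window(self, span: timedelta, now: datetime) -> List[Entry]:
        """Entries published within `span` of `now`, newest first."""
        cutoff = now - span
        recent = [e for e in self.seen.values() if e.published >= cutoff]
        return sorted(recent, key=lambda e: e.published, reverse=True)


# Simulation: 75 postings a day for two days (one every 1152 s = 86400 s / 75),
# an upstream feed holding only the 30 newest entries, and a poll every 30 minutes.
start = datetime(2006, 5, 1, tzinfo=timezone.utc)
posts = [Entry(f"p{k}", start + timedelta(seconds=1152 * k), f"post {k}") for k in range(150)]
acc = FeedAccumulator()
for j in range(97):
    poll_time = start + timedelta(minutes=30 * j)
    upstream = [p for p in posts if p.published <= poll_time][-30:]
    acc.absorb(upstream, poll_time)
now = start + timedelta(hours=48)
print({name: len(acc.window(span, now)) for name, span in WINDOWS.items()})
# {'6h': 18, '24h': 75, '48h': 150, '7d': 150, '14d': 150}
```

In effect this moves the “record it yourself” approach from each reader's machine to one shared server, so even a simple, stateless feed reader gets the whole day.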


[1] ps: Kudos to Dick Gabriel, now of Sun, for the original essay.

Comments

9 responses to “Atom Heart Mother (beta) – More, Larger Feeds for blogs.sun.com”

  1. Dave Johnson
    re: Atom Heart Mother (beta) – More, Larger Feeds for blogs.sun.com

    Whether or not BSC chooses to support time-parameterized feeds, I believe the underlying Roller software should be made to support such a thing. Note that there are some alternative solutions to this problem, such as paging via Atom next/prev links and FeedDiff, but they are currently not supported by enough feed readers.

  2. Mads

    I find the blogs.sun.com feed length of 25 pretty annoying as well – in busy times, it is hard to keep up even when reading as the last thing in the evening and again the next morning. My short term solution to the problem is setting up a planet that only grabs from blogs.sun.com and keeps a longer backlog – then I can subscribe to the feed that creates and have a reasonable interval between reading. Longer term, I’m going to have to put something together that merges blogs.sun.com, Planet solaris/opensolaris/sun and a few other related feeds and comes up with just one single feed. Just upping to exactly 24h doesn’t leave you much of room for not hitting the exact moment if you want to get everything. I think asking for posts modified since a specific date would long term be the best solution (and should even save a bit of traffic).

  3. alecm

    <<< Just upping to exactly 24h doesn’t leave you much of room for not hitting the exact moment if you want to get everything. >>>

    That’s why there are the other, longer feeds, too. Most readers I have ever seen have no problem remembering what you have *seen* so long as it is visible in the feed.

    <<<Longer term, I’m going to have to put something together that merges blogs.sun.com, Planet solaris/opensolaris/sun and a few other related feeds and comes up with just one single feed.>>>

    I can do that for ATOM already, and once the RSS-to-ATOM converter is done, I’ll float a larger planet-type thing.

    I’ll post the software then, too. 🙂

  4. alecm

    <<< Whether or not BSC chooses to support time-parameterized feeds, I believe the underlying Roller software should be made to support such a thing. >>>

    I would like that. It struck me when writing this that some standard for parameterisation permitting the same sort of NNTP-like “I last visited you since <TIMESTAMP>, what have you received since then” thing would be the optimal course, but that requires – as you say – evolution of server and client to adopt new standards.

    AHM at least properly implements Last-Modified checking, since the static files created are unchanged unless content has been updated.

  5. Dave Walker

    You die-hard Floydie, you :-).

    More seriously, this is precisely the world-view that the TiVo generation will begin to move *from*, as VoD starts to roll out properly. Never mind “blog time-shifting”, give us “Blog On Demand” :-).

  6. Dave Walker

    In fact, your “ultra-smart parameterisation” thought above is, indeed, the Ideal Way To Do It.

  7. Chris Samuel

    WordPress’s RSS feed stuff also implements support for If-Modified-Since:, so it should be trivial to implement elsewhere (as other blog packages will probably already know the dates things were last posted).

  8. Graham

    and the moral of this story, children, is never argue with a muffett…

    Great story, worthy of Dilbert

  9. Dogsbody

    Sorry, I’m with the blog engineering team on this one. Just ping the feed more often. Personally, I use Google Reader as they will ping a feed as often as required (more for busy sites, less for quiet ones) and as it’s all done behind the scenes and online it doesn’t matter if I only read my feeds once a week :-p

    I think you have lost the idea of what these feeds are for; they’re not just for blogs, you know 😉
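Two of the comments above turn on HTTP conditional requests: AHM serves static files whose Last-Modified date only moves when content changes, and WordPress honours If-Modified-Since. As a rough illustration of the server side of that exchange, the Python sketch below decides between a full 200 response and a 304 Not Modified. The conditional_get function is hypothetical; it relies only on the standard library's email.utils date helpers and assumes callers pass timezone-aware datetimes.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_get(last_modified: datetime,
                    if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a feed whose content last changed at
    `last_modified`, given the client's If-Modified-Since header (or None)."""
    # HTTP dates carry whole seconds only, so compare at that resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date is simply ignored
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # client's copy is current: send no body
    return 200, headers  # send the full feed along with these headers


changed = datetime(2006, 5, 1, 12, 0, 0, 500000, tzinfo=timezone.utc)
status, headers = conditional_get(changed, None)
print(status, headers["Last-Modified"])  # 200 Mon, 01 May 2006 12:00:00 GMT
print(conditional_get(changed, headers["Last-Modified"])[0])  # 304
print(conditional_get(changed, "Sun, 30 Apr 2006 12:00:00 GMT")[0])  # 200
```

A poller that stores the Last-Modified value and sends it back as If-Modified-Since pays for a full download only when the feed has actually changed, which helps with the traffic concern raised in the comments.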
