August 25, 2004

For all the endless (endless, endless) stink that has been raised about syndication protocols, very little mention has been made of the applications that actually use them. Which is too bad, because each and every feed reader currently available sucks. While you nincompoops have been arguing about diesel versus gasoline, the car has been put up on blocks and is shot through with rust. Maybe next you can fight about aerobic versus anaerobic glycolysis during the production of pornography, instead of just watching the movie like the rest of us.

Feed readers -- for those of you who still think that Web browsers are a good way to browse the Web -- are programs that eat data produced in syndication formats like RSS and Atom and excrete that information in a way that's easy for human beings to deal with. More sensibly thought of as malnourished micro-content clients, feed readers manage such handy tasks as only retrieving new information and presenting it all via a standard interface. They, essentially, remove a lot of the tedium from trying to stay up-to-date on the Web.
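
The core job is small enough to sketch in a few lines of Python -- here using Mark Pilgrim's Universal Feed Parser, with a seen-item cache standing in for whatever storage a real reader would use:

    import feedparser  # Mark Pilgrim's Universal Feed Parser

    seen = set()  # a real reader would persist this between runs

    def new_entries(url):
        """Fetch a feed and return only the entries we haven't seen yet."""
        feed = feedparser.parse(url)
        fresh = [e for e in feed.entries
                 if e.get("id", e.get("link")) not in seen]
        seen.update(e.get("id", e.get("link")) for e in fresh)
        return fresh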

But the problem with feed readers has nothing to do with their basic functionality or their underlying protocols. It has everything to do with their authors' utter failure of imagination. The only thing that feed readers have managed to do to this point is put a pretty face on the raw data they receive. Golly. Thanks. Tell that 1973 green-screen sitting next to you I said hello.

Feed readers have at their disposal near-infinite processing power and well-differentiated, well-defined data... and do nothing with them. You can sort your feed items by date. Exciting!

Where are the extrapolations, based on the data? Where is Bayesian filtering? Why isn't there auto-correlation between like items? Why isn't there sorting by link popularity? Or inter-linking between feeds? Why can't I rank feeds or categories higher than others? Why can't I rate items and let the cumulative ratings over time determine feed rankings? Why isn't there some statistical combination of each of the above to put what I'm actually going to care about at the top of the list and the discussions about which syndication protocol is best at the bottom? Why isn't there an archive, to throw useful-but-read items somewhere other than the to-do list and trash? Why can't I synchronize state information to a server, so I can read feeds at home without having to re-read them the next morning at work? (Dear BlogLines users: shut up. Web apps suck.) Why can't I automatically delete any item which references the same links as the current item? Why is the desire for any of this a surprise?
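
None of this is exotic, either. Here's the kind of back-of-the-napkin scoring I'm talking about -- every signal name and weight below is invented, but all of it is computable from data a reader already has:

    def score(item):
        """Blend the signals into one number; the biggest scores go on top.
        Every weight here is pulled out of thin air -- tune to taste."""
        return (2.0 * item["feed_rank"]          # how you've ranked the feed itself
                + 1.5 * item["avg_rating"]       # cumulative item ratings over time
                + 1.0 * item["link_popularity"]  # how many other feeds point at it
                + 0.5 * item["bayes_score"]      # P(interesting) from a Bayesian filter
                - 3.0 * item["dupe_links"])      # penalize links you've already seen

    # items.sort(key=score, reverse=True)  # what you care about on top,
    #                                      # protocol slapfights at the bottom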

Why? Because so far only programmers have written feed readers. Actual humans are nowhere to be found.

The linear programmer mindset has invaded the user model -- something that should never happen -- and as a result feed readers haven't evolved beyond a way to more efficiently plow through all available information. Of course you can't skip anything! That would be non-linear!

But unless you're some anal-retentive Asperger's Syndrome poster child, you're only going to care about a sub-set of the endless sea of crap out there, and the software should help you filter it down to manageable, crunchy chunks. Google has an order of magnitude less information about its data than feed readers do, and manages to do an order of magnitude more with it. But they had the brains to apply a little creativity to the problem. And where did that get them?

Until feed readers -- and the programmers who write feed readers -- actually start to live in the same world as people who have better things to do, they will forever remain niche software, existing just to allow info-gluttons to endlessly gorge themselves without regard to quality, taste or what the final, messy result is going to be.

[As a half-assed postscript to this rant, I'm as surprised as anyone to find that there is work being done. 0xdecafbad, for instance, has not only anticipated many of the above complaints, but adds the idea of feed trial periods and actual code.]

Tony added:

Then again, those of us who aren't foaming at the mouth for extrapolations think that basic feed readers rule.

If you'd spend, say, 4x the time developing this super-reader you're envisioning that you spent on this rant, you'd be a billionaire this time next year.

I love you.

Seth added:

Nothing's wrong with aggregators, because I don't want to spend all day rating and categorizing headlines. I want to quickly scan them for interesting things without missing anything a trained computer mistakenly believed I wouldn't care about.

Lastly, an aggregator is perfect as a web app because once you see an interesting headline, you use a browser to surf to the site in order to read it. So you shut up!

I love you too.

Right on!

Jeff Minard added:

"Because I don't want to spend all day rating and categorizing headlines."

Like it would be hard for a programmer to extrapolate a rating based on how long you sat and read a headline (or its content), whether you clicked the "display more" inside the reader, or whether you clicked the "read on site" link. This kind of data, plus the frequency with which you read that feed, would be easy to create a rating from without your lazy ass ever needing to touch a 1/2/3/4/5 star button.

In addition, a smart reader could scan the headlines that you touch, read, or fully read for keywords and build a list of frequently appealing words and phrases that automatically add to certain feed/article "ratings" -- again, without your lazy rear touching the "I dis/like this" button.

These kinds of things *are not hard*.
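
Sketched in Python, the whole thing is a dozen lines -- every name and threshold below is made up, but that's the entire trick:

    from collections import Counter

    def implicit_rating(dwell_seconds, clicked_more, clicked_read_on_site):
        """Guess a 0-5 star rating from behavior alone -- no buttons required."""
        stars = min(dwell_seconds / 30.0, 3.0)  # up to 3 stars just for dwell time
        if clicked_more:
            stars += 1.0                        # opened the full text in the reader
        if clicked_read_on_site:
            stars += 1.0                        # cared enough to click through
        return min(stars, 5.0)

    appealing_words = Counter()  # a real reader would persist this across sessions

    def learn_keywords(headline, stars):
        """Remember the words from headlines you actually spent time on."""
        if stars >= 3.0:
            appealing_words.update(headline.lower().split())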

Seth added:

You're right, Jeff, these things could easily be done without any interaction with me. I'm hoping you wouldn't mind briefly describing your aggregator usage: how many subscriptions, how often you read them, how often you subscribe to new feeds or unsubscribe from old ones. That sort of thing.

(nice with the digiorno truck pic, btw :-)

Chris Moritz added:

I completely agree. I'm waiting to see how NetNewsWire 2.0 shapes up.

At the very least, there's plenty of competition in this area. This should lead to the innovations you're talking about. I hope so, 'cuz they sound great.

Roger added:

Jeff: Are aggregators for Just Plain Folks, or are they for Geeky Infovores? You can't seem to decide.

If the former, then 95% of what you've suggested is unnecessary, resource-hogging tech for the sake of tech. JPFs don't want to subscribe to a tremendous number of feeds... they have focused interests, and they have absolute control over the data that's put in front of them. If a source is providing uninteresting material, they can unsub with ease. Show me someone who prefers Bayesian filtering and predictive ranking to simply eliminating the influx of junk altogether, and I'll show you someone who ain't JPF.

And as for "not hard"... we all anxiously await your shipping code.

Jonny added:

You are correct, Sir.

However, I offer hope: If you're running on Mac (OS 10.3), you may find relief with FeedTickler: The RSS Feed Ticker.

It's a platform for automatic scrolling and scanning of RSS feeds. It features interactive scrolling, filtering, a layout wizard and much more.

For a full listing of features, visit here: http://feedtickler.com/dev/Features.html

Jeff added:

@seth: My RSS usage is pretty simple. I use the Sage plugin for Firefox because I don't need to launch another program to get such simple functionality. The only thing Sage doesn't do that other aggregators do is give me a "number of unread headlines" in the list view. I read about 40 feeds right now on a daily basis.

@Roger: "resource-hogging tech" - Doom3 is resource hogging tech, parsing text strings is 1960's computing tech. My watch could do the kind of things that we're talking about. I mean, take a look at the new "SpotLight" feature in 10.4 - most reviews said there is little to no impact on performace, and SpotLight does a good deal more filtering than the kind of processes we're discussing.

As for whether "JPF"'s would care for these things or not:

If these kinds of predictive filters were transparent and on by default, everyone would like them. Don't believe me? Take a look at a company that just IPO'd at ridiculously high share prices. Google's main business is built around a form of "predictive ranking to simply eliminating the influx of junk" - and it took over the market with this "tech for the sake of tech" feature in something like 3 years. Your assertion that predictive filtering is useless for JPFs is bunk.

As for my "shipping code", won't happen soon - only languages I know are web based ones. Regarless, scanning strings, building datasets, and creating timers to track movement in a program is nothing revolutionary.

Pete Prodoehl added:

I spit out a bunch of ideas in Yet More Aggregator Madness:

http://rasterweb.net/raster/200407.html#07162004073000

Some of these things I've implemented, and some I haven't (yet). But I think it would be good to keep the conversation going and see where things go...

Tom Coates added:

Dude, are you cross about EVERYTHING?

egoClip's a Windows RSS aggregator that tries to meet your first request.

It notices the news items you show interest in and uses that information to sort the rest of your news.

It's at http://egofile.com/egoclip/

Jorunn added:

What you're asking for sounds a lot like attention.xml, presented in an interesting (although too long) edition of the Gillmor Gang.

http://www.itconversations.com/shows/detail190.html

Srijith added:

Some time ago, I tried my hand at coding a feed reader that does a lot of the things you wanted - rank feeds, rank categories, use these ratings to determine item-level ranking, etc. The source code for Intelli-Aggie is available under the GPL, so if someone wants to take a look, they are more than welcome.
(http://www.srijith.net/codes/index.shtml#ia)

I also ranted a bit on why Bayesian filtering may not work on feeds.
http://www.srijith.net/trinetre/archives/2003/08/11.shtml#000373

Reed added:

Great ideas.

Why don't you go implement them then?

arnie added:

Yeah right.

I'd like a program that makes me coffee in the morning, turns up at work and does my job without error (and without asking me for pay), cooks me a perfect meal, pours me a martini in the evening, and tucks me up in bed at night. Oh, perhaps it could also help me get a life.

anu added:

Hmm... don't remember who said it, but ideas are cheap, implementation [execution] is everything... if you can't write code, then howsabout properly speccing out your ideas, and how, precisely, you'd like them to work?

An aggregator + Findory Blogory may also do some of what you're after.

Jeff: "My watch could do the kind of things that we're talking about."

Yet you will not find a watch doing so. If you want an example of how difficult this kind of thing actually is, check out any fairly sophisticated Usenet tool... they're the direct ancestors of feed readers, after all. Once you get a few hundred thousand entries in a database and start trying to do interesting things with them, you'll quickly discover that your RAM is being swallowed whole and your CPU utilization is edging ever-higher.

"Doom3 is resource hogging tech..."

Doom3 takes over the machine. People expect an aggregator to multitask happily, usually running in the background throughout the day. That puts resource consumption at an even higher premium.

"Google's main business is built around a form of 'predictive ranking to simply eliminating the influx of junk'..."

Again, Google's techniques are necessary because individual users can't do their own filtering. If someone spams Google, I can't simply unsubscribe from that spammer's output... it will keep showing up in my results unless Google takes steps.

This is not the case with an aggregator. I control the influx of content. If someone is injecting junk I don't like, I turn him off with a flick of the mouse. And nothing he does can force me to resub. He's effectively filtered for good.

What we need far more than AI on the client side is better publishing practice on the server side. Blogs, news sites, and pretty much everything else should be providing categorized feeds. If I routinely post about politics, sports, and movies, people should be able to subscribe to individual, self-filtered feeds for those topics.
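
The publishing side of that is close to trivial -- a rough sketch, with an invented post structure:

    from collections import defaultdict

    def per_category_feeds(posts):
        """One feed per topic instead of a single firehose."""
        by_category = defaultdict(list)
        for post in posts:
            for cat in post["categories"]:  # e.g. "politics", "sports", "movies"
                by_category[cat].append(post)
        return by_category  # render each list as its own RSS/Atom feed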

Greg Linden added:

Where are the extrapolations, based on the data? Where is Bayesian filtering? Why isn’t there auto-correlation between like items?
----------
Take a look at Blogory http://blogory.com

It learns from the weblog articles you read and helps you find other related weblog articles.

Jeff Minard added:

Roger: "Once you get a few hundred thousand entries in a database "

Then perhaps keeping a list of thousands of RSS entries around isn't the best way to do that. Keeping a running tab and data set would be a very acceptable solution. The kind of filtering that we're discussing is not intrinsicly difficult. In addition, doing so your RAM would not be "swallowed whole"


Roger: "People expect an aggregator to multitask happily, usually running in the background throughout the day."

Which could be done alongside any kind of filtering. Spread it over time -- loops/timers are our friends.
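
Something like this, even -- the backlog list and the scoring callback are placeholders:

    import threading

    def score_in_batches(backlog, score_item, batch=20, every=5.0):
        """Chew through the backlog a few items per tick instead of all at once."""
        def tick():
            for _ in range(min(batch, len(backlog))):
                score_item(backlog.pop())
            if backlog:  # more left? come back in `every` seconds
                threading.Timer(every, tick).start()
        tick()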


Roger: "Again, Google's techniques are necessary because individual users can't do their own filtering."

No, users can do their own filtering - holding a dataset of billions of webpages, and the computing power to cross-reference all of them and then return results in real time to millions of querying users at once, is what an average joe cannot do. Cross-referencing click habits for a single user against a keyword list of (even if you're an infomaniac) 500 RSS feeds is not going to kill a modern (and I mean P3 800) machine.


Roger: "I turn him off with a flick of the mouse. And nothing he does can force me to resub."

The problem I find with this is that people are fickle. They may post something brilliant and relevant to my interests, and the next day they could post rubbish. The idea you set forth that a person produces either solely good or solely bad work is a farce -- and basing an argument on it is shaky.


Roger: "What we need far more than AI on the client side is better publishing practice on the server side."

The simplest response to this is: "ROFLMAO"

Are you kidding? How, exactly, do you propose that be done? Going back to my point above, the internet is filled with all kinds of bad articles and things you *don't* want to read just as much as things you *do* want to read - and it's usually all mixed in together.

In addition, filtering on keywords and user input timings is far, far, far away from AI.


Roger: "everything else should be providing categorized feeds"

Once more, category-specific feeds won't fix that either - a person can still write great articles on "PHP Programming" and poor ones as well.

Mark Crocker added:

As for synchronizing between home/work or wherever, ever heard of NewsGator Online Services?

It syncs up using my Outlook at work, via the main NewsGator program; then at home, using Thunderbird (or whatever email client I choose), it downloads posts as email messages.

http://services.newsgator.com

Zephyr added:

Is this a bad time to ask for a Glassdog feed?
;)

Tom Cialis added:

Linux, BSD and other Unix flavors had a slow start with RSS readers. But they've been catching up with this content distribution technology that lets you subscribe to your favorite sites like an email newsletter, but without all the spam.

ping.od added:

Bollocks.

Everybody's waiting for the next MS installment of FeedLook(tm) or whatever.

True, every feed reader I've come across sucks big time. Maybe Newzie or FeedDemon qualify as OK. But the point remains: "Does my mother need a feed reader?"

Jokum added:

Of all the whiners out there, you top the charts. Make your own damn RSS reader instead of putting all your energy into making a colorful blog with rants.
