My next little side project is going to involve parsing feeds. I'm tired of wading through hideous newspaper.coms to find a certain story, or stories about a certain area, while dodging national news I've read elsewhere and bits about towns I'll never visit.
Andrew Meyer has been having the same problem:
When I visit PressDemocrat.com, I go for one thing: Sonoma County news. Someone in Mendocino County might visit the site for Mendo County news, which is great, but not the reason I visit. Ok, with that said, how do I locate Sonoma County news on PressDemocrat.com? Ahh... herein lies the problem. Local news granularity is sorely missing on the site.
When scrolling down PressDemocrat.com's frontpage, you won't find sections for "Santa Rosa news" or "Windsor news."
I'm in the same boat. I live in downtown Santa Rosa, and Windsor, sorry folks, isn't at the top of my reading list (I drove through it once, nice looking place). Nor is Mendocino County, or Petaluma, or Napa, really. And chances are, people in Mendocino and Petaluma and Napa and Windsor care a lot more about what is going on in those places--and just those places--than they do about what happens in my front yard.
(I should note here that I don't actively ignore places I don't live, but I don't follow what happens two towns over with the zeal that I watch what happens close to home, or in China or Washington, DC, for that matter. This isn't an argument in favor of insularity--I won't make that one--but simply a need for better filters.)
So, assuming the news organizations of the Bay Area don't decide to rebuild their sites with granular news feeds (like Spokane just did), how can I get a feed covering just the few places I really want to follow? Looks like I'm going to have to build my own feed.
How do we do that?
Start with a few basic ingredients:
- A news-type site with an RSS feed
- Yahoo Pipes
- A feed reader (I like Google Reader)
Here's a simple result:
I picked up the Press Democrat's local news feed, then fed the 50 most recent items through a Pipes filter, allowing only stories that contained "Santa Rosa" in the headline or lede paragraph. We end up with four stories. Not quite what I'd hoped.
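For anyone who'd rather not use Pipes, here's roughly the same filter as a Python sketch. It isn't what Pipes does under the hood, just my guess at an equivalent: it assumes a plain RSS 2.0 feed with `title`, `link`, and `description` on each item, and the names `filter_feed` and `fetch_feed` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_feed(url):
    """Download the raw feed bytes. The URL is whatever feed you want to filter."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def filter_feed(rss_xml, phrase, limit=50):
    """Return (title, link) pairs for items that mention `phrase`.

    Looks only at the first `limit` items, and only at each item's
    headline (<title>) and excerpt (<description>), case-insensitively.
    Assumes RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>.
    """
    root = ET.fromstring(rss_xml)
    needle = phrase.lower()
    matches = []
    for item in root.findall("./channel/item")[:limit]:
        title = item.findtext("title", default="")
        excerpt = item.findtext("description", default="")
        link = item.findtext("link", default="")
        if needle in title.lower() or needle in excerpt.lower():
            matches.append((title, link))
    return matches
```

Calling `filter_feed(fetch_feed(some_feed_url), "Santa Rosa")` should give back the same kind of short list, with the same limitation: it can only see what the feed chooses to include.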
Part of the trouble is the PD's feed: It only gives out an excerpt of each story, usually the first sentence or two. To my knowledge, only the Guardian (UK) offers a full RSS feed.
What I'd really like to do (assuming, of course, that newspapers around the Bay don't decide to be more like the Guardian or the Spokesman-Review) is follow each link, look at the text, and grab those that have Santa Rosa (or any location) anywhere in the main body of the story. Something like Open Calais would be sweet.
Anybody know how to do that?
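Here's my own rough first stab, as a sketch rather than an answer. It follows each link, strips the HTML down to visible text, and keeps the links whose pages mention the place. Everything here is an assumption on my part: the names `TextExtractor`, `page_text`, `fetch_html`, and `stories_mentioning` are invented, and it uses only Python's standard library, not Open Calais.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_text(html):
    """Return the page's text with runs of whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def fetch_html(url):
    """Download a page and decode it, assuming UTF-8 (a guess, not a guarantee)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def stories_mentioning(links, phrase, fetch=fetch_html):
    """Keep the links whose page text contains `phrase`, case-insensitively."""
    needle = phrase.lower()
    return [link for link in links if needle in page_text(fetch(link)).lower()]
```

The catch is that this reads the whole page, not just the story. If "Santa Rosa" shows up in a site's navigation or footer, every story will match, so a real version would need some way to pick out the main body first.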