Finding a local news feed

Posted Saturday, December 20, 2008 at 4:20 a.m. by Chris Amico in News, Projects and Self-Indulgence about journalism, RSS and Santa Rosa

My next little side project is going to involve parsing feeds. I'm tired of wading through hideous newspaper.coms trying to find a certain story, or stories about a certain area, while dodging national news I've read elsewhere and bits about towns I'll never visit.

Andrew Meyer has been having the same problem:

When I visit PressDemocrat.com, I go for one thing: Sonoma County news. Someone in Mendocino County might visit the site for Mendo County news, which is great, but not the reason I visit. Ok, with that said, how do I locate Sonoma County news on PressDemocrat.com. Ahh... herein lies the problem. Local news granularity is sorely missing on the site.

When scrolling down PressDemocrat.com's frontpage, you won't find sections for "Santa Rosa news" or "Windsor news"

I'm in the same boat. I live in downtown Santa Rosa, and Windsor, sorry folks, isn't at the top of my reading list (I drove through it once; nice-looking place). Nor is Mendocino County, or Petaluma, or Napa, really. And chances are, people in Mendocino and Petaluma and Napa and Windsor care a lot more about what is going on in those places--and just those places--than they do about what happens in my front yard.

(I should note here that I don't actively ignore places I don't live, but I don't follow what happens two towns over with the zeal that I watch what happens close to home, or in China or Washington, DC, for that matter. This isn't an argument in favor of insularity--I won't make that one--but simply a need for better filters.)

So, assuming the news organizations of the Bay Area don't decide to rebuild their sites with granular news feeds (like Spokane just did), how can I get a feed of just a few places that I really want to follow? Looks like I'm going to have to build my own feed.

How do we do that?

Start with a few basic ingredients:

  1. A news-type site with an RSS feed
  2. Yahoo Pipes
  3. Patience
  4. A feed reader (I like Google Reader)

Here's a simple result:

I picked up the Press Democrat's local news feed, then fed the 50 most recent items through a Pipes filter, allowing only stories that contained "Santa Rosa" in the headline or lede paragraph. We end up with four stories. Not quite what I'd hoped.
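For anyone who would rather see the filter as code than as a Pipes diagram, here is a minimal sketch in Python of roughly the same idea. It is not what Pipes does internally. It assumes a plain RSS 2.0 feed where each item has a title, link and description, and the sample feed, its URLs and the function name filter_items are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def filter_items(rss_xml, phrase, limit=50):
    """Return (title, link) pairs for recent items that mention phrase.

    Looks only at the headline and the feed's excerpt, so a story that
    names the place further down the page will be missed.
    """
    root = ET.fromstring(rss_xml)
    matches = []
    # RSS 2.0 nests <item> elements under <channel>; keep the first `limit`.
    for item in root.findall("./channel/item")[:limit]:
        title = item.findtext("title", default="")
        excerpt = item.findtext("description", default="")
        link = item.findtext("link", default="")
        # Case-insensitive substring match on headline plus excerpt.
        if phrase.lower() in (title + " " + excerpt).lower():
            matches.append((title, link))
    return matches


# Hypothetical feed, invented for this example.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Hypothetical local news</title>
<item><title>Santa Rosa council votes on budget</title>
<link>http://example.com/1</link>
<description>The council met Tuesday.</description></item>
<item><title>Windsor parade draws crowd</title>
<link>http://example.com/2</link>
<description>Hundreds lined the streets.</description></item>
<item><title>Road closure announced</title>
<link>http://example.com/3</link>
<description>Crews will close a lane in downtown Santa Rosa.</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for title, link in filter_items(SAMPLE_FEED, "Santa Rosa"):
        print(title, link)
```

The match is a plain substring test, which is why an excerpt-only feed lets so much slip through: if the place name isn't in the first sentence or two, the story never makes the cut.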

I know this misses a lot. It missed, for instance, two stories about Mark Felt, better known as Deep Throat. He died yesterday in Santa Rosa.

Part of the trouble is the PD's feed: It only gives out an excerpt of each story, usually the first sentence or two. To my knowledge, only the Guardian (UK) offers a full RSS feed.

What I'd really like to do (assuming, of course, that newspapers around the Bay don't decide to be more like the Guardian or the Spokesman-Review) is follow each link, look at the text, and grab those that have Santa Rosa (or any location) anywhere in the main body of the story. Something like Open Calais would be sweet.

Anybody know how to do that?
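One rough way it might go, sketched in Python with only the standard library. This is a guess at an approach, not a tested solution: it searches all the visible text on the page (navigation and sidebars included) rather than just the story body, since picking out the body depends on each site's markup. The names TextExtractor, page_text, fetch and stories_mentioning are invented for this sketch.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_text(html):
    """Strip tags from an HTML string and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def fetch(url):
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def stories_mentioning(links, place, fetch=fetch):
    """Return the links whose page text mentions place (case-insensitive)."""
    hits = []
    for link in links:
        try:
            text = page_text(fetch(link))
        except OSError:
            continue  # skip pages that fail to load
        if place.lower() in text.lower():
            hits.append(link)
    return hits
```

Feed it the links from a feed's items and a place name, and it returns the links worth keeping. Fetching every story is slower than filtering excerpts, and it would be polite to cache pages and not hammer the paper's server.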



Comments:

dec 20, 2008 at 12:03 a.m. // Rob Knight said:

I've wanted this in Santa Cruz too. Because our local paper isn't so good, I'm looking in the other direction. Mainly the ability to pull relevant content from elsewhere on the web (Google News, Blogs, etc.) and filter out duplicates. Essentially, I think we're talking about some form of curation that doesn't require constant attention. Something similar to a spam filter, learning over time what content should come through and what shouldn't.

That is definitely the next layer we'll put on the web. It'll be the cheese cloth of local content.

dec 20, 2008 at 12:48 p.m. // Chris said:

Ha. "The cheese cloth of local content." I like that.

Eventually, I want to make a bunch of feeds each giving me one specific area from one specific source. This is just the first one.

Santa Cruz definitely has the same issue. I don't know of any newspaper that just covers one town anymore. The Sentinel covers Santa Cruz, Scotts Valley, Capitola, Soquel and sometimes ventures down into Watsonville.

I think a big aggregation of all local content--from blogs, TV, newspapers, whatever--is the way to go. If I can get some infrastructure built, maybe we can all pitch in on this.

dec 20, 2008 at 3:58 p.m. // Andrew said:

Let me know when you get that Santa Rosa feed figured out. Does it update as fast as PressDemocrat.com's local news feed or is the Santa Rosa-only parsing a bit delayed?

dec 20, 2008 at 5:13 p.m. // Chris said:

@Andrew,

I haven't timed it, but I'd assume there's at least a slight delay, since Pipes needs to pick up the feed, filter it, then retransmit. Not sure if it's noticeable, if there is indeed a delay.

I'm going to do a few more of these (probably for the East Bay) and see how it handles.

dec 21, 2008 at 1:02 p.m. // Adam Skory said:

I don't know about Yahoo Pipes (I tried it once and found its visual interface unintuitive and annoying), but you could certainly write your own script to follow links. I once had a script that did exactly that: it followed links and downloaded entire articles as plain text so I could load them onto an electronic dictionary and read them later. Python has HTTP, RSS and HTML modules to make it very easy for you.

dec 23, 2008 at 1:13 p.m. // Chris said:

@Adam

I remember that script, and reading the Economist on that sketchy little dictionary on the way to Qingdao. That's exactly what I want to do (again, assuming newspapers don't do it for me).

I may hit you up for coding help to make it happen.
