Tuesday, October 23, 2007

New ways to think about archives

News librarians/archivists/researchers should be taking a good look at the work Dave Winer's been doing with archives. This may change everything. I've been intrigued by it since he started, and as he develops it I find myself thinking about it more and more, though I still haven't decided what to make of it all.

First, there was River of News, a way to display headlines from the New York Times' (and other news sources') RSS feeds so they're easy to read and reach from a mobile phone or other device. It works great on a regular computer, too, as a simple feed reader.

Now Winer's taken a look at the metadata that comes with every online NYTimes story and come up with the Times River Outline, which organizes the Times' news by category, or keyword. He was inspired partly by conversations with tech folks at the Times. In a posting explaining the Outline, he called it 'Techmeme for the NY Times'. Here's an earlier Winer posting, Something New in News.

These things look deceptively simple. But there's something going on here. Who chooses the keywords? Are librarians involved? Should they be? How can this be integrated into existing archive searches, or should it be?

I'm glad to find that Dan Gillmor is thinking about this: Bringing the New York Times' Cornucopia to All.
Dave Winer has been exploring a superb news resource, exploring the depth and breadth of the New York Times' data-stream. The most traditional of news organizations is opening up, including its archives, in ways that could be truly revolutionary in the news business — and Dave is leading the way toward a new way of seeing a core part of our history and current knowledge.

And, of course, Doc was thinking about this, too, urging news organizations to Jump in the River.

(Added later:) In Open, the New York Times' code blog, Jacob Harris posts Messing Around With Metadata, a good look at the data attached to every online Times story, and considers questions about how to keyword. These are questions every news librarian has dealt with for years, things like:
* Disambiguation — Is this story about Ford the president or Ford the automotive company?
* Summarization — This article might quote Nancy Pelosi, but it’s really just an article about President Bush, isn’t it?
* Normalization — The text of one story may use “The United States,” while another says “U.S.” Can we label both with the “United States of America” geographic label?
* Taxonomies — One story may be about Global Warming and another on Pollution; can we label both of them as being subcategories for Environment?
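To make the normalization and taxonomy questions concrete, here is a minimal Python sketch. It assumes a hand-built alias table and a one-level taxonomy; the names `ALIASES`, `TAXONOMY`, `normalize_label`, and `label_story` and all the mappings are hypothetical illustrations, not the Times' actual vocabulary or Harris's code.

```python
# Hypothetical alias table: surface forms seen in story text -> one canonical geographic label.
ALIASES = {
    "the united states": "United States of America",
    "united states": "United States of America",
    "u.s.": "United States of America",
}

# Hypothetical one-level taxonomy: topic label -> broader parent category.
TAXONOMY = {
    "Global Warming": "Environment",
    "Pollution": "Environment",
}


def normalize_label(term: str) -> str:
    """Map a surface form to its canonical label; unknown terms pass through unchanged."""
    return ALIASES.get(term.strip().lower(), term.strip())


def label_story(terms: list[str]) -> set[str]:
    """Normalize each term, then add the parent category of any topic found in the taxonomy."""
    labels = {normalize_label(t) for t in terms}
    labels |= {TAXONOMY[label] for label in labels if label in TAXONOMY}
    return labels


story_a = label_story(["The United States", "Global Warming"])
story_b = label_story(["U.S.", "Pollution"])
print(sorted(story_a))            # ['Environment', 'Global Warming', 'United States of America']
print(sorted(story_a & story_b))  # ['Environment', 'United States of America']
```

A lookup table like this only covers the normalization and taxonomy cases; disambiguation and summarization depend on the context of each story and can't be settled by a simple mapping.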
Harris also invites hackers to come up with even more interesting things to do with the story data.
