April 8, 2008 7:42 PM PDT

Open-sourcing factual data, Wikipedia style

Bret Taylor, formerly of Google and now of FriendFeed, has a greater appreciation for the business development function. In a post today he wrote about the challenges of getting legal access to factual data--such as mapping, stock quotes, white pages, TV schedules, movie show times, and sports scores--for use in applications.

If you want to experiment with a new driving directions algorithm, it is infinitely more difficult than coming up with an algorithm; you have to hire a lawyer and sign a contract with a company that collects that data in the country you are developing for.

Bret Taylor: Free the data

He adds that some of the data has quality problems or is incomplete. In sum, Taylor believes that innovation is stymied and the barrier to entry is raised in the current environment. It's not just the need for lawyers and contracts but also the issue of companies that sell data restricting use.

What is the solution to freeing up the data? Taylor advocates open-sourcing factual data, and competing on use of the data, not access to it. He wrote:

To this end, I think we should create a Wikipedia for data: a global database for all of these important data sources to which we all contribute and that anyone can use. When a user reports an inaccurate phone number in your products, save it back to the DataWiki so everyone can benefit, and in return, you get everyone else's improvements as well. If your local movie theater doesn't have listings data in DataWiki, you can type it in yourself, and everyone in your town can benefit, and all the products you use that access movie listings will automatically update. Need better mapping data for a city? Pay to collect it, and upload it to the DataWiki. In return you get all the other cities other companies paid for (sort of like a company contributing device drivers to the Linux kernel).

For centuries, companies have made money in exchange for doing the busy work of collecting, massaging, and publishing factual data. The same was true for encyclopedia data until recently. Taylor is definitely onto something, but his proposal presents some real data collection challenges. The open-source community is sure to take up the challenge.

The question is, will the companies that already have the data be of assistance? It's not exactly in their best financial interest to give away their content, but the example of Wikipedia should give them the incentive to press the pause button.

See also: Sarah Perez discusses where to find open data on the Web, such as CKAN (Comprehensive Knowledge Archive Network), OpenStreetMap and Freebase.

6 comments
by BradPatrick April 9, 2008 7:12 AM PDT
You might want to check out www.freebase.com (alpha). That is the project that is closest to what Bret is wanting to develop. It is in its infancy, but shares the same OS philosophy and model.
by napm1971 April 9, 2008 10:01 AM PDT
Oh there's a lot to say here... I feel a blog post coming on! In shorthand, though, it's worth checking out the work around appropriate - open - licensing for this data (opendatacommons.org/). I also have a paper at the World Wide Web conference later this week, which digs into the licensing and economic issues a little further: http://events.linkeddata.org/ldow2008/papers/08-miller-styles-open-data-commons.pdf See also http://blogs.talis.com/nodalities/2007/12/licensing_open_data_creative_c.php for some history of this collaboration between ourselves at Talis, the Science Commons project of Creative Commons, and a pair of very smart lawyers: Jordan Hatcher and Charlotte Waelde.
by yum8yuk April 9, 2008 10:04 AM PDT
The man has very nice intentions. But I think there is a 1 in a billion chance that the wiki will force data hogs to give up their hard-earned data for free. I know I will never do that. In fact, wouldn't many of these data companies go out of business if they gave their data away for free? It's no mystery that data gathering is a billion-dollar business. If you are trying to update a phone number, this may be OK. I wish you luck, brother.
by krosavcheg April 9, 2008 2:50 PM PDT
I suggest checking out http://www.numberzoom.com/ which is a wiki of user contributed phone listings for unknown caller IDs. I saw an article on the company in the nytimes.
by walwebster April 10, 2008 1:02 AM PDT
I've just spent a few years designing, building and populating some private databases containing the names and certain other relevant details about many thousands of people with certain attributes in common. Now tell me again why I want to give them away for nothing? Other than because you might prefer not to pay for them, that is. It's all public-domain information, after all -- in a free market, you're welcome to get that same data from anyone else who'll put in the same work, over the same length of time, and then sell their output to you at a more attractive price than mine. (I wouldn't be holding my breath till that happens, though ...) Issues of data quality and completeness tend to reflect the old adage that "you get what you pay for".
by thekohser April 10, 2008 6:23 AM PDT
Correcting incorrect data about a person (birthdate, current employer, marital status, etc.) or a private enterprise (P/E ratio, movie times, hours of operation) is the responsibility of the PERSON or the ENTERPRISE. It is they who fail to optimize their gains by allowing incorrect data into the marketplace, so it is they who should be concerned about correcting it. Not "volunteer" data geeks, and especially not vandals from Ralph's Pizza who might want to change the hours of operation at Joe's Pizza to "closed Saturdays and Sundays". Please. Sound, reliable, accurate databases are built on the self-interest of those whose data is represented and on the reputation of the agent who is assembling said data. The model described above isn't going to work.
  • About Outside the Lines

  • Dan Farber is the editor in chief of CNET News. He has covered technology for more than two decades, and previously served as editor in chief of ZDNet, PCWeek and Macweek. Outside the Lines explores the intersection of business and technology.
