I spent time this week at O’Reilly’s Web 2.0 Summit in San Francisco, and led a panel discussion on the topic of “Open Data.” Thanks very much to Tim O’Reilly and John Battelle for asking me to organize the panel. Stewart Butterfield from Flickr/Yahoo and Christine Herron from Omidyar Network (pictured with me in the photo) agreed to join the panel — thanks much to both of them for making it a great and well-rounded discussion.
Open data as a topic ran through the conference sessions — it seems to be on people’s minds right now. Thomas Claburn wrote up a good article on TechWeb about the session, and conference speakers Ben Trott from Six Apart and Eric Schmidt from Google, among others, mentioned the topic explicitly on stage. Whenever people talk about the new wave of web applications like Flickr and del.icio.us, the idea of users contributing their data to a pool of information on a site — photos on Flickr, bookmarks on del.icio.us, and so on — always comes up. Open data is about the next step — what then? What happens to my information once I share it on a web site, and what can I do to control it?
On the panel, we talked about three topics in the general area of open data:
- Open data sets – collections of data that are available for public reuse, the best-known being Wikipedia, though we talked more about OpenStreetMap during the panel.
- Open data protocols – ways for people to get access to data from a site, either through open APIs (the Flickr API being a great example) or microformats.
- Open data licenses or policies – the Creative Commons licenses being the best known and most successful to date, but see also the Talis Community License.
During the panel, I gave an example of a great open data policy from Google, one that I think is worth emulating. Back in 1994, someone named Andy Woodward posted a Usenet newsgroup message asking, “Are there any FTP, Gopher or WWW sites out there which archive Usenet news?” He got a rather short-sighted and foolish — some might say stupid — answer:
“The general answer to your question is no, there is no one site that perpetually archives all of Usenet news for public access. The size of such an archive would make its maintainence unworkable.”
I feel comfortable calling that answer stupid since the person who gave it was me! You can see how stupid it is by visiting the thread on Google Groups, which is “a site that perpetually archives all of Usenet news for public access.” So much for unworkable….
Fortunately, enough of my other ideas have since been proven right that I’m okay with having this stupidity available on the net, and I’m even happy to make fun of myself about it. But let’s say I were applying for jobs and didn’t want employers to see me denying the possibility of one of Google’s now-successful products, or let’s say I had posted even more embarrassing things on Usenet (as plenty of people did) back in 1994. What could I do? 1994 was one year before DejaNews, the first company to archive Usenet, was even founded, and four years before Google, which later bought DejaNews, was incorporated. So one company that didn’t exist when I wrote that post bought another company that didn’t exist when I wrote that post, and as a result, my silliness is available on Google forever?
Google has taken what I consider to be a great stance on this problem, and has more than lived up to its “Don’t be evil” philosophy. It provides a page that explains how to remove your own posts, along with a three-step removal tool that lets you remove a post you’ve made. This is great, since it gives people control over the data they create. I’d love to know from Google how many posts have been removed by how many people; that would be interesting data about the demand for policies like this. We can see today that it is a common request, though, by looking at Google’s Top 5 Groups Help Questions, which include:
- #2. How do I remove my own posts?
- #4. I don’t want you to archive my articles! How can I keep my messages from being archived on Google Groups?
People want control over their own information.
Of course this matters a lot to us at Wesabe. We know that a personal finance web site couldn’t possibly exist without taking a very strong stand on this issue, and we want to be leaders in setting the right approach, as we think Google has been in the Groups case above. So, at the panel I announced our “Data Bill of Rights,” the promises we make to everyone in the Wesabe community:
- You can export and/or delete your data from Wesabe whenever you want.
- Your data is your data, not ours. Our job is to help you understand and act on your data.
- We’ll keep all of your data online and accessible for as long as you have an account. No “archive access” charges.
- Any data you want us to keep private, we will.
- If a question comes up not covered by these rights, we will answer it remembering that your data belongs to you.
We’ve already implemented most of these as features in our product. We still need to add a good export format, which we’re choosing now, and I imagine that people will want more privacy controls as the product matures. The purpose of this list, though, is to state publicly our intentions and the promises we make.
I’d love feedback on these ideas. What’s missing? What needs to be clearer? Of course there are aspects of our product that will be open but are not explicitly mentioned yet (for instance, we will have an open API, and we do plan to open source some of our code after a short time), but we wanted this document to lay down the ideas we felt were most important. If you have feedback, please let me know in the comments.