Archive for November, 2008

On data ownership in a networked world

Every piece of content has a creator and owner (in this post, I will assume they are by default the same entity). I do not mean ownership in the traditional sense of, e.g., stashing a piece of paper in a drawer, but in the metaphysical sense that each artifact is forever associated with one or more “creators.”

This is certainly true of the end-products of intellectual labor, such as the article you are reading. However, it is also true of more mundane things, such as checkbook register entries or credit card activity. Whenever you pay a bill or purchase an item, you implicitly “create” a piece of content: the associated entry in your statement.  This has two immediately identifiable “creators”: the payer (you) and the payee.  The same is true for, e.g., your email, your IM chats, your web searches, etc. Interesting tidbit: over 20% of search terms entered daily in Google are new, which would imply roughly 20 million new pieces of content per day, or over 7 billion (over twice the earth’s population) per year—all this from just one activity on one website.

When I spend a few weeks working on, say, a research paper, I have certain expectations and demands about my rights as a “creator.” However, I give almost no thought to my rights on the trail of droppings (digital or otherwise) that I “create” each day, by searching the web, filling up the gas tank, getting coffee, going through a toll booth, swiping my badge, and so on.  However, with the increasing ease of data collection and distribution in digital form, we should re-think our attitudes towards “authorship”.

Unique identity

People call me “Spiros”, my identity documents list me as “Spyridon Papadimitriou” and on most online sites I’m registered as spapadim.  However, sometimes I’m s_papadim or spiros_papadimitriou, and so on.  Like most people, I lost track of all my accounts a time ago.  Vice versa, I’m not the only “Spiros Papadimitriou” in the real world.  For example, I occasionally get confused with my cousin, and receive comments about my interesting architectural designs!  Nor am I the only spapadim on the net.

A framework and mechanisms that allow (but do not enforce) asserting and verifying which of those labels (i.e., names, userids, etc) refer to the same entity (i.e., me) is missing. However, this is a prerequisite: how can we talk about data ownership and tackle portability, transparency and accountability, if we have to jump through countless hoops just to prove identity?

Some people, especially in the US, may object or even outright panic at the thought of such a global identifier.  In Greece, and in much of Europe, we’ve had national identity cards for decades.  Which is fine, as long as you know they exist and what are permissible uses-in other words, as long as transparency is ensured.  Furthermore, the illusion of privacy should not be confused with privacy itself—if in doubt, I suggest reading “Database Nation” (official site).  Its examples are largely US-centric, but the lessons are not.

OpenID (despite some shortcomings) and OAuth are emerging as open standards for authentication and authorization.  OpenID allows reuse of authentication credentials from one site on others: I can reuse, say, my Google username and password to log in to other sites (e.g., to leave a comment on this blog), without having to create yet another account from scratch.  OAuth resembles Kerberos’s ticket granting service but for the web, permitting other web services to ask for access to a subset of personal information: I could allow Facebook to access only my Google addressbook and not, potentially, all of my data on any Google service.  OpenID and OAuth can, at least in principle, work together.

Both high-profile individual developers and major companies are involved in these efforts.  For example, Yahoo! already supports OpenID and plans to support OAuth as well, while Google supports OAuth directly and OpenID indirectly in various ways.  Wide adoption of these standards would be a major step forwards for data portability and web interoperability.  However, I suspect they fall slightly short of providing a truly permanent and global personal identity.  What if, for any reason, my Yahoo! account disappears, either because I decided to shut it down or because Yahoo! went bust?

I was going to suggest a DNS-based solution and I was surprised when I found that the generic top-level domain .name has been instituted since 2001 to provide URIs for personal identities. You can register for a free three-month trial on FreeYourID (after that, it’s $11/year). What’s more, their service already provides OpenID authentication. In principle, this should allow easy switching of authentication and authorization service providers. Just as I can still keep the “label” for this site even if I move to a different web host, I can still keep my personal “label” no matter who I choose to manage my personal information.  So, now my universal username is spiros.papadimitriou.name, any emails sent to spiros@papadimitriou.name will find their way to me, you can call me on Skype using spiros.papadimitriou.name/call, and so on.

With such a unique identity tied to authorization and authentication services, the Giant Global Graph and its materializations would be one step closer to becoming really useful. If I want to use my identity to log and controll access to my data, I should be able to prove my claims.  Currently, FOAF and XFN allow assertion of relationshipt but provide no way to verify them.

Data portability

The point of this mental exercise so far is the following: A unique identity that can be verifiably associated with each and every data item that I produce is a prerequisite for making data ownership claims. Subsequently, we need to ask what fundamental rights should be associated with data ownership.  The first is the right to keep my information with me or, in other words, “data portability”. Just as I can freely move my money from one financial institution to another, I should be able to move any of my information from one data warehouse to another.

For example, consider my web search history. I don’t think I need to argue about the importance of historical information to improve search quality. If I decide for any reason to move to another search provider, I should be able to carry along all the information that’s directly associated with me.  This should include my search keyword history, as well as any additional information I may have contributed.

The actual details, however, may not be that straightforward.  Take, say, the third hit on a Google search.  Who is the “creator”?  Me by entering the search keywords, Google by producing the search results in response to those keywords, or the person who wrote the web page that contains them in the first place?  Similarly, when I buy gas, who is the “creator” of the transaction entry: me, Mobil, or American Express?

Even though intuition can often be wrong, my intuitive response to the Google search example would be that both I and Google have an ownership claim on this particular search, which includes the query keywords as well as a ranking of URLs.  On the other hand, the person who wrote the contents of, say, the third URL has ownership claims only on those, and not the search results.  Furthermore, the thousands of people that provided feedback to Google’s ranking algorithms by clicking on this URL on similar searches have ownership claims on those searches, but not on mine.

Finally, those two ownership claims (on keywords and on rankings) should probably not be treated the same.  If they were, then, say, MS Live could effectively copy Google by getting many users to move.  It seems reasonable to have the right to move my search history, but not the actual search results. However, I can imagine that some form of ownership claim on the rankings may be useful for other personal rights.

This is a highly idealized example and I’m not sure what an appropriate litmus test for ownership is, but some form of legal consensus must be in place.

Transparency

The second fundamental right is that I should know who is using my personal information and how. For example, if an insurance company accesses my credit history to give me a rate quote, I can find this out. It may not be a completely painless process but it is certainly possible today, with a regulatory framework that ensures this.  Similar regulations should be instituted to cover any and all forms of access to personal information.

Data access should be fully transparent to all parties involved. If the an insurance company accesses my medical records, I should know this.  If the government does a background check on me, I should know this too.  Transparency is a prerequisite for accountability. Otherwise, individuals have very limited power to protect themselves from improper uses of their personal information.

Concluding remarks

Much of the privacy research in computer science seems to assume that we can keep the existing legal and regulatory frameworks intact. Computer scientists taking such a position is even sadder than lawyers doing so; we have no excuse of failing to understand the technical issues. We cannot and should not make this assumption. Technical solutions should be subsidiary to new regulations.  But that doesn’t mean technologists cannot lead.  We should work towards supporting full transparency (for both individuals, as well as governments and corporations) rather than opacity and I’m currently in favor of a “shoot first, ask questions later” approach (and help lawmakers figure out the answers). After all, if there is anything that the DRM wars have taught us, it’s that information really wants to be free. Why do we think it’s technically hard (to say the least) to prevent copying of music, movies and software but we still think it may be possible to prevent copying of personal information? As I pointed out in an older post, it’s usually the use and not the possession of information that’s the problem.

My point in this post is simple: we should not fight the wrong war. Instead, we need an easy way to make data ownership claims, and use this to enforce at least two fundamental rights: the ability to keep any personal data with us, and the ability to know who is using this data and how.

Postscript. This post was wallowing for a while as a draft (originally separated from this post, then forgotten).  Since then, a recent MIT TR article discusses some aspects of data ownership.  Even better, I have since found an excellent short piece in the same issue by Esther Dyson, with which I could not agree more.

Update. After posting this last night, I did some further Googling and found another piece by Esther Dyson in the Scientific American. If you’ve read through my ramblings so far, then I’d urge you to read her article; she’s a much better writer than me, and has apparently been thinking about these issues for almost a decade, way before many people even knew what the Internet is. I should probably follow her more closely myself, as I agree disturbingly often with what I’ve read from her so far.

Comments (1)

First thoughts on Android

Update: I’ll keep this post for the record, even though I’ve completely changed my mind.

T-Mobile G1I recently upgraded to a T-Mobile G1 (aka. HTC Dream), running Android.  The G1 is a very nice and functional device. It’s also compact and decent looking, but perhaps not quite a fashion statement: unlike the iPhone my girlfriend got last year, which was immediately recognizable and a stare magnet, I pretty much have to slap people on the face with the G1 to make them look at it.  Also, battery life is acceptable, but just barely.  But this post is not about the G1, it’s about Android, which is Google’s Linux-based, open-source mobile application platform.

I’ll start with some light comments, by one of the greatest entertainers out there today: Monkey Boy made fun of the iPhone in January, stating that “Apple is selling zero phones a year“. Now he’s making similar remarks about Android, summarized by his eloquent “blah dee blah dee blah” argument.  Less than a year after that interview, the iPhone is ahead of Windows Mobile in worldwide market share of smartphone operating systems (7M versus 5.5M devices). Yep, this guy sure knows how entertain—even if he makes a fool of himself and Microsoft.

Furthermore, Monkey Boy said that “if I went to my shareholder meeting [...] and said, hey, we’ve just launched a new product that has no revenue model! [...] I’m not sure that my investors would take that very well. But that’s kind of what Google’s telling their investors about Android.”  Even if this were true, perhaps no revenue model is better than a simian model.

Anyway, someone from Microsoft should really know better—and quite likely he does, but can’t really say it out loud. There are some obvious parallels between Microsoft MS-DOS and Google Android:

  • Disruptive technology: In the 80s, it was the personal computer.  Today, many think it is “cloud computing” (or “services”, or “ubiquitous computing”, or “utility computing”, or whatever else you want to call it).
  • Commodity infrastructure: In the 80s, PC-compatibles became a commodity through standardization of the hardware platform and fierce competition that drove prices (and profit margins) down. Today, network infrastructure (the Internet at the core, and mobile devices on the fringes) as well as systems software (LAMP) are facing similar pressures.
  • Common software platform: MS-DOS was the engine that fueled the growth of the personal computer.  For cloud computing, there is still some way to go (which Android hopes to help pave).
  • Revenue model: Microsoft made a profit out of every PC sold. In today’s networked world, profit should come from services offered over the network and accessed via a multitude of devices (including mobile phones), rather than from selling software licenses.

An executive once said that money is really made by controlling the middleware platform. Lower levels of the stack face high competition and have low profit margins.  Higher levels of the stack (except perhaps some key applications) are too special-purpose and more of a niche.  The sweet-spot lies somewhere in the middle. This is where MS-DOS was and where Android wants to be.

Microsoft established itself by providing the platform for building applications on the “revolution” of its day, the personal computer. MS-DOS became the de-facto standard, much more open than anything else at that time. Subsequently, Microsoft took a cut of the profits out of each PC sold ever since. Taiwanese “PC-compatibles” helped fuel Microsoft’s (as well as Intel’s) growth. The rest is history.

In “cloud” computing, the ubiquitous, commodity infrastructure is the network.  This enables access to applications and information from any networked device. Even though individual components matter, it is common standards, rather than a single, common software platform, which further enable information sharing. If you believe that the future will be the same as the past, i.e., selling shrink-wrapped applications and software licenses, then Android not only has no revenue model, but has no hope of ever coming up with one. Ballmer would be absolutely right.  But if there is a shift towards network-hosted data and applications, money can be made whenever users access those.  There are plenty of obvious examples which could be profitable: geographically targeted advertising, smart shopping broker/assistant (see below), mobile office and add-on services, online games (location based or not), and so on. It’s not clear whether Google plans to get directly involved in those (I would doubt it), or just stay mostly on the back end and provide an easy-to-use “cloud infrastructure” for application developers.

The services provided by network operators are becoming commodities. This is nothing new. A quote I liked is that “ISPs have nothing to offer other than price and speed“.  I wouldn’t really include security in their offerings, as it is really an end-to-end service. As for devices, there is already evidence that commoditization similar to that of PC-compatibles may happen. Just one month after Android was open-sourced, Chinese manufacturers have started deploying it on smartphones. Even big manufacturers are quickly getting in the game; for example, Huawei recently announced an Android phone. Most cellphones are already manufactured in China anyway.  The iPhone is assembled in Shenzhen, where Huawei’s headquarters are also located (coincidence?). The Chinese already have a decent track record when it comes to building hardware and it’s only a matter of time until they fully catch up.

So, it’s quite simple: Android wants to be for ubiquitous services as MS-DOS was for personal computers. But Microsoft in the 80s did not really start out by saying “our revenue model is this: we’ll build a huge user base at all costs, which will subsequently allow us to get $200 out of each and every PC sold”?  Not really.  Similarly, Google is not going to say that “we want to build a user base, so we can make a profit from all services hosted on the [our?] cloud and accessed via mobile devices [and set-top boxes, and cars, and...].”  Such an announcement would be premature, and one of the surest ways to scare off your user base: unless Google first provides more evidence that it means no evil, the general public will tend to assume the worst.

The most interesting feature of Android is it’s component-based architecture, as pointed out by some of the more insightful blog posts. Components are like iGoogle gadgets, only Android calls them “activities.” Applications themselves are built using a very browser-like metaphor: a “task” (which is Android-speak for running applications) is simply a stack of activites, which users can navigate backwards and forwards. The platform already has a set of basic activities that handle, e.g., email URLs, map URLs, calendar URLs, Flickr URLs, Youtube URLs, photo capture, music files, and so on. Any application can seamlessly invoke any of these reusable activities, either directly or via a registry of capabilities (which, roughly speaking, are called “intents”). The correspondence between a task and an O/S process is not necessarily one-to-one. Processes are used behind the scenes, for security and resource isolation purposes. Activities invoked by the same task may or may not run in the same process.

In addition to activities and intents, Android also supports other types of components, such as “content providers” (to expose data sources, such as your calendar or todo list, via a common API), “services” (long-running background tasks, such as a music player, which can be controlled via remote calls) and “broadcast receivers” (handlers for external events, such as receiving an SMS).

I think that Google is really pushing Android because it needs a component-based platform, and not so much to avoid the occasional snafu. If embraced by developers, this is the major ace up Android’s sleeve.  Furthermore, the open source codebase is the strongest indication (among several) that Google has no intention to tightly regulate application frameworks like Apple, or to leverage it’s position to attack the competition like Microsoft has done in the past.  Google wants to give itself enough leverage to realize it’s cloud-based services vision. If others benefit too, so much the better—Google is still too young to be “evil“.  After all, as Jeff Bezos said, “like our retail business, [there] is not going to be one winner. [...] Important industries are rarely made by single companies.” I find the comparison to retail interesting. In fact, it is quite likely that many “cloud services” themselves will also become commodities.

I’d wager that really successful Android applications won’t be just applications, but components with content provided over the network. A shopping list app is nice. It was exciting in the PalmPilot era, a decade ago. But a shopping list component, accessible from both my laptop and my cellphone, able to automatically pull good deals from a shopping component, and allow a navigation component to alert me that the supermarket I’m about to drive by has items I need—well, that would be great! Android is built with that vision in mind, even though it’s not quite there yet.

It’s kind of disappointing, but not surprising, that many app developers do not yet think in terms of this component-based architecture. In fairness, there are already efforts, such as OpenIntents, to build collections of general-purpose intents. Furthermore, the sync APIs are not (yet) for the faint of heart. Even Google-provided services could perhaps be improved. For example, Google Maps does not synchronize stored locations with the web-based version. When I recently missed a highway exit on the way to work and needed to get directions, I had to pull over and re-type the full address. Neither does it expose those locations via a data provider. When I installed Locale, I had to manually re-enter most of “My Locations” from the web version of Google Maps. So, there are clearly some rough edges that I’m sure will be smoothed out.  After all, there have been other rough edges, such as forgotten debugging hooks, something I find more amusing than alarming or embarrassing and certainly not the “Worst. Bug. Ever.

Android has a lot of potential, but it still needs work and Google should move fast. The top two items on my wish list would be:

  1. Release a “signature” device (or two), like the Motorola Razr was a couple of years ago and the Apple iPhone was last year. The G1 is really nice, but not enough.  A device that people desire may be neither a necessary nor a sufficient condition for success, but it will sure help as a vehicle to move Android forward in market share.
  2. Expand the set of available activities and content providers, and release an easy-to-use data sync service and API. In principle, everything that is an iGoogle gadget should also be an Android activity, sharing the same data sources. This is at the core of what “cloud computing” is about.  After all, you could think of Android as a glorified modern browser for devices with small screens, intermittent network connectivity, location sensors, and so on.

I suspect it might not be that hard to build a Google gadget container for Android.  Google Gears is already there and some form of interaction with the local device via Javascript is already allowed.  Many gadgets don’t need that much screen real estate anyway, so this may be an interesting hack to try out.

But not many people will buy an Android device for what it could do some day. Google has created a lot of positive buzz, backed by a few actual features. Now it needs some sexy devices and truly interesting apps, to really jumpstart the necessary network effect. Building the smart shopping list app should be as easy as building the dumb one. In the longer run, the set of devices on which Android is deployed should be expanded.  Move beyond cell phones, to in-car computers, set-top boxes, and so on (Microsoft Windows does both cars and set-top boxes already, but with limited success so far)—in short, anything that can be used to access network-hosted data and applications.

Comments (2)