The pesky cousin from Greece

I’m far from an expert in economics, politics, or history; quite the contrary.  Which is why I try to cast events in more familiar, anthropomorphic terms, and also why such analogies are dangerous. Caveat lector—now, let’s get on with the story.

There once was a large family, with many brothers, uncles, and cousins spread over many different places. Each of them led their own lives.  The extended family spanned all sorts of lifestyles, from successful businessmen, dignified and well-dressed, to smart but somewhat irresponsible bon viveurs.  They lived in many different places and they occasionally exchanged gifts and money, some more frequently than others (admittedly, this part is rather weak in its simplicity, but a single analogy can only be taken so far). But they were getting tired of running to Western Union, paying transaction fees, losing money on currency conversions due to volatility in exchange rates, and so on. Furthermore, some of the more powerful family members had gotten into nasty feuds (world wars).

So, under the leadership of some of the more powerful siblings (Germany and France) they thought: well, we have enough money to go down to an international bank and open a common family account in a solid currency, say, dollars (they in fact created their own currency and bank, perhaps to avoid associations with existing institutions, but it’s probably safe to say that they heavily mirrored those of one of the leading siblings).  Then it will be so much easier to do the same things much more efficiently.  The richer craftsmen and businessmen among them could send their stuff with less hassle and waste [e.g., paragraph seven], and the poorer ones could gain a bit by wisely using their portion of the funds and an occasional advance withdrawal.

The leading siblings knew how to keep their checkbooks balanced, and it seemed reasonable to assume that these methods were general enough and suitable for everyone.  So, after opening the family account with all of them as joint holders, they shook hands and simply agreed to use the money wisely, pretty much in the way that had worked well for the richer and more productive ones (stability and growth pact).  Once in a while they might briefly meet and agree on some further rules of how the money should be used, but basically each one of them went their way, living the life they always had, managing their portion of the family funds.  One of the more cynical siblings (England) was a bit skeptical about opening a family account while living their separate lives apart, so it chose to stay out, at least for a while.  Times were good for several years, but they didn’t last forever.

The first to get into trouble would be one of the younger cousins (Greece), who generally valued time more than money (he occasionally complains about that himself, but to little effect so far). Using some money from the family account, he did a few renovations to make his home look better and bought some decent clothes. Using the family account to boost his creditworthiness and sporting a sharper new look, he managed to get a credit card with a promotional  0% APR (Euro membership).  He even threw a big party that impressed many (Olympic games). But after a few years, the credit card companies came back asking for payment, and he found himself in deeper trouble than before the good times had begun.

Some of the other relatives had also started getting into trouble, even if not all of them had been as irresponsible.  But the immediate problem was that cousin.  What was the family to do?  Other people had started noticing, and were beginning to have some questions.  ”What kind of family are you?”  Your cousin deserves what he gets, but did you really think it was that simple to run a family with such a diverse crowd? Obviously the little cousin should be taught a lesson and become more mature and responsible.  But it should also be a lesson that could be repeated on other relatives, if necessary.

One option would be to kick him out (bankruptcy). It might get him to change his ways (or not), but a homeless relative does not make the family look good, even if he’s largely responsible for his predicament (which he is, by the way). And what would happen to the other relatives that weren’t doing that great either?  A 0% promotional APR cannot last forever, and it’s not hard to shoot yourself in the foot with it, even if you aren’t irresponsible.  Will other relatives head for the door too?  If they do, will they come back?  And is it possible to neatly untangle the finances, after decades of using a common account?  Furthermore, the cousin may start hanging out with “strangers”, some of which may be of questionable character (IMF, Russia, etc). In fact, keeping him out of undesirable company might have played a role in inviting him to the extended family account in the first place.

Another option would be to bring him and his family into the home of a richer and more dignified family member, force him into a suit, grab him by the hand (or neck), and teach him how behave like a grown up under close supervision. But the other members of the household (citizens), who contribute to its finances (pay taxes) and get food and shelter in return (welfare and other benefits) would rightfully protest. “Who is this noisy, scruffy guy in our home?  Why do we have to feed him and pay so much attention to him?”  The cousin’s family, who also valued time over money (e.g., preferring a relaxing lifestyle on modest means over hard work), was also not very happy. “I just wish we could go down to the beach and spend 2-3 hours enjoying coffee under the sun like we used to.  And why is your big cousin telling us what to do anyway?”  In addition, it was always likely that other, equally noisy and scruffy distant relatives might show up knocking at the door of the mansion, and demand the same attention.  This was certainly more than big cousin had signed up for when opening the family bank account.

Then there is a third option, which does not so much focus on teaching a lesson, but on saving face and postponing the worst trouble. Just give the little cousin a scolding and some pocket money to pay the rent and interest for a few months.  At least he wouldn’t be out in the street.  And, who knows, he might change his ways on his own in the meantime.  Sweeping the mess under the rug is unlikely (although not provably impossible) that it’ll lead to any long-term solution, but it’s the option easiest to swallow by everyone involved.

Anyway, I’ll stop the anthropomorphic analogies here.  Using a different analogy, I’ll add that tweaking the knobs (fiscal policy targets) and, perhaps, changing batteries (bail-out loans) won’t do much good in the long run if the machine is basically broken.  But it’s hard to fix it if getting down to the cogs and gears that make it work (politics) is taboo, perhaps even more than it used to be (compare Victor Hugo’s vision of the “United States of Europe” more than a century ago, with the Lisbon treaty).

Although it’s a rather overloaded term, you can probably call me a technocrat.  As such, Deng Xiaoping’s famous quote (“it doesn’t matter if it’s a black cat or a white cat, it’s a good cat as long as it catches mice”) is basically appealing.  Cats competing with each other and against mice sounds like a “natural” situation, so it’s easy to overlook whether it’s the only possible state of affairs.  However,  if they’re domesticated and not out in the wild, it’s not hard to imagine the mice and both cats colluding to, basically, take it easy. Sometimes what is “natural” should be examined more closely.

Greece is the first to draw wide attention to such questions, but I don’t think it will be the last, nor is it the first mishap along European integration.  I’d venture that, unless the EU collapses, everyone will find their place in it.  Eventually.

I’ll finish with an annotated graph (original source via metablogging.gr, and public Google spreadsheet with subset of the data), showing Greek public debt (central government) as % of GDP over the past 40 years. I’ll just point out that 1981 looks like a particularly interesting year, for various reasons.

Greek public debt (central government) historical data

Postscript. It’s often mentioned that “Greece has been in default for 50 years during the past two centuries.”  This is true; after independence in 1821, Greece was bankrupt starting at the end of the 19th century under Charilaos Trikoupis, and ending after WWII.  During this period, Greece was involved in a number of wars in the Balkans and Asia Minor, growing and shrinking in size a few times. Obviously, this didn’t help financial matters, but I don’t think it bears much similarity to the current situation.

I’ve also been puzzled somewhat about the role of corruption.  Obviously, it’s not good and I’m not trying to justify it in any way.  On the other hand, it doesn’t seem to be the sole cause of trouble, as is often suggested.  Several East Asian countries (notably China, although it’s not the first neither the only one) have shown progress despite corruption. I don’t have an answer, but it seems to me that, when you steal money, it matters where you steal it from.  If I swipe some cash from my little brother’s wallet, it will make my brother poorer and angrier, but it probably won’t bankrupt the household; someone earned that money, even if it wasn’t me.  However, if I pocket an advance withdrawal using the credit card our father gave us, it’ll get everyone in trouble, eventually.

Finally, as for 0% APR credit cards, it’s rather different if, say, Bill Gates (US) gets one versus if I get one (not that I’m that irresponsible : ).  One of us has deeper pockets and that makes a difference on whether we deserve it, on the kind of trouble we can get in, and even on the moral hazards we face.  As long as the card is used wisely for an appropriate period of time, it isn’t necessarily bad.  Any comparisons between US and Greece are, at best, premature.

Comments (4)

Mobile OCR input: “Fully automatic” and reality

Recently I’ve been toying around with WordSnap OCR (project page, source code, app on Android Market), an app for OCR-based camera input on Android. In the process, I found out a few things about “smart” versus “fast”.

At least in data mining, “fully automatic” is an often unquestioned holy grail.  There are certainly several valid reasons for this, such as if you’re trying to scan huge collections of books such as this, or index images from your daily life like this.  In this case, you use all the available processing power to make as few errors as possible (i.e., maximize accuracy).

However, if the user is sitting right in front of your program, watching your algorithms and their output, things are a little different. No matter how smart your algorithm is, some errors will occur. This tends to annoy users. In that sense, actively involved users are a liability. However, they can also be an asset: since they’re sitting there anyway, waiting for results, you may as well get them really involved. If you have cheap but intelligent labor ready and willing, use it! The results will be better or, at the very least, no worse.  Also, users tend to remember the failures. So, even if end results were similar on average, allowing users to correct failures as early as possible will make them happier.

Instead of making algorithms as smart as possible, the goal now is to make them as fast as possible, so that they produce near-realtime results that don’t have to be perfect; they just shouldn’t be total garbage. When I started playing with the idea for WordSnap, I was thinking how to make the algorithms as smart as possible.  However, for the reasons above, I soon changed tactics.

The rest of this post describes some of the successful design decisions but,  more importantly, the failures in the balance between “automatic” and “realtime guidance”. The story begins with the following example image:

Original image

Incidentally, this image was the inspiration for WordSnap: I wanted to look up “inimical” but I was too lazy to type. Also, for the record, WordSnap uses camera preview frames, which are semi-planar YUV data at HVGA resolution (480×320). This image is a downsampled (512×384) full-resolution photograph taken with the G1 camera (2048×1536); most experiments here were performed before WordSnap existed in any usable form. Finally, I should point out that OCR isn’t really my area; what I describe below is based on common sense rather than knowledge of prior art, although just before writing this post I did try a quick review of the literature.

Binarization

A basic operation for OCR is binarization: mapping grayscale intensities between 0 and 255 to just two values: black (0) and white (1).  Only then can we start talking about shapes (lines, words, characters, etc).  One of the most widely used binarization algorithms is Otsu’s method.  It picks a single, global threshold so that it maximizes the within-class (black/white) variance, or equivalently maximizes the across-class variance. This is very simple to implement, very fast and works well for flatbed scans, which have uniform illumination.

However, camera images are not uniformly illuminated. The example image may look fine to human eyes, but it turns out that even for this image no global threshold is suitable (click on image for animation showing various global thresholds):

Binarization with global threshold

If you looked at the animation carefully, you might have noticed that at some point, at least the word of interest (“inimical”) is correctly binarized in this picture.  However, if the lighting gradient were steeper, this would not be possible. Incidentally, ZXing uses Otsu’s method for binarization, because of it is fast. So, if you wondered why barcode scanning sometimes fails, now you know.

So, a slightly smarter approach is needed: instead of using one global threshold, the threshold should be determined individually for each pixel (i,j). A natural threshold t(i,j) is the mean intensity μw(i,j) of pixels within a w×w neighborhood around pixel (i,j).  The key operation here is mean filtering: convolving the original image with a w×w matrix with constant entries 1/w2.

The problem is that, using pure Java running on Dalvik, mean filtering is prohibitively slow.  First, Dalvik is fully interpreted (no JIT, yet). Firthermore, the fact that Java bytes are always signed doesn’t help: casting to int and masking off the 24 most significant bits almost doubles running time.

Method Dalvik (msec) JNI (msec) Speedup
Naïve 109,882 ± 4,813 1,712 ± 261 64×
Sliding 2,435 ± 141 71 ± 19 34×

JNI to the rescue. The table above shows speedups for two implementations. The naïve approach uses a triple nested loop and has complexity O(w2mn), where m and n is the image height and width, respectively (m = 348, n = 512 in this example). The 1-D equivalent would simply be:

for i = 0 to N-1:
   s = 0
   for j = max(i-r,0) to min(i+r,N-1):
      s += a[j]

where w = 2r+1 is the window size. The second implementation updates the sums incrementally, based on the values of adjacent windows. The complexity now is just O(mn). An interesting aside is the relative performance of two implementations for sliding window sums (where w = 2r+1 is the window size). The first checks border conditions inside each iteration:

Initialize s = sum(a[0]..a[r])
for i = 1 to N-1:
   if i > r:
      s -= a[i-r-1]
   if i < N-r:
      s += a[i+r]

The second moves the border condition checks outside the loop which, if you think about it for a second, amounts to:

Initialize s = sum(a[0]..a[r])
for i = 1 to r:
   s += a[i+r]
for i = r+1 to N-r-1:
   s -= a[i-r-1]
   s += a[i+r]
for i = N-r to N-1:
   s -= a[i-r-1]

Among these two, the first one is faster, at least on a laptop running Sun’s JVM with JIT (I didn’t time Dalvik or JNI). I’m guessing that the second one messes loop unrolling, but I haven’t checked my guess.

It turns out that there is a very similar approach in the literature, called Sauvola’s method. Furthermore, there are efficient methods to compute it, using integral images. These are simply the 2-D generalization of partial sums. In 1-D, if partial sums are pre-computed, window sums can be estimated in O(1) time using the simple observation that sum(i…j) = sum(1..j) – sum(1..i-1).

Savuola’s method also computes local variance σw(i,j), and uses a relative threshold t(i,j) = μw(i,j)(1 + λσw(i,j)/127). WordSnap uses the global variance and an additive threshold t(i,j) = μw(i,j) + λσglobal, but after doing a contrast stretch of the original image (i.e., linearly mapping minimum intensity to 0 and maximum to 255). Doing floating point math or 64-bit integer arithmetic is much more expensive, hence the additive threshold. Furthermore, WordSnap does not use integral images because the same runtime can be achieved without the need to allocate a large buffer. Memory allocation on a mobile device is not cheap: the time needed to allocate a 480×320 buffer of 32-bit integers (about 600KB total) varies significantly depending on how much system memory is available, whether the garbage collector is triggered and so on, but on average it’s about half a second on the G1. Even though most buffers can be allocated once, startup time is important for this application: if it takes more than 2-3 seconds to start scanning, the user might as well have typed the result.

Anyway, here is the final result of locally adaptive thresholding:

Binarization with local mean filter

Conclusion: In this case we needed the slightly smarter approach, so we invested the time to implement it efficiently. WordSnap currently uses a 21×21 neighborhood.  Altogether, binarization takes under 100ms.

Skew

Another problem is that the orientation of the text lines may not be aligned with image edges.  This is called skew and makes recognition much harder.

Initially, I set out to find a way to correct for skew.  After a few searches on Google, I came across the Hough transform.  The idea is simple.  Sayyou want to detect a curve desribed by a set of parameters. E.g., for a line, those would be distance ρ from origin and slope θ. For each black pixel, find the parameter values for all possible curves to which this pixel may belong. For a line, that’s all angles θ from 0 to 180 degrees, and all distances ρ from 0 to sqrt(m2+n2).  Then, compute the density distribution of parameter tuples.  If a line (ρ00) is present in the image, then the parameter density distribution should have a local maximum at (ρ00).

If we apply this approach to our example image, the first maximum is detected at an angle of 20 degrees. Here is the image counter-rotated by that amount:

After rotating by angle detected using Hough transform

Success!  However, computing the Hough transform is too slow!  Typical implementations bucketize the parameter space. This would require a buffer of about 180×580 32-bit integers (for a 480×320 image), or about 410KB. In addition, it would require trigonometric operations or lookups to find the buckets for each pixel, not to mention counter-rotation. There are obvious optimizations one can try, such as computing histograms at multiple resolutions to progressively prune the parameter space.  Still, the cost implied by back-of-the envelope calculations put me off from even trying to implement this on the phone. Instead, why not just try to use the users:

Finder alignment guides

Conclusion: Simple approach with help from user wins, and the computer doesn’t even have to do anything to solve the problem! Incidentally, the guideline width is determined by the size of typical newsprint text at the smallest distance that the G1′s camera can focus.

Font size

Next, we need to detect individual words.  The approach WordSnap uses is to dilate the binary image with a rectangular structuring element (in the following image, the size 7×7), and then expand a rectangle (shown in green) until it covers the connected component which, presumably, is one word.

Dilation with 7x7 rectangle

However, the size of the structuring element should really depend on the inter-word spacing, which in turn depends on the typeface as well as the distance of the camera from the text.  For example, if we use a 5×5 element, we would get the following:

Dilation with 5x5 rectangular element

I briefly toyed with two ideas for font size detection.  The first is to do a Fourier transform.  Presumably the first spatial frequency mode would correspond to inter-word and/or inter-line spacing and the second mode to inter-character spacing. But that assumes we apply Fourier to a “large enough” portion of the image, and things start becoming complicated.  Not to mention computationally expensive.

The second approach (which also appears to be the most common?) is to to hierarchical grouping. First expand rectangles to cover individual letters (or, sometimes, ligatures), then compute histogram of horizontal distances and re-group into word rectangles, and so on.  This is also non-trivial.

Instead, WordSnap uses a fixed dilation radius.  The implementation is optimized to allow near-realtime annotation of the detected word extent.  This video should give you an idea:

Conclusion: Simple wins again, but this time we have to do something (and let the user help with the rest). But, instead of trying to be smart and find the best parameters given the camera position, we try to be fast: fix the parameters and let the user find the camera position that works given the parameters. WordSnap uses a 5×5 rectangular structuring element, although you can change that to 3×3 or 7×7 in the preferenfces screen. Altogether, word extent detection takes about 150-200ms, although it could be significantly optimized, if necessary, by using only JNI only, instead of a mix of pure Java and JNI calls.


I’m now looking into the possibility of moving OCR into the “live” loop: as you move the camera, the phone shows not only the word extent rectangle, but also the recognized word.  Perhaps as a hyperlink to Google, or along with Google Translate results.  Then I can justifiably use the buzzword of the day, “augmented reality”!  It looks that it might just be possible, but let me get back to you in a week or two.  :)

Postscript: Some of the papers referenced were pointed out to me by Hideaki Goto, who started and maintains WeOCR. Also, skew detection and correction experiments are based on this quick-n-dirty Python script (needs OpenCV and it ain’t pretty!). Update (9/2): Fixed really stupid mistake in parametrization of line.

Comments (12)

My godfather is a Markov model

Before the memory is completely lost in the dust of time, I’d like to document how I ended up with this domain name. It all started last summer, when I decided to start a personal site. Of course, both my first and last names were already taken, even in TLDs I’d never heard of before.  But using my name would have been too easy anyway.  Challenge is good.

Politically-correct and totally un-sarcastic as I am, I originally wanted to go with some combination of “principled anarchy”.  Now, that was available! Apparently, nobody wanted to touch it with a ten foot pole, not even cybersquatters; which kind of gave me a hint.  Wouldn’t want to, say, end up in a three-letter-agency watchlist, at least not while in the US on H1B.  They might not share my sense of humor.

So, armed with online thesauri, dictionaries, the internet anagram server, and things like that, I set out on a name quest.  I don’t remember anymore what I tried; “coredump” (which, in case you didn’t know, has “code rump” as an anagram—still available, if you’re interested), “segfault”, “brainfart”, “farout”, and pretty much anything else I could think of: all taken.   Even these names as well as these are taken (thank god!).

At some point I was naïve enough to hope that a Tolkien name would be free.  No luck of course, anything semi-pronnounceable was taken.  You’d have to go as far as, say,  ”gulduin” (which, by the way, means “magic river” in Elvish) to find something available. Good luck getting people to remember that!  Oh well, at least I had a reason to actually read some of the Silmarillion; if you’ve tried this and you’re not a religiously devoted Tolkien fan, you know what I’m talking about.

After the first week of searching, I think I even got temporarily banned from Yahoo! whois search. In desperation, I finally turned to one of many domain name generators.  I asked omniscient Google to give me one and, as always, it obliged.  By now I had decided that I wanted a name as free of any connotations as possible (say, like Google or Slashdot, not like Facebook or YouTube).  I went through things like “fractors”, “naphead”, “magnarchy”, “aniarchy”, “mallock”, “hexndex”, “squilt”, “terable”, and so on. It’s amazing how several weeks of searching in frustration temper one’s standards of quality. Anyway, one day “bitquill” popped up: neutral, inoffensive, bland, unusual, and a composite which is short and almost pronnounceable!  I couldn’t ask for much more, so I registered it.  

That, and “clusterhack”.  Sorry.  I couldn’t resist.

Comments (1)

Hello Android

After blabbering about Android, I decided to get my hands a little dirty and actually write some code. For various reasons, I won’t describe the app (it was a “weekend hack” anyway), but hopefully my first impressions will be clear even without a specific context.

Overall, the Android APIs are quite impressive, even though some edges are still rough.  It was reasonably easy to get up to speed, even though my prior experience on mobile application frameworks was zero.  The toughest part was getting used to the heavily event-based programming style, as well as the idea that your code may be interrupted, killed and restarted at any time.

Activity lifecycle. Although Android supports multitasking and concurrency, on a mobile device with limited memory and no swap it’s likely that the O/S will have to kill some or all of your tasks to reclaim resources needed by higher-priority, user-visible processes (e.g., an incoming phone call).  If you have non-persistent or external state, such as open database connections or separate threads that fetch data in the background, things may get a little tricky. Although Android has auxiliary features such as managed cursors and dialogs, you still need to know they exist and use them properly.

However, even things like screen orientation changes are handled by terminating and restarting any affected activities. At first, while spending a couple of hours to figure out why my app was crashing when I opened the keyboard, I bitched about this. Apparently, I wasn’t the only one who was confused. To my surprise, I found that many Android Market apps crash when the screen is rotated.  Some Market apps even come with grave-sounding warnings that, e.g., “the life counter [sic] resets on screen orientation change =/ Will fix for new version.” Luckily, I also found numerous good posts about orientation changes, such as this or this (the series by Mark Murphy are pretty good, by the way), as well as a post on the official blog.

In retrospect, handling orientation changes in this way is a good thing: it forces app developers to be prepared. After I fixed my code to handle orientation changes gracefully, I found that I was also ready to properly handle other sources of interruption: when an incoming call came as I was testing my app, everything worked out beautifully.

Now, whenever I download an app, I perform the following test: I flip the keyboard open when the app executes a background operation, even if I don’t need to type anything.  If the app crashes or gets into an inconsistent state (something that happens surprisingly often), that’s a strong indication that the code is not very robust.

Event handling. For APIs that are so heavily event-based, one of my gripes was that some (but not all) event handlers are based on inheritance rather than delegation. These design choices are probably due to performance reasons that may be specific to Dalvik, the Android VM which is motivated partly for non-technical reasons

However, inheritance sometimes complicates things. For example, Android supports managed cursors and dialogs via methods in the base Activity class. On more than one occasion I found that managed threads would also be nice.  Implementing this requires hooking into the activity lifecycle events (and has, on occasion, been over-engineered to death). Because there are several Activity subclasses (e.g., ListActivity, PreferenceActivity, etc), there is no simple way to extend them all. If lifecycle events were handled via delegates, it would be possible to implement a background UI thread manager as, say, an activity decorator that can be added to any activity instance.  

The delegation-based event model was introduced in Java 1.1 precisely to address such shortcomings of the inheritance-based model. But, being pragmatic about performance on current mobile devices, I should probably not complain too much.  Still, some API design choices seem a bit arbitrary, perhaps even Microsoft-esque: why would performance be an issue with lifecycle events (which are presumably rare, but handlers use inheritance) but not with click events (which are presumably more frequent, but handlers use delegation)?

Data sync and caching. Another gripe was the lack of syncable content providers, something I’ve mentioned before. Also, content providers aren’t really appropriate for network-hosted data. The requirement that content providers use an integer primary key (row ID) is reasonable for local databases and simplifies the APIs, but requires some book-keeping when that’s not the “natural” primary key.

Ideally, I’d like to see some support for caching remote data on the SD card (which would require gracefully handling card removal, and transparently fetching data either from the cache or the network). Although the core APIs provide all that is necessary to implement this from scratch, it was getting too complicated for my simple “weekend hack” app, so I decided to drop it.

I hope that, in the near future, porting web apps to mobile devices will become easier with the support for offline applications and client-side storage in HTML5, as well the proposed geolocation APIs (all of which are already part of Google Gears). An application manifest might include “web activities”, translating intents into HTTP POST requests, while granting device access permissions to those activities (e.g., see promising hacks such as OilCan). Porting might then involve little more than writing a new stylesheet. Perhaps that’s where Palm is going with its WebOS which apparently supports both “native application” and “web application” models, but information is rather thin at the moment.

Epilogue. My first Android app was an interesting learning experience, not only from a technical standpoint (perhaps more on this in another post). I also found that Android is quite stable. I sometimes used my phone for live debugging, forcefully killing threads and processes through ADB.  Let me put it this way: if it wasn’t for the RC33 OTA update, my phone would now have an uptime of a few months. For a piece of software that barely existed a year ago, this is impressive.

There is plenty of documentation available, but at times it can take some searching to find the necessary information.  However, since Android is open-source, it’s always possible to consult the source code itself (which is fairly well-written and documented).

Note: This post was mostly written sometime around February. Since then I had no time to try SDK v1.5, but I believe most points above are still relevant.

Comments (1)

Back again…

Seoul

After coming back from Seoul, New York seemed even dinkier than the last time I returned from a trip. As I was boarding the plane at Incheon, I picked up a copy of the Wall Street Journal (Asian edition). I had enough time to read almost all of it, as KAL arrived into Narita early, but Continental was six hours late. It might as well have been called “The GM Journal”, since about two thirds of the stories were about GM and Chrysler, and how the US government is trying to save them from doom due to chronic mis-management and exorbitant legacy costs.  

My wife, who has a far more sensitive nose than me, jokes that the first thing you smell upon disembarking the plane is cigarette smoke in Greece, and garlic in Korea.  Upon arriving at Newark (or any NYC airport, for that matter), even I can smell the mouldy carpets.  Getting on the subway the next morning, the smell was even worse and the signs of age everywhere.  I sat down, right across a poster ad by NYC Department of Consumer Affairs that read “Debt Stress?  You’re not alone”.  Someone had plastered a makeshift sticker on top, reading “Kill Your Boss”.  After a ride on Metro North, I got into a taxi to work.  It was one of those Ford relics, with a severely dented right side, a cracked windshield and a barely functioning transmission, but still street-legal.  As the cab ended up triple-booked and I was the last one to get off, I got a 35-minute scenic tour through backstreets and pothole-riddled roads before finally arriving to the office.

The experience was enough to make me look up the definition of “developing country” in Wikipedia. Honestly, I don’t get why South Korea is sometimes still listed as such (e.g., in WSJ and, if memory serves me right, in the Economist), while the US isn’t. Something tells me it’s more than GM that needs patching up. Anyway, welcome back home!

Comments

“Life is pointless” ??

That’s what you get when you use colorful tags like “life bits” and “pointless“, especially if you use them together:  Google thinks your website is highly relevant to the query “life is pointless”.

Google Webmaster Tools for bitquill.net, February 2009

I’m now experimenting with a separate tumblelog to post most random thoughts but, in the meantime, here you go Google: one more post about “pointless life (bits)”!

Comments

Revised thoughts on Android

The post I wrote a few days ago about Android is all over the place. The right elements are in that post, but my composition and conclusions are somewhat incoherent. Perhaps I have been partly infected by the conventional thinking (of, e.g., various older, big corporations) and missed the obvious.

First, in a networked environment, it is common standards, rather than a single, common software platform, which further enable information sharing. So, Google may be doing Android for precisely the opposite reason than I originally suggested: to avoid the emergence of a single, dominant, proprietary platform. Chrome may exist for a similar reason. After all, Android serves a purpose similar to a browser, but for mobile devices with various sensing modalities.

Finally, mobile is arguably an important area and Google probably wants to encourage diversity and experimentation which, as I wrote in a previous post, is a pre-requisite for innovation. This is in contrast to the established mentality summarized by the quote I previously mentioned, to “find an idea and ask yourself: is the potential market worth at least one billlion dollars? If not, then walk away.” In fairness, this approach is appropriate to preserve the status quo. (By the way, in the same public speech, the person who gave this advice also responded to a question about competition by saying with commendable directness that “Look: we’ll all be dead some day.  But there’s a lot of money to be made until then.”)  But for innovation of any kind, one should “ask ‘why not?’ instead of ‘why should we do it?’” as Jeff Bezos said, or “innovate toward the light, not against the darkness” as Ray Ozzie said.

Comments (2)

On data ownership in a networked world

Every piece of content has a creator and owner (in this post, I will assume they are by default the same entity). I do not mean ownership in the traditional sense of, e.g., stashing a piece of paper in a drawer, but in the metaphysical sense that each artifact is forever associated with one or more “creators.”

This is certainly true of the end-products of intellectual labor, such as the article you are reading. However, it is also true of more mundane things, such as checkbook register entries or credit card activity. Whenever you pay a bill or purchase an item, you implicitly “create” a piece of content: the associated entry in your statement.  This has two immediately identifiable “creators”: the payer (you) and the payee.  The same is true for, e.g., your email, your IM chats, your web searches, etc. Interesting tidbit: over 20% of search terms entered daily in Google are new, which would imply roughly 20 million new pieces of content per day, or over 7 billion (over twice the earth’s population) per year—all this from just one activity on one website.

When I spend a few weeks working on, say, a research paper, I have certain expectations and demands about my rights as a “creator.” However, I give almost no thought to my rights on the trail of droppings (digital or otherwise) that I “create” each day, by searching the web, filling up the gas tank, getting coffee, going through a toll booth, swiping my badge, and so on.  However, with the increasing ease of data collection and distribution in digital form, we should re-think our attitudes towards “authorship”.

Unique identity

People call me “Spiros”, my identity documents list me as “Spyridon Papadimitriou” and on most online sites I’m registered as spapadim.  However, sometimes I’m s_papadim or spiros_papadimitriou, and so on.  Like most people, I lost track of all my accounts a time ago.  Vice versa, I’m not the only “Spiros Papadimitriou” in the real world.  For example, I occasionally get confused with my cousin, and receive comments about my interesting architectural designs!  Nor am I the only spapadim on the net.

A framework and mechanisms that allow (but do not enforce) asserting and verifying which of those labels (i.e., names, userids, etc) refer to the same entity (i.e., me) is missing. However, this is a prerequisite: how can we talk about data ownership and tackle portability, transparency and accountability, if we have to jump through countless hoops just to prove identity?

Some people, especially in the US, may object or even outright panic at the thought of such a global identifier.  In Greece, and in much of Europe, we’ve had national identity cards for decades.  Which is fine, as long as you know they exist and what are permissible uses-in other words, as long as transparency is ensured.  Furthermore, the illusion of privacy should not be confused with privacy itself—if in doubt, I suggest reading “Database Nation” (official site).  Its examples are largely US-centric, but the lessons are not.

OpenID (despite some shortcomings) and OAuth are emerging as open standards for authentication and authorization.  OpenID allows reuse of authentication credentials from one site on others: I can reuse, say, my Google username and password to log in to other sites (e.g., to leave a comment on this blog), without having to create yet another account from scratch.  OAuth resembles Kerberos’s ticket granting service but for the web, permitting other web services to ask for access to a subset of personal information: I could allow Facebook to access only my Google addressbook and not, potentially, all of my data on any Google service.  OpenID and OAuth can, at least in principle, work together.

Both high-profile individual developers and major companies are involved in these efforts.  For example, Yahoo! already supports OpenID and plans to support OAuth as well, while Google supports OAuth directly and OpenID indirectly in various ways.  Wide adoption of these standards would be a major step forwards for data portability and web interoperability.  However, I suspect they fall slightly short of providing a truly permanent and global personal identity.  What if, for any reason, my Yahoo! account disappears, either because I decided to shut it down or because Yahoo! went bust?

I was going to suggest a DNS-based solution and I was surprised when I found that the generic top-level domain .name has been instituted since 2001 to provide URIs for personal identities. You can register for a free three-month trial on FreeYourID (after that, it’s $11/year). What’s more, their service already provides OpenID authentication. In principle, this should allow easy switching of authentication and authorization service providers. Just as I can still keep the “label” for this site even if I move to a different web host, I can still keep my personal “label” no matter who I choose to manage my personal information.  So, now my universal username is spiros.papadimitriou.name, any emails sent to spiros@papadimitriou.name will find their way to me, you can call me on Skype using spiros.papadimitriou.name/call, and so on.

With such a unique identity tied to authorization and authentication services, the Giant Global Graph and its materializations would be one step closer to becoming really useful. If I want to use my identity to log and controll access to my data, I should be able to prove my claims.  Currently, FOAF and XFN allow assertion of relationshipt but provide no way to verify them.

Data portability

The point of this mental exercise so far is the following: A unique identity that can be verifiably associated with each and every data item that I produce is a prerequisite for making data ownership claims. Subsequently, we need to ask what fundamental rights should be associated with data ownership.  The first is the right to keep my information with me or, in other words, “data portability”. Just as I can freely move my money from one financial institution to another, I should be able to move any of my information from one data warehouse to another.

For example, consider my web search history. I don’t think I need to argue about the importance of historical information to improve search quality. If I decide for any reason to move to another search provider, I should be able to carry along all the information that’s directly associated with me.  This should include my search keyword history, as well as any additional information I may have contributed.

The actual details, however, may not be that straightforward.  Take, say, the third hit on a Google search.  Who is the “creator”?  Me by entering the search keywords, Google by producing the search results in response to those keywords, or the person who wrote the web page that contains them in the first place?  Similarly, when I buy gas, who is the “creator” of the transaction entry: me, Mobil, or American Express?

Even though intuition can often be wrong, my intuitive response to the Google search example would be that both I and Google have an ownership claim on this particular search, which includes the query keywords as well as a ranking of URLs.  On the other hand, the person who wrote the contents of, say, the third URL has ownership claims only on those, and not the search results.  Furthermore, the thousands of people that provided feedback to Google’s ranking algorithms by clicking on this URL on similar searches have ownership claims on those searches, but not on mine.

Finally, those two ownership claims (on keywords and on rankings) should probably not be treated the same.  If they were, then, say, MS Live could effectively copy Google by getting many users to move.  It seems reasonable to have the right to move my search history, but not the actual search results. However, I can imagine that some form of ownership claim on the rankings may be useful for other personal rights.

This is a highly idealized example and I’m not sure what an appropriate litmus test for ownership is, but some form of legal consensus must be in place.

Transparency

The second fundamental right is that I should know who is using my personal information and how. For example, if an insurance company accesses my credit history to give me a rate quote, I can find this out. It may not be a completely painless process but it is certainly possible today, with a regulatory framework that ensures this.  Similar regulations should be instituted to cover any and all forms of access to personal information.

Data access should be fully transparent to all parties involved. If the an insurance company accesses my medical records, I should know this.  If the government does a background check on me, I should know this too.  Transparency is a prerequisite for accountability. Otherwise, individuals have very limited power to protect themselves from improper uses of their personal information.

Concluding remarks

Much of the privacy research in computer science seems to assume that we can keep the existing legal and regulatory frameworks intact. Computer scientists taking such a position is even sadder than lawyers doing so; we have no excuse of failing to understand the technical issues. We cannot and should not make this assumption. Technical solutions should be subsidiary to new regulations.  But that doesn’t mean technologists cannot lead.  We should work towards supporting full transparency (for both individuals, as well as governments and corporations) rather than opacity and I’m currently in favor of a “shoot first, ask questions later” approach (and help lawmakers figure out the answers). After all, if there is anything that the DRM wars have taught us, it’s that information really wants to be free. Why do we think it’s technically hard (to say the least) to prevent copying of music, movies and software but we still think it may be possible to prevent copying of personal information? As I pointed out in an older post, it’s usually the use and not the possession of information that’s the problem.

My point in this post is simple: we should not fight the wrong war. Instead, we need an easy way to make data ownership claims, and use this to enforce at least two fundamental rights: the ability to keep any personal data with us, and the ability to know who is using this data and how.

Postscript. This post was wallowing for a while as a draft (originally separated from this post, then forgotten).  Since then, a recent MIT TR article discusses some aspects of data ownership.  Even better, I have since found an excellent short piece in the same issue by Esther Dyson, with which I could not agree more.

Update. After posting this last night, I did some further Googling and found another piece by Esther Dyson in the Scientific American. If you’ve read through my ramblings so far, then I’d urge you to read her article; she’s a much better writer than me, and has apparently been thinking about these issues for almost a decade, way before many people even knew what the Internet is. I should probably follow her more closely myself, as I agree disturbingly often with what I’ve read from her so far.

Comments (1)

« Previous entries Next Page » Next Page »