bitquill - Spiros Papadimitriou

(Co)-clustering with Map-Reduce

Under construction

More details to come when I find some time.

For now, you can take a look at the available source (highly experimental).

Also, here is the simple hand-coded benchmark described in this blog post. It's not very useful without the dataset, but if you see anything glaringly stupid in the code that might affect performance, I'd love to know. FYI, each line in the dataset is a couple of hundred characters long.

Todo

Due to lack of time, for now this is more a statement of desires than of plans:

  • Clean up codebase
  • Generalize to other co-clustering cost functions (Bregman divergences?)
  • Consider integration with Apache Mahout
