(Co)-clustering with Map-Reduce
Under construction
More details to come when I find some time.
For now, you can take a look at the available source (highly experimental).
Also attached (histogram.cc) is the simple hand-coded benchmark described in this blog post. It's not very useful without the dataset, but if you see anything glaringly stupid in the code that may affect performance, I'd love to know. FYI, each line in the dataset is a couple of hundred characters long.
Todo
Due to lack of time, this is for now more a statement of desires than of plans:
- Clean up codebase
- Generalize to a broader class of co-clustering cost functions (Bregman divergences?)
- Consider integration with Apache Mahout
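For reference, the Bregman divergences mentioned in the todo list are the family D_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y) for a strictly convex, differentiable phi. The sketch below is only an illustration of that definition for scalars, not code from this project; the function names (`bregman`, `squared_error`, `i_divergence`) are made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Generic scalar Bregman divergence:
//   D_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)
// `phi` is the convex generator and `dphi` its derivative (both supplied by the caller).
template <typename Phi, typename DPhi>
double bregman(Phi phi, DPhi dphi, double x, double y) {
    return phi(x) - phi(y) - dphi(y) * (x - y);
}

// Squared error: the Bregman divergence generated by phi(x) = x^2.
double squared_error(double x, double y) {
    const double d = x - y;
    return d * d;
}

// Generalized I-divergence: generated by phi(x) = x log x.
// Closed form: x log(x / y) - x + y, with the convention 0 log 0 = 0.
double i_divergence(double x, double y) {
    assert(x >= 0.0 && y > 0.0);
    const double x_log_ratio = (x > 0.0) ? x * std::log(x / y) : 0.0;
    return x_log_ratio - x + y;
}
```

Passing `phi(x) = x * x` with `dphi(x) = 2 * x` to `bregman` reproduces `squared_error`, and `phi(x) = x * log(x)` with `dphi(x) = log(x) + 1` reproduces `i_divergence`, so a co-clustering cost written against the generic form could presumably switch between the two by swapping the generator.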
Attachments
- histogram.cc (0.8 kB) - added by spapadim 4 months ago. Simple hand-coded histogram benchmark
