MapReduce Is Nothing New (たつをの ChangeLog)

2008-04-08-2 [IIR]

MapReduce also shows up in Chapter 4 of "Introduction to Information Retrieval" (IIR).

- Introduction to Information Retrieval
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html

For the details of MapReduce, please read "Googleを支える技術" [2008-03-25-1] or Chapter 4 of IIR. Here I just want to point out that the concept behind MapReduce is not particularly new.

- MapReduce: A major step backwards - The Database Column
http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html

Quoted from the article above, followed by my own fairly rough translation.

3. MapReduce is not novel

The MapReduce community seems to feel that they have discovered an
entirely new paradigm for processing large data sets. In
actuality, the techniques employed by MapReduce are more than 20
years old. The idea of partitioning a large data set into smaller
partitions was first proposed in "Application of Hash to Data Base
Machine and Its Architecture" [11] as the basis for a new type of
join algorithm.

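Rough paraphrase: the MapReduce community may think it has discovered
a new paradigm for handling huge data sets, but the techniques used in
MapReduce have actually been around for more than 20 years. The idea of
splitting a huge data set into smaller chunks was first proposed in
"Application of Hash to Data Base Machine and Its Architecture"
(a paper by Professor Kitsuregawa from 25 years ago) as the basis for
a new type of join algorithm.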
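
As a rough illustration of the partitioning idea in the quoted passage, here is a minimal Python sketch of a hash-partitioned join. It is not taken from the cited paper or from MapReduce itself; the function names `partition` and `hash_join`, the `n_parts` parameter, and the list-of-dicts row format are all assumptions made up for this example.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hashing the join key (hypothetical helper)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def hash_join(left, right, key, n_parts=4):
    """Join two lists of dicts on `key`, one pair of partitions at a time."""
    joined = []
    left_parts = partition(left, key, n_parts)
    right_parts = partition(right, key, n_parts)
    for lpart, rpart in zip(left_parts, right_parts):
        # Build an in-memory hash table for this (small) left partition.
        table = defaultdict(list)
        for lrow in lpart:
            table[lrow[key]].append(lrow)
        # Probe the table with the rows of the matching right partition.
        for rrow in rpart:
            for lrow in table.get(rrow[key], []):
                joined.append({**lrow, **rrow})
    return joined


users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
orders = [
    {"id": 1, "item": "x"},
    {"id": 3, "item": "y"},
    {"id": 1, "item": "z"},
    {"id": 4, "item": "w"},
]
result = hash_join(users, orders, "id")
```

Rows with equal keys always hash to the same partition, so each pair of partitions can be joined independently of the others. That independence is what makes it possible to hand the pairs out to separate machines.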

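So this is a basic concept that has been around for a long time.
Even if nobody wrote it up as a paper, the idea itself probably goes back even further.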

Well, I suppose the conclusion is that "naming matters."

Addendum: the original article's understanding of MapReduce seems to have drawn various objections and questions, and its comment section has become quite lively.

via
- 御用学者と呼ばれて - tatemuraの日記
http://d.hatena.ne.jp/tatemura/20080404/1207401701