Coupled Human and Natural Systems

Archive for August 2015

Abusing Amazon’s Elastic MapReduce Hadoop service… easily, from R

Great resource for a beginner

Things I tend to forget

I built my first Hadoop cluster this week and ran my first two test MapReduce jobs. It took about 15 minutes and 2 lines of R, and cost 55 cents. And you can too, with JD Long’s (very, very experimental) ‘segue’ package.

But first, you may be wondering why I use the word “abusing” in this post’s title. Well, the Apache Hadoop project, and Google’s MapReduce processing system which inspired it, are all about Big Data. Hadoop’s raison d’être is the distributed processing of large data sets. Huge data sets, actually. Huge like all-the-web-logs-from-Yahoo!-and-Facebook huge. Its HDFS file system is designed for streaming reads of large, unchanging data files; its default block size is 64 MB, in case that resonates with your inner geek. HDFS expects its files to be so big that it even makes replication decisions based on its knowledge of…

Written by shashidhungel

August 24, 2015 at 6:04 pm

Posted in Uncategorized