The Elephant

I am reading Hadoop: The Definitive Guide, an O’Reilly book written by Tom White. The book is an excellent reference (I have the fourth edition, which is already over a year old). It gives a short but good introduction to what MapReduce is, followed by a much longer and wonderful course on it. It then goes into the Hadoop Distributed Filesystem and describes YARN and Hadoop I/O. There is a general chapter on Hadoop operations, and then it describes briefly but sufficiently some related projects like Sqoop, Flume, Pig, Hive, Crunch, Spark and ZooKeeper. Those chapters are rather short, but then you cannot cover it all in one book. The focus is squarely on what you need to know about Hadoop.