javababy1

An Introduction to Hadoop Books


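The first is, of course, the famous "Hadoop: The Definitive Guide", which is essentially the Bible of the field; a second edition is now available. I read the first edition last year, and its examples were written against the old API. For the new edition, see Amazon's description: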

Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.

This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.

* Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
* Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
* Discover common pitfalls and advanced features for writing real-world MapReduce programs
* Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
* Use Pig, a high-level query language for large-scale data processing
* Analyze datasets with Hive, Hadoop’s data warehousing system
* Take advantage of HBase, Hadoop’s database for structured and semi-structured data
* Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk."

--Doug Cutting, Cloudera


About the Author


Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he worked as an independent Hadoop consultant, helping companies set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including at ApacheCon 2008 on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.

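The second is "Hadoop in Action". It is not a thick book, and I am about halfway through it. It is very practical, so if you want to get up to speed quickly and start practicing, this is the one I recommend. See Amazon's description: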

Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.

The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.

Hadoop in Action will explain how to use Hadoop and present design patterns and practices of programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.

This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples.


About the Author

Chuck Lam is a Senior Engineer at RockYou!. Chuck received his B.S. from San Jose State University and his Ph.D. in Electrical Engineering from Stanford University, where his thesis topic was computational data acquisition.
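
For readers who have not met the MapReduce model yet, the word-frequency task mentioned in the description above can be sketched in plain Java. This is only an illustration of the map, shuffle, and reduce steps under my own assumptions: it uses no Hadoop classes at all, runs in a single JVM, and the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `countWords`) are made up for this example rather than taken from the book or from Hadoop.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle -> reduce flow.
// Hypothetical names throughout; this is not Hadoop's API.
final class WordCountSketch {

    // Map step: emit one (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group all emitted values by key (sorted by key, as a TreeMap).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the values collected for a single word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle, then reduce each group.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {action=1, hadoop=2, in=1, pro=1}
        System.out.println(countWords(Arrays.asList("Hadoop in Action", "Pro Hadoop")));
    }
}
```

In a real Hadoop job the framework performs the shuffle and spreads the map and reduce calls across the cluster; here everything is done in memory purely to show the shape of the computation.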

Next is "Pro Hadoop". As the name suggests, it is a more advanced book, and it is the next one on my list. See Amazon's description:

You’ve heard the hype about Hadoop: it runs petabyte-scale data mining tasks insanely fast, it runs gigantic tasks on clouds for absurdly cheap, it’s been heavily committed to by tech giants like IBM, Yahoo!, and the Apache Project, and it’s completely open source (thus free). But what exactly is it, and more importantly, how do you even get a Hadoop cluster up and running?

From Apress, the name you’ve come to trust for hands-on technical knowledge, Pro Hadoop brings you up to speed on Hadoop. You learn the ins and outs of MapReduce; how to structure a cluster, design, and implement the Hadoop file system; and how to build your first cloud-computing tasks using Hadoop. Learn how to let Hadoop take care of distributing and parallelizing your software: you just focus on the code, and Hadoop takes care of the rest.

Best of all, you’ll learn from a tech professional who’s been in the Hadoop scene since day one. Written from the perspective of a principal engineer with down-in-the-trenches knowledge of what to do wrong with Hadoop, you learn how to avoid the common, expensive first errors that everyone makes with creating their own Hadoop system or inheriting someone else’s.

Skip the novice stage and the expensive, hard-to-fix mistakes...go straight to seasoned pro on the hottest cloud-computing framework with Pro Hadoop. Your productivity will blow your managers away.

What you’ll learn

* Set up a stand-alone Hadoop cluster the smart way, laid out simply and step by step so you can get up and running quickly to build your next data center, collaborative, data-intensive Internet services application, Software as a Service (SaaS), and more.
* Optimize your Hadoop production tasks like an experienced pro.
* Work with time-proven, bulletproof standard patterns that have been tested and debugged in high-volume production.
* Understand just enough theoretical knowledge to know why something works in Hadoop, without getting bogged down in abstruse walls of theory.
* Get detailed explanations of not only how to do something with Hadoop, but also why, from a front-line coder with years in the Hadoop game.
* Turn someone else’s expensive cluster-wide “wrong” into an orderly, productive “right” with professional-level debugging and testing.

Who this book is for

IT professionals interested in investigating Hadoop and implementing it in their organizations, and existing Hadoop users who want to deepen their professional toolkits.

Table of Contents

1. Getting Started with Hadoop Core
2. The Basics of a MapReduce Job
3. The Basics of Multimachine Clusters
4. HDFS Details for Multimachine Clusters
5. MapReduce Details for Multimachine Clusters
6. Tuning Your MapReduce Jobs
7. Unit Testing and Debugging
8. Advanced and Alternate MapReduce Techniques
9. Solving Problems with Hadoop
10. Projects Based On Hadoop and Future Directions

About the Author


Jason Venner has 20+ years of experience in software engineering, management, design, and coding. He has been a VP, director, and consultant. Currently, his interests and expertise are in Java, Hadoop, cloud computing, and more. For more, visit www.prohadoopbook.com.

Finally, as further reading, there is "HBase: The Definitive Guide", which has not been released yet and has to be pre-ordered.

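Note: Hadoop 0.20 introduced a completely new API (the org.apache.hadoop.mapreduce package, alongside the older org.apache.hadoop.mapred one), so much existing code has to be rewritten. It is well worth understanding these changes. If you are starting directly with 0.20, that is no problem either.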

Download: the English editions are all available in the CSDN download section.

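Purchase: china-pub and dangdang both carry 《Hadoop权威指南(中文版)》, the Chinese edition of "Hadoop: The Definitive Guide"; the original English edition is too expensive.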
