Setting Up a Hadoop Source Code Reading Environment
Environment
- Mac OS X El Capitan 10.11.6
- java version “1.7.0_80”
- git version 2.7.4 (Apple Git-66)
- Apache Maven 3.3.9
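Before building, it can help to confirm that these tools are actually on the PATH. The following is only a minimal sketch: the helper name report_tool is made up for this post, and the versions it prints are whatever happens to be installed locally.

```shell
# report_tool is a hypothetical helper: print the first line of a tool's
# version output, or a "not found" message if the tool is not on PATH.
report_tool() {
    tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        # java prints its version to stderr, so merge both streams
        "$tool" "$@" 2>&1 | head -n 1
    else
        echo "$tool: not found"
    fi
}

report_tool java -version
report_tool git --version
report_tool mvn -version
```

If any line reports "not found", install that tool before going further.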
Downloading the source code
Clone the latest source code from Git:
git clone git://git.apache.org/hadoop-common.git
Building the code
Build the code so that the projects can be imported into Eclipse. Change into the hadoop-common directory and run the following commands:
$ mvn install -DskipTests
$ mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
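This step takes a long time. An internal Nexus server helps; without one, the dependency downloads may be slow or unreliable.
Hadoop 2.7.3 source environment
Download page: http://hadoop.apache.org/releases.html. Download the 2.7.3 source package and extract it with the following command: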
$ tar xzvf hadoop-2.7.3-src.tar.gz
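The file hadoop-2.7.3-src/BUILDING.txt in the extracted directory contains a lot of useful information.
The source is built the same way as described above.
Steps in Eclipse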
Common
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-common-project directory as the root directory
- Select the hadoop-annotations, hadoop-auth, hadoop-auth-examples, hadoop-nfs and hadoop-common projects
- Click “Finish”
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-assemblies directory as the root directory
- Select the hadoop-assemblies project
- Click “Finish”
- To get the projects to build cleanly:
- Add target/generated-test-sources/java as a source directory for hadoop-common
- You may have to add then remove the JRE System Library to avoid errors due to access restrictions
Detailed steps for the last item:
- Go to the Build Path settings in the project properties.
- Remove the JRE System Library
- Add it back; Select “Add Library” and select the JRE System Library. The default worked for me.
HDFS
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-hdfs-project directory as the root directory
- Select the hadoop-hdfs project
- Click “Finish”
YARN
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-yarn-project directory as the root directory
- Select the hadoop-yarn-project project
- Click “Finish”
MapReduce
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-mapreduce-project directory as the root directory
- Select the hadoop-mapreduce-project project
- Click “Finish”
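Errors
Error: package com.sun.javadoc does not exist (错误: 程序包com.sun.javadoc不存在)
Running mvn install -DskipTests with JDK 8 fails with the following error. Switch to JDK 7 and run the command again.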
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /Users/ling/work/git/hadoop-common/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/tools/ExcludePrivateAnnotationsStandardDoclet.java:[20,22] 错误: 程序包com.sun.javadoc不存在
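On OS X, one way to switch the current shell to JDK 7 is the /usr/libexec/java_home utility that ships with the system. This is a minimal sketch, assuming JDK 7 is already installed; the function name use_jdk7 is made up for this post, and the guard is there because the utility does not exist on other platforms.

```shell
# use_jdk7 is a hypothetical helper: point JAVA_HOME at an installed
# JDK 7 for the current shell session only.
use_jdk7() {
    if [ ! -x /usr/libexec/java_home ]; then
        echo "/usr/libexec/java_home is not available on this platform"
        return 1
    fi
    # -v 1.7 filters for JDK 7; -F fails instead of falling back to the default JDK
    if jdk7_home=$(/usr/libexec/java_home -F -v 1.7 2>/dev/null); then
        export JAVA_HOME="$jdk7_home"
        echo "JAVA_HOME set to $JAVA_HOME"
    else
        echo "no JDK 7 found; install one first"
        return 1
    fi
}

use_jdk7 || echo "JAVA_HOME left unchanged"
```

Maven picks up JAVA_HOME, so running mvn install -DskipTests in the same shell afterwards should use JDK 7.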
'protoc --version' did not return a version
The error message is as follows:
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
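This happens because protoc is not installed. I installed the latest version, protoc 3.0.0, and ran the build again, which produced the following error: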
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 3.0.0', expected version is '2.5.0' -> [Help 1]
After installing protoc 2.5.0, the build succeeded.
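To catch this before a long Maven run, the protoc version can be checked up front. The sketch below assumes the expected version is 2.5.0, as stated in the error message above; the helper names parse_protoc_version and check_protoc are made up for this post.

```shell
# Version the Hadoop build expects, taken from the error message above.
REQUIRED_PROTOC="2.5.0"

# parse_protoc_version is a hypothetical helper: extract "2.5.0" from
# a line such as "libprotoc 2.5.0".
parse_protoc_version() {
    echo "$1" | awk '{print $2}'
}

# check_protoc is a hypothetical helper: succeed only if the protoc on
# PATH reports exactly the required version.
check_protoc() {
    if ! command -v protoc >/dev/null 2>&1; then
        echo "protoc not found on PATH"
        return 1
    fi
    found=$(parse_protoc_version "$(protoc --version)")
    if [ "$found" != "$REQUIRED_PROTOC" ]; then
        echo "protoc $found found, $REQUIRED_PROTOC expected"
        return 1
    fi
    echo "protoc $found OK"
}

check_protoc || echo "fix protoc before running mvn install"
```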
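hadoop-common compile error: Type AvroRecord cannot be resolved to a type
Download avro-tools 1.7.7.
Change into the source directory hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/test/avro and run the following command:
$ java -jar <path-to-jar-directory>/avro-tools-1.7.7.jar compile schema avroRecord.avsc ../java/
Here the .avsc file is an Avro schema file; the command above generates the corresponding .java file from the schema.
- Right-click the hadoop-common project in Eclipse and choose Refresh.
Note: use avro-tools 1.7.7 rather than the latest version; the latest version, 1.8.1, failed in my testing.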
hadoop-common compile error: Type EchoRequestProto cannot be resolved
Change into the source directory hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/test/proto and run the following command:
$ protoc --java_out=../java *.proto
Right-click hadoop-common in Eclipse and choose Refresh.
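The two regeneration steps for hadoop-common (Avro and protobuf test classes) can be wrapped in one function. This is only a sketch under the assumptions of this post: the name regen_test_sources is made up, both arguments are expected to be absolute paths, and java and protoc 2.5.0 are assumed to be on the PATH.

```shell
# regen_test_sources is a hypothetical helper, not part of Hadoop.
# $1: absolute path to avro-tools-1.7.7.jar
# $2: absolute path to the hadoop-common module directory
regen_test_sources() {
    jar="$1"
    module="$2"
    if [ ! -f "$jar" ]; then
        echo "avro-tools jar not found: $jar"
        return 1
    fi
    if [ ! -d "$module/src/test/avro" ] || [ ! -d "$module/src/test/proto" ]; then
        echo "not a hadoop-common module directory: $module"
        return 1
    fi
    # Generate AvroRecord.java from the Avro schema
    (cd "$module/src/test/avro" &&
        java -jar "$jar" compile schema avroRecord.avsc ../java/) || return 1
    # Generate EchoRequestProto and the other protobuf test classes
    (cd "$module/src/test/proto" &&
        protoc --java_out=../java *.proto) || return 1
    echo "test sources regenerated; refresh hadoop-common in Eclipse"
}
```

It would be called with the jar first and the module directory second, for example with the hadoop-2.7.3-src/hadoop-common-project/hadoop-common directory from the steps above.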