Setting Up an Environment for Reading the Hadoop Source Code

Environment

  • Mac OS X El Capitan 10.11.6
  • java version "1.7.0_80"
  • git version 2.7.4 (Apple Git-66)
  • Apache Maven 3.3.9

Downloading the Source Code

Clone the latest source code with Git:

git clone git://git.apache.org/hadoop-common.git

Building the Code

Build the code so that the projects can be imported into Eclipse. Change into the hadoop-common directory and run:

$ mvn install -DskipTests
$ mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

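This step takes quite a while. An internal Nexus server proxying the Maven repositories helps a lot; without one, downloading all the dependencies may not go smoothly.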
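If you rebuild often, the two Maven steps can be wrapped in a small shell function so the Eclipse step only runs when the install step succeeds. This is a minimal sketch, not something shipped with Hadoop: the function name `build_for_eclipse` and the `MVN` override variable are hypothetical, added only so the commands can be dry-run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the two build steps from the hadoop-common root.
# MVN is a hypothetical override (defaults to "mvn") so the commands can be dry-run.
build_for_eclipse() {
  local mvn="${MVN:-mvn}"
  # Step 1: build and install every module into the local repository, skipping tests
  "$mvn" install -DskipTests || return 1
  # Step 2: generate the Eclipse .project/.classpath files, with sources and javadocs
  "$mvn" eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
}
```

Typical use from the clone is `cd hadoop-common && build_for_eclipse`. Running `MVN=echo build_for_eclipse` only prints the two argument lists, which is a cheap way to check the commands before starting a long build.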

Hadoop 2.7.3 Source Environment

Download page: http://hadoop.apache.org/releases.html. Download the 2.7.3 source tarball and extract it with:

$ tar xzvf hadoop-2.7.3-src.tar.gz

The file hadoop-2.7.3-src/BUILDING.txt in the extracted directory contains a lot of useful information, including the build requirements.

Build it the same way as described above.
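The extraction step can also be scripted. This is a minimal sketch with a hypothetical `unpack_hadoop_src` function; it assumes, as with hadoop-2.7.3-src.tar.gz, that the archive's top-level directory has the same name as the tarball without `.tar.gz`. It extracts the archive and prints the path of BUILDING.txt, or fails if that file is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a Hadoop source tarball and locate BUILDING.txt.
# Assumes the archive's top-level directory is named like the tarball minus ".tar.gz".
unpack_hadoop_src() {
  local tarball="$1" dest="${2:-.}" top
  top="$(basename "$tarball" .tar.gz)"
  # Same as "tar xzvf" without the verbose listing, extracting into $dest
  tar xzf "$tarball" -C "$dest" || return 1
  if [ ! -f "$dest/$top/BUILDING.txt" ]; then
    echo "BUILDING.txt not found under $dest/$top" >&2
    return 1
  fi
  # Print the path so it can be handed to a pager or editor
  printf '%s\n' "$dest/$top/BUILDING.txt"
}
```

For example, `less "$(unpack_hadoop_src hadoop-2.7.3-src.tar.gz)"` extracts the source into the current directory and opens the build notes.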

Working in Eclipse

Common

  • File -> Import…
  • Choose “Existing Projects into Workspace”
  • Select the hadoop-common-project directory as the root directory
  • Select the hadoop-annotations, hadoop-auth, hadoop-auth-examples, hadoop-nfs and hadoop-common projects
  • Click “Finish”
  • File -> Import…
  • Choose “Existing Projects into Workspace”
  • Select the hadoop-assemblies directory as the root directory
  • Select the hadoop-assemblies project
  • Click “Finish”
  • To get the projects to build cleanly:
    • Add target/generated-test-sources/java as a source directory for hadoop-common
    • You may have to remove and then re-add the JRE System Library to avoid errors caused by access restrictions

Steps for the last item:

  1. Go to the Build Path settings in the project properties.
  2. Remove the JRE System Library
  3. Add it back: click “Add Library” and select JRE System Library. The default worked for me.

HDFS

  • File -> Import…
  • Choose “Existing Projects into Workspace”
  • Select the hadoop-hdfs-project directory as the root directory
  • Select the hadoop-hdfs project
  • Click “Finish”

YARN

  • File -> Import…
  • Choose “Existing Projects into Workspace”
  • Select the hadoop-yarn-project directory as the root directory
  • Select the hadoop-yarn-project project
  • Click “Finish”

MapReduce

  • File -> Import…
  • Choose “Existing Projects into Workspace”
  • Select the hadoop-mapreduce-project directory as the root directory
  • Select the hadoop-mapreduce-project project
  • Click “Finish”

Errors

Error: package com.sun.javadoc does not exist

Running mvn install -DskipTests with JDK 8 fails with the error below; switch to JDK 7 and run it again.

[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /Users/ling/work/git/hadoop-common/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/tools/ExcludePrivateAnnotationsStandardDoclet.java:[20,22] error: package com.sun.javadoc does not exist
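On macOS, Maven's launcher uses the JDK in JAVA_HOME when it is set, and the system tool /usr/libexec/java_home can locate an installed JDK 7. The sketch below assumes JDK 7 is installed; the function names are hypothetical. It also checks the version reported by `java -version`, so the build is not started on JDK 8 by mistake.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building Hadoop 2.7.x on JDK 7 (use_jdk7 is macOS-only).

# Read `java -version` output on stdin and print the quoted version, e.g. 1.7.0_80.
java_version_from() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only for a Java 7 version string (1.7.x).
is_java7() {
  case "$1" in
    1.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Point JAVA_HOME at an installed JDK 7 using macOS's /usr/libexec/java_home.
use_jdk7() {
  local home
  home="$(/usr/libexec/java_home -v 1.7 2>/dev/null)" || return 1
  # Verify the selected JDK really is 1.7 before exporting it
  is_java7 "$("$home/bin/java" -version 2>&1 | java_version_from)" || return 1
  export JAVA_HOME="$home"
}
```

Usage: `use_jdk7 && mvn install -DskipTests`.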

'protoc --version' did not return a version

The error message is:

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]

This happens because protoc is not installed. I installed the latest protoc, 3.0.0, and ran the build again, which failed with:

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 3.0.0', expected version is '2.5.0' -> [Help 1]

After installing protoc 2.5.0, the build succeeded.
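Because the build insists on exactly protoc 2.5.0, checking the version before starting Maven can save a long wait. This is a minimal sketch; `check_protoc` and `protoc_version_from` are hypothetical names, and the `libprotoc X.Y.Z` output format is taken from the error message above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check: the Hadoop build expects protoc 2.5.0 exactly.
REQUIRED_PROTOC="2.5.0"

# Strip the "libprotoc " prefix: "libprotoc 2.5.0" -> "2.5.0"
protoc_version_from() {
  printf '%s\n' "${1#libprotoc }"
}

# Fail with a message if protoc is missing or has the wrong version.
check_protoc() {
  local out ver
  if ! out="$(protoc --version 2>/dev/null)"; then
    echo "protoc not found on PATH" >&2
    return 1
  fi
  ver="$(protoc_version_from "$out")"
  if [ "$ver" != "$REQUIRED_PROTOC" ]; then
    echo "protoc is '$ver', expected '$REQUIRED_PROTOC'" >&2
    return 1
  fi
}
```

Usage: `check_protoc && mvn install -DskipTests`.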

hadoop-common compile error: Type AvroRecord cannot be resolved to a type

  • Download avro-tools version 1.7.7.
  • Go to the source directory hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/test/avro and run:
$ java -jar <download-dir>/avro-tools-1.7.7.jar compile schema avroRecord.avsc ../java/

Here the .avsc file is an Avro schema file; the command generates the corresponding .java source files from that schema.

Then right-click the hadoop-common project in Eclipse and choose Refresh.

Note: do not download the latest avro-tools; use version 1.7.7. The latest version, 1.8.1, failed in my test.
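The Avro step can be wrapped so it does not depend on the current directory. This is a minimal sketch: `gen_avro_record` is a hypothetical name, it takes the source root and the path to the avro-tools jar, and it only warns (rather than fails) when the jar is not the 1.7.7 release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: regenerate AvroRecord.java from the hadoop-common test schema.
# $1: source root (e.g. hadoop-2.7.3-src); $2: path to avro-tools-1.7.7.jar
gen_avro_record() {
  local src_root="$1" jar="$2"
  # avro-tools 1.8.1 failed in testing, so warn about anything other than 1.7.7
  case "$(basename "$jar")" in
    avro-tools-1.7.7.jar) ;;
    *) echo "warning: $jar is not avro-tools 1.7.7" >&2 ;;
  esac
  # Subshell: the cd does not change the caller's working directory
  (
    cd "$src_root/hadoop-common-project/hadoop-common/src/test/avro" || exit 1
    # Compile the schema into Java sources under src/test/java
    java -jar "$jar" compile schema avroRecord.avsc ../java/
  )
}
```

For example, `gen_avro_record hadoop-2.7.3-src ~/Downloads/avro-tools-1.7.7.jar`, followed by a Refresh of hadoop-common in Eclipse.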

hadoop-common compile error: Type EchoRequestProto cannot be resolved

  • Go to the source directory hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/test/proto and run:
$ protoc --java_out=../java *.proto
  • Right-click hadoop-common in Eclipse and choose Refresh.
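The protoc step can be scripted the same way. This is a minimal sketch with a hypothetical `gen_test_protos` function that takes the source root. It refuses to run when the directory has no .proto files, because bash would otherwise pass an unmatched `*.proto` to protoc literally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: generate Java classes (e.g. EchoRequestProto) from the test .proto files.
# $1: source root (e.g. hadoop-2.7.3-src)
gen_test_protos() {
  local src_root="$1"
  (
    cd "$src_root/hadoop-common-project/hadoop-common/src/test/proto" || exit 1
    # Expand the glob into the positional parameters and make sure it matched
    set -- *.proto
    if [ ! -e "$1" ]; then
      echo "no .proto files found" >&2
      exit 1
    fi
    protoc --java_out=../java "$@"
  )
}
```

Usage: `gen_test_protos hadoop-2.7.3-src`, then Refresh hadoop-common in Eclipse.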