Setting Up a Hadoop Source Code Reading Environment
Environment
- Mac OS X El Capitan 10.11.6
- java version "1.7.0_80"
- git version 2.7.4 (Apple Git-66)
- Apache Maven 3.3.9
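Before building, it helps to confirm that these tools are actually on `PATH`. The following is a minimal Bash sketch. The function name `missing_tools` is hypothetical. `protoc` is worth checking as well, because the build also needs it (see the protoc errors later in this post).

```bash
# missing_tools (hypothetical helper): prints each command name that is not
# found on PATH, one per line, and returns 1 if any is missing, 0 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools java git mvn protoc` prints nothing when everything is installed. You can then compare the exact versions with the list above using `java -version`, `git --version` and `mvn -version`.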
Download the Source Code
Clone the latest source code with Git:
```bash
git clone git://git.apache.org/hadoop-common.git
```
Build the Code
Build the code so that the projects can be imported into Eclipse. Change into the hadoop-common directory and run:
```bash
$ mvn install -DskipTests
```
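This step takes a long time. An internal Nexus server helps a great deal here. Without one, downloading all the dependencies may not go smoothly.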
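A common way to route Maven downloads through an internal Nexus server is a `<mirror>` entry in `~/.m2/settings.xml`. The Bash sketch below only prints such a file. The function name `nexus_settings_xml`, the mirror id `internal-nexus` and any URL you pass in are placeholders, not values from this setup.

```bash
# nexus_settings_xml (hypothetical helper): prints a minimal Maven settings.xml
# that sends every repository request (mirrorOf "*") to the given Nexus URL.
nexus_settings_xml() {
  local url="${1:-}"
  if [ -z "$url" ]; then
    echo "usage: nexus_settings_xml <nexus-repository-url>" >&2
    return 1
  fi
  cat <<EOF
<settings>
  <mirrors>
    <mirror>
      <id>internal-nexus</id>
      <name>Internal Nexus mirror</name>
      <url>${url}</url>
      <mirrorOf>*</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

Usage would look like `nexus_settings_xml http://nexus.example.internal/repository/maven-public/ > ~/.m2/settings.xml` (hypothetical URL). The redirect overwrites any existing settings file, so merge the `<mirror>` entry by hand if you already have one.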
Hadoop 2.7.3 Source Environment
Download page: http://hadoop.apache.org/releases.html. Download the 2.7.3 source tarball and extract it with:
```bash
$ tar xzvf hadoop-2.7.3-src.tar.gz
```
The file `hadoop-2.7.3-src/BUILDING.txt` in the extracted directory contains a lot of useful information.
Build the source the same way as above.
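For importing into Eclipse, the Hadoop 2.x BUILDING.txt also describes two further steps: install `hadoop-maven-plugins` first, then generate Eclipse project files with `mvn eclipse:eclipse -DskipTests`. Check the BUILDING.txt in your own tree, since the steps can differ between versions. The following is a Bash sketch of that sequence. The function name is hypothetical.

```bash
# prepare_eclipse_projects (hypothetical helper): runs the Eclipse preparation
# steps in a Hadoop source tree, stopping at the first failure.
#   $1: source root (default: current directory)
prepare_eclipse_projects() {
  local src="${1:-.}"
  # 1. Install the project's Maven plugins (they provide the protoc goal).
  (cd "$src/hadoop-maven-plugins" && mvn install) || return 1
  # 2. Generate .project/.classpath files for the modules.
  (cd "$src" && mvn eclipse:eclipse -DskipTests) || return 1
}
```

For example, `prepare_eclipse_projects hadoop-2.7.3-src`, followed by the import steps below.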
Eclipse Steps
Common
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-common-project directory as the root directory
- Select the hadoop-annotations, hadoop-auth, hadoop-auth-examples, hadoop-nfs and hadoop-common projects
- Click “Finish”
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-assemblies directory as the root directory
- Select the hadoop-assemblies project
- Click “Finish”
- To get the projects to build cleanly:
  - Add target/generated-test-sources/java as a source directory for hadoop-common
  - You may have to remove and then re-add the JRE System Library to avoid errors due to access restrictions
To remove and re-add the JRE System Library:
- Go to the Build Path settings in the project properties.
- Remove the JRE System Library
- Add it back: click “Add Library” and choose JRE System Library. The default worked for me.
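Adding `target/generated-test-sources/java` only helps if the Maven build actually wrote files into it. The following Bash sketch is a quick check to run from the source root. The function name is hypothetical, and the default module path assumes the 2.7.3 layout (`hadoop-common-project/hadoop-common`).

```bash
# has_generated_test_sources (hypothetical helper): succeeds if the module's
# target/generated-test-sources/java contains at least one .java file.
#   $1: module directory (default: hadoop-common-project/hadoop-common)
has_generated_test_sources() {
  local dir="${1:-hadoop-common-project/hadoop-common}/target/generated-test-sources/java"
  [ -d "$dir" ] || return 1
  [ -n "$(find "$dir" -name '*.java' -type f | head -n 1)" ]
}
```

If the check fails, you can rerun `mvn install -DskipTests`, which still compiles test code. Alternatively, generate the missing classes by hand as described in the errors section below.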
HDFS
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-hdfs-project directory as the root directory
- Select the hadoop-hdfs project
- Click “Finish”
YARN
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-yarn-project directory as the root directory
- Select the hadoop-yarn-project project
- Click “Finish”
MapReduce
- File -> Import…
- Choose “Existing Projects into Workspace”
- Select the hadoop-mapreduce-project directory as the root directory
- Select the hadoop-mapreduce-project project
- Click “Finish”
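The "Existing Projects into Workspace" wizard only offers directories that contain an Eclipse `.project` file. To see in advance what it will list under a given root directory, you can use a Bash sketch like the one below. The function name and the depth limit of 3 are assumptions.

```bash
# eclipse_projects_under (hypothetical helper): prints, sorted, the directories
# below ROOT (up to 3 levels deep) that contain an Eclipse .project file.
#   $1: root directory (default: current directory)
eclipse_projects_under() {
  local root="${1:-.}"
  find "$root" -maxdepth 3 -name .project -type f | while read -r f; do
    dirname "$f"
  done | sort
}
```

For example, `eclipse_projects_under hadoop-yarn-project` should list the YARN project directories once the project files have been generated.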
Errors
Error: package com.sun.javadoc does not exist
Running `mvn install -DskipTests` with JDK 8 fails with the following error. Switch to JDK 7 and run it again.
```
[INFO] -------------------------------------------------------------
```
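Before switching JDKs, check which one is active by parsing the `java -version` banner. On OS X, you can then point `JAVA_HOME` at a 1.7 install with `/usr/libexec/java_home`. The following is a Bash sketch. The function name `java_spec_version` is hypothetical, and the parsing assumes the `java version "1.x.y_z"` banner format shown in the environment list above.

```bash
# java_spec_version (hypothetical helper): extracts the "1.7"/"1.8" part from a
# `java -version` banner passed as the first argument; prints nothing if the
# banner does not match the assumed format.
java_spec_version() {
  printf '%s\n' "$1" | sed -n 's/.*version "\(1\.[0-9]*\)\..*/\1/p' | head -n 1
}
```

For example, `java_spec_version "$(java -version 2>&1)"` prints `1.8` under JDK 8. You can then run `export JAVA_HOME="$(/usr/libexec/java_home -v 1.7)"` to select an installed JDK 7 for the following `mvn` runs.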
'protoc --version' did not return a version
The error message is:
```
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
```
This happens because protoc was not installed. I installed the latest protoc, 3.0.0, and reran the build. The error then became:
```
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 3.0.0', expected version is '2.5.0' -> [Help 1]
```
After I installed protoc 2.5.0 instead, the build succeeded.
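The build compares the output of `protoc --version` with 2.5.0 (see the second error message above). Running the same comparison before starting a long `mvn install` saves a failed run. The following is a Bash sketch. The function name and its optional second argument are hypothetical.

```bash
# protoc_version_matches (hypothetical helper): compares a `protoc --version`
# string (e.g. "libprotoc 3.0.0") with the expected version (default 2.5.0).
protoc_version_matches() {
  local actual="$1" expected="${2:-2.5.0}"
  if [ "$actual" = "libprotoc $expected" ]; then
    return 0
  fi
  echo "protoc reports '${actual:-nothing}', expected 'libprotoc $expected'" >&2
  return 1
}
```

For example, `protoc_version_matches "$(protoc --version 2>/dev/null)" || echo "install protobuf 2.5.0 first"`. protobuf 2.5.0 is typically built from its source tarball with the usual `./configure && make && make install`.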
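hadoop-common compile error: Type AvroRecord cannot be resolved to a type
- Download avro-tools version 1.7.7 (not the latest release; see the note below).
- Change into the source directory `hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/test/avro` and run: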
```bash
$ java -jar <jar-directory>/avro-tools-1.7.7.jar compile schema avroRecord.avsc ../java/
```
Here the .avsc file is the Avro schema file, and the command generates the corresponding .java files from it.
- Right-click the hadoop-common project in Eclipse and choose Refresh.
Note: do not download the latest avro-tools; use version 1.7.7. The latest version, 1.8.1, failed in my tests.
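You can wrap the avro-tools command above so that it always runs from the schema directory and rejects any avro-tools jar other than 1.7.7. The following is a Bash sketch. The function name and its two parameters (path to the jar, path to the `hadoop-common` module) are assumptions.

```bash
# gen_avro_test_sources (hypothetical helper)
#   $1: path to avro-tools-1.7.7.jar
#   $2: hadoop-common module directory, e.g.
#       hadoop-2.7.3-src/hadoop-common-project/hadoop-common
# Runs the avro-tools schema compilation from src/test/avro, writing the
# generated classes under src/test/java.
gen_avro_test_sources() {
  local jar="$1" module="$2"
  # Only 1.7.7 is accepted, because 1.8.1 failed in the setup described here.
  case "$(basename "$jar")" in
    avro-tools-1.7.7.jar) ;;
    *) echo "expected avro-tools-1.7.7.jar, got: $jar" >&2; return 1 ;;
  esac
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  # Make the jar path absolute so it still resolves after the cd below.
  jar="$(cd "$(dirname "$jar")" && pwd)/$(basename "$jar")"
  (cd "$module/src/test/avro" && java -jar "$jar" compile schema avroRecord.avsc ../java/)
}
```

After it succeeds, refresh hadoop-common in Eclipse as described above.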
hadoop-common compile error: Type EchoRequestProto cannot be resolved
- Change into the source directory `hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/test/proto` and run:
```bash
$ protoc --java_out=../java *.proto
```
- Right-click hadoop-common in Eclipse and choose Refresh.
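The same wrapping works for the test `.proto` files. The following is a Bash sketch. The function name and its module-path parameter are assumptions. It checks the compiler version first, since code generated by a protoc other than 2.5.0 may not match what the build expects. It then runs the command above from the right directory.

```bash
# gen_proto_test_sources (hypothetical helper)
#   $1: hadoop-common module directory, e.g.
#       hadoop-2.7.3-src/hadoop-common-project/hadoop-common
# Runs `protoc --java_out=../java *.proto` inside src/test/proto after
# checking that protoc reports version 2.5.0.
gen_proto_test_sources() {
  local module="$1" version
  version="$(protoc --version 2>/dev/null)" || version=""
  if [ "$version" != "libprotoc 2.5.0" ]; then
    echo "need libprotoc 2.5.0, found '${version:-none}'" >&2
    return 1
  fi
  (cd "$module/src/test/proto" && protoc --java_out=../java *.proto)
}
```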