ERROR: Python with virtualenvwrapper module not found!

This is an error I hit while starting a Spark Standalone cluster on my local machine:

$ ./sbin/start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /home/zhangjc/frin/spark/spark-3.5.1-bin-hadoop3/logs/spark-zhangjc-org.apache.spark.deploy.master.Master-1-frin.out
localhost: ERROR: Python with virtualenvwrapper module not found!
localhost: Either, install virtualenvwrapper module for the default python3 interpreter
localhost: or set VIRTUALENVWRAPPER_PYTHON to the interpreter to use.
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/zhangjc/frin/spark/spark-3.5.1-bin-hadoop3/logs/spark-zhangjc-org.apache.spark.deploy.worker.Worker-1-frin.out

In fact, this error has nothing to do with Spark at all. Tracing through the Spark startup scripts, I found the following code in sbin/workers.sh:

for host in `echo "$HOSTLIST"|sed  "s/#.*$//;/^$/d"`; do
  if [ -n "${SPARK_SSH_FOREGROUND}" ]; then
    ssh $SPARK_SSH_OPTS "$host" $"${@// /\\ }" \
      2>&1 | sed "s/^/$host: /"
  else
    ssh $SPARK_SSH_OPTS "$host" $"${@// /\\ }" \
      2>&1 | sed "s/^/$host: /" &
  fi
  if [ "$SPARK_WORKER_SLEEP" != "" ]; then
    sleep $SPARK_WORKER_SLEEP
  fi
  if [ "$SPARK_SLAVE_SLEEP" != "" ]; then
    >&2 echo "SPARK_SLAVE_SLEEP is deprecated, use SPARK_WORKER_SLEEP"
    sleep $SPARK_SLAVE_SLEEP
  fi
done
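
The key point is that each worker is launched by running a command over ssh. The failure can therefore be reproduced without Spark at all; assuming sshd is running locally, a trivial remote command triggers the same message, which confirms it comes from the remote shell's startup files rather than from Spark:

$ ssh localhost true
ERROR: Python with virtualenvwrapper module not found!
Either, install virtualenvwrapper module for the default python3 interpreter
or set VIRTUALENVWRAPPER_PYTHON to the interpreter to use.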

The error appears because I have Python's virtualenvwrapper installed on this machine, configured in .bashrc as follows:

source /home/zhangjc/.local/bin/virtualenvwrapper.sh

Moreover, my local virtualenvwrapper uses a Python other than the one shipped with the system, and the environment variables it needs are configured in profile. When logging in remotely over ssh, the environment variables from profile are not loaded automatically, hence the error message above.
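
Note that the error message itself points at an alternative workaround: export VIRTUALENVWRAPPER_PYTHON in .bashrc before sourcing virtualenvwrapper.sh, so the script does not have to guess the interpreter. A minimal sketch, where the interpreter path is only an example and must be adjusted to your own install:

export VIRTUALENVWRAPPER_PYTHON=/home/zhangjc/.local/bin/python3  # hypothetical path, adjust to your install
source /home/zhangjc/.local/bin/virtualenvwrapper.sh

That only quiets virtualenvwrapper, though; the fix below also restores the other environment variables configured in profile.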

With the cause identified, the fix is simple: just load the profile environment variables from .bashrc as well, like this:

source /etc/profile
source /home/zhangjc/.local/bin/virtualenvwrapper.sh
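
The change can be verified without anything Spark-specific; assuming sshd is still running locally, the same remote command should now complete silently before you retry the cluster startup:

$ ssh localhost true
$ ./sbin/start-all.sh

If ssh localhost true prints nothing, workers.sh will be able to start the Worker without the virtualenvwrapper error.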