For more information about MapReduce Service (MRS), refer to this Link.
This guide is validated with MRS 3.1.0-LTS and Anaconda3 (Anaconda3-2020.07-Linux-x86_64.sh, Link). Since MRS ships with Python 2.7 and 3.8, we choose an Anaconda version that also bundles Python 3.8.
- Install MRS Client
- Install Anaconda
- Integrate with Spark2x
Most of the details are described in Link
It is recommended to place the VM that runs the notebook in the same VPC as the MRS cluster. This way, MRS Manager can easily transfer the MRS client to the target VM.
Once the client has been copied to the target VM, configure an NTP server on the VM and then configure the MRS client.
For installing and configuring NTP:
sudo yum install ntp -y
Edit /etc/ntp.conf and replace the default servers with your master node's IP
service ntpd stop
ntpdate 192.168.1.151 # change to your own master ip
service ntpd start
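The /etc/ntp.conf edit above can be sketched as follows. The master IP is the placeholder used elsewhere in this guide, and the demo file path is an assumption; on the real VM you would operate on /etc/ntp.conf itself:

```shell
# Point ntpd at the MRS master node instead of the public pool servers.
MASTER_IP=192.168.1.151              # replace with your own master node IP
NTP_CONF=/tmp/ntp.conf.demo          # use /etc/ntp.conf on the real VM

# Start from a typical default config (public pool servers enabled).
cat > "$NTP_CONF" <<'EOF'
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
EOF

# Comment out the default servers and add the master node as the only source.
sed -i 's/^server /#server /' "$NTP_CONF"
echo "server $MASTER_IP iburst" >> "$NTP_CONF"

cat "$NTP_CONF"
```

After changing the file, restart ntpd as shown above so the new server takes effect.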
For configuring the MRS client:
./install.sh /opt/mrsclient
You can use wget to download the chosen version of Anaconda to the VM. For example:
wget https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
It is advised to install to a location other than the default one, for example /opt/anaconda3.
Once done, answer yes to initialize Anaconda3; the initialization code will be written to ~/.bashrc.
The problem is that if it stays in ~/.bashrc, Anaconda3 will be activated automatically on every login. To avoid this, copy the file to ~/.bashrc.anaconda:
cp ~/.bashrc ~/.bashrc.anaconda
Then edit ~/.bashrc to remove the conda initialize block:
vi ~/.bashrc
Finally, run source ~/.bashrc.anaconda whenever you need to load the Anaconda environment.
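The copy-then-strip steps above can be sketched end-to-end. The `# >>> conda initialize >>>` / `# <<< conda initialize <<<` markers are the ones the Anaconda installer writes into ~/.bashrc; the demo file path is an assumption so the real ~/.bashrc is left untouched:

```shell
# Keep a copy that still activates conda, then strip the auto-activation
# block (delimited by the installer's marker comments) from the login file.
BASHRC=/tmp/bashrc.demo              # use ~/.bashrc on the real VM

# A minimal stand-in for a ~/.bashrc after the Anaconda install.
cat > "$BASHRC" <<'EOF'
alias ll='ls -l'
# >>> conda initialize >>>
__conda_setup="..."
# <<< conda initialize <<<
EOF

cp "$BASHRC" "$BASHRC.anaconda"      # full copy, sourced on demand

# Delete everything between the conda markers, inclusive.
sed -i '/# >>> conda initialize >>>/,/# <<< conda initialize <<</d' "$BASHRC"
```

After this, logins stay fast, and `source ~/.bashrc.anaconda` activates Anaconda on demand.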
Then generate the configuration file:
jupyter notebook --generate-config --allow-root
Edit /root/.jupyter/jupyter_notebook_config.py to set the listen IP to the host IP:
vi /root/.jupyter/jupyter_notebook_config.py
Change the port as well if the default one is already in use, then save the file.
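The IP and port settings mentioned above correspond to the `c.NotebookApp.ip` and `c.NotebookApp.port` options in jupyter_notebook_config.py. A minimal sketch of the edit, where the IP and port values are example placeholders and the demo path stands in for /root/.jupyter/jupyter_notebook_config.py:

```shell
# Append the listen address and port to the generated config file.
CONF=/tmp/jupyter_notebook_config.py   # demo path; use the real config on the VM

cat >> "$CONF" <<'EOF'
c.NotebookApp.ip = '192.168.1.100'   # replace with the host IP
c.NotebookApp.port = 8889            # change if the default port is taken
EOF
```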
Once the MRS client and Anaconda are installed, you can launch Jupyter Notebook with the following commands:
source /opt/hadoopclient/bigdata_env # adjust to your MRS client installation path
kinit developuser
source ~/.bashrc.anaconda
export PYSPARK_DRIVER_PYTHON="ipython"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --allow-root"
Finally start the notebook:
pyspark --master yarn --deploy-mode client &
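The launch sequence above can be bundled into a small helper script so it does not have to be retyped after each login. The paths, the `developuser` principal, and the script name come from or extend the steps above and may differ in your environment:

```shell
# Write a start script that bundles environment setup and the notebook launch.
cat > start_notebook.sh <<'EOF'
#!/bin/bash
# Load the MRS client environment and authenticate with Kerberos.
source /opt/hadoopclient/bigdata_env   # adjust to your client install path
kinit developuser

# Load the Anaconda environment saved earlier.
source ~/.bashrc.anaconda

# Tell PySpark to use Jupyter Notebook as its driver frontend.
export PYSPARK_DRIVER_PYTHON="ipython"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --allow-root"

# Start the notebook on YARN in client mode, in the background.
pyspark --master yarn --deploy-mode client &
EOF
chmod +x start_notebook.sh
```

Running `./start_notebook.sh` then performs the whole sequence in one step (kinit will still prompt for the user's password).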