Set up a cluster manually: Spark 1.3.1 on Ubuntu 14.04
Introduction
Once the cluster machines are running, let's configure Spark 1.3.1. I assume Anaconda is installed to provide numpy.
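Spark launches its Python workers with whatever interpreter PYSPARK_PYTHON points to, so one way to make it use Anaconda's Python (and its numpy) on every node is to set that variable in conf/spark-env.sh. This is only a sketch and the Anaconda path is an assumption; adjust it to your own install:
# Append to /opt/spark/conf/spark-env.sh on the master and on every worker
# (the Anaconda location below is an example path, not from this post)
export PYSPARK_PYTHON=/home/ubuntu/anaconda/bin/python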
Set up password-less SSH
Connect to each node:
ssh -i /home/raf/Documents/Cloud/rvf_keele_cloud.pem ubuntu@10.8.3.127
ssh -i /home/raf/Documents/Cloud/rvf_keele_cloud.pem ubuntu@10.8.3.128
On the master:
ubuntu@master:~$ ssh-keygen -t rsa -P ""
ubuntu@master:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 644 $HOME/.ssh/authorized_keys
ubuntu@master:~$ ssh localhost
with $HOME = /home/ubuntu
On workers:
Copy ~/.ssh/id_rsa.pub from your master to the worker (one way to do the copy is sketched after the commands below), then run:
cat /home/ubuntu/.ssh/id_rsa.pub >> /home/ubuntu/.ssh/authorized_keys
chmod 644 /home/ubuntu/.ssh/authorized_keys
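Since the .pem key already gives you password-less access from your local machine, one way to get the master's public key onto a worker is to relay it through your laptop. A sketch reusing the IPs from above (10.8.3.127 is the master, 10.8.3.128 a worker); it assumes the worker has not generated a key pair of its own, otherwise copy to a different filename and adjust the cat command:
# On your local machine: fetch the public key from the master ...
scp -i /home/raf/Documents/Cloud/rvf_keele_cloud.pem ubuntu@10.8.3.127:.ssh/id_rsa.pub /tmp/master_id_rsa.pub
# ... and push it to the worker, then append it as shown above
scp -i /home/raf/Documents/Cloud/rvf_keele_cloud.pem /tmp/master_id_rsa.pub ubuntu@10.8.3.128:.ssh/id_rsa.pub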
References:
http://stackoverflow.com/questions/31899837/how-to-start-apache-spark-slave-instance-on-a-standalone-environment
Start Master and Workers with the Launch Scripts
Configuration file: conf/slaves on your master, listing the worker hostnames one per line.
We also need to add the hostnames to /etc/hosts:
- sudo vim /etc/hosts
- Add the Fully Qualified Hostname (FQHN) and the hostname of each node (you can find them with the commands hostname -f and hostname); see the example below.
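For example, with a hypothetical worker name and IP (use the real output of hostname -f and hostname on your own nodes), /etc/hosts on every machine could contain:
192.168.0.52   master.novalocal    master
192.168.0.53   worker1.novalocal   worker1
and conf/slaves on the master would simply list the workers, one per line:
worker1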
Now run sbin/start-all.sh. The cluster manager's web UI should appear at http://masternode:8080 and show all your workers.
Start Master and Workers by Hand
Run the following commands on the master and on the workers:
Master
cd /opt/spark/bin/
./spark-class org.apache.spark.deploy.master.Master
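If you want to pin the bind address and the ports explicitly instead of relying on the defaults, the standalone Master accepts options such as --host, --port and --webui-port. A sketch using the internal IP from this post:
./spark-class org.apache.spark.deploy.master.Master --host 192.168.0.52 --port 7077 --webui-port 8080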
Now you can check these URLs: 192.168.0.52:7077, where you should submit the jobs, and 192.168.0.52:8080, the master web UI.
I am going to visit the latter. Remember that the port must be open and that 192.168.0.52 is the internal IP of my master, so I tunnel from my local machine:
ssh -i /home/raf/Documents/Cloud/rvf_keele_cloud.pem -L 8200:localhost:8080 ubuntu@10.8.3.127
Now you can check in the browser: http://localhost:8200/
Workers
cd /opt/spark/bin/
./spark-class org.apache.spark.deploy.worker.Worker spark://192.168.0.52:7077
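You can also limit what each worker offers to the cluster with options such as --cores and --memory. A sketch; the values here are arbitrary, so pick ones that fit your instances:
./spark-class org.apache.spark.deploy.worker.Worker --cores 2 --memory 4g spark://192.168.0.52:7077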
Now we can see the new worker appear in the master's web UI.
To submit a job
You have to have a copy of all the files in the same location on the master and on all workers. I have copied a folder with the data and a Python script into /opt/spark/bin:
cd /opt/spark/bin
./spark-submit --master spark://192.168.0.52:7077 paper_cluster_spark_to_run/recommendations.py
To run it you have to open another SSH session to the master (the previous ones are busy running the Master and Worker processes): ssh -i /home/raf/Documents/Cloud/rvf_keele_cloud.pem ubuntu@10.8.3.127
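The contents of recommendations.py are not shown in this post. As a rough idea of the shape such a script takes, here is a minimal PySpark sketch; the data path and ratings.csv are hypothetical placeholders, and the master URL comes from the spark-submit command, so the script does not need to set it:
from pyspark import SparkConf, SparkContext

# Minimal illustrative job, not the real recommendations.py
conf = SparkConf().setAppName("recommendations")
sc = SparkContext(conf=conf)

# Placeholder input; the folder must exist at the same path on master and workers
lines = sc.textFile("paper_cluster_spark_to_run/data/ratings.csv")
print(lines.count())

sc.stop()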
References
- Commands to set everything up: https://gist.github.com/samklr/75486c2d9e31c5998443
- Karau, H., Konwinski, A., Wendell, P., & Zaharia, M. (2015). Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly Media.