
Quickly deploy your HPC workloads to the cloud with Elasticluster.
Elasticluster is a command-line tool built around a set of Ansible playbooks, created at the University of Zurich, that makes setting up Linux clusters in the cloud shockingly easy.
An on-prem OpenStack cloud that can be partially dedicated to HPC-style workloads, growing and shrinking with demand, is a neat idea. Maybe you have a large HPC system running at full capacity and want to push additional workloads out to AWS.
In my case I have a small OpenStack cluster at home and I want to play around with Slurm and Ceph. In the spirit of home lab projects I should be setting these up myself. However, I have a three-month-old son and my time to tinker is very limited. I want to build a Linux cluster during nap time and get to playing with it fast.
The installation process is quick and mostly painless. Elasticluster can be installed as a Python module or run as a Docker container. The documentation says that if Docker is not found it will be installed for you. I put this to the test on my Ansible server and found that it mostly worked; the one small exception was that I had to add my user to the docker group, and then I was off and running.
Download Elasticluster.
stevex0r@gunstar:~$ wget -O elasticluster.sh https://raw.githubusercontent.com/gc3-uzh-ch/elasticluster/master/elasticluster.sh
chmod +x elasticluster.sh
Add myself to the docker group (logging out and back in, or running newgrp docker, is needed for the group change to take effect).
stevex0r@gunstar:~$ sudo usermod -a -G docker stevex0r
Run the script for the first time; it pulls the Elasticluster Docker image.
stevex0r@gunstar:~$ ./elasticluster.sh
Unable to find image 'riccardomurri/elasticluster:latest' locally
latest: Pulling from riccardomurri/elasticluster
45b42c59be33: Pull complete
f875e16ab19c: Pull complete
6997dda769d8: Pull complete
564b60cf6c28: Pull complete
e55c09d2e3e0: Pull complete
7b2ac708e5b4: Pull complete
1c48ec046e3a: Pull complete
d3df697eb645: Pull complete
cda3e1b2bfd6: Pull complete
Digest: sha256:c40a20d769db3dfb42a2ba442c74d744bda75e485b516701ae643e8923d989bc
Status: Downloaded newer image for riccardomurri/elasticluster:latest
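If you would rather skip Docker entirely, Elasticluster can also be installed as a Python module. A minimal sketch, assuming Python 3 and a virtual environment (the package name on PyPI is elasticluster; check the documentation for the currently recommended install procedure):
python3 -m venv ~/elasticluster-venv
. ~/elasticluster-venv/bin/activate
pip install elasticluster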
I need to collect some information about my cloud before I am ready to deploy.
I am going to use CentOS 7 for the operating system.
[root@gunstar ~(keystone_admin)]# openstack image list
+--------------------------------------+-------------------------------+--------+
| ID                                   | Name                          | Status |
+--------------------------------------+-------------------------------+--------+
| 23063171-390c-4576-bc0f-f84da9ff8a32 | CentOS-7-x86_64               | active |
| 057c7dd4-8844-45cf-a23e-6e46c634bb1c | CentOS-8-x86_64               | active |
| 16333027-b3a7-4d5f-8133-566e25a0bc22 | Cirros-0.4.0-x86_64           | active |
| 07baab8f-a314-4f3e-84f4-01425d5b3745 | Ubuntu-20.04-x86_64           | active |
| af13a4b3-ff09-4165-93fe-904797e4bd56 | fedora-coreos-33.20210117.3.2 | active |
| a1e6fd2a-571f-4573-8070-8ef600d178a4 | flatcar                       | active |
| f7d438fb-0eae-4db6-b8de-73364350a39c | rancheros-1.5.7               | active |
+--------------------------------------+-------------------------------+--------+
I have an internal network called hpc for my front-end and compute nodes. My public network is directly accessible on my home network, and floating IP addresses will be created on it to access the cluster. By default I allow SSH access only from my Ansible host for configuration (an example rule is shown after the network listing below). Once things are working properly I will allow SSH access from the rest of my network to the front-end node only.
[root@gunstar ~(keystone_admin)]# openstack network list
+--------------------------------------+------------+--------------------------------------+
| ID                                   | Name       | Subnets                              |
+--------------------------------------+------------+--------------------------------------+
| 48609e1b-0cc3-49cd-8442-2545a35900df | container  | f3e79bfa-6952-4c1c-a172-f8f7452681de |
| 4959faec-0720-4d70-abf8-e80b3bd62912 | kubernetes | 46c28277-dc7d-4728-bbb7-d4ba84073c5f |
| 68bd8d18-215e-4247-8f29-e965b889b96c | public     | cd1e095a-ad69-4b35-9592-85ca0e884280 |
| 724c00db-d2ee-48b8-b57e-ba9f70a7ca1f | hpc        | 617ac8d4-822c-41cc-8641-e5afd5b27cd7 |
| bc526ae3-21ed-4f05-9043-17eed412a80b | vm         | 8cb98547-e305-43a3-ba22-ae83ff9b4eb0 |
+--------------------------------------+------------+--------------------------------------+
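Restricting SSH to the Ansible host can be done with a rule on the default security group, which is the same group referenced in the cluster configuration below. A hedged example; replace <ANSIBLE_HOST_IP> with the address of the Ansible host:
[root@gunstar ~(keystone_admin)]# openstack security group rule create --protocol tcp --dst-port 22 --remote-ip <ANSIBLE_HOST_IP>/32 default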
My cloud consists of 2 desktop computers with a total of 16 cores and 48 GB of RAM. Each cluster node will get 1 core and 2 GB of RAM, which maps to the m1.small flavor.
[root@gunstar ~(keystone_admin)]# openstack flavor list
+----+-----------+-------+------+-----------+-------+-----------+
| ID | Name      |   RAM | Disk | Ephemeral | VCPUs | Is Public |
+----+-----------+-------+------+-----------+-------+-----------+
| 1  | m1.tiny   |   512 |    1 |         0 |     1 | True      |
| 2  | m1.small  |  2048 |   20 |         0 |     1 | True      |
| 3  | m1.medium |  4096 |   40 |         0 |     2 | True      |
| 4  | m1.large  |  8192 |   80 |         0 |     4 | True      |
| 5  | m1.xlarge | 16384 |  160 |         0 |     8 | True      |
+----+-----------+-------+------+-----------+-------+-----------+
I have Elasticluster installed and I have collected all of the information I need to set things up. In the cloud section I specify that I am using OpenStack; with a little reconfiguration I could point this at a public cloud and be up and running. I will be using CentOS 7 with the username centos to log into my cloud instances, and I specify a public and private SSH key pair. This key is installed in my cloud and will be provided to the nodes when they boot. For the nodes I specify that I will be installing standard HPC utilities, R, Julia, and Ganglia for monitoring. Under boot disk type I list heartofgold; this is the name of the Synology NAS that I use for iSCSI block storage. I will be launching 1 front-end node and 3 compute nodes for now.
stevex0r@gunstar:~$ cat /home/stevex0r/.elasticluster/config
################
#Login Info
[cloud/firefly]
provider=openstack
auth_url=http://192.168.1.146:5000/v3
username=admin
password=***********
project_name=admin

[login/centos]
image_user=centos
image_user_sudo=root
image_sudo=True
user_key_name=elasticluster
user_key_private=/home/stevex0r/.ssh/elasticluster
user_key_public=/home/stevex0r/.ssh/elasticluster.pub
###############
#cloudcity
#Slurm Cluster centos
[setup/cloudcity]
provider=ansible
frontend_groups=slurm_master,ganglia_master,hpc,julia,r
compute_groups=slurm_worker,ganglia_monitor,hpc,julia,r

[cluster/cloudcity]
cloud=firefly
login=centos
setup=cloudcity
security_group=default
allow_reboot=yes
disable_selinux=yes
image_id=23063171-390c-4576-bc0f-f84da9ff8a32
floating_network_id=68bd8d18-215e-4247-8f29-e965b889b96c
network_ids=724c00db-d2ee-48b8-b57e-ba9f70a7ca1f
flavor=m1.small
frontend_nodes=1
compute_nodes=3
ssh_to=frontend
request_floating_ip=True
#
[cluster/cloudcity/frontend]
# The frontend shares /home via NFS to the compute nodes.
boot_disk_type=heartofgold
boot_disk_size=50

[cluster/cloudcity/compute]
# Use whatever flavour you'd like to use for your compute nodes.
flavor=m1.small
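With the configuration written, a quick sanity check is to have Elasticluster parse it and list the cluster templates it can see. list-templates is a standard subcommand; if the config file has problems it will complain here rather than halfway through a deployment:
stevex0r@gunstar:~$ ./elasticluster.sh list-templates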
My configuration is in place and I am ready to start the cluster.
stevex0r@gunstar:~$ ./elasticluster.sh start cloudcity
Starting cluster `cloudcity` with:
* 3 compute nodes.
* 1 frontend nodes.
(This may take a while…)
Roughly 30 minutes later my cluster is ready. From Horizon I can see my nodes.

Your cluster `cloudcity` is ready!
Cluster name: cloudcity
Cluster template: cloudcity
Default ssh to node: frontend001
- frontend nodes: 1
- compute nodes: 3

To login on the frontend node, run the command:

    elasticluster ssh cloudcity

To upload or download files to the cluster, use the command:

    elasticluster sftp cloudcity
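The same summary, along with the addresses assigned to each node, is available at any time through the list-nodes subcommand:
stevex0r@gunstar:~$ ./elasticluster.sh list-nodes cloudcity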
Simple enough; I can log into the frontend.
stevex0r@gunstar:~$ ./elasticluster.sh ssh cloudcity
Last login: Tue Feb 16 20:20:11 2021 from 192.168.1.144
[centos@frontend001 ~]$
From the frontend I can now list some information about the cluster.
[centos@frontend001 ~]$ sinfo -Nel
Tue Feb 16 21:03:59 2021
NODELIST    NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
compute001      1 main*     idle     1 1:1:1   1453        0      1 (null)   none
compute002      1 main*     idle     1 1:1:1   1453        0      1 (null)   none
compute003      1 main*     idle     1 1:1:1   1453        0      1 (null)   none
Launch an interactive job on a compute node.
[centos@frontend001 ~]$ srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
[centos@compute001 ~]$
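Batch jobs work the way they would on any Slurm cluster. A minimal sketch of a job script, using the main partition that sinfo reported above (hello.sh and the resource limits are placeholders I picked for illustration):
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00
srun hostname
Submit it from the frontend with sbatch hello.sh and watch it with squeue.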
At this point I have a fully functioning cluster. If I find that I want to expand it I can add a node with the resize command.
stevex0r@gunstar:~$ ./elasticluster.sh resize cloudcity -a 1:compute
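Shrinking works too; if I recall the CLI correctly there is a remove-node subcommand for dropping a specific node (worth confirming against ./elasticluster.sh --help before relying on it; compute004 here stands for whichever node was added):
stevex0r@gunstar:~$ ./elasticluster.sh remove-node cloudcity compute004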
I found that out of the box some things, such as Ganglia, didn't work and needed additional configuration. There were permissions problems, SELinux policy changes, and configuration file edits needed to get everything working. With a few more tweaks for storage and LDAP it's ready to roll.
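For this kind of post-install tweaking it is handy that the Ansible playbooks can be re-run against the running cluster; after adjusting groups or variables in the config, the setup subcommand reapplies them without rebuilding anything:
stevex0r@gunstar:~$ ./elasticluster.sh setup cloudcity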

References
https://elasticluster.readthedocs.io/en/latest/#
Related Posts
https://stevex0r.medium.com/openstack-homelab-installation-75ad6d798994