Emma

Emma is a project to create a platform for the development of applications for Spark and Docker Swarm clusters.

3 mentions · 4 contributors · 613 commits · Last commit ≈ 65 months ago · 3 stars · 4 forks


What Emma can do for you

  • It is designed for users deploying Spark and Docker Swarm clusters on cloud infrastructure.
  • It helps the user prepare cloud virtual machines.
  • Machines are provisioned with Ansible, an automation tool for IT infrastructure.
  • It gives users command-line access to install the required libraries and systems, configure them, start and stop services, add new modules for Jupyter notebooks, and even update the firewall.

Emma is an open-source project to create a platform for the development of applications for Spark and Docker Swarm clusters. The platform runs on an infrastructure composed of virtual machines that must be reachable by SSH. The machines are either cloud virtual machines or Vagrant machines. Vagrant allows the platform to be simulated on a local machine, i.e. in a local development environment.
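Since every machine must be reachable by SSH before provisioning, a quick connectivity check can save time. The following is a minimal sketch and not part of Emma itself: the function names `is_ssh_reachable` and `check_hosts` are hypothetical, and the check only confirms that a TCP connection to the SSH port can be opened, not that authentication succeeds.

```python
import socket
from typing import Dict, Iterable


def is_ssh_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_hosts(hosts: Iterable[str], port: int = 22) -> Dict[str, bool]:
    """Map each host name to whether its SSH port accepted a connection."""
    return {host: is_ssh_reachable(host, port) for host in hosts}
```

For example, `check_hosts(["vm0.example.org", "vm1.example.org"])` (placeholder host names) returns a dictionary showing which machines are ready to be provisioned.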

Once the machines are prepared, the servers are provisioned using Ansible, an automation tool for IT infrastructure. Ansible playbooks are used to create a storage layer, a processing layer, and JupyterHub services. The storage layer offers two flavors of storage: file-based, using GlusterFS and the Hadoop Distributed File System (HDFS), and object-based, using Minio. The processing layer has an Apache Spark cluster and a Docker Swarm that share the storage instances.

With Ansible we are able to deploy a platform with the same features at different locations, such as a local cluster, a national infrastructure, or even a commercial cloud provider. This gives us tool provenance, which makes experiments easily repeatable across scientists.
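One way to picture deploying the same platform at different locations is that the Ansible group layout stays fixed while only the host names change. The sketch below renders an INI-style Ansible inventory from a mapping of groups to hosts. The group names (`storage`, `processing`), the host names, and the helper `render_inventory` are all assumptions made for illustration; Emma's actual inventory layout is not described here.

```python
from typing import Dict, List


def render_inventory(groups: Dict[str, List[str]]) -> str:
    """Render a mapping of group name -> host names as an INI-style inventory."""
    sections = []
    for group, hosts in groups.items():
        # Each section is a [group] header followed by one host per line.
        sections.append("\n".join([f"[{group}]"] + list(hosts)))
    return "\n\n".join(sections) + "\n"


# Hypothetical locations: the groups are identical, only the hosts differ.
local_site = {"storage": ["vagrant0", "vagrant1"], "processing": ["vagrant0", "vagrant1"]}
cloud_site = {"storage": ["cloud0", "cloud1"], "processing": ["cloud0", "cloud1"]}

local_inventory = render_inventory(local_site)
cloud_inventory = render_inventory(cloud_site)
```

Because both inventories expose the same groups, the same playbooks could in principle be run against either one without modification.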

Programming languages
  • Shell 65%
  • Scala 28%
  • Python 5%
  • R 2%

Participating organisations

Netherlands eScience Center

Related projects

High spatial resolution phenological modelling at continental scales

Understanding phenological variability

Updated 2 months ago
Finished

eEcoLiDAR

eScience infrastructure for ecological applications of LiDAR point clouds

Updated 2 months ago
Finished