Emma is a project to create a platform for development of application for Spark and DockerSwarm clusters.
Emma is an open-source project to create a platform for development of applications for Spark and DockerSwarm clusters. The platform runs on an infra-structure composed by virtual machines that must be reachable by SSH. The machines are either cloud virtual machines or Vagrant machines. The latter tool allows the platform to be simulated on a local machine, i.e. in a local development environment.
Once the machines are prepared, the servers are provisioned using Ansible, an automation tool for IT infra-structure. Ansible playbooks
are used to create a storage layer, processing layer, and JupyterHub services. The storage layer offers two flavors of storage, file-base by GlusterFS and Hadoop Distributed File System (HDFS), and object-based using Minio. The processing layer has a Apache Spark cluster and a Docker Swarm sharing the storage instances.
With Ansible we are able to deploy a platform with the same features at different locations, such as local cluster, national infra-structure, or even a commercial cloud provider. Such a feature allows us to have tool-provenance for easily repeatability of experiments between scientists.
Understanding phenological variability
eScience infrastructure for ecological applications of LiDAR point clouds