LARA (Laboratory Automation Robotic Assistant) is an open source Research Data Management Suite and 2nd generation Electronic Lab Notebook with the goal of automating most of the data and metadata entry process by providing a rich infrastructure for lab communication, orchestration, data/metadata storage, evaluation and exchange.
It utilises ontology-based semantic web technology to make data findable via SPARQL.
LARA is designed to reduce manual data entry by humans to the bare minimum, since scientists should focus on the creative part of their research ("no one likes to enter data by hand if a machine can do it").
This is achieved by a very fine-grained, gRPC-based API and a semantic, ontology-based representation of most items.
Please note that LARA is still in a vivid development stage, so please do not be disappointed if there are "hiccups" or "gaps". We are very happy about any feedback and proposals via the LARA GitLab issues; LARA is meant to be a community project, so please feel free to contribute! Thank you very much in advance!
In LARA, scientific research is structured into Projects and Experiments:
Projects can have as many sub-projects as desired; similarly, Experiments can have many sub-experiments.
Experiments can be structured in Processes and Procedures.
Processes describe "what" is done in an experiment, while Procedures describe "how" it is done.
Processes and Procedures can be written in the pythonlab Procedure/Process description language,
a very powerful, yet simple language that is suited even for very complex procedures and processes, including loops, conditions, etc., enabling closed-loop automation and experiments.
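The pythonlab syntax itself is not shown here, but the closed-loop idea can be sketched in plain Python; all function and parameter names below (measure_od, add_feed, the target value) are illustrative assumptions, not the actual pythonlab API:

```python
# Hypothetical closed-loop process sketch in plain Python.
# In a real setup, measure_od/add_feed would be calls to SiLA servers.

TARGET_OD = 1.2  # assumed target optical density

def measure_od(readings):
    # stand-in for a SiLA instrument call; consumes simulated readings
    return readings.pop(0)

def add_feed(volume_ml, log):
    # stand-in for a SiLA actuator call
    log.append(volume_ml)

def fed_batch_process(simulated_readings):
    """Feed until the optical density reaches the target (closed loop)."""
    feed_log = []
    while simulated_readings:
        od = measure_od(simulated_readings)
        if od >= TARGET_OD:
            break                              # condition: target reached
        add_feed(volume_ml=5.0, log=feed_log)  # loop body: corrective action
    return feed_log

feeds = fed_batch_process([0.4, 0.8, 1.3])
```

The measurement feeds back into the decision of the next step, which is what distinguishes a closed-loop process from a fixed script.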
These Processes and Procedures are then executed by the Lab Orchestrator and the pythonlab scheduler, which communicate with the corresponding SiLA servers, such as instruments, reactors, robots, machine learning algorithms or even humans (human feedback is integrated via an app).
The Lab Orchestrator is a microservice that receives messages from the LARA Django API and orchestrates the execution of experiments and procedures on the corresponding SiLA servers. The pythonlab scheduler microservice calculates the best order in which the process and procedure steps should be executed.
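As a minimal illustration of step ordering (the actual pythonlab scheduler also optimises timing across devices, which is not shown), dependency-respecting ordering can be sketched with the standard library; the step names are hypothetical:

```python
from graphlib import TopologicalSorter  # Python 3.9+ stdlib

# Each step maps to the set of steps it depends on (hypothetical names).
steps = {
    "pipette":  set(),
    "incubate": {"pipette"},
    "measure":  {"incubate"},
    "wash":     {"pipette"},
}

# A valid execution order: every step comes after its dependencies.
order = list(TopologicalSorter(steps).static_order())
```

A real scheduler would additionally weigh step durations and device availability to pick the best among the valid orders.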
The Data Collector is a microservice that receives messages from the Lab Orchestrator to collect data from the corresponding SiLA servers as soon as data is ready for collection, and stores the data in the corresponding LARA data database. Larger datasets are not stored directly in the database but in an S3-compliant object storage (minio by default) and only linked to the entries in the database. We recommend storing the data in an efficient, open data format like SciDat, which combines tabular data with (semantic, JSON-LD based) metadata in a single, compact file.
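The idea of combining tabular data with JSON-LD metadata in one compact payload can be sketched as follows; the exact SciDat layout may differ, and all keys and values here are illustrative assumptions:

```python
import json

# Sketch: one document carrying both JSON-LD metadata and tabular data.
record = {
    "metadata": {
        "@context": "https://schema.org/",   # generic vocabulary for the sketch
        "@type": "Dataset",
        "name": "absorbance scan",           # hypothetical example values
        "unit": "AU",
    },
    "columns": ["wavelength_nm", "absorbance"],
    "rows": [[400, 0.12], [410, 0.15], [420, 0.19]],
}

blob = json.dumps(record)       # single compact payload for object storage
restored = json.loads(blob)     # data and metadata travel together
```

Keeping metadata and measurements in one file means the link between them cannot be lost when the object is moved between stores.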
The LARA database is based on Python/Django and consists of many modules (the current set can easily be expanded by specific applications):
Currently we have modules for Procedures, Processes and Methods; Material (Parts, Devices and Labware); Substances, Polymers, Mixtures and Reactions; Sequences, like DNA, RNA and peptide/protein sequences; 3D Structures of molecules; Organisms; Samples; Data; and for the management of Scientists, Institutions, Companies, etc.
LARA differentiates between "abstract" entities, like "the substance ethanol in general with its generic properties such as molecular weight and sum formula" or a "thermometer" as a "generic device for measuring temperature", and "instances" of these abstracta: "the bottle of ethanol in a particular lab from a particular vendor" / "the thermometer of vendor X with a temperature range of 250-350 K".
This instance information about individual objects is stored in the corresponding stores: currently the Substance Store, the Material Store and the Organism Store.
These stores can also be used as "free" inventories for the lab (ordering workflows will follow soon).
The "class/abstractum" versus "instance/individuum" paradigm is a common design paradigm in LARA.
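The abstractum/individuum split can be sketched with two plain data classes; the field names are illustrative, not the actual LARA schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Substance:              # abstractum: ethanol in general
    name: str
    sum_formula: str
    molecular_weight: float   # g/mol

@dataclass
class SubstanceInstance:      # individuum: one bottle in one lab
    substance: Substance      # link back to the shared abstractum
    vendor: str
    lot_number: str
    location: str

ethanol = Substance("ethanol", "C2H6O", 46.07)
bottle = SubstanceInstance(ethanol, vendor="ACME", lot_number="L-001",
                           location="lab 2, shelf 3")
```

Generic properties live once on the abstractum; every bottle only carries what is specific to it, which keeps inventories small and the generic data consistent.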
For the semantic representation of the items, a triple store based on OpenLink Virtuoso is used. The terms are defined in ontologies. This semantic representation makes the data findable via SPARQL.
The semantic information is generated automatically when new database items are created.
Selected parts of projects, experiments and data can be synchronised with other LARA instances via the Data Synchronisation microservice.
Data can be evaluated via the Data Evaluation microservice, which is based on pandas, numpy and scipy.
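The Data Evaluation microservice builds on pandas, numpy and scipy; the following is only a minimal standard-library sketch of the kind of evaluation performed, with hypothetical replicate values:

```python
import statistics

# Hypothetical absorbance replicates from one measurement step.
replicates = [0.151, 0.149, 0.153, 0.150]

mean = statistics.mean(replicates)
stdev = statistics.stdev(replicates)        # sample standard deviation
cv_percent = 100 * stdev / mean             # coefficient of variation in %
```

In the real service the same aggregation runs over whole datasets pulled from the database, with numpy/scipy for curve fitting and statistics.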
The evaluated data is then visualised via default Data Visualisations.
For more advanced evaluations and visualisations, Jupyter Notebooks can be used, as all data is accessible via the LARA gRPC APIs.
For simple searches, each Module has a search function, which can be used to search for specific items.
For more advanced, complex searches, the LARA SPARQL Endpoint can be used.
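Such a search against the endpoint could look like the following query; the prefix, class and property names are assumptions for illustration, not the actual LARA ontology terms:

```sparql
# Hypothetical query: find substances whose label contains "ethanol".
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX lara: <http://example.org/lara#>

SELECT ?substance ?label
WHERE {
  ?substance a lara:Substance ;
             rdfs:label ?label .
  FILTER(CONTAINS(LCASE(?label), "ethanol"))
}
LIMIT 10
```

Because every module writes into the same triple store, one query can join items across modules (e.g. samples, substances and devices) in a way the per-module search functions cannot.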
Installation is made as easy as possible and requires only two commands. The recommended installation method for testing is currently via docker compose, as described in the README.md of the lara-django repository. docker compose is part of recent docker installations, so no additional installation is required.
More generic Modules can be found at our OpenSourceLab Project.
Open Source Laboratory Automation