OBiBa Opal

Opal is OBiBa's core data repository application for epidemiological studies.

8 contributors · 5543 commits · Last commit ≈ 7 days ago · 30 stars · 22 forks

What OBiBa Opal can do for you

What is Opal?

Opal is OBiBa's core data management application. This server application provides all the necessary tools to import, transform and describe data. Subject identifiers can also be managed at data import and export time.

Analysis

Thanks to its integration with R, complex statistical analyses and reports can be produced. The implementation of the DataSHIELD process allows advanced statistical data analysis across multiple studies without sharing or disclosing any individual-level data.

Integration

Because Opal is integrated with Onyx and Mica, studies using Opal can seamlessly and securely import data collected with Onyx. They can also create web data portals with Mica that query Opal databases to obtain real-time aggregated reports on subject data.

Secured REST web services are also available to automate server management (Python command line tools) or to access data (from R or any web-capable tool).

Features

Data Warehouse

Here are some of the main features of Opal's data warehouse technologies:

  • Store data on an unlimited number of variables,
  • Support MongoDB, MySQL, MariaDB and PostgreSQL as database software backend,
  • Customized variable dictionaries,
  • Import data from CSV, SPSS, SAS, Stata files and from SQL databases,
  • Export data to CSV, SPSS, SAS, Stata files and to SQL databases,
  • Incremental data importation,
  • Connect directly to multiple data source software such as SQL databases and LimeSurvey,
  • Store data about any type of "entity", such as subject, sample, geographic area, etc.,
  • Store data of any type (e.g., texts, numbers, geo-localisation, images, videos, etc.),
  • Import and store genotype data as VCF files (Variant Call Format),
  • Advanced indexing functionality using Elasticsearch,
  • SQL API for selecting, filtering, grouping and joining table data.

Resources

Resources are datasets or computation units whose location is described by a URL and whose access is protected by credentials. When assigned to an R/DataSHIELD server session, remote big or complex datasets and high-performance computers are made accessible to data analysts. Opal provides an interface for managing access to the resources and assigning them to an R/DataSHIELD server session, in integration with the resourcer R package. When using resources, the Opal installation is very lightweight, as no database and no import process is required: the data are accessed where they are originally located, from the R server.
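
To make the "URL plus credentials" idea concrete, here is a minimal Python sketch of how such a resource description could be modelled. It is not the resourcer package's data model: the class name, field names, URL and credentials are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource description: a location plus protected credentials."""
    name: str
    url: str
    # Credentials are excluded from repr() so they do not leak into logs.
    identity: str = field(default="", repr=False)
    secret: str = field(default="", repr=False)

    def location(self):
        """Split the URL into the parts a resolver would need to reach the data."""
        parts = urlsplit(self.url)
        return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


# Illustrative values only; no such server is implied to exist.
baseline = Resource(
    name="baseline",
    url="https://data.example.org/cohort/baseline.csv",
    identity="analyst",
    secret="s3cr3t",
)
print(baseline.location())
print(baseline)  # credentials are not shown
```

The data themselves never appear in the description: only the location and the means of access do, which is why no import step is needed.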

Views and Derived Variables

Opal provides the software infrastructure to create virtual tables called "views" of derived variables that can be persisted on disk or exported into files. Main features are:

  • Comprehensive JavaScript library of utility functions commonly used to derive new variables (e.g. unit conversion); see the Magma JavaScript API,
  • User-friendly interfaces to recode variables without programming,
  • Instant summary statistics computation of the new derived variables.
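
The notion of a view as a virtual table of derived variables can be sketched as follows. Opal itself derives variables with its Magma JavaScript library; the Python below only illustrates the concept, and the `View` class, the variable names and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

Row = Dict[str, float]


@dataclass
class View:
    """A virtual table: derived variables are computed on read, not stored."""
    source: List[Row]
    derived: Dict[str, Callable[[Row], float]]

    def rows(self) -> Iterator[Row]:
        for row in self.source:
            # Each derived variable is a function of the source row.
            yield {name: fn(row) for name, fn in self.derived.items()}


# Hypothetical source table with height in centimetres and weight in kilograms.
source_table = [
    {"HEIGHT_CM": 170.0, "WEIGHT_KG": 65.0},
    {"HEIGHT_CM": 180.0, "WEIGHT_KG": 81.0},
]

view = View(
    source=source_table,
    derived={
        # Unit conversion: centimetres to metres.
        "HEIGHT_M": lambda r: r["HEIGHT_CM"] / 100.0,
        # Body mass index, rounded to one decimal place.
        "BMI": lambda r: round(r["WEIGHT_KG"] / (r["HEIGHT_CM"] / 100.0) ** 2, 1),
    },
)

result = list(view.rows())
for row in result:
    print(row)
```

Because the derived values are computed when the view is read, changing a derivation does not require re-importing the source data.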

Privacy, Confidentiality and Security

Opal provides a state-of-the-art software infrastructure for data encryption, participant identifiers management and user authentication/authorization. Main features are:

  • Public Key Infrastructure (PKI) allowing Opal to manage public-private key pairs for encrypting and decrypting data,
  • Authentication using either certificates, username/password or token mechanisms,
  • Integration with any OpenID Connect provider,
  • Advanced participant identifiers manager enabling multiple identifiers per participant,
  • Distinct and highly secure database for storing participant identifiers,
  • Granular permission management down to the variable level,
  • REST web services using HTTPS protocol.
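
The idea of keeping several identifiers per participant and swapping them at export time can be sketched like this. It is a conceptual illustration in Python, not Opal's identifiers database: the mapping names, the identifiers, and the choice to skip participants without a mapping are all assumptions.

```python
from typing import Dict, List

# Hypothetical identifier mappings: one internal identifier per participant,
# with a distinct shared identifier for each study it is exported to.
ID_MAPPINGS: Dict[str, Dict[str, str]] = {
    "SYS-001": {"study_a": "A-9134", "study_b": "B-0042"},
    "SYS-002": {"study_a": "A-2277"},
}


def export_rows(rows: List[dict], mapping_name: str) -> List[dict]:
    """Replace internal identifiers with the ones registered for `mapping_name`."""
    exported = []
    for row in rows:
        shared_id = ID_MAPPINGS.get(row["id"], {}).get(mapping_name)
        if shared_id is None:
            # Assumption: participants without a mapping are left out
            # rather than exported under their internal identifier.
            continue
        exported.append({**row, "id": shared_id})
    return exported


rows = [{"id": "SYS-001", "age": 54}, {"id": "SYS-002", "age": 61}]
print(export_rows(rows, "study_a"))
print(export_rows(rows, "study_b"))
```

Keeping the mapping table apart from the data tables means that exported data carry no internal identifier at all.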

Opal File System

Study operations involve file management and exchanges. Opal comes with its own file system to facilitate these processes. Main features are:

  • Centralized file management,
  • SFTP access.

Genotypes

Genotyping data can be stored in Opal as VCF files (Variant Call Format). This functionality is available as a plugin. Main features are:

  • Support of VCF and BCF formats,
  • Basic statistics,
  • Sample-participant mapping,
  • Extraction of VCF files combined with phenotype criteria.

R Interface

Opal includes a module enabling statistical data analysis using R. Main features are:

  • R server monitoring from Opal,
  • Secured data access from R,
  • Opal R package (opalr),
  • DataSHIELD R packages,
  • Import R datasets into Opal,
  • Export Opal datasets into R,
  • Opal file management from R,
  • Saving and restoring of R server workspaces.

SQL API

Opal's tables can be queried with SQL:

  • Execute SQL from the web interface and download the SQL output,
  • Execute SQL from R with the opalr package,
  • Execute SQL from the Python client with its sql command.
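
The kind of query this enables (selecting, filtering, grouping and joining) is shown below against an in-memory SQLite database, used purely as a stand-in. The table names, columns and values are hypothetical, and the SQL dialect and table naming accepted by an Opal server may differ.

```python
import sqlite3

# In-memory stand-in for two tables sharing a participant identifier.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE participants (id TEXT PRIMARY KEY, sex TEXT);
    CREATE TABLE measures (id TEXT, bmi REAL);
    INSERT INTO participants VALUES ('P1', 'F'), ('P2', 'M'), ('P3', 'F');
    INSERT INTO measures VALUES ('P1', 22.0), ('P2', 31.0), ('P3', 26.0);
    """
)

# Join the tables, filter on a measure, then group and aggregate by sex.
query = """
    SELECT p.sex, COUNT(*) AS n, AVG(m.bmi) AS mean_bmi
    FROM participants AS p
    JOIN measures AS m ON m.id = p.id
    WHERE m.bmi > 20
    GROUP BY p.sex
    ORDER BY p.sex
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```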

Reporting

Opal leverages R's advanced graphics and statistical capabilities by allowing reports to be designed in R Markdown format. Main features are:

  • Scheduled execution (with email notifications),
  • Advanced statistical analysis,
  • Advanced graphics,
  • Secured data access,
  • RStudio IDE can be used for designing reports.

Indexing

Opal automatically indexes imported data in its embedded search engine (Elasticsearch). This allows very fast retrieval and complex querying of the data. Main features are:

  • Real-time data dictionary search capability,
  • Real-time data faceted search capability,
  • Contingency tables.

Web Services (API)

Opal is built on REST web services: everything is accessible through a URL. Any client that can make an HTTPS request can be a client of an Opal server. Main features are:

  • The resources can be obtained in JSON or binary form (Protobuf),
  • Client authentication can be done by providing username/password credentials or a token, or by establishing two-way SSL authentication,
  • Clients are already available in JavaScript, R, Python and PHP.
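
As a rough illustration of the point that any HTTPS-capable client can talk to an Opal server, the sketch below prepares (but does not send) an authenticated JSON request using only the Python standard library. The server URL, the `/ws/datasources` path and the credentials are placeholders assumed for the example, and HTTP Basic is used as a generic stand-in for username/password authentication; the header scheme and endpoints a given server expects should be checked against its REST documentation.

```python
import base64
import urllib.request


def build_opal_request(base_url, path, username, password):
    """Prepare a GET request carrying Basic credentials and asking for JSON."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        headers={
            "Authorization": "Basic " + token,  # assumed scheme, see lead-in
            "Accept": "application/json",       # JSON rather than Protobuf
        },
        method="GET",
    )


# Hypothetical server, endpoint and credentials, for illustration only.
request = build_opal_request(
    "https://opal.example.org", "/ws/datasources", "analyst", "secret"
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable without a server and makes the URL and headers easy to inspect.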
Programming languages
  • Java 70%
  • Vue 20%
  • JavaScript 4%
  • TypeScript 4%
  • Other 2%

Participating organisations

Digital Research Alliance of Canada
Canarie
Research Institute of the McGill University Health Center

Contributors

Contact person

Isabel Fortier, Principal Investigator, Research Institute of the McGill University

  • Isabel Fortier, Principal Investigator, Research Institute of the McGill University
  • Sofiya Koleva, Maelstrom Research
  • Carsten-Oliver Schmidt, University of Greifswald
  • Jordi Rambla De Argila, Centre for Genomic Regulation
  • Kari Kuulasmaa, National Institute for Health and Welfare
  • Ari Haukijärvi, National Institute for Health and Welfare
  • Teemu Niiranen, National Institute for Health and Welfare

Related software

OBiBa Agate

Agate is a web application that offers user-related services to the OBiBa software stack.

OBiBa Mica

Mica is OBiBa's software application that is used to create data web portals for large-scale epidemiological studies or multiple-study consortia.
