sv-channels

Genome-wide detection of structural variants using deep learning

4 mentions, 4 contributors

Cite this software

DOI: 10.5281/zenodo.4584796

What sv-channels can do for you

  • structural variant (SV) caller for short-read alignments (BAM files) based on one-dimensional convolutional neural networks (1D CNNs)
  • supports detection of major SV types: deletions (DEL), insertions (INS), inversions (INV), tandem duplications (DUP) and inter-chromosomal translocations (CTX)
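"One-dimensional" here means that the convolution kernel slides only along the genomic position axis, while each kernel spans all channels at once. The NumPy sketch below of a single convolutional layer illustrates this; the function name `conv1d_valid`, the kernel width, the number of filters, the unpadded ("valid") mode and the ReLU are all illustrative assumptions, not the sv-channels model itself.

```python
import numpy as np

def conv1d_valid(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical single 1D convolutional layer (stride 1, no padding) + ReLU.

    x:       [length, in_channels]          positions x channels
    kernels: [width, in_channels, filters]  each kernel covers all channels
    bias:    [filters]
    returns: [length - width + 1, filters]
    As in deep-learning libraries, this is technically a cross-correlation.
    """
    width, in_channels, filters = kernels.shape
    if x.shape[1] != in_channels:
        raise ValueError("channel mismatch between input and kernels")
    out_len = x.shape[0] - width + 1
    out = np.empty((out_len, filters))
    for i in range(out_len):
        patch = x[i:i + width]  # [width, in_channels], slides along positions only
        # contract over both the kernel-width axis and the channel axis
        out[i] = np.tensordot(patch, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 79))      # one 200 bp window with 79 channels
k = rng.normal(size=(7, 79, 32))    # 32 kernels of width 7 (arbitrary choice)
y = conv1d_valid(x, k, np.zeros(32))
print(y.shape)  # (194, 32)
```

Each output row summarises a short stretch of positions across every channel, which is why the input is laid out as [positions, channels].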

The workflow includes the following key steps:

  1. Transform read alignments into channels
    First, split read positions are extracted from the BAM files as candidate regions for SV breakpoints. For each of the two split read positions in a pair (the rightmost position of the first split part and the leftmost position of the second split part), a 2D NumPy array called a window is constructed. The shape of a window is [window_size, number_of_channels]: the genomic interval covered by the window is centered on the split read position with a context of [-100 bp, +100 bp) for a window_size of 200 bp. From all reads overlapping this genomic interval and from the corresponding subsequence of the reference sequence, 79 channels (number_of_channels) are constructed, each encoding a signal that can be used for SV calling. The list of channels can be found here. The two windows are joined into linked-windows with a zero-padding 2D array of shape [10, number_of_channels] in between, to avoid artifacts from the CNN kernel at the interface between the two windows. Linked-windows are labelled SV when the split read positions overlap an SV in the callset used as ground truth, and noSV otherwise, where SV is one of DEL, INS, INV, DUP or CTX according to the SV type.

  2. Model training
    The labelled linked-windows are used to train a 1D CNN to classify them as either SV or noSV. Two cross-validation strategies are supported: 10-fold cross-validation, and cross-validation by chromosome, where one chromosome is used as the test set and the remaining chromosomes as the training set.

  3. SV calling with a trained model
    Once a model has been trained and the BAM file of the test set has been converted into linked-windows, SV calling is performed with the predict.py script.
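The window and linked-window layout of step 1 can be sketched with NumPy. This is a minimal sketch under stated assumptions: channel values are sliced from a precomputed per-position array (`channel_track`, hypothetical), whereas sv-channels derives them from the reads and reference sequence of each interval, and positions falling outside the array are zero-filled (also an assumption). Passing one array per breakpoint allows the two windows to come from different chromosomes, as for CTX.

```python
import numpy as np

WINDOW_SIZE = 200  # bp; context of [-100, +100) around the split read position
N_CHANNELS = 79    # number_of_channels
PAD_ROWS = 10      # zero-padding rows between the two windows

def make_window(channel_track: np.ndarray, pos: int, window_size: int = WINDOW_SIZE) -> np.ndarray:
    """Slice a [window_size, n_channels] window covering [pos - 100, pos + 100).

    channel_track is a hypothetical [chromosome_length, n_channels] array;
    rows outside it are left as zeros.
    """
    half = window_size // 2
    start, end = pos - half, pos + half
    window = np.zeros((window_size, channel_track.shape[1]), dtype=channel_track.dtype)
    lo, hi = max(start, 0), min(end, channel_track.shape[0])
    if lo < hi:
        window[lo - start:hi - start] = channel_track[lo:hi]
    return window

def make_linked_window(track1: np.ndarray, pos1: int,
                       track2: np.ndarray, pos2: int,
                       pad_rows: int = PAD_ROWS) -> np.ndarray:
    """Stack window 1, a [pad_rows, n_channels] block of zeros, and window 2."""
    w1 = make_window(track1, pos1)
    w2 = make_window(track2, pos2)
    pad = np.zeros((pad_rows, w1.shape[1]), dtype=w1.dtype)
    return np.concatenate([w1, pad, w2], axis=0)

rng = np.random.default_rng(1)
chr1 = rng.random((10_000, N_CHANNELS))  # stand-in channel values
linked = make_linked_window(chr1, 2_500, chr1, 7_300)
print(linked.shape)  # (410, 79)
```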
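One way to see what the 10-row zero padding buys: in a single convolutional layer with stride 1 and no padding, a kernel can only combine signal from both windows if it is wide enough to bridge the gap. The pure-Python check below (the function name `mixing_offsets` is hypothetical; the window size of 200 and the 10 padding rows come from the description above) lists the kernel offsets whose receptive field straddles both windows.

```python
from typing import List

def mixing_offsets(kernel_width: int, window_size: int = 200, pad_rows: int = 10) -> List[int]:
    """Offsets i of a stride-1, unpadded 1D convolution whose receptive field
    (rows i .. i + kernel_width - 1) contains rows from both windows."""
    length = 2 * window_size + pad_rows    # rows in a linked-window
    second_start = window_size + pad_rows  # first row of the second window
    offsets = []
    for i in range(length - kernel_width + 1):
        first_row, last_row = i, i + kernel_width - 1
        if first_row < window_size and last_row >= second_start:
            offsets.append(i)
    return offsets

for width in (3, 11, 12, 15):
    print(width, mixing_offsets(width))
# 3 []
# 11 []
# 12 [199]
# 15 [196, 197, 198, 199]
```

With 10 padding rows, kernels up to 11 positions wide never mix the two windows. This only holds for the first layer: stacked layers and pooling enlarge the receptive field, so deeper layers can still relate the two breakpoints.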
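Labelling can be sketched in the same spirit. The matching rule below is an assumption, not the tool's documented behaviour: a linked-window takes the type of a ground-truth SV when both split read positions lie on the matching chromosomes and within a hypothetical `tolerance` (in bp) of the SV's two breakpoints; otherwise it is labelled noSV. `TruthSV` and `label_linked_window` are illustrative names.

```python
from typing import NamedTuple, Sequence

class TruthSV(NamedTuple):
    """One SV from the ground-truth callset, described by its two breakpoints."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    svtype: str  # DEL, INS, INV, DUP or CTX

def label_linked_window(chrom1: str, pos1: int, chrom2: str, pos2: int,
                        truth: Sequence[TruthSV], tolerance: int = 100) -> str:
    """Return the SV type of the first matching truth SV, else "noSV".

    Hypothetical rule: pos1 must be within `tolerance` bp of sv.pos1 and
    pos2 within `tolerance` bp of sv.pos2, on the same chromosomes.
    """
    for sv in truth:
        if (chrom1 == sv.chrom1 and chrom2 == sv.chrom2
                and abs(pos1 - sv.pos1) <= tolerance
                and abs(pos2 - sv.pos2) <= tolerance):
            return sv.svtype
    return "noSV"

truth = [TruthSV("1", 10_000, "1", 12_500, "DEL"),
         TruthSV("2", 5_000, "7", 90_000, "CTX")]
print(label_linked_window("1", 10_030, "1", 12_480, truth))  # DEL
print(label_linked_window("2", 5_010, "7", 89_950, truth))   # CTX
print(label_linked_window("1", 50_000, "1", 51_000, truth))  # noSV
```

A linear scan keeps the sketch short; for a genome-wide callset an interval index per chromosome would be the natural replacement.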
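The two cross-validation strategies of step 2 differ only in how linked-window indices are split into training and test sets. The pure-Python sketch below produces 10-fold splits and leave-one-chromosome-out splits; the function names, the shuffling and the seed are illustrative choices, and libraries such as scikit-learn provide equivalent splitters (`KFold`, `LeaveOneGroupOut`).

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple

def by_chromosome_splits(chroms: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Leave-one-chromosome-out: yields (test_chrom, train_idx, test_idx)."""
    idx_by_chrom = defaultdict(list)
    for i, c in enumerate(chroms):
        idx_by_chrom[c].append(i)
    for test_chrom in sorted(idx_by_chrom):
        test_idx = idx_by_chrom[test_chrom]
        train_idx = [i for c, ids in idx_by_chrom.items() if c != test_chrom for i in ids]
        yield test_chrom, sorted(train_idx), test_idx

def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Shuffle n indices once, deal them into k folds, use each fold once as test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx

chroms = ["1", "1", "2", "X", "2", "X", "1"]  # chromosome of each linked-window
for test_chrom, train_idx, test_idx in by_chromosome_splits(chroms):
    print(test_chrom, train_idx, test_idx)
# 1 [2, 3, 4, 5] [0, 1, 6]
# 2 [0, 1, 3, 5, 6] [2, 4]
# X [0, 1, 2, 4, 6] [3, 5]
```

Splitting by chromosome keeps all linked-windows of one chromosome out of training, which avoids nearby, correlated windows ending up on both sides of the split.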

Programming language
  • Jupyter Notebook 45%
  • Python 43%
  • R 9%
  • Shell 3%
License
  • Apache-2.0

Participating organisations

Netherlands eScience Center
University Medical Center Utrecht

Contributors

Contact person

Luca Santuari

University Medical Center Utrecht
Arnold Kuzniar
Netherlands eScience Center
Carl Shneider
University Medical Center Utrecht
Luca Santuari
University Medical Center Utrecht
Sonja Georgievska
Netherlands eScience Center

Related projects

Googling the cancer genome

Identification and prioritization of cancer-causing structural variations in whole genomes

Finished

Related tools

sv-callers

Highly portable parallel workflow to detect structural variants in cancer genomes.

sv-gen

Highly portable parallel workflow to generate artificial genomes with structural variants.

Xenon command line interface

A command line interface for the Xenon library that allows you to use remote machines to do your computations.
