sv-channels

Genome-wide detection of structural variants using deep learning

4 mentions
4 contributors
474 commits · Last commit ≈ 23 months ago · 37 stars · 5 forks

What sv-channels can do for you

  • Calls structural variants (SVs) from short-read alignments (BAM files) using one-dimensional convolutional neural networks (1D CNNs)
  • Supports detection of the major SV types: deletions (DEL), insertions (INS), inversions (INV), tandem duplications (DUP) and inter-chromosomal translocations (CTX)

The workflow includes the following key steps:

  1. Transform read alignments into channels
    First, split read positions are extracted from the BAM files as candidate SV breakpoints. For each pair of split read positions (the rightmost position of the first split part and the leftmost position of the second split part), a 2D NumPy array called a window is constructed around each of the two positions. A window has shape [window_size, number_of_channels]; its genomic interval is centered on the split read position, with a context of [-100 bp, +100 bp) for a window_size of 200 bp. From all the reads overlapping this interval, and from the corresponding subsequence of the reference sequence, 79 channels (number_of_channels) are constructed, each encoding a signal that can be used for SV calling. The list of channels can be found here.

    The two windows are then joined into linked-windows, with a zero-padding 2D array of shape [10, number_of_channels] in between to keep the CNN kernels from producing artifacts at the interface between the two windows. A linked-window is labelled with its SV type (DEL, INS, INV, DUP or CTX) when its split read positions overlap an SV in the ground-truth callset, and as noSV otherwise.

  2. Model training
    The labelled linked-windows are used to train a 1D CNN that classifies each of them as one of the SV types or as noSV. Two cross-validation strategies are available: 10-fold cross-validation, and cross-validation by chromosome, in which one chromosome is used as the test set and the remaining chromosomes as the training set.

  3. SV calling with a trained model
    Once a model has been trained and the BAM file of the test set has been converted into linked-windows, SV calling is performed with the predict.py script.
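The window construction in step 1 can be sketched as follows. This is a minimal illustration, not the sv-channels implementation: reads are reduced to hypothetical half-open (start, end) intervals, `coverage_channel` is a made-up example of one channel, and the same function is reused for all 79 columns purely as a placeholder. Only the shapes (window_size = 200, a [10, number_of_channels] zero block, 79 channels) come from the description above.

```python
import numpy as np

WINDOW_SIZE = 200  # window_size: context of [-100 bp, +100 bp) around a position
PADDING = 10       # zero rows inserted between the two windows
N_CHANNELS = 79    # number_of_channels


def window_interval(pos, window_size=WINDOW_SIZE):
    """Half-open genomic interval [start, end) centred on a split read position."""
    half = window_size // 2
    return pos - half, pos + half


def coverage_channel(reads, pos, window_size=WINDOW_SIZE):
    """Hypothetical example channel: per-base read coverage inside the window.

    `reads` is a list of half-open (start, end) alignment intervals.
    """
    start, end = window_interval(pos, window_size)
    cov = np.zeros(window_size, dtype=np.float32)
    for r_start, r_end in reads:
        # Clip the read to the window and convert to window-relative offsets.
        lo = max(r_start, start) - start
        hi = min(r_end, end) - start
        if lo < hi:
            cov[lo:hi] += 1
    return cov


def make_window(channel_fns, reads, pos):
    """Stack one column per channel into a [window_size, number_of_channels] array."""
    return np.stack([fn(reads, pos) for fn in channel_fns], axis=1)


def link_windows(win1, win2, padding=PADDING):
    """Join two windows with a zero block of shape [padding, number_of_channels]."""
    zeros = np.zeros((padding, win1.shape[1]), dtype=win1.dtype)
    return np.concatenate([win1, zeros, win2], axis=0)


if __name__ == "__main__":
    reads = [(900, 1050), (950, 1100)]
    # Placeholder: reuse the coverage channel for all 79 columns.
    channel_fns = [coverage_channel] * N_CHANNELS
    w1 = make_window(channel_fns, reads, 1000)
    w2 = make_window(channel_fns, reads, 5000)
    linked = link_windows(w1, w2)
    print(linked.shape)  # (410, 79)
```

A linked-window therefore has 200 + 10 + 200 = 410 rows. In the real tool the channels are computed from the BAM alignments and the reference sequence rather than from toy intervals.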
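One way to read the 10-row zero padding in step 1 is through the receptive field of a convolution: a kernel of size k starting at row j covers rows j to j + k - 1, so with 10 zero rows between the windows no kernel of size 11 or less can touch both windows at once. The sketch below checks this numerically with `numpy.convolve`; the function name and the kernel sizes tried are illustrative assumptions, and only a single convolution layer is modelled.

```python
import numpy as np


def windows_mixed(window_size=200, padding=10, kernel_size=11):
    """Return True if any 'valid' 1D convolution output sees rows of both windows.

    Only a single convolution layer is modelled here.
    """
    length = 2 * window_size + padding
    in_first = np.zeros(length)
    in_first[:window_size] = 1                # rows belonging to the first window
    in_second = np.zeros(length)
    in_second[window_size + padding:] = 1     # rows belonging to the second window
    kernel = np.ones(kernel_size)
    # An output position "sees" a window if its kernel overlaps any of its rows.
    sees_first = np.convolve(in_first, kernel, mode="valid") > 0
    sees_second = np.convolve(in_second, kernel, mode="valid") > 0
    return bool(np.any(sees_first & sees_second))


for k in (3, 11, 12):
    print(k, windows_mixed(kernel_size=k))
# 3 False
# 11 False
# 12 True
```

The guarantee holds for one layer only: stacking convolution layers enlarges the receptive field, so deeper layers can still combine information from both windows.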
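The labelling rule in step 1 (SV type when the split read positions overlap the ground truth, noSV otherwise) might look like this. The text does not define "overlap" precisely, so this sketch assumes that each split read position must lie within a hypothetical `tolerance` of the corresponding truth breakpoint; the record layout and the integer class order in `CLASSES` are also assumptions.

```python
SV_TYPES = ("DEL", "INS", "INV", "DUP", "CTX")
CLASSES = ("noSV",) + SV_TYPES  # hypothetical class order for integer encoding


def label_linked_window(chrom1, pos1, chrom2, pos2, truth, tolerance=50):
    """Return the SV type of the first truth record both positions match, else 'noSV'.

    `truth` holds hypothetical (chrom1, pos1, chrom2, pos2, svtype) records;
    `tolerance` (bp) is an assumed matching distance, not a documented value.
    """
    for t_chrom1, t_pos1, t_chrom2, t_pos2, svtype in truth:
        if (chrom1 == t_chrom1 and chrom2 == t_chrom2
                and abs(pos1 - t_pos1) <= tolerance
                and abs(pos2 - t_pos2) <= tolerance):
            return svtype
    return "noSV"


def encode(label):
    """Map a label to an integer class index for training."""
    return CLASSES.index(label)


truth = [("1", 1000, "1", 5000, "DEL"), ("1", 20000, "2", 300, "CTX")]
print(label_linked_window("1", 1010, "1", 4990, truth))   # DEL
print(label_linked_window("1", 20000, "2", 360, truth))   # noSV
```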
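The two cross-validation strategies in step 2 differ only in how linked-windows are split into training and test sets. A minimal sketch, with illustrative function names: `kfold_splits` shuffles indices and assigns them round-robin to 10 folds, and `chromosome_splits` holds out one chromosome at a time. Assigning a CTX linked-window, which spans two chromosomes, to the chromosome of its first split read position is an assumption made here.

```python
import random


def kfold_splits(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n windows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment after shuffling
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def chromosome_splits(chroms):
    """Yield (held_out, train, test): one chromosome as test set, the rest as training.

    `chroms[i]` is the chromosome assigned to linked-window i (assumed to be the
    chromosome of the first split read position).
    """
    for held_out in sorted(set(chroms)):
        test = [i for i, c in enumerate(chroms) if c == held_out]
        train = [i for i, c in enumerate(chroms) if c != held_out]
        yield held_out, train, test


chroms = ["1", "1", "2", "X", "2"]
for held_out, train, test in chromosome_splits(chroms):
    print(held_out, train, test)
# 1 [2, 3, 4] [0, 1]
# 2 [0, 1, 3] [2, 4]
# X [0, 1, 2, 4] [3]
```

Splitting by chromosome keeps all windows from a held-out chromosome unseen during training, which gives a stricter estimate than random folds when nearby windows are correlated.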
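Step 3 ends with turning model outputs into SV calls. The interface of predict.py is not described here, so the following is only a hypothetical sketch of that last stage: it assumes the model returns one probability per class for each linked-window, in the assumed order given by `CLASSES`, and reports the most likely class when it is an SV type and exceeds an illustrative `min_prob` threshold.

```python
CLASSES = ("noSV", "DEL", "INS", "INV", "DUP", "CTX")  # assumed model output order


def call_svs(windows, probabilities, min_prob=0.5):
    """Turn per-window class probabilities into SV calls.

    windows: (chrom1, pos1, chrom2, pos2) split read positions per linked-window.
    probabilities: one sequence of len(CLASSES) scores per linked-window.
    min_prob: hypothetical confidence threshold.
    """
    calls = []
    for (chrom1, pos1, chrom2, pos2), probs in zip(windows, probabilities):
        best = max(range(len(CLASSES)), key=lambda i: probs[i])
        label = CLASSES[best]
        if label != "noSV" and probs[best] >= min_prob:
            calls.append((chrom1, pos1, chrom2, pos2, label, probs[best]))
    return calls


windows = [("1", 1000, "1", 5000), ("1", 7000, "1", 7100), ("2", 10, "3", 20)]
probabilities = [
    [0.10, 0.80, 0.02, 0.03, 0.03, 0.02],  # confident DEL
    [0.90, 0.05, 0.01, 0.01, 0.02, 0.01],  # noSV
    [0.30, 0.10, 0.10, 0.10, 0.00, 0.40],  # CTX, but below the threshold
]
print(call_svs(windows, probabilities))  # [('1', 1000, '1', 5000, 'DEL', 0.8)]
```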

Programming languages
  • Python 61%
  • Jupyter Notebook 24%
  • R 13%
  • Shell 2%

Participating organisations

Life Sciences
Netherlands eScience Center
University Medical Center Utrecht

Contributors

Arnold Kuzniar
Carl Shneider
Luca Santuari
Sonja Georgievska

Related projects

Googling the cancer genome

Identification and prioritization of cancer-causing structural variations in whole genomes

Updated 1 month ago
Finished

Related software

sv-callers

Highly portable parallel workflow to detect structural variants in cancer genomes.

Updated 15 months ago

sv-gen

Highly portable parallel workflow to generate artificial genomes with structural variants.

Updated 30 months ago

Xenon command line interface

A command line interface for the Xenon library that allows you to use remote machines to do your computations.

Updated 15 months ago