bioinformatics / PanTools / Merge requests / !147

Add (sub)graph export (AKA region of interest)

Workum, Dirk-Jan van requested to merge add_gfa_export into develop Mar 29, 2023


NB: Still under active development (this branch is a small side project of mine).

This merge request adds a subcommand to export the compacted de Bruijn graph (cDBG) from PanTools in GFA format (and possibly other formats?).

TODO:

- Check accuracy of GFA v1 output
- Whole pangenome export
  - Decide on subcommand name
  - Decide which output formats should be supported (only GFA, which is slow)
  - Check speed on large pangenomes
- ~~Add subcommand for building nucleotide layer from existing graph (GFA v1 format)~~
  - => edit: to be done with !198
- Add subcommand for extracting a subgraph in GFA format, including annotations for Bandage
  - Get separate subcommand for regions only
  - Define outputs for region (see below for implementation status)
- Write all output formats
  - GFA v1
  - Include Bandage annotation CSV for outputs
  - FASTA for each genome
  - GFF3 for each genome
  - PAV for each homology group
  - PAV for each kmer/node
  - Collinearity file (/visualization)
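For reference while checking the GFA v1 output, here is a minimal Python sketch of the record types involved (header, segments, links, paths). It is not the PanTools implementation: the function name `write_gfa1`, the input structures, and the toy sequences are all hypothetical, and the overlap is assumed to be k-1 bases (here `2M` for k=3), as is usual for a cDBG.

```python
def write_gfa1(segments, links, paths):
    """Return GFA v1 text for the given segments, links, and paths.

    segments: dict mapping segment name -> nucleotide sequence
    links:    iterable of (from, from_orient, to, to_orient, overlap_cigar)
    paths:    dict mapping path name -> list of oriented steps, e.g. ["1+", "2+"]
    """
    lines = ["H\tVN:Z:1.0"]  # header record with the GFA version tag
    for name, seq in segments.items():
        lines.append(f"S\t{name}\t{seq}")  # one segment per graph node
    for src, src_orient, dst, dst_orient, overlap in links:
        lines.append(f"L\t{src}\t{src_orient}\t{dst}\t{dst_orient}\t{overlap}")
    for name, steps in paths.items():
        # "*" leaves the per-step overlaps unspecified
        lines.append(f"P\t{name}\t{','.join(steps)}\t*")
    return "\n".join(lines) + "\n"


# Toy graph (hypothetical data): two nodes overlapping by k-1 = 2 bases.
segments = {"1": "ACGT", "2": "GTTA"}
links = [("1", "+", "2", "+", "2M")]
paths = {"genome_1": ["1+", "2+"]}
gfa_text = write_gfa1(segments, links, paths)
```

Representing each genome as a `P` record keeps the per-genome walk through the graph in the same file, which is one possible way to let the FASTA output be cross-checked against the graph.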

TODO after commit c565cb45, in which the 'novel' algorithm (a combination of the kmer and alignment approaches) has been implemented and tested:

- Add parameter for the minimal number of kmers in a block for the 'novel' algorithm
- Make the 'novel' algorithm the default and rename it to a more sensible name
- Remove other algorithms
- Create homology-based search using the 'novel' algorithm
- Use simple (NJ?) clustering on kmer PAV for ordering the output
- Add new parameter --flanking to add additional flanking sequence after the ROI finding algorithm
- Clean up unused code
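To make the idea of ordering the output by kmer PAV concrete, here is a rough Python sketch. It does not implement neighbour joining: as a simpler stand-in it uses a greedy nearest-neighbour chain on Jaccard distance between presence sets. The function names, genome names, and node identifiers are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of present kmers/nodes."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def order_by_pav(pav):
    """Greedy nearest-neighbour ordering of genomes by PAV similarity.

    pav: dict mapping genome name -> set of kmer/node ids present in it.
    Starts from the alphabetically first genome and repeatedly appends
    the remaining genome closest to the last one placed; ties are
    broken by genome name so the result is deterministic.
    """
    remaining = sorted(pav)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        last = order[-1]
        nearest = min(
            remaining,
            key=lambda g: (jaccard_distance(pav[last], pav[g]), g),
        )
        remaining.remove(nearest)
        order.append(nearest)
    return order


# Toy PAV table (hypothetical data).
pav = {
    "genome_1": {"n1", "n2", "n3"},
    "genome_2": {"n7", "n8"},
    "genome_3": {"n1", "n2", "n4"},
}
ordering = order_by_pav(pav)
```

With this toy table, `genome_3` is placed next to `genome_1` because they share two of four nodes, while `genome_2` shares none and ends up last. A real NJ tree would give a hierarchical order instead of a chain, so this is only meant to illustrate the input and output shapes.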

Edited Apr 19, 2024 by Workum, Dirk-Jan van
