Add (sub)graph export (AKA region of interest)
NB: Still under active development (this branch is a small side project of mine).
This merge request will add a subcommand to retrieve the cDBG from PanTools in GFA format (and others?).
TODO:
- Check accuracy of GFA v1 output
- Whole pangenome export
- Decide on subcommand name
- Decide which output formats should be supported (only GFA, which is slow)
- Check speed on large pangenomes
- Add subcommand for building nucleotide layer from existing graph (GFA v1 format) => edit: to be done with !198
- Add subcommand for extracting a subgraph in GFA format, including annotations for Bandage
- Get separate subcommand for regions only
- Define outputs for region (see below for implementation status)
- Write all output formats
  - GFAv1
  - Include Bandage annotation CSV for outputs
  - FASTA for each genome
  - GFF3 for each genome
  - PAV for each homology group
  - PAV for each kmer/node
  - Collinearity file (/visualization)
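
For the "Check accuracy of GFA v1 output" item, a minimal sketch of what a GFA v1 export and a basic consistency check could look like. This is not the PanTools implementation: `write_gfa`, `check_gfa`, and the toy graph are hypothetical names and data, and the sketch assumes that adjacent cDBG nodes overlap by k-1 bases.

```python
def write_gfa(nodes, edges, k):
    """Return GFA v1 text for a toy compacted de Bruijn graph.

    nodes: dict mapping node id -> sequence
    edges: list of (from_id, from_orient, to_id, to_orient)
    k:     k-mer size; ASSUMPTION: linked nodes overlap by k-1 bases
    """
    lines = ["H\tVN:Z:1.0"]  # header record with the GFA version tag
    for node_id, seq in sorted(nodes.items()):
        lines.append(f"S\t{node_id}\t{seq}")  # one segment record per node
    overlap = f"{k - 1}M"  # overlap expressed as a CIGAR string
    for src, src_orient, dst, dst_orient in edges:
        lines.append(f"L\t{src}\t{src_orient}\t{dst}\t{dst_orient}\t{overlap}")
    return "\n".join(lines) + "\n"


def check_gfa(text):
    """Report every link endpoint that has no matching segment record."""
    records = [line.split("\t") for line in text.splitlines() if line]
    segments = {rec[1] for rec in records if rec[0] == "S"}
    problems = []
    for rec in records:
        if rec[0] == "L":
            for name in (rec[1], rec[3]):
                if name not in segments:
                    problems.append(f"link references unknown segment {name}")
    return problems


# Toy graph with k = 4: both nodes share the 3-base overlap "GTA".
gfa = write_gfa({"1": "ACGTA", "2": "GTACC"}, [("1", "+", "2", "+")], k=4)
print(gfa, end="")
print(check_gfa(gfa))
```

A check like this only catches dangling links; verifying that the overlapping bases really match would need an additional pass over the sequences.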

TODO after commit c565cb45 (where the 'novel' algorithm, which is a combination of kmer and alignment, has been implemented and tested):
- Add parameter for minimal number of kmers in a block for the 'novel' algorithm
- Make 'novel' algorithm default and rename to more sensible name
- Remove other algorithms
- Create homology based search using the 'novel' algorithm
- Use simple (NJ?) clustering on kmer PAV for ordering the output
- Add new parameter --flanking to add additional flanking sequence after the ROI finding algorithm
- Clean up unused code
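
For the proposed `--flanking` parameter, a rough sketch of how a found region could be extended. The function name `add_flanking`, the 1-based inclusive coordinates, and the choice to extend both sides equally are assumptions for illustration, not decisions made in this merge request.

```python
def add_flanking(start, end, flanking, seq_length):
    """Extend a region by `flanking` bases on each side.

    ASSUMPTION: coordinates are 1-based and inclusive, and the
    extended region is clamped to the bounds of the sequence.
    """
    if flanking < 0:
        raise ValueError("flanking must be non-negative")
    if not 1 <= start <= end <= seq_length:
        raise ValueError("region must lie within the sequence")
    new_start = max(1, start - flanking)
    new_end = min(seq_length, end + flanking)
    return new_start, new_end


# A region in the middle of the sequence grows on both sides.
print(add_flanking(100, 200, 50, 1000))
# A region near the ends is clamped to the sequence bounds.
print(add_flanking(10, 990, 50, 1000))
```

Clamping means a region close to a sequence end gets less flanking sequence than requested; whether that should produce a warning is an open choice.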
Edited by Workum, Dirk-Jan van