Handle co-features in annotation files for add_annotations

Workum, Dirk-Jan van requested to merge fix-transspliced-genes-annotation into master

GFF3 files are notoriously difficult to parse, which is why we switched to htsjdk for parsing them. However, it turned out that co-features, which we did not handle previously, occur frequently in organellar genomes in the form of trans-spliced genes. Therefore, this merge request adds code to handle these co-features:

  1. When creating all feature nodes in the pangenome database, it adds all co-features to a list.
  2. After creating all feature nodes but before creating protein sequences, we merge each set of co-features into one feature with an updated address that contains all locations (needed for later sequence extraction).
  3. Per co-feature, we keep only one such node and connect the children (CDS and exon) of all other (to be deleted) co-features to it. Also, we create one mRNA for them.

The only case not yet handled is a co-feature (gene) that has multiple mRNAs. This could occur when a trans-spliced gene undergoes alternative splicing.