PanTools merge requests
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests
Updated: 2023-03-23T11:52:05Z

!143 Moved from general constraint validators for msa to a specific msa validator
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/143
Author: Robin van Esch · Updated: 2023-03-23T11:52:05Z

Replaced the general constraint validators for `msa` with a specific, more readable validator, after running into issues when changing user options in MSA.java.

Assignee: Workum, Dirk-Jan van

!142 Change default input sequence type behavior for MSA
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/142
Author: Workum, Dirk-Jan van · Updated: 2023-08-15T14:34:34Z

In preparation for introducing variation in PanTools (!128), we decided to split the choice of default input sequences for `pantools msa` into boolean flags. This merge request therefore introduces the `--align-protein` and `--align-nucleotide` options to `pantools msa`, as well as to the subcommands that depend on MSA: `pantools core_phylogeny` and `pantools consensus_tree`.

Assignee: Workum, Dirk-Jan van

!141 refactored add_variants, add_pav, remove_variants, remove_pav and variation_overview
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/141
Author: Robin van Esch · Updated: 2023-03-17T12:31:20Z

Refactoring and optimization for variation functions
- [x] refactor add_variants
- [x] refactor add_pav
- [x] refactor remove_variants
- [x] refactor remove_pav
- [x] refactor variation_overview
- [x] localized database validation

Assignee: Workum, Dirk-Jan van

!140 Fix maven compile warnings
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/140
Author: Workum, Dirk-Jan van · Updated: 2023-03-16T15:54:59Z

When running `mvn compile`, two warnings are given before compilation starts:
```bash
% mvn compile
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for nl.wur.bif:pantools:jar:4.1.1
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.apache.maven.plugins:maven-compiler-plugin @ line 251, column 21
[WARNING] 'dependencies.dependency.scope' for org.junit:junit-bom:pom must be one of [provided, compile, runtime, test, system] but is 'import'. @ line 55, column 20
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
```
Here, I fix these by moving `junit-bom` to `dependencyManagement` and by making the `maven-compiler-plugin` declaration unique in the `pom.xml` file.

Assignee: Workum, Dirk-Jan van

!139 Fix automatic skipping of genomes in core phylogeny
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/139
Author: Jonkheer, Eef · Updated: 2023-03-11T12:08:38Z

Resolved an issue where `core_phylogeny` did not correctly take over the genomes skipped by `gene_classification`. This only occurred when the user did not include any homology groups.

!138 Reduce logging build pangenome
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/138
Author: Robin van Esch · Updated: 2023-03-15T09:32:40Z

Set two problematic log statements that were clogging up the log file to trace level.

Assignee: Jonkheer, Eef

!137 Moving busco_protein out of classification
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/137
Author: Jonkheer, Eef · Updated: 2023-03-11T12:08:04Z

**Classification.java** is an extremely large file because it contains code for many PanTools functions. The solution is to move that code into separate classes. In this merge request I moved the code for the `busco_protein` function. I only moved the code and renamed a few functions, so I don't expect these changes to affect the outcome of this function.
Major changes:
- Moved ~700 lines of `busco_protein` related code to a new class called **Busco.java**
- get_percentage_str() renamed to percentageAsString() and moved to **Utils.java**
- retrieveNamePropertyAsArray() moved to **Utils.java**
- retrieveNamePropertyAsString() moved to **Utils.java**
- get_phenotype_for_genome() moved to **Utils.java**

!136 fixing linking of mRNA node from non-standard GFF with functional annotation
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/136
Author: Pardeshi, Lakhansing · Updated: 2023-02-24T16:03:54Z

Code changes to solve issue #54.

Assignee: Workum, Dirk-Jan van

!135 Update conda environments
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/135
Author: Workum, Dirk-Jan van · Updated: 2023-02-23T10:07:32Z

I noticed that the current conda environment can cause trouble for people who have set their channel priority to strict. After some feedback and a bit of experimenting, I found that the conda YAML files in this merge request work for me. Please test them yourself; if they work for you as well, we can merge this.

Assignee: Workum, Dirk-Jan van

!134 Fix backwards compatibility MSA
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/134
Author: Workum, Dirk-Jan van · Updated: 2023-03-14T12:16:15Z

I was running `pantools msa` on an old pangenome and discovered that !60 introduced a small breaking change: the "name" and "locus_tag" properties of mRNA nodes are now String arrays instead of the single String they used to be.
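The breaking change can be handled by accepting both storage forms when a property is read. Below is a minimal sketch of that idea. It assumes that the property value arrives as a plain `Object`, which is what Neo4j's `getProperty` returns. `LegacyPropertyReader` and `asStringArray` are hypothetical names, and this is not the code from this merge request.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not the PanTools implementation): normalises an mRNA
 * property such as "name" or "locus_tag" that may be stored as a single String
 * (older pangenomes) or as a String[] (pangenomes built after !60).
 */
final class LegacyPropertyReader {

    private LegacyPropertyReader() {}

    /** Returns the property value as a String array, whatever its stored form. */
    static String[] asStringArray(Object value) {
        if (value == null) {
            return new String[0];                      // property absent on this node
        }
        if (value instanceof String[]) {
            return ((String[]) value).clone();         // new format: already an array
        }
        if (value instanceof String) {
            return new String[] {(String) value};      // old format: wrap the single value
        }
        throw new IllegalArgumentException(
                "Unexpected property type: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        // Example values standing in for what a graph database would return.
        System.out.println(Arrays.toString(asStringArray("mrna_1")));                        // [mrna_1]
        System.out.println(Arrays.toString(asStringArray(new String[] {"mrna_1", "alt"})));  // [mrna_1, alt]
    }
}
```

Returning an array in both cases lets downstream code such as `msa` treat old and new pangenomes the same way.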
This change made `msa` incompatible with older pangenomes; this merge request fixes that incompatibility.

Assignee: Workum, Dirk-Jan van

!133 Reformatted AnnotationLayer.addAnnotations
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/133
Author: Robin van Esch · Updated: 2023-04-25T12:08:02Z

- Reformatted AnnotationLayer.addAnnotations
- Split validation and graph interactions
- Provided better feedback for validation exceptions
- Added `--ignore-invalid-features` flag to `add_annotations`
- Removed original logging file, now uses logger instead

Assignee: Workum, Dirk-Jan van

!132 Fix adding annotations with missing mRNAs
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/132
Author: Workum, Dirk-Jan van · Updated: 2023-02-21T11:04:45Z

We noticed that the current tutorial dataset (the five chloroplasts) adds some weird proteins. I found out that this is caused by a wrong interpretation of the provided GFF3 files, in which no mRNA features are present: instead, the CDS and exon features have the gene feature as their direct parent. The current `develop` branch assumes that each CDS belongs to a separate gene. However, in the case of chloroplasts these multiple CDS features belong to one transcript. This merge request therefore makes the latter interpretation the default behaviour. It also adds a new `add_annotations` option, `--assume-one-mrna-per-cds`, which restores the original interpretation.
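The two interpretations differ only in how CDS features under one gene are grouped into transcripts. The sketch below illustrates this. Judging from the option's name, it assumes that `--assume-one-mrna-per-cds` creates one mRNA per CDS, while the new default groups all CDS features of a gene into a single transcript. `CdsGrouping`, `Cds`, `toTranscripts` and the generated transcript IDs are hypothetical and not taken from PanTools.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the PanTools implementation) of grouping CDS features
 * whose direct parent is a gene, for GFF3 files without mRNA features.
 */
final class CdsGrouping {

    /** Minimal stand-in for a parsed GFF3 CDS feature. */
    static final class Cds {
        final String id;
        final String parentGene;

        Cds(String id, String parentGene) {
            this.id = id;
            this.parentGene = parentGene;
        }
    }

    /**
     * Groups CDS ids into transcripts, keyed by an illustrative transcript id.
     * assumeOneMrnaPerCds == false: all CDS of a gene form one transcript (new default).
     * assumeOneMrnaPerCds == true:  each CDS becomes its own transcript.
     */
    static Map<String, List<String>> toTranscripts(List<Cds> cdsFeatures, boolean assumeOneMrnaPerCds) {
        Map<String, List<String>> transcripts = new LinkedHashMap<>();
        for (Cds cds : cdsFeatures) {
            String transcriptId = assumeOneMrnaPerCds
                    ? cds.parentGene + "." + cds.id   // one synthetic mRNA per CDS
                    : cds.parentGene + ".mRNA";       // one synthetic mRNA per gene
            transcripts.computeIfAbsent(transcriptId, k -> new ArrayList<>()).add(cds.id);
        }
        return transcripts;
    }
}
```

For a chloroplast gene with several CDS parts, the default yields one transcript (and therefore one protein) instead of several fragments.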
TODO:
- [x] Update documentation with the new option and behaviour

Assignee: Workum, Dirk-Jan van

!131 Remove last jar files from code base
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/131
Author: Workum, Dirk-Jan van · Updated: 2023-02-17T15:54:05Z

I noticed that there are still three remaining jar files in our code base (`external/*.jar`). This merge request removes them. As the CI/CD pipeline shows, PanTools compiles as normal, and removing them has no effect on mapping and `build_pangenome` (any effect would be strange, since PanTools compiles without them).

Assignee: Workum, Dirk-Jan van

!130 Fix typos in core_phylogeny that mixed up genes and groups
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/130
Author: Workum, Dirk-Jan van · Updated: 2023-02-07T16:37:06Z

@jonkh004 discovered two typos in the logging of `core_phylogeny` that confused genes and groups. This merge request fixes those typos.

Assignee: Workum, Dirk-Jan van

!129 Change functional database parameter for add_functions
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/129
Author: Workum, Dirk-Jan van · Updated: 2023-02-06T15:31:51Z

Because the pantools wrapper script from Bioconda captures all parameters starting with `-D`, `add_functions` should not use that prefix. This merge request therefore changes the `-D` parameter of `add_functions` to `-F`.
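The clash can be illustrated with a hypothetical sketch of a wrapper's argument splitting. It assumes, as in many Bioconda Java wrappers, that every argument starting with `-D` is treated as a JVM system property rather than a program option. `WrapperArgs`, `split` and the argument lists in the test are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a launcher wrapper (not the actual Bioconda script):
 * arguments starting with "-D" are set aside for the JVM, everything else is
 * passed on to the program.
 */
final class WrapperArgs {

    final List<String> jvmArgs = new ArrayList<>();
    final List<String> programArgs = new ArrayList<>();

    static WrapperArgs split(String[] args) {
        WrapperArgs result = new WrapperArgs();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                result.jvmArgs.add(arg);       // claimed by the wrapper; never reaches PanTools
            } else {
                result.programArgs.add(arg);   // forwarded to PanTools unchanged
            }
        }
        return result;
    }
}
```

With `-D`, the option is captured by the wrapper and its value is left behind as a stray program argument. With `-F`, both the option and its value reach PanTools.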
TODO:
- [x] update changelog
- [x] remove functional databases from code base

Assignee: Workum, Dirk-Jan van

!128 Add variation (SNPs, InDels and PAVs) to PanTools
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/128
Author: Workum, Dirk-Jan van · Updated: 2023-03-30T13:25:26Z

This merge request describes all changes needed for adding `add_variants`, `remove_variants`, `add_pavs` and `remove_pavs`.
All of these new subcommands are implemented in the `Variation.java` class as a kind of "variation layer". Importantly, these new functionalities affect some downstream functions that rely on mRNA nodes: `msa`, `core_phylogeny`, `consensus_tree`, `gene_classification` and `pangenome_structure`. A `--variation`/`-v` flag has been added to all of these subcommands so that they can use this variation information.
The basic idea of this layer is implemented in the constructor of `Variation.java`. If any variation information has been added, it is described in an "accessionNode" per accession/strain/cultivar/..., attached to the genome against which the variation was called. Each accessionNode must have VCF properties, PAV properties, or both, and contains all other relevant (metadata) information for its accession. If there is an accessionNode for a given accession, there are also mRNA nodes for this accession. Each of these is linked to the original mRNA node of the genome to which the accessionNode is connected. These new mRNA nodes have an additional "variant_label", indicating that they belong to an additional accession rather than to a genome.
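The data model described above can be summarised in a small in-memory sketch. This is not the PanTools graph schema: `VariationModel`, its nested classes and their fields are hypothetical. The sketch only encodes two rules from the text. First, an accessionNode needs VCF information, PAV information or both. Second, each accession mRNA points back to an original mRNA.

```java
import java.util.Objects;

/** Hypothetical in-memory model of the variation layer (illustrative names only). */
final class VariationModel {

    /** One accession (strain, cultivar, ...) whose variation was called against a genome. */
    static final class AccessionNode {
        final String accession;
        final int genomeNr;     // genome the variation was called against
        final boolean hasVcf;   // SNP/InDel information present
        final boolean hasPav;   // presence/absence information present

        AccessionNode(String accession, int genomeNr, boolean hasVcf, boolean hasPav) {
            // Rule from the description: VCF, PAV, or both.
            if (!hasVcf && !hasPav) {
                throw new IllegalArgumentException(
                        "accessionNode " + accession + " needs VCF or PAV information (or both)");
            }
            this.accession = Objects.requireNonNull(accession, "accession");
            this.genomeNr = genomeNr;
            this.hasVcf = hasVcf;
            this.hasPav = hasPav;
        }
    }

    /** An accession's mRNA, linked to the original mRNA of the genome it was called against. */
    static final class VariantMrna {
        final AccessionNode accession;
        final String originalMrnaId;  // in a graph this link would be a relationship, not an id

        VariantMrna(AccessionNode accession, String originalMrnaId) {
            this.accession = Objects.requireNonNull(accession, "accession");
            this.originalMrnaId = Objects.requireNonNull(originalMrnaId, "originalMrnaId");
        }
    }
}
```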
For SNPs and InDels, variation is added by providing a VCF file to `add_variants`. This VCF file is processed in parallel to extract a consensus sequence for each feature node; the consensus sequence is obtained using `bcftools`. NB: not **all** SNPs and InDels are put in the database, only those located within an annotated feature. Each "variant_label" mRNA node of an accession stores its consensus sequence.
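The rule that only variants inside annotated features are kept amounts to an interval-overlap test on 1-based, inclusive coordinates, which both VCF and GFF3 use. The sketch below shows only that filtering step, over a parallel stream. The consensus sequence itself comes from `bcftools` and is not reimplemented here. `FeatureVariantFilter` and its nested classes are hypothetical. A real implementation would likely use an indexed or sorted structure instead of this quadratic scan.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: keep only VCF records that overlap an annotated feature. */
final class FeatureVariantFilter {

    /** Annotated feature on a sequence, 1-based inclusive coordinates. */
    static final class Feature {
        final String chrom;
        final long start;
        final long end;

        Feature(String chrom, long start, long end) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
        }
    }

    /** Minimal VCF record: CHROM, POS, REF and ALT columns. */
    static final class Variant {
        final String chrom;
        final long pos;
        final String ref;
        final String alt;

        Variant(String chrom, long pos, String ref, String alt) {
            this.chrom = chrom;
            this.pos = pos;
            this.ref = ref;
            this.alt = alt;
        }

        /** Last reference base touched by this record (REF may span several bases for InDels). */
        long end() {
            return pos + ref.length() - 1;
        }
    }

    static boolean overlaps(Feature f, Variant v) {
        return f.chrom.equals(v.chrom) && v.pos <= f.end && v.end() >= f.start;
    }

    /** Variants overlapping at least one feature; input order is preserved. */
    static List<Variant> variantsInFeatures(List<Feature> features, List<Variant> variants) {
        return variants.parallelStream()
                .filter(v -> features.stream().anyMatch(f -> overlaps(f, v)))
                .collect(Collectors.toList());
    }
}
```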
For PAVs, the presence or absence is added as a property to all "variant_label" mRNA nodes per accession.
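Storing PAVs as one presence/absence value per accession mRNA can be sketched as turning a table into a nested map. The map's outer keys are accessions, its inner keys are mRNA ids, and its values say whether the mRNA is present. The table layout used here is an assumption, not PanTools' documented PAV input format. It has a header `mrna_id<TAB>accession...`, followed by one row per mRNA with 0/1 values. `PavTable` and `parse` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: parse an assumed tab-separated PAV table into accession -> mRNA -> present. */
final class PavTable {

    static Map<String, Map<String, Boolean>> parse(List<String> lines) {
        String[] header = lines.get(0).split("\t");        // mrna_id, accession1, accession2, ...
        Map<String, Map<String, Boolean>> byAccession = new HashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cols = line.split("\t");
            if (cols.length != header.length) {
                throw new IllegalArgumentException("Column count mismatch in line: " + line);
            }
            String mrnaId = cols[0];
            for (int i = 1; i < header.length; i++) {
                boolean present = cols[i].trim().equals("1");  // 1 = present, anything else = absent
                byAccession.computeIfAbsent(header[i], k -> new HashMap<>()).put(mrnaId, present);
            }
        }
        return byAccession;
    }
}
```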
Finally, only adding and removing PAVs works for both pangenomes and panproteomes. SNP/InDel information can only be added to or removed from pangenomes, because it relies on a genome sequence.
TODO:
- [x] Check and double-check that there are no breaking changes to develop.
- [x] Add novel functionalities to documentation.
- [x] Discuss strategy for adding variation (SNP/InDel) that is not part of annotated features to the graph.
- [x] Restrict `bcftools` version in conda YAML files based on possible breaking changes in `bcftools consensus`.
- [x] Add parameter to keep temporary files.
- [x] Extensively check all possible downstream functionalities for compatibility.
- [x] Fix `group` subcommand if run after `add_variants`.

Assignee: Workum, Dirk-Jan van

!127 Merge v4.1.1 changes to develop
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/127
Author: Workum, Dirk-Jan van · Updated: 2023-01-30T08:58:45Z

Some minor changes were made to the v4.1.1 release branch that still needed to be merged into develop; this merge request does that.

Assignee: Workum, Dirk-Jan van

!126 Release PanTools v4.1.1
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/126
Author: Workum, Dirk-Jan van · Updated: 2023-01-30T09:00:14Z

This merge request merges PanTools v4.1.1 into the release branch `pantools_v4`.

Assignee: Smit, Sandra

!125 Merge v4.1.1 to develop
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/125
Author: Workum, Dirk-Jan van · Updated: 2023-01-27T16:42:53Z

Update the develop branch with the new PanTools version v4.1.1.

Assignee: Workum, Dirk-Jan van

!124 Fix NullPointerException in `remove_grouping`
https://git.wageningenur.nl/bioinformatics/pantools/-/merge_requests/124
Author: Workum, Dirk-Jan van · Updated: 2023-01-27T15:43:07Z

`remove_grouping` threw a NullPointerException when no grouping version was given, because the code assumed this parameter is always set. I therefore propose to make the `-v` parameter for the grouping version required.

Assignee: Workum, Dirk-Jan van
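For !124, the idea of making `-v` required can be sketched as validating the option when the options are built. This keeps a missing value from surfacing later as a NullPointerException. The class below is hypothetical and not the PanTools code. In practice, a command-line parser's required-option mechanism would do the same job.

```java
/**
 * Hypothetical sketch: options for remove_grouping that refuse to exist
 * without a grouping version, so later code can rely on it being set.
 */
final class RemoveGroupingOptions {

    private final Integer groupingVersion; // null when -v was not given on the command line

    RemoveGroupingOptions(Integer groupingVersion) {
        if (groupingVersion == null) {
            // Fail early with a clear message instead of a NullPointerException later on.
            throw new IllegalArgumentException("Missing required option: -v (grouping version)");
        }
        this.groupingVersion = groupingVersion;
    }

    int groupingVersion() {
        return groupingVersion;
    }
}
```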