# Parameters

---

Type `python bigscape.py -h` to display a list of all available parameters. See also the following sections:

## Help

Use `-h` or `--help` to display all available options.

## Input folder

```
-i INPUTDIR, --inputdir INPUTDIR
```

Path to the folder where the search for `.gbk` files starts. If not specified, the search starts where the BiG-SCAPE files are located. The search is recursive. See more information [here](input).

## Exclude string

```
--exclude_gbk_str EXCLUDE_GBK_STR
```

If any string in this list occurs in a filename, that file will not be used for the analysis (default: 'final').
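
The combined effect of the input folder and the exclude string can be pictured with a small sketch. This is not BiG-SCAPE's own code: the function name `find_gbk_files` and the demo file names are hypothetical, and the sketch assumes that the excluded strings are matched against the file name only, as the description above suggests.

```python
from pathlib import Path
import tempfile


def find_gbk_files(input_dir, exclude_strings=("final",)):
    """Recursively collect .gbk files, skipping any file whose name
    contains one of the excluded strings (hypothetical helper)."""
    selected = []
    for path in Path(input_dir).rglob("*.gbk"):
        if any(text in path.name for text in exclude_strings):
            continue  # e.g. a whole-genome "*.final.gbk" file
        selected.append(path)
    return selected


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "genomeA").mkdir()
    (root / "genomeA" / "cluster001.gbk").touch()
    (root / "genomeA" / "genomeA.final.gbk").touch()
    (root / "cluster002.gbk").touch()
    found = sorted(p.name for p in find_gbk_files(root))
    print(found)
```

In this sketch the file in the subfolder is picked up because the search is recursive, while the file containing 'final' in its name is skipped.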

## Output folder

```
-o OUTPUTDIR, --outputdir OUTPUTDIR
```

Output directory, which will contain all output data files. See its structure and more details about each type of result [here](output).

## Pfam database

```
--pfam_dir PFAM_DIR
```

Location of the Pfam files. The default is the same location as BiG-SCAPE. See how to prepare these files in the [installation instructions](installation).

## Cores

```
-c CORES, --cores CORES
```

BiG-SCAPE will try to parallelize some steps of the analysis, such as domain prediction and distance calculation. Use this option to set the number of cores the script may use. If not specified, BiG-SCAPE will use all available cores.

## Verbose

```
-v, --verbose
```

Prints more detailed information about each step of the analysis. Toggle to activate. Because of the amount of information, it might be a good idea to redirect the output to a file, e.g.:

```
$> python bigscape.py <options> --verbose > run.log &
```

## Include singletons

```
--include_singletons
```

Toggle to activate. This will include singletons, i.e. BGCs that have no distance to any other BGC lower than the specified cutoff distance.

## Domain overlap

```
-d DOMAIN_OVERLAP_CUTOFF, --domain_overlap_cutoff DOMAIN_OVERLAP_CUTOFF
```

Specify the overlap percentage at which domains are considered to overlap. The domain with the best score is kept (default: 0.1). See also [domain prediction](domain prediction).
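
The following is a rough sketch of how such an overlap filter could work, not BiG-SCAPE's actual implementation. It assumes that the overlap is measured as the shared length divided by the length of each domain, and that a pair clashes when either fraction exceeds the cutoff. The function name and the dictionary keys (`start`, `end`, `score`) are hypothetical.

```python
def filter_overlapping_domains(domains, overlap_cutoff=0.1):
    """Visit domains from best to worst score and keep a domain only if it
    does not overlap an already kept domain by more than the cutoff.
    Assumes every domain has end > start."""
    kept = []
    for dom in sorted(domains, key=lambda d: d["score"], reverse=True):
        clash = False
        for other in kept:
            shared = min(dom["end"], other["end"]) - max(dom["start"], other["start"])
            if shared <= 0:
                continue  # the two domains do not overlap at all
            # Assumption: overlap fraction relative to each domain's length.
            frac_dom = shared / (dom["end"] - dom["start"])
            frac_other = shared / (other["end"] - other["start"])
            if max(frac_dom, frac_other) > overlap_cutoff:
                clash = True
                break
        if not clash:
            kept.append(dom)
    return sorted(kept, key=lambda d: d["start"])


domains = [
    {"name": "A", "start": 10, "end": 110, "score": 50.0},
    {"name": "B", "start": 90, "end": 200, "score": 80.0},   # overlaps A
    {"name": "C", "start": 300, "end": 400, "score": 30.0},  # overlaps nothing
]
kept_names = [d["name"] for d in filter_overlapping_domains(domains)]
print(kept_names)
```

Here domains A and B share 20 positions, which is more than 10% of either domain, so only the better-scoring B survives together with the non-overlapping C.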

## Minimum size

```
-m MIN_BGC_SIZE, --min_bgc_size MIN_BGC_SIZE
```

Provide the minimum size of a BGC to be included in the analysis. Default is 0 base pairs. For a multi-record GenBank file, the size is the sum of all loci.

## Mix

```
--mix
```

By default, BiG-SCAPE separates the analysis according to the BGC product and will create network directories for each class (see [BiG-SCAPE classes](BiG-SCAPE classes)). Toggle to include an analysis mixing all classes. As BiG-SCAPE needs to calculate an all-vs-all distance network, this might use a lot of memory.

## No classify

```
--no_classify
```

By default, BiG-SCAPE classifies the output files based on the BGC product. Toggle to deactivate (note that if the `--mix` parameter is not activated, BiG-SCAPE will not create any network file, but all intermediate files will still be processed).

## Filter classes

```
--banned_classes {PKSI, PKSother, NRPS, RiPPs, Saccharides, Terpene, PKS-NRP_Hybrids, Others}
```

BiG-SCAPE classes that should NOT be included in the classification, e.g. `--banned_classes PKSI PKSother`. Strings in lowercase are also allowed.
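
A minimal sketch of case-insensitive class filtering follows. The class names come from the option listing above; the function `classes_to_analyze` and its error handling are hypothetical and only illustrate the idea.

```python
VALID_CLASSES = [
    "PKSI", "PKSother", "NRPS", "RiPPs",
    "Saccharides", "Terpene", "PKS-NRP_Hybrids", "Others",
]


def classes_to_analyze(banned):
    """Return the classes that remain after removing the banned ones,
    comparing names case-insensitively (hypothetical helper)."""
    banned_lower = {name.lower() for name in banned}
    unknown = banned_lower - {name.lower() for name in VALID_CLASSES}
    if unknown:
        raise ValueError(f"Unknown class name(s): {sorted(unknown)}")
    return [name for name in VALID_CLASSES if name.lower() not in banned_lower]


remaining = classes_to_analyze(["pksi", "PKSOther"])
print(remaining)
```

Lowercasing both sides before comparing is what makes `pksi`, `PKSI` and `PKSOther` all acceptable spellings in this sketch.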

## Cutoffs

```
--cutoffs {0.0-1.0}
```

Generate networks using multiple raw distance cutoff values, example: `--cutoffs 0.1, 0.25, 0.5, 1.0`. Default: all values from 0.10 to 0.80 with 0.05 intervals. A separate network file is generated for every cutoff value. In the interactive visualization, only the highest cutoff is shown. Automatic clustering of Gene Cluster Families is done using each cutoff.
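
To make the default range and the "one network per cutoff" behaviour concrete, here is a small sketch. It is illustrative only: the function names are hypothetical, and it assumes that an edge belongs to a network when its distance is equal to or lower than the cutoff.

```python
def default_cutoffs():
    """All values from 0.10 to 0.80 in steps of 0.05.
    Built from integers to avoid accumulating floating point error."""
    return [round(i * 0.05, 2) for i in range(2, 17)]


def networks_by_cutoff(distances, cutoffs):
    """Map each cutoff to the list of edges (bgc_a, bgc_b, distance)
    whose distance does not exceed that cutoff."""
    return {
        cutoff: [edge for edge in distances if edge[2] <= cutoff]
        for cutoff in cutoffs
    }


# Made-up pairwise distances between three BGCs.
distances = [
    ("BGC1", "BGC2", 0.12),
    ("BGC1", "BGC3", 0.45),
    ("BGC2", "BGC3", 0.90),
]
networks = networks_by_cutoff(distances, [0.3, 0.5])
for cutoff, edges in networks.items():
    print(cutoff, edges)
```

A higher cutoff always yields a network that contains every edge of the lower-cutoff networks, which is why showing only the highest cutoff in the visualization still covers all retained pairs.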

## Clans

```
--clans
```

If activated, BiG-SCAPE will perform a second layer of clustering and attempt to group the GCFs assigned by the first round of clustering into clans (GCCs).

```
--clan_cutoff {0.0-1.0}
```

Cutoff parameters, in raw distance, used to cluster families into clans. The first value is the GCF cutoff used in clan clustering (default: 0.5). If this GCF cutoff is not included in `--cutoffs`, it will be added automatically. The second value is the GCC cutoff for clustering families into clans (default: 0.8). Distances between families are computed with average linkage over the BGCs in each family, so every pair of GCFs linked by an average distance equal to or lower than the GCC cutoff will be taken into account. Example: `--clan_cutoff 0.5 0.8`.
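
The average-linkage step described above can be sketched as follows. This only illustrates how a distance between two families may be derived and compared against the GCC cutoff; it is not the clustering procedure BiG-SCAPE applies afterwards, and all names and distance values are made up.

```python
from itertools import combinations, product


def average_linkage(family_a, family_b, distance):
    """Mean distance over every pair made of one BGC from each family."""
    pairs = list(product(family_a, family_b))
    return sum(distance[frozenset(pair)] for pair in pairs) / len(pairs)


def linked_families(families, distance, gcc_cutoff=0.8):
    """Return (family_a, family_b, average distance) for every pair of
    families whose average distance is at most the GCC cutoff."""
    links = []
    for (name_a, bgcs_a), (name_b, bgcs_b) in combinations(sorted(families.items()), 2):
        avg = average_linkage(bgcs_a, bgcs_b, distance)
        if avg <= gcc_cutoff:
            links.append((name_a, name_b, avg))
    return links


# Hypothetical families and pairwise BGC distances.
families = {"GCF1": ["A", "B"], "GCF2": ["C"], "GCF3": ["D"]}
distance = {
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.75,
    frozenset(("A", "D")): 1.0,
    frozenset(("B", "D")): 0.875,
    frozenset(("C", "D")): 0.875,
}
links = linked_families(families, distance)
print(links)
```

In this toy data only GCF1 and GCF2 are linked, because their average distance (0.625) is below the default GCC cutoff of 0.8.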

Learn more about [GCFs and GCCs](GCFs and GCCs).

## Hybrids

```
--hybrids
```

Toggle to also add BGCs with hybrid predicted products from the PKS/NRPS Hybrids and Others classes to each subclass (e.g. a 'terpene-nrps' BGC, which would usually be classified in Others, would also be added to the Terpene and NRPS classes).

## Alignment Mode

```
--mode {global,lcs,auto}
```

Alignment mode for each pair of gene clusters. 'global' (default): the whole list of domains of each BGC is compared. 'lcs': Longest Common Subcluster mode, which redefines the subset of domains used to calculate the distance by trying to find the longest slice of common domain content per gene in both BGCs and then expanding each slice. 'auto': uses LCS when at least one of the BGCs in a pair has the 'contig_edge' annotation from antiSMASH v4+, otherwise uses global mode on that pair.
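
The per-pair decision made in 'auto' mode can be summarized in a few lines. The function below is a hypothetical illustration of the rule stated above, not code taken from BiG-SCAPE; the boolean arguments stand in for the presence of the 'contig_edge' annotation on each BGC.

```python
def pick_alignment_mode(mode, a_on_contig_edge, b_on_contig_edge):
    """Resolve the alignment mode actually used for one pair of BGCs."""
    if mode not in ("global", "lcs", "auto"):
        raise ValueError(f"Unknown mode: {mode!r}")
    if mode != "auto":
        return mode  # 'global' and 'lcs' apply to every pair unchanged
    # 'auto': LCS as soon as either BGC of the pair sits on a contig edge.
    return "lcs" if (a_on_contig_edge or b_on_contig_edge) else "global"


print(pick_alignment_mode("auto", True, False))
print(pick_alignment_mode("auto", False, False))
```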

## Anchorfile

```
--anchorfile ANCHORFILE
```

Point to a custom anchor file. Default is `anchor_domains.txt`, included with the repository. Learn more about the anchor file [here](anchor file).

## Force hmmscan

```
--force_hmmscan
```

Force domain prediction using `hmmscan` even if BiG-SCAPE finds processed domtable files (e.g. to use a new version of the Pfam database).

## Skip alignment

```
--skip_ma
```

Skip the multiple alignment of domain sequences. Use if alignments have been generated in a previous run. Domain sequence alignment will also be skipped if BiG-SCAPE reuses an output directory and no new BGCs are found within the input folder.

## MIBiG

```
--mibig
```

Use included BGCs from the [MIBiG database](https://mibig.secondarymetabolites.org/). Currently, version 1.3 of the database is included in the BiG-SCAPE project (about 1,400 BGCs) as a compressed file, which will be unzipped the first time the `--mibig` option is used.

## Query BGC mode

```
--query_bgc QUERY_BGC
```

Instead of making an all-vs-all comparison of all the input BGCs, choose one BGC to compare with the rest of the set (one-vs-all). The query BGC does not have to be within the input folder. The distances used for the GCF and GCC analysis are all those equal to or lower than the maximum cutoff value.
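
The difference between the two comparison schemes is mostly a matter of which pairs are generated. The sketch below uses a hypothetical helper, `pairs_to_compare`, and made-up BGC names to show how the number of pairs changes.

```python
from itertools import combinations


def pairs_to_compare(bgcs, query=None):
    """All-vs-all pairs by default; one-vs-all pairs when a query is given.
    The query does not need to be part of the input set."""
    if query is None:
        return list(combinations(bgcs, 2))
    return [(query, other) for other in bgcs if other != query]


bgcs = ["BGC1", "BGC2", "BGC3", "BGC4"]
all_pairs = pairs_to_compare(bgcs)
query_pairs = pairs_to_compare(bgcs, query="my_query_bgc")
print(len(all_pairs), len(query_pairs))
```

For n input BGCs, the all-vs-all scheme produces n(n-1)/2 pairs, while the query scheme produces at most n, which is why the one-vs-all mode is much cheaper on large inputs.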

## Version

```
--version
```

Show the program's version number and exit.