- 07 Jul, 2017 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 01 Aug, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 27 Jul, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
- Better choosing of sequence-pairs for calculation of sequence similarity
-
- 26 Jul, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 19 Jul, 2016 3 commits
-
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
- Order the list of domains by their absolute position (not their internal position within the feature).
- Choose all possible pairs of domains in the GK calculation. Previously, the last nbhood-1 domains were missing from the pair selection.
- Fixed skewed values in the GK index. Taking the absolute value of Ns-Nr meant that only values in the range [0.5, 1.0] were obtained. Values close to 0 mean the greatest difference in domain order (all shared pairs are reversed), whereas values close to 1 mean the least difference (all shared pairs are in the same order).
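The effect of the absolute value on the GK index can be illustrated with a minimal sketch. The formula used here, (1 + (Ns - Nr) / (Ns + Nr)) / 2, is an assumption chosen because it reproduces the ranges described above; it is not copied from the BiG-SCAPE source. The function name `gk_index` and the `use_abs` switch are hypothetical, and each domain is assumed to appear at most once per cluster.

```python
from itertools import combinations


def gk_index(order_a, order_b, use_abs=False):
    """Rescaled Goodman-Kruskal-style index in [0, 1] for two domain orderings.

    Assumption: each domain name appears at most once per list.
    Returns None when fewer than two domains are shared (no pairs to compare).
    """
    pos_b = {dom: i for i, dom in enumerate(order_b)}
    shared = [dom for dom in order_a if dom in pos_b]  # kept in order_a's order

    ns = nr = 0  # shared pairs in the same order / in reversed order
    for first, second in combinations(shared, 2):
        if pos_b[first] < pos_b[second]:
            ns += 1
        else:
            nr += 1

    if ns + nr == 0:
        return None

    # use_abs=True mimics the old behaviour, which can never go below 0.5
    diff = abs(ns - nr) if use_abs else ns - nr
    return (1 + diff / float(ns + nr)) / 2
```

With `use_abs=True`, a fully reversed ordering scores the same as an identical one, which is the skew this commit removes.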
-
Jorge Navarro Muñoz authored
Use parameter --pfam_dir to specify the location of the hmmpress-processed pfam files (.h3f, .h3i, .h3m and .h3p). If the parameter is not given, the default is to look in the same directory as the BiG-SCAPE script. Thanks to Alex Cristofaro for helping with this!
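A minimal sketch of how such a default could be wired up with `argparse`. Only the `--pfam_dir` flag comes from the commit message; the helper name `parse_pfam_dir` and its `script_dir` argument are hypothetical, and the real script may handle this differently.

```python
import argparse


def parse_pfam_dir(argv, script_dir):
    """Return the pfam directory: the --pfam_dir value, else the script's directory."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--pfam_dir",
        default=script_dir,  # fall back to where the script lives
        help="Location of the hmmpress-processed pfam files",
    )
    args = parser.parse_args(argv)
    return args.pfam_dir
```

In a script this might be called as `parse_pfam_dir(sys.argv[1:], os.path.dirname(os.path.realpath(__file__)))`.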
-
- 12 Jul, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 11 Jul, 2016 2 commits
-
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
- Parameter --skip_mafft is intended for cases where the distance has to be recalculated (most probably after changing the nbhood parameter). It requires the original gbk files, the .pfs files, the .domtable files, and the BGCs.dict and DMS.dict files.
- If any of the 'skip' parameters is activated, genbank_parser_hmmscan() no longer reads and parses fasta sequences from the genbank files.
- --skip_all now recalculates logscore, distance and squared similarity in case the user has submitted new weights for the Jaccard, DDS and GK indices.
- Other minor improvements in code, comments, etc.
-
- 01 Jul, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 30 Jun, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
- Also fixed a potential bug: when an outlier distance pair was processed for disc nodes inclusion, only the first node would be added to the network, but not the second one (the second node was behind an 'elif' check).
- Also fixed a small bug when reporting the number of cores in parameters.txt.
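The 'elif' problem can be reproduced in a few lines. This is only an illustration of the pattern, assuming the network's nodes are tracked in a set; the function names are hypothetical and the actual code is not shown in this log.

```python
def add_pair_buggy(network_nodes, node_a, node_b):
    """Old pattern: the second check is skipped whenever the first branch runs."""
    if node_a not in network_nodes:
        network_nodes.add(node_a)
    elif node_b not in network_nodes:
        network_nodes.add(node_b)


def add_pair_fixed(network_nodes, node_a, node_b):
    """Fixed pattern: each node of the pair is checked independently."""
    for node in (node_a, node_b):
        if node not in network_nodes:
            network_nodes.add(node)
```

Starting from an empty set, the buggy version adds only the first node of the pair, while the fixed version adds both.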
-
- 29 Jun, 2016 2 commits
-
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
As the three distance indices depend on the domains identified in each gene cluster, there is no point in including gene clusters where hmmscan could not identify any. Such clusters are detected by the fact that no information can be extracted from their .domtable files.
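One way to perform this check is sketched below. It assumes the .domtable files are in the tabular format written by hmmscan's `--domtblout` option, where every line that is not a hit starts with `#`. The helper name `has_domain_hits` is hypothetical.

```python
def has_domain_hits(domtable_lines):
    """Return True if any line is a hit, i.e. neither blank nor a '#' comment."""
    for line in domtable_lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return True
    return False
```

A cluster would then be kept only if `has_domain_hits(open(path))` is true for its .domtable file.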
-
- 24 Jun, 2016 2 commits
-
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
If SeqIO finds a problem with a file, the file is removed from the analysis at an early stage (entire samples can even be removed if they contain only bad files). For the time being, this includes genbank files from the MiBIG dump that include multiple loci.
-
- 23 Jun, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
- There are now two possible settings to re-run BiG-SCAPE:
  * --skip_hmmscan
  * --skip_all
  The latter only generates new network files.
- The --samples parameter is discarded. BiG-SCAPE will automatically create network files per sample if "S" is specified for --seqdist_networks or --domaindist_networks.
- The name of each sample (for use in the output network files) will be the name of the folder containing its genbank files, so please take that into account when preparing the input directory. Repeating folder names (e.g. day1/sample_pond, day2/sample_pond) is not valid, and repeating genbank file names is **strongly** discouraged.
- The network output files are no longer sorted by log2 score (this may change in the future).
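The folder-name rule can be checked before a run with a small sketch like the following. Both helper names are hypothetical; the only behaviour taken from the notes above is that the sample name is the name of the folder holding the genbank files.

```python
import os
from collections import Counter


def sample_name(gbk_path):
    """Sample name = name of the folder that directly contains the genbank file."""
    return os.path.basename(os.path.dirname(gbk_path))


def repeated_sample_folders(sample_dirs):
    """Return folder names used by more than one sample directory."""
    names = Counter(os.path.basename(os.path.normpath(d)) for d in sample_dirs)
    return sorted(name for name, count in names.items() if count > 1)
```

For the example above, `repeated_sample_folders(["day1/sample_pond", "day2/sample_pond"])` reports `sample_pond` as a clash.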
-
- 17 Jun, 2016 3 commits
- 16 Jun, 2016 5 commits
-
-
Jorge Navarro Muñoz authored
Output subdirectories are now named simply 'domains' and 'networks'. Also fixed writing of the parameters file (the output directory must exist first).
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
antiSMASH-processed genbank files contain the "cluster" identifier. The next "product" tag indicates the GC class. Nevertheless, MiBIG entries do not contain this information. Before commit 3d213c64, BiG-SCAPE only took into account the former. After the commit both kinds of files were accepted but it might be possible that antiSMASH-produced genbank files were not being correctly processed. This impacts the "group" columns in the output network files.
-
- 15 Jun, 2016 2 commits
-
-
Jorge Navarro Muñoz authored
Case 1 (domain is not shared between GCs): cases were not being separated according to whether the domain is in the anchor list or not.
Case 3 (multiple domains shared between GCs): the matrix of distances for use in the Munkres algorithm was not being filled correctly.
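To make the role of the distance matrix in Case 3 concrete, here is a small sketch of the assignment problem that the Munkres algorithm solves. It deliberately swaps in a brute-force search over permutations instead of Munkres, so it is only suitable for tiny matrices. The assumption that rows and columns are the copies of a shared domain in each of the two gene clusters, and the function name `min_cost_assignment`, are illustrative.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Brute-force minimum-cost assignment of rows to distinct columns.

    cost[r][c] is the distance between copy r in one cluster and copy c in
    the other. Requires len(cost) <= len(cost[0]).
    Returns (columns chosen for rows 0..n-1, total cost).
    """
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        raise ValueError("expected at most as many rows as columns")

    best_cols, best_total = None, None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if best_total is None or total < best_total:
            best_cols, best_total = cols, total
    return best_cols, best_total
```

If the matrix is filled incorrectly, the pairing returned is still optimal for the wrong costs, which is why a filling bug silently distorts the resulting distances.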
-
Jorge Navarro Muñoz authored
-
- 13 Jun, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 08 Jun, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
Prettified output.
Other minor changes in the code.
Before being tested on Windows, the code was checked against the [previous version](https://git.wageningenur.nl/yeong001/BGC_networks/commit/3d213c640a8ca3c4a9e16c909ad9cfd2cc498a7f) using the following genbank files from MiBIG (v1.2, renamed with the name of the class and compound):
* Cyanobactins.BGC0000477_patellin2-3.gbk
* Lanthipeptides_I.BGC0000538_nisin_z.gbk
* Lanthipeptides_I.BGC0000547_salivaricin_9.gbk
* Lanthipeptides_I.BGC0000548_salivaricin_a.gbk
* Lanthipeptides_I.BGC0000549_salivaricin_d.gbk
* Lanthipeptides_I.BGC0000550_salivaricin_g32.gbk
* Lanthipeptides_I.BGC0000624_salivaricin_crl1328_alpha-beta_peptide.gbk
* Lanthipeptides_II.BGC0000508_epidermin.gbk
* Lanthipeptides_II.BGC0000514_gallidermin.gbk
* Lanthipeptides_III.BGC0000506_entianin.gbk
* Lanthipeptides_III.BGC0000511_ericin_s.gbk
* Lanthipeptides_III.BGC0000559_subtilin.gbk
* Lanthipeptides_IV.BGC0000529_microbisporicin_a2.gbk
* Lanthipeptides_IV.BGC0000544_planosporicin.gbk
* Linaridins.BGC0000582_cypemycin.gbk
* Linaridins.BGC0000583_grisemycin.gbk
* Microcins.BGC0000586_microcin_E492.gbk
* Microcins.BGC0000587_microcin_H47.gbk
* Microcins.BGC0000589_microcin_m.gbk
* Microviridins.BGC0000592_microviridin_b.gbk
* Microviridins.BGC0000593_microviridin_j.gbk
* Microviridins.BGC0000594_microviridin_k.gbk
* Thiopeptides_I.BGC0000604_GE2270a.gbk
* Thiopeptides_I.BGC0000605_GE37468.gbk
* Thiopeptides_I.BGC0000613_thiomuracin.gbk
* Thiopeptides_I.BGC0001155_GE2270.gbk
* Thiopeptides_II.BGC0000614_thiostrepton.gbk
* Thiopeptides_II.BGC0000655_siomycin.gbk
* Thiopeptides_III.BGC0000608_nocathiacin.gbk
* Thiopeptides_III.BGC0000609_nocathiacin.gbk
* Thiopeptides_III.BGC0000610_nosiheptide.gbk
* Thiopeptides_IV.BGC0000612_thiocillin_I.gbk
* Thiopeptides_IV.BGC0000628_lactocillin.gbk

Save for minor differences in the alignment files (which depend on MAFFT), the output is the same.
-
- 06 Jun, 2016 4 commits
-
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
Sanitized the input directory name a bit more. The definitive solution would probably be to simply name the subdirectories within outputdir 'domains' and 'networks'.
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
It should be possible to use paths outside current directory as inputdir (such as ~/). Also, try to prevent double slashes in paths used throughout the script.
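A sketch of one common way to get both behaviours with the standard library: `os.path.expanduser` resolves a leading `~`, `os.path.normpath` collapses repeated separators and trailing slashes, and `os.path.join` avoids gluing paths together by hand. The helper names `clean_dir` and `output_subdir` are hypothetical and are not taken from the script.

```python
import os


def clean_dir(path):
    """Expand a leading '~' and collapse repeated or trailing separators."""
    return os.path.normpath(os.path.expanduser(path))


def output_subdir(output_dir, name):
    """Build a subdirectory path without introducing double slashes."""
    return os.path.join(clean_dir(output_dir), name)
```

For instance, `output_subdir("out//", "networks")` gives the same result as `os.path.join("out", "networks")`.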
-
- 03 Jun, 2016 1 commit
-
-
Jorge Navarro Muñoz authored
Feed correct parameters for sample network (domaindist). Do nothing for sample network generation (seqdist and domaindist) if sample size == 1
-
- 02 Jun, 2016 6 commits
-
-
Jorge Navarro Muñoz authored
save like 2 clock cycles when calculating percentage of identity between domains for each position in their sequences
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
-
-
Jorge Navarro Muñoz authored
-