- 31 Oct, 2017 2 commits
-
-
Kautsar, Satria authored
Move html output folders into each networks output folder (i.e. networks_all,… See merge request medema-group/BiG-SCAPE!9
-
Kautsar, Satria authored
Move html output folders into each networks output folder (i.e. networks_all, networks_all_lcs, ...) to prevent html output being overridden by multiple bigscape runs
-
- 18 Oct, 2017 6 commits
-
-
Kautsar, Satria authored
Html visualization See merge request medema-group/BiG-SCAPE!8
-
Kautsar, Satria authored
-
Satria Ardhe Kautsar authored
2. Change 'Network:' to 'Networks:'
-
Satria Ardhe Kautsar authored
-
Kautsar, Satria authored
Html visualization See merge request medema-group/BiG-SCAPE!7
-
Satria Ardhe Kautsar authored
-
- 17 Oct, 2017 2 commits
-
-
Kautsar, Satria authored
-
Kautsar, Satria authored
-
- 16 Oct, 2017 3 commits
-
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
-
Kautsar, Satria authored
-
- 11 Oct, 2017 1 commit
-
-
Jorge Navarro Muñoz authored
- Small speedup for LCS: trim the pfam identifier prefix ("PF") when forming words for difflib's SequenceMatcher
- Distance calculation stage now outputs the number of the gene where the LCS seed starts for both BGCs, plus the correct orientation of the second BGC. This will eventually be exported to the json file used for BiG-SCAPE's visualization
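A minimal sketch of the prefix trimming described above, not the code from this commit. The helper name `gene_word` and the example Pfam accessions are hypothetical; the only assumption taken from the message is that each gene becomes one "word" built from its pfam ids and that the constant "PF" prefix carries no information for the comparison.

```python
from difflib import SequenceMatcher


def gene_word(pfam_ids):
    """Join one gene's pfam ids into a single word, dropping the 'PF' prefix.

    Hypothetical helper: shorter words mean less work when SequenceMatcher
    hashes and compares them.
    """
    return "".join(p[2:] if p.startswith("PF") else p for p in pfam_ids)


# Illustrative input only: each inner list is one gene's domains.
bgc_a = [["PF00109", "PF02801"], ["PF00550"], ["PF00698"]]
bgc_b = [["PF00550"], ["PF00698"], ["PF08659"]]

a_string = [gene_word(g) for g in bgc_a]
b_string = [gene_word(g) for g in bgc_b]

# autojunk=False so that frequent words are never discarded as "junk".
sm = SequenceMatcher(None, a_string, b_string, autojunk=False)
seed = sm.find_longest_match(0, len(a_string), 0, len(b_string))
```

Here `seed.a` and `seed.b` are the gene positions where the common slice starts in each BGC, and `seed.size` is its length in genes.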
-
- 10 Oct, 2017 2 commits
-
-
Jorge Navarro Muñoz authored
- Removed local and local-extended modes
- Implemented a new parameter called 'mode'. This can be set to 'global' (default), 'lcs' or 'auto'
- New mode: 'auto'. Apply lcs mode only if any of the BGCs is fragmented (this is known from the 'contig_edge' annotation by antiSMASH v4.0+)
- Code cleanup
- Removed MAFFT as an alignment option for domain sequences
- Removed BGCclassIndex column from network distance matrix
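The mode logic above could be sketched as follows. This is an illustration under assumptions, not the code in bigscape.py: the function name `use_lcs` and the boolean `contig_edge_*` arguments are hypothetical stand-ins for however the 'contig_edge' annotation is stored per BGC.

```python
def use_lcs(mode, contig_edge_a, contig_edge_b):
    """Decide whether a BGC pair is compared in lcs mode.

    mode: 'global', 'lcs' or 'auto' (values taken from the commit message).
    contig_edge_a / contig_edge_b: hypothetical booleans, True when the
    BGC is annotated as lying on a contig edge (i.e. it is fragmented).
    """
    if mode == "global":
        return False
    if mode == "lcs":
        return True
    if mode == "auto":
        # lcs only if at least one of the two BGCs is fragmented
        return bool(contig_edge_a or contig_edge_b)
    raise ValueError("unknown mode: %r" % (mode,))
```

For example, `use_lcs("auto", False, True)` selects lcs mode for that pair, while `use_lcs("auto", False, False)` falls back to the global comparison.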
-
Jorge Navarro Muñoz authored
-
- 29 Sep, 2017 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 28 Sep, 2017 2 commits
-
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
-
- 27 Sep, 2017 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 26 Sep, 2017 1 commit
-
-
Jorge Navarro Muñoz authored
Bugfix: BGCs from Others class would get added twice to that class. This happened once when finding BGCs with mixed-type annotations, then again if any of its sub-annotations was also from the Others class.
Improvement: Only read fasta and pfd files once when calling GCFs (instead of doing it at each cutoff)
Improvement: Pass reduced distance matrix to clusterJsonBatch (so we can get rid of it quickly)
Improvement: Pass already sorted list of BGCs to clusterJsonBatch
Improvement: improved human readability of .js file
Improvement: Cleaned and re-ordered arguments (a bit)
Other: Changed clan-calling parameter to '--clans'
-
- 13 Sep, 2017 5 commits
-
-
Jorge Navarro Muñoz authored
-
Jorge Navarro Muñoz authored
Note that they are currently different from the official bundle on MIBiG's page as these ones have been further processed by antiSMASH. You can get this version by downloading the gbk file in each entry's page
-
Jorge Navarro Muñoz authored
Lcs See merge request !6
-
Jorge Navarro Muñoz authored
(cherry picked some changes. Will make further polishing)
# Conflicts:
# Installation Guide.md
# bigscape.py
-
Jorge Navarro Muñoz authored
Clans clustering See merge request !5
-
- 12 Sep, 2017 5 commits
-
-
Emzo de los Santos authored
-
Emzo de los Santos authored
1. Distance is average of average of pairwise distances between family members
2. Two arguments for clan cutoff: first is the cutoff value used for family assignment, second is the cutoff value for clans clustering
3. Fixed bug in distance calculation (max of similarities instead of famSimilarities)
4. Added clan_cutoff value to cutoffs if it's not there yet
5. Clans TSV file
6. Reworked clustering TSV file to take advantage of data structures
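One possible reading of item 1, sketched under assumptions: for each member of one family, average its distances to all members of the other family, then average those per-member values. The function name `family_distance`, the `frozenset`-keyed distance table and the BGC names are hypothetical; the commit does not show how distances are stored.

```python
def family_distance(fam_a, fam_b, dist):
    """Average of per-member average distances between two families.

    fam_a, fam_b: lists of BGC names.
    dist: dict mapping frozenset({bgc_x, bgc_y}) -> pairwise distance
          (hypothetical storage format).
    """
    per_member = []
    for a in fam_a:
        # average distance from this member to every member of fam_b
        row = [dist[frozenset((a, b))] for b in fam_b]
        per_member.append(sum(row) / len(row))
    # average of the per-member averages
    return sum(per_member) / len(per_member)


# Illustrative values only.
dist = {
    frozenset(("bgc1", "bgc3")): 0.2,
    frozenset(("bgc2", "bgc3")): 0.4,
}
d = family_distance(["bgc1", "bgc2"], ["bgc3"], dist)
```

With these made-up values `d` is approximately 0.3, which would then be compared against the second clan cutoff argument.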
-
Jorge Navarro Muñoz authored
+ Some code cleanup
-
-
Jorge Navarro Muñoz authored
fixed typo and clarified installation instructions for pySAPC (probably will have to work on that more in the future for people on Macs)
-
- 07 Sep, 2017 1 commit
-
-
Jorge Navarro Muñoz authored
in their start/end positions.
Number of domains in each gene + gene orientation have to be obtained (after having pfd information)

Prepare A_string and B_string. Each is a list of concatenated pfam ids. Concatenation follows downstream orientation of each gene, e.g.:
  A_domlist = a b c d e f g    List of pfam ids as found in the BGC
  dcg_a     = 1 3 1 2          Number of domains per each gene in the BGC
  go_a      = 1 -1 -1 1        Orientation of each Gene
  A_string  = a dcb e fg       List of concatenated domains (note reverse order of bcd)

SequenceMatcher from difflib is used to find the largest common slice between A_string and B_string. reverse(B_string) is also tested. Best orientation is kept

Extension of slices:
As expansion is relatively costly, a minimum of 3 overlapped genes are asked for in the seed subcluster
Extension occurs, for each upstream/downstream side:
  Find which is the BGC closest to the end. The slice of this BGC will be extended until the end
  The slice of the other BGC will be extended according to max_score (if both slices have the same length, the extension with the best score will be considered)

Scoring:
Input: slice of 'other' BGC and slice of extended BGC as reference.
For each gene in 'other':
  Try to find gene in reference slice, starting in 'pos_y'
  If not found, decrease score by 'mismatch'
  If found, update score with 'match' + 'match_position'*'gap' and update 'pos_y'
  If score >= max_score, update max_score and current position
Output: position where max_score occurred, max_score.
  Match = +5
  Gap = -2
  Mismatch = -3

If smallest resulting slice has size 5 or bigger:
  If Biosynthetic Genes are in the final slices of both BGCs:
    Use slices for (normal) distance calculation.
  else
    Use the whole range of domains
else
  Use the whole range of domains
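The string preparation and seed search described above can be sketched as follows. This is an illustration under assumptions, not the commit's implementation: the function names `gene_strings` and `lcs_seed` are hypothetical, and reverse(B_string) is assumed to mean reversing the order of the gene words, since each word already follows its own gene's orientation. Slice extension and scoring are left out.

```python
from difflib import SequenceMatcher


def gene_strings(domlist, dcg, go):
    """Build one word per gene from a flat domain list.

    domlist: pfam ids in the order found in the BGC
    dcg:     number of domains in each gene
    go:      orientation of each gene (1 or -1)
    """
    words, pos = [], 0
    for n_domains, orientation in zip(dcg, go):
        doms = domlist[pos:pos + n_domains]
        if orientation == -1:
            doms = doms[::-1]  # follow the gene's downstream direction
        words.append("".join(doms))
        pos += n_domains
    return words


def lcs_seed(a_string, b_string):
    """Return (match, reversed_b) for the longest common slice of genes."""
    def longest(b):
        sm = SequenceMatcher(None, a_string, b, autojunk=False)
        return sm.find_longest_match(0, len(a_string), 0, len(b))

    forward = longest(b_string)
    backward = longest(b_string[::-1])
    # keep the orientation of B that gives the larger slice
    if backward.size > forward.size:
        return backward, True
    return forward, False


# The example from the commit message.
a_string = gene_strings(list("abcdefg"), [1, 3, 1, 2], [1, -1, -1, 1])
# Made-up second BGC: same genes as the tail of A, in the opposite order.
seed, reversed_b = lcs_seed(a_string, ["fg", "e", "dcb"])
```

When `reversed_b` is true, `seed.b` indexes into the reversed B_string, so a caller would need to map it back to the original gene order before reporting it.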
-
- 28 Aug, 2017 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 22 Aug, 2017 1 commit
-
-
Emzo de los Santos authored
-
- 21 Aug, 2017 2 commits
-
-
-
Jorge Navarro Muñoz authored
-
- 18 Aug, 2017 1 commit
-
-
Jorge Navarro Muñoz authored
-
- 10 Aug, 2017 1 commit
-
-
Emzo de los Santos authored
-
- 08 Aug, 2017 2 commits
-
-
Jorge Navarro Muñoz authored
the one that extracted its sequences. Now all input files are opened only once. Plus, this paves the way to having the Biosynthetic Genes annotated by antiSMASH
-
Jorge Navarro Muñoz authored
Also, sanitize input cutoff values a bit more
-