# Turterra

Turterra is a portal for analysing protein families. It consists of two main parts: turterra, which runs a web portal from a folder tree, and turterra-build, which takes a .fasta file and a directory containing templates for homology modelling and creates any files that are missing from the folder tree.

## Installation

Turterra and turterra-build are installed together as follows:

First, we recommend you install [Anaconda](https://www.anaconda.com/products/individual-b) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html). Then, create a new conda environment for turterra and activate it:

```sh
conda create -n turterra python=3.9
conda activate turterra
```

Next, clone the turterra repository into a location of your choice, navigate to the folder, and install turterra.

```sh
git clone https://github.com/TurtleTools/turterra.git
cd turterra
pip install .
```

Most of turterra's dependencies are installed through the provided setup.py file. However, a few (epa-ng, HMMER, MUSCLE and FastTree) need to be installed through conda:

```sh
conda install -c bioconda epa-ng hmmer muscle fasttree
```

Congratulations! Turterra is now installed!

## Turterra folder architecture

To run turterra with your own data, create a folder called 'data' inside the top-level turterra folder. This folder should contain the following files and folders:

```
turterra
    |--data
        |--data.txt
        |--sequences.fasta
        |--sequence_alignment.fasta
        |--smiles.tsv
        |--structure_alignment.fasta
        |--structures
            |--accession1_model.pdb
            |--accession2.pdb
            |--accession3_model.pdb
            |--...
        |--tree.txt
```
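
If you are assembling this folder by hand, a minimal sketch like the one below creates the empty skeleton shown above. It is not part of turterra; it assumes you run it from the top-level turterra folder, and it only creates empty placeholder files and an empty `structures` directory, which you still need to fill with your own data (or let turterra-build create what it can).

```sh
#!/bin/sh
# Run from the top-level turterra folder.
set -e

# Create data/ and data/structures/ (no error if they already exist).
mkdir -p data/structures

# Create empty placeholder files. Existing files are kept as they are;
# touch only updates their modification time.
for f in data.txt sequences.fasta sequence_alignment.fasta smiles.tsv \
         structure_alignment.fasta tree.txt; do
    touch "data/$f"
done
```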

| file name | file contents |
| ------ | ------ |
| data.txt | A tab-separated file with categories in the first row and data for each sequence in the following rows. Any category can be defined; these are the categories that turterra will later be able to filter your data on. Currently, the categories 'Accession', 'Species' and 'Compounds' must always be present. |
| sequences.fasta | A .fasta file containing all the sequences in the analysis, with the accessions specified in data.txt as headers. |
| sequence_alignment.fasta | A .fasta file containing an alignment of all sequences in the analysis, with the accessions specified in data.txt as headers. |
| smiles.tsv | A tab-separated file with the header `Name\tSMILES`, listing all compound names in the analysis and their corresponding structures in [SMILES format](http://opensmiles.org/opensmiles.html). |
| structure_alignment.fasta | A .fasta file containing a structure-based sequence alignment of all sequences in the analysis, with the accessions specified in data.txt as headers. We recommend you create this file with [caretta](https://github.com/TurtleTools/caretta) through turterra-build. |
| structures | Directory containing structural (homology) models for sequences in the analysis in .pdb format. File names should have the format 'accession_model.pdb' for homology-modelled structures, and 'accession.pdb' for crystal structures. Accessions should match the accessions in data.txt. |
| tree.txt | A phylogenetic tree in Newick format. Leaf nodes should be labelled with the accessions specified in data.txt. |
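
Since every file above is linked to data.txt through the accessions, it can help to check that they agree before starting turterra. The sketch below is an illustration, not a turterra command: it writes a tiny data.txt and sequences.fasta with hypothetical placeholder accessions, species and compounds into a scratch directory, then uses awk to list every accession in data.txt that has no matching header in sequences.fasta. It assumes each FASTA header starts with the accession (anything after the first space is ignored) and finds the 'Accession' column by name in the first row. The one-compound-per-row layout of the example is also an assumption.

```sh
#!/bin/sh
set -e

# Scratch directory, so no real data is overwritten.
demo_dir=$(mktemp -d)

# Hypothetical data.txt: tab-separated, categories in the first row.
printf 'Accession\tSpecies\tCompounds\n' > "$demo_dir/data.txt"
printf 'accession1\tSpecies A\tethanol\n' >> "$demo_dir/data.txt"
printf 'accession2\tSpecies B\tethanol\n' >> "$demo_dir/data.txt"
printf 'accession3\tSpecies C\tethanol\n' >> "$demo_dir/data.txt"

# Hypothetical sequences.fasta that deliberately lacks accession3.
printf '>accession1\nMKV\n>accession2\nMKL\n' > "$demo_dir/sequences.fasta"

# Print each accession in DIR/data.txt without a header in DIR/sequences.fasta.
# Usage: missing_accessions DIR
missing_accessions() {
    awk -F '\t' '
        # First file (FASTA): store header text after ">" up to the first space/tab.
        FNR == NR { if (/^>/) { h = substr($0, 2); sub(/[ \t].*/, "", h); seen[h] = 1 } next }
        # Second file, first row: locate the Accession column.
        FNR == 1 { for (i = 1; i <= NF; i++) if ($i == "Accession") col = i; next }
        # Remaining rows: report accessions with no FASTA header.
        !($col in seen) { print $col }
    ' "$1/sequences.fasta" "$1/data.txt"
}

missing=$(missing_accessions "$demo_dir")
echo "Accessions missing from sequences.fasta: ${missing:-none}"
rm -r "$demo_dir"
```

To check your own data, call `missing_accessions data` from the top-level turterra folder; the same idea extends to the alignment files and to the leaf labels in tree.txt.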
