Tree likelihood model
torchtree is designed to infer parameters of phylogenetic models, with every analysis containing at least one tree likelihood object. This object is responsible for calculating the probability of an alignment given a tree and its associated parameters. Below, we describe the structure of a tree likelihood object through its JSON representation.
[
  {
    "id": "taxa",
    "type": "Taxa",
    "taxa": [
      {
        "id": "A",
        "type": "Taxon"
      },
      {
        "id": "B",
        "type": "Taxon"
      },
      {
        "id": "C",
        "type": "Taxon"
      }
    ]
  },
  {
    "id": "alignment",
    "type": "Alignment",
    "datatype": {
      "id": "data_type",
      "type": "NucleotideDataType"
    },
    "taxa": "taxa",
    "sequences": [
      {
        "taxon": "A",
        "sequence": "ACGT"
      },
      {
        "taxon": "B",
        "sequence": "AATT"
      },
      {
        "taxon": "C",
        "sequence": "ACTT"
      }
    ]
  },
  {
    "id": "like",
    "type": "TreeLikelihoodModel",
    "tree_model": {
      "id": "tree",
      "type": "UnrootedTreeModel",
      "newick": "((A:0.1,B:0.2):0.3,C:0.4);",
      "branch_lengths": {
        "id": "branch_lengths",
        "type": "Parameter",
        "tensor": 0.1,
        "full": [
          3
        ]
      }
    },
    "site_model": {
      "id": "sitemodel",
      "type": "ConstantSiteModel"
    },
    "substitution_model": {
      "id": "substmodel",
      "type": "GTR",
      "rates": {
        "id": "gtr_rates",
        "type": "Parameter",
        "tensor": 0.16666,
        "full": [
          6
        ]
      },
      "frequencies": {
        "id": "gtr_frequencies",
        "type": "Parameter",
        "tensor": 0.25,
        "full": [
          4
        ]
      }
    },
    "site_pattern": {
      "id": "patterns",
      "type": "SitePattern",
      "alignment": "alignment"
    }
  }
]
The first object, with type Taxa, defines the taxa in the alignment. Each taxon is defined by an object with type Taxon, which may contain additional information such as sampling date and geographic location.
The second object, with type Alignment, contains the sequences of the taxa defined in the first object.
The third object is a tree likelihood model with type TreeLikelihoodModel, composed of four sub-models:
tree_model
: A tree model extending the TreeModel class, which contains the tree topology and its associated parameters.
site_model
: A site model extending the SiteModel class, which contains rate heterogeneity parameters across sites, if any.
substitution_model
: A substitution model extending the SubstitutionModel class, which contains the parameters of the substitution process.
site_pattern
: A site pattern model extending the SitePattern class, which contains the compressed alignment defined in the alignment object.
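To make the idea of a compressed alignment concrete, the sketch below collapses identical alignment columns into unique site patterns with a count for each. It illustrates the general technique only and is not torchtree's own implementation; compress_alignment is a hypothetical helper name introduced here.

```python
from collections import Counter


def compress_alignment(sequences):
    """Collapse identical alignment columns into unique patterns with counts.

    `sequences` maps taxon name -> aligned sequence (all the same length).
    Returns (taxa, patterns, weights), where each pattern is a tuple holding
    one character per taxon, in the order given by `taxa`.
    """
    taxa = list(sequences)
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*...) walks the alignment column by column.
    columns = zip(*(sequences[taxon] for taxon in taxa))
    # Counter keeps keys in first-seen order (Python 3.7+).
    counts = Counter(columns)
    patterns = list(counts)
    weights = [counts[pattern] for pattern in patterns]
    return taxa, patterns, weights


# The three sequences from the alignment object above.
taxa, patterns, weights = compress_alignment(
    {"A": "ACGT", "B": "AATT", "C": "ACTT"}
)
```

In this small alignment every column is distinct, so compression yields four patterns of weight one. On longer alignments many columns repeat, and the likelihood only needs to be evaluated once per unique pattern and weighted by its count.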
An optional sub-model extending the BranchModel class can be added to the tree likelihood model, using the branch_model key, to model the rate of evolution across branches.
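As a rough illustration of this structure, the following sketch checks a tree likelihood object for the four required keys and summarizes the sub-models it declares. It uses plain dictionaries only and does not call torchtree; the helper names are hypothetical, and treating a string value as a reference to an object defined elsewhere is an assumption based on how "taxa" and "alignment" are referenced by id above.

```python
REQUIRED_KEYS = ("tree_model", "site_model", "substitution_model", "site_pattern")
OPTIONAL_KEYS = ("branch_model",)


def missing_sub_models(likelihood):
    """Return the required sub-model keys absent from a likelihood dict."""
    return [key for key in REQUIRED_KEYS if key not in likelihood]


def describe_sub_models(likelihood):
    """Map each present sub-model key to its declared type or referenced id."""
    summary = {}
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        if key in likelihood:
            value = likelihood[key]
            # Assumption: a sub-model is either an inline object with a
            # "type" field or a string id pointing at another object.
            summary[key] = value["type"] if isinstance(value, dict) else value
    return summary


# A trimmed-down likelihood object that is missing its site_pattern.
likelihood = {
    "id": "like",
    "type": "TreeLikelihoodModel",
    "tree_model": {"id": "tree", "type": "UnrootedTreeModel"},
    "site_model": {"id": "sitemodel", "type": "ConstantSiteModel"},
    "substitution_model": {"id": "substmodel", "type": "GTR"},
}
missing = missing_sub_models(likelihood)
summary = describe_sub_models(likelihood)
```

A check of this kind can catch a sub-model that was accidentally nested inside another object before the configuration is handed to the program.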
In the JSON object above, we have specified a tree likelihood model for an unrooted tree with a GTR substitution model and equal rates across sites.
This modular design allows different tree likelihood models to be defined by combining different sub-models.
For example, to define a tree likelihood model with a proportion of invariant sites, we would change the value of the site_model key to:
{
  "id": "sitemodel",
  "type": "InvariantSiteModel",
  "invariant": {
    "id": "proportion",
    "type": "Parameter",
    "tensor": 0.5
  }
}
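The same substitution can be scripted when several variants of a configuration are needed. The sketch below is a minimal example using only the Python standard library; with_site_model is a hypothetical helper, and it assumes the configuration is a top-level list in which the tree likelihood object has the id "like", as in the example above.

```python
import copy
import json


def with_site_model(config, site_model, likelihood_id="like"):
    """Return a copy of `config` whose tree likelihood uses `site_model`."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    for obj in new_config:
        if obj.get("type") == "TreeLikelihoodModel" and obj.get("id") == likelihood_id:
            obj["site_model"] = site_model
            return new_config
    raise KeyError(f"no TreeLikelihoodModel with id {likelihood_id!r}")


invariant_site_model = {
    "id": "sitemodel",
    "type": "InvariantSiteModel",
    "invariant": {"id": "proportion", "type": "Parameter", "tensor": 0.5},
}

# A trimmed-down configuration; a real one would also hold the taxa,
# the alignment, and the remaining sub-models.
config = [
    {
        "id": "like",
        "type": "TreeLikelihoodModel",
        "site_model": {"id": "sitemodel", "type": "ConstantSiteModel"},
    }
]

updated = with_site_model(config, invariant_site_model)
updated_json = json.dumps(updated, indent=2)  # ready to write to a file
```

Working on a deep copy keeps the original configuration available, so several site models can be derived from one base file.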