Fitting a Pagel '94 model with hidden-rates using phytools (2024)

A friend & colleague recently asked me how to fit a Pagel (1994) type model, but in which our binary traits could also be affected by a third, hidden trait. The Pagel ’94 model, remember, is often described as a correlated discrete character evolution – but it would be perhaps more apt to characterize it as a state-dependent trait evolution model, in which the rate of evolution of character 2 might dependent on character 1 and/or vice versa. This might also be a correlated evolutionary process if the rates of transition between character levels in each trait are such that certain character combinations disproportionately accumulate via the evolutionary process.

A hidden-rates model considers a third possibility – and that is that the rates of evolution in one of our traits (or the other) are affected not only by the second observed trait, but also by some other, unobserved, hidden process that too evolves on the phylogeny.

I’ve no doubt whatsoever that this category of model can be fit to data using the powerful corHMM R package, seeing how it is specialized to this category of model; however, it is also not too difficult to do this using phytools, so I thought that I would show how!

To that end, I’m going to load phytools. Users inclined to follow along should update phytools from its GitHub page.

library(phytools)packageVersion("phytools")

## [1] '2.3.6'

Next, I’m going to simulate a discrete character history on my tree using phytools::sim.history. This will ultimately be the history of my unobserved trait. I chose the computer seed for the example because I know from simulation that it will produce a spuriously significant Pagel ’94 model fit.

set.seed(12)tree<-pbtree(n=400,scale=1)Q<-matrix(c(-0.4,0.4,0.4,-0.4),2,2,dimnames=list(0:1,0:1))tt<-sim.history(tree,Q,message=FALSE)plot(tt,direction="upwards", colors=setNames(c("blue","red"),0:1), ftype="off")

Next, let’s simulate a second character, x, whose rate of evolution between two trait levels (0 and 1) depends on the mapped state in tt. We can do this using phytools::sim.multiMk.

Q<-setNames(list( matrix(c(-0.5,0.5,0.5,-0.5),2,2,dimnames=list(0:1,0:1)), matrix(c(-10,10,10,-10),2,2,dimnames=list(0:1,0:1))),0:1)Q

## $`0`## 0 1## 0 -0.5 0.5## 1 0.5 -0.5## ## $`1`## 0 1## 0 -10 10## 1 10 -10

This is a simulation condition in which the red mapped edges on our tree have a twenty fold higher rate of evolution for character x than do the blue edges.

x<-sim.multiMk(tt,Q)head(x,10)

## t22 t190 t253 t254 t156 t7 t143 t144 t354 t355 ## 0 0 0 0 0 0 1 0 0 0 ## Levels: 0 1

Next, we can simulate another character independently of x, that does not depend on the mapped trait.

y<-sim.Mk(tree,Q[[1]])head(y,10)

## t22 t190 t253 t254 t156 t7 t143 t144 t354 t355 ## 0 0 0 0 0 0 0 0 0 0 ## Levels: 0 1

Since we have our data, let’s plot it.

plotFanTree.wTraits(tree,cbind(x,y),ftype="off",part=0.5)

Next, we can fit our Pagel ‘94 model for trait x and y. If you’re following along, this should be significant.

pagel_fit<-fitPagel(tree,x,y,rand_start=TRUE)pagel_fit

## ## Pagel's binary character correlation test:## ## Assumes "ARD" substitution model for both characters## ## Independent model rate matrix:## 0|0 0|1 1|0 1|1## 0|0 -2.473531 0.760731 1.712800 0.000000## 0|1 0.418071 -2.130871 0.000000 1.712800## 1|0 3.236643 0.000000 -3.997374 0.760731## 1|1 0.000000 3.236643 0.418071 -3.654714## ## Dependent (x & y) model rate matrix:## 0|0 0|1 1|0 1|1## 0|0 -1.359170 0.623445 0.735725 0.000000## 0|1 0.597380 -3.728825 0.000000 3.131445## 1|0 2.800125 0.000000 -3.807071 1.006945## 1|1 0.000000 3.993172 0.216549 -4.209721## ## Model fit:## log-likelihood AIC## independent -323.2707 654.5413## dependent -316.4617 648.9234## ## Hypothesis test result:## likelihood-ratio: 13.6179 ## p-value: 0.00861993 ## ## Model fitting method used was fitMk

OK, now fitting a hidden-rates model is no more complicated that first pulling out our input data for the Pagel ‘94 model, then doubling it up, and building an appropriate design matrix for the model. This may be hard to believe, but it’s true! Let’s do it.

## original dataX<-to.matrix(pagel_fit$data,levels(pagel_fit$data))head(X,10)

## 0|0 0|1 1|0 1|1## t22 1 0 0 0## t190 1 0 0 0## t253 1 0 0 0## t254 1 0 0 0## t156 1 0 0 0## t7 1 0 0 0## t143 0 0 1 0## t144 1 0 0 0## t354 1 0 0 0## t355 1 0 0 0

## double it uptmp<-X## add hidden charactercolnames(tmp)<-paste(colnames(tmp),"|+",sep="")colnames(X)<-paste(colnames(X),"|-",sep="")X<-cbind(X,tmp)## new datahead(X,10)

## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+## t22 1 0 0 0 1 0 0 0## t190 1 0 0 0 1 0 0 0## t253 1 0 0 0 1 0 0 0## t254 1 0 0 0 1 0 0 0## t156 1 0 0 0 1 0 0 0## t7 1 0 0 0 1 0 0 0## t143 0 0 1 0 0 0 1 0## t144 1 0 0 0 1 0 0 0## t354 1 0 0 0 1 0 0 0## t355 1 0 0 0 1 0 0 0

(Here our hidden trait is coded with the two levels + and -.)

Now comes the “harder” part: we have to make a model design matrix that corresponds to our hypotheses.

To start with, I’m going to build a Pagel ’94 model. Why? Just to show that it results in the same parameter estimates and model fit that we obtained using phytools::fitPagel. All this entails is constructing a design matrix in which the transitions between (say) state 0 and state 1 for x can be influenced by the level of y (and vice versa), but not by a hidden character.

See Also

msm function - RDocumentation

Here’s what that design matrix looks like.

D1<-matrix(0,ncol(X),ncol(X), dimnames=list(colnames(X),colnames(X)))D1[1,]<-c( 0, 1, 2, 0, 3, 0, 0, 0)D1[2,]<-c( 4, 0, 0, 5, 0, 3, 0, 0)D1[3,]<-c( 6, 0, 0, 7, 0, 0, 3, 0)D1[4,]<-c( 0, 8, 9, 0, 0, 0, 0, 3)D1[5,]<-c(10, 0, 0, 0, 0, 1, 2, 0)D1[6,]<-c( 0,10, 0, 0, 4, 0, 0, 5)D1[7,]<-c( 0, 0,10, 0, 6, 0, 0, 7)D1[8,]<-c( 0, 0, 0,10, 0, 8, 9, 0)D1

## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+## 0|0|- 0 1 2 0 3 0 0 0## 0|1|- 4 0 0 5 0 3 0 0## 1|0|- 6 0 0 7 0 0 3 0## 1|1|- 0 8 9 0 0 0 0 3## 0|0|+ 10 0 0 0 0 1 2 0## 0|1|+ 0 10 0 0 4 0 0 5## 1|0|+ 0 0 10 0 6 0 0 7## 1|1|+ 0 0 0 10 0 8 9 0

We can verify it’s correct by turning it into a "Qmatrix" object and plotting it.

plot(as.Qmatrix(D1),show.zeros=FALSE,color=TRUE, logscale=FALSE,palette=rainbow(n=20,start=0.2), mar=rep(0.1,4))

Rather than rates, these different integer values just show the different types of allowed transitions in the model. We can see it’s not a hidden-rates model because transitions of the same type in the observed characters have the same integer indices regardless of the level of the hidden trait (+ or -).

fit1<-fitMk(tree,X,model=D1,lik.func="pruning",rand_start=TRUE)fit1

## Object of class "fitMk".## ## Fitted (or set) value of Q:## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+## 0|0|- -1.925536 0.623445 0.735725 0.000000 0.566366 0.000000 0.000000 0.000000## 0|1|- 0.597380 -4.295191 0.000000 3.131445 0.000000 0.566366 0.000000 0.000000## 1|0|- 2.800121 0.000000 -4.373431 1.006944 0.000000 0.000000 0.566366 0.000000## 1|1|- 0.000000 3.993181 0.216549 -4.776096 0.000000 0.000000 0.000000 0.566366## 0|0|+ 0.745452 0.000000 0.000000 0.000000 -2.104623 0.623445 0.735725 0.000000## 0|1|+ 0.000000 0.745452 0.000000 0.000000 0.597380 -4.474277 0.000000 3.131445## 1|0|+ 0.000000 0.000000 0.745452 0.000000 2.800121 0.000000 -4.552517 1.006944## 1|1|+ 0.000000 0.000000 0.000000 0.745452 0.000000 3.993181 0.216549 -4.955182## ## Fitted (or set) value of pi:## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+ ## 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 ## due to treating the root prior as (a) flat.## ## Log-likelihood: -316.461718 ## ## Optimization method used was "nlminb"## ## R thinks it has found the ML solution.

If we compare this back to the model we obtained using phytools::fitPagel, we (hopefully) see that it has both the same parameter estimates and the same likelihood. Cool! So far so good.

Next, let’s proceed to a model in which both the observed character levels for x and y and the hidden character affect the rates of trait. The design matrix for this model looks as follows.

D2<-matrix(0,ncol(X),ncol(X), dimnames=list(colnames(X),colnames(X)))D2[1,]<-c( 0, 1, 2, 0, 3, 0, 0, 0)D2[2,]<-c( 4, 0, 0, 5, 0, 3, 0, 0)D2[3,]<-c( 6, 0, 0, 7, 0, 0, 3, 0)D2[4,]<-c( 0, 8, 9, 0, 0, 0, 0, 3)D2[5,]<-c(10, 0, 0, 0, 0,11,12, 0)D2[6,]<-c( 0,10, 0, 0,13, 0, 0,14)D2[7,]<-c( 0, 0,10, 0,15, 0, 0,16)D2[8,]<-c( 0, 0, 0,10, 0,17,18, 0)D2

## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+## 0|0|- 0 1 2 0 3 0 0 0## 0|1|- 4 0 0 5 0 3 0 0## 1|0|- 6 0 0 7 0 0 3 0## 1|1|- 0 8 9 0 0 0 0 3## 0|0|+ 10 0 0 0 0 11 12 0## 0|1|+ 0 10 0 0 13 0 0 14## 1|0|+ 0 0 10 0 15 0 0 16## 1|1|+ 0 0 0 10 0 17 18 0

plot(as.Qmatrix(D2),show.zeros=FALSE,color=TRUE, logscale=FALSE,palette=rainbow(n=20,start=0.2), mar=rep(0.1,4))

Obviously, this model has substantially more parameters to estimate. Let’s fit it.

fit2<-fitMk(tree,X,model=D2,lik.func="pruning",rand_start=TRUE)fit2

## Object of class "fitMk".## ## Fitted (or set) value of Q:## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+## 0|0|- -1.654085 0.556445 0.240598 0.000000 0.857042 0.000000 0.000000 0.000000## 0|1|- 0.100805 -15.226264 0.000000 14.268417 0.000000 0.857042 0.000000 0.000000## 1|0|- 6.057901 0.000000 -6.914943 0.000000 0.000000 0.000000 0.857042 0.000000## 1|1|- 0.000000 13.506162 0.477997 -14.841201 0.000000 0.000000 0.000000 0.857042## 0|0|+ 0.755982 0.000000 0.000000 0.000000 -5.728516 1.359871 3.612663 0.000000## 0|1|+ 0.000000 0.755982 0.000000 0.000000 0.902253 -1.816693 0.000000 0.158457## 1|0|+ 0.000000 0.000000 0.755982 0.000000 0.000000 0.000000 -2.476454 1.720472## 1|1|+ 0.000000 0.000000 0.000000 0.755982 0.000000 0.000000 0.000000 -0.755982## ## Fitted (or set) value of pi:## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+ ## 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 ## due to treating the root prior as (a) flat.## ## Log-likelihood: -301.105356 ## ## Optimization method used was "nlminb"## ## R thinks it has found the ML solution.

Regardless of whether or not evolution in x or y is well-explained by a hidden-rates process, this model should always fit at least as well as our Pagel ’94 model – but it has the latter as a special case.

Finally, we can also consider a model in which the hidden character affects the rates of evolution of our trait, but in which the two observed traits don’t affect each other. We can see that this is a hidden-trait only state-dependent model because the rates of x and y depend only on the hidden character in our design matrix.

D3<-matrix(0,ncol(X),ncol(X), dimnames=list(colnames(X),colnames(X)))D3[1,]<-c( 0, 1, 2, 0, 3, 0, 0, 0)D3[2,]<-c( 4, 0, 0, 2, 0, 3, 0, 0)D3[3,]<-c( 5, 0, 0, 1, 0, 0, 3, 0)D3[4,]<-c( 0, 5, 4, 0, 0, 0, 0, 3)D3[5,]<-c( 6, 0, 0, 0, 0, 7, 8, 0)D3[6,]<-c( 0, 6, 0, 0, 9, 0, 0, 8)D3[7,]<-c( 0, 0, 6, 0,10, 0, 0, 7)D3[8,]<-c( 0, 0, 0, 6, 0,10, 9, 0)D3

## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+## 0|0|- 0 1 2 0 3 0 0 0## 0|1|- 4 0 0 2 0 3 0 0## 1|0|- 5 0 0 1 0 0 3 0## 1|1|- 0 5 4 0 0 0 0 3## 0|0|+ 6 0 0 0 0 7 8 0## 0|1|+ 0 6 0 0 9 0 0 8## 1|0|+ 0 0 6 0 10 0 0 7## 1|1|+ 0 0 0 6 0 10 9 0

plot(as.Qmatrix(D3),show.zeros=FALSE,color=TRUE, logscale=FALSE,palette=rainbow(n=20,start=0.2), mar=rep(0.1,4))

Let’s fit it.

fit3<-fitMk(tree,X,model=D3,lik.func="pruning",rand_start=TRUE)fit3

## Object of class "fitMk".## ## Fitted (or set) value of Q:## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+## 0|0|- -1.742069 0.638807 0.417643 0.000000 0.685618 0.000000 0.000000 0.000000## 0|1|- 0.829804 -1.933065 0.000000 0.417643 0.000000 0.685618 0.000000 0.000000## 1|0|- 0.000000 0.000000 -1.324425 0.638807 0.000000 0.000000 0.685618 0.000000## 1|1|- 0.000000 0.000000 0.829804 -1.515422 0.000000 0.000000 0.000000 0.685618## 0|0|+ 0.000000 0.000000 0.000000 0.000000 -9.748353 1.053756 8.694596 0.000000## 0|1|+ 0.000000 0.000000 0.000000 0.000000 0.288720 -8.983316 0.000000 8.694596## 1|0|+ 0.000000 0.000000 0.000000 0.000000 8.559675 0.000000 -9.613432 1.053756## 1|1|+ 0.000000 0.000000 0.000000 0.000000 0.000000 8.559675 0.288720 -8.848395## ## Fitted (or set) value of pi:## 0|0|- 0|1|- 1|0|- 1|1|- 0|0|+ 0|1|+ 1|0|+ 1|1|+ ## 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 ## due to treating the root prior as (a) flat.## ## Log-likelihood: -305.737418 ## ## Optimization method used was "nlminb"## ## R thinks it has found the ML solution.

If we compare all three models we’ll find that the third, our hidden-trait dependent only model is best-supported, after controlling for parameter complexity. This is pretty satisfying since the significant Pagel ’94 model fit, in this case, was simulated to be spurious.

anova(fit1,fit2,fit3)

## log(L) d.f. AIC weight## fit1 -316.4617 10 652.9234 0.000021## fit2 -301.1054 18 638.2107 0.033312## fit3 -305.7374 10 631.4748 0.966667

We could’ve also fit a model in which x depended on y but not the converse, it’s inverse, and even the equivalent for our hidden trait. All of those merely require updating our design matrices from above.

That’s it for now, folks.

Fitting a Pagel '94 model with hidden-rates using <i>phytools</i> (2024)