# pvclust Node Values In ggtree

** Published:**

I recently was using the R package `pvclust`

to test the ‘robusticity’ of clusters in a microbiome-related clustering analysis. While `pvclust`

provides it’s own plots via `plot()`

on a `pvclust`

object, this plots the dendrogram in base R. For readability and customisability reasons I prefer using the packages `ggplot2`

and `ggtree`

for making my figures. However, I was having a hard time to extract the node uncertainty values from the `pvclust`

object and integrate them into a generic R `phylo`

object for plotting the dendrogram in ggtree.

Fortunately, a bit of googling (a couple of days…) showed me someone else had already solved the problem of transferring additional `hclust`

information into a `phylo`

object but in a different context. The `fastbap`

package has the function `as.phylo.hclust.node.attributes()`

` which essentially does what I needed to do - i.e. when calculating the node numbers from each merge event, also store with the same node number the corresponding attribute, or in this case, `

pvclust` AU value.

I then modified this function slightly to make it more consistent with how `pvclust`

will display the values in the base R plot (rounding and converting to a ‘percentage’). Note that this outputs in the `phylo`

object the metadata `node.label`

, not as `node.attributes`

as in the original `fastbaps`

function.

So in summary:

```
## 1. Make pvclust object e.g.
hclust_boot <- pvclust::pvclust(otu_matrix,
method.hclust = selected_method,
method.dist = "euclidean",
nboot = 1000,
parallel = T)
## 2. Set modified fastbaps function
as.phylo.pvclust.node.attributes <- function(x, attribute)
{
N <- dim(x$merge)[1]
edge <- matrix(0L, 2*N, 2)
edge.length <- numeric(2*N)
## `node' gives the number of the node for the i-th row of x$merge
node <- integer(N)
node[N] <- N + 2L
node.attributes <- rep(NA, N)
cur.nod <- N + 3L
j <- 1L
for (i in N:1) {
edge[j:(j + 1), 1] <- node[i]
for (l in 1:2) {
k <- j + l - 1L
y <- x$merge[i, l]
if (y > 0) {
edge[k, 2] <- node[y] <- cur.nod
cur.nod <- cur.nod + 1L
edge.length[k] <- x$height[i] - x$height[y]
node.attributes[edge[k, 1] - (N + 1)] <- attribute[i]
} else {
edge[k, 2] <- -y
edge.length[k] <- x$height[i]
node.attributes[edge[k, 1] - (N + 1)] <- attribute[i]
}
}
j <- j + 2L
}
if (is.null(x$labels))
x$labels <- as.character(1:(N + 1))
## MODIFICATION: clean up node.attributes so they are in same format in
## pvclust plots
node.attributes <- as.character(round(node.attributes * 100, 0))
node.attributes[1] <- NA
obj <- list(edge = edge, edge.length = edge.length / 2,
tip.label = x$labels, Nnode = N, node.label = node.attributes)
class(obj) <- "phylo"
stats::reorder(obj)
}
## 3. Use the modified fastbaps function by accessing the hclust object in first
## position, and the corresponding au values from the edges list entry.
hclust_boot_phylo <- as.phylo.pvclust.node.attributes(hclust_boot$hclust,
hclust_boot$edges$au)
```

To display the values on the tree with `ggtree`

you can then run

```
ggtree(hclust_boot_phylo aes(x, y)) +
geom_text2(aes(subset = !isTip, label = label))
```

as described in the `ggtree`

FAQ under the section “bootstrap values from newick format”.