SplitsTree App User Manual

Daniel H. Huson and David Bryant

SplitsTree App (version 6.4.11, built 19 Dec 2024)

1 Using SplitsTree
1.1 Getting started
1.2 Layout of the main window
1.3 Main toolbar items
1.4 The main tabs
1.5 Alignment tab
1.6 Tree-View tab
1.7 Tree-Pages tab
1.8 Tanglegram tab
1.9 DensiTree tab
1.10 Split-Network tab
1.11 Network tab
1.12 World map tab
1.13 Workflow graph tab
1.14 How to cite tab
1.15 Input editor tab
1.16 Report tabs
1.17 Text tabs
1.18 The sidebar
1.19 The draft genome dialog
2 Building trees and networks
2.1 Using the workflow
2.2 Building trees
2.3 Neighbor Net and other split network methods
  2.3.1 Neighbor Net
  2.3.2 Manipulating split networks
  2.3.3 Split Decomposition
  2.3.4 Splits in characters
2.4 Haplotype networks
  2.4.1 Minimum spanning network
  2.4.2 Median Joining network
2.5 Rooted phylogenetic networks
  2.5.1 Implicit vs explicit trees and networks
  2.5.2 Hybridization networks
  2.5.3 How to compute a rooted network from rooted trees
  2.5.4 Cluster networks
3 Consensus trees and networks
3.1 Consensus trees
  3.1.1 Average consensus method
  3.1.2 Strict-, majority- and greedy-consensus methods
  3.1.3 Densi-tree consensus
3.2 Networks representing trees
  3.2.1 Consensus networks
  3.2.2 Consensus outline
  3.2.3 Confidence networks
A The main menu bar
A.1 The File menu
A.2 The Edit menu
A.3 The Select menu
A.4 The View menu
A.5 The Data menu
A.6 The Distances menu
A.7 The Tree menu
A.8 The Network menu
A.9 The Analysis menu
A.10 The Window menu
A.11 The Help menu
B Main data blocks
B.1 Taxa block
B.2 Traits block
B.3 Characters block
B.4 Distances block
B.5 Trees block
B.6 Splits block
B.7 Network block
B.8 View block
B.9 Algorithms block
B.10 Report block
B.11 Sets block
B.12 SplitsTree6 block
B.13 Genomes block
C Algorithms
C.1 Algorithms on a Characters Block
C.2 Algorithms on a Distances Block
C.3 Algorithms on a Splits Block
C.4 Algorithms on a Trees Block
C.5 Algorithms on a Network Block
D Supported import and export formats
D.1 Supported import formats
  D.1.1 Importers for a characters block
  D.1.2 Importers for a distances block
  D.1.3 Importers for a trees block
  D.1.4 Importers for a splits block
  D.1.5 Importers for a network block
  D.1.6 Importers for a genomes block
D.2 Supported output formats
  D.2.1 Exporters for a taxa block
  D.2.2 Exporters for a characters block
  D.2.3 Exporters for a distances block
  D.2.4 Exporters for a trees block
  D.2.5 Exporters for a splits block
  D.2.6 Exporters for a network block
  D.2.7 Exporters for a genomes block
  D.2.8 Exporters for a view block
D.3 Taxon display labels import
D.4 Traits import
E Workflow
E.1 Input and working nodes
E.2 Data and algorithm nodes
E.3 Exporting the workflow
E.4 Running a workflow on multiple datasets
F Styling labels

Introduction

The SplitsTree App is new software for exploring and analyzing phylogenetic data, with an emphasis on phylogenetic networks. Offering a comprehensive set of features, the software provides over 100 algorithms for computing distances, phylogenetic trees, split networks, haplotype networks, rooted phylogenetic networks, tanglegrams, consensus trees and consensus networks.

This new software [Huson and Bryant, 2024] is designed to accommodate the increasing scale and intricacy of modern data sets. It extends, integrates and supersedes our earlier applications SplitsTree4 [Huson and Bryant, 2006] for unrooted phylogenetic trees and networks, Dendroscope3 [D. H. Huson, 2012] for rooted trees and networks, and PopArt [Leigh and Bryant, 2015] for haplotype analysis.

If you use this program, then please cite:

Daniel H. Huson and David Bryant. The SplitsTree App: interactive analysis and visualization using phylogenetic trees and networks. Nature Methods (2024) https://doi.org/10.1038/s41592-024-02406-3.

Figure 1: Example of SplitsTree analysis of primate mtDNA.

Chapter 1
Using SplitsTree

In this chapter we give an overview of how the interface of the SplitsTree app is organised. Later we provide more details on the actual methods and procedures.

1.1 Getting started

To get started using this program, download the latest installer from https://software-ab.cs.uni-tuebingen.de/download/splitstree6 for Linux, Mac OS X or Windows, and install the program on your computer. Versions for iOS and Android are being tested.

Launch the program by double-clicking the program icon or launch it from the command line (Linux).

Use the File->Open... menu item to open a file containing data in one of the supported formats (see Chapter D).

If the data you provide is a set of characters (or a multiple sequence alignment), then by default, the SplitsTree app will compute P-distances (see Section C.1) and then will run Neighbor Net (see Section C.2) to obtain a split network (see Section 1.10). If you provide a distance matrix, then this will also result in a split network being displayed. If you supply trees, then the first tree will be displayed (see Section 1.6).

Here is a toy example of characters data. You can copy this text from the manual and then paste it onto the import data button (see Section 1.3) to obtain the network shown in the figure (see Fig. 1.1).

6 64
Taxon1 TAAGTAGATCGGAGTTTTTACTCGTGTGATTTTGGGTATTTTTTATTTAGATTATGAAATTATA
Taxon2 CTTAATATATAATGATATTACTTAAACATTATTAAATGATACACTAACTATAATTATTGAACAT
Taxon3 AAAATTATATAATATAACATTATATTCATTACCACAAGATTATATTATAAAATATATTGTACAC
Taxon4 TTCAATATATAATGAAACTTTATAAATAACTTTAGAAATCTTATAAAAAAATATCGACGAACAA
Taxon5 TAAGTAGATCTGAGTTTTTACTCGTGTGATTTTGGGTATTTTTTATTTAGATTATGAAATTATA
Taxon6 TAAATTGGATAATATTTTATTATGTGTTACTACAGAAATCATTAATTATATAAACGATATACAA

Figure 1.1: Neighbor net computed for toy characters data.
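The first step of the default analysis, computing P-distances from characters, can be illustrated with a short Python sketch. It assumes that the P-distance between two aligned sequences is simply the proportion of sites at which they differ; how the program treats gaps and ambiguity codes is not modelled here. The function names and the shortened three-taxon input are illustrative and are not part of SplitsTree.

```python
from itertools import combinations


def parse_characters(text):
    """Parse a header line 'ntax nchar' followed by one 'name sequence' line per taxon."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    ntax, nchar = int(lines[0][0]), int(lines[0][1])
    seqs = {name: seq.upper() for name, seq in lines[1:]}
    if len(seqs) != ntax or any(len(s) != nchar for s in seqs.values()):
        raise ValueError("header does not match the data")
    return seqs


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ (assumed definition)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def p_distance_matrix(seqs):
    """Return {(name1, name2): distance} for every unordered pair of taxa."""
    return {(s, t): p_distance(seqs[s], seqs[t]) for s, t in combinations(seqs, 2)}


# Hypothetical, shortened input in the same layout as the toy example above.
toy = """3 8
TaxonA ACGTACGT
TaxonB ACGTACGA
TaxonC TCGTACGA
"""
for pair, dist in p_distance_matrix(parse_characters(toy)).items():
    print(pair, dist)
```

The resulting matrix is the kind of input that Neighbor Net then turns into a set of weighted splits.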

1.2 Layout of the main window

In the SplitsTree app, you can open one or more documents and each one has its own main window. Different analyses of the same data in the same document are shown in different tabs in that window (see Fig. 1.2). Tabs can be laid out side-by-side.

Figure 1.2: The main window. (a) Text and graphic output are presented in the main tabs on the right-hand side of the window. (b) There is a side-bar on the left hand side that shows (c) the workflow at the top and (d) tabs for setting algorithm parameters at the bottom. At the top left (e) there are some document-specific tool-bar items whereas the items at the top right (f) apply to the current main tab. While all program features can be directly accessed from the main window, the menu bar (g) provides alternative access to many of the features.

Each document opened in SplitsTree has its own main window. The main window has the following parts:

A menu bar providing menu items to access features of the program. Note that all features of the program can also be accessed from within the main window (see Fig. 1.2) (figure part g).
A toolbar containing both document- and tab-specific items (figure parts e,f).
A main tab pane that contains all the text and data tabs (figure part a).
A sidebar providing access to the workflow (figure parts b,c).
An algorithms tab pane (inside the sidebar) for parameterizing and running algorithms (figure parts b,d).

1.3 Main toolbar items

There are three document-specific toolbar items on the left side of the main toolbar and several tab-specific toolbar items on the right (see Fig. 1.2 and Fig. 1.3).

Figure 1.3: The main toolbar has three document-specific items on the left: Files menu button, import data button and sidebar toggle button. It has several tab-specific ones on the right: Undo and redo, font resizing, select all/none, find/replace and export.

Document specific toolbar items:

The files menu button - provides a menu that contains file-related items from the main files menu and a list of recently opened documents.
The import data button - use this to open any string or file from the system clipboard, or via drag-and-drop, in a new SplitsTree document.
The sidebar toggle button - use this to show or hide the sidebar.

Tab specific toolbar items (that apply only to the currently selected main tab):

An undo button and redo button.
An increase font size button and a decrease font size button.
A selection button to select all or none.
A find button that you press to open the find dialog, press again to open the replace dialog (if available) and once again to close the dialog.
An export button that gives access to items for copying, exporting or printing the data or image associated with the current tab. An additional item is provided for showing or hiding the QR code associated with a given tree or network.

1.4 The main tabs

The main window uses a tabbed pane to present all text and visualizations (see Fig. 1.2). Here is an overview of the supported tabs:

Alignment tab - provides a visualization of the input multiple sequence alignment.
Tree-View tab - shows a phylogenetic tree or rooted network.
Tree-Pages tab - shows pages of phylogenetic trees or rooted networks.
Tanglegram tab - shows a tanglegram of trees or rooted networks [Scornavacca et al., 2011].
Densi-Tree tab - shows a densi-tree visualization of a profile of trees [Bouckaert, 2010].
Split-Network tab - shows a split network [Dress and Huson, 2004] or phylogenetic outline [Bagci et al., 2021].
Network tab - shows a network such as a haplotype network [Bandelt et al., 1999].
World map tab - shows a map of the world and displays traits data that have associated latitude and longitude coordinates.
Workflow tab - provides access to the workflow graph.
How to cite tab - provides a description of the data and algorithms used, and provides the necessary citations.
Input editor tab - provides an interactive editor for entering and parsing input data.
Report tabs - these are used to present the results of analyses such as phylogenetic diversity.
Text tabs - any of the data blocks can be displayed in such a text tab, in a choice of several different formats.

1.5 Alignment tab

The Alignment tab provides a visualization of the input characters or multiple sequence alignment (see Fig. 1.4).

Figure 1.4: The alignment tab for displaying and working with a multiple sequence alignment.

The alignment tab has a drop-down menu button at the left that contains items for selecting sites. There is a button for selecting a color scheme. There is a button for toggling between a close-up view and a total view of the alignment, and buttons for zooming in and out both vertically and horizontally.

1.6 Tree-View tab

The Tree-View tab shows a phylogenetic tree or a rooted network (see Fig. 1.5).

Figure 1.5: The tree-view tab for drawing a phylogenetic tree or rooted network.

The tree view tab has a toolbar and side panel that are hidden by default, but can be opened using two toggle buttons at the top right of the tab.

The toolbar provides items for selecting how to draw the tree (or rooted network); the choices are between rectangular, circular and radial cladogram or phylogram. There is a button that toggles the scale bar and additional information (such as name of the tree, if any, and number of nodes, edges and leaves). In addition, there are buttons for rotating, flipping and zooming.

The side panel (on the right) contains items for styling the taxon labels and for adding marks to the taxa. These are colored shapes that appear next to the taxon label. There are items for displaying taxon traits, if present. Moreover, there are items for setting the line width of edges and to choose whether to label edges by associated weights or confidence values.

1.7 Tree-Pages tab

The Tree-Pages tab shows pages of phylogenetic trees (see Fig. 1.6).

Figure 1.6: The tree-pages tab for displaying a collection of phylogenetic trees or rooted networks.

The tree-pages tab has a toolbar and side panel that are hidden by default, but can be opened using two toggle buttons at the top right of the tab.

The toolbar provides items for selecting how to draw the tree (or rooted network); the choices are between rectangular, circular and radial cladogram or phylogram. There is a button that toggles additional information (such as name of the tree, if any, and number of nodes, edges and leaves). In addition, there are buttons for rotating, flipping and zooming. At the right side of the toolbar, there is a text input field for setting the dimensions of a page in the format rows x cols.

The side panel contains items for styling the taxon labels and for adding marks to the taxa.

At the bottom of the tree-pages tab there is a row of buttons that can be used to navigate through the pages.

1.8 Tanglegram tab

The Tanglegram tab shows a tanglegram of trees or rooted networks [Scornavacca et al., 2011] (see Fig. 1.7).

Figure 1.7: The tanglegram tab for comparing two phylogenetic trees or rooted networks.

The tanglegram tab has a toolbar and side panel that are hidden by default, but can be opened using two toggle buttons at the top right of the tab.

The toolbar provides items to determine the first and second trees (from the same file) and for selecting how to draw either tree (or rooted network); the choices are between rectangular phylogram, rectangular cladogram and triangular cladogram (trees only). There is a button that toggles additional information (such as name of the tree, if any, and number of nodes, edges and leaves). In addition, there are buttons for rotating, flipping and zooming.

The side panel contains items for styling the taxon labels and for adding marks to the taxa.

1.9 DensiTree tab

The Densi-Tree tab shows a densi-tree visualization of a Bayesian profile of trees [Bouckaert, 2010] (see Fig. 1.8).

Figure 1.8: The densi-tree tab for displaying a Bayesian profile of phylogenetic trees.

The densi-tree tab has a toolbar and side panel that are hidden by default, but can be opened using two toggle buttons at the top right of the tab.

The toolbar provides items for determining how to draw the trees, choices are between rectangular, triangular, rounded and radial phylogram. (The rounded phylograms are time-consuming to draw and can cause problems.) In addition, there are buttons for rotating, flipping and zooming.

The side panel contains items for styling the taxon labels and for adding marks to the taxa. The line width can be set here. In addition, two colors can be set. The first is used for edges in the tree profile that are compatible with the displayed greedy consensus tree (default color is black), and the second is used for incompatible edges (default color is red).

1.10 Split-Network tab

The Split-Network tab shows a split network [Dress and Huson, 2004] or phylogenetic outline [Bagci et al., 2021] (see Fig. 1.9).

Figure 1.9: The split-network tab for displaying a collection of splits as a split network or phylogenetic outline.

To reshape the layout of the network by rotating the edges associated with one or more selected splits, press-and-drag on the network.

The split-network tab has a toolbar and side panel that are hidden by default, but can be opened using two toggle buttons at the top right of the tab.

The toolbar provides a choice box to determine how to draw the splits. The choices are as a split network, either to-scale or as a topology (with all edges of uniform length), or as a phylogenetic outline, again either to-scale or as a topology. There is a second choice box to determine whether the network is to be drawn unrooted or rooted, using either mid-point rooting or outgroup rooting. The latter requires that some taxa have been selected; these are treated as the outgroup.

In addition, there are buttons for rotating, flipping, zooming and for setting the scale ratio to a specific value to ensure that different networks are drawn to the same scale. Use the rotate buttons to rotate the entire network.

The right side panel contains items for styling the taxon labels and for adding marks to the taxa. The line width and color can be set here. The color of the inner area of an outline can also be set. You can request to have the splits labeled by their weight, their confidence values (if available) or their internal split ids.

1.11 Network tab

The Network tab shows a network (see Fig. 1.10). The network tab has a toolbar and side panel that are hidden by default, but can be opened using two toggle buttons at the top right of the tab.

Figure 1.10: The network tab for displaying haplotype networks and related constructs.

The toolbar provides a choice box to determine how to draw the network. In addition, there are buttons for rotating, flipping and zooming.

The side panel contains items for styling the taxon labels and for adding marks to the taxa. The line width can be set here. In addition, there are items to determine which traits are to be shown in pie charts and whether to show a legend. To change the colors used in pie charts, press on the items in the legend. These are then stored in the traits block in the TRAITCOLOR entry. Also, there is a menu button for determining how to represent character-state changes along an edge. The choices are hatches (short marks), labels, compact labels and counts.

1.12 World map tab

The World Map tab shows a map of the world and places any traits data that comes with latitude and longitude assignments on the map (see Fig. 1.11). To change the colors used in pie charts, press on the items in the legend. They are stored in the traits block in the TAXONCOLOR entry. The world map tab has a toolbar and side panel. The side panel is hidden by default, but can be opened using the toggle button at the top right of the tab.

This tab appears when the input data contains a traits block that has latitude and longitude specifications (see Section B.2).

Figure 1.11: The world map tab for displaying haplotype locations of origin.

There is a Show menu button to determine whether country names, continent names and/or oceans should appear as labels. There is a button to determine whether to show two copies of the map side-by-side for Pacific-centric data. There is a button to zoom to the shown haplotype data.

1.13 Workflow graph tab

The workflow tab provides access to the workflow graph (see Fig. 1.12).

Figure 1.12: The workflow tab provides access to the workflow graph, for advanced users.

The workflow tab has a toolbar that contains a number of items, whose purpose and enabled state depends on which nodes in the workflow graph are currently selected.

The first toolbar item will open the corresponding algorithm, text display or view tab, depending on whether the selected node is an algorithm node, data node or view node. Double-clicking on a node has the same effect.

When a data node is selected, then the second toolbar item can be used to attach an additional algorithm to the data node.

When an algorithm node is selected, then the next two items can be used either to duplicate the analysis, or to delete it, respectively. Each algorithm node also carries a similar menu button.

There are two items for zooming in and out.

1.14 How to cite tab

The How to cite tab provides a description of the data and algorithms used, and provides the necessary citations (see Fig. 1.13).

Figure 1.13: The how-to-cite tab provides a methods summary of the data and algorithms, and provides all suggested references for the methods used.

The toolbar of the how-to-cite tab contains a button to copy the complete or selected content of the tab. There are buttons to turn line-wrapping and line numbers on and off.

If the input is a Nexus or SplitsTree file that contains a comment at the beginning of the file describing the source of the data, then this will be reported at the top of the text area. This is followed by a description of the methods used. Finally, all suggested references are listed.

1.15 Input editor tab

The Input editor tab provides an interactive editor for entering and parsing input data (see Fig. 1.14).

Figure 1.14: The editor tab is used to enter data into a new document. Pressing the run button will parse the data and launch an analysis of the entered data.

The toolbar of the input tab contains a button to copy the complete or selected content of the tab. There are buttons to turn line-wrapping and line numbers on and off.

The program will try to guess which input format the entered text adheres to and will indicate the name of the format in the toolbar. When a valid format has been detected, the run button will be enabled. Pressing the run button will parse the data and launch an analysis of the entered data.

The input editor can be opened from the File menu and is automatically opened when the user imports text or a file into the program that is not in one of the recognized input formats.

1.16 Report tabs

Report tabs are used to present the results of analyses such as Tajima's D, phylogenetic diversity or Shapley values as text (see Fig. 1.15).

Figure 1.15: Report tabs are text tabs that are used to provide the result of an analysis, here the Shapley values for a set of taxa based on splits.

The toolbar of the report tab contains a button to copy the complete or selected content of the tab. There are buttons to turn line-wrapping and line numbers on and off.

1.17 Text tabs

Text tabs are used to show the content of data blocks, in a choice of several different formats (see Fig. 1.16).

Figure 1.16: Text tabs are used to display the content of any of the data nodes in the workflow. Here we show three such tabs, one for characters data, one for distance data and one for splits data.

The toolbar of any text tab contains a button to copy the complete or selected content of the tab. There are buttons to turn line-wrapping and line numbers on and off. There is a format pane that can be used to select the desired display format and to specify any options associated with the format.

Such a text tab can be opened by selecting a data node item in the sidebar and then pressing the show/edit button at the top of the sidebar, or by double-clicking on the item.

1.18 The sidebar

The sidebar (see Fig. 1.2) (figure part b) contains a representation of the workflow as a tree at the top, and the algorithms tab pane at the bottom (figure parts c,d).

The workflow tree view contains a representation of all input data, computed data and algorithms used in the computation. There are three types of nodes:

data nodes that represent data blocks and result blocks,
algorithm nodes that represent algorithms, and
view nodes that represent visualizations.

Double-clicking on a data node will open a text tab displaying the corresponding data, or analysis result, if the data block is a report. Double-clicking on an algorithm node will open the corresponding algorithm tab.

An algorithm tab has a run button (at the right) to execute the algorithm and may contain some optional input items (below) to set parameters of the algorithm.

1.19 The draft genome dialog

The SplitsTree App supports the calculation of the phylogenetic context of a draft prokaryotic genome [Bagci et al., 2021]. One or more files (FastA format) each containing one or more sequences representing draft genomes (or metagenomic assembly bins) can be imported into the program and then compared against a set of GTDB reference genomes [Parks et al., 2018] using mash distances [Ondov et al., 2016] and then represented as a phylogenetic outline.

The dialog is opened using the File->Analyze Draft Genomes... menu item and is set up using three tabs (see Fig. 1.17).

Figure 1.17: The first tab is used to specify the input files, the type of input (DNA or protein sequences) and whether to use files or FastA records as input genomes. Also, specify the output file (and whether to store input sequences as sequences or as references to files). The second tab is used to edit the labels of genomes. The third tab is used to specify the database to compare against (downloaded from the SplitsTree page), the distance to search in, and distance within which to include references.

Chapter 2
Building trees and networks

2.1 Using the workflow

In SplitsTree4, data analysis was based on a simple linear sequence. To construct a Neighbor Net, for example, one might input character data, apply a transform to infer a distance matrix, apply another transform to produce the set of splits in the Neighbor Net and another transform to convert those splits into a network on the screen.

That simplicity came with limitations. For example, to compare the result of analyses using different parameters or distance methods it was necessary to duplicate the whole file and start again.

The SplitsTree App implements a far more sophisticated system for workflows. It is still straightforward to run a simple linear workflow as in SplitsTree4, but it is now possible to branch that workflow at any point, exploring alternative parameters or methods. The use of frames makes it easy to view the results of different analyses side-by-side.

The branching structure of a document’s workflow can be viewed in the side panel (as a hierarchy) or in the workflow panel (as a graph). To illustrate, open the example file ungulates.nex which can be found in the directory publications/WelkerEtal2015 in the Examples directory. By default, the SplitsTree App creates a network by running Neighbor Net and using the p-distance. Switching to the workflow panel displays the (linear) workflow for this initial analysis (see Fig. 2.1).

Figure 2.1: Right hand side of the workflow created when ungulates.nex is opened.

In this graph, nodes correspond to algorithms (indicated by an icon) or data (indicated by an icon). From the algorithm nodes you can edit the parameters of the method. Selecting an algorithm node and clicking the delete button (top of pane) removes that node and any descendants of that node.

Figure 2.2: Attaching a BioNJ algorithm to an existing Distances block.

Suppose we want to compare a network computed by the Neighbor Net algorithm with a tree obtained using BioNJ. Assuming both are to be computed from the same distance matrix, we can select the corresponding node and choose BioNJ from the popup menu marked with a plus (either on the node or in the toolbar) (see Fig. 2.2). SplitsTree then constructs and displays the BioNJ tree. Switching back to the workflow panel we see that a new sequence of nodes has branched off the distances node, indicating the revised analysis.

Figure 2.3: Right hand side of the workflow after adding a BioNJ analysis.

This analysis also creates a new window tab. Right-click on a tab to get a context menu that allows you to split the main tab pane into two parts, then drag the tabs to the left or right panes to view both the Neighbor Net network and the BioNJ tree side-by-side.

2.2 Building trees

SplitsTree implements four standard tree construction methods:

NJ (Neighbor-Joining), the original method of Saitou and Nei [Saitou and Nei, 1987].
BioNJ, the modification of NJ introduced by Gascuel to reduce variance of the node-to-node estimates [Gascuel, 1997].
UPGMA, the agglomerative method for constructing ultrametric (molecular clock) trees, introduced by Sokal and Michener [Sokal and Michener, 1958].
Buneman, a method for inferring compatible splits (and therefore trees) from distances which tends to produce trees with large multifurcations [Bandelt and Dress, 1992].

Each of these can be called from the Trees menu, or added as an algorithm in the workflow. There are several options for displaying trees, available by clicking on one of the two buttons on the right-hand side of the tree window (see Section 1.6).
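To make the agglomerative idea behind UPGMA concrete, here is a minimal Python sketch. It is a textbook version and is not taken from the SplitsTree source: it repeatedly merges the two closest clusters, places the new node at half their distance, and updates distances using size-weighted averages. The function name, the nested-dictionary input and the returned list of merges are all illustrative assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """Textbook UPGMA sketch.

    names: list of taxon names.
    dist:  nested dict with dist[a][b] given for every a that precedes b in names.
    Returns the merges in order, each as (sorted tuple of members, node height).
    """
    clusters = [(n,) for n in names]
    d = {frozenset(((a,), (b,))): dist[a][b] for a, b in combinations(names, 2)}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters at minimum distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = tuple(sorted(a + b))
        clusters.remove(a)
        clusters.remove(b)
        # Distance to every remaining cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                len(a) * d[frozenset((a, c))] + len(b) * d[frozenset((b, c))]
            ) / (len(a) + len(b))
        clusters.append(merged)
        merges.append((merged, d[frozenset((a, b))] / 2))
    return merges


example = {"A": {"B": 2, "C": 6}, "B": {"C": 6}}
for members, height in upgma(["A", "B", "C"], example):
    print(members, height)
```

In this example A and B are joined first at height 1.0, and C joins them at height 3.0, which gives an ultrametric tree.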

2.3 Neighbor Net and other split network methods

2.3.1 Neighbor Net

Given a distance matrix as input, the Neighbor Net algorithm operates in three stages. First, an agglomerative method is used to identify a circular ordering of the taxa. The splits computed by the algorithm are a subset of the set of all splits that can be formed from consecutive sets of taxa in that ordering. Second, a heavily customized algorithm is used to efficiently compute split weights. Those with zero weight are removed (use a split filter to remove splits with larger weight). Finally, a planar split network algorithm takes the weighted splits and produces the split network representation. A complete description of the entire process is available in Bryant and Huson [2023].
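The role of the circular ordering can be illustrated with a few lines of Python. The sketch below only enumerates the candidate splits, namely those formed from consecutive sets of taxa in a given ordering. It does not compute the ordering or the split weights, and the function name is an illustrative assumption.

```python
def circular_splits(ordering):
    """Enumerate all splits whose one side is a run of consecutive taxa in the ordering.

    Each split is represented by the side that does not contain ordering[0].
    """
    n = len(ordering)
    return {frozenset(ordering[i:j + 1]) for i in range(1, n) for j in range(i, n)}


splits = circular_splits(["A", "B", "C", "D", "E"])
print(len(splits))  # n(n-1)/2 = 10 candidate splits for n = 5
```

For n taxa there are n(n-1)/2 such splits, including the n trivial ones, so the number of split weights to be estimated grows only quadratically with the number of taxa.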

There is a single option available in Neighbor Net, the method used to infer split weights. We found that the Active Set method performed better than the other methods, and this is the default and recommended option. We have left the other algorithms as options in order to enable a repeat of the analysis in Bryant and Huson [2023].

When Neighbor Net is called, SplitsTree produces a split block and a split network block in the workflow. As we stress in Huson and Bryant [2006], the main information in the network is the set of weighted splits. Think of the network as a means of visualising the splits, in the sense that the same set of splits can be represented in several different ways.

2.3.2 Manipulating split networks

To rotate or flip the entire network, use the toolbar revealed by pressing the preferences button at the top right of the split network panel, making sure that none of the nodes or edges in the network are selected (see Section 1.10).

Click on an edge in the split network to select that split. The edges associated to that split can be rotated using the rotate buttons in the toolbar or the arrows in the side panel which appears when you click the button on the right (see Fig. 2.4).

Figure 2.4: If one or more splits are selected, then highlighted buttons can be used to change the angles of the selected splits.

The traditional approach to displaying split networks marks out the splits with a mesh of quadrilaterals and polygons. The outline representation [Bagci et al., 2021] constructs just the outer perimeter of the network. This is sufficient to represent all the split weights, and is generally much faster to compute and draw. To switch back and forth between the graph mode and the outline mode, use the pop-up menu at the left of the toolbar (see Section 1.10).

2.3.3 Split Decomposition

Given a distance matrix, the Split Decomposition method [Bandelt and Dress, 1992] can be selected in the Network menu, or on a distances node in the workflow. Split Decomposition is a predecessor of Neighbor Net, though the structures of the two methods are quite different. Split Decomposition works by inferring a set of splits satisfying a quartet condition in the distance matrix. Split Decomposition produces a set of weakly compatible splits and, as such, can produce more complex split networks than those produced by Neighbor Net. The resulting split network will not necessarily be planar. In practice, the conservative nature of the selection criteria means that Split Decomposition produces far fewer splits than Neighbor Net.
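The quartet condition can be made concrete using the isolation index of Bandelt and Dress [1992]. The following Python sketch assumes the standard definition, in which a split is kept when its isolation index is positive, and evaluates it by brute force over all quartets. It is meant as an illustration only, not as the program's implementation, and the function name and input layout are assumptions.

```python
from itertools import combinations_with_replacement, product


def isolation_index(side_a, side_b, d):
    """Isolation index of the split side_a | side_b for a distance function d(x, y)."""
    best = float("inf")
    pairs_a = list(combinations_with_replacement(sorted(side_a), 2))
    pairs_b = list(combinations_with_replacement(sorted(side_b), 2))
    for (a1, a2), (b1, b2) in product(pairs_a, pairs_b):
        within = d(a1, a2) + d(b1, b2)
        across = max(d(a1, b1) + d(a2, b2), d(a1, b2) + d(a2, b1))
        best = min(best, (max(across, within) - within) / 2)
    return best


# Distances on a four-taxon tree ((A,B),(C,D)) with all edge lengths equal to 1.
dist = {
    frozenset("AB"): 2, frozenset("CD"): 2,
    frozenset("AC"): 3, frozenset("AD"): 3,
    frozenset("BC"): 3, frozenset("BD"): 3,
}


def d(x, y):
    return 0 if x == y else dist[frozenset((x, y))]


print(isolation_index("AB", "CD", d))  # 1.0, the length of the internal edge
print(isolation_index("AC", "BD", d))  # 0.0, so this split is not selected
```

Because the index is a minimum over all quartets, a single conflicting quartet is enough to remove a split, which is one way to see why the method is conservative.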

2.3.4 Splits in characters

SplitsTree includes several methods for extracting splits directly from character data. These methods do not assume any explicit model for sequence evolution. As such they do not correct for hidden mutations. However, they can reveal important structure within sequences from closely related organisms, as well as artefacts resulting from data handling problems.

The simplest is BinaryToSplits (see Section C.1), which applies to binary data only. Each binary character determines a split separating those with allele/state 0 and those with allele/state 1. The weight assigned to a split equals the summed weight for all characters inducing that split, defaulting to a count of those characters if weights are not specified. The BinaryToSplits algorithm is available via the workflow graph or workflow hierarchy. The user can specify a weight/count threshold on the splits, a cap on the maximum dimension of the split network (see Section C.3) and the option to include all ‘trivial’ splits separating one taxon from the remainder automatically.
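The idea of reading splits off binary characters can be sketched in Python as follows. This is an illustration under simple assumptions and not the program's code: the function name and parameters are hypothetical, constant sites are taken to induce no split, and the dimension cap and the trivial-splits option are left out.

```python
from collections import defaultdict


def binary_to_splits(seqs, weights=None, min_weight=0):
    """Collect the split induced by each binary character and sum the weights.

    seqs:    dict mapping taxon name to a string over '0' and '1'.
    weights: optional list of per-character weights (default: 1 per character).
    Each split is represented by the side not containing the first taxon.
    """
    taxa = sorted(seqs)
    first = taxa[0]
    nchar = len(seqs[first])
    weights = weights if weights is not None else [1] * nchar
    totals = defaultdict(float)
    for site in range(nchar):
        side = frozenset(t for t in taxa if seqs[t][site] != seqs[first][site])
        if side:  # a constant site induces no split
            totals[side] += weights[site]
    return {s: w for s, w in totals.items() if w >= min_weight}


data = {"a": "000", "b": "001", "c": "111", "d": "110"}
for side, weight in binary_to_splits(data).items():
    print(sorted(side), weight)
```

Here the first two characters both induce the split {a,b} | {c,d}, which therefore receives weight 2, while the third character induces {a,d} | {b,c} with weight 1.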

The DNAtoSplits method (see Section C.1) carries out a similar analysis but on nucleotide data. Splits are either determined via an RY coding (AG vs CT) or by splitting the most frequent state (assumed ancestral) from the other states (assumed derived) at each site.
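The two ways of turning a nucleotide site into a split can be sketched as follows. This is a simplified illustration only: the function names are hypothetical, gaps and ambiguity codes are not handled, and ties for the most frequent state are broken by first occurrence, which is an assumption rather than documented behaviour.

```python
from collections import Counter

PURINES = {"A", "G"}


def ry_split(taxa, column):
    """Split purines (A, G) from pyrimidines (C, T) at one site."""
    side = frozenset(t for t, base in zip(taxa, column) if base in PURINES)
    return side, frozenset(taxa) - side


def majority_split(taxa, column):
    """Split the most frequent state (assumed ancestral) from all other states."""
    ancestral, _ = Counter(column).most_common(1)[0]
    side = frozenset(t for t, base in zip(taxa, column) if base == ancestral)
    return side, frozenset(taxa) - side


taxa = ["a", "b", "c", "d"]
print(ry_split(taxa, "AAGC"))        # a, b, c carry purines; d a pyrimidine
print(majority_split(taxa, "AAGC"))  # a, b carry the most frequent state
```

The same site can therefore yield different splits under the two codings.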

The Parsimony Splits method (see Section C.1), introduced by Bandelt and Dress [1992], produces a set of weakly compatible splits directly from character data. The method is quartet based, like Split Decomposition, but for each four taxa, it determines the two most frequent pairings of two taxa versus the other two taxa.

2.4 Haplotype networks

A haplotype network is an elegant and efficient way to represent character or sequence data. Each node corresponds to a particular sequence with the size of the node proportional to the number of copies of that sequence in the data. Sequences which differ in one position are connected by an edge which is (optionally) labelled by the exact difference. Different methods for constructing haplotype networks generate different graphs for connecting sequences at larger distances. For them all, a key property is that given one sequence, the network, and the mutations along each edge, the entire alignment can be reconstructed.

SplitsTree provides implementations of two widely-used haplotype network methods, MinSpanningNetwork [Excoffier and Smouse, 1994] and MedianJoining [Bandelt et al., 1999]. Haplotype networks are drawn as graphs with each edge labelled by marks indicating the number of mutations/differences along that edge. This can be modified using the side panel which appears when clicking the preferences button at the top right of the network panel.

2.4.1 Minimum spanning network

A minimum spanning tree for a graph is a connected subgraph that contains every node and has minimum total edge weight. Sometimes there is a unique minimum spanning tree; other times there are several.

In this context, the graph contains a node for each input sequence and an edge between every pair of nodes. The length of each edge is the Hamming distance between the corresponding sequences. Other distance measures can be used, but the Hamming distance is appropriate for haplotype network construction.

The minimum spanning network is formed from all those edges in the graph which appear in at least one minimum spanning tree, that is, it is the union of all minimum spanning trees (see Section C.2).

A minimum spanning network is constructed from a characters block by first determining Hamming distances (right-click on the characters block and select Add Algorithm -> Hamming distance). Then right-click on the distance block produced and add the Min Spanning Network algorithm.
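The two steps (Hamming distances, then the network) can be sketched in a few lines. The sketch takes the minimum spanning network to be the union of all minimum spanning trees, as in Bandelt et al. [1999], and relies on the fact that an edge of length w lies on some minimum spanning tree exactly when its endpoints are not already connected by strictly shorter edges. Function names are illustrative, and the input is assumed to contain distinct sequences of equal length.

```python
from itertools import combinations


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(s, t))


def min_spanning_network(seqs):
    """Return the edges (i, j, length) lying on at least one minimum spanning tree."""
    n = len(seqs)
    edges = sorted((hamming(seqs[i], seqs[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))  # union-find over sequence indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    network, k = [], 0
    while k < len(edges):
        length = edges[k][0]
        group = []
        while k < len(edges) and edges[k][0] == length:
            group.append(edges[k])
            k += 1
        # Decide on all edges of this length before merging any components.
        keep = [(i, j) for _, i, j in group if find(i) != find(j)]
        network.extend((i, j, length) for i, j in keep)
        for i, j in keep:
            parent[find(i)] = find(j)
    return network


print(min_spanning_network(["AAA", "AAT", "ATA", "ATT"]))
```

For these four sequences the result is a four-edge cycle of single-mutation edges: each minimum spanning tree omits one of them, so the union contains all four.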

2.4.2 Median Joining network

Median Joining (see Section C.1) is probably the most highly-cited method for constructing phylogenetic networks. The implementation in SplitsTree is based on the method described in Bandelt et al. [1999]. The Median-Joining network method makes repeated use of minimum spanning networks, each time augmenting the set of observed sequences with putative ancestral sequences.

A Median-Joining network is constructed from a characters block via the Network menu, or by adding an algorithm to the workflow. The method comes with a single option ϵ that is an integer controlling a threshold determining when two sequences are considered adjacent. In Bandelt et al. [1999], ϵ varies between 0, 1 and 2.

2.5 Rooted phylogenetic networks

2.5.1 Implicit vs explicit trees and networks

A haplotype network is a direct representation of the input data and a split network represents groupings or splits between taxa. Both are examples of so-called implicit or data-display networks that aim at visualizing evolutionary data. In contrast, an explicit network is a representation of the putative evolutionary history, including reticulate events such as speciation-by-hybridization or horizontal gene transfer.

Strictly speaking, unrooted phylogenetic trees, too, are implicit representations of evolutionary data, whereas rooted phylogenetic trees have a direction (away from the root) and this allows branching nodes to be explicitly interpreted as representing speciation events.

Explicit phylogenetic networks are necessarily rooted. The Autumn algorithm [Huson and Linz, 2018] (see Section C.4) produces an explicit rooted phylogenetic network in which reticulations may be interpreted as putative hybridization or HGT events. However, just because a phylogenetic network has a root does not mean that it is explicit. For example, the Cluster Network algorithm (see Section C.4) takes as input a set of rooted trees and aims at displaying all their clusters as a rooted network (in the hardwired sense [Huson et al., 2012]). Here, the reticulate nodes do not have a direct biological interpretation.

2.5.2 Hybridization networks

Figure 2.5: On the left we show 10 different gene trees for the NADH dehydrogenase-like complex in waterlilies [Gruenstaeudl, 2019] and on the right we show a network that contains all 10 trees, with hybridization number h = 5, computed using the PhyloFusion algorithm.

In mathematical phylogenetics, a hybridization network is a rooted phylogenetic network that contains or displays an input set of rooted phylogenetic trees. Usually, the requirement is that such a network minimizes the “hybridization number”, that is, the number of reticulations. (To be precise, a reticulation node of indegree k contributes k - 1 toward the hybridization number.)
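The counting rule in parentheses can be illustrated with a short sketch that takes a rooted network as a list of directed (parent, child) edges. The representation and the function name are assumptions made for this example.

```python
def hybridization_number(edges):
    """Sum of (indegree - 1) over all nodes with indegree greater than 1."""
    indegree = {}
    for _, child in edges:
        indegree[child] = indegree.get(child, 0) + 1
    return sum(k - 1 for k in indegree.values() if k > 1)


# Hypothetical network: h has two parents, g has three.
network = [
    ("root", "x"), ("root", "y"), ("root", "z"),
    ("x", "a"), ("x", "h"), ("y", "h"), ("h", "b"),
    ("x", "g"), ("y", "g"), ("z", "g"), ("g", "c"),
    ("y", "d"), ("z", "e"),
]
print(hybridization_number(network))  # h contributes 1, g contributes 2
```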

SplitsTree currently offers two algorithms for computing such networks for real-world data. The Autumn algorithm [Huson and Linz, 2018] (see Section C.4) takes as input two rooted phylogenetic trees and computes, as output, the list of all different hybridization networks that contain the two trees. The input trees may have multifurcations and unequal taxon sets. This algorithm aims at providing an exact solution (networks that minimize the hybridization number) to a computationally hard problem, so it might not terminate if the input trees have too many conflicts.

The PhyloFusion algorithm [Zhang et al., 2023, 2024] takes as input multiple rooted trees and computes one or more rooted phylogenetic networks that display all the input trees. Again, we allow multifurcations and missing taxa. This very fast heuristic aims at minimizing the hybridization number. With this, we provide a versatile method for exploring the practical use of rooted networks in phylogenetics (see Section C.4 and Fig. 2.5).

2.5.3 How to compute a rooted network from rooted trees

Assume that you have a collection of phylogenetic trees for which you would like to explore the use of rooted phylogenetic networks to represent them. To obtain a useful network, you must set up a pipeline consisting of several steps (see Fig. 2.6). In this analysis, incorrect edges are particularly harmful because they generate unnecessary reticulations, so it is important that the input trees have confidence values (such as bootstrap support values) associated with the edges so that low-confidence edges can be ignored.

Figure 2.6: For a set of 48 genes in waterlilies [Gruenstaeudl, 2019], on the left we see that 10 different gene trees for the NADH dehydrogenase-like complex have been selected. On the right we see a hybridization network computed using the PhyloFusion algorithm.

First, use the Reroot or Reorder algorithm (see Section C.4) to ensure that all trees are correctly rooted (using either midpoint- or outgroup rooting).
Second, use the Trees Filter (see Section C.4) to select the subset of trees that you would like to place into the network.
Third, use the Trees Edges Filter (see Section C.4) to contract any low-confidence edges. By default, the confidence threshold is set to 70.
Finally, use the Phylo Fusion algorithm (see Section C.4) to compute a rooted network that contains all input trees.
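To make the third step concrete, the following sketch contracts low-confidence edges in a rooted tree. It is not the Trees Edges Filter itself: the node layout (name, support, children), the function name `contract`, and the choice to contract edges with support strictly below the threshold are all assumptions made for illustration.

```python
def contract(node, threshold=70.0):
    """Contract internal edges whose support is below the threshold.

    A node is a tuple (name, support, children); leaves have no children and
    the support value refers to the edge leading into the node.
    """
    name, support, children = node
    kept = []
    for child in children:
        child = contract(child, threshold)
        _, child_support, grandchildren = child
        if grandchildren and child_support is not None and child_support < threshold:
            kept.extend(grandchildren)  # splice the grandchildren into this node
        else:
            kept.append(child)
    return (name, support, kept)


def leaf(name):
    return (name, None, [])


# ((a,b)95,(c,d)40); the edge above (c,d) has low support and is contracted.
tree = (None, None, [
    (None, 95.0, [leaf("a"), leaf("b")]),
    (None, 40.0, [leaf("c"), leaf("d")]),
])
print(contract(tree))
```

After contraction the example tree becomes ((a,b)95,c,d), so the weakly supported grouping of c and d can no longer force a reticulation in the network.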

2.5.4 Cluster networks

The Cluster Network algorithm extracts all clusters from an input set of rooted phylogenetic trees and computes a network using the cluster-popping algorithm [Huson et al., 2012]. This is a fast algorithm that provides a network that contains all input trees. However, it does not aim at minimizing the hybridization number.

Chapter 3
Consensus trees and networks

The methods in this chapter all attempt to summarise information contained in a set of trees. (Most also work if the input contains rooted phylogenetic networks, in which case the calculations are based on “hardwired clusters” contained in the networks.) There are several possible sources:

Trees returned from different genes or loci.
Trees produced from different methods.
Trees produced from different bootstrap replicates.
Trees sampled from the posterior distribution in a Bayesian analysis.

One of the big improvements with the most recent version of SplitsTree is that the routines for reading in files of trees can now cope with large tree files or large trees.

3.1 Consensus trees

A consensus method summarises a set of trees (on the same set of taxa) with a single tree. It can be thought of as analogous to an average tree or median tree.

3.1.1 Average consensus method

The average consensus method implements an idea of Lapointe and Cucumel [1997]. Additive (leaf-to-leaf) distance matrices are constructed for each tree. This can take some time on larger files. The average of these matrices is then used to construct either a Neighbor-Joining tree or a Neighbor Net.

The method can be called from the workflow by selecting a trees block and adding the algorithm ‘Average Consensus’ (see Section C.4). Alternatively, add an ‘Average Distances’ algorithm to the tree block. This creates a new distance block which can be output or analysed using a method of choice.
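The distance-averaging step can be sketched as follows, assuming each tree is given as a list of weighted edges (u, v, length) and that all trees share the same leaf set. Function names and the input representation are illustrative only.

```python
def leaf_distances(edges, leaves):
    """Path-length (additive) distances between all pairs of leaves of one tree."""
    adjacent = {}
    for u, v, w in edges:
        adjacent.setdefault(u, []).append((v, w))
        adjacent.setdefault(v, []).append((u, w))
    dist = {}
    for source in leaves:
        reached = {source: 0.0}
        stack = [source]
        while stack:  # depth-first walk; paths in a tree are unique
            u = stack.pop()
            for v, w in adjacent[u]:
                if v not in reached:
                    reached[v] = reached[u] + w
                    stack.append(v)
        for target in leaves:
            dist[source, target] = reached[target]
    return dist


def average_distances(matrices):
    """Entry-wise mean of several distance matrices on the same taxa."""
    return {pair: sum(m[pair] for m in matrices) / len(matrices)
            for pair in matrices[0]}


leaves = ["a", "b", "c", "d"]
tree1 = [("a", "x", 1), ("b", "x", 1), ("x", "y", 2), ("c", "y", 1), ("d", "y", 1)]
tree2 = [("a", "x", 1), ("c", "x", 1), ("x", "y", 2), ("b", "y", 1), ("d", "y", 1)]
avg = average_distances([leaf_distances(t, leaves) for t in (tree1, tree2)])
print(avg["a", "b"], avg["a", "c"], avg["a", "d"])
```

The averaged matrix is in general no longer additive, which is why it is passed on to a distance method such as Neighbor-Joining or Neighbor Net.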

3.1.2 Strict-, majority- and greedy-consensus methods

The strict consensus, majority rule consensus and greedy consensus are three of the oldest and most widely used consensus methods in phylogenetics.

The strict consensus tree is formed from all splits appearing in all trees;
The majority rule tree is formed from all splits appearing in over half the trees;
The greedy consensus tree is constructed using a greedy algorithm aimed at producing a collection of splits with maximal weight, the weight of each split given by the number of trees containing it.

These methods are available from the Trees menu in the menu bar or by adding an algorithm to the trees block in the workflow.
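The three definitions above can be compared on a toy example. In this sketch each tree is reduced to its set of non-trivial splits, and each split is written as the side that does not contain a fixed reference taxon (here taxon a); with that convention two splits are compatible when their sides are disjoint or nested. The function names and the tie-breaking order in the greedy method are assumptions.

```python
from collections import Counter


def compatible(s, t):
    """Two splits (sides avoiding the reference taxon) are compatible if disjoint or nested."""
    return not (s & t) or s <= t or t <= s


def consensus_splits(trees, method="majority"):
    """Return the splits of the strict, majority or greedy consensus tree."""
    counts = Counter(split for tree in trees for split in tree)
    n = len(trees)
    if method == "strict":
        return {s for s, c in counts.items() if c == n}
    if method == "majority":
        return {s for s, c in counts.items() if 2 * c > n}
    if method == "greedy":
        chosen = []
        # Most frequent splits first; ties broken alphabetically for determinism.
        for s in sorted(counts, key=lambda s: (-counts[s], sorted(s))):
            if all(compatible(s, t) for t in chosen):
                chosen.append(s)
        return set(chosen)
    raise ValueError(f"unknown method: {method}")


f = frozenset
trees = [
    {f("de"), f("bc")},   # ((b,c),(d,e),a,f)
    {f("de"), f("cf")},   # ((c,f),(d,e),a,b)
    {f("bc"), f("bcf")},  # (((b,c),f),a,d,e)
]
for method in ("strict", "majority", "greedy"):
    print(method, sorted("".join(sorted(s)) for s in consensus_splits(trees, method)))
```

On this input the strict consensus is unresolved, the majority rule tree contains bc and de, and the greedy tree additionally picks up bcf, which occurs only once but conflicts with nothing already chosen.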

Note that there is a slight difference in the consensus tree depending on whether the input trees are to be considered rooted or unrooted. For example the two trees

((a,b),c,d); and (a,b,(c,d));

share a split ab|cd which would appear in their unrooted consensus tree, but they share no clusters, so their rooted consensus tree would be completely unresolved.
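This example can be checked mechanically. The sketch below writes the two trees as nested Python tuples (an assumed representation) and compares their non-trivial clusters with their non-trivial splits.

```python
def clusters(tree):
    """Return (leaf set, set of non-trivial proper clusters) of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found |= child_found
        if len(child_leaves) > 1:
            found.add(child_leaves)
    return leaves, found


def splits(tree):
    """Non-trivial splits, each stored as an unordered pair of taxon sets."""
    taxa, found = clusters(tree)
    return {frozenset([c, taxa - c]) for c in found if 1 < len(c) < len(taxa) - 1}


tree1 = (("a", "b"), "c", "d")
tree2 = ("a", "b", ("c", "d"))
print(clusters(tree1)[1] & clusters(tree2)[1])  # shared clusters
print(splits(tree1) & splits(tree2))            # shared splits
```

The first tree has the single cluster {a,b} and the second the single cluster {c,d}, so no cluster is shared, yet both clusters correspond to the same split ab|cd.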

3.1.3 Densi-tree consensus

The densi-tree consensus [Bouckaert, 2010] shows the greedy consensus tree together with a rendering of all input trees (see Section 1.9).

3.2 Networks representing trees

3.2.1 Consensus networks

Consensus networks are based on the idea of using split networks to represent more splits than can appear in a single tree [Bandelt, 1995, Holland et al., 2004]. They can be constructed using the menu command Network>Consensus network, or by adding an algorithm to a trees block in the workflow. Note that, with the menu command, if there is more than one trees block then SplitsTree will ask the user to select one.

SplitsTree implements several weighting methods for the splits. These are used to determine the split weights used in the output tree or network. A standard consensus network analysis will use the frequency (or count) of a split as the weight used for selecting and displaying splits.

Mean - use the mean of the weights in the input trees. This treats different trees as estimations of the distances between taxa.
TreeSizeWeightedMean - use the mean of the weights in the input trees after normalizing each of the input trees to total length 1. This should be used if the different trees are on different scales, e.g. because they were computed using different methods.
Median - use the median weight. Use as an alternative to mean weights.
Count - use the number of trees that contain a split as its weight. This is useful to emphasize the conflicts in different trees when using a network for consensus.
Sum - use the sum over all weights in the input trees. Similar use-case to counts.
Uniform - give all splits weight 1. This emphasizes the topology of the consensus tree or network.
TreeNormalizedSum - use the sum over all weights in the normalized input trees. The use case for this option is less clear.

The threshold percent controls how many splits are included in the network. When the weight is computed from split counts it specifies the percentage of trees which a split needs to be contained in for the split to be included in the network. Reducing this threshold will increase the number of splits, giving a more complex network. The High Dimension Filter is the same as that used in the split weight filter (see Section C.3), greedily removing splits which generate high dimensional boxes in the diagram.
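The interplay between the weighting method and the threshold can be sketched as follows. Several details here are assumptions rather than documented behaviour: splits are selected by the percentage of trees containing them, the comparison is strict, the mean and median are taken over the trees that contain the split, and the default of 30 percent is a placeholder. The two normalized variants and the High Dimension Filter are omitted.

```python
from statistics import mean, median


def consensus_weights(trees, method="Count", threshold_percent=30.0):
    """Weights of splits found in more than threshold_percent of the trees.

    Each tree is a dict mapping a split label to its weight in that tree.
    """
    observed = {}
    for tree in trees:
        for split, weight in tree.items():
            observed.setdefault(split, []).append(weight)
    rules = {
        "Mean": mean,
        "Median": median,
        "Count": len,
        "Sum": sum,
        "Uniform": lambda weights: 1,
    }
    needed = threshold_percent / 100 * len(trees)
    return {split: rules[method](weights)
            for split, weights in observed.items() if len(weights) > needed}


trees = [
    {"ab|cde": 1.0, "de|abc": 2.0},
    {"ab|cde": 3.0, "ce|abd": 1.0},
    {"ab|cde": 2.0, "de|abc": 4.0},
]
print(consensus_weights(trees, "Count", threshold_percent=50))
print(consensus_weights(trees, "Mean", threshold_percent=0))
```

With a threshold of 50 percent only splits found in at least two of the three trees survive; lowering the threshold to 0 admits ce|abd as well, which conflicts with de|abc and so turns the output into a network.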

3.2.2 Consensus outline

The consensus outline method (see Section C.4) takes as input a set of trees and produces as output a set of circular splits that are displayed either as a planar split network or as a phylogenetic outline. It operates by greedily selecting a subset of input splits that are compatible with some circular ordering of the input taxa, computed using the PQ-tree algorithm [Booth and Lueker, 1976]. One possible application is as an alternative to the densi-tree visualization (see Fig. 3.1).

Figure 3.1: For a Bayesian profile of trees (from the Beast examples directory), we show the densi-tree consensus on the left and the outline consensus on the right (rooted by the three taxa at the top of both diagrams).

3.2.3 Confidence networks

The idea behind a confidence network (see Section C.4) is to choose the threshold in a consensus network so that at least 95% of the trees have all their splits contained in that network. The method was originally designed as a way to create confidence intervals from bootstrap distributions [Huson and Bryant, 2006]. However, the dimensionality of the problem, and shortcomings of empirical bootstrap distributions, meant that the confidence sets produced were massive. The same machinery can be readily applied to samples from the posterior distribution of trees in a Bayesian analysis, in which case the network represents a confidence set.

The main option in a confidence network is the level, which is 0.95 by default. This is the proportion of input trees which will have their splits contained in the network. Decreasing this number produces smaller networks.
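One way to read this definition is as a search for the highest count threshold that still covers the required proportion of trees. The sketch below follows that reading and is not taken from the SplitsTree source: a tree is covered when its rarest split meets the threshold, and the function name and input format (each tree as a non-empty set of split labels) are assumptions.

```python
from collections import Counter
from math import ceil


def confidence_splits(trees, level=0.95):
    """Return (count cutoff, splits kept) so that at least `level` of the trees are covered."""
    counts = Counter(split for tree in trees for split in tree)
    # A tree is fully contained in the network once its rarest split is included.
    rarest = sorted((min(counts[s] for s in tree) for tree in trees), reverse=True)
    needed = ceil(level * len(trees))
    cutoff = rarest[needed - 1]
    return cutoff, {s for s, c in counts.items() if c >= cutoff}


trees = [{"x", "y"}, {"x", "y"}, {"x", "y"}, {"x", "z"}]
print(confidence_splits(trees, level=0.75))
print(confidence_splits(trees, level=1.0))
```

At level 0.75 the rare split z can be left out because three of the four trees are still fully covered; at level 1.0 it must be included, which illustrates why lower levels give smaller networks.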

Appendix A
The main menu bar

All functionality of the program can be used directly from the main window. In addition, the program provides menus to access the most often used features.