Muller plots are the prettiest, but possibly the least quantitative, plots a researcher could use, and at worst they can be grossly misleading (examples here if already lost). If you simply look at the origin of one of these plots, you can almost guess how much interpolation had to occur to get to the first arbitrary timepoint. I’m writing this because Chandler Gatenbee and I have now had two people contact us asking about this tool, including one running a barcoding experiment who specifically asked when a Muller plot is appropriate. Chandler and I have also had long debates about what is accurate, when different options are appropriate (if they are at all), and what options to provide users in EvoFreq. Sometimes this has been a rather heated discussion, leading to stark divisions in our thoughts.
Before I jump into these plots, please pause and consider that there are two kinds of users who will display information using a Muller plot. One of them can do so more accurately than the other.
User 1: Mathematical modelers. These users will have population information at each timestep, and a Muller plot will accurately represent their data from the founding cell (or tissue, in the case of my Epidermis paper). Regardless, the data will have amazing resolution.
User 2: Bioinformaticians and experimentalists. I’m afraid these users will not have time represented perfectly. The lack of resolution in their data precludes them from displaying all information perfectly. Sampling an in vitro/in vivo experiment often involves a significant delay between timepoints, and the samplings are often not equidistant; a Muller plot will not convey this properly.
Between these two, user 2 must be much more careful in how their data is plotted and interpreted, especially if an origin is interpolated for their initial timepoint.
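To make the uneven-sampling problem concrete, here is a minimal Python sketch (not EvoFreq code; the function name and the sampling days are made up for illustration). It compares where samples would sit on the x-axis if they were spaced by real elapsed time versus spaced evenly, which is what happens when timepoints are treated as simple indices.

```python
def x_positions(sample_times):
    """Scale real sampling times to [0, 1] so x-axis gaps match elapsed time.

    Assumes sample_times is sorted and spans a non-zero interval.
    """
    start, end = sample_times[0], sample_times[-1]
    span = end - start
    if span <= 0:
        raise ValueError("need at least two distinct, increasing sampling times")
    return [(t - start) / span for t in sample_times]


# Hypothetical sampling schedule in days: gaps of 3, 7 and 20 days.
days = [0, 3, 10, 30]

# Positions that respect elapsed time.
true_x = x_positions(days)

# Positions if each sample is simply treated as "timepoint 1, 2, 3, ...".
even_x = [i / (len(days) - 1) for i in range(len(days))]

for day, tx, ex in zip(days, true_x, even_x):
    print(f"day {day:>2}: true x = {tx:.2f}, evenly spaced x = {ex:.2f}")
```

With this schedule the second sample belongs at about a tenth of the way along the axis, but even spacing puts it a third of the way along, so the early dynamics look far slower than they were.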
We then get to the issue of the y-axis. This can be either population size or population frequency. Data for user 1 can be either, but for user 2 it will never be population size. The point at which these plots originate is also meaningless: they are drawn so that they are always positive, but the important information is simply the ratio of clones to one another relative to the total size (height) at any given timepoint. EvoFreq specifically offers some options for representing the data as a frequency to aid in conveying information properly; however, as noted above, user 1 will not necessarily want this.
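The size-versus-frequency distinction is just a normalization at each timepoint. A rough sketch in Python, assuming each row holds non-overlapping clone sizes at one timepoint (the function name and numbers are illustrative, not EvoFreq's interface):

```python
def to_frequencies(sizes):
    """Convert clone sizes to frequencies, one timepoint (row) at a time.

    Each returned row sums to 1, or is all zeros if the population is empty.
    """
    freqs = []
    for row in sizes:
        total = sum(row)
        freqs.append([s / total if total > 0 else 0.0 for s in row])
    return freqs


# Hypothetical counts for three clones at three timepoints.
sizes = [
    [100, 0, 0],
    [150, 50, 0],
    [200, 150, 50],
]

freqs = to_frequencies(sizes)
for row in freqs:
    print(row)
```

Note what is lost: the population grows from 100 to 400 cells, but every frequency row has the same height, so only the ratios between clones survive.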
Filtering poses an additional problem for the interpretability of Muller plots, as does the stage at which this filtering is done when calculating how to display the data. Within somatic tissue, most mutations across the genome are below the limits of detection. For user 1 this matters a lot. If you visualize perfect-resolution data using a Muller plot, you will almost certainly want to filter the data using a threshold. EvoFreq uses a default of 0.1 (10%), meaning that a clone/subclone must have reached >=10% of the population at some point in its lifetime. There are also options for displaying only extant clones (i.e. those clones that exist at the end timepoint).
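As an illustration of the two filters described above, here is a small Python sketch. It is not how EvoFreq implements filtering; the function, its parameter names, and the data are hypothetical, and it ignores lineage (in a real Muller plot, dropping a parent clone also affects how its descendants are drawn).

```python
def filter_clones(freqs, threshold=0.1, extant_only=False):
    """Keep clones whose peak frequency is at least `threshold`.

    freqs maps a clone id to its non-empty list of frequencies over time.
    If extant_only is True, also require a non-zero final frequency.
    """
    kept = {}
    for clone, series in freqs.items():
        if max(series) < threshold:
            continue  # never reached the threshold at any timepoint
        if extant_only and series[-1] <= 0:
            continue  # extinct by the last timepoint
        kept[clone] = series
    return kept


# Hypothetical frequencies for four clones at three timepoints.
freqs = {
    "A": [1.0, 0.70, 0.60],
    "B": [0.0, 0.25, 0.00],  # reaches 25%, then goes extinct
    "C": [0.0, 0.05, 0.08],  # never reaches 10%
    "D": [0.0, 0.00, 0.32],
}

kept_default = filter_clones(freqs)
kept_extant = filter_clones(freqs, extant_only=True)
print(sorted(kept_default))  # A, B and D pass the 10% threshold
print(sorted(kept_extant))   # B is dropped because it is extinct at the end
```

Clone C is real in the data but disappears from both views, which is exactly the interpretability issue: the plot shows only what survived the filter.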
All of this said, Muller plots are undoubtedly the most favored choice for displaying clonal dynamics over time while conveying abundance information. The bottom line, as with most plots: please be careful to accurately describe and interpret Muller plots, what information they can hold, and at what resolution. I’m sure that some of the options used to display Muller plots can lead to debate, and I may very well be wrong about some of this. Discuss or correct me with #MullerPlots on twitter (so others can benefit) @research_junkie.