Full title: Impact and mitigation of sampling bias to determine viral spread: Evaluating discrete phylogeography through CTMC modeling and structured coalescent model approximations.
Layan M, Müller NF, Dellicour S, De Maio N, Bourhy H, Cauchemez S, Baele G.
Bayesian phylogeographic inference is a powerful tool in molecular epidemiological studies, which enables reconstruction of the origin and subsequent geographic spread of pathogens. Such inference is, however, potentially affected by geographic sampling bias. Here, we investigated the impact of sampling bias on the spatiotemporal reconstruction of viral epidemics using Bayesian discrete phylogeographic models and explored different operational strategies to mitigate this impact. We considered the continuous-time Markov chain (CTMC) model and two structured coalescent approximations (Bayesian structured coalescent approximation [BASTA] and marginal approximation of the structured coalescent [MASCOT]). For each approach, we compared the estimated and simulated spatiotemporal histories in biased and unbiased conditions based on the simulated epidemics of rabies virus (RABV) in dogs in Morocco. While the reconstructed spatiotemporal histories were impacted by sampling bias for the three approaches, BASTA and MASCOT reconstructions were also biased when employing unbiased samples. Increasing the number of analyzed genomes led to more robust estimates at low sampling bias for the CTMC model. Alternative sampling strategies that maximize the spatiotemporal coverage greatly improved the inference at intermediate sampling bias for the CTMC model, and to a lesser extent, for BASTA and MASCOT. In contrast, allowing for time-varying population sizes in MASCOT resulted in robust inference. We further applied these approaches to two empirical datasets: a RABV dataset from the Philippines and a SARS-CoV-2 dataset describing its early spread across the world. In conclusion, sampling biases are ubiquitous in phylogeographic analyses but may be accommodated by increasing the sample size, balancing spatial and temporal composition in the samples, and informing structured coalescent models with reliable case count data.
More information at https://doi.org/10.1093/ve/vead010