Massart S., Chiumenti M., De Jonghe K., Glover R., Haegeman A., Koloniuk I., Kominek P., Kreuze J., Kutnjak D., Lotos L., Maclot F., Maliogka V., Maree H. J., Olivier T., Olmos A., Pooggin M. M., Reynard J.-S., Ruiz-García A. B., Safarova D., Schneeberger P. H. H., Sela N., Turco S., Vainio E. J., Varallyay E., Verdin E., Westenberg M., Brostaux Y., Candresse T.
Virus detection by high-throughput sequencing of small RNAs: large-scale performance testing of sequence analysis strategies.
Recent developments in high-throughput sequencing (HTS) technologies, also called next-generation
sequencing (NGS), and in bioinformatics have drastically changed research on viral
pathogens and spurred growing interest in the field of virus diagnostics. However, the
reliability of HTS-based virus detection protocols must be evaluated before adopting them for
diagnostics. Many different bioinformatics algorithms aimed at detecting viruses in HTS data
have been reported, but little attention has been paid so far to their sensitivity and reliability
for diagnostic purposes. We therefore compared the ability of 21 plant virology laboratories,
each employing a different bioinformatics pipeline, to detect 12 plant viruses through a
double-blind, large-scale performance test using ten datasets of 21-24 nt small RNA (sRNA) sequences
from three different infected plants. The sensitivity of virus detection ranged from 35% to
100% among participants, with a marked negative effect when sequencing depth decreased. The
false positive detection rate was very low and mainly related to the identification of host
genome-integrated viral sequences or misinterpretation of the results. Reproducibility was
high (91.6%). This work revealed the key influence of bioinformatics strategies on the
sensitive detection of viruses in HTS sRNA datasets and, more specifically, (i) the difficulty of
detecting viral agents when they are novel and/or their sRNA abundance is low, (ii) the influence
of key parameters at both the assembly and annotation steps, (iii) the importance of the completeness
of reference sequence databases, and (iv) the significant level of scientific expertise needed
when interpreting pipeline results. Overall, this work underlines key parameters and proposes
recommendations for reliable sRNA-based detection of known and unknown viruses.