COVID-Miner is a bioinformatics framework that enables the extraction of single mutations, Pangolin lineages, and consensus sequences from GISAID SARS-CoV-2 sequences. This interactive Data Portal provides a user-friendly interface for browsing the most prominent lineages and Receptor Binding Domain (RBD) Spike mutations worldwide. The focus on RBD variants stems from scientific evidence that escape from vaccine-induced immunity and neutralizing antibodies is mostly driven by mutations in this region of the SARS-CoV-2 genome.
The workflow takes advantage of the consensus command from bcftools to construct a consensus sequence separately for the complete viral genome, Spike protein, and RBD.
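As a rough illustration of what building a consensus for the whole genome and for sub-regions involves, the Python sketch below applies a list of substitutions to a reference string and then slices out named regions. It is a simplified stand-in, not the way bcftools consensus is implemented, and it handles single-nucleotide substitutions only. The function names, the toy reference, the variant list, and the region coordinates are all hypothetical.

```python
def apply_snvs(reference, snvs):
    """Return a copy of `reference` with substitutions applied.

    `snvs` is an iterable of (position, ref_base, alt_base) tuples,
    with 1-based positions (hypothetical input format).
    """
    seq = list(reference)
    for pos, ref_base, alt_base in snvs:
        # Guard against variants that do not match the reference.
        if seq[pos - 1] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos - 1] = alt_base
    return "".join(seq)


def extract_region(sequence, start, end):
    """Slice a region using 1-based, inclusive coordinates."""
    return sequence[start - 1:end]


# Toy data: a 14-base reference and two substitutions (assumptions).
reference = "ATGCGTACGTTAGC"
snvs = [(3, "G", "A"), (10, "T", "C")]

consensus = apply_snvs(reference, snvs)

# Hypothetical region coordinates standing in for Spike and RBD.
regions = {"region_a": (2, 12), "region_b": (8, 11)}
region_consensus = {
    name: extract_region(consensus, start, end)
    for name, (start, end) in regions.items()
}
```

Slicing the regions from a single genome-wide consensus keeps the coordinates tied to the reference, which holds only while no indels are applied; a real pipeline has to account for coordinate shifts.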
The results of this work have been published in the Journal of Translational Medicine (Massacci A. 2020).
The implemented workflow takes as input a multi-FASTA file of viral sequences, automatically fetched from the GISAID feed and aligned against the reference Wuhan strain (NC_045512.2) via the NUCmer function of the MUMmer package. This generates a delta-encoded alignment file, which is then parsed with the show-snps utility to produce a catalog of all Single Nucleotide Variants and indels found in the alignments between the reference and query sequences. The show-snps output is converted into standard VCF, and the genetic variants are annotated with snpEff using canonical transcripts. A human-readable table is then produced with SnpSift, the companion tool distributed with snpEff, and used to compute the absolute frequency of each annotated mutation.
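The final counting step can be sketched in Python as follows. This is a minimal illustration that assumes the human-readable table is tab-separated with one row per sample and annotated mutation; the column names (SAMPLE, GENE, HGVS_P), the function name, and the example rows are hypothetical and not taken from the actual pipeline. Absolute frequency is interpreted here as the number of distinct sequences carrying a mutation.

```python
import csv
import io
from collections import Counter


def mutation_frequencies(table_text):
    """Count how many distinct samples carry each annotated mutation.

    `table_text` is a tab-separated table with SAMPLE, GENE and HGVS_P
    columns (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    # A set removes duplicate rows for the same sample and mutation.
    seen = {(row["SAMPLE"], row["GENE"], row["HGVS_P"]) for row in reader}
    return Counter((gene, change) for _sample, gene, change in seen)


# Toy table (made-up rows for illustration only).
table = "\n".join([
    "SAMPLE\tGENE\tHGVS_P",
    "seq1\tS\tp.Asn501Tyr",
    "seq1\tS\tp.Asp614Gly",
    "seq2\tS\tp.Asp614Gly",
    "seq3\tS\tp.Asp614Gly",
    "seq3\tS\tp.Asn501Tyr",
    "seq3\tS\tp.Asn501Tyr",  # duplicate row, counted once
])

frequencies = mutation_frequencies(table)
for (gene, change), count in frequencies.most_common():
    print(gene, change, count)
```

Deduplicating on the sample identifier before counting avoids inflating a mutation's frequency when the same variant is reported more than once for a sequence.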