Comparative Genomics: CG7
CG7 provides a complete service for comparing sets of bacterial genomes:
1. In silico MLST
2. Analysis of SNPs in the core genome
3. Whole pairwise genome comparison with the Differences program
4. Orthologous table
We designed CG7 to fit a wide range of project types, from large-scale epidemiological projects analyzing thousands of genomes to highly specific projects focused on functional differences among a small set of strains of interest.
It provides insight into genomic differences at very different levels:
- The smallest and most specific scale: a single genome
- Comparison of two genomes
- Large-scale comparative genomics projects

One service, different approaches:
- Comparative analysis of de novo assembled genomes
- Reference-guided comparative analysis

Service description:
1. In silico MLST
In silico typing is performed directly on the genome sequences. It is based on the sequence types (ST) defined for each species in the corresponding MLST database.
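As a rough illustration of the idea, the sketch below looks up known allele sequences in a genome and maps the resulting allele profile to an ST. The loci, allele sequences and ST numbers are invented toy values, and the exact forward-strand substring match is a simplifying assumption; this is not the CG7 implementation.

```python
from typing import Dict, Optional, Tuple

# Toy scheme: hypothetical loci, allele sequences and STs (not real MLST data).
LOCI = ("locusA", "locusB")
ALLELES: Dict[str, Dict[str, int]] = {
    "locusA": {"ATGGCTAAAGGT": 1, "ATGGCTAAGGGT": 2},
    "locusB": {"TTGACCGATCAA": 1, "TTGACCGACCAA": 2},
}
PROFILES: Dict[Tuple[int, ...], int] = {(1, 1): 1, (1, 2): 2, (2, 2): 5}


def call_alleles(genome: str) -> Dict[str, Optional[int]]:
    """For each locus, return the number of the first known allele found in the genome."""
    calls: Dict[str, Optional[int]] = {}
    for locus in LOCI:
        # Simplification: exact match on the forward strand only.
        calls[locus] = next(
            (num for seq, num in ALLELES[locus].items() if seq in genome), None
        )
    return calls


def assign_st(genome: str) -> Optional[int]:
    """Return the ST for the genome, or None if a locus or the profile is unknown."""
    calls = call_alleles(genome)
    if None in calls.values():
        return None
    return PROFILES.get(tuple(calls[locus] for locus in LOCI))


genome = "CCCC" + "ATGGCTAAAGGT" + "GGGG" + "TTGACCGACCAA" + "AAAA"
print(call_alleles(genome), assign_st(genome))
```

A genome lacking any of the loci, or carrying an allele combination absent from the profile table, yields None here; how such cases are reported in practice is not specified in this description.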
2. Analysis of SNPs in the core genome
The search for SNVs (single nucleotide variants) or SNPs (single nucleotide polymorphisms) focuses on the conserved genome, also known as the core genome, and avoids repetitive regions, mobile elements and phage regions. The analysis strategy is similar to the one carried out in the reference PubMed ID: 24066741.
Focusing on the core genome and avoiding sequences likely to be subject to horizontal gene transfer or recombination allows us to infer the evolutionary distance between the strains and to build phylogenetic trees that can be useful for epidemiological or evolutionary purposes.
- Mapping and SNP calling
The reads of each genome are mapped to a reference genome, and SNV detection is then performed by analyzing the alignment locally. SNP calling is done across all the mapped core genome sites.
- Effects of the detected SNPs
The variants are filtered and their effects evaluated, and the location of each SNP is reported with respect to the annotated genes of the reference genome.
- Phylogenetic tree
A phylogenetic tree of the strains under study is generated based on the SNPs detected in the core genome.
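To make the core-genome SNP idea concrete, here is a minimal sketch that assumes each strain has already been reduced to a sequence aligned to the reference coordinates, with "N" marking positions that could not be called. It keeps only sites called in every strain (a simple stand-in for "core") that vary between strains, then counts pairwise SNP differences. The input layout and function names are illustrative assumptions, not the CG7 pipeline.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def core_snp_sites(aln: Dict[str, str]) -> List[int]:
    """Return 0-based positions called (A/C/G/T) in all strains and variable among them."""
    names = list(aln)
    length = len(aln[names[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("all sequences must share the reference coordinates")
    sites = []
    for i in range(length):
        column = {aln[name][i] for name in names}
        if column <= set("ACGT") and len(column) > 1:
            sites.append(i)
    return sites


def snp_distances(aln: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count core SNP differences for every pair of strains."""
    sites = core_snp_sites(aln)
    return {
        (a, b): sum(aln[a][i] != aln[b][i] for i in sites)
        for a, b in combinations(sorted(aln), 2)
    }


aln = {"ref": "ACGTACGT", "s1": "ACGTACGA", "s2": "ACCTNCGA"}
print(core_snp_sites(aln))  # [2, 7]
print(snp_distances(aln))
```

A pairwise distance table of this kind is one possible input for a distance-based tree-building method; the tree method actually used by the service is not stated here.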

3. Whole pairwise genome comparison with the Differences program
- Detection of insertions and deletions of any length across the whole genome
The Differences program compares two genomes at the whole-genome level. It is especially well suited to detecting substitutions, and insertions or deletions of any length in any region of the genome (not only in the core genome).
- Differences in the genomic context
The differences between the two compared genomes are also provided in the genomic context of the BG7 annotation [Pareja-Tobes-2012] of each genome, which allows a better evaluation of their possible implications for phenotypic changes or epidemiological identification. We use the Mauve tool to align the two genomes and then integrate the detected differences with the functional annotations obtained with BG7 [Pareja-Tobes-2012]. This allows differences to be analyzed in gene sequences as well as in intergenic non-coding regions.
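The kind of output such a pairwise comparison produces can be sketched as follows. The example assumes the two genomes have already been aligned into two equal-length gapped strings (for instance, one block exported from a whole-genome aligner) and simply walks the columns, reporting substitutions and gap runs of any length. The function name and tuple layout are hypothetical; this is not the Differences program itself.

```python
from typing import List, Tuple

Diff = Tuple[str, int, str, str]  # (kind, 0-based position in genome A, seq in A, seq in B)


def list_differences(a: str, b: str) -> List[Diff]:
    """List substitutions, insertions and deletions of B relative to A from a gapped alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    diffs: List[Diff] = []
    pos_a = 0  # ungapped coordinate in genome A
    i = 0
    while i < len(a):
        if a[i] == "-" and b[i] != "-":
            # Run of columns present only in B: insertion of any length.
            j = i
            while j < len(a) and a[j] == "-" and b[j] != "-":
                j += 1
            diffs.append(("insertion", pos_a, "", b[i:j]))
            i = j
        elif b[i] == "-" and a[i] != "-":
            # Run of columns present only in A: deletion of any length.
            j = i
            while j < len(a) and b[j] == "-" and a[j] != "-":
                j += 1
            diffs.append(("deletion", pos_a, a[i:j], ""))
            pos_a += j - i
            i = j
        else:
            if a[i] != b[i]:
                diffs.append(("substitution", pos_a, a[i], b[i]))
            if a[i] != "-":
                pos_a += 1
            i += 1
    return diffs


for diff in list_differences("ACGT--ACGTA", "ACCTGGAC--A"):
    print(diff)
```

Because each difference carries a coordinate in genome A, it could then be intersected with gene and intergenic intervals from an annotation to place it in its genomic context.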
4. Orthologous table: The community pangenome
First, we build a “pangenome set of proteins” representing all the proteins encoded by all the genes from all the genomes of the set to be compared. Second, we detect all the orthologous proteins in each genome and build the orthologous table.
A rich functional annotation is provided for each protein representing a “pangenome protein”.
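As an illustration of what an orthologous table can look like, the sketch below assumes that ortholog detection has already assigned each genome's proteins to pangenome protein identifiers. It builds one row per pangenome protein and one column per genome, and lists the proteins present in every genome. All identifiers are invented and the layout is an assumption, not the format delivered by CG7.

```python
from typing import Dict, List, Optional

# Hypothetical input: genome -> {pangenome protein id: protein id in that genome}.
assignments: Dict[str, Dict[str, str]] = {
    "genome1": {"P1": "g1_0001", "P2": "g1_0002"},
    "genome2": {"P1": "g2_0001", "P3": "g2_0005"},
}

Table = Dict[str, Dict[str, Optional[str]]]


def build_table(hits: Dict[str, Dict[str, str]]) -> Table:
    """One row per pangenome protein, one column per genome; None marks an absent ortholog."""
    genomes = sorted(hits)
    proteins = sorted({p for per_genome in hits.values() for p in per_genome})
    return {p: {g: hits[g].get(p) for g in genomes} for p in proteins}


def shared_by_all(table: Table) -> List[str]:
    """Pangenome proteins with an ortholog in every genome of the set."""
    return [p for p, row in table.items() if all(v is not None for v in row.values())]


table = build_table(assignments)
for protein, row in table.items():
    print(protein, row)
print(shared_by_all(table))
```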
Developed by Web4Bio