Microbial or Other Small Genome Sequencing

Thanks to powerful next-generation sequencing (NGS) technologies, de novo sequencing and re-sequencing of whole genomes have never been easier. One of Microsynth’s core competencies is the sequencing of small microbial genomes as well as the profiling of microbial communities. Besides offering its customers state-of-the-art NGS platforms, Microsynth has also made significant investments in bioinformatics. This application note gives you an overview of Microsynth’s bioinformatics tools and services in the field of microbial re-sequencing and of their possible impact on your research. In brief, Microsynth offers a basic bioinformatics service for detecting small InDels and substitutions and their effects on proteins, as well as a more advanced service capable of finding even difficult-to-detect large insertions and deletions (see also the application note "Microbial Illumina Resequencing").


Re-sequencing including Basic Bioinformatics Analysis

Sequencing: DNA quality check, library preparation and MiSeq sequencing are performed on genomic DNA isolated either by the client or at Microsynth.

Mapping: Sequencing reads are mapped to the reference sequence/genome; this step builds the foundation for the following analyses.

Variant Calling: Possible small insertions, deletions and substitutions are detected and reported in the VCF format.
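To illustrate how the three variant types named above can be told apart in a VCF file, here is a minimal Python sketch. It assumes simple, left-anchored REF/ALT alleles as most variant callers write them; the example records are invented for illustration and are not Microsynth output.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair by comparing allele lengths."""
    if len(ref) == len(alt):
        return "substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


def parse_vcf_line(line):
    """Parse one VCF data line and return one entry per ALT allele."""
    # The first five VCF columns are CHROM, POS, ID, REF, ALT.
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return [
        {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": allele,
         "type": classify_variant(ref, allele)}
        for allele in alt.split(",")
    ]


# Invented example records (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).
example_lines = [
    "chr1\t1042\t.\tA\tG\t50\tPASS\t.",
    "chr1\t2310\t.\tC\tCTT\t48\tPASS\t.",
    "chr1\t5120\t.\tGAT\tG\t61\tPASS\t.",
]

variants = [v for line in example_lines for v in parse_vcf_line(line)]
for v in variants:
    print(v["chrom"], v["pos"], v["ref"], v["alt"], v["type"])
```

Header lines (starting with "#") are not handled here and would need to be skipped when reading a real file.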

Annotation of Variants: The variant calling step results in a user-friendly graphic summarizing the major findings for all investigated samples (e.g. bacterial strains). For variants that occur within protein-coding regions, the impact on the translated amino acid sequence is shown, so that each detected mutation can be classified as silent, missense or nonsense.
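The distinction between silent, missense and nonsense mutations can be sketched in a few lines of Python. This is only an illustration, not the annotation software used by Microsynth; it assumes the standard genetic code, in-frame codons and single-codon substitutions, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Compare the amino acids encoded by the reference and variant codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid
    if alt_aa == "*":
        return "nonsense"    # variant codon is a stop codon
    return "missense"        # different amino acid (a lost stop codon also lands here)


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu
print(classify_substitution("GAA", "GTA"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```

InDels that shift the reading frame need separate handling, because they change every codon downstream of the variant.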

Provided Output Files: Raw data: FASTQ; Mapping: BAM files (see Fig. 1); Variant calling, protein consequences: VCF (for each sample separately) and HTML (includes all samples, see Fig. 2).


Re-sequencing including Advanced Bioinformatics Analysis

The advanced bioinformatics analysis includes the results of the basic analysis and additionally summarizes all observations that provide evidence for possible large deletions and insertions. Such mutations are difficult to detect using standard bioinformatics tools and often require additional experiments for confirmation.

Large InDels: Possible large insertions and deletions are detected using a breakpoint identification algorithm. Regions are reported in a table visualizing the read alignment at the position of the candidate InDel.
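The breakpoint identification algorithm itself is not described in this note. As a rough illustration of the general idea, the Python sketch below uses one common heuristic: reads that span a breakpoint are often soft-clipped by the mapper, so reference positions where several reads are clipped at the same base are breakpoint candidates. The function names, the support threshold and the example alignments are assumptions made for this sketch.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_positions(pos, cigar):
    """Return 1-based reference positions at which a read is soft-clipped."""
    # Hard clips (H) are dropped so that soft clips sit at the ends of the list.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    positions = []
    if ops and ops[0][1] == "S":
        positions.append(pos)                # first aligned base
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    if len(ops) > 1 and ops[-1][1] == "S":
        positions.append(pos + ref_len - 1)  # last aligned base
    return positions


def candidate_breakpoints(alignments, min_support=3):
    """Return (position, read count) pairs with at least min_support clipped reads."""
    counts = Counter(
        p for pos, cigar in alignments for p in clip_positions(pos, cigar)
    )
    return sorted((p, n) for p, n in counts.items() if n >= min_support)


# Invented (POS, CIGAR) pairs, as they would appear in SAM/BAM records.
alignments = [
    (951, "50M25S"), (961, "40M35S"), (971, "30M45S"),     # clipped on the right
    (2007, "20S55M"), (2007, "30S45M"), (2007, "10S65M"),  # clipped on the left
    (500, "75M"),                                          # fully aligned
    (1500, "5S70M"),                                       # isolated clip, ignored
]

print(candidate_breakpoints(alignments))
```

In this invented data set, the two clusters of clipped reads would be consistent with a deletion between them; real candidates still need confirmation, as noted above.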

Unmapped Reads: Reads which could not be mapped to the reference sequence are de novo assembled and the resulting contigs are BLASTed against NCBI's nucleotide database. This analysis provides useful information about sample contaminants (e.g. plasmids, phages) which are not part of the reference sequence. In addition, large InDels can often be recovered in the de novo assembly.
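The first step of this analysis, collecting the reads that did not map, can be sketched in Python as follows. In the SAM format, bit 0x4 of the FLAG field marks a read as unmapped. The sketch works on SAM text lines and writes FASTA; the assembly and BLAST steps are not shown, and the function name and example records are invented for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def unmapped_reads_to_fasta(sam_lines):
    """Return the unmapped reads from SAM text lines as a FASTA string."""
    records = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: QNAME is field 1, FLAG is field 2, SEQ is field 10.
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED:
            records.append(f">{name}\n{seq}")
    return "\n".join(records)


# Invented example: one header line, one mapped read, two unmapped reads.
sam_lines = [
    "@SQ\tSN:chr1\tLN:6000",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCGA\tIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCATTAC\tIIIIIIII",
]

fasta = unmapped_reads_to_fasta(sam_lines)
print(fasta)
```

The resulting FASTA could then be passed to an assembler of your choice.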

Provided Output Files: Large InDels: HTML (see Fig. 3); Unmapped Reads: FASTA (assembled contigs), NCBI BLAST hits.


Examples for Output Files Provided by Microsynth

Figure 1: Results of the mapping are stored in a BAM file and can be viewed using the free Integrative Genomics Viewer (IGV). IGV lets you browse and visualize the mapping from the single-nucleotide level up to the whole-genome level. The figure above shows a typical mapping result with the coverage at the top (in red) and the positions of the individual sequencing reads below (in grey). The mapping forms the basis for all further bioinformatics analyses.

Figure 2: Typical output overview file (HTML) resulting from the variant calling step for SNPs and small InDels. In the first block, the number of reads matching the reference and/or the variant is shown for each sample (in this case three bacterial strains). In the second block of the table, variant information such as the reference sequence, the position of the variant on the reference and the kind of variant is displayed. In the third block, the effect of SNPs and small InDels is shown for all annotated features of the reference GenBank file. The DNA sequence and the protein sequence are displayed for the wild type (upper line) and the variant (lower line), with the affected positions highlighted in yellow.

Figure 3: Example analysis for the detection of large InDels in a genome. The graphic above shows a 1006 bp deletion in the bacterium Pseudomonas. 14 sequencing reads were found to support this deletion.
