Shotgun Sequencing

Shotgun metagenomics is a DNA-based method for analyzing the biological communities in environmental samples such as water, sediment, and soil. A mixture of genomic DNA originating from many organisms (mostly microorganisms) is sequenced and analyzed together. The approach can serve several aims, including taxonomic identification, functional analysis (the role of an organism in the ecosystem, e.g. nitrogen fixation), and the characterization of whole genomes.

The workflow begins with sampling. Sampling methodology depends on the environment in question: a soil sample, for example, may be taken with a soil corer, whereas a water sample may be taken with a simple sterile bottle. To ensure reliable data, it is important to use a standardized protocol, to select sampling sites carefully, and to include field negative controls and sample replicates. Because shotgun metagenomics is usually used to examine the microbial community, whose composition and metabolic activity can change very rapidly, the sample must be preserved or processed immediately after it has been taken. Options to halt biological activity and degradation include freezing (e.g. in liquid nitrogen, which has a temperature of about -196 °C) or storage in a solution that stabilizes the DNA (e.g. ethanol for whole organisms).

The DNA is then extracted under sterile conditions and prepared for sequencing via several processing steps. DNA fragment length and concentration are measured and adjusted to be equal across samples. In contrast to metabarcoding, which uses primers of known sequence, here the sequences at the ends of each fragment are not known. Thus, adapters with known sequences are added to the ends of the fragments. Sample-specific “index” sequences, consisting of seven bases, are then attached to these adapters. This allows different samples to be combined (“pooled”) for sequencing without losing information on their sample origin. These steps are summarized as “Library Preparation.” Next-generation sequencing is then carried out using an Illumina platform. The name “shotgun sequencing” refers to the random fragmentation of the DNA into many short pieces, comparable to the scatter of a shotgun blast; large numbers of these fragments are then sequenced simultaneously.
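The adjustment of concentrations before pooling can be illustrated with a short calculation. The following is a minimal sketch, not part of any specific protocol: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names, sample names, and measurements are hypothetical.

```python
# Approximate molar mass of one base pair of double-stranded DNA, in g/mol
# (a widely used rule of thumb, assumed here).
AVG_BP_MASS = 660.0


def molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration (ng/uL) into a molar concentration (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def pooling_volumes(libraries, target_fmol):
    """Return the volume (uL) of each library that contains target_fmol of DNA.

    libraries maps a sample name to a tuple of
    (concentration in ng/uL, mean fragment length in bp).
    Because 1 nM equals 1 fmol/uL, volume = amount / molarity.
    """
    return {
        name: target_fmol / molarity_nm(conc, length)
        for name, (conc, length) in libraries.items()
    }


# Hypothetical measurements for two libraries.
libraries = {"soil_A": (3.3, 500), "water_B": (6.6, 500)}
volumes = pooling_volumes(libraries, target_fmol=20)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
```

In this example the more concentrated library contributes half the volume of the other, so that both samples enter the pool with the same number of DNA molecules.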

After this, the sequence data undergo bioinformatic processing. First, the sequences are assigned to their respective samples based on their indices (“demultiplexing”), and the indices and adapters, which are no longer required, are cut off (“adapter trimming”). This is followed by a quality control step in which, for example, reads that are too short are removed from the data set. The processed sequences are then assigned to taxonomic or functional groups (e.g. nitrogen-fixing genes) using a reference database. Reference databases contain known sequences of identified species, against which the unknown sequences of the sample can be compared. In the case of whole-genome analysis, the individual reads are assembled into longer contiguous sequences. The taxonomic and/or functional compositions of the samples can then be statistically analyzed.
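The first processing steps (demultiplexing, trimming of the known adapter sequence, and length filtering) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a replacement for dedicated tools: the index sequences, sample names, adapter string, and length threshold are placeholders, reads are given as (index, sequence) pairs, and only exact matches are considered.

```python
from collections import defaultdict

# Hypothetical seven-base indices; real values come from the library preparation kit.
SAMPLE_INDEX = {"ACGTACG": "soil_1", "TGCATGC": "water_1"}
# Placeholder adapter sequence, used here only for illustration.
ADAPTER = "AGATCGGAAGAGC"
# Assumed minimum read length after trimming.
MIN_LENGTH = 20


def trim_adapter(seq, adapter=ADAPTER):
    """Cut the read at the first exact occurrence of the adapter, if present."""
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[:pos]


def demultiplex(reads, sample_index=SAMPLE_INDEX, min_length=MIN_LENGTH):
    """Group reads by sample, trim adapters, and drop reads that are too short.

    reads is an iterable of (index, sequence) pairs.
    Reads with an unknown index are discarded.
    """
    by_sample = defaultdict(list)
    for index, seq in reads:
        sample = sample_index.get(index)
        if sample is None:
            continue  # index does not belong to any sample in this run
        trimmed = trim_adapter(seq)
        if len(trimmed) >= min_length:
            by_sample[sample].append(trimmed)
    return dict(by_sample)


reads = [
    ("ACGTACG", "A" * 25 + ADAPTER + "TT"),  # kept after trimming
    ("TGCATGC", "ACGT"),                     # too short, removed
    ("GGGGGGG", "A" * 30),                   # unknown index, removed
]
result = demultiplex(reads)
print(result)
```

Production pipelines additionally tolerate sequencing errors in the index and adapter and take base quality scores into account, which this sketch deliberately leaves out.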

The chief obstacles to this method are the high cost and the limited coverage of reference databases. As a result, the assignment of taxonomic identities and functions can be difficult. However, a major advantage is that, unlike metabarcoding, shotgun metagenomics does not rely on PCR amplification of a specific marker gene. This circumvents the problem of amplification bias, which can skew the abundance of sequences originating from different taxa. By avoiding this targeted amplification step, shotgun metagenomics enables more reliable inferences about the relative biomass of different taxa in a sample. Furthermore, the method can be used to examine entire communities and their functional biology. Overall, shotgun metagenomics is a very promising option for analyzing environmental samples, and we expect it to become increasingly relevant as costs decrease and reference databases improve.
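The effect of amplification bias can be made concrete with a small numerical sketch. It assumes a simple exponential model in which each PCR cycle multiplies the copy number of a taxon by (1 + efficiency); the taxon names, starting copy numbers, efficiencies, and cycle count are hypothetical values chosen for illustration only.

```python
def relative_abundance_after_pcr(start_copies, efficiencies, cycles):
    """Return the relative abundance of each taxon after a number of PCR cycles.

    Simplified model: each cycle multiplies the copy number of a taxon
    by (1 + efficiency), with a constant efficiency per taxon.
    """
    final = {
        taxon: start_copies[taxon] * (1.0 + efficiencies[taxon]) ** cycles
        for taxon in start_copies
    }
    total = sum(final.values())
    return {taxon: copies / total for taxon, copies in final.items()}


# Two taxa that are equally abundant in the original sample (hypothetical values).
start = {"taxon_A": 1000, "taxon_B": 1000}
# Slightly different amplification efficiencies (assumed).
eff = {"taxon_A": 0.95, "taxon_B": 0.85}

result = relative_abundance_after_pcr(start, eff, cycles=30)
for taxon, fraction in result.items():
    print(f"{taxon}: {fraction:.1%}")
```

Under these assumptions, a modest difference in efficiency is enough for taxon_A to make up more than 80 % of the amplified product after 30 cycles, although both taxa started at equal abundance.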