
Molecular identification

Diagnosticians should ideally obtain DNA from specimens using relatively non-destructive techniques, to ensure that a voucher specimen is available for future morphological re-examination (Floyd et al. 2010).

See the DNA extraction page for methods used for the molecular identification of fruit flies.

The molecular techniques presented here involve the amplification of particular regions of the fly genome using a polymerase chain reaction (PCR).

PCR targets are one of the following: a region of the mitochondrial gene for cytochrome oxidase subunit I (COI), known as the DNA barcode region; a region of the ribosomal RNA operon (either the first internal transcribed spacer (ITS1) alone or part of the 18S subunit plus the ITS1); or regions of nuclear protein-coding genes (POP4, EIF3L, RPA, DDOST). The nuclear loci have been developed specifically as diagnostic loci for fruit flies (see Krosch et al. 2017).
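To illustrate how a primer pair defines the region that PCR amplifies, the Python sketch below performs a simple in-silico PCR. It looks for the forward primer on the template and for the reverse complement of the reverse primer downstream of it, then returns the predicted amplicon. The function names and toy sequences are hypothetical and are not the COI, ITS1 or nuclear-locus primers. Exact matching is also an assumption: real primers can bind with mismatches and may contain degenerate bases.

```python
from typing import Optional

# Complement table for unambiguous uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted amplicon (primers included), or None if either primer does not match.

    Both primers are given 5'->3'. The reverse primer anneals to the opposite
    strand, so its reverse complement is searched for downstream of the
    forward primer. Only exact matches are considered (a simplifying assumption).
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse.upper())
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy template and primers.
template = "TTTTACGTACGAGGGCCCAAATTTTCGATCCAGGGG"
amplicon = in_silico_pcr(template, forward="ACGTACGA", reverse="TGGATCGA")
print(amplicon, len(amplicon))  # ACGTACGAGGGCCCAAATTTTCGATCCA 28
```

The length of the returned amplicon corresponds to the product size that would be seen on a gel. This is the property relied on when ITS1 amplicon size alone is enough to identify a species.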

Sequencing of the universal COI barcode has become a standard protocol for all Australian diagnostic labs. It can reveal population-level variation and is supported by an international reference dataset that is continually expanded by independent contributors. However, closely related species within species complexes can still be difficult to distinguish.
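As a rough illustration of how a COI barcode is compared against a reference dataset, the sketch below scores each reference by uncorrected p-distance, which is the proportion of differing sites between two aligned sequences. It then returns every reference tied for the closest match. The nearest-match rule, the function names and the ten-base "species" sequences are assumptions made for illustration only. Real barcodes are several hundred bases long, and identification criteria vary between laboratories.

```python
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def best_matches(query: str, references: Dict[str, str]) -> Tuple[float, List[str]]:
    """Return the smallest distance and every reference name tied at that distance."""
    distances = {name: p_distance(query, seq) for name, seq in references.items()}
    best = min(distances.values())
    tied = sorted(name for name, d in distances.items() if d == best)
    return best, tied


# Hypothetical aligned reference barcodes (toy length).
refs = {
    "species_A": "ACGTTAGCAT",
    "species_B": "ACGTTAGGAT",
    "species_C": "TCGATAGCTT",
}
print(best_matches("ACGTTAGCAT", refs))  # (0.0, ['species_A'])
print(best_matches("ACGTTAGTAT", refs))  # (0.1, ['species_A', 'species_B'])
```

The second query is equally close to two references. A tie like this is the kind of result that arises when closely related species share very similar barcodes.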

For the ITS1, the size of the PCR amplicon alone is useful for identifying a few species. However, restriction digestion of the ITS1 PCR amplicon, which reveals sequence differences at defined recognition sites within the amplicon, is recommended for all analyses as a more robust method of identification. This is referred to as restriction fragment length polymorphism (RFLP) analysis. Reference data have been developed for the economically important species. However, RFLP analysis cannot necessarily rule out non-economic fruit flies for which no reference data have been developed.
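The digestion step can be sketched in silico. The Python below finds every recognition site for an enzyme in a linear amplicon and returns the resulting fragment lengths, which is the pattern compared against reference data. AluI, HaeIII and RsaI are real enzymes with palindromic four-base sites that leave blunt ends. They were chosen here only for illustration and are not necessarily the enzymes used in the fruit fly ITS1 protocols. The function name and example amplicon are hypothetical.

```python
from typing import Dict, List, Tuple

# Recognition site and cut offset (bases from the 5' end of the site) on the top strand.
# Illustrative enzymes only; all three sites are palindromic and cut bluntly,
# so scanning the top strand finds every site and gives unambiguous fragment lengths.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "AluI": ("AGCT", 2),    # AG^CT
    "HaeIII": ("GGCC", 2),  # GG^CC
    "RsaI": ("GTAC", 2),    # GT^AC
}


def digest(amplicon: str, enzyme: str) -> List[int]:
    """Return fragment lengths (bp, largest first) after cutting a linear amplicon."""
    site, offset = ENZYMES[enzyme]
    seq = amplicon.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:  # collect every occurrence of the recognition site
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    fragments = [end - start for start, end in zip(bounds, bounds[1:])]
    return sorted(fragments, reverse=True)


# Hypothetical 34 bp amplicon containing two AluI sites and no HaeIII site.
amplicon = "A" * 10 + "AGCT" + "A" * 10 + "AGCT" + "A" * 6
print(digest(amplicon, "AluI"))    # [14, 12, 8]
print(digest(amplicon, "HaeIII"))  # [34]
```

An uncut amplicon returns a single fragment equal to its full length, which corresponds to a single band on a gel.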

In addition to the traditional DNA barcode region (COI), a new suite of diagnostic markers has been developed. The COI primers used for Diptera have also been updated to reduce the amplification of numts (nuclear mitochondrial DNA segments, i.e. pseudogenes).
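One common way to screen a COI read for a possible numt, separate from the primer updates described above, is to translate it and look for premature stop codons. A functional mitochondrial gene should have an open reading frame. The sketch below checks the three forward reading frames for TAA and TAG, which are the stop codons of the invertebrate mitochondrial genetic code (in this code TGA encodes tryptophan). This is an assumed screening step, not part of the protocol described here, and the function names are hypothetical. A read with no stop codons can still be a numt, particularly a recently transferred one.

```python
from typing import List

# Stop codons in the invertebrate mitochondrial genetic code (NCBI translation table 5).
# TGA is read as tryptophan in this code, so it is not treated as a stop here.
MITO_STOPS = {"TAA", "TAG"}


def stop_free_frames(seq: str) -> List[int]:
    """Return the forward reading frames (0, 1, 2) that contain no in-frame stop codon."""
    seq = seq.upper()
    frames = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in MITO_STOPS for codon in codons):
            frames.append(frame)
    return frames


def possible_numt(seq: str) -> bool:
    """Flag a read as a possible numt if every forward frame contains a stop codon (assumed rule)."""
    return not stop_free_frames(seq)


print(possible_numt("TAACTAACTAA"))   # True: each frame has an in-frame TAA
print(possible_numt("GCCGCCGCCGCC"))  # False: no stop codon in any frame
```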

Three methods are presented here. All three are based on PCR and can share a common sample storage and handling technique and DNA extraction method.

Choice of method

DNA barcoding is generally recommended over the PCR-RFLP (restriction fragment length polymorphism) methods because:

(a) DNA barcoding can produce better resolution between species because it uses variation across the complete amplified sequence. PCR-RFLP is limited to variation at a few 4-6 bp restriction sites within the amplicon, and which sites are assessed depends on the restriction enzymes used.

(b) DNA barcoding uses a very large reference sequence database. This database is international, publicly accessible and continually added to by unrelated institutions and projects, so it includes more taxonomically comparative species and population data, which improves confidence in identification. PCR-RFLP generally relies on reference restriction patterns developed in-house; comparative species and population data are therefore incorporated at a much slower rate, without contributions from independent and international laboratories.

(c) DNA barcoding is quantifiable and amenable to bioinformatic analysis. PCR-RFLP is essentially qualitative, relying on visual comparison against control samples and molecular weight markers.
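Points (a) and (c) can be illustrated with two hypothetical amplicons that differ by a single substitution outside their only AluI recognition site (AG^CT). The sketch below shows that the digest gives identical fragment patterns for both, so PCR-RFLP with this enzyme cannot separate them, whereas a direct sequence comparison detects and counts the difference. The sequences, enzyme choice and function names are invented for illustration.

```python
from typing import List


def fragment_lengths(seq: str, site: str, offset: int) -> List[int]:
    """Fragment lengths (largest first) after cutting a linear sequence at every occurrence of a palindromic site."""
    cuts, pos = [], seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((end - start for start, end in zip(bounds, bounds[1:])), reverse=True)


def differing_sites(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical 24 bp amplicons; the single C/T difference lies outside the AluI site.
amplicon_1 = "A" * 10 + "AGCT" + "C" * 10
amplicon_2 = "A" * 10 + "AGCT" + "C" * 5 + "T" + "C" * 4

print(fragment_lengths(amplicon_1, "AGCT", 2), fragment_lengths(amplicon_2, "AGCT", 2))  # [12, 12] [12, 12]
print(differing_sites(amplicon_1, amplicon_2))  # 1
```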

If access to a laboratory with DNA sequencing equipment is difficult, PCR-RFLP is a useful alternative for the majority of species. For some species, identification is ambiguous with DNA barcoding but not with PCR-RFLP (see Diagnostic methods used to identify fruit flies). This may reflect the different gene regions used, but it may also be because the DNA barcode database includes many more species and populations and therefore covers more of the natural variation.

Gene regions used

DNA barcoding generally utilises a region of the mitochondrial cytochrome oxidase I (COI) gene as the standard barcode; however, recently developed nuclear protein-coding loci can also be used and are discussed here. Both RFLP methods utilise a ribosomal DNA (rDNA) region that includes the first internal transcribed spacer (ITS1).

Both gene regions were chosen because their sequences are well suited to distinguishing between species (Jinbo et al. 2011; Wang et al. 2015).