The MitochondrialGenome of the Lepidopteran Host Cadaver (Thitarodessp.) of Ophiocordyceps sinensis

发表时间:2023-08-24 11:36

The MitochondrialGenome of the Lepidopteran Host Cadaver (Thitarodessp.) of Ophiocordyceps sinensis andRelated Phylogenetic Analysis

Xincong Kanga,b,c, Yongquan Hub,c, Jiang Hud,Liqin Hub,c, Feng Wangd, Dongbo Liua,b,c*

aHunanProvincial Key Laboratory of Crop Germplasm Innovation and Utilization, HunanAgricultural University, Changsha, Hunan, P. R. China;

bStateKey Laboratory of Subhealth Intervention Technology, Changsha, Hunan, P. R.China;

cHorticultureand Landscape College, Hunan Agricultural University, Changsha, Hunan, P. R.China;

dNextomics Biosciences, Wuhan, Hubei,P. R. China


*Corresponding author

Room 221, LifeScience Building, Hunan Agricultural University, No. 1 Nongda Road, FurongDistrict, Changsha, Hunan, P. R. China

E-mail: chinasaga@163.com


Abstract

To understand the phylogeny ofthe host insect (Thitarodes sp.) of the fungus Ophiocordyceps sinensis,we sequenced, annotated and characterized the complete mitochondrial(mt) genome of the host cadaver of a natural O. sinensis. Further, we compared the Thitarodes sp. mt genome with those of the other 7 sequenced Hepialidaeand examined the phylogenetic relationships using a constructedMaximum Likelihood (ML) phylogenetic tree and mt genomic features (genetic distancesand intergenic spacers). The mt genome is a circular molecule of 16,280 bp inlength with a high A+T content (81.20 %) and contains 13 protein-codinggenes (PCGs), 2 rRNA genes, 22 tRNA genes and an AT-rich region. The genearrangement is identical to the ancestral arrangementbut differs from those of other lepidopteran mt genomes because of thearrangement of tRNA genes. The tRNA region, which is located between theAT-rich region and nad2, is trnI/trnQ/trnM (IQM) in Thitarodes sp., rather than the trnM/trnI/trnQ (MIQ) of theLepidoptera-specific rearrangement. All PCGs begin with thecanonical start codons ATN or NTG, except forcox1, which starts with CGA. Most PCGsterminate with the typical stop codon TAA, although some have an incompletestop codon (T). The 1,473 bp AT-rich region is located between the rrnS (12S rRNA) and trnI, which is the longest sequenced in a Thitarodes mt genome to date, containing nine 112 bp copies and onepartial copy of a 55 bp sequence. The results derived from the phylogenetictree, the genetic distances and the intergenic spacers of the mt genome showthat the host insect of O. sinensis belongsto the Thitarodes, while Endoclita signifer and Napialus hunanensis form a relatively distinct lineage fromThitarodes. The sequence and fullannotation of this moth mt genome will provide more molecular information aboutthe Exoporia within the Lepidoptera, and the clarification of its phylogenywill improve the management of this insect resource and theconservation and sustainable use of this endangered medicinal speciesin China.

Keywords: Mitochondrial genome, Lepidoptera, Phylogenetic analysis, Ophiocordyceps sinensis, Host insect, Thitarodes


1.   Introduction

Ophiocordyceps sinensis (Berk.) G.H. Sung, J.M. Sung, Hywel-Jones & Spatafora (Ascomycota:Ophiocordycipitaceae) is a fungus that parasitizes moth larvae and convertsthem into sclerotia from which the fungus fruiting body grows (Pegler et al., 1994). The natural O. sinensis, a combination of the fungusand insect larva (Wang, 1995), has been used for centuries as a traditionalChinese medicine for treating asthma, respiratory infection and kidney diseases(Zhu et al., 1998; China, 2010). Dueto its high commercial value, the species has become endangered in recent yearsbecause of overexploitation and habitat degradation (Zhou et al., 2014). The natural O.sinensis population is extremely limited; it is confined to the Tibetan Plateau and itssurrounding regions, at elevations of 3,000 meters and higher (Li et al., 2011). Therefore,protection of O. sinensis and itshost is critically needed (Zhu, 2006). Species of Hepialidae (Lepidoptera) havelong been considered as the main host insect of O. sinensis based on the geographic distribution, altitude and biological characteristics of the insects.However, there is little information about host identification by molecularanalysis.

Due to its unique characteristics, including small genome size, fastevolutionary rate, relatively conserved gene content and organization, maternalinheritance and limited recombination, mitochondrial (mt) DNA has been widely usedin phylogeny and evolutionary biology (Saccone et al., 1999; Zhang et al.,2015). To date, 282 lepidopteran mt genomes (Exoporia: 7, Heteroneura: 275)have been deposited in GenBank. Compared with the number of known species of Lepidoptera(180,000 species, http://www.ucl.ac.uk/taxome/), the existing molecularinformation regarding   Lepidoptera,especially the Exoporia, is rather limited. The sizes of these 7 Exoporia mtgenomes range from 15,064 bp to 16,173 bp. Their arrangement of tRNA at thejunction between the AT-rich region and nad2is trnI/trnQ/trnM (IQM), unlikethe most common type in other Lepidoptera, trnM/trnI/trnQ(MIQ) (Cao et al., 2012; Chen et al., 2015; Shi et al., 2015; Yi et al.,2016).

In this study, we collected the O.sinensis complex of stroma and host cadaver, enriched the mitochondrion ofthe host cadaver, sequenced the mt genome by a combination of massivelyparallel DNA sequencing and PCR sequencing, analyzed the complete mt genome ofthis host cadaver, and constructed the phylogenetic relationship with otherLepidoptera. Clarification of the host insects of O. sinensis will not only provide additional molecular informationabout Lepidoptera but also provide basic clues for the management of these insectresources and for the conservation and sustainable use of this endangeredmedicinal species.

2.   Materialsand Methods

2.1 SampleCollection

Samples of the O. sinensishost insect (Thitarodes sp.) were isolated fromthe natural O. sinensis, which contained both the stroma and the moth cadaver. Fresh specimens werepurchased in a local market in Guoluo, Qinghai Province, China (Latitude 34.48°N, Longitude 100.23°E). All fresh O. sinensis specimens were washed thoroughly on site in running water with gentlebrushing, soaked in 75 %ethanol for 10 min for surface sterilization and washed again 3 times withsterile water. These samples were snap-frozen in liquid nitrogen immediatelyafter sampling, transported to our laboratory by using the solid carbon dioxide,and stored at -80°C until further use. The natural O. sinensis was preliminarily identified based on morphologicalcharacteristics (Shrestha et al.,2010).

2.2 DNAExtraction and PCR Amplification

The host insect mitochondrion was isolated using amitochondrion extraction kit (Solarbio, Beijing) according to the manufacturer’srecommendation. Total mt DNA wasextracted by a modified CTAB method (Kang et al., 2011) and sequenced intwo ways. In one method, we sequenced the mt genome by paired-end 150sequencing with an insert size of 400 bp using Illumina HiSeq Platform. In theother method, 11 overlapping fragments (F1 to F11) totally covered the entiremt genome by long PCR using PrimeSTAR® GXL DNA Polymerase (Takara, Dalian). Thesequences of primer pairs 1-9 are from Cao and coworkers (Cao et al.,2012), while primer pairs 10 (F: 5’-CAAAATAAYCCTTTARTCAGGCAC-3’; R: 5’-CGCAACTGCTGGCACAAAAT-3’) and 11 (F: 5’-ATGAAAAATGAATCGGTTGTAAAT-3’, R: 5’- ATTTA GCAAGATTTATTACTCGTT-3’) were designed using Primer Premier5 based on the obtained sequence results. All primers were synthesized by Sangon Biotechnology Co. Ltd. (Shanghai, China).

The fragments were amplified in a volume with 200 μM of each dNTPs, 0.2 μM of each primers, 1.25 U/50 μL of DNA polymerase and 20-100ng of DNA. The reaction wasperformed as follows: an initial denaturation at 95 °Cfor 4 min, followed by 30cycles of denaturation at 98 °C for 10 s, annealingat 56 °C for 15 s, extension at 68 °C for 2 min, and then a final extension at68 °C for 5 min. All products were checked using agarose gels and sequenced directly in two directions at Sangon Biotechnology Co.Ltd. (Shanghai, China).

2.3 SequenceAssembly and Annotation

Approximately 4,990 Mb ofhigh-quality clean reads were produced by the Illumina platform after filteringout low-quality and adapter-contaminated reads. Thesequencing reads of the mt genome were extracted from the raw data by mappingto the mt genomes of other Lepidoptera using bwa (Li and Durbin, 2009). Approximately15.26 Mb of reads were extracted and then assembled using SOAPdenovov2.04 (Li et al., 2010). Using the PCR product sequencing data, this mtgenome was further assembled as a linear contig using SeqMan Pro software (DNASTARInc., Madison, WI, USA). Because insect mt genomes are circular, we filled in thegap by using PCR amplification with primers designed at the 3’ and 5 termini of this linear contig.Finally, we adjusted the linear representation of the circular DNA by circular permutationso that the sequence started with trnI (GAT), which is always reportedas the beginning of ancestral lepidopteran mt genomes.

The mt genome sequences wereannotated using MITOS WebServer (http://mitos.bioinf.uni-leipzig.de) (Bernt etal., 2013). The nucleotide sequences of protein-coding genes (PCGs) were translatedto protein sequences using the invertebrate mitochondrial code (transl_table=5).The MITOS predictions were manually curatedusing other published lepidopteran mt genomes as references. The genebeginnings and ends were modified, if necessary, to be consistent with otherspecies. The rrnL gene was annotated to extend to the boundaries of theflanking trnL and trnV. The 5' end of rrnS was determinedvia comparison with orthologous genes of other Lepidoptera, while the 3' endwas annotated to be adjacent to the start of trnV.The secondary structures of tRNA genes were also predicted using MITOS (Bernt etal., 2013). For the base composition of the nucleotide sequences, the compositionskewnesswas calculated by Lobry as follows: AT skew = [A−T]/[A+T], GC skew =[G−C]/[G+C] (Lobry, 1996).

2.4 EvolutionaryAnalysis

The evolutionary relationshipwas assessed using a combination of phylogeneticanalysis, comparison of genetic distances and intergenic spacers. A phylogenetictree was constructed by the ML method using whole mt genome data. Theconcatenated nucleotide sequences of 13 PCGs in 17 complete mt genomes (SupplementaryMaterial, Table S1)were aligned using MAFFT v7.215 (Katoh and Standley, 2013), and unreliablyaligned regions were removed using Gblocks (Castresana, 2000). Eubasilissaregina (Trichoptera) was used as an outgroup. The best model used in the MLphylogenetic tree was determined by using “find best DNA/protein models (ML)”in Mega 7.0. The “GTR+G” model producedthe lowest values for both the BIC (Bayesian Information Criterion) and theAICc (corrected Akaike information criterion), therefore it was chosen forphylogenetic analysis. The phylogenetic tree among the Lepidoptera was constructed using MEGA7.0 (Kumar et al., 2016) with 500 bootstrap replicates.

Genetic distance is generallyconsidered to be an important aspect of species discrimination (Hebert et al., 2003).To further explore the evolutionary development of the O. sinensis host insect with otherspecies, we calculated Kimura's two parameter (K2P) genetic distances amongdifferent genes and genome regions using MEGA 7.0 (Kumar et al., 2016).Pairwise distances were calculated among all sequenced Hepialidae individuals (SupplementaryMaterial, Table S1) and compared with that of the O. sinensis host insect.

The intergenic regions do notencode genes; as a result, intergenic regions are under morerelaxed selection and evolve more quickly than coding genes. Consequently, intergenicregions can be used to analyze the evolutionary relationships of closelyrelated species (Ghikas et al., 2010). We hand-counted and compared theintergenic-spacer sequences of Hepialidae to further evaluate the phylogeneticdevelopment of the host insect of O. sinensis.

3.   Resultsand Discussion

3.1 GenomeOrganization, Base Composition, and Gene Arrangement

The complete mt genome of this moth (Thitarodes sp.) is a circular DNAmolecule of 16,280 bp in length (GenBank No.: KX527574; Table 1), making it thelargest mt genome in the Hepialidae to date (Table 2). Itis comparable to other lepidopteran mt genomes, which rangefrom 14,535 bp in Ostrinia nubilalis (Coates et al.,2005) to 16,499 bp in Prays oleae (van Asch et al., 2016). The large mt genomesize is primarily attributed to the expansion of the AT-rich region (Fig 1;Table 2) (Cao et al., 2012; van Asch et al., 2016).


Table 1 Annotation of the Thitarodes sp.mt genome

Gene

Strand

Gene position

Intergenic region

Start codon

Stop codon

Anticodon

trnI

H

1-66

0



GAT

trnQ

L

67-135

0



TTG

trnM

H

139-208

3



CAT

nad2

H

209-1226

0

ATT

T


trnW

H

1227-1292

0



TCA

trnC

L

1285-1351

-8



GCA

trnY

L

1358-1424

0



GTA

cox1

H

1427-2957

2

CGA

T


trnL2

H

2958-3026

0



TAA

cox2

H

3028-3709

1

ATG

T


trnK

H

3710-3779

0



CTT

trnD

H

3780-3844

0



GTC

atp8

H

3845-4006

0

ATA

TAA


atp6

H

4000-4677

-7

ATG

TAA


cox3

H

4677-5465

-1

ATG

TAA


trnG

H

5468-5533

2



TCC

nad3

H

5534-5885

0

ATT

T


trnA

H

5886-5954

0



TGC

trnR

H

5958-6023

3



TCG

trnN

H

6028-6093

4



GTT

trnS1

H

6094-6153

0



TCT

trnE

H

6154-6218

0



TTC

trnF

L

6221-6286

2



GAA

nad5

L

6287-8024

0

ATT

T


trnH

L

8025-8091

0



GTG

nad4

L

8093-9433

1

ATG

TAA


nad4L

L

9433-9726

-1

ATG

TAA


trnT

H

9729-9794

2



TGT

trnP

L

9795-9858

0



TGG

nad6

H

9861-10385

2

ATA

TAA


cob

H

10385-11530

-1

ATG

TAA


trnS2

H

11536-11608

5



TGA

nad1

L

11624-12556

15

TTG

TAA


trnL1

L

12557-12627

0



TAG

rrnL

L

12628-13962

0




trnV

L

13964-14028

1



TAC

rrnS

L

14029-14807

0




AT-rich region

H

14808-16280

1473




H and L refer to the heavy and light strands, respectively.

Positive values of intergenic regions indicate gapnucleotides; a negative value indicates overlapped nucleotides.


Fig 1. Circular map of the mitochondrial genome of Thitarodes sp..Genes on the H-strand are located on the first circle (from outermost toinnermost), while genes on the L-strand are on the second circle. On the thirdcircle, green represents the locations of tRNAs and rRNAs. The fourth circleshows the GC content, which is plotted using a sliding window (100 bp), as thedeviation from the average GC content of the entire sequence. The black colorrepresents below-average GC content in the 100 bp window, while the deep brick redindicates above-average GC content. The inner circle indicates the GC skew(dark green: G>C; dark purple: G<C).


Table 2 Comparison of the nucleotide composition of mitochondrialgenomes of the moths from family Hepialidae

Species

Thitarodes sp.

Ahamus   yunnanensis

Endoclita   signifer

Hepialus   xiaojinensis

Napialus   hunanensis

Thitarodes   gonggaensis

Thitarodes pui

Thitarodes   renzhiensis

Size (bp)

16,280

15,816

15,285

15,397

15,301

15,940

15,064

16,173

A+T (%)

81.20

82.34

81.90

81.13

81.85

81.37

80.73

81.28

AT-rich region

Size (bp)

1,473

978

389

635

368

1,134

287

1,367

A+T (%)

88.60

89.37

95.40

89.00

93.20

90.60

92.30

90.56











PCGs

No. codons

3,720

3,720

3,720

3,709

3,721

3,719

3,720

3,720

A+T (%)

78.96

80.58

80.27

79.47

80.16

79.29

79.14

78.99











tRNA

Size (bp)

1,473

1,495

1,482

1,473

1,468

1,474

1,475

1,474

A+T (%)

83.30

85.22

84.00

83.60

84.40

83.40

83.90

83.44











rrnL

Size (bp)

1,335

1,329

1,344

1,325

1,324

1,335

1,281

1,335

A+T (%)

85.50

86.00

85.50

85.40

85.30

85.40

84.20

85.39











rrnS

Size (bp)

779

777

782

740

782

779

777

779

A+T (%)

85.40

86.10

84.90

85.50

85.50

85.80

85.70

85.37











1st codon position

A (%)

37.07

36.94

36.23

37.23

35.80

37.16

36.80

37.18

T (%)

39.24

38.41

38.13

37.61

38.94

37.76

37.85

37.66

A+T (%)

76.32

75.35

74.36

74.84

74.74

74.92

74.65

74.84











2nd codon position

A (%)

21.97

22.10

21.96

21.97

22.06

22.05

22.18

22.04

T (%)

48.57

49.09

48.70

48.88

48.62

48.68

48.68

48.82

A+T (%)

70.53

71.18

70.65

70.85

70.68

70.73

70.86

70.86











3rd codon position

A (%)

42.99

46.10

45.34

43.97

44.24

43.42

44.17

43.20

T (%)

47.87

49.11

50.47

48.72

50.85

48.47

47.72

48.04

A+T (%)

90.87

95.22

95.81

92.69

95.08

91.89

91.88

91.24

GenBank No.

KX527574

NC_018095.1

NC_029873.1

NC_028348.1

NC_024424.1

NC_026903.1

NC_023530.1

NC_018094.1


The proportions of A,T, C, and G of the H-strand in this moth mt genome indicate a remarkably high A+Tcontent (81.20 %) and low G+C content (18.80 %) (Table 2), which is consistentwith the AT-bias that is generally observed in other lepidopteran mt genomes (Kimet al., 2006; Niu et al., 2016). The A+T content of Grapholita dimorpha Komai (74.84 %) is the lowest reported in Lepidoptera(Niu et al., 2016), while that ofCoreana raphaelis is the highest, at up to 82.7 % (Kim et al., 2006).

The AT and GC skewvalues in the H strand of this moth mt genome (Fig 1, AT-skew=0.010, GC-skew=0.191) indicate aslight A skew and a moderate C skew. A similar trend has been observed inother lepidopteran mt genomes, with the AT-skew varying from−0.04742 (C. raphaelis)to 0.05878 (Bombyx mori) and the GC-skewranging from −0.31769 (Ochrogasterlunifer) to −0.15802 (C. raphaelis)(Cao et al., 2012).

This moth mt genome contains 13 PCGs, 2 rRNAs, 22putative tRNAs, and an AT-rich region (Table 1, Fig 1). More genes (9 PCGs and14 tRNAs) are located on the heavy strand (H-strand), whereas fewer genes (4PCGs, 8 tRNAs, and 2 rRNA genes) are located on the light strand (L-strand),similar reports of other species of Lepidoptera (Park et al., 2016). The gene order isidentical to the ancestral gene arrangement but differs from that of other Lepidopteradue to the rearrangement of 3 tRNAs between the AT-rich region and nad2.The ancestral type is IQM arrangement at the AT-rich region and nad2 junction, while most lepidopteranmt genomes show MIQ arrangement (Park etal., 2016). This IQM arrangementis also found in other Hepialidae, such as Thitarodesrenzhiensis, Thitarodes yunnanensis(Ahamus yunnanensis) and Hepialus xiaojinensis,as well as inNepticuloidea (Stigmella roborella),which is also an ancient, non-ditrysian lepidopteran group (Cao et al., 2012; Timmermans et al., 2014; Chen et al., 2015; Shi et al.,2015a; Yi et al., 2016).

3.2 Protein-CodingGenes (PCGs)

The 13 PCGs of this moth mt genome include 7 NADH dehydrogenase subunits (nad1-6, nad4L), 3 cytochromec oxidase subunits (cox1-3), 2 ATPasesubunits (atp6, atp8), and one cytochrome b gene (cob). The total length of these 13 PCGs is   11,189 bp, accounting for 68.73 % of the entiremitochondrial genome, encoding 3,720 amino acids (Table 2). Excluding stop codons,the A+T content of the PCGs is 78.96 %, lower than that of the mt genome as awhole (Table 2). The codon frequency analysis shows that a total of 62 codons areused for transcription, with the absence of UAG and UGC (Table 3). The most frequentlyused codon is UUA for Leu, followed by AUU for Ile (Table 3). The fraction ofcodons encoding the hydrophobic amino acids (Met, Trp, Phe, Val, Leu, Ile, Pro,Ala) is 55.29 % (Table 3), reflecting the biased usage of A/T nucleotides andthe hydrophobic nature of respiratory membrane complexes. Thecodon distribution patterns of 8 compared Hepialidae species are consistentwith the finding that Leu, Ile, Phe, Ser, Met and Asn are the six amino acids most frequently used, whereasCys is rare (Fig 2). The six most common amino acids in Hepialidae differedfrom those of the other Lepidoptera, in which the six most common amino acids areAsn, Ile, Leu, Lys, Tyr and Phe (Dai etal., 2015).

Table 3 Codon usage of Thitarodes sp. mitochondrial PCGs

Codon

Amino acid

Number

Codon

Amino acid

Number

Codon

Amino acid

Number

AAA

Lys

99

CAA

Gln

61

GCC

Ala

8

AAC

Asn

18

CAC

His

11

GCG

Ala

2

AAG

Lys

8

CAG

Gln

3

GCU

Ala

65

AAU

Asn

229

CAU

His

56

GGA

Gly

72

ACA

Thr

63

CCA

Pro

43

GGC

Gly

12

ACC

Thr

15

CCC

Pro

11

GGG

Gly

46

ACG

Thr

1

CCG

Pro

1

GGU

Gly

62

ACU

Thr

76

CCU

Pro

62

GUA

Val

60

AGA

Ser

62

CGA

Arg

35

GUC

Val

2

AGC

Ser

5

CGC

Arg

2

GUG

Val

5

AGG

Ser

12

CGG

Arg

4

GUU

Val

56

AGU

Ser

40

CGU

Arg

13

UCG

Ser

5

AUA

Met

286

CUA

Leu

35

UCU

Ser

100

AUC

Ile

28

CUC

Leu

2

UGA

Trp

88

AUG

Met

17

CUG

Leu

2

UGG

Trp

7

AUU

Ile

425

CUU

Leu

13

UGU

Cys

30

GAA

Glu

70

UAA

*

8

UUA

Leu

499

GAC

Asp

13

UAC

Tyr

20

UUC

Phe

23

GAG

Glu

7

UAU

Tyr

174

UUG

Leu

25

GAU

Asp

51

UCA

Ser

84

UUU

Phe

335

GCA

Ala

48

UCC

Ser

13




Fig 2. Comparison of codonusage among Hepialidae mitochondrial genomes.

The base composition at each PCG codon position indicatesthat the third codon position (90.58 %), though somewhat lower than in theother sequenced Hepialidae, harbors a higher A/T content than the first (76.53 %)and second (70.75 %) codon positions (Table 2). A similar codon usage patternwas also detected in other sequenced Hepialidae, with an average of 74.80 % inthe first position, 70.84 % in the second position, and 93.33 % in the thirdposition, respectively (Table 2).

Except for cox1,all PCGs began with a canonical start codon (ATN or NTG). More specifically, sixPCGs (cox2, cox3, atp6, nad4, nad4L and cob) startedwith ATG, three PCGs (nad2, nad3 and nad5) with ATT, two PCGs (nad6and atp8) with ATA, and nad1 with TTG, whereas the start codonfor cox1 was CGA (Table 1). Whenannotating the start codon of cox1,ambiguities may arise due to irregular initiation codons, such as the trinucleotidesCGA in T. renzhiensis (Cao et al., 2012) and TCG in Anopheles funestus (Krzywinski etal., 2006), the tetranucleotide TTAG in C.raphaelis (Kim et al., 2006) and thehexanucleotide ATTACG in Papilio xuthus(Feng et al., 2010). In this study,we tentatively considered that the codon CGA (codingfor arginine) was involved in the initiation signaling of cox1 because this codon is found to be conserved as an initial aminoacid in most Lepidoptera (Kim et al.,2009).

For the stop codon, eight PCGs (atp6, atp8, nad1, nad4, nad4L, nad6, cox3 and cob) are terminatedwith the typical stop codon TAA, while five PCGs (nad2, nad3, cox1, cox2 and nad5) locatedimmediately upstream of tRNAs ended with only T. The incomplete stop codon iscommon in insect mt genomes, and their corresponding transcripts could beprocessed into mature RNA by precise endonucleolytic cleavages using therecognition signals of tRNA secondary structures. The single T stop codon wouldbe a complete and functional stop codon (TAA) after posttranslationalmodification that occurs during the mRNA maturation process (Ojala et al., 1980; Ojala et al., 1981). From the perspective of evolutionary frugality, theexistence of a partial stop codon is proposed to minimize the intergenicspacers and gene overlaps (Hao et al., 2012).

3.3 Ribosomaland Transfer RNA genes

The rrnL (16SrRNA) and rrnS (12S rRNA) genes wereidentified on the L-strand in the mt genome, being 1,335 bp and 779 bp in size,respectively, falling into the reported range for the Hepialidae (1,324-1,375 bp,740-781 bp) (Table 2). The rrnL gene islocated between trnL1 (TAG) and trnV (TAC), while rrnS is located between trnV(TAC) and the AT-rich region (Table 1, Fig 1). The A+T percentages of rrnL and rrnS are 85.50 % and 85.40 %, respectively. These rRNA characteristicsare consistent with those observed in other lepidopterans.

Twenty-two tRNAs are encoded in this moth mt genome,ranging from 60 bp to 73 bp in size and spread across theentire mt genome. All tRNAs were shown to be folded into the expectedclover-leaf secondary structure except for trnS(UCU), which lacks the dihydrouridine (DHU) loop (Fig 3). This feature is commonto most of the available lepidopteran mt genomes (Cong and Grishin, 2016). In addition,the anticodon of trnS1 was UCUinstead of GCU, as found in T.renzhiensis and T. yunnanensis (Caoet al., 2012). A total of 18unmatched base pairs were detected in the moth tRNAs, 12 of which are GU pairs,which formed a weak bond in the tRNAs. The remaining 6 are atypical pairs,including 3 UU pairs, two AC pairs, and one AA pair (Fig 3).

3.4Intergenic Spacers and Overlapping Sequences

In addition to the AT-rich region, this moth mt genome contains a total of 49 bpintergenic regions, distributed in 14 regions. The length of this intergenicspacer ranges from 1 bp to 15 bp, with the longest intergenic region, Spacer1, located between trnS2and nad1and containing the high ATcontent of 93.33 % (Fig 4A). This differs from the mt genomes of other Lepidoptera,in which the longest intergenic spacer is typically located between trnQ and nad2 (Dai et al., 2015;He et al., 2015; Liu et al., 2016; Park et al., 2016). There is no intergenic spacer between trnM and nad2 in Hepialidae.

Spacer 1 of the Thitarodessp. mt genome was observed to contain the conserved motif TATACTA (Fig 4A). However,the sequence of this corresponding conserved motif is ATACTAA in othersequenced lepidopteran mt genomes (Cameron and Whiting, 2008; Salvato et al., 2008; Wang et al., 2016), while it is TATACTA

Fig 3. Secondary structures of22 tRNAs encoded by the Thitarodes sp.mt genome. The tRNAs are labeledwith the abbreviations for their corresponding amino acids.

in all sequenced Hepialidae to date, exceptin T. pui (CATACTA). This motif isalso observed in variable in other Lepidoptera, being ATACTAT in Corcyra cephalonica (Wu et al., 2012) and ATCATAT in Sesamia inferens (unpublished,NC_015835). This motif has been suggested to be a possible binding site of thetranscription termination peptide (mtTERM protein) (Taanman, 1999; Roberti et al., 2003). It is located just downstream of the cob gene in the heavy strand of the mtgenome and might serve to stop the transcription of cob. In addition to this transcriptional stop function, thisbinding site might also control theoverlapping of a large number of transcripts generated by the peculiartranscription mechanism operating in this organism (Roberti et al., 2003).

The mt genome overlaps by 18 bp in 5 locations, with thelongest overlap being 8 bp in length between trnW and trnC (Table 1).Among these overlapped sequences, most were conserved in the number ofoverlapped nucleotides, with the exception of those sequences that overlapped betweennad4 and nad4L (SupplementaryMaterial, Table S2). The 8 bp overlapping sequence between trnW and trnC is conserved as AARYYTTA in the lepidopteran mt genomes (Fig4B).


Fig4. Alignment of the spacer regions between trnS2and nad1 (Fig 4A) and between trnW and trnC (Fig 4B) of Hepialidae insects.

3.5 The AT-richRegion

The 1,473 bp AT-rich region, with an AT content of 88.60 %, is located between the rrnS and trnI genes andis the longest AT-rich region of the sequenced Thitarodes mt genomes to date. Nine 112 bp copies and one partialcopy of a 55 bp sequence were found in the AT-rich region in this moth mtgenome (Fig 5). Conspicuousmacro-repeat units (>50 bp long) have been found in most of theavailable Hepialidae mt genomes, includingthose of T. pui (119 bp), H. xiaojinensis (118 bp), T. gonggaensis (112 bp), T. renzhiensis (113 bp) and T. yunnanensis (107 bp) (Cao et al., 2012; Shi et al., 2015a; Yi et al.,2016). These repeat sequences account for some of the variations in mt genomelength (Fig 5). An explanation for the origin of these repeat sequences isslipped-strand mispairing during mtDNA replication (Levinson and Gutman, 1987;Moritz and Brown, 1987).

The microsatellite-like repeat (TA)6 was foundto be preceded by an ATTTA motif in the AT-rich region inThitarodes sp., whereas it is commonly found asATTTA+(AT)n in other lepidopteran species (Yin et al., 2014; He et al.,2015; Shi et al., 2015b; Wang et al., 2016). However, thismicrosatellite-like repeat is not found in H.xiaojinensis, T. pui or T. renzhiensis in Hepialidae.Therefore, it might not be a common feature in the AT-rich region in Hepialidaeas it is in other Lepidoptera. Additionally, another widely conserved structurethat includes the motif “ATAGA” and a 16-22 bp poly-T stretch has also not beenfound in Hepialidae mt genomes. Combining these several features, including therearrangement of the tRNA gene, the absence of the conserved elements in theAT-rich region, and the absence of the intergenic spacer upstream of nad2,the Hepialidae mt genome displayed mt genome characteristics uniquein Lepidoptera.

Fig5. Schematic representation of the AT-rich region in the sequenced Hepialidae. All currentlydetermined Hepialidae species have the ancestral gene order, which is differentfrom previously sequenced Lepidoptera. There are 2.4 repeat units (A-Bwith 119 bp per unit) in T. pui, 4.4repeat units (a-d with 118 bp per unit) in H. xiaojinensis, 4.9 repeat units (α-δ with 107 bp perunit) in A. yunnanensis, 6.5 repeatunits (- with 112 bp per unit) in T. gonggaensis,8.5 repeat units (I-VIII with 113 bp perunit) in T. renzhiensis and 9.5repeat units (i-ix with 112 bp per unit) in the host insect of O.sinensis (Thitarodes sp.). Similarrepetitive sequences are absent from N.hunanensis and E. signifer.

3.6 EvolutionaryRelationships Based on Phylogenetic Analysis

To confirm the evolutionary position of the host insect of O. sinensis, we built a phylogenetic tree of 17 species (16 ingroupand 1 outgroup) using published mt genomes based on the concatenatednucleotides alignment of 13 PCGs by the ML method (Fig 6). This topology is largely consistent with previously reportedphylogenetic studies (Chen et al.,2015; Shi et al., 2015a). These 16 mt genomes represented five superfamilies(Gelechioidea, Tortricoidea, Bombycoidea, Yponomeutoidea andHepialoidea) in two suborders (Exoporia and Heteroneura). Ahamus, Thitarodes, Hepialus, Endoclita and Napialuswere clustered in Hepialoidea, while E.signifer and N. hunanensis formeda separate monophyletic clade. The O.sinensis host grouped with T.gonggaensis and T. renzhiensis, sothat it could be inferred that the host insect of O. sinensis belongs to the Thitarodes.   

Notably, in this phylogenetic tree and the previous study, it could beobserved that H. xiaojinensis wassister to the Thitarodes. This couldbe traced back to the nomenclature of ghostmoth. Although foreign scholarschanged most Hepialus speciesdescribed by Chinese scientists to the Thitarodesin 1968, Thitarodes was stillreferred to as Hepialus until 2010 inChina (Zou et al., 2010). In the NCBIdatabases, the taxonomy of H. gonggaensishas been revised as T. gonggaensis,while there is no change to H. xiaojinensis.From this phylogenetic tree, it could be inferred that H. xiaojinensis should also be classified as a species of Thitarodes.

Fig 6. Phylogenetictree of 16 lepidopteran species and one outgroup. The phylogenetictree was inferred based on the nucleotide dataset of 13 PCGs using ML analysis (GTR+G model). Numbers above branchesspecify bootstrap percentages (500 replicates).

3.7 EvolutionaryRelationships Based on Genetic Distances

In the whole mt genome and allof the PCGs (except cox2), the K2P distance patterns ofthe host insect to the other 7 Hepialidae are similar.The three smallest K2P distances from the host insect are those to T.renzhiensis, T. gonggaensis and H. xiaojinensis, while thetwo greatest distances are those toE. signifer and N.hunanensis (Fig 7). This finding suggests a close relationship of this hostinsect to T. renzhiensis, T. gonggaensis and H. xiaojinensisand a lineage of E. signifer and N. hunanensisrelatively distinct from the host insect. This result is concordant with the evolutionaryrelationships inferred based on phylogenetic analysis.

Fig7.K2P genetic distances between Thitarodes sp.and   the other 7 sequenced Hepialidae for variousregions of the mt genome.

3.8 EvolutionaryRelationships with Intergenic Spacers

There are 15 intergenic (oroverlapped) spacers in the 8 analyzed Hepialoidea mt genomes (SupplementaryMaterial, Table S2). Nine are longer than 10 bp: trnI-trnQ, trnC-trnY,trnY-cox1, trnA-trnR, trnR-trnN, trnE-trnF,trnF-nad5, trnH-nad4 and trnS2-nad1.However, these long spacers are mostly distributed in E. signifier and N.hunanensis. There is only one spacer longer than 10 bp, and it is locatedbetween trnS2 and nad1 in the host insect of O. sinensis,in T. gonggaensis and T. renzhiensis (Supplementary Material,Table S2). Compared with the overall intergenic patterns among the Hepialoidea,this moth was most similar to T. gonggaensis and T. renzhiensis,with < 2 bp of difference distributed in 3 and 4 loci, respectively (Fig 8).The greatest difference was observed from N. hunanensis and E. signifier,with 9 and 7 loci containing 5 bp difference compared to the host insect ofO. sinensis, respectively (Fig 8). The most variable locus of N.hunanensis was observed to differ from the host insect of O. sinensisby up to 42 bp in length (Fig 8). These findings further support the conclusionthat this host insect of O. sinensis evolves as Thitarodes, whileE. signifer and N. hunanensis form as an independent lineage.   

Fig 8. The differences in the numbers of basesin various intergenic regions in 7 sequenced Hepialidaecompared with those in Thitarodes sp.


4.   Conclusions

In this study, the complete mt genome of the host cadaver(Thitarodes sp.) of O. sinensis, a circular molecule of 16,280bp in length, was determined using the combination of massively parallel DNAsequencing and PCR sequencing. The gene organization, base composition, andsecondary structure of tRNA are similar to those of the other lepidopteran mt genomes.The gene arrangement is identical to the ancestral arrangement but differentfrom those of other lepidopteran mt genomes due to the rearrangement of trnI, trnQ and trnM. The 1,473 bp AT-rich region located between the rrnS and trnI genes is the longest in a sequenced Thitarodes mt genome to date. Evolutionary analyses, including thephylogenetic tree, the genetic distances and the intergenic spacers in the mtgenome, showed that the host cadaver of O. sinensis belongs to Thitarodes, while Endoclita signifer and Napialus hunanensis form arelatively distinct lineage fromThitarodes. This is the first phylogenetic analysis to explore the placement of thehost insect of O. sinensis based on molecular information. The sequence and fullannotation of this moth mt genome will provide more molecular resources about Exoporiain the Lepidoptera, and the clarification of its phylogeny will benefit the managementof this insect resource, as well as the conservation and sustainable use ofthis endangered medicinal speciesinChina.

Author Contributions

DBL conceived this study. XCK analyzedthe data and drafted the manuscript. YQH participated in the data analysis and preparedfigures. JH and FW sequenced the mt genome. LQH participated in the dataanalysis. All authors have read and approved the final manuscript.

Conflict of Interest

The authors declare that they have noconflicts of interest.

Acknowledgments

This work was supported by the Programof International Science & Technology Cooperation of Ministry of Scienceand Technology (2013DFG32060). The funders had no role in the study design,data collection and analysis, decision to publish, or preparation of themanuscript.


References

Bernt, M., Donath, A., Juhling, F., Externbrink, F., Florentz,C., Fritzsch, G., Putz, J., Middendorf, M. and Stadler, P.F., 2013. MITOS:improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol69, 313-9.

Cameron, S.L. and Whiting, M.F., 2008. The completemitochondrial genome of the tobacco hornworm, Manduca sexta, (Insecta: Lepidoptera: Sphingidae), and anexamination of mitochondrial gene variability within butterflies and moths.Gene 408, 112-23.

Cao, Y.Q., Ma, C., Chen, J.Y. and Yang, D.R., 2012. Thecomplete mitochondrial genomes of two ghost moths, Thitarodes renzhiensis and Thitarodesyunnanensis: the ancestral gene arrangement in Lepidoptera. BMC Genomics13, 276.

Chen, S., Shi, P., Zhang, D., Lu, Z. and Li, L., 2015.Complete mitochondrial genome of Hepialusxiaojinensis (Lepidoptera: Hepialidae). Mitochondrial DNA, 1-2.

China, T.S.P.C.o.P., 2010. Pharmacopoeia of the People'sRepublic of China, China Medical Science Press, Beijing.

Coates, B.S., Sumerford, D.V., Hellmich, R.L. and Lewis, L.C.,2005. Partial mitochondrial genome sequences of Ostrinia nubilalis and Ostriniafurnicalis. Int J Biol Sci 1, 13-8.

Cong, Q. and Grishin, N.V., 2016. The complete mitochondrialgenome of Lerema accius and itsphylogenetic implications. PeerJ 4, e1546.

Dai, L., Qian, C., Zhang, C., Wang, L., Wei, G., Li, J., Zhu,B. and Liu, C., 2015. Characterization of the Complete Mitochondrial Genome of Cerura menciana and Comparison withOther Lepidopteran Insects. PLoS One 10, e0132951.

Feng, X., Liu, D.F., Wang, N.X., Zhu, C.D. and Jiang, G.F.,2010. The mitochondrial genome of the butterfly Papilio xuthus (Lepidoptera: Papilionidae) and related phylogeneticanalyses. Mol Biol Rep 37, 3877-88.

Ghikas, D.V., Kouvelis, V.N. and Typas, M.A., 2010. Phylogenetic andbiogeographic implications inferred by mitochondrial intergenic region analysesand ITS1-5.8S-ITS2 of the entomopathogenic fungi Beauveria bassiana and B.brongniartii. BMC Microbiol 10, 174.

Hao, J., Sun, Q., Zhao, H., Sun, X., Gai, Y. and Yang, Q.,2012. The Complete Mitochondrial Genome of Ctenoptilumvasava (Lepidoptera: Hesperiidae: Pyrginae) and Its PhylogeneticImplication. Comp Funct Genomics, 328049.

He, S.L., Zou, Y., Zhang, L.F., Ma, W.Q., Zhang, X.Y. and Yue,B.S., 2015. The Complete Mitochondrial Genome of the Beet Webworm, Spoladea recurvalis (Lepidoptera:Crambidae) and Its Phylogenetic Implications. PLoS One 10, e0129355.

Hebert, P.D., Cywinska, A., Ball, S.L. and deWaard, J.R., 2003. Biologicalidentifications through DNA barcodes. Proc Biol Sci 270, 313-21.

Kang, X.C., Liu, D.B., Xia, Z.L. and Chen, F., 2011.Comparison of genomic DNA extraction from Cordycepsmilitaris. Journal of Hunan Agricultural University (Natural Sciences) 37,147-149, 210.

Katoh, K. and Standley, D.M., 2013. MAFFT multiple sequencealignment software version 7: improvements in performance and usability. MolBiol Evol 30, 772-80.

Kim, I., Lee, E.M., Seol, K.Y., Yun, E.Y., Lee, Y.B., Hwang,J.S. and Jin, B.R., 2006. The mitochondrial genome of the Korean hairstreak, Coreana raphaelis (Lepidoptera:Lycaenidae). Insect Mol Biol 15, 217-25.

Kim, M.I., Baek, J.Y., Kim, M.J., Jeong, H.C., Kim, K.G., Bae,C.H., Han, Y.S., Jin, B.R. and Kim, I., 2009. Complete nucleotide sequence andorganization of the mitogenome of the red-spotted apollo butterfly, Parnassius bremeri (Lepidoptera:Papilionidae) and comparison with other lepidopteran insects. Mol Cells 28,347-63.

Krzywinski, J., Grushko, O.G. and Besansky, N.J., 2006.Analysis of the complete mitochondrial DNA from Anopheles funestus: an improved dipteran mitochondrial genomeannotation and a temporal dimension of mosquito evolution. Mol Phylogenet Evol39, 417-23.

Kumar, S., Stecher, G. and Tamura, K., 2016. MEGA7: MolecularEvolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol33, 1870-4.

Levinson, G. and Gutman, G.A., 1987. Slipped-strandmispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 4,203-21.

Li, H. and Durbin, R., 2009. Fast and accurate short readalignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60.

Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li,Y., Li, S., Shan, G., Kristiansen, K., Li, S., Yang, H., Wang, J. and Wang, J.,2010. De novo assembly of human genomes with massively parallel short readsequencing. Genome Res 20, 265-72.

Li, Y., Wang, X.L., Jiao, L., Jiang, Y., Li, H., Jiang, S.P.,Lhosumtseiring, N., Fu, S.Z., Dong, C.H., Zhan, Y. and Yao, Y.J., 2011. Asurvey of the geographic distribution of Ophiocordycepssinensis. J Microbiol 49, 913-9.

Liu, Q.N., Chai, X.Y., Bian, D.D., Zhou, C.L. and Tang, B.P.,2016. The complete mitochondrial genome of Plodia interpunctella (Lepidoptera: Pyralidae) andcomparison with other Pyraloidea insects. Genome 59, 37-49.

Lobry, J.R., 1996. Asymmetric substitution patterns in the twoDNA strands of bacteria. Mol Biol Evol 13, 660-5.

Moritz, C. and Brown, W.M., 1987. Tandem duplications inanimal mitochondrial DNAs: variation in incidence and gene content amonglizards. Proc Natl Acad Sci U S A 84, 7183-7.

Niu, F.F., Fan, X.L. and Wei, S.J., 2016. Completemitochondrial genome of the Grapholita dimorpha Komai (Lepidoptera:Tortricidae). Mitochondrial DNA 27, 775-6.

Ojala, D., Merkel, C., Gelfand, R. and Attardi, G., 1980. ThetRNA genes punctuate the reading of genetic information in human mitochondrialDNA. Cell 22, 393-403.

Ojala, D., Montoya, J. and Attardi, G., 1981. tRNA punctuationmodel of RNA processing in human mitochondria. Nature 290, 470-4.

Park, J.S., Kim, M.J., Jeong, S.Y., Kim, S.S. and Kim, I.,2016. Complete mitochondrial genomes of two gelechioids, Mesophleps albilinella and Dichomeris ustalella (Lepidoptera:Gelechiidae), with a description of gene rearrangement in Lepidoptera. Curr Genet.

Pegler, D.N., Yao, Y.J. and Li, Y., 1994. The Chinese'Caterpillar Fungus'. Mycologist 8, 3-5.

Roberti, M., Polosa, P.L., Bruni, F., Musicco, C., Gadaleta,M.N. and Cantatore, P., 2003. DmTTF, a novel mitochondrial transcriptiontermination factor that recognises two sequences of Drosophila melanogaster mitochondrial DNA. Nucleic Acids Res 31,1597-604.

Saccone, C., De Giorgi, C., Gissi, C., Pesole, G. and Reyes,A., 1999. Evolutionary genomics in Metazoa: the mitochondrial DNA as a modelsystem. Gene 238, 195-209.

Salvato, P., Simonato, M., Battisti, A. and Negrisolo, E.,2008. The complete mitochondrial genome of the bag-shelter moth Ochrogaster lunifer (Lepidoptera,Notodontidae). BMC Genomics 9, 331.

Shi, P., Lu, Z., He, Y., Chen, S., Qin, S., Qing, Y. and Yan,J., 2015a. Complete mitochondrial genome of Hepialusgonggaensis (Lepidoptera: Hepialidae), the host insect of Ophiocordycepssinensis. Mitochondrial DNA, 1-2.

Shi, Q.H., Sun, X.Y., Wang, Y.L., Hao, J.S. and Yang, Q.,2015b. Morphological characters are compatible withmitogenomic data in resolving the phylogeny of nymphalid butterflies(lepidoptera: papilionoidea: nymphalidae). PLoS One 10, e0124349.

Shrestha B., Zhang, W.M., Zhang Y.J., Liu X.Z., 2010. What isthe Chinese caterpillar fungus Ophiocordycepssinensis (Ophiocordycipitaceae)? Mycology 1, 228-36.

Taanman, J.W., 1999. The mitochondrial genome: structure,transcription, translation and replication. Biochim Biophys Acta 1410, 103-23.

Timmermans, M.J., Lees, D.C. and Simonsen, T.J., 2014. Towards a mitogenomicphylogeny of Lepidoptera. Mol Phylogenet Evol 79, 169-78.

van Asch, B., Blibech, I., Pereira-Castro, I., Rei, F.T. andda Costa, L.T., 2016. The mitochondrial genome of Prays oleae (Insecta: Lepidoptera:Praydidae). Mitochondrial DNA A DNA MappSeq Anal 27, 2108-9.

Wang, G.D., 1995. CordycepsSpecies: Ecology, Cultivation and Application, Science and Technology ReferenePress, Beijing.

Wang, Q., Zhang, Z. and Tang, G., 2016. The mitochondrialgenome of Atrijuglans hetaohei Yang (Lepidoptera:Gelechioidea) and related phylogenetic analyses. Gene 581, 66-74.

Wu, Y.P., Li, J., Zhao, J.L., Su, T.J., Luo, A.R., Fan, R.J.,Chen, M.C., Wu, C.S. and Zhu, C.D., 2012. The complete mitochondrial genome ofthe rice moth, Corcyra cephalonica. JInsect Sci 12, 72.

Yi, J., Que, S., Xin, T., Xia, B. and Zou, Z., 2016. Completemitochondrial genome of Thitarodes pui(Lepidoptera: Hepialidae). Mitochondrial DNA 27, 109-10.

Yin, Y., Qu, F., Yang, Z., Zhang, X. and Yue, B., 2014.Structural characteristics and phylogenetic analysis of the mitochondrialgenome of the rice leafroller, Cnaphalocrocismedinalis (Lepidoptera: Crambidae). Mol Biol Rep 41, 1109-16.

Zhang, Y., Guo, L., Zhang, S., Liu, X., 2015. Determiningmitochondrial molecular markers suitable for genetic diversity analysis of Cordyceps militaris. Wei Sheng Wu XueBao 55(7): 826-33.

Zhou, X.W., Li, L.J. and Tian E.W., 2014. Advances in researchof the artificial cultivation of Ophiocordycepssinensis in China. Crit Rev Biotechnol 34, 233-43.

Zhu, H.Y., M.J., 2006. In Crisis and Break through of China'sEnvironment, Beijing, China.

Zhu, J.S., Halpern, G.M. and Jones, K., 1998. The scientificrediscovery of an ancient Chinese herbal medicine: Cordyceps sinensis: part I. J Altern Complement Med 4, 289-303.

Zou, Z.W., Liu, X. and Zhang, G.R., 2010. Revision oftaxonomic system of the genus Hepialus (Lepidoptera, Hepialidae) currentlyadopted in China. Journal of Hunan University of Science & Technology(Natural Science Edition) 25, 114-20.

Supplementary Materials

Table S1.   The list of species analyzed in this study

Table S2.   The lengths of theintergenic/overlapping regions in reported Hepialidae