Journal of Big Data in Agriculture | Genome-wide identification and expression analysis of WRKY gene family in five legume species

Abstracts:

In order to enhance the understanding of the diversity and evolutionary aspects of WRKY genes in legumes, we explored the functions and applications of WRKY transcription factor family members in breeding. In this study, the WRKY genes of five legumes, soybean, chickpea, kidney bean, thistles alfalfa and bermudagrass, were analyzed in terms of chromosomal localization, phylogeny, conserved motifs, gene structure and gene duplication by using bioinformatics methods. 185, 61, 90, 108 and 83 WRKY genes were identified from the above five legumes, respectively. The five legume WRKY proteins were classified into three classes and five subclasses, and the same subclasses had similar protein and gene structures.Gene duplication events existed in the WRKY gene family members of the five legume species, and they were significantly differentially expressed in various tissues.The expression patterns of the WRKY genes in different tissues suggested that they might play important roles in the growth and development of legumes.

1 Introduction.

WRKY is one of the largest families of transcription factors in plants, named for its possession of a special conserved heptapeptide sequence, WRKYGOK [1].Ishiguro et al[2] identified the first plant WRKY gene in sweet potato (Ipomoea batatas), named Sweet Potato Factor1 (SPF1).Zhao et al[3] showed that the WRKY structural domain is mainly composed of the highly conserved WRKYGQK heptapeptide at the N-terminus and the zinc finger motif (C2H2 or C2HC) at the C-terminus. Based on the number of WRKY structural domains and the type of zinc finger motifs, WRKY transcription factors are categorized into three groups (I, II, and III), with members of group I usually containing two WRKY structural domains, and both groups II and III containing one WRKY structural domain and one zinc finger structural motif (C2H2 or C2HC) [1]. Most plant WRKY transcription factors have more members of group II, and they are subdivided into five subgroups based on other conserved structural motifs: IIa, IIb, IIc, IId, and IIe [4].Group IIa WRKY genes are widely involved in the regulation of a wide range of physiological processes; group IIc WRKY genes can resist salt stress by modulating the synthesis of ABA signaling; group IId WRKY genes can interact with Ga2+ proteins to help plants resist salt stress [5-7].
WRKY genes are widely involved in plant development and abiotic stress response [1]. Pillai et al. [8] found that the average weight (in terms of total fresh weight) of rice (Oryza sativa) OsWRKY42 transgenic lines was significantly higher than that of the wild-type plants after salt stress treatment, and overexpression of the OsWRKY42 gene enhanced the salt tolerance of the transgenic plants. Most WRKY transcription factors function by binding to specific promoters. For example, Jiang et al [9] found that Populus alba PalWRKY77 negatively regulated the ABA signaling pathway by binding to the W-box in the promoters of the PalNAC002 and PalRD26 genes, and that CsWRKY4 and CsWRKY28 could directly bind to the CsNIP promoter [10], which in turn affects the plant tolerance to water stress.Lim et al [11] found that OsWRKY5 can directly bind to the W-box element in the OsMYB2 promoter region and inhibit the expression of OsMYB2, thus down-regulating the genes downstream of OsMYB2 in the ABA signaling pathway, and is a negative regulator of ABA-induced drought stress tolerance.The WRKY proteins can also interact with their own family proteins or other family proteins for interaction [12-13]. Meanwhile, using yeast two-hybrid technology, Arabidopsis (Arabidopsis thaliana) AtWRKY10 was found to interact with AtVQ14, which is involved in the seed development process [14]. In addition, there is an antagonistic interaction between WRKY and VQ, as shown by chromatin immunoprecipitation (ChIP) analysis, WRKY8 directly binds to the promoter of RD29A under salt conditions, and the VQ9 protein can act as a deterrent to the WRKY8 factor in order to maintain the balance of the WRKY8-mediated signaling pathway, thus establishing salt stress tolerance [15].
Legumes are one of the families with the largest number of species among angiosperms, and are also one of the most important sources of starch, oil and dietary fiber in human food, and are important cash and feed crops worldwide [16]. With the development of genomics, the WRKY gene family has been mined in a variety of plants, e.g., 109, 72, 176 and 171 WRKY proteins have been identified in rice, Arabidopsis thaliana, soybean and wheat (Triticum aestivum), respectively [17-20]. However, a comprehensive and systematic study of WRKY family members in a wide range of legumes has not yet been conducted. In this study, five legume species, namely soybean (Glycine max), chickpea (Cicer arietinum), kidney bean (Phaseolus vulgaris), peperomia (Lotus japonicus) and thistledown alfalfa (Medicago truncatula), were selected for detailed bioinformatics analyses of their WRKY gene families, including gene identification, gene analysis, gene characterization, and gene sequencing. The WRKY gene family of these five legumes was analyzed by detailed bioinformatics analysis, including gene identification, gene structure, gene duplication, chromosomal localization, protein interactions network and miRNA prediction, etc., to provide theoretical support for further in-depth investigation of the biological functions of WRKY transcription factors.
2 Principles and methods

2.1 Identification of WRKY genes
Protein data and gene sequence information of five legumes were obtained from phytozome and lotus base databases. The conserved WRKY structural domains in each candidate WRKY protein were confirmed by searching the HMM software search (login number PF03634) from the Pfam protein family database (http://pfam.sanger.ac.uk) using the Hidden Markov Model [21]. Multiple known WRKY families were compared with data from five legume species to identify WRKY members in five legume species. The molecular weight (MW), isoelectric point (pI) and length of each WRKY protein were predicted by the online ExPASy program (http://www.expasy.org/tools/). A genome-wide identification and expression analysis dataset of five legume WRKY gene families was constructed based on these treatments [22].

2.2 Phylogenetic analysis method
Claustal X 2.1 was utilized to compare the WRKY gene sequences of five legume species to study the phylogenetic relationships of their WRKY gene families [23]. The phylogenetic tree among Arabidopsis thaliana, soybean, chickpea, kidney bean, thymus root and thistles alfalfa was constructed using the neighbor-joining method with MEGA7 program [24].

2.3 Multiplex matching, conserved motifs and gene structure analysis of WRKY genes
Genetic structures of the WRKY family, such as exons and introns, were extracted from the genome annotation information of the five legumes and submitted to the GSDS web tool for analysis and presentation. The WRKY protein sequences of the five legumes were predicted and analyzed using the MEME online program to retrieve the maximum motif number of 15, and then the phylogenetic tree, motifs, introns, and exons were visualized and analyzed using the TBtools software. The downloaded genome annotation files were submitted to TBtools software for visualization and analysis of gene structure.

2.4 Gene duplication, covariance analysis and Ka and Ks calculation of WRKY genes
Circos [25] was utilized to locate the physical position of WRKY genes on the chromosomes. Synteny analysis between legume genomes was performed locally, using the national blast online comparison platform (https://ngdc.cncb.ac.cn/blast/) to search for potential homologous gene pairs (E<1e-5, first 5 pairs paired). The genomes of five legume species were analyzed for interspecific colinearity using MCScanX technology. Using TBtools technology, the interspecific covariance of WRKY gene family members was plotted for the five legume genomes.Ka / Ks can be used to evaluate the history of selection or the timing of differentiation [26]. We used KaKs Calculator 2.0 and the NG method to calculate the number of non-synonymous (Ka) and synonymous (Ks) substitutions in WRKY genes [27].

2.5 Analysis of cis-elements of WRKY promoter
Cis-element regulation of the WRKY promoter was analyzed to further understand the WRKY transcription factors. The WRKY genes of five legumes were extracted from the genome-wide database, 1500 bp upstream of the promoter cis-regulatory element start codon, and then searched for the cis-elements of the promoters using the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/). promoter cis-elements, and analyzed the type, number and function of cis-acting elements [28]. Finally, cis-acting elements in promoters were visualized using TBtools software [29].

2.6 WRKY protein interaction network and target gene network of miR5658 and miR5021
We searched for mature miR5658 and miR5021 sequences using the miRBase database (http://www.mirbase.org/), analyzed the WRKY coding sequences using psRNAT (http://plantgrn.noble.org/psRNATarget) and predicted the targets of miR5021 and mi5658 as WRKY genes. To explore the sequences of WRKY proteins, we selected the protein sequences of 50 WRKY transcription factors in the genome databases of several legumes and localized them to the AtWRKY proteins in the Arabidopsis Information Resource database. We then used the PAIR website (http://www.cls.zju.edu.cn/pair/) to predict interactions between WRKYs and other proteins and mapped the interaction networks in Cytoscape 3.3.0 [30].

2.7 Gene expression analysis of WRKYs
The database was obtained from NCBI under the accession number GSE57252.The transcriptome data of chickpea and thistle alfalfa were obtained from NCBI-SRA (http://www.ncbi.nlm.nih.gov) under the accession numbers PRJNA413872 and PRJNA80163, respectively.Using the software TopHat v2.1.0 [31] to map the transcriptome sequencing return sequence pairs to the chickpea and thistles alfalfa genome sequences. Cufflinks v2.1.1 [32] was used to estimate the abundance of reads mapped to genes by calculating fragment per kilobase of transcript (FPKM) values. The transcriptome data of different tissues of soybean and kidney bean were obtained in the annotation files of Phytozome v12.1 database, and the transcriptome data of different tissues of Bacopa monnieri were obtained in Lotus Base (https://lotus.au.dk/expat/). The FPKM of WRKY genes was extracted using Perl program. heatmaps showing expression profiles were generated using log10 transformed FPKM values, and heatmaps and k-means clustering were generated using the ggplot2 package in R language [33].

3 Data Acquisition and Analysis

3.1 Protein analysis and identification of WRKY genes in legumes
We constructed a chromosomal localization map to illustrate the location of WRKY genes on each chromosome, and identified 527 WRKY genes by a combination of HMMER analysis and blast, of which 185 were from soybean, 61 were from chickpea, 90 were from kidney bean, 108 were from thistles alfalfa, and 83 were from pachypodium. The WRKY family members of five legumes (soybean, chickpea, kidney bean, tribulus alfalfa, and thymus root) were identified based on genome-wide protein sequence information. EXTENSIVE PREDICTIONS The 527 WRKY proteins have different physical and chemical properties; molecular weights range from 6413.2 (MtWRKY43) to 155716.32 (GmWRKY46) kDa, and isoelectric points range from 4.62 (LjWRKY31) to 10.11 (CaWRKY41) [22].

3.2 Phylogenetic analysis of WRKY genes in legumes
To further investigate the phylogenetic relationships of the five legume WRKY gene families as well as Arabidopsis thaliana, the WRKY proteins were categorized into three classes and seven subclasses (I, IIa, IIb, IIc, IId, IIe, and III) [22]. 128 WRKY genes belonged to class I, 456 WRKY genes belonged to class II, and 15 WRKY genes belonged to class III (Figure 1 ).

Among the five legume species, I, IIa, IId, IIe and III members were found to be mosaic in other groups or subgroups and the largest proportion of class II gene family members were found in soybean: 77.30% (143/185), chickpea: 77.05% (47/61), kidney bean: 76.67% (69/90), tribulus terrestris clover: 70.37 % (76/108) and Thymus vulgaris: 79.52 % (66/83); Genetic family I was second only to Genetic family II, Soybean: 21.62 % (40/185), Chickpea: 22.95 % (14/61), Vegetable bean: 22.22 % (20/90), Tribulus terrestris alfalfa: 19.44 % (21/108) and Thymus vulgaris: 18.07 % (15/83); The smallest percentage of gene family members were in class III, soybean: 1.08% (2/185), chickpea: 0 (0/61), kidney bean: 1.11% (1/90), tribulus alfalfa: 10.19% (11/108) and thymus root: 2.41% (2/83).
3.3 Conserved structural domains, gene structure and motif analysis
A total of 15 motifs (motif1-motif15) were identified in all legume WRKY gene family members. Among them, motif1 and motif2 are present in almost all WRKY genes and are possessed by some members of class I (motif1/3/6/13/15); all 15 motifs are possessed by some members of class II; and some members of class III (motif1/3/7/11/13/15) (Fig. 2). Most class I family members have shorter WRKY protein sequences and relatively more conserved motif compositions compared to the other subfamilies.

In our study of conserved structural domains, we found that exon-intron patterns are also important for the different functions of WRKY. Structural analysis of the genes showed that all five legume WRKY species consisted of different numbers of exon-introns. gene families I, II, and III contained 2-4, 2-8, and 2-3 exons, respectively. 88 out of 527 genes had no introns, and 63 genes had only one intron. members of subfamily II had a longer intronic region than the other subfamilies, with good structural diversity and non-conserved (Figure 2). good and not conserved (Fig. 2).

3.4 Chromosome localization and gene duplication analysis of five legume species
In five legume species, WRKY genes were unequally distributed on chromosomes. Soybean, chickpea, kidney bean, thistles alfalfa and thymus root WRKY genes were distributed on their chromosomes 20, 7, 11, 8 and 6, respectively, with soybean chromosome 6 containing the highest number of WRKY genes, and kidney bean chromosome 11 containing the lowest number of WRKY genes. Although gene duplication events are present in WRKY gene family members, they rarely form gene clusters. In the present study, 22, 5, 10, 15 and 3 gene clusters comprising a total of 130 genes were identified in the WRKY genes of soybean, chickpea, kidney bean, tribulus alfalfa and thymus root, respectively (Fig. 3).

A total of 116 gene duplication events were detected in five legume WRKY genes (81 in soybean, 5 in chickpea, 10 in kidney bean, 15 in thistledown alfalfa, and 5 in thymus root), involving 195 genes and accounting for 37% of all WRKY genes. Among them, 122 genes belonged to segmental duplication and 130 genes belonged to tandem duplication, suggesting that the formation of these WRKY genes in the five legumes may not be mainly dependent on gene duplication to complete. By calculating the Ka/Ks values of gene pairs with replication relationships, it was found that the Ka/Ks values of the five legume species were in the range of 0 to 0.5, all of which were less than 1, suggesting that these gene pairs underwent strong purification selection.
3.5 Evolutionary analysis of WRKY genes in five legume species
In order to further infer the phylogenetic mechanism of the WRKY family in legumes, we performed covariance analysis of WRKY genes in five dicotyledonous plants, and the results speculated that there were covariant gene pairs in tribulus terrestris/chickpea (117 pairs), chickpea/soybean (338 pairs), soybean/vegetable bean (954 pairs), and vetiver/bacopaillus (114 pairs). Among them, the highest homology was observed in soybean (homologous gene pairs were mainly distributed on 14 chromosomes) and kidney bean (homologous gene pairs were mainly distributed on 9 chromosomes). Next, soybean and chickpea (homologous gene pairs mainly distributed on 6 chromosomes except chromosome 2) had high homology (Fig. 4).

3.6 Cis-acting regulatory elements in the WRKY gene promoter
Promoter sequences were analyzed using PlantCARE to identify cis-acting regulatory elements within 1500 bp upstream of the WRKY start codon. In this study, a total of 327 genes in five legume species were involved in cis-acting elements of the ABRE (ABA) response; 318 genes were associated with light-responsive elements; 175 genes with gibberellin-responsive elements; 85 genes with growth hormone-responsive elements; 24 genes with damage-responsive elements; and eight genes were involved in endosperm-specific expression processes. In addition, 320 MYB binding sites can be involved in drought-induced, light-responsive, and flavonoid biosynthesis gene regulation [22].

3.7 Analysis of WRKY gene regulatory networks in multiple legume species
Among these five legumes, we found that 13 genes contained miR5021 recognition sites and 15 genes contained miR5658 recognition sites. Comparison of multiple miR5021 and miR5658 mature sequences revealed that the miR5021/5658-WRKY regulatory module is highly conserved among the five legume species. Although there was a mismatch between miR5021 and miR5658 targeting WRKY mRNAs, the core target sequences were more conserved and played a regulatory role in the WRKY gene family (Fig. 5).

In addition, we found that the WRKY gene and other proteins were fully localized to the WRKY gene and other proteins of Arabidopsis in all five legume species. The interactions between WRKY genes and other proteins were predicted using the PAIR tool. These WRKY genes are thought to interact with different proteins, including mitogen-activated protein kinase (MKK), transcription factor (MYB), nucleoside diphosphate kinase (NDPK), 1-aminocyclopropane-1-carboxylate synthase (ACS) and other proteins. In addition, we found that some WRKY proteins interact with each other, such as WRKY25/WRKY48(MtWRKY55), WRKY33(LjWRKY20/37/43)/WRKY48(MtWRKY55), WRKY72(GmWRKY78)/WRKY9, etc. (Figure 5).

3.8 Tissue specific expression of WRKY genes
Several tissues of soybean, chickpea, kidney bean, thistles alfalfa and larkspur, respectively, were studied and it was shown that class I family members (PvWRKY60 and GmWRKY177) were more highly expressed in flowers, MtWRKY49 in roots, (GmWRKY47/58/82) in leaves and the rest of the members were less highly expressed in various tissues; class II family members (PvWRKY4/29/38) were highly expressed in 10 cm roots, (PvWRKY4/29/83) were more highly expressed in 10 cm stems, (PvWRKY66, MtWRKY50/82, GmWRKY88) were all highly expressed in flowers, (MtWRKY1/19/50, LjWRKY68, GmWRKY42/ 45/81/123, CaWRKY8/56) were all highly expressed in roots, (PvWRKY29, MtWRKY41/50, CaWRKY43) were all highly expressed in rhizomes, (MtWRKY82 and CmWRKY100) were highly expressed in leaves, and MtWRKY82 was also highly expressed in cardiac cortex, while the remaining members showed low expression in various tissues; Class III family members were barely expressed in various tissues except for MtWRKY50, which was highly expressed in flowers (Fig. 6).

4 Results and Discussion

WRKY transcription factors have been identified in many plants such as Arabidopsis thaliana [18], rice [17], wheat [20], potato [34], vaccinium [35], mung bean [36] and barley [37]. It has been shown to affect plant growth and development, as well as to have an important role in the response to biotic and abiotic stresses [37]. However, previous studies have been based on evolutionary analyses of the composition of WRKY gene families in individual species, and few have analyzed and compared WRKY gene families among different species. In the present study, 185, 61, 90, 108 and 83 genes containing WRKY motifs were identified in soybean, chickpea, kidney bean, thistles alfalfa and bermudagrass, respectively. These WRKY genes can be categorized into seven subfamilies: I, IIa, IIb, IIc, IId, IIe, and III similar to the classification of WRKY genes in Arabidopsis and rice [38,18]. In our study class II had the highest number of members, class I was second only to class II, and class III had the least diversity.

Although the structural domains of the WRKY gene family are highly conserved in legumes, substitutions occur in their amino acid sequences, e.g., WRKYGKK in maize and soybean [1].There are 19 WRKY structural domain variants in the WRKY family members, of which WRKYGEK and WRKYGKK are the two common variants shared by seven and five structural domains, and other variants include WRICGQK, WRMCGQK, WKKYGQK, WIKYGQK, WKRYGQK, WSKYEQK and WRKYSEK [39]. WRKYs with similar motif composition and gene structure were clustered together by phylogenetic analysis. We found that CaWRKY genes were always clustered with MtWRKY genes. These results suggest that CaWRKY and MtWRKY genes are evolutionarily more closely related than other legumes. In addition, the same situation exists between GmWRKY and PvWRKY genes and between MtWRKY and AtWRKY genes.

Analysis of gene duplication in the WRKY gene family showed that segmental duplication had a greater effect on WRKY gene amplification than tandem duplication in the five legume species. These results were similar to those observed in peanut and purple peppercorn, indicating that the expansion of the plant WRKY gene family is mainly dependent on segmental duplication [40]. WRKY genes may be unevenly distributed throughout the genome and chromosomes as found in monocotyledonous crops, and in legumes, so we hypothesized that this phenomenon may be related to segmental duplications [41-42].Wang et al [42] proposed that soybean is a diploid, which has undergone at least one whole-genome doubling event during its evolutionary process. In this study, we found that soybean had the most gene family members and fragment duplication events compared to the other four legumes. Therefore, we hypothesized that these phenomena were caused by genome polyploidy in soybean.

Protein interaction network analysis revealed that the homologous genes of WRKY33, LjWRKY20/37/43, all contain MYB-binding sites and can interact with the MYB51 protein.Tao et al [43] found that WRKY33 induces the expression of MYB51 and CYP83B1 and promotes the biosynthesis of the precursor of 4MI3G, I3G, which inhibits oilseed rape streptococcus formation. MPK3 protein is considered to be an important component of the biological clock in the model plant Arabidopsis thaliana, and Sheikh et al [44] found that MPK3 phosphorylates WRKY46, leading to the activation of WRKY transcription factors, which positively regulate the expression of defense genes. In the protein interaction network, six WRKY proteins interacted with MPK3 proteins. We hypothesized that MPK3 may phosphorylate these six WRKY transcription factors. In addition, different WRKY members have different hormone-related elements, such as ABRE (ABA)/TGA element (growth hormone), GARE motif/P-box (gibberellin), and TCA element (Figure S2). These results suggest that WRKY genes may also play important roles in plant signaling regulatory pathways.

In this study, we found that GmWRKY42/45/81/123 genes were highly expressed in roots, Zhou et al [45] confirmed that WRKY genes were indeed highly expressed in roots by RT-PCR experiments in the study, Wu et al [46] found that WRKY genes in bamboo were also significantly up-regulated in transgenic roots that encountered stress.Wu et al [47] found that PvWRKY35 expression was significantly up-regulated in both rhizomes Khandal et al [48] used a transgenic approach and found that CaWRKY31, a homologue of PvWRKY29, drove the expression of CaCKX6 resulting in the expansion of the root system of T4 transgenic chickpea plants and the increase of CKX activity in the root system.

In summary, in this study, 185, 61, 90, 108 and 83 WRKY genes were identified from the genomes of soybean, chickpea, kidney bean, thistle alfalfa and bermudagrass by means of bioinformatics, and analyzed for gene identification, gene structure, gene duplication, chromosomal localization, protein interactions network, and miRNA prediction, which laid the foundation for analyzing the functions of the WRKY gene family. This lays the foundation for analyzing the functions of WRKY gene family, which can provide reference for genetic breeding work.

Leave a Reply

Your email address will not be published. Required fields are marked *