Alternative promoter and GATA5 transcripts in mouse

Bohao Chen, Elena Yates, Yong Huang, Paul Kogut, Lan Ma, Jerrold R. Turner, Yun Tao, Blanca Camoretti-Mercado, Deborah Lang, Eric C. Svensson, Joe G. N. Garcia, Peter J. Gruber, Edward E. Morrisey, Julian Solway


GATA5 is a member of the GATA zinc finger transcription factor family involved in tissue-specific transcriptional regulation during cell differentiation and embryogenesis. Previous reports indicate that null mutation of the zebrafish GATA5 gene results in embryonic lethality, whereas deletion of exon 1 from the mouse GATA5 gene causes only derangement of female urogenital development. Here, we have identified an alternate promoter within intron 1 of the mouse GATA5 gene that transcribes a 2.5-kb mRNA that lacks exon 1 entirely but includes 82 bp from intron 1 and all of exons 2–6. The alternative promoter was active during transient transfection in cultured airway myocytes and bronchial epithelial cells, and it drove reporter gene expression in gastric epithelial cells in transgenic mice. The 2.5-kb alternative transcript encodes an NH2-terminally truncated “short GATA5” comprising aa 226–404 with a single zinc finger, which retains the ability to transactivate the atrial natriuretic factor promoter (albeit less efficiently than full-length GATA5). Another new GATA5 transcript contains all of exons 1–5 and the 5′ portion of exon 6 but lacks the terminal 1143 bp of the 3′-untranslated region from exon 6. These findings extend current understanding of the tissue distribution of GATA5 expression and suggest that GATA5 expression and function are more complex than previously appreciated.

  • differentiation
  • airway
  • gut
  • epithelium

The GATA transcription factors contain a highly conserved DNA binding domain consisting of two zinc fingers that mediate binding to the sequence (A/T)GATA(A/G). These factors have been divided into two subfamilies (GATA1, 2, and 3, and GATA4, 5, and 6) based on their expression patterns and amino acid sequence homologies. GATA1, 2, and 3 genes are prominently expressed in hematopoietic cells and ectoderm derivatives, whereas GATA4, 5, and 6 genes are expressed in various mesoderm- and endoderm-derived tissues (19, 26).

GATA5 temporal and spatial expression patterns suggest its involvement in tissue-specific transcriptional regulation during cell differentiation and embryogenesis. In zebrafish, GATA5 expression is detected by 4.3 hours postfertilization (hpf) in the yolk syncytial layer and at 9 hpf in endoderm and mesoderm. A critical role of GATA5 in heart development was demonstrated by the discovery that null mutation of the GATA5 gene (faust) in zebrafish results in embryonic lethality due to defects in endocardial and myocardial differentiation and migration (10, 28). Xenopus GATA5 is expressed in the yolk-rich vegetal cells of embryos from the early gastrula stage onward and in the subblastoporal endoderm during midgastrula stages, revealing an important role for GATA5 in endoderm differentiation (32). In chick, GATA5 is transcribed at its highest levels in a zone of epithelial cells apical of the progenitor crypt cells, suggesting a function of GATA5 in regulation of terminal differentiation (7). During mouse development, GATA5 is expressed initially in the precardiac mesoderm between E7 and E8 and continues throughout the heart until E16.5. Beginning at midgestation, mouse GATA5 is also expressed within pulmonary mesenchyme, as well as in the urogenital ridge, in epithelial cells lining the urogenital sinus, in the bladder, and in the gut epithelium. Postnatally, GATA5 expression becomes markedly upregulated in the intestine, stomach, lungs, bladder, and endocardium, but not in myocardium (14, 19, 21, 23, 27). The mutant GATA5 (faust) zebrafish exhibits a phenotype similar to that of GATA4-null mice (cardia bifida). In contrast, GATA5 exon 1-deleted mice lived to adulthood, but females exhibited genitourinary abnormalities (20).
These findings suggest two nonmutually exclusive possibilities: 1) unlike GATA4 or GATA6 deletion, GATA5 gene deletion does not result in embryonic lethality in mice, and/or 2) there is an alternative GATA5 transcript that excludes exon 1 but that nonetheless encodes a functional GATA5 variant that fulfills some of the roles of full-length GATA5.

Here, we addressed the latter possibility. A previous study had demonstrated that two distinct isoforms of chicken GATA5 are expressed from two alternative promoters, which transcribed mRNAs that included different exons upstream of those encoding most of the full-length protein (15). One of these mRNAs encoded an NH2-terminally truncated GATA5 protein with one zinc finger that still bound DNA and exhibited GATA transcription-promoting activity. Mouse GATA1 (12), human and mouse GATA2 (18, 25), human and mouse GATA3 (1), and human and mouse GATA6 (3) genes all possess two promoters and two initiation codons. We therefore hypothesized that the mouse GATA5 gene might share this feature with other members of the GATA family, employing distinct transcripts to regulate downstream tissue-specific gene expression.

In this study, we identified two additional GATA5 transcripts in mouse using 5′ RNA ligase mediated rapid amplification of cDNA ends (RLM RACE), 3′ RACE, RT-PCR, and Northern blot. One transcript (2.5 kb) begins within intron 1 at 82 bp upstream of exon 2 and includes exons 2–6 in their entireties. Another new GATA5 transcript contains all of exons 1–5 and the 5′ portion of exon 6 but lacks the terminal 1143 bp of the 3′-untranslated region (UTR) from exon 6. A novel promoter region located at GATA5 genomic sequence bp +890 to +2312 (relative to the previously known transcription start site) directs expression of the 2.5-kb GATA5 transcript initiated in intron 1. This alternative transcript encodes an NH2-terminally truncated short GATA5 isoform that retains transcription-promoting activity. As such, GATA5 expression and function are more complex than previously appreciated.


Northern analysis.

Northern blots of polyA+ RNA from a variety of mouse tissues (OriGene, Rockville, MD) and of total RNA from intestinal mucosa of wild-type C57BL/6J mice were used to identify transcripts of GATA5. Total RNA was isolated by using the TotalRNA kit (Ambion, Austin, TX), electrophoresed on a formaldehyde 1.2% agarose gel, and transferred to a positively charged nylon membrane that was then cut into three pieces for hybridization with three nonoverlapping GATA5 cDNA probes. A 5′-UTR cDNA probe contained the first 350 bp of the previously known GATA5 exon 1, whereas a 3′-UTR cDNA probe (full-length transcript bp 2232–2712) was derived exclusively from GATA5 exon 6. An exon 2–6 cDNA probe spanned 578 bp corresponding to previously known GATA5 exons 2–6 (full-length transcript bp 968–1542). Each probe was labeled with [32P]α-dCTP by using a random primer labeling kit (Ambion) and purified with ProbeQuant G-50 Micro Columns (GE Healthcare, Piscataway, NJ). Northern hybridization was carried out at 42°C by using NorthernMax (Ambion) according to the manufacturer's instructions. Blots were washed at high stringency and exposed to film overnight at −80°C.

5′ RLM-RACE and 3′ RACE.

5′ RLM-RACE was employed to identify the 5′ end of mouse GATA5 transcripts following the manufacturer's manual (FirstChoice RLM-RACE kit, Ambion, Austin, TX). Briefly, 10 μg of total RNA from mouse stomach, lung, or heart was dephosphorylated to remove the 5′-phosphate group from RNA or contaminating DNA molecules. Tobacco acid pyrophosphatase was then used to specifically remove the cap structure from mRNA. An RNA oligonucleotide was next ligated to newly decapped mRNA by use of T4 RNA ligase, and the resulting RNA was reverse transcribed with SuperScript III (Invitrogen, Carlsbad, CA) and a GATA5 gene-specific primer 5′-AGGCAAAGTCTTCAGGTTCG-3′ that maps to exon 6. PCR amplification was performed with Takara (Madison, WI) Taq polymerase and GC buffer with 5′ RACE Outer Primer 5′-GCTGATGGCGATGAATGAACACTG-3′ and a GATA5 gene-specific primer 5′-GAGAAGAGGCTGTGGTGTTTGC-3′ that maps to exon 5. Nested PCR was done with 5′ RACE Inner Primer 5′-CGCGGATCCGAACACTGCGTTTGCTGGCTTTGATG-3′ and a different GATA5 gene-specific primer 5′-AGTATGGCAGTTGGAGCAGCATAG-3′ that maps to exon 3. For 3′ RACE, first-strand cDNA was synthesized from total RNAs of mouse stomach by using Takara oligo(dT)-3 sites adaptor primer, and PCR amplification was performed by using Takara Taq polymerase and GC buffer with Takara 3 sites adaptor primer 5′-CTGATCTAGAGGTACCGGATCC-3′ and a sense primer 5′-AGTATGGCAGTTGGAGCAGCATAG-3′ within exon 3. The PCR amplification protocol was 94°C for 180 s; 40 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 150 s; followed by extension at 72°C for 300 s in an Applied Biosystems (Foster City, CA) GeneAmp PCR System 9700. PCR products were purified with a QIAquick Gel Extraction Kit (Qiagen, Valencia, CA) and subcloned into pCR3.1-TOPO vector (Invitrogen). Ten independent clones were sequenced for each RACE-PCR reaction in the University of Chicago Sequencing Facility.


Reverse transcription was performed using SuperScript First-Strand Synthesis System (Invitrogen) according to the manufacturer's instructions, using 2.5 μg total RNA from stomach, lung, or heart of C57BL/6J mice and GATA5 gene-specific primer 5′-AACAAAACAAAATAACAACAAC-3′, corresponding to the last 22 nucleotides of GATA5 mRNA 3′-UTR. Takara Taq polymerase and GC buffer were used for PCR amplification of GATA5 cDNA with sense primer 5′-AAAAGGAGGCCAAGCATAGCAGAC-3′, which corresponds to intron 1 immediately upstream of exon 2, and antisense primer 5′-ATCTGCCTTGTGCCTTACACTGTGGC-3′ within the exon 6 3′-UTR. The PCR amplification protocol was as above. PCR products were purified with QIAquick Gel Extraction Kit and were subcloned into pCR3.1-TOPO vector. Ten independent clones were sequenced for each PCR reaction. To assess the tissue distribution of mRNAs encoding full-length or short GATA5, we isolated DNA-free total RNA from various tissues of C57BL/6J mice using the RNeasy kit (Qiagen) and, using 0.5 μg RNA per tissue, reverse transcribed cDNAs with the iScript cDNA Synthesis Kit (Bio-Rad Laboratories, Hercules, CA). Sequence corresponding to 494 bp of full-length GATA5 cDNA was PCR amplified by use of primers that map to exon 1 (5′-AGCCTTCGACAGCAGCATC-3′) and exon 5 (listed above), and sequence corresponding to 303 bp of the short GATA5 cDNA was amplified by using the intron 1 and exon 3 primers listed above. A 159-bp fragment of the β-actin cDNA was also amplified by use of forward primer 5′-TTGCTGACAGGATGCAGAAGGAGA-3′ and reverse primer 5′-ACTCCTGCTTGCTGATCCACATCT-3′. The PCR protocol was as listed above, except that 35 amplification cycles were used.

Alternative GATA5 promoter-luciferase reporter plasmids.

The pGL3-basic vector (Promega, Madison, WI) was used to create promoter-reporter plasmids to measure potential alternative GATA5 promoter activities. A series of DNA fragments comprising bp +890, +1348, or +1895 to +2312 of the mouse GATA5 gene (numbering is relative to the previously known GATA5 transcription start site) were generated by PCR amplification and inserted into MluI/NheI sites upstream of the firefly luciferase gene. Sense primers were 5′-CACACGCGTAACTAAGGCGCGCAGCAATAAACC-3′ (corresponding to bp +890 to +913); 5′-CACACGCGTACCACAGCCATCTTGTTCTGCAAC-3′ (corresponding to bp +1348 to +1371); and 5′-CACACGCGTCCATCCATTTCCTTCTGCCTGCTTC-3′ (corresponding to bp +1895 to +1919). The antisense primer was 5′-CACACTAGTATTGCATAGATAGTGTCCGGTGCC-3′ (corresponding to bp +2289 to +2312); MluI or SpeI sites included in these primers are underlined. PCR amplifications were performed with Takara LA Taq polymerase and GC buffer recommended by the supplier (Takara). All constructs were verified by DNA sequence analysis and purified by EndoFree Plasmid Maxi Kit (Qiagen).
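The coordinates quoted for the deletion series can be checked with simple arithmetic. The sketch below is an illustration only, not part of the original protocol: it computes the insert length of each construct and tests whether the gene-specific portion of each primer (the sequence 3′ of its MluI or SpeI site) has the length implied by the coordinates given in the text. The function names are hypothetical; the sequences and coordinates are copied from the paragraph above.

```python
# Restriction sites carried on the 5' tails of the primers.
MLUI = "ACGCGT"
SPEI = "ACTAGT"

# (primer sequence, restriction site, first bp, last bp), as quoted in the text.
# Coordinates are relative to the previously known GATA5 transcription start site.
PRIMERS = [
    ("CACACGCGTAACTAAGGCGCGCAGCAATAAACC", MLUI, 890, 913),
    ("CACACGCGTACCACAGCCATCTTGTTCTGCAAC", MLUI, 1348, 1371),
    ("CACACGCGTCCATCCATTTCCTTCTGCCTGCTTC", MLUI, 1895, 1919),
    ("CACACTAGTATTGCATAGATAGTGTCCGGTGCC", SPEI, 2289, 2312),
]
CONSTRUCT_STARTS = [890, 1348, 1895]  # 5' ends of the deletion series
CONSTRUCT_END = 2312                  # shared 3' end in exon 2


def span_length(first_bp, last_bp):
    """Length of an inclusive coordinate span."""
    return last_bp - first_bp + 1


def gene_specific_part(primer, site):
    """Sequence 3' of the first occurrence of the restriction site."""
    return primer[primer.index(site) + len(site):]


def primer_matches_span(primer, site, first_bp, last_bp):
    """True if the gene-specific part is as long as the quoted span."""
    return len(gene_specific_part(primer, site)) == span_length(first_bp, last_bp)


def insert_lengths():
    """Insert length (bp) of each construct, keyed by its 5' coordinate."""
    return {start: span_length(start, CONSTRUCT_END) for start in CONSTRUCT_STARTS}


if __name__ == "__main__":
    for primer, site, first_bp, last_bp in PRIMERS:
        print(first_bp, last_bp, primer_matches_span(primer, site, first_bp, last_bp))
    print(insert_lengths())
```

The longest insert works out to 1423 bp, which agrees with the 1.4-kb promoter fragment described for the transgene.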

Expression plasmids.

Total RNA was extracted by the ToTALLY RNA Kit (Ambion) from lungs of 8-wk-old C57BL/6J mice. cDNA was synthesized from 2 μg total RNA with dT12–18 adaptor primer and SuperScript II Reverse Transcriptase (Invitrogen, Carlsbad, CA). To create expression vector pcDNA-GATA5-f, a full-length murine GATA5 cDNA was amplified with a sense primer corresponding to the region upstream of the start codon in exon 1 (5′-CACGAATTCTCTGCAGGTCAAGCTCG-3′) and an antisense primer corresponding to the region spanning the stop codon on exon 6 (5′-CACCTCGAGTGTGGTGACAGTTTCCTGAGC-3′); EcoRI and XhoI sites are underlined. For expression vector pcDNA-GATA5-s encoding short GATA5, a portion of the GATA5 cDNA encoding amino acid residues 226 to 404 was amplified with a sense primer corresponding to the region including the predicted new start codon (underlined) in exon 2 (5′-CACGAATTCTGCGGCCTCTATCACAAGATGAACGGGGTCAACCG-3′) and the above antisense primer within exon 6. The resultant PCR fragments were digested with EcoRI and XhoI and cloned into the EcoRI-XhoI sites of pcDNA3.1 (Invitrogen), which directs cDNA expression under control of the cytomegalovirus promoter. Both expression plasmids were sequenced and purified with an EndoFree Plasmid Maxi Kit (Qiagen). The expression vector pcDNA-GATA4 and the atrial natriuretic factor (ANF)-promoter/luciferase reporter plasmid were described previously (22, 29).
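As a quick check on the design of the short GATA5 construct, the following sketch counts the residues encoded by aa 226–404 and the number of bases that the sense primer places between its EcoRI site and the predicted start codon. It assumes that the first ATG downstream of the EcoRI site is the start codon referred to in the text; the helper names are hypothetical.

```python
ECORI = "GAATTC"

# Sense primer for pcDNA-GATA5-s, as quoted in the text.
SHORT_GATA5_SENSE = "CACGAATTCTGCGGCCTCTATCACAAGATGAACGGGGTCAACCG"

FIRST_RESIDUE, LAST_RESIDUE = 226, 404  # short GATA5, full-length numbering


def residue_count(first, last):
    """Number of residues in an inclusive range."""
    return last - first + 1


def leader_before_start_codon(primer, site=ECORI, start_codon="ATG"):
    """Bases between the restriction site and the first start codon after it.

    Assumption: the first ATG 3' of the site is the intended start codon.
    """
    downstream = primer[primer.index(site) + len(site):]
    return downstream.index(start_codon)


if __name__ == "__main__":
    print(residue_count(FIRST_RESIDUE, LAST_RESIDUE))
    print(leader_before_start_codon(SHORT_GATA5_SENSE))
```

Under that assumption the primer carries 18 bases of native sequence ahead of the ATG, and the construct encodes 179 residues.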

Cell culture.

Canine tracheal smooth muscle (CTSM) cell cultures were established as described previously (9) and cells from the second passage were used for transfection assays. 16HBE14o (generously provided by Dr. D. C. Gruenert) and NIH3T3 cells (ATCC, Manassas, VA) were maintained in DMEM/F12 supplemented with 10% fetal bovine serum, 100 units/ml penicillin, and 100 μg/ml streptomycin at 37°C in a 5% CO2 humidified incubator.


To measure GATA5 promoter activity, transient transfection was performed in 12-well plates at 60,000 cells/well of CTSM cells or 16HBE14o cells with 1 μg pGL3-basic or equimolar amounts of pGL3-basic-derived plasmids containing mouse GATA5 promoter constructs, and 3 ng of pKT-null vector containing the Renilla luciferase reporter gene as an internal control. In experiments with forced expression of GATA transcription factors, 1 μg of expression vector pcDNA-GATA4, pcDNA-GATA5-f, or pcDNA-GATA5-s, or empty pcDNA3.1 (as control), was cotransfected along with 1 μg of ANF promoter-luciferase reporter plasmid, 1 μg empty pcDNA3.1, and 3 ng of pKT-null vector into NIH3T3 cells. All mammalian cells were transfected with Qiafect Reagent (Qiagen) according to the manufacturer's instructions. At 48 h posttransfection, cells were harvested and assayed for reporter activity by use of a dual luciferase assay system (Promega). Luciferase activity was normalized as luciferase/Renilla activity and was expressed as average ± SE from at least three independent experiments, each performed in triplicate wells.
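The normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes that the firefly/Renilla ratio is averaged over the triplicate wells of one experiment, that fold activity is taken relative to the empty-vector control within the same experiment, and that the SE is the sample standard deviation across experiments divided by the square root of their number. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_ratio(wells):
    """Mean firefly/Renilla ratio over the wells of one experiment.

    wells: iterable of (firefly, renilla) luminescence readings.
    """
    return mean(firefly / renilla for firefly, renilla in wells)


def fold_over_control(construct_wells, control_wells):
    """Normalized activity of a construct relative to the empty vector."""
    return normalized_ratio(construct_wells) / normalized_ratio(control_wells)


def mean_and_se(values):
    """Mean and standard error across independent experiments."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two experiments to estimate SE")
    return mean(values), stdev(values) / sqrt(len(values))
```

Each experiment would contribute one fold value from `fold_over_control`, and `mean_and_se` would then summarize those values across the three or more independent experiments.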

Western analysis.

Full-length GATA5 or short GATA5 (containing aa 226–404) was expressed by transient transfection of its encoding plasmid in NIH3T3 cells. Extracts of transfected NIH3T3 cells and of mouse intestinal mucosa were prepared using RIPA buffer with protease inhibitors. Total protein (25 μg per sample) was boiled in Laemmli buffer and resolved on SDS-PAGE. Proteins were transferred to Hybond-polyvinylidene difluoride membranes and probed with AB4133 antibody (recognizes aa 235–247 in the middle portion of full-length GATA5) or Y-19 antibody (recognizes the NH2 terminus of GATA5) (Santa Cruz Biotechnology, Santa Cruz, CA) at 1/200 dilution, or Sigma anti-GATA5 antibody (recognizes aa 235–247) (Sigma, St. Louis, MO) at 1/200 dilution. GATA5 proteins were detected using anti-rabbit or anti-goat horseradish peroxidase antibody (GE Healthcare UK, Little Chalfont, Buckinghamshire, UK) as appropriate at 1/10,000 dilution and the SuperSignal Chemiluminescence system (Pierce, Rockford, IL).

Transgenic mice.

Plasmid pMB105 was obtained from Dr. Ravi Misra (Medical College of Wisconsin, WI) and modified by inserting a short linker containing NotI-XbaI-SmaI-NheI-XbaI sites upstream of the LacZ gene contained therein. The GATA5 genomic sequence from +890 bp to +2312 bp was amplified by PCR using the primers listed above, digested with NheI, and ligated into SmaI-NheI digested pMB105. The pIRES2-EGFP vector (Clontech, Mountain View, CA) was used as template to amplify its internal ribosome entry site (IRES)-enhanced green fluorescent protein (EGFP) sequence with sense primer 5′-CACAGATCTATCCGCCCCTCTCCCTCC-3′ and antisense primer 5′-CACCTGCAGAACAACACTCAACCCTATCTCG-3′ (restriction sites underlined). A point mutation was introduced by PCR to delete a NotI site in the IRES-EGFP sequence. BglII- and PstI-digested IRES-EGFP was isolated and inserted into the BglII-PstI site of pMB105 downstream of the LacZ gene. The alternative GATA5 promoter reporter construct built in pMB105 thus contained a 6.2-kb transgene cassette, including 1.4-kb potential alternative GATA5 promoter, 3.1-kb LacZ gene, and 1.7-kb IRES and EGFP with SV40 early mRNA polyadenylation signal. The construct was verified by sequencing and the linear transgene released by digestion with NotI (Fig. 3A). Transgenic mice harboring this cassette were generated by the transgenic mouse core facility of the University of Chicago Animal Resources Center.

X-gal staining.

Tissues dissected from alternative GATA5 promoter adult transgenic mice and wild-type C57BL/6J mice were rinsed with phosphate buffer solution then fixed for 1 h with 2% paraformaldehyde on ice. After three washes with phosphate buffer solution, the tissues were treated at 37°C overnight with 1 mg/ml X-gal, 5 mM potassium ferricyanide, 5 mM potassium ferrocyanide, 0.01% Nonidet P-40 (NP-40), and 0.1% deoxycholate. Tissues were embedded in paraffin, sectioned, and counterstained with eosin (Sigma).

Cell and tissue immunostaining.

NIH3T3 cells transiently transfected with either pcDNA-GATA5-f or pcDNA-GATA5-s were fixed with 1% paraformaldehyde for 10 min, washed five times with PBS, permeabilized with 0.1% Triton X-100 in PBS for 5 min, preincubated with 10% goat serum-PBS for 15 min, and incubated with polyclonal antibodies against GATA5 (diluted 1/500, Sigma) and Alexa 488-phalloidin (diluted 1/100, Invitrogen-Molecular Probes) in 1% BSA-PBS for 2 h at room temperature in a humidified chamber. After three washes of 5 min each in PBS, GATA5 was detected with an Alexa 594-conjugated goat anti-mouse antibody (1/250 dilution; Invitrogen-Molecular Probes), and counterstained with 4,6-diamidino-2-phenylindole. Fluorescence micrographs were taken on an Axlesia 200 microscope and images were pseudocolored and merged by use of ImageJ (NIH, Bethesda, MD). Tissues from GATA5 promoter/reporter transgenic mice were frozen in OCT, and 6-μm sections were immunostained for EGFP using anti-GFP primary rabbit polyclonal antibody (1/100 dilution; Clontech), horseradish peroxidase-coupled goat anti-rabbit secondary antibody, and 3,3′-diaminobenzidine after permeabilization by 5-min incubation in 0.5% NP-40 and quenching endogenous peroxidase activity by 5-min incubation with Peroxidase Block (Dako, Glostrup, Denmark).


Identification of two alternative GATA5 transcripts.

Northern analysis performed using poly-A+ RNA from a variety of mouse tissues and probes corresponding to GATA5 5′-UTR in exon 1, GATA5 exons 2–6, and GATA5 far 3′-UTR in exon 6 revealed three transcripts ∼3.3, 2.5, and 2.1 kb in size (Fig. 1A). Two bands (3.3 and 2.1 kb) were detected with the GATA5 5′-UTR probe and GATA5 exon 2–6 probe prominently in intestine, stomach, lung, and kidney. However, the 2.1-kb band was not present when the GATA5 3′-UTR probe was used. In addition, a much weaker hybridizing band of 2.5 kb was evident in intestine. We found similar results upon Northern analysis of total RNA from small intestinal mucosa (Fig. 1B). The previously known 3.3-kb transcript was identified with all three probes, confirming that it includes sequences within exons 1, 2–6, and the far 3′ end of exon 6. In contrast, the 2.1-kb transcript hybridized with GATA5 5′-UTR and GATA5 exons 2–6 probes, but not the 3′-UTR probe, indicating that it lacks the 3′ terminal sequence of the full-length GATA5 mRNA (3.3 kb). The third, much less abundant 2.5-kb transcript hybridized with exons 2–6 and 3′-UTR probes, but not the 5′-UTR probe, indicating that it omits this sequence from exon 1.

Fig. 1.

Northern analyses of poly-A+ RNA from various mouse tissues (A) or of total RNA from mouse intestinal mucosa (B) identify 3 transcripts ∼3.3, 2.5, and 2.1 kb in size. The 3.3-kb transcript was detected with 3 radiolabeled probes corresponding to mouse GATA5 5′-untranslated region (UTR) (full-length cDNA bp 1–350), coding exons (Ex) 2–6 (full-length cDNA bp 968–1542), and exon 6 3′-UTR (full-length cDNA bp 2232–2712). The 2.1-kb transcript hybridizes with the 5′-UTR and exons 2–6 probe but not with the 3′-UTR probe. Conversely, the 2.5-kb transcript hybridizes with the exons 2–6 and 3′-UTR probes but excludes the 5′-UTR sequence. When present, the 2.5-kb transcript is much less abundant than are the 3.3- and 2.1-kb transcripts. C: alignment of regions of human and mouse genomic DNA showing homology (VISTA analysis, % homology shown) with exons found in 3 mouse GATA5 mRNAs as inferred from 5′ RNA ligase-mediated rapid amplification of cDNA ends (RLM-RACE) and 3′ RACE. Translation start codons (ATG) in exons 1 and 2 are indicated. D: RT-PCR demonstrates that mRNAs encoding full-length GATA5 (494-bp amplicon spanning exon 1 to exon 5) or short GATA5 (303-bp amplicon spanning intron 1 to exon 3) are expressed in similar tissue distributions, and are most prevalent in intestine, stomach, and lung, but are also expressed at low levels in heart and kidney. E: Western analysis demonstrates full-length GATA5 (∼45 kDa) in intestinal mucosa from four regions indicated; in addition, short GATA5 (∼27 kDa) is present in proximal colonic mucosa.

Identification of transcription start site and 3′ end in the alternative GATA5 mRNAs.

We employed 5′ RLM-RACE to amplify mouse stomach cDNAs only from full-length capped mRNAs and so to map the transcription start sites of mouse GATA5 mRNA. Besides recovering the previously known GATA5 transcription start site in exon 1, we discovered a novel transcription start site in intron 1, at bp +2121 of the mouse GATA5 gene, located 82 bp upstream of exon 2. Sequence analysis disclosed that the first exon of this novel GATA5 transcript includes 258 bp, comprising 82 bp of intronic sequence immediately upstream of exon 2 and the whole 176 bp exon 2; sequence corresponding to exon 3 was also present in the PCR product, confirming that this transcript splices normally to exon 3. 5′ RLM-RACE was repeated and a similar product was recovered with the use of total RNA from mouse lung and heart.

To identify whether this new GATA5 transcript shares the same downstream exons and 3′-UTR as the full-length (3.3 kb) GATA5 mRNA, GATA5 cDNA was reverse transcribed with a 22-nucleotide gene-specific primer corresponding to the 3′ end of the full-length GATA5 transcript by using total RNA from mouse lung or stomach. PCR amplification of GATA5 cDNA was then performed with a sense primer corresponding to the first (most 5′) 24 nucleotides of the putative novel transcript (in intron 1) and an antisense primer that maps to the 3′-UTR. We found that the amplified sequence consists of 82 bp upstream of exon 2 at its 5′ end and is thereafter identical to exons 2–6 of full-length GATA5 transcript. These data unequivocally confirm that a novel transcription start site occurs at bp +2121 of the mouse GATA5 gene and indicate that this new GATA5 mRNA of 2483 bp is expressed in mouse stomach, lung, and heart. This sequence is consistent with the 2.5-kb GATA5 transcript observed in intestine.

The sequences of the 3′ RACE products revealed two different termini of GATA5 cDNAs. One was identical with the previously known 3′ end of full-length GATA5 transcripts, but a novel 3′ terminus of GATA5 mRNA was also found, corresponding to bp 2106 of the previously known full-length GATA5 transcript (bp +8398 of the mouse GATA5 gene); transcripts with this novel 3′ end lack the terminal 1143 bp of the 3′-UTR. The predicted resultant 2.1-kb GATA5 transcript is evidently abundantly expressed, as shown in Fig. 1, A and B. Figure 1C shows the inferred exon usage of all three GATA5-encoding mRNAs. Note that an ATG codon in exon 2 is in frame with the full-length GATA5 cDNA; this suggested the possibility that the 2.5-kb GATA5 transcript that lacks exon 1 might nonetheless encode an NH2-terminally truncated “short” GATA5, whose translation start site might occur at the indicated ATG in exon 2 (Fig. 1C).
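The lengths and coordinates quoted for the two novel transcripts can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes a full-length cDNA of 3249 bp, a figure taken from the range "bp 2107–3249" quoted in the microarray discussion, and it assumes that the transcription start site at +2121 is the first of the 82 intronic bases. The exon 1 length it derives is an inference from these quoted numbers, not a value reported in the text. Function names are hypothetical.

```python
FULL_LENGTH_CDNA_BP = 3249    # assumption: last bp quoted for the full-length cDNA
NOVEL_TSS = 2121              # gene coordinate of the intron 1 start site
INTRON1_LEADER_BP = 82        # intronic sequence retained upstream of exon 2
EXON2_BP = 176                # length of exon 2
MISSING_3UTR_BP = 1143        # 3'-UTR sequence absent from the 2.1-kb transcript
SHORT_TRANSCRIPT_BP = 2483    # length reported for the intron 1-initiated mRNA


def first_exon_of_intron1_transcript():
    """Length of the novel first exon: intronic leader plus exon 2."""
    return INTRON1_LEADER_BP + EXON2_BP


def exon2_first_bp():
    """Gene coordinate of the first base of exon 2 (assumes TSS is leader base 1)."""
    return NOVEL_TSS + INTRON1_LEADER_BP


def truncated_transcript_length():
    """Length of the transcript lacking the terminal 3'-UTR sequence."""
    return FULL_LENGTH_CDNA_BP - MISSING_3UTR_BP


def implied_exon1_length():
    """Exon 1 length implied if exons 2-6 are shared by both transcripts."""
    return FULL_LENGTH_CDNA_BP - (SHORT_TRANSCRIPT_BP - INTRON1_LEADER_BP)


if __name__ == "__main__":
    print(first_exon_of_intron1_transcript())
    print(exon2_first_bp())
    print(truncated_transcript_length())
    print(implied_exon1_length())
```

Under these assumptions the first exon of the intron 1 transcript is 258 bp and the truncated transcript is 2106 bp, in agreement with the novel 3′ terminus at bp 2106 and the 2.1-kb band. The implied exon 1 length of 848 bp would place the +890 end of the tested promoter fragment inside intron 1.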

Short GATA5 is expressed in gut.

To validate the expression and assess the tissue distribution of mRNA encoding short GATA5, we performed RT-PCR using a GATA5 intron 1 primer, which is specific for short GATA5 mRNA. Figure 1D shows that mRNA encoding the short GATA5 isoform is expressed in a similar distribution to full-length GATA5 mRNA among tissues. Both full-length and short GATA5-encoding mRNAs are most abundant in intestine, stomach, and lung and are also expressed at low levels in heart and kidney. In contrast, very little or no GATA5 mRNA appears in liver, spleen, brain, or thymus. In addition, Western analysis demonstrates full-length GATA5 (∼45 kDa) in intestinal mucosa from four regions indicated in Fig. 1E, and short GATA5 (∼27 kDa) is present in proximal colonic mucosa. The latter confirms that short GATA5 is indeed expressed endogenously.

In the Affymetrix Mouse Genome 430 2.0 Array, two probe sets (1450125_at and 1450126_at) detect GATA5 expression. Probe set 1450125_at is derived from full-length GATA5 cDNA bp 2668–2985, whereas probe set 1450126_at corresponds to bp 1507–2085. Because it lacks bp 2107–3249 of the full-length GATA5 cDNA, the novel 2.1-kb transcript cannot be detected by probe set 1450125_at. In contrast, probe set 1450126_at can detect all isoforms of GATA5 transcripts. This suggested to us that the apparent abundance of GATA5 as judged by hybridization to 1450126_at should exceed that suggested by hybridization to 1450125_at, assuming that the entire transcript is reverse transcribed during processing of the mRNA for microarray hybridization. To test this prediction, we analyzed three lung gene expression data sets previously published by us (11, 17, 31), version 3 of the Genomics Institute of the Novartis Research Foundation's Mouse GeneAtlas (13), and a report of colonocyte gene expression (16). In each of these studies, the GATA5 signal intensities reported by probe set 1450126_at were higher than those reported by probe set 1450125_at in at least 90% of samples (P < 0.0001 for each data set) (Table 1).
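The reasoning about which probe set can detect which transcript amounts to an interval-containment check in full-length cDNA coordinates. The sketch below illustrates it under two assumptions that are labeled in the code: the full-length cDNA ends at bp 3249, and the 2.5-kb transcript is represented conservatively as retaining bp 968 onward, because the exons 2–6 Northern probe (bp 968–1542) hybridizes to it. The exact cDNA coordinate at which exon 2 begins is not used. Names are hypothetical.

```python
# Retained regions of each transcript, in full-length cDNA coordinates (inclusive).
TRANSCRIPTS = {
    "3.3-kb": (1, 3249),     # full-length; 3249 is the last bp quoted in the text
    "2.1-kb": (1, 2106),     # lacks bp 2107-3249
    "2.5-kb": (968, 3249),   # assumption: conservative lower bound from the exons 2-6 probe
}

# Regions from which each Affymetrix probe set is derived.
PROBE_SETS = {
    "1450125_at": (2668, 2985),
    "1450126_at": (1507, 2085),
}


def contains(region, target):
    """True if the inclusive target interval lies entirely within region."""
    return region[0] <= target[0] and target[1] <= region[1]


def detectable_transcripts(probe_set):
    """Names of transcripts that contain the whole probe-set region."""
    target = PROBE_SETS[probe_set]
    return sorted(name for name, region in TRANSCRIPTS.items() if contains(region, target))


if __name__ == "__main__":
    for probe_set in PROBE_SETS:
        print(probe_set, detectable_transcripts(probe_set))
```

With these intervals, 1450126_at lies within all three transcripts, whereas 1450125_at lies within only the 3.3-kb and 2.5-kb transcripts, which is the basis for expecting a higher signal from 1450126_at.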

Table 1.

Signal intensities from probe sets 1450125_at and 1450126_at on Affymetrix transcription microarrays, as reported in 5 published studies

Genomic sequence within GATA5 intron 1 has transcription-promoting activity.

To determine whether an alternative promoter transcribes the novel 2.5-kb GATA5 mRNA in mouse, we amplified a series of DNA fragments and cloned these upstream of the luciferase cDNA in pGL3-basic (Fig. 2A). These constructs comprised a 5′ deletion series with 3′ ends in exon 2 upstream of the alternate initiation codon at +2312 of the GATA5 gene. Reporter plasmids were transiently transfected into CTSM cells and 16HBE14o cells, and promoter activity was determined as firefly luciferase activity normalized to Renilla luciferase activity. The genomic DNA sequence comprising +890 bp to +2312 bp had clear-cut promoter activity in CTSM cells (7.4-fold that of empty pGL3-basic) and 16HBE14o cells (5.8-fold that of empty pGL3-basic) (Fig. 2B). Furthermore, the genomic DNA sequence spanning +890 bp to +1347 bp evidently contains positive cis-acting transcriptional regulatory elements, since its deletion reduced promoter activity. This finding supports the possibility that an intron 1 promoter located at mouse genomic DNA sequence +890 to +2312 drives the transcription of the 2.5-kb GATA5 transcript in mouse.

Fig. 2.

A: schematic illustration of the alternative GATA5 promoter constructs evaluated. B: relative promoter activities in canine tracheal smooth muscle (CTSM) and human bronchial epithelial (16HBE14o) cells. The GATA5 genomic sequence spanning bp +890 to +2312 has promoter activity in both CTSM (7.4-fold that of empty pGL3-basic) and 16HBE14o (5.8-fold that of empty pGL3-basic) cells. Deletion of the sequence from +890 bp to +1347 bp reduced promoter activity in both cell types. Means ± SE of 3 experiments are shown. Luc, luciferase gene.

Putative intron 1 promoter drives reporter gene expression in transgenic mice.

To test whether the intron 1 GATA5 sequence exhibits promoter activity in vivo, we generated random-integration transgenic mice on the C57Bl/6 background, using a transgene in which bp +890 to +2312 of the GATA5 gene drove expression of a dual reporter cassette (lacZ-IRES-EGFP) (Fig. 3A). A 6.2-kb transgene (5′-NotI-intronic GATA5 promoter-lacZ-IRES-EGFP-SV40 polyA signal-NotI-3′) was purified and injected into fertilized eggs for transgenic mouse generation in the University of Chicago transgenic core facility. Transgenic mice were genotyped by Extract-N-Amp Tissue PCR kits (Sigma, St. Louis, MO). X-gal staining and immunostaining were used to detect β-galactosidase activity and EGFP expression, respectively. In each of two founder lines of transgenic mice studied, the intron 1 GATA5 promoter directed both β-galactosidase and EGFP expression in gastric epithelial cells (Fig. 3B), although reporter gene expression was more intense in the line shown. In addition, there was weak reporter gene expression in epithelium of kidney, lung, small intestine, and colon (data not shown) in both of the transgenic lines studied. These data indicate that the GATA5 genomic DNA fragment spanning bp +890 to +2312 contains a functional promoter in vivo that could transcribe the 2.5-kb GATA5 mRNA with transcription start site at +2121 bp of the GATA5 gene.

Fig. 3.

A: schematic illustration of the transgene used to reveal the tissue distribution of alternative (intron 1) GATA5 promoter activity. B: reporter gene expression is observed in gastric mucosa of a transgenic mouse. β-Galactosidase reporter expression was detected by X-gal staining (blue color) and eosin counterstaining in gastric mucosa from a transgene-positive mouse (2) but not in stomach of a wild-type mouse (1). EGFP reporter expression was detected by immunostaining (brown color) and hematoxylin counterstaining in transgenic mice (4); omission of primary anti-EGFP (3) obviated all staining, indicating the specificity of the staining in 4. Together, these results demonstrate that the alternative GATA5 promoter in intron 1 can drive both LacZ and EGFP reporter gene expression in gastric mucosa.

Short GATA5 retains partial ability to transactivate the ANF promoter.

To determine whether the novel 2.5-kb transcript encodes a protein that retains GATA5 function, we subcloned the portion of the full-length (3.3 kb) GATA5 cDNA from 18 bp upstream of the predicted translation initiation ATG in exon 2 through the translation stop site into pcDNA3.1. Western blot confirmed that this construct expresses a short GATA5 isoform (containing aa 226–404) during transient transfection in NIH3T3 cells (Fig. 4A). Moreover, immunofluorescent staining confirmed that both full-length (Fig. 4B) and short (Fig. 4C) GATA5 were located in the nucleus when expressed by transient transfection in NIH3T3 cells. To assess whether short GATA5 retains transcription-promoting activity, we cotransfected expression plasmids encoding short GATA5, full-length GATA5, or GATA4 with a reporter plasmid in which firefly luciferase expression is driven by the GATA-sensitive ANF promoter. As shown in Fig. 5, short GATA5 significantly transactivates the GATA-sensitive ANF promoter, although to a lesser extent than does full-length GATA5 or GATA4.

Fig. 4.

A: expression of full-length (left lane of each blot) or short GATA5 (right lane of each blot) in NIH3T3 cells transfected with pcDNA-GATA5-f or pcDNA-GATA5-s, respectively. AB4133 antibody (recognizes amino acids 235–247 in the middle portion of full-length GATA5) detects both species (∼45 and 27 kDa); Y-19 antibody (recognizes NH2 terminus) shows that short GATA5 lacks NH2-terminal residues. B and C: immunofluorescence staining of NIH3T3 cells transfected with pcDNA-GATA5-f, which encodes full-length GATA5 (B), or pcDNA-GATA5-s, which encodes short GATA5 (C). Both GATA5 isoforms exhibit nuclear localization. The primary anti-GATA5 antibody used is directed against aa 235–247 of full-length GATA5. GATA5 staining is shown in red and Hoechst nuclear staining is shown in blue, so that nuclear GATA5 appears magenta. F-actin staining (phalloidin) is shown in green.

Fig. 5.

In vitro transactivation experiments indicate that cotransfection with expression plasmid encoding short GATA5 (pcDNA-GATA5-s) transactivates the GATA-sensitive atrial natriuretic factor (ANF) promoter (P < 0.0001), though full-length GATA5 (pcDNA-GATA5-f) and GATA4 (pcDNA-GATA4) are more effective. Means ± SE of 4 experiments with triplicate wells are shown.


In this study, we identified an alternative promoter region of the mouse GATA5 gene located at bp +890 to +2312 that drives the transcription of a 2.5-kb GATA5 mRNA with a novel transcription start site at bp +2121 within intron 1. It is a common feature for GATA family members to possess two promoters and two initiation codons. In the mouse, a distal promoter of the GATA1 gene drives testis-specific expression, whereas a more proximal promoter directs transcription in hematopoietic cells (24, 30). Similarly, mouse GATA2 expression is regulated by two distinct promoters. One, whose structure is homologous to that of the Xenopus and human GATA2 gene promoters, directs mouse GATA2 transcription in all cells, whereas the other promoter regulates the expression of GATA2 specifically in hematopoietic cells (4, 6, 18). Human and mouse GATA3 genes also are controlled by two promoters that may direct lineage- and tissue-specific expression. The human and mouse GATA6 genes possess two alternative promoters and two initiation codons, although both transcripts are expressed in essentially the same tissue-specific and developmental stage-specific pattern (3). Two promoter regions regulate transcription of chicken GATA5 mRNAs that encode two distinct GATA5 proteins (15). It is therefore not surprising that distinct promoters mediate the transcriptional regulation of the GATA5 gene in mouse. The use of alternative promoters and transcriptional start sites can create diversity and flexibility in the regulation of gene expression (2).

Our finding of a second GATA5 promoter in intron 1 (Figs. 2 and 3) that drives expression of a short but still functional GATA5 variant, in a tissue distribution that parallels expression of mRNA encoding full-length GATA5 (Fig. 1D), makes plausible our speculation that previously reported GATA5 exon 1-deleted mice (20) might have retained a transcriptionally active short form of GATA5 protein. The predicted 181 aa GATA5 protein would retain only one zinc finger and the COOH-terminal activation domain and would have had a structure quite similar to the alternative isoform of chicken GATA5, in which splicing from an upstream alternative exon 1 excludes the coding exon 1 and the consequent mRNA encodes a partially functional single zinc finger isoform of chicken GATA5 (15). GATA5 exon 1-deleted mice are not presently available to evaluate the expression and function of short GATA5 in the absence of full-length GATA5. However, our studies indicate that an immunoreactive short isoform can be synthesized in vitro and localizes to the nucleus, similar to full-length GATA5 (Fig. 4). Even with a single zinc finger, short GATA5 still transactivates the GATA-sensitive ANF promoter although to a lesser extent than full-length GATA5 or GATA4 (Fig. 5). Furthermore, short GATA5 is expressed endogenously in proximal colon (Fig. 1E) along with full-length GATA5. The physiological role of endogenously expressed short GATA5 has yet to be determined.

Using 3′ RACE, we also identified the structure of a 2.1-kb GATA5 transcript that includes exons 1–5 and the 5′ half of exon 6, but lacks the 3′ terminal 1143 bp of exon 6; this transcript is likely the same as the apparently 1.8-kb transcript evident in previously published Northern blots (5, 21). The tissue distribution of the 2.1-kb transcript parallels that of full-length GATA5 mRNA; both are highly expressed in the stomach and intestine and weakly expressed in lung, liver, and heart (Fig. 1A). This 3′ truncated mRNA should encode the same GATA5 protein as the full-length GATA5 transcript. However, 3′-UTR sequences often harbor protein binding sites that stabilize the transcript and control polyadenylation and nuclear export (8) and may contain microRNA binding sites that regulate protein translation. We do not yet know the function of the GATA5 3′-UTR or whether there is any functional difference between these two transcripts. Nevertheless, the finding of this new GATA5 isoform provides a critically important guide to interpreting data concerning GATA5 expression. For example, the GeneChip Mouse Genome 430 2.0 Array (Affymetrix, Santa Clara, CA) is a powerful tool that is widely used for profiling gene expression in mouse tissues; its probe target sequences are selected from current GenBank, dbEST, and RefSeq. Probe set 96500_at (GC content 53%), which targets GATA5 gene sequence from bp +8962 to +9345, was used to detect GATA5 transcripts in the Affymetrix MG-U74AV2 chip. In the Mouse Genome 430 2.0 Array, probe set 1450125_at (GC content 49.8%) replaced probe set 96500_at, and probe set 1450126_at (GC content 47.3%) was added as a second probe set for GATA5 mRNA. The target sequences for probe sets 1450125_at and 1450126_at are derived from GATA5 cDNA bp 2668–2985 and 1507–2085, respectively. Thus probe set 1450126_at detects all three isoforms of GATA5 transcripts, whereas probe set 1450125_at detects only full-length GATA5 transcripts and the 2.5-kb GATA5 transcript transcribed from the intron 1 promoter; it is insensitive to the 2.1-kb transcript that lacks the 3′ terminal 1143 bp of full-length GATA5 exon 6. This suggests that the GATA5 signal detected by probe set 1450126_at (which detects all GATA5 mRNAs) should be higher than or at least equal to that detected by probe set 1450125_at (which excludes those that lack the full 3′-UTR). Indeed, we confirmed this anticipated result in gene expression data sets from five recent studies (11, 17, 31), as shown in Table 1. Thus comparison of relative signal intensities from these two probe sets may yield insight into the differential abundances of GATA5 transcripts that contain or lack the terminal portion of the 3′-UTR.
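
The probe-set reasoning above can be illustrated with a short interval check. This is only a sketch under stated assumptions, not the procedure used in the study: the probe target coordinates and the 1143-bp truncation come from the text, whereas the full-length cDNA length (taken as 3,300 bp from the nominal 3.3-kb size) and the position at which exon 2 begins in the cDNA (estimated from the nominal 2.5-kb size minus the 82 intronic bp) are rough assumptions, and the names `Span`, `TRANSCRIPTS`, and `detected_isoforms` are hypothetical. A probe set is treated as detecting a transcript only if its whole target region lies within that transcript.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """1-based inclusive interval in full-length GATA5 cDNA coordinates."""
    start: int
    end: int


# ASSUMPTION: nominal 3.3-kb full-length cDNA taken as exactly 3,300 bp.
FULL_LENGTH_BP = 3300
# From the text: the 2.1-kb transcript lacks the terminal 1143 bp of exon 6.
UTR_TRUNCATION_BP = 1143
# ASSUMPTION: exon 2 start estimated from nominal sizes (2.5 kb minus 82 intronic bp).
EXON2_START_BP = FULL_LENGTH_BP - (2500 - 82) + 1

# Portion of the full-length cDNA present in each transcript.
# The 82 intron-derived bp of the 2.5-kb transcript are ignored here.
TRANSCRIPTS = {
    "full-length": Span(1, FULL_LENGTH_BP),
    "2.5-kb": Span(EXON2_START_BP, FULL_LENGTH_BP),
    "2.1-kb": Span(1, FULL_LENGTH_BP - UTR_TRUNCATION_BP),
}

# Probe-set target regions as given in the text (cDNA bp).
PROBE_TARGETS = {
    "1450125_at": Span(2668, 2985),
    "1450126_at": Span(1507, 2085),
}


def detected_isoforms(probe_set: str) -> list[str]:
    """Return the transcripts that fully contain the probe set's target region."""
    target = PROBE_TARGETS[probe_set]
    return sorted(
        name
        for name, span in TRANSCRIPTS.items()
        if span.start <= target.start and target.end <= span.end
    )


for probe_set in PROBE_TARGETS:
    print(probe_set, detected_isoforms(probe_set))
```

With these assumed coordinates, the target of 1450126_at falls inside all three transcripts, while the target of 1450125_at lies beyond the 3′ end of the 2.1-kb transcript, which is why its signal is expected to be no higher than that of 1450126_at.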

Our findings also extend current understanding of the tissue distribution of GATA5 expression. Morrisey et al. (21) described the temporal and spatial pattern of GATA5 gene expression during mammalian development using in situ hybridization to a GATA5 5′-UTR cDNA probe (bp 253–707). Since the 5′-UTR cDNA probe cannot detect the alternative 2.5-kb GATA5 mRNA transcribed from the intron 1 promoter, their in situ hybridization results may not have revealed additional tissues that express short, but not full-length, GATA5 proteins. In addition, in contrast to the report that GATA5 is not expressed in the heart during late fetal and postnatal development, using the more sensitive RT-PCR method we discovered that novel and full-length GATA5 transcripts were expressed in adult heart, albeit at low levels.

In summary, we have identified an alternative promoter region and two alternative isoforms of GATA5 transcripts in mouse. These findings are critical for elucidating the temporal and spatial expression patterns of GATA5 isoforms and their functions.


This work was supported by NIH grants AI-056352, HL-056399, HL-079398, and HL-007605 and by a grant from the Cancer Research Foundation.

