Bates L et al. (2024), A Novel Method to Profile Transcripts Encoding ...

ECB-ART-53404

Cells 2024 Nov 18;13(22):1898. doi: 10.3390/cells13221898.

A Novel Method to Profile Transcripts Encoding SH2 Domains in the Patiria miniata Mature Egg Transcriptome.

Bates L , Wiseman E , Whetzel A , Carroll DJ .

Abstract
The critical mechanism to restart zygote metabolism and prevent polyspermy during fertilization is the intracellular Ca2+ increase. The full set of signaling molecules leading to the Ca2+ rise is not known in any species. In the sea star Patiria miniata, SFK1, SFK3, and PLCγ participate in this fertilization Ca2+ increase. These proteins share common regulatory features, including signaling via tyrosine phosphorylation and their SH2 domains. In this study, we explore two different bioinformatic strategies to identify transcripts in the Patiria miniata mature egg transcriptome (Accession PRJNA398668) that code for proteins possessing an SH2 domain. The first identified the longest open reading frame for each transcript and then utilized similarity searching tools to provide identities for each transcript. The second, novel, method involved a six-frame translation of the entire transcriptome to identify SH2 domain-containing proteins. The identified transcripts were aligned against the NCBI non-redundant database and the SwissProt database. Eighty-two transcripts that encoded SH2 domains were identified. Of these, 33 were only found using the novel method. This work furthers research into egg activation by providing possible target proteins for future experiments and a novel method for identifying specific proteins of interest within a de novo transcriptome.
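
The difference between the two strategies can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the standard genetic code is assumed, and a plain substring search stands in for the conserved-domain search used in the study. It only shows why translating all six frames can surface a region that a longest-ORF-only approach would skip.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table is appropriate for the transcripts of interest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def longest_orf(frames):
    """Return (frame, peptide) for the longest Met-initiated stretch.

    Each frame is split at stop codons; within a segment the ORF is taken
    from the first methionine to the end of the segment.
    """
    best = (None, "")
    for frame, protein in frames.items():
        for segment in protein.split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (frame, segment[start:])
    return best


def frames_containing(frames, motif):
    """Hypothetical stand-in for a domain search: frames containing a motif."""
    return [frame for frame, protein in frames.items() if motif in protein]


frames = six_frame_translation("ATGGCCTAA")
print(longest_orf(frames))              # longest-ORF view of the transcript
print(frames_containing(frames, "GH"))  # six-frame view finds other frames
```

In this toy sequence the longest ORF lies in frame +1, while the motif is found only in frame -1, which mirrors the case where a domain sits outside the longest ORF.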

PubMed ID: 39594646
PMC ID: PMC11593052
Journal: Cells

	Figure 1. Two annotation methods were used to search the P. miniata mature egg transcriptome for SH2 domain-containing transcripts. (A) Workflow showing key differences between the methods. The classic method matches transcripts against known proteins with SH2 domains. The novel method is an unbiased method that finds all transcripts that might encode a protein that has an SH2 domain. (B) Both methods identified 49 transcripts that might encode a protein with an SH2 domain, while the novel method identified 33 transcripts not found using the classic method. The classic method did not identify any unique transcripts.
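
The overlap counts in Figure 1B amount to a set comparison between the hits of the two methods. The sketch below reproduces that arithmetic in Python; the transcript identifiers are placeholders generated only to match the reported counts, and `compare_methods` is a hypothetical helper, not part of the published workflow.

```python
def compare_methods(classic_hits, novel_hits):
    """Partition transcript IDs by which method(s) reported them."""
    classic_hits, novel_hits = set(classic_hits), set(novel_hits)
    return {
        "both": classic_hits & novel_hits,
        "novel_only": novel_hits - classic_hits,
        "classic_only": classic_hits - novel_hits,
    }


# Placeholder IDs sized to the counts reported in Figure 1B (not real accessions).
classic = {f"tx{i}" for i in range(49)}
novel = {f"tx{i}" for i in range(82)}

counts = {name: len(ids) for name, ids in compare_methods(classic, novel).items()}
print(counts)
```

With these placeholder sets the partition gives 49 shared transcripts, 33 found only by the novel method, and none unique to the classic method, for a total of 82.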
	Figure 2. Pathway to the identification of the 52 transcripts that encode a protein with an SH2 domain contained within an open reading frame. The total number of transcripts for each step of the SH2 domain annotation process is shown above. After matching all possible transcripts (580,338) to the conserved domain database, 82 RNAs were found to encode at least one SH2 domain. Some of these were on duplicate transcripts (i.e., coded for the same amino acids), some were on truncated open reading frames (ORFs), and some were not located within an ORF, leaving 52 unique transcripts that could produce a protein containing an SH2 domain. Four of these were found to be located on an ORF that was not the largest encoded by that specific transcript. * Some of these transcripts belong to more than one category, which explains the number discrepancy (i.e., 82 − 33 ≠ 52).
	Figure 3. Category 1 transcript, SFK1, represents a transcript that meets all the criteria for a match to a homologous protein. The Patiria miniata SFK1 protein sequence (AAS01045.1) was used to identify the homologous transcript in our P. miniata mature egg transcriptome using NCBI’s tBLASTn software as a test of the method. The homologous transcript to AAS01045.1 was GGEY02029025.1. The top match from Homo sapiens in the SwissProt database was P07947, which is tyrosine-protein kinase Yes. The overall identity between the protein encoded by sea star mature egg transcriptome record GGEY02029025 and AAS01045 was 99%, and with P07947 it was 59%. The domain arrangement is the same between the sea star and human proteins, with a 62% identity between the SH3 and SH3 domains and a 73% identity between the Src_like protein tyrosine kinase domains.
	Figure 4. Both the classic and novel methods allow for identification of truncated transcripts. PLCγ represents a transcript that has an assembly error and shows that the expected protein is split between two transcripts. The previously identified Patiria miniata PLCγ protein sequence (AAR85355.1) was used to identify a homologous transcript in the P. miniata mature egg transcriptome using NCBI’s tBLASTn software. (A) The transcripts identified as homologous to AAR85355.1 were contained within separate transcript records: GGEY02080031.1 and GGEY02080032.1. The longest ORFs of GGEY02080031.1 and GGEY02080032.1 were submitted to the NCBI CDD. GGEY02080031.1 contains all the domains of AAR85355.1 except the PI-PLC and C2 domains, which are on GGEY02080032.1. Together, these assemble into a complete PLCγ sequence. Twelve amino acids overlap between the two partial polypeptides: HERKMRIAKEFS.
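
Joining two partial polypeptides through a shared stretch, as described for PLCγ in Figure 4, can be sketched as a suffix/prefix overlap merge. This is an illustrative assumption about how such a join could be done, not the procedure reported by the authors. Only the 12-residue overlap HERKMRIAKEFS comes from the caption; the flanking residues below are invented placeholders, and `merge_by_overlap` is a hypothetical helper.

```python
def merge_by_overlap(left, right, min_overlap=5):
    """Join two peptides where a suffix of `left` equals a prefix of `right`.

    Returns (merged_sequence, overlap_length), or (None, 0) when no overlap
    of at least `min_overlap` residues exists. The longest overlap wins.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:], size
    return None, 0


OVERLAP = "HERKMRIAKEFS"           # overlap reported in the Figure 4 caption
n_terminal_part = "AAAA" + OVERLAP  # placeholder flank, not real sequence
c_terminal_part = OVERLAP + "GGGG"  # placeholder flank, not real sequence

merged, size = merge_by_overlap(n_terminal_part, c_terminal_part)
print(merged, size)
```

An exact-match join like this is only a starting point; real fragments may differ by sequencing or assembly errors within the overlap, in which case an alignment-based approach would be more appropriate.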
	Figure 5. The novel method allows for identification of transcripts with assembly errors. The transcript GGEY02003111 was identified to code for an SH2 domain through the CDD search, and two proteins were identified by BLAST: extensin-like isoform X2 (XP_038070565) and B-cell linker protein-like (BLNK; XP_038070561) as possible matches. These two proteins are identical except for a missing 38 amino acid segment in XP_038070565. Shown here is BLNK aligned with the two matching frames from the transcript. The frame of GGEY02003111.1 that coded for a PHA02682 domain and an SH2 domain (frame -1) matched the identified protein at an identity level of 100%. A longer ORF on the same transcript but in frame -3 encoded a SAM domain that matched with the identified protein at an identity level of 92%. This would have been missed using the classic method.
	Figure 6. An example of category 3 transcripts, SH2/WW, represents a transcript that did not match to a known sea star protein. The ORF of the SH2/WW transcript (GGEY02085024.1) was matched against the nr and SwissProt protein databases to identify orthologous proteins using BLASTp. The best match in the nr database at 99.6% identical was to an uncharacterized Patiria miniata protein (XP_038061760). No characterized (known) proteins that contained the same domains (SH2 and WW) were identified using BLASTp. Identification of a protein with conserved domain architecture using CDART revealed a GRB2-related adapter protein 2-like from the lancelet Branchiostoma belcheri, which was 31% identical to the sea star protein at the amino acid level, with a domain architecture that matched our unidentified transcript and the uncharacterized sea star protein.
	Figure 7. Agarose gel (0.8%) displaying RT-PCR products of the selected transcripts. At least one transcript from each of the three categories was selected for RT-PCR to confirm the transcript assembly. Lanes 1 and 6 are the molecular weight ladders (1 kb DNA Ladder, New England BioLabs, Inc.). The remaining lanes display the RT-PCR products of SFK3 (lane 2, 1759 base pairs), Extensin/BLNK-like (lane 3, 832 base pairs), SH2/WW (lane 4, 1001 base pairs), SFK1 (lane 5, 1619 base pairs), and PLCγ (lane 7, 3754 base pairs).