|
Figure 1. Domains of insulin/IGF-related peptides. Human insulin and IGF and Drosophila dilp7 are aligned, and the different domains that are recognized in the precursors of these peptides are indicated. In insulin the domain borders are the convertase cleavage sites that are highlighted in red. The A- and B-domains of insulin correspond to the A- and B-chains of insulin and the C-domain to the connecting peptide. Although IGF consists of a single protein chain, its strong sequence similarity to insulin means that its A- and B-domains correspond to the homologous regions of those domains in insulin, while the C-domain is the sequence between the A- and B-domains. In insulin there is only a single amino acid residue after the last cysteine residue, but in IGF there is a longer sequence that has been called the D-domain. The IGF precursor is cleaved by furin in the Golgi apparatus and the sequence that is removed has been called the E-domain. Dilp7 is known only from nucleotide sequences, and it is unknown exactly how the precursor is processed. Nevertheless, the presence of putative convertase cleavage sites, highlighted in red, suggests the presence of A-, B- and C-domains quite similar to those in insulin. However, unlike insulin or IGF, the putative B-chain of dilp7 has a long N-terminal extension that I propose to call the F-domain. The latter is well conserved in dilp7 orthologs from other bilaterians (Fig. 4).
|
|
Figure 2. Sequences of selected ambulacrarian IGFs. Partial IGF sequences from selected ambulacrarians are illustrated to show their sequence similarity. The A-, B- and C-domains of the insulin core are aligned, but not the putative D- and E-domains, as their amino acid sequence is only conserved in closely related species (Fig. S1). Not aligning the D- and E-domains allows visualization of the context of putative convertase cleavage sites. None of the arginine or lysine residues conform to a typical arthropod or vertebrate convertase cleavage site. Although the sequence of the latter part of the IGF precursors is not well conserved, all of them are rich in positively charged amino acid residues. Conserved cysteine residues are indicated in red, conserved amino acid residues are highlighted in black and conserved substitutions in grey. The arginine and lysine residues in the D- and E-domains are highlighted in blue.
|
|
Figure 3. Sequences of selected echinoderm GSS. Sequence alignment of a few echinoderm GSS showing relatively conserved A- and B-domains of the insulin core sequence and likely KR convertase cleavage sites that can be expected to be cleaved by neuroendocrine convertase, as well as a few potential furin sites. Conserved cysteine residues are indicated in red, conserved amino acid residues are highlighted in black and conserved substitutions in grey. The arginine and lysine residues that likely (or, in the case of Apostichopus GSS-2, possibly) form part of a convertase site are highlighted in blue. For the alignment of a larger number of echinoderm GSS sequences see Fig. S3.
|
|
Figure 4. Sequences of selected dilp7 orthologs. Sequences of Drosophila dilp7 and several ambulacrarian orthologs illustrating well conserved sequences, not only in the typical insulin core of the peptides, but also in the F-domain (underlined in blue). Note that the sequence conservation of these peptides is stronger than in the IGFs or GSSs (Figs. 2 and 3). Conserved cysteine residues are indicated in red, conserved amino acid residues are highlighted in black and conserved substitutions in grey. Likely convertase cleavage sites have been highlighted in blue. Sequences are from Spreadsheet S1 and Veenstra (2020b); a comparison of a larger number of sequences is presented in Figs. S5 and S6.
|
|
Figure 5. Sequences of selected ambulacrarian octinsulins. Sequence alignment of a number of octinsulin sequences shows that these sequences all have typical neuroendocrine convertase KR cleavage sites, suggesting that these precursors are processed by enteroendocrine and/or neuroendocrine cells. Conserved cysteine residues are indicated in red, conserved amino acid residues are highlighted in black and conserved substitutions in grey. Likely convertase cleavage sites have been highlighted in blue. Sequences are from Spreadsheet S1; a comparison of a larger number of sequences is presented in Figs. S7 and S8.
|
|
Figure 6. Sequence comparison of selected ambulacrarian multinsulins and dilp7 orthologs. Three different sets of sequences are compared. The top five sequences are dilp7 orthologs, the next five are multinsulins having three disulfide bridges and the last five are multinsulins having four disulfide bridges. Note that although the multinsulins and the dilp7 orthologs share some sequence similarity, this does not include the F-domain. Like the octinsulins, these sequences all have typical neuroendocrine convertase KR cleavage sites, suggesting that they are processed by enteroendocrine and/or neuroendocrine cells. Conserved cysteine residues are indicated in red, conserved amino acid residues are highlighted in black and conserved substitutions in grey. Likely convertase cleavage sites have been highlighted in blue. Sequences are from Spreadsheet S1; a comparison of a larger number of sequences is presented in Figs. S9 and S10.
|
|
Figure 7. Position of introns in ambulacrarian irp genes. Schematic representation of the location of the cysteine residues, indicated as purple rectangles, and introns, represented by green Ts, in the coding sequences of the various types of ambulacrarian insulin-like genes. Numbers indicate the phase of each intron. All genes share the typical phase 1 intron present in insulin-like genes, whereas dilp7 and multinsulin genes also share a phase 2 intron. Signal peptides are indicated as interrupted bars.
|
|
Figure 8. Synteny of ambulacrarian irp genes. Schematic representation of the relative localization of different irp genes in several arthropod and ambulacrarian genomes. Arrowheads indicate the transcription direction of the various genes; the numbers below the line indicate the number of nucleotides between the coding regions of adjacent genes in kilobase pairs. Note that the relative organization in the two insects, the cockroach Blattella germanica and the stick insect Timema cristinae, is the same as in the hemichordate Saccoglossus kowalevskii and remarkably similar to that of the sea urchin Strongylocentrotus purpuratus and the sea cucumber Holothuria scabra. In the spider Pardosa pseudoannulata and the sea cucumber Apostichopus japonicus some of the genes are also next to one another. However, in the sea stars Acanthaster planci and Pisaster ochraceus synteny has been lost. Arthropod data are from Veenstra (2020b).
|
|
Figure 9. Radial sequence similarity tree of ambulacrarian irps. Note that the GSSs are similar to IGFs and seem to be related to them, while the multinsulins are most similar to the dilp7 orthologs. Echinoderm branches are in black, hemichordate branches in red. More extensive sequence comparisons and sequence trees are in the supplementary data (Figs. S1–S10). All sequences are from Spreadsheet S1.
|
|
Figure 10. Phylogenetic tree of LGRs. Phylogenetic tree constructed from the transmembrane regions of ambulacrarian LGRs that are putative receptors for irps. A few human and insect sequences have been added for comparison. The inset at the top shows the same data to which the glycoprotein LGRs have been added and where characteristic ligands for each branch have been identified. Numbers in blue indicate the apparent probabilities as determined by FastTree. For details of the glycoprotein LGRs see Fig. S11.
|
|
Figure 11. Ectodomains of ambulacrarian LGRs. Schematic representation of the various domains of the putative receptors for ambulacrarian insulin-related peptides. Each green circle symbolizes an LDLa repeat and each purple rectangle an LRR repeat, while the yellow oval indicates the seven transmembrane regions. The top representation corresponds to the gonadulin and dilp7 receptors (Figs. S11, S12). Note, though, that the latter are somewhat variable; notably, in two sea star species of the genus Patiria and in Acanthaster planci those receptors have two LDLa repeats (for details see Fig. S12). The bottom representation corresponds to the GRL101 receptors (Fig. S13).
|
|
Figure 12. How echinoderm irps may have evolved. A represents an early metazoan in which an arch irp is a ligand for both an LGR and an RTK. B represents an early protostome or deuterostome that has three irps (an IGF, a dilp7 ortholog and a gonadulin/octinsulin ortholog) that evolved by local gene duplication from the arch irp. Each of these three ligands has its own LGR and at least two of them, IGF and the dilp7 ortholog, can also activate the RTK. C represents the Asterozoa, where the dilp7 gene was duplicated and yielded several multinsulin genes, which are represented here as one. The Asterozoa also have one or two GSSs that evolved earlier during echinoderm evolution. Both multinsulins and GSSs act exclusively through the RTK. Closed arrows indicate gene duplication events and interrupted arrows show ligand–receptor interactions. The question mark conveys uncertainty with regard to whether or not the gonadulin/octinsulin peptides are able to activate the RTK.
|