Allosteric coupling between protein domains is fundamental to many cellular processes. For example, Hsp70 molecular chaperones use ATP binding by their actin‐like N‐terminal ATPase domain to control substrate interactions in their C‐terminal substrate‐binding domain, a reaction that is critical for protein folding in cells. Here, we generalize the statistical coupling analysis to simultaneously evaluate co‐evolution between protein residues and functional divergence between sequences in protein sub‐families. Applying this method in the Hsp70/110 protein family, we identify a sparse but structurally contiguous group of co‐evolving residues called a ‘sector’, which is an attribute of the allosteric Hsp70 sub‐family that links the functional sites of the two domains across a specific interdomain interface. Mutagenesis of Escherichia coli DnaK supports the conclusion that this interdomain sector underlies the allosteric coupling in this protein family. The identification of the Hsp70 sector provides a basis for further experiments to understand the mechanism of allostery and introduces the idea that cooperativity between interacting proteins or protein domains can be mediated by shared sectors.
Allostery is a biologically critical property by which distantly positioned functional surfaces on proteins functionally interact. This property remains difficult to elucidate at a mechanistic level (Smock and Gierasch, 2009) because long‐range coupling within proteins arises from the cooperative action of groups of amino acids. As a case study, consider the Hsp70 molecular chaperones, a large and diverse family of two‐domain allosteric proteins required for cellular viability in nearly every organism (Figure 1) (Mayer and Bukau, 2005). In the ADP‐bound state, the two domains act independently, the C‐terminal substrate‐binding domain displays a stable configuration in which the so‐called ‘lid’ region is docked against the β‐sandwich subdomain, and substrates bind with relatively high affinity (Figure 1A) (Moro et al, 2003; Swain et al, 2007; Bertelsen et al, 2009). Exchange of ADP for ATP in the N‐terminal nucleotide‐binding domain causes significant local and propagated conformational change, formation of an interface with the substrate‐binding domain, opening of the lid subdomain, and a decrease in the binding affinity for substrates (Figure 1B) (Rist et al, 2006; Swain et al, 2007). Upon ATP hydrolysis by the nucleotide‐binding domain, Hsp70 is returned to the ADP‐bound configuration suitable for another round of substrate binding and release. This process of cyclical substrate binding and release underlies all biological functions of Hsp70 proteins.
What is the structural basis for the long‐range functional coupling within Hsp70? When allostery is a conserved property of a protein family, one approach to this problem is to analyze the correlated evolution of amino acids in the family—the expected statistical signature of cooperative action of protein residues (Lockless and Ranganathan, 1999; Kass and Horovitz, 2002; Suel et al, 2003). Previous work using an implementation of this concept (the statistical coupling analysis or SCA) showed that proteins contain sparse networks of co‐evolving amino acids termed ‘sectors’ that link protein active sites with distinct functional surfaces through the protein core (Halabi et al, 2009). This architecture is consistent with known allosteric mechanisms in protein domains (Suel et al, 2003; Halabi et al, 2009).
However, the principle of co‐evolution of protein residues need not be limited to the study of individual protein domains. Indeed, conserved allosteric coupling between two (or more) non‐homologous domains implies the existence of shared sectors that span functional sites on different domains. Here, we test this concept by extending the SCA method to consider the allosteric mechanism acting between the two domains of the Hsp70 proteins. Hsp70‐like proteins include not only the allosteric Hsp70s, but also the Hsp110s—homologs that contain both domains and are regarded as structural models for Hsp70s, but that do not exhibit allosteric coupling. In this study, we take advantage of the functional divergence between the Hsp70s and Hsp110s to reveal patterns of co‐evolution between amino acids that are specifically associated with the allosteric mechanism.
To identify the allosteric sector in Hsp70, we used SCA to compute a weighted correlation matrix that describes the co‐evolution of every pair of amino‐acid positions in a sequence alignment of 926 members of the Hsp70/110 family. We then applied a mathematical method known as singular value decomposition to simultaneously evaluate the pattern of divergence between sequences and the pattern of co‐evolution between amino‐acid positions. The basic idea is that if the pattern of sequence divergence is able to classify members of a protein family into distinct functional subgroups, then we can rigorously identify the group of co‐evolving residues that correspond to the underlying mechanism. Figure 2A shows the principal axis of sequence variation in the Hsp70/110 family, showing a clear separation of the allosteric (Hsp70) and non‐allosteric (Hsp110) members of this family. The corresponding axis of co‐evolution between amino‐acid positions reveals a subset of Hsp70/110 positions (∼20%, 115 residues out of 605 total) that underlie the divergence of Hsp70 and Hsp110 proteins (Figure 2B). These positions derive roughly equally from the nucleotide‐binding domain (in blue, 56 positions) and the substrate‐binding domain (in green, 59 positions) and are more conserved within the Hsp70 sub‐family. These results define a protein sector that is predicted to underlie the allosteric mechanism of Hsp70.
What is the structural arrangement of the putative allosteric sector within the Hsp70 protein? Consistent with a function in allosteric coupling, the 115 sector residues form a physically contiguous network of atoms, linking the ATP‐binding site on the nucleotide‐binding domain to the substrate recognition site on the substrate‐binding domain through the interdomain interface (Figure 2C). The physical connectivity is remarkable given that only ∼20% of overall Hsp70 residues are involved (Figure 2B). Thus, functionally coupled but non‐homologous protein domains can share a single sector of co‐evolving residues that connects their respective functional sites.
We compared the Hsp70 sector mapping with the large body of biochemical studies that have been carried out in this family. We find strong experimental support for the involvement of sector positions in the Hsp70 allosteric mechanism in several regions: (1) within the ATP‐binding site, (2) at the interface linking the two domains, and (3) within the β‐sandwich core of the substrate‐binding domain. The sector analysis also makes predictions about the involvement of some previously untested residues; we show that mutations at two such sites in fact reduce the allosteric coupling within Hsp70 in vitro and fail to complement a DnaK knockout strain of E. coli in a stress‐response assay. Taken together, we conclude that sector positions are associated with the allosteric mechanism of Hsp70.
This work also adds a new finding with regard to the concept of protein sectors. Previous work showed that multiple quasi‐independent sectors, each of which contributes a different aspect of function, are possible within a single protein domain (Halabi et al, 2009). This work shows that a single sector can also span two different protein domains when biological function (here, nucleotide‐dependent substrate binding) arises from their coupled action. This result emphasizes the point that sectors are units of functional selection and are not obviously related to traditional hierarchies of structural organization in proteins. An interesting possibility is that allostery between proteins might evolve through the joining of protein sectors, a conjecture that can be tested in future work.
The Hsp70 family of molecular chaperones provides a well defined and experimentally powerful model system for understanding allosteric coupling between different protein domains.
New extensions to the statistical coupling analysis (SCA) method permit identification of a group of co‐evolving amino‐acid positions (a sector) in Hsp70 that is associated with allosteric function.
Literature‐based and new experimental studies support the notion that the protein sector identified through SCA underlies the allosteric mechanism of Hsp70.
This work extends the concept of protein sectors by showing that two non‐homologous protein domains can share a single sector when the underlying biological function is defined by the coupled activity of the two domains.
Allosteric coupling, the process by which spatially distant sites on proteins functionally interact, is a defining biological property of many proteins, but the underlying structural basis remains difficult to understand (Smock and Gierasch, 2009). The central problem is the difficulty of detecting the pattern of cooperative functional interactions between amino‐acid residues in protein structures. One approach to this problem is to analyze the correlated evolution of amino acids in a protein family—the expected statistical signature of conserved cooperative actions of amino acids (e.g. Lockless and Ranganathan, 1999; Kass and Horovitz, 2002; Liu et al, 2008). Recently, an approach for the global analysis of correlated evolution in protein families has been introduced (Estabrook et al, 2005; Russ et al, 2005; Socolich et al, 2005; Lee et al, 2008), and the results imply a new and potentially general architecture of amino‐acid interactions within protein domains. The basic finding is that most residues evolve nearly independently, whereas a small fraction of residues is collectively coupled to form functional units called sectors (Halabi et al, 2009). A characteristic of sectors is structural connectivity; a contiguous system of sector residues within the protein core often connects distant surfaces in the three‐dimensional structure. Thus, at least within single protein domains, sectors provide a structural basis for explaining functional properties of proteins such as allosteric coupling.
However, the principle of co‐evolution of protein residues that underlies the sectors is not limited to the coupling of amino acids within a single domain. Indeed, allosteric coupling (or signal transmission) between two or more protein domains is a common finding in studies of cellular function. This suggests the existence of sectors—units of evolutionary selection—that are shared between different non‐homologous protein domains. For example, a sector spanning two domains could couple a functional site on one protein domain to a functional site on a second protein domain. Such sectors could explain conserved aspects of allosteric coupling.
The Hsp70 molecular chaperones, a large and diverse family of allosteric two‐domain proteins, present an excellent case study to test this concept. Hsp70 proteins interact with substrate proteins at a C‐terminal substrate‐binding domain, but both the affinity and kinetics of substrate binding are controlled by the activity of an N‐terminal nucleotide‐binding domain (Figure 1A). Specifically, exchange of ADP for ATP in the N‐terminal domain reduces the binding affinity for substrates at the C‐terminal domain and is accompanied by significant conformational change and interdomain docking (Mayer and Bukau, 2005; Rist et al, 2006; Swain et al, 2007; Bertelsen et al, 2009). The structure of the ATP‐bound state of the Escherichia coli Hsp70, DnaK, has not yet been solved, but we made a model by homology‐based methods and simulated‐annealing molecular dynamics using the crystal structure of ATP‐bound Hsp110 from yeast (Liu and Hendrickson, 2007) (Figure 1B; Supplementary Figure 8). This model illustrates the large conformational change in the substrate‐binding domain associated with ATP binding in the nucleotide‐binding domain, and indicates the expected interaction surface between the two Hsp70 domains. The allosteric cycle is completed when the intrinsic ATPase activity of the nucleotide‐binding domain reverses the conformational rearrangement, returning the Hsp70 to an ADP‐bound configuration suitable for another round of substrate binding and release.
The overall family of Hsp70‐like proteins comprises the Hsp70s and the Hsp110s, homologs that contain both domains and, as indicated above, are regarded as structural models for Hsp70s (Easton et al, 2000). However, despite their sequence similarity, the Hsp110 proteins have evolved to be non‐allosteric, such that the nucleotide‐binding domain remains stably bound to ATP and does not appear to regulate the substrate‐binding domain. Consistent with these findings, Hsp110s are incapable of folding substrate proteins on their own through cycles of nucleotide exchange and hydrolysis (Shaner and Morano, 2007).
Here, we present a new application of the statistical coupling analysis (SCA) to sequences of the Hsp70‐like family in which we take advantage of the functional divergence of Hsp70s and Hsp110s to reveal patterns of co‐evolution that can be associated with interdomain allostery (see Box 1). The identification of a group of co‐evolving residues that show structural contiguity between the two Hsp70 domains provides testable hypotheses about allosteric function in these molecular chaperones and introduces methods that may be applicable for more generally characterizing co‐evolution within and between protein domains.
Box 1 SCA overview
The aim of SCA is to examine the joint conservation of all pairs of amino‐acid positions in a protein family to identify sectors, that is, groups of sequence positions that mutually co‐evolve in a protein family. As previously described (Halabi et al, 2009), the basic process is to start with a large and diverse multiple sequence alignment (MSA) of a protein family comprising M sequences by L positions, and to compute an L × L weighted correlation matrix (the SCA matrix) that describes the co‐evolution of all pairs of sequence positions. When sequences are diverged to such an equal extent (i.e. homogeneously) that distinct sub‐families are not clearly evident, sector identification amounts to identifying positions that group in the top eigenmodes of the SCA correlation matrix. Here, we extend this approach to the case of ‘inhomogeneous’ sequence alignments in which functionally distinct sub‐families of sequences can be identified in the MSA. Such functional structure in alignments can facilitate sector identification because of mathematical methods that provide a direct mapping between patterns of sequence divergence and patterns of positional covariation.
Step 1: Definition of alignment and correlation matrices
In general, an MSA can be described as a three‐dimensional binary tensor $X_{sia}$ (M × L × 20) whose elements are 1 if sequence s contains amino acid a at position i and 0 if not (A). To use the mathematical methods below, we reduce the MSA to an M × L two‐dimensional binary matrix $X_{si}$ by including only the terms in $X_{sia}$ representing the most prevalent amino acid at each position, a process we term the ‘binary approximation’ of the MSA. We next compute a weighted, normalized alignment $\tilde{X}$ (M × L), in which $\phi_i$ is related to the conservation of position i in the MSA and is the weighting function used in the current implementation of SCA (B, and see SOM and Materials and methods), and $\langle X_{si} \rangle_s$ represents the average value of $X_{si}$ over all sequences. In effect, weighting by $\phi_i$ provides a measure of the significance of amino‐acid occurrences and correlation in the MSA. From the $\tilde{X}$ matrix representation of the MSA, two correlation matrices can then be computed: a $\phi_i$‐weighted version of a sequence correlation matrix (M × M), and a $\phi_i$‐weighted version of a positional correlation matrix (L × L; i.e. the SCA matrix).
The use of the binary approximation of the MSA is a necessary simplification for usage of the specific mathematical methods for sector analysis in this work (described below). Generalization of the methods to consider the full alignment will be a subject of future work. However, we note that for instances such as the Hsp70/110 family in which the function of interest (e.g. allostery) is a property of a major sub‐family, sector identification is robust to the binary approximation (Halabi et al, 2009).
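Step 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `weighted_binary_alignment`, the uniform background frequency `q`, the log‐odds form chosen for $\phi_i$, and the $1/\sqrt{M}$ normalization are all assumptions made for the example; the weighting actually used is specified in the Materials and methods.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def weighted_binary_alignment(seqs, q=0.05, eps=1e-3):
    """Binary approximation of an MSA followed by conservation weighting.

    seqs : list of equal-length aligned sequences (M sequences, L positions)
    q    : assumed background frequency of the prevalent amino acid
    eps  : clipping value that keeps the log-odds weight finite
    """
    msa = np.array([list(s) for s in seqs])          # M x L array of characters
    M, L = msa.shape
    X = np.zeros((M, L))
    for i in range(L):
        column = msa[:, i]
        counts = [np.sum(column == a) for a in AMINO_ACIDS]
        prevalent = AMINO_ACIDS[int(np.argmax(counts))]
        X[:, i] = column == prevalent                # 1 if prevalent residue, else 0
    f = X.mean(axis=0)                               # <X_si>_s for each position
    f_safe = np.clip(f, eps, 1 - eps)
    # Assumed weighting: absolute log-odds of observed vs background frequency.
    phi = np.abs(np.log(f_safe * (1 - q) / ((1 - f_safe) * q)))
    # Assumed normalization: centre each column, weight it, scale by 1/sqrt(M).
    X_tilde = phi * (X - f) / np.sqrt(M)
    return X, phi, X_tilde

seqs = ["ACD", "ACE", "AGE", "TCE"]                  # toy alignment, M = 4, L = 3
X, phi, X_tilde = weighted_binary_alignment(seqs)
C_pos = X_tilde.T @ X_tilde                          # L x L positional correlations
C_seq = X_tilde @ X_tilde.T                          # M x M sequence correlations
```

With this choice of normalization, `C_pos` equals the covariance of the binary columns multiplied by $\phi_i \phi_j$, and `C_seq` shares its non‐zero eigenvalues with `C_pos`, which is the property that Step 2 relies on.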
Step 2: Mapping modes of sequence covariation and positional covariation
To relate the divergence of sub‐families of sequences to the correlated evolution of groups of positions, we use the method of singular value decomposition. In this method, the M × L weighted alignment matrix $\tilde{X}$ can be written as a product of three matrices: $\tilde{X} = U \Sigma V^{\mathsf{T}}$, where U is an M × M matrix whose columns contain the eigenvectors of the sequence correlation matrix, and V is an L × L matrix whose columns contain the eigenvectors of the SCA positional correlation matrix. Σ is a diagonal M × L matrix of so‐called ‘singular values’ that are related to the eigenvalues of the two correlation matrices.
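The relationship between the singular value decomposition and the two correlation matrices follows from the orthogonality of U and V. The short derivation below writes the sequence and positional correlation matrices as $\tilde{X}\tilde{X}^{\mathsf{T}}$ and $\tilde{X}^{\mathsf{T}}\tilde{X}$; this is an assumption about normalization, and any constant prefactor rescales the eigenvalues without changing the eigenvectors.

```latex
\begin{align}
\tilde{X}\tilde{X}^{\mathsf{T}}
  &= U \Sigma V^{\mathsf{T}} V \Sigma^{\mathsf{T}} U^{\mathsf{T}}
   = U \left(\Sigma \Sigma^{\mathsf{T}}\right) U^{\mathsf{T}}, \\
\tilde{X}^{\mathsf{T}}\tilde{X}
  &= V \Sigma^{\mathsf{T}} U^{\mathsf{T}} U \Sigma V^{\mathsf{T}}
   = V \left(\Sigma^{\mathsf{T}} \Sigma\right) V^{\mathsf{T}}, \\
\tilde{X}\,|V_n\rangle &= \sigma_n\,|U_n\rangle, \qquad
\tilde{X}^{\mathsf{T}}\,|U_n\rangle = \sigma_n\,|V_n\rangle .
\end{align}
```

The first two lines show that the columns of U and V are eigenvectors of the two matrices, with eigenvalues given by the squared singular values $\sigma_n^2$. The third line is the mapping used in the text: the n‐th sequence mode and the n‐th positional mode are images of one another under $\tilde{X}$.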
The important concept is that if an eigenmode of the sequence correlation matrix (a column of U, ∣Un〉) reveals a separation of two classes of sequences, then the corresponding eigenmode of the positional correlation matrix (a column of V, ∣Vn〉) will reveal the positions that primarily contribute to this sequence divergence. However, in general, the eigenvectors of either correlation matrix need not represent statistically independent modes of sequence correlation or of positional correlation. For example, examination of the top three modes of the U matrix for the Hsp70/110 family (∣U1…3〉, C) reveals the existence of distinct sub‐families of sequences (in different colors), but fails to clearly separate these sub‐families along the orthogonal eigenvectors. To better represent the divergence of the sub‐families, we used a simple implementation of independent component analysis (ICA), a method specifically designed to transform the k top eigenmodes of a correlation matrix into k maximally independent components. Application of ICA to the significant top eigenmodes of the Hsp70/110 family (see SOM and Materials and methods) indeed shows the separation of the sequences in the MSA into a few major sub‐families that now largely separate along orthogonal independent components (∣U1…3S〉, D).
Application of the same ICA‐transformation computed for the U matrix to the k top eigenmodes of the positional correlation matrix (∣V1…k〉) then provides a corresponding transformation to define the positions responsible for the directions of sequence variation observed in panel D. Thus, we can identify correlated groups of sequence positions that are responsible for the divergence of groups of sequences.
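The idea of estimating one transformation on the sequence modes and reusing it on the positional modes can be sketched as follows. This is an illustration under stated assumptions rather than the ICA implementation used in this work (which is described in the SOM and Materials and methods): it substitutes a textbook symmetric FastICA update with a cubic nonlinearity, treats the scaled singular vectors as whitened data, and runs on a synthetic matrix. The names `fastica_rotation`, `U_ic` and `V_ic` are hypothetical.

```python
import numpy as np

def fastica_rotation(Z, n_iter=500, tol=1e-10, seed=0):
    """Estimate an orthogonal k x k matrix W for (approximately) whitened data Z (M x k)
    using symmetric FastICA with the cubic (kurtosis) nonlinearity."""
    M, k = Z.shape
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.standard_normal((k, k)))   # random orthogonal start
    for _ in range(n_iter):
        Y = Z @ W.T                                    # current component estimates
        W_new = (Y ** 3).T @ Z / M - 3.0 * W           # fixed-point update per row of W
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt                                 # symmetric re-orthogonalization
        # Converged when every row matches the previous iterate up to sign.
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W

# Synthetic stand-in for a weighted alignment: three groups of 20 sequences,
# each enriched at its own block of 10 positions, plus weak noise.
rng = np.random.default_rng(1)
X_tilde = 0.1 * rng.standard_normal((60, 30))
for g in range(3):
    X_tilde[20 * g:20 * (g + 1), 10 * g:10 * (g + 1)] += 1.0
X_tilde -= X_tilde.mean(axis=0)

U, s, Vt = np.linalg.svd(X_tilde, full_matrices=False)
k = 2                                                  # number of top modes kept
W = fastica_rotation(np.sqrt(X_tilde.shape[0]) * U[:, :k])
U_ic = U[:, :k] @ W.T                                  # rotated sequence modes
V_ic = Vt[:k].T @ W.T                                  # same rotation on positional modes
```

Because W is orthogonal, the rotated sequence modes remain orthonormal, and each column of `V_ic` is paired with the column of `U_ic` that carries the same index.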
In the main paper, Figure 2A shows the first independent component of the sequence correlation matrix for the family of Hsp70/110 proteins (∣U1S〉), which reveals a clear separation of the family into two groups: one that includes the allosteric Hsp70‐like proteins (white, orange, and cyan in panels C and D, and black in Figure 2A) and one that includes the non‐allosteric Hsp110‐like proteins (purple, panels C and D, and gray in Figure 2A). The corresponding first independent component of the positional correlation matrix (∣V1S〉) reveals the positions most responsible for this sequence divergence (Figure 2B), and identifies the allosteric sector.
A sector associated with Hsp70 allostery
To identify sectors in Hsp70, we used the SCA to compute a weighted correlation matrix that describes the co‐evolution of every pair of amino‐acid positions in the Hsp70/110 family. The essence of sector identification is to analyze the non‐random correlations in this matrix to find collectively evolving groups of residues. One approach to do this is spectral decomposition, in which sectors are defined by the pattern of residue contribution to the top few eigenmodes of the matrix (Halabi et al, 2009). Importantly, sector identification by this method proceeds without presupposing the function of sectors; such properties are then assigned through experimental study.
Analysis of the Hsp70/110 protein family suggests a more targeted strategy for analysis of the SCA matrix in which we take advantage of the functional divergence of allosteric mechanism between Hsp70 and Hsp110 proteins to guide sector identification. The basic idea is to simultaneously evaluate the pattern of divergence between sequences in a protein family and the pattern of co‐evolution between amino‐acid positions (Casari et al, 1995; Lichtarge et al, 1996). This can be performed in the framework of SCA using a mathematical method known as singular value decomposition (see Box 1). If the pattern of sequence divergence classifies members of a protein family according to distinctions in a functional mechanism (e.g. allostery), then we can identify the group of co‐evolving residues that correspond to this mechanism. We describe this approach here in the context of the Hsp70/110 family (see Box 1, Materials and methods, and SOM for additional details). Given a weighted binarized sequence alignment ($\tilde{X}$) comprising M sequences (rows) and L positions (columns), we can compute the following two correlation matrices:
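$$\tilde{X}^{\mathsf{T}}\tilde{X} \quad (L \times L) \qquad \text{and} \qquad \tilde{X}\tilde{X}^{\mathsf{T}} \quad (M \times M),$$
where $\tilde{X}^{\mathsf{T}}\tilde{X}$ is the SCA correlation matrix between positions and $\tilde{X}\tilde{X}^{\mathsf{T}}$ is a correlation matrix between sequences (normalization constants omitted). The singular value decomposition of $\tilde{X}$ is:
$$\tilde{X} = U \Sigma V^{\mathsf{T}},$$
in which columns of U are eigenvectors of $\tilde{X}\tilde{X}^{\mathsf{T}}$, columns of V are eigenvectors of $\tilde{X}^{\mathsf{T}}\tilde{X}$, and Σ is related to the eigenvalues of these matrices. Importantly, this decomposition allows a direct mapping between each principal axis of sequence variation (a column in U) and the corresponding principal axis of positional co‐evolution (the same column in V). If functionally distinct sequences segregate along an axis of sequence variation, then the positions that underlie this divergence are defined in the corresponding axis of positional co‐evolution.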
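The mapping between an axis of sequence variation and the matching axis of positional co‐evolution can be checked on synthetic data. The sketch below is purely illustrative: the alignment is random, the sub‐family sizes and the number of planted positions are arbitrary, and the conservation weights are set to 1 for simplicity (an assumption; the real analysis uses the weighted alignment).

```python
import numpy as np

def top_modes(X):
    """Return the first left and right singular vectors of the column-centred matrix X."""
    Xc = X - X.mean(axis=0)              # centre each position; weights assumed equal to 1
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0], Vt[0]

rng = np.random.default_rng(0)
n_a, n_b = 40, 20                        # hypothetical sizes of sub-families A and B
n_planted, n_other = 12, 18              # planted positions, then independent positions

# Synthetic binary alignment: the first n_planted positions carry the prevalent
# residue in sub-family A only; the remaining positions vary independently.
X = rng.integers(0, 2, size=(n_a + n_b, n_planted + n_other)).astype(float)
X[:n_a, :n_planted] = 1.0
X[n_a:, :n_planted] = 0.0

u1, v1 = top_modes(X)
signs_a = set(np.sign(u1[:n_a]))         # sign of every sub-family A sequence on mode 1
signs_b = set(np.sign(u1[n_a:]))         # sign of every sub-family B sequence on mode 1
top_positions = sorted(int(i) for i in np.argsort(-np.abs(v1))[:n_planted])
print("A and B fall on opposite sides:", signs_a.isdisjoint(signs_b))
print("positions with the largest weights:", top_positions)
```

In this toy case the first column of U separates the two sub‐families by sign, and the largest entries of the first column of V recover the planted positions, mirroring the logic applied to Hsp70 and Hsp110.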
Examination of the top principal axes of sequence variation for the Hsp70/110 family (see Materials and methods) shows in fact a clear separation of the allosteric (Hsp70) and non‐allosteric (Hsp110) members into two distinct clusters (Figure 2A). This axis of separation between the two sub‐families is identified in an unbiased manner using an algorithm for independent component analysis (see Materials and methods; Supplementary information). The corresponding axis of the matrix of positional correlations reveals a protein sector comprising a small fraction of Hsp70/110 positions (∼20%, 115 sector residues out of 605 total) that underlie the separation of Hsp70 and Hsp110 family members (Figure 2B). The sector positions derive roughly equally from the nucleotide‐binding domain (in blue, 56 positions; Figure 2B) and the substrate‐binding domain (in green, 59 positions; Figure 2B), showing co‐evolution of residues in both domains to form a single unit of evolutionary selection. Consistent with the finding that allosteric coupling is a property of the Hsp70 sub‐family, positions comprising this sector are more conserved within the Hsp70 sub‐family than in the Hsp110 sub‐family (Supplementary Figure 5). Taken together, these results define an interdomain sector in the Hsp70 sub‐family that is associated with the allosteric mechanism.
Structural interpretation of the Hsp70 sector
What is the structural interpretation of this Hsp70 sector? NMR (Swain et al, 2007; Bertelsen et al, 2009) and tryptophan fluorescence (Moro et al, 2003) data in DnaK, the E. coli Hsp70, show that in the ADP‐bound state, the nucleotide‐binding and substrate‐binding domains are dissociated and largely independent. In contrast, upon ATP binding, the nucleotide‐binding domain undergoes conformational rearrangement, participates in the interdomain interface, and promotes substrate release from the substrate‐binding domain (Wilbanks et al, 1995; Moro et al, 2003; Mayer and Bukau, 2005; Swain et al, 2007).
To examine the spatial arrangement of the Hsp70 sector in the ATP‐bound state, we represented sector residues on the Sse1‐derived model for the DnaK Hsp70 (Supplementary Figure 7). Consistent with a function in allosteric coupling, residues comprising the Hsp70 sector form a physically contiguous network of atoms linking the ATP‐binding site to the substrate‐binding site, passing through the interdomain interface (Figure 2C). The physical connectivity is remarkable given that only a small fraction of overall Hsp70 residues is involved (Figure 2B). Prior work showed that sparse but connected clusters of amino acids forming sectors link distantly positioned functional sites within individual protein domains (Lockless and Ranganathan, 1999; Socolich et al, 2005; Halabi et al, 2009). This work extends this result to show that functionally coupled but non‐homologous protein domains can share a single sector that connects their respective functional sites through a protein–protein interface.
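One simple way to make "physically contiguous" operational is to build a contact graph over the sector residues and count its connected components. The sketch below is a hypothetical check, not the procedure used here: the function `contact_components`, the 4.0 Å cutoff, and the toy coordinates are all assumptions introduced for illustration.

```python
import math
from itertools import combinations

def contact_components(coords, cutoff=4.0):
    """Group residues into connected components of a contact graph.

    coords : dict mapping a residue identifier to a list of (x, y, z) atom coordinates
    cutoff : assumed contact distance in angstroms between any two atoms
    """
    parent = {res: res for res in coords}

    def find(res):
        # Union-find lookup with path halving.
        while parent[res] != res:
            parent[res] = parent[parent[res]]
            res = parent[res]
        return res

    for a, b in combinations(coords, 2):
        in_contact = any(math.dist(p, q) <= cutoff
                         for p in coords[a] for q in coords[b])
        if in_contact:
            parent[find(a)] = find(b)    # merge the two components

    groups = {}
    for res in coords:
        groups.setdefault(find(res), set()).add(res)
    return sorted(groups.values(), key=len, reverse=True)

# Toy coordinates (hypothetical residue labels): three residues in a chain
# 3 angstroms apart and one isolated residue far away.
toy = {
    "res1": [(0.0, 0.0, 0.0)],
    "res2": [(3.0, 0.0, 0.0)],
    "res3": [(6.0, 0.0, 0.0)],
    "res4": [(50.0, 0.0, 0.0)],
}
components = contact_components(toy)
```

A sector that is contiguous in this sense would yield a single large component containing residues from both domains; in the toy input the chain of three residues forms one component and the distant residue forms its own.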
Functional studies of Hsp70 allostery
Does the Hsp70 sector represent the mechanism of allosteric coupling between the nucleotide‐binding domain and substrate‐binding domain? A number of biochemical and genetic studies on a variety of Hsp70s provide a basis for this assessment. Within the nucleotide‐binding domain, sector positions include catalytic residues E171 and D201, the mutation of which impairs ATP‐induced conformational change (Johnson and McKay, 1999), and T199, the mutation of which stabilizes ATP‐induced conformational change (Buchberger et al, 1995), among other sites that make direct contact with bound nucleotide (Figure 3A). Studies of isolated bacterial Hsp70 nucleotide‐binding domains have shown ATP‐dependent reorientation of all four sub‐domains (Zhang and Zuiderweg, 2004; Bhattacharya et al, 2009), and the sector spans all of the sub‐domain interfaces. In particular, actin and Hsp70 retain sequence conservation at nucleotide‐binding loops and adjacent crossing helices that form an interface between sub‐domains 1A and 2A (Figure 1A) (Bork et al, 1992). Actin responds to bound nucleotide through an ATP‐dependent shearing motion between sub‐domains 1A and 2A (Schuler, 2001), and this structural region is a focal point of Hsp70 sector mapping. These findings are consistent with the view that, at least in part, the co‐evolution of sector positions may be related to the anisotropic physical coupling of amino acids within the protein structure. A similar empirical relationship has been noted between distributed physical interactions and the sector mediating specificity in the S1A serine proteases (Halabi et al, 2009).
Moreover, the crossing helices form a solvent‐accessible cleft between sub‐domains 1A and 2A in actin‐like nucleotide‐binding domains. In actin, the cleft mediates interaction with allosteric effector proteins (Dominguez, 2004), whereas in Hsp70/110, the cleft is proposed to act as an intramolecular‐binding surface for the interdomain linker (Jiang et al, 2007; Liu and Hendrickson, 2007; Swain et al, 2007). In the allosteric Hsp70 sector that our analysis describes, the cleft surface is lined with sector residues (e.g. Y145, D148, K155, E217 and V218; see Figure 3B), and mutation at these sites is reported to perturb Hsp70 allostery (Gassler et al, 1998; Vogel et al, 2006). The conserved interdomain linker sequence motif 389VLLL392 in the sector stimulates ATPase activity when present on truncated nucleotide‐binding domain constructs (Swain et al, 2007) and its mutation impairs interdomain allostery in full‐length Hsp70 (Figure 3C and D) (Laufen et al, 1999; Vogel et al, 2006). Binding of the linker to the cleft below the crossing helices is postulated to be important to the formation of the domain‐docked state, bringing the substrate‐binding domain into proximity to the nucleotide‐binding domain (Swain et al, 2007).
Sector positions within the substrate‐binding domain comprise a structurally contiguous set of atoms that extends from the substrate‐binding site through the protein core to a solvent‐exposed region that includes the interdomain linker (Figure 3C and D). The functional significance of the sector is supported by several previous observations. Sector residue K414 is centered in the domain surface patch, making multiple interdomain contacts in the docked state. Previous work showed that this residue has a critical function in allosteric signal transmission as mutation at this site blocked interdomain docking and allostery (Montgomery et al, 1999). A substrate‐binding domain sector position (I462) has been shown to have an epistatic relationship with sector positions in the nucleotide‐binding domain (Q152 and F216): mutation of I462 to Asn is lethal in the yeast Hsp70 Ssc1, but is partially suppressed by nucleotide‐binding domain mutations Q152L or F216L (Figure 3E) (Davis et al, 1999). There is also evidence that structural regions not essential for allosteric coupling are not involved in the interdomain sector; the substrate‐binding domain lid is nearly absent in sector residues, and a DnaK variant lacking the lid retains core allosteric function (Swain et al, 2006). Interestingly, several sector positions within the substrate‐binding domain experience large NMR chemical shift changes upon binding of a peptide substrate (S398, T403, G405, T428, D431, I438, F457, L459, G468) (Swain et al, 2006). In addition, mutation of sector residues far from the substrate‐binding site (S398, G400, G443, E444, L459) reduces substrate‐binding affinity (Figure 3C and D) (Burkholder et al, 1996).
The physical and functional connectivity of a single co‐evolutionary sector across domains provides strong support for the proposal that the sector mediates the allosteric coupling central to the basic biological activity of Hsp70.
Direct experimental analysis of the interdomain sector
Knowledge of the interdomain sector provides new hypotheses for further experimental testing. For example, residues D326 and N415 are sector positions that display interdomain contact, but no previous experiment has tested their involvement in interdomain allostery (Figure 3; Supplementary Figure 7). Therefore, we made conservative mutations based on amino‐acid frequencies at these positions in Hsp70 sequences and measured the effect on interdomain allostery both in vivo and in vitro. A direct test for the influence of sector mutants on organism fitness is provided by the ability of DnaK to promote E. coli growth at elevated temperature (Bukau and Walker, 1990). For example, strains of E. coli in which the chromosomal copy of DnaK is deleted grow very weakly after heat shock, but are rescued by expression of wild‐type DnaK from a plasmid (Figure 4A). In contrast, the D326V or N415G DnaK variants fail to complement the DnaK knockout strain upon heat shock, showing that these positions are critical for Hsp70 activity.
The origin of these cellular defects was investigated by purifying the mutant DnaK proteins and using biochemical tests of allosteric function in vitro. DnaK D326V and N415G are soluble, natively folded and thermally stable in the absence of ATP and substrate (Supplementary Figure 9). The fluorescence of the sole intrinsic tryptophan residue in DnaK is diagnostic for ATP‐dependent interdomain docking because it displays a characteristic blue shift and intensity quench upon interdomain interaction (Moro et al, 2003). DnaK D326V and N415G show the same tryptophan fluorescence spectrum as wild‐type DnaK in the absence of nucleotide, indicating that W102 is in its normal chemical environment in the undocked state. However, the extents of W102 fluorescence blue shifting and intensity quenching upon addition of ATP are reduced relative to wild type (Figure 4B). The same trends are observed for wild type and mutants when W102 accessibility is assessed by acrylamide quenching (Supplementary Figure 10). These findings are characteristic of a specific defect in ATP‐induced conformational change and domain docking in the point mutants. In addition, functional Hsp70 allostery entails an approximately seven‐fold stimulation of ATPase activity upon binding of peptide to the substrate‐binding domain. Relative to wild‐type DnaK, D326V and N415G show significantly elevated basal ATPase rates and only approximately three‐fold stimulation by peptide (Figure 4C). Given these data and the knowledge that ATP hydrolysis is the rate‐limiting step of the reaction cycle (McCarty et al, 1995), the likely interpretation is that the sector mutants shift the normal DnaK conformational equilibrium from the ATP‐induced, domain‐docked state to the more independent domain arrangement characteristic of the ADP state (Figure 4D).
These findings are consistent with the hypothesis that these two sector positions are important for stabilizing the interdomain interface and mediating allosteric communication between domains. More generally, these data provide further evidence that Hsp70 sector analysis has predictive value in describing an interdomain allosteric network.
In summary, we show that sequence analysis alone of the Hsp70/110 molecular chaperone family identifies a group of co‐evolving residues, a sector, that is responsible for the core function of the Hsp70 proteins—allosteric coupling between distantly positioned functional sites on two distinct protein domains. As per previous reports (Lockless and Ranganathan, 1999; Socolich et al, 2005; Halabi et al, 2009), the sector is sparse, such that only a small fraction of total amino acids in the protein are involved, and physically contiguous, so that the ATP‐binding site on the nucleotide‐binding domain is connected to the substrate‐binding site on the substrate‐binding domain through a continuous network of interacting amino acids. The identification of the sector provides a clear basis for directing new experiments toward a more complete understanding of the mechanism and evolutionary divergence of allostery in these proteins. For example, Hsp70s use diverse co‐chaperones in team‐assisted functions and many sector positions emerging as an allosteric surface for interdomain allostery in Hsp70s also have a function in J‐domain binding and J‐mediated ATPase stimulation (Gassler et al, 1998; Suh et al, 1998; Vogel et al, 2006; Jiang et al, 2007).
This work adds an important new finding with regard to the concept of protein sectors. Previous work showed that multiple quasi‐independent sectors are possible within a single protein domain, each of which contributes to a different aspect of function (Halabi et al, 2009). Here, we show that a single sector can also exist to functionally couple two different, non‐homologous protein domains. This result emphasizes the point that sectors are simply defined as units of selection, without regard to hierarchies of structural organization. An interesting possibility that follows is that sectors could physically join and co‐evolve across protein–protein interfaces in order to mediate the coupling of activities between proteins—the essence of signal transmission and allosteric regulation. Indeed, this idea has been recently used to design a synthetic two‐domain allosteric protein (Lee et al, 2008), and a similar concept was used to map regions involved in controlling the specificity of bacterial two‐component signaling systems (Skerker et al, 2008). It will be interesting to further test the notion that interaction between protein sectors is a process through which allostery between proteins might evolve.
Materials and methods
Multiple sequence alignment
Hsp70/110 sequences were obtained by combining the non‐redundant results of PSI‐BLAST (Altschul et al, 1997) searches queried with E. coli DnaK, human Hsc70, and yeast Sse1. Sequences were aligned automatically (Thompson et al, 1994) and by manual structure‐based methods (Doolittle et al, 1996). Non‐Hsp70/110 sequences were removed based on their anomalous length or sequence identity. Any sequence sharing >95% similarity to another sequence was removed to distribute sampling. The final alignment was large (926 sequences) and diverse (unconserved sites approached random amino‐acid distributions).
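The redundancy filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a greedy, order-dependent pass over pre-aligned sequences and uses percent identity as a stand-in for the similarity measure, which the text does not specify. The function names and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned, non-gap columns at which two equal-length rows agree."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def filter_redundant(aligned, cutoff=95.0):
    """Greedily keep a sequence only if it is <= cutoff percent identical
    to every sequence already kept (assumed strategy, order-dependent)."""
    kept = []
    for name, seq in aligned:
        if all(percent_identity(seq, k) <= cutoff for _, k in kept):
            kept.append((name, seq))
    return kept


# Hypothetical toy alignment, not real Hsp70/110 sequences.
toy = [
    ("seqA", "MGKIIGIDLGTTNSCVAVMD"),
    ("seqB", "MGKIIGIDLGTTNSCVAVMD"),  # identical to seqA, so it is dropped
    ("seqC", "MAKAVGIDLGTTYSCVGVFQ"),  # diverged, so it is kept
]
kept = filter_redundant(toy)
print([name for name, _ in kept])
```

A greedy pass keeps whichever member of a near-duplicate pair appears first, so the input order matters; clustering-based tools make a different trade-off.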
Statistical coupling analysis
As in previous work (Halabi et al, 2009), the alignment is binarized in an $M$ by $L$ matrix $X$ with $X_{si}=1$ if the most frequent amino acid at position $i$ is present in sequence $s$, and $X_{si}=0$ otherwise; here $M$ is the number of sequences (rows of $X$) and $L$ the number of positions (columns of $X$). The definition of the SCA matrix involves position‐specific weights $\phi_i$ that quantify the degree of conservation of each position $i$: $\phi_i=\ln\left[f_i\,(1-q^{(a_i)})/\left(q^{(a_i)}\,(1-f_i)\right)\right]$, where $f_i$ represents the frequency of the prevalent amino acid $a_i$ at position $i$, and $q^{(a_i)}$ a background frequency for this amino acid. A weighted alignment $\tilde{X}$ is defined with $\tilde{X}_{si}=\phi_i(X_{si}-f_i)$. The SCA matrix is the $L$ by $L$ matrix of correlations between positions given by $\tilde{C}=\tilde{X}^{T}\tilde{X}/M$, where $\tilde{X}^{T}$ denotes the transpose of $\tilde{X}$. Similarly, $\tilde{S}=\tilde{X}\tilde{X}^{T}/L$ gives an $M$ by $M$ matrix of correlations between sequences. The eigenvectors of $\tilde{C}$ and $\tilde{S}$ form the columns of two orthogonal matrices, $V$ and $U$, which are related through the singular value decomposition of $\tilde{X}$: $\tilde{X}=U\Sigma V^{T}$, where $\Sigma$ is a diagonal matrix.
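The construction of the position correlation matrix can be illustrated with a small pure-Python sketch. It assumes the conservation weight takes the log-odds form ln[f(1 − q)/(q(1 − f))] used in the cited earlier work, a uniform background frequency of 0.05 (a placeholder, not the background used in the study), and a clamp on fully conserved columns that is added here only to keep the logarithm finite. The function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def sca_matrix(alignment, background):
    """Return (phi, C): conservation weights and the L x L SCA matrix."""
    M, L = len(alignment), len(alignment[0])

    # Prevalent amino acid a_i and its frequency f_i at each position i.
    prevalent, freq = [], []
    for i in range(L):
        aa, count = Counter(seq[i] for seq in alignment).most_common(1)[0]
        prevalent.append(aa)
        freq.append(count / M)

    # phi_i = ln[ f_i (1 - q) / (q (1 - f_i)) ], q = background frequency of a_i.
    phi = []
    for aa, f in zip(prevalent, freq):
        q = background[aa]
        f = min(f, 1.0 - 1e-6)  # assumption: clamp so invariant columns stay finite
        phi.append(math.log(f * (1.0 - q) / (q * (1.0 - f))))

    # Weighted binary alignment: Xw[s][i] = phi_i * (X[s][i] - f_i).
    Xw = [
        [phi[i] * ((1.0 if seq[i] == prevalent[i] else 0.0) - freq[i]) for i in range(L)]
        for seq in alignment
    ]

    # C = Xw^T Xw / M, the correlations between positions.
    C = [
        [sum(Xw[s][i] * Xw[s][j] for s in range(M)) / M for j in range(L)]
        for i in range(L)
    ]
    return phi, C


toy_alignment = ["ACD", "ACE", "AGE", "TGE"]  # hypothetical 4 x 3 alignment
uniform_background = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # placeholder
phi, C = sca_matrix(toy_alignment, uniform_background)
print([round(w, 3) for w in phi])
```

The sequence correlation matrix follows from the same weighted alignment by summing over positions instead of sequences and dividing by L. For an alignment of realistic size, a numerical library would be the practical choice.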
An independent component analysis provides a linear transformation $W_s$ that maps the top three eigenvectors of the sequence correlation matrix $\tilde{S}$ into three maximally independent axes of sequence variations (see Supplementary information for algorithmic details). One of these three directions, the $M$‐dimensional vector $\hat{u}$, is found to discriminate the non‐allosteric sequences from the rest of the sequences in the alignment (Figure 2A). Applying the same linear transformation $W_s$ to the top three eigenvectors of the position correlation matrix $\tilde{C}$ defines a direction of positional variations, the $L$‐dimensional vector $\hat{v}$, which indicates the positions underlying the discrimination. The allosteric sector is defined as the positions $i$ making significant contribution to $\hat{v}$, that is $\hat{v}_i\geq\varepsilon$, where $\varepsilon=0.05$ corresponds to a threshold of statistical significance (Figure 2B).
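The reason a transformation found on sequence eigenvectors can be carried over to position eigenvectors can be made explicit with a short derivation. The labels $\tilde{C}$ (position correlations) and $\tilde{S}$ (sequence correlations) are notation assumed here for the two matrices built from the weighted alignment $\tilde{X}$; $\sigma_k$, $u_k$, and $v_k$ denote the $k$-th singular value and the corresponding columns of $U$ and $V$.

```latex
\begin{align}
\tilde{C} &= \tfrac{1}{M}\,\tilde{X}^{T}\tilde{X}
           = \tfrac{1}{M}\,V\,\Sigma^{T}\Sigma\,V^{T}, \\
\tilde{S} &= \tfrac{1}{L}\,\tilde{X}\tilde{X}^{T}
           = \tfrac{1}{L}\,U\,\Sigma\Sigma^{T}\,U^{T}, \\
\tilde{X}\,v_k &= \sigma_k\,u_k, \qquad \tilde{X}^{T}u_k = \sigma_k\,v_k .
\end{align}
```

Both matrices therefore share the same nonzero spectrum up to the factors $1/M$ and $1/L$, and each top eigenvector of sequences is paired with one eigenvector of positions. Under this reading, applying one linear map to both sets of three eigenvectors keeps the sequence axis and the position axis in correspondence.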
The ATP‐bound Saccharomyces cerevisiae Sse1 structure and a sequence alignment between Sse1 (Hsp110) and DnaK (Hsp70) (Liu and Hendrickson, 2007) were used to generate a homology model of DnaK(ATP) using Modeller (Sali and Blundell, 1993). Molecular dynamics simulations were carried out using the Gromacs platform and Gromos96 force field (Hess et al, 2008). The structural model was truncated at residue 531, and ATP, magnesium, and potassium ions coordinated in the active site were included. An ATP topology file provided by an earlier study (Colombo et al, 2008) was used. The system was solvated in a box with at least 12 Å spacing from protein atoms to the edge of the box. Net charge was neutralized with potassium ions and energy was minimized by steepest descent followed by short position‐restrained molecular dynamics to equilibrate water molecules at 300 K. In the production molecular dynamics simulation, a simulated‐annealing protocol cycled through temperature gradients based on previous work on a smaller system (Lindorff‐Larsen et al, 2005): 300–400 K over 150 ps, 400–350 K over 150 ps, 350–300 K over 500 ps, and 300 K held for 100 ps. Berendsen temperature coupling, Parrinello–Rahman pressure coupling, and a periodic boundary condition were used. All trajectory analysis was performed within the Gromacs package. Atomic RMSD fluctuations were analyzed by principal component analysis, and the cosine content of the first component indicated that motions within the RMSD plateau region were dominated by diffusion (Hess, 2002). To avoid over‐interpretation, structural clustering was performed on the trajectory such that the entire RMSD plateau region was defined as a single cluster (4.7–53 ns) to determine a median structure, and correlated motions within this region were not investigated further on the basis of cooperativity. PyMol was used for molecular visualization (Delano, 2002).
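The annealing cycle quoted above sums to 900 ps. The sketch below computes the target temperature at any simulation time, assuming linear ramps within each segment and exact repetition of the cycle; neither detail is stated in the text, and the function name is hypothetical. It does not reproduce any Gromacs input syntax.

```python
# (start_K, end_K, duration_ps) for each segment of one cycle, from the text.
SEGMENTS = [
    (300.0, 400.0, 150.0),
    (400.0, 350.0, 150.0),
    (350.0, 300.0, 500.0),
    (300.0, 300.0, 100.0),
]
CYCLE_PS = sum(duration for _, _, duration in SEGMENTS)  # 900 ps per cycle


def annealing_temperature(t_ps):
    """Target temperature in K at time t_ps.

    Assumes linear interpolation within each segment and cyclic repetition.
    """
    t = t_ps % CYCLE_PS
    for start, end, duration in SEGMENTS:
        if t < duration:
            return start + (end - start) * t / duration
        t -= duration
    return SEGMENTS[-1][1]  # defensive fallback; the loop normally returns first


for t in (0, 75, 150, 300, 550, 850, 900):
    print(t, annealing_temperature(t))
```

On these assumptions, a 53 ns trajectory would span just under 59 complete cycles.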
Plasmid pMS119 containing a wild‐type E. coli dnaK gene insertion was used as a template for site‐directed mutagenesis (Montgomery et al, 1999). Plasmids were transformed into temperature‐sensitive E. coli BB1553 cells (ΔdnaK52, sidB1) (Bukau and Walker, 1990). Single colonies were grown overnight in LB in the presence of antibiotics at 30°C, and the optical density at 600 nm of each culture was normalized to 0.2 by dilution with LB media. Cultures were serially diluted 10‐fold in water pre‐equilibrated at 43°C, spotted onto growth media plates at 43°C, and placed in an incubator at the same temperature for 15 h. Leaky expression from the pMS119 tac promoter was sufficient to achieve nearly optimal growth rescue by plasmid‐encoded DnaK in LB media without IPTG induction.
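The dilution arithmetic in this assay can be sketched as below. The sketch assumes that optical density scales linearly with cell density over the range used; the number of dilution steps, the final volume, and the example overnight reading are hypothetical, since the text does not give them.

```python
def normalization_volumes(measured_od, target_od=0.2, final_ml=1.0):
    """Return (culture_ml, lb_ml) that bring a culture to target_od.

    Assumes OD600 is proportional to cell density (C1 * V1 = C2 * V2).
    """
    if measured_od < target_od:
        raise ValueError("culture is already below the target OD")
    culture_ml = target_od * final_ml / measured_od
    return culture_ml, final_ml - culture_ml


def dilution_series(start_od=0.2, fold=10, steps=5):
    """Nominal OD600 equivalents of successive fold-wise dilutions."""
    return [start_od / fold ** k for k in range(steps)]


culture_ml, lb_ml = normalization_volumes(measured_od=2.0)  # hypothetical reading
print(round(culture_ml, 3), round(lb_ml, 3))
print(dilution_series())
```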
Purification of proteins and peptides
E. coli DnaK was prepared essentially as previously described (Montgomery et al, 1999), except that E. coli BB1553 cells were used and grown at 30°C. In a modified two‐column purification, the first anion exchange column was used with buffers at pH 7.4. In the second column, DnaK was eluted from ATP agarose with 2 mM ADP. KCl replaced NaCl in all purification buffers. Crude p5 peptide (CLLLSAPRR) was purchased from Genscript and purified by HPLC using a diphenyl column with elution at ∼30% acetonitrile and 70% water; mass spectrometry confirmed the identity of the peptide.
Steady‐state ATPase activity was measured in an enzyme‐coupled system as previously described (Montgomery et al, 1999). The ATPase activity of 1 μM DnaK plus or minus 100 μM p5 at 30°C was measured on a Biotek Gen5 platereader using Costar 3631 plates. Measurements were taken 3–5 times for each DnaK and auto‐hydrolysis sample.
DnaK W102 fluorescence and acrylamide quenching were measured essentially as previously described (Moro et al, 2003). Measurements were taken at room temperature in a Photon Technology International fluorometer at 295 nm excitation wavelength with 4 nm slit widths on both excitation and emission sides. For each sample, spectra were averaged over 10 acquisitions and normalized to an intensity of 1.0 before the addition of ATP. Faster 15 s scans showed the same spectral trends, indicating that ATP hydrolysis was not a complicating factor during the measurement.
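The averaging and normalization of spectra can be illustrated with a short sketch. It assumes that "normalized to an intensity of 1.0" means the peak of the averaged pre-ATP spectrum is scaled to 1.0 and that the same scale factor is applied to the post-ATP spectrum, which is one plausible reading of the text. All intensities below are invented toy values, and the function names are hypothetical.

```python
def average_spectra(acquisitions):
    """Point-by-point mean over repeated acquisitions of one sample."""
    n = len(acquisitions)
    return [sum(values) / n for values in zip(*acquisitions)]


def normalize_pair(before, after):
    """Scale both spectra so the pre-ATP peak equals 1.0 (assumed convention)."""
    scale = max(before)
    return [v / scale for v in before], [v / scale for v in after]


def peak_wavelength(wavelengths, spectrum):
    """Wavelength at which the spectrum is maximal."""
    return wavelengths[spectrum.index(max(spectrum))]


wavelengths = [330, 335, 340, 345, 350]            # nm, toy grid
before_acq = [[2.0, 3.0, 4.0, 3.0, 2.0]] * 2       # toy data, no nucleotide
after_acq = [[2.4, 3.2, 3.0, 2.2, 1.4]] * 2        # toy data, plus ATP

before, after = normalize_pair(average_spectra(before_acq), average_spectra(after_acq))
blue_shift_nm = peak_wavelength(wavelengths, before) - peak_wavelength(wavelengths, after)
quench = 1.0 - max(after)
print(blue_shift_nm, round(quench, 3))
```

Sharing one scale factor between the two spectra preserves the relative quench, which normalizing each spectrum separately would erase.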
Circular dichroism measurements were taken as previously described (Montgomery et al, 1999) using a Jasco J‐715 spectropolarimeter. Wavelength scans were measured at 30°C using 2 μM DnaK in 10 mM potassium phosphate buffer at pH 7.6. Temperature melts were measured at 222 nm using 2 μM DnaK in 10 mM potassium phosphate buffer, 1 mM MgCl2 and 1 mM ADP, pH 7.6.
This study was supported by grants from NIH (LMG), the Robert A Welch foundation (RR), the Green Center for Systems Biology (RR), and a Simons Foundation fellowship from Rockefeller University (OR). Computational resources were funded by NSF and NIH.
Conflict of Interest
The authors declare that they have no conflict of interest.
Source data for figure 2C
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
Copyright © 2010 EMBO and Macmillan Publishers Limited