Nature. Author manuscript; available in PMC 2010 Apr 29.

Published in final edited form as:

PMCID: PMC2793086

NIHMSID: NIHMS143859

The role of DNA shape in protein-DNA recognition

Remo Rohs

¹Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA

Sean M. W

¹Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA

Alona Sosinsky

¹Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA

Peng Liu

¹Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA

Richard S. Mann

²Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West 168th Street, HHSC 1104, New York, NY 10032, USA

Barry Honig

¹Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA

Abstract

The recognition of specific DNA sequences by proteins is thought to depend on two types of mechanisms: one that involves the formation of hydrogen bonds with specific bases, primarily in the major groove, and one involving sequence-dependent deformations of the DNA helix. By comprehensively analyzing the three-dimensional structures of protein-DNA complexes, we show that the binding of arginines to narrow minor grooves is a widely used mode for protein-DNA recognition. This readout mechanism exploits the phenomenon that narrow minor grooves strongly enhance the negative electrostatic potential of the DNA. The nucleosome core particle offers a striking example of this effect. Minor groove narrowing is often associated with the presence of A-tracts, AT-rich sequences that exclude the flexible TpA step. These findings suggest that the ability to detect local variations in DNA shape and electrostatic potential is a general mechanism that enables proteins to use information in the minor groove, which otherwise offers few opportunities for the formation of base-specific hydrogen bonds, to achieve DNA binding specificity.

The ability of proteins to recognize specific DNA sequences is a hallmark of biological regulatory processes. The determination of the three-dimensional structures of numerous protein-DNA complexes has provided a detailed picture of binding, revealing a structurally diverse set of protein families that exploit a wide repertoire of interactions to recognize the double helix¹. Nucleotide sequence-specific interactions frequently involve the formation of hydrogen bonds between amino acid side chains and hydrogen bond donors and acceptors of individual base pairs. It has long been recognized that every base pair has a unique hydrogen bonding signature in the major groove but that this is not the case in the minor groove². Thus, the expectation has been that the recognition of specific DNA sequences would take place primarily in the major groove through the formation of a series of amino acid- and base-specific hydrogen bonds¹. This "direct readout" mechanism is consistent with observations derived from three-dimensional structures of protein-DNA complexes but it is far from the entire story.

In many complexes, the DNA assumes conformations that deviate from the structure of an ideal B-form double helix³⁻⁵, sometimes bending in such a way as to optimize the protein-DNA interface⁶ and in some cases undergoing significant conformational changes, as in the opening of the minor groove in the complex formed between TBP and the TATA box⁷,⁸. The term "indirect readout" was coined⁹ to describe such recognition mechanisms that depend on the propensity of a given sequence to assume a conformation that facilitates its binding to a particular protein. The bases involved in such mechanisms need not be in contact with the protein and, for example, can be found in linker sequences that connect two half-sites that themselves are bound by individual protein subunits¹⁰,¹¹.

We recently described an example of a novel readout mechanism, the recognition of local sequence-dependent minor groove shape¹², that is distinct from previously described indirect readout mechanisms. In this case, the sequence dependence of minor groove width and corresponding variations in electrostatic potential are used by the Hox protein Sex combs reduced (Scr) to distinguish small differences in nucleotide sequence¹². Here we report that this mechanism is a widely used mode of protein-DNA recognition that involves the creation of specific binding sites for positively charged amino acids, primarily arginine, within the minor groove. Minor groove narrowing is found to be correlated with A-tracts¹³,¹⁴, usually defined as stretches of four or more As or Ts that do not contain the flexible TpA step¹⁵, but extended here to include as few as three base pairs (see below). Our results offer fundamentally new insights into the structural and energetic origins of protein-DNA binding specificity and thus have important implications for the prediction of transcription factor binding sites in genomes.

Arginine is enriched in narrow minor grooves

Figure 1a reports the percentage of minor groove contacts associated with each amino acid, classified according to the width of the minor groove. Arginine constitutes 28% of all amino acid residues that contact the minor groove and is significantly enriched in narrow minor grooves, defined here by a groove width of <5.0 Å (compared to 5.8 Å in ideal B-DNA). Remarkably, 60% of the residues in narrow minor grooves are arginines as compared to 22% in minor grooves that are defined as not narrow, i.e. width ≥5.0 Å. A smaller enrichment is also observed for lysines but the overall population of lysines within the minor groove is much less than for arginine.

Figure 1

Amino acid frequencies in minor grooves

(a) Histograms for each amino acid illustrate the frequency with which they are observed in any minor groove (dark green), in minor grooves with a width of ≥5.0 Å (blue), and in narrow minor grooves of <5.0 Å width (red). (b) Frequency of AT (W) and GC (S) base pairs in sequences of 229 sites contacted by arginines in narrow minor grooves. The central base pair (boxed) is contacted by arginine. Frequencies are symmetrized by using both complementary strands.

Binding to the minor groove is a characteristic of many, but not all, protein superfamilies and a significant subset of these contact a narrow minor groove (Table 1). Moreover, if the minor groove is contacted, arginines are likely to be involved, and the likelihood that an arginine will be present becomes even greater for narrow minor grooves (Supplementary Table 1).

Table 1

Protein superfamilies with minor groove contacts.

SRF-like

IHF-like DNA-binding proteins

Histone-fold

DNA breaking-rejoining enzymes

Zn2/Cys6 DNA-binding domain

Homeodomain-like

p53-like transcription factors

lambda repressor-like DNA-binding domains

Winged helix DNA-binding domain

Leucine zipper domain

C-terminal effector domain of the bipartite response regulators

Restriction endonuclease-like

Glucocorticoid receptor-like (DNA-binding domain)

DNA repair protein MutS, domain I

Origin of replication-binding domain, RBD-like

DNA/RNA polymerases

Eukaryotic DNA topoisomerase I, N-terminal DNA-binding fragment

Ribonuclease H-like

TATA-box binding protein-like

Listed are SCOP superfamilies⁴⁷ that have an arginine-minor groove contact within a distance of <6.0 Å from the base. Superfamilies that use arginine to contact a narrow minor groove (<5.0 Å) have grey shading; those that use arginine to contact a non-narrow minor groove (≥5.0 Å) are unshaded. Only superfamilies with a minimum of ten protein chains in PDB structures bound to DNA at least one helical turn long are included. The percentages of chains with minor groove contacts vary considerably among SCOP superfamilies and are provided in Supplementary Table 1.

Figure 1b compiles the DNA sequence preferences for protein-DNA complexes in which an arginine contacts a narrow minor groove. The figure shows that the base pair that has the shortest contact distance with the arginine guanidinium group has a probability of 78% of being an AT and 22% of being a GC. Neighboring base pairs in both the 5' and 3' directions surrounding the closest contacting base pair also have a strong tendency to be AT. Taken together, these data demonstrate that arginines tend to bind narrow minor grooves in AT-rich DNA.

AT-rich sequences tend to narrow minor grooves

We calculated minor groove widths for all tetranucleotides contained in PDB structures for both free DNA (Figure 2a) and DNA in complexes with proteins (Figure 2b). There is a large spread of values due in part to end effects and to the effects of crystal packing but some trends are nevertheless evident. For example, for free DNA structures most of the tetranucleotides with narrow minor grooves (width <5.0 Å) are AT-rich (Figure 2a and Supplementary Table 2a). Similar behavior is observed in protein-DNA complexes (Figure 2b and Supplementary Table 2b). In contrast, tetranucleotides with wide minor grooves have a strong tendency to be GC-rich.

Figure 2

Distribution of tetranucleotide sequences according to average minor groove width

Tetranucleotides from structures with a minimum length of one helical turn for which minor groove width can be defined are ordered by average minor groove width (red). The widths of all tetranucleotides are shown (black) and the sequence, average width, and occurrence in our dataset are given in Supplementary Table 2. (a) The 59 unique tetranucleotides from free DNA structures. (b) The set of all 136 unique tetranucleotides derived from protein-DNA complexes.

The correlation between AT content and groove width is not unexpected given the fact that A-tracts are known to produce narrow minor grooves. However, TpA steps have a tendency to widen the minor groove¹⁵, and so it was of interest to determine whether the distinct properties of A-tracts and TpA steps are reflected in our tetranucleotide data set. We find that 67% of tetranucleotides composed only of AT base pairs have a narrow minor groove but that this number increases to 82% if we exclude TpA steps so as to consider only A-tracts. Even A-tracts of length three have a strong tendency to narrow the minor groove. Forty-three percent of the tetranucleotides with a minor groove width of <5.0 Å have an A-tract of length three, a percentage that decreases to 11% of tetranucleotides with canonical minor groove widths (between 5.0 and 7.0 Å) and to 4% of tetranucleotides with minor grooves wider than 7.0 Å (Supplementary Figure 1). Additionally, compared to other AT-rich sequences, A-tracts are specifically enriched in DNAs with narrow minor grooves (Supplementary Figure 1). Thus, although A-tracts are usually thought of as requiring four or more base pairs, in part because a minimum of four is required to rigidify the DNA¹⁴, this analysis shows that A-tracts as short as length three are positively correlated with narrow minor grooves.
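The A-tract definition used here (a run of As and Ts with no TpA step, at least three base pairs long) can be made concrete with a short sketch. The code below is an illustration only and not the authors' software; the function name `find_a_tracts` and the example sequence are made up. It relies on the observation that a run of A/T bases without a T followed by an A must have all of its As before all of its Ts.

```python
import re

# A run of A/T bases with no TpA step has the form A...AT...T
# (all As first, then all Ts), so maximal runs match this pattern.
_ATRACT_RUN = re.compile(r"A+T*|T+")


def find_a_tracts(sequence, min_length=3):
    """Return half-open (start, end) spans of A-tracts read along one strand.

    min_length=3 follows the extended definition in the text; the
    conventional definition would use min_length=4.
    """
    seq = sequence.upper()
    return [
        (m.start(), m.end())
        for m in _ATRACT_RUN.finditer(seq)
        if m.end() - m.start() >= min_length
    ]


if __name__ == "__main__":
    site = "GCAAATTGCTTAAAAC"  # hypothetical sequence, not from the paper
    for start, end in find_a_tracts(site):
        print(start, end, site[start:end])
```

In the example sequence, the TpA step in TTAAAA splits the AT-rich stretch, so only AAAA qualifies there while the preceding TT is too short.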

Arginines recognize enhanced electrostatic potentials

Figure 3 and Supplementary Figure 2 plot minor groove width and electrostatic potential vs. binding site sequence for several complexes whose binding interface includes an arginine inserted into the minor groove. The correlation of width and potential as well as the tendency of arginines to be located close to minima in width and potential is evident. Below we highlight a few specific examples of how arginine-minor groove interactions are used in DNA recognition.

Figure 3

Specific examples of minor groove shape recognition by arginines

DNA shapes of the binding sites of (a) Ubx-Exd 1b8i¹⁶, (b) MATa1/MATα2 1akh¹⁷, (c) Oct-1/PORE 1hf0¹⁸, (d) the MogR repressor 3fdq¹⁹, (e) the Tc3 transposase 1u78²², and (f) the phage 434 repressor 2or1²³ are shown in GRASP surface representations³¹,⁵⁰ with convex surfaces color-coded in green and concave surfaces in grey/black. Plots of minor groove width (blue) and electrostatic potential in the center of the minor groove (red) are shown below. Arginine contacts (defined by the closest distance between the guanidinium groups and the bases) are indicated. A-tract sequences are highlighted by a solid red line, the TATA box in (e) by a dashed line.

Figure 3a represents the ternary complex of the Hox protein Ultrabithorax (Ubx) and its cofactor Extradenticle (Exd) bound to DNA¹⁶. In this complex, Arg5 of Ubx, which is a conserved residue across all homeodomains, inserts into a narrow region formed by a four base pair A-tract. Figure 3b provides an example of a long and very narrow A-tract that binds α2-Arg7 from the MATa1/MATα2 complex with DNA¹⁷. In contrast, α2-Arg4 inserts into a shallower region at one end of the A-tract where there are local minima in width and potential that are smaller than at the Arg7 site in the center of the A-tract. The two POU domains of the Oct-1/PORE complex bind to two A-tracts (Figure 3c) where the minima are positioned in such a way as to provide binding sites for four arginines, two from each POU domain¹⁸.

The location of these A-tracts with respect to other nucleotide sequence features can be used to generate specificity, as previously discussed for the Hox protein Scr¹². In the case of Scr binding, the position of a TpA step within an AT-rich region plays a critical role in binding specificity. A similar strategy is used by the MogR transcription factor where two long A-tracts separated by a TpA step produce two arginine binding sites¹⁹ (Figure 3d). The unique shape recognized by these two arginines is likely to contribute to the position of the MogR binding site along the DNA sequence. The overall tendency of TpA steps to widen the minor groove is most apparent when they are positioned between two A-tracts (as in Scr¹² and MogR¹⁹) where the TpA step acts as a 'hinge' between more rigid elements¹⁵,²⁰. In other contexts, due to their flexibility, TpA steps can also be accommodated in narrow minor grooves²¹. An example is provided by the bipartite DNA-binding domain of Tc3 transposase where the arginines bind to a narrow region containing a TATA box²² that displays enhanced negative electrostatic potential (Figure 3e).

Although less frequent, arginines also bind narrow grooves associated with non-A-tract sequences. Figure 3f summarizes features of the binding of the 434 repressor to its operator²³, which contains seven base pairs that are all AT with the exception of a central CG. (The guanine amino group tends to widen narrow grooves but a single GC base pair can be accommodated with only little disruption.)

Arginine-minor groove interactions in the nucleosome

Figure 4a plots minor groove width and electrostatic potential along the DNA sequence of the nucleosome core particle containing recombinant histones and a 147 base pair DNA fragment (PDB code 1kx5)²⁴. There are 14 minima in minor groove width corresponding to regions where the DNA bends so as to wrap around the histone core. As above, there is a striking correlation between width and potential. The variation in width between the narrowest and widest regions is about 5 Å and the difference between the maxima and minima in electrostatic potential is about 6 kT/e (Figure 4a). As a result, there should be a strong driving force for basic amino acids to bind to narrow regions and indeed arginines are found in nine of the fourteen minima. These arginines are shown in Figure 4b where the nucleosomal DNA has been color coded by minor groove width. (Although all 14 narrow minor groove regions are contacted by arginines²⁴, only nine of these satisfy our criterion of <6.0 Å between arginine atoms and base atoms in the groove.) A similar repeating pattern of narrow minor grooves that are contacted by arginines is seen in all 35 available nucleosome crystal structures (Supplementary Figure 3a,b).

Figure 4

Minor groove shape recognition in the nucleosome

(a) Correlation of minor groove width of the nucleosome core particle (PDB code 1kx5)²⁴ (blue) and electrostatic potential (red). Arginine contacts (defined by the closest distance between the guanidinium groups and the bases) are indicated. A-tract sequences are highlighted by solid red lines. (b) Schematic representation of the DNA backbone in the nucleosome color-coded by minor groove width (red ≤4.0 Å, pink >4.0 Å and ≤5.0 Å, light blue >5.0 Å and ≤6.0 Å, dark blue >6.0 Å), including all arginines that contact the minor groove. (c) The distribution of A-tracts of length 3 base pairs or longer in 23,076 yeast nucleosome-bound DNA sequences²⁹. (d) Histogram of the occurrence of A-tracts of length three or longer in the same dataset²⁹.

Because short A-tracts narrow the minor groove and facilitate the bending of DNA, we would expect to see a periodicity of A-tracts in DNA sequences bound by nucleosomes in vivo. Previous analyses have focused on dinucleotide statistics²⁵,²⁶ although it has been known for some time that there is a periodic pattern of AAA and AAT trinucleotides in nucleosome core DNA²⁷,²⁸. An analysis of DNA sequences bound in vivo by yeast nucleosomes²⁹ reveals a clear periodicity for A-tracts of at least length three (denoted 3+, Figure 4c). Moreover, nucleosomal DNAs contain, on average, 10.0 A-tracts of length 3+ (Figure 4d). Periodicity is also detected for A-tracts of length 4+ and even 5+, although the number per nucleosome decreases to 4.1 and 1.6, respectively (Supplementary Figure 3). Thus, even though long A-tracts tend to be excluded from the nucleosome³⁰, A-tracts of ≤5 base pairs, when present, are used to facilitate bending of the DNA around the histone core.

To evaluate the effect of TpA steps, we compared the periodicities of A-tracts of length three to those of other trinucleotides composed only of AT base pairs. Trinucleotides that contain TpA steps exhibit a much weaker periodic signal than A-tracts of length three, which exclude the TpA step (Supplementary Figure 4). Together, this analysis suggests that many of the sequence periodicities in nucleosomal DNA reflect the presence of short A-tracts that lead to narrow regions in the minor groove that in turn are recognized by a complementary set of arginines present on the surface of the nucleosome core particle.

Effects of groove width on electrostatic potential

The remarkable correlation between minor groove width and electrostatic potential (Figures 3 and 4) is due primarily to the properties of the Poisson-Boltzmann (PB) equation that have been extensively discussed in the literature³¹. Biological macromolecules are less polarizable than the aqueous solvent and, in the language of classical physics, can be thought of as a low dielectric region embedded in a high dielectric solvent. Solutions of the PB equation for DNA showed that lines of electrostatic potential due to backbone phosphates follow the shape of the DNA and are the most negative inside the grooves³². This effect is due to electrostatic focusing, first observed for the protein superoxide dismutase³¹, where the narrow active site focuses electric field lines away from the protein and into the high dielectric solvent. The same physical phenomenon produces enhanced potentials in grooves, accounting for the strong correlation described above.

In order to establish the source of the effect in quantitative terms, we calculated the potentials for the MogR binding site¹⁹ when the dielectric constant is set to 80 both within the macromolecule and in the solvent (Figure 5, dashed line) and for the case where the two dielectric constants are different (Figure 5, solid line). Strikingly, a significant enhancement of electrostatic potentials is only observed when the dielectric constants of the macromolecule and solvent are different, reflecting the focusing of electric field lines described qualitatively above. The small effect seen when the dielectric constant is the same results from the phosphates being closer to the center of the groove when it is narrow (see Supplementary Figure 5 for a breakdown of the contributions to the net electrostatic potential). Both sets of calculations were carried out at physiological salt concentrations. Although ionic strength has a strong effect on the absolute values of the potentials, the effect remains qualitatively the same (Supplementary Figure 6).

Figure 5

The biophysical origins of the negative potential of narrow minor grooves

Electrostatic potential in the minor groove of the MogR binding site (PDB code 3fdq)¹⁹, calculated in the presence of a dielectric boundary (ε=2 in solute and ε=80 in solvent, solid line) and in the absence of a boundary (ε=80 in both solute and solvent, dashed line).

Why are arginines preferred over lysines?

It is somewhat surprising that there is such a significant population of arginines in the minor groove, and a large enrichment when the groove is narrow, whereas the effects for lysines are more modest (Figure 1a). Arginines have been known for some time to be enriched relative to lysines in protein-protein³³ and protein-DNA interfaces³⁴ and the difference has generally been attributed to the ability of the guanidinium group to engage in more hydrogen bonds than the amino group of lysine³⁵. To evaluate this idea we determined the number of hydrogen bonds formed by all the arginines and lysines in our data set that penetrate the minor groove. Surprisingly, on average, less than one hydrogen bond is formed by either amino acid side chain to DNA (0.9 for arginine and 0.6 for lysine), and the standard deviations are such that this difference is insignificant (Supplementary Table 3).

An alternative explanation derives from the difference in the size of the cationic moieties of the two residues. According to the classical Born model the solvation free energies of ions are proportional to the inverse of their radii³¹, suggesting that it is energetically less costly to remove a charged guanidinium group from water than it is to remove the smaller amino group of a lysine. To test this quantitatively, we calculated the change in free energy in transferring arginine and lysine from water to a medium of dielectric constant 2 (see Methods for details). The difference in the transfer free energies between the two residues ranges from 2.3 to 6.7 kcal/mole, depending on the force field that was used, with lysine consistently having the higher value (Supplementary Table 4). These results suggest that the higher prevalence of arginines compared to lysines in minor grooves is due, at least in part, to the greater energetic cost of removing a charged lysine from water than of removing a charged arginine.
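The inverse-radius argument of the Born model can be illustrated with a few lines of code. This is a minimal sketch of the textbook Born expression for a single spherical ion, not the DelPhi calculation on full side chains described in Methods. The function name and the two radii in the example are illustrative placeholders and are not values from this work.

```python
# e^2 / (4*pi*eps0) expressed in kcal*Angstrom/mol for unit charges.
COULOMB_KCAL_ANGSTROM = 332.06


def born_transfer_energy(radius, charge=1.0, eps_from=80.0, eps_to=2.0):
    """Born free energy (kcal/mol) of moving a spherical ion between media.

    radius is in Angstrom, charge in units of the elementary charge.
    The defaults describe transfer from water (eps=80) to a medium of
    dielectric constant 2, as in the text.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    return (0.5 * COULOMB_KCAL_ANGSTROM * charge ** 2 / radius
            * (1.0 / eps_to - 1.0 / eps_from))


if __name__ == "__main__":
    # Placeholder radii chosen only to show the trend (assumption):
    # a smaller ion pays a larger desolvation penalty.
    for label, radius in [("smaller cation", 2.0), ("larger cation", 3.0)]:
        print(label, round(born_transfer_energy(radius), 1), "kcal/mol")
```

Because the penalty scales as 1/radius, any pair of radii with the smaller ion standing in for the lysine amino group reproduces the qualitative ordering discussed above; the magnitudes depend entirely on the radii assumed.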

Final remarks

We have shown that there is a dramatic enrichment of arginines in narrow regions of the DNA minor groove that provides the basis for a novel DNA recognition mechanism that is used by many families of DNA-binding proteins. A readout mechanism based on groove width requires a connection between sequence and shape. This connection appears to be provided in part by A-tracts, which have a strong tendency to narrow the groove, producing binding sites for arginines that, when spaced appropriately on the protein surface, offer a complementary set of positive charges that can recognize local variations in shape. Arginines often insert into the minor groove as part of short sequence motifs (e.g. RQR in the Hox protein Scr¹², RKKR in POU homeodomains¹⁸, RPR in Engrailed³⁶, RGHR in MATa1/MATα2¹⁷, RRGR in the nuclear orphan receptor³⁷ and RGGR in the human orphan receptor³⁸), thus offering a variety of presentation modes that can contribute to the specificity of DNA shape recognition.

The tendency of A-tracts to narrow the minor groove is due primarily to their ability to assume conformations, through propeller twisting, that lead to the formation of inter-base pair hydrogen bonds in the major groove¹⁵. This network is disrupted by TpA steps as strikingly seen in the MogR binding site¹⁹. GC base pairs also have a tendency to widen the minor groove¹⁴. The combination of these and other factors, such as effects induced by flanking bases that are not directly located within the binding site³⁹, can produce a complex minor groove landscape that offers numerous possibilities for specific interactions with proteins. Indeed, minor groove geometry is no doubt the result of the interplay of intrinsic and protein-induced structural effects.

The physical mechanisms described here are dramatically evident in the nucleosome. The energetic cost of narrowing and bending the DNA in regions where the backbone faces inward will be reduced by the presence of short A-tracts that have an intrinsic propensity to assume such conformations and hence to bend the DNA²⁸. In addition, the penetration of arginines into the minor groove at sites where the DNA bends and the groove is narrow²¹,⁴⁰ provides a significant stabilizing interaction.

The variations in DNA shape observed in protein-DNA complexes often reflect conformational preferences of free DNA⁴,¹⁰,⁴¹. Sequence-dependent conformational preferences have also been observed in computational studies¹¹,²¹,⁴² and, most recently, analysis of hydroxyl radical cleavage patterns shows that DNA shape is under evolutionary selection⁴³. Such observations suggest that the role of DNA shape must be taken into consideration when annotating entire genomes and predicting transcription factor binding sites. The biophysical insights described here, together with the increased availability of high-throughput binding data, offer the hope of major progress in understanding how proteins recognize specific DNA sequences and in the development of improved predictive algorithms.

Methods Summary

Minor groove geometry was analyzed with Curves⁴⁴ for all 1,031 crystal structures of protein-DNA complexes in the PDB that have any amino acid contacting base atoms. Protein side chains contact the minor groove in 69% of those structures that have at least one helical turn of DNA. The probabilities for each amino acid to contact the minor groove were calculated for three groups of DNAs: total, narrow, and not narrow. Proteins were grouped based on 40% sequence identity. The properties of free DNAs and DNAs bound to proteins were analyzed based on the minor groove widths of tetranucleotides, defined at the central base pair step.

All 35 crystal structures of the nucleosome available in the PDB were analyzed. The analysis of nucleosomal DNA is based on 23,076 sequences in an in vivo yeast dataset²⁹. The signal for a sequence motif in nucleosomal DNA is positive for a base pair when the base pair comprises any part of the sequence motif. Frequencies were symmetrized by analyzing both complementary DNA strands.

Electrostatic potentials were obtained from solutions to the non-linear Poisson-Boltzmann equation at physiologic ionic strength using the DelPhi program³¹,⁴⁵. Regions inside the molecular surface of the DNA were assigned a dielectric constant of 2 while the solvent was assigned a value of 80. The potential is reported at a reference point at the center of the minor groove. The reference point is located close to the bottom of the groove in approximately the plane of a base pair. This definition provides a measure of electrostatic potential as a function of base sequence. Solvation free energies of amino acids were calculated for extended conformations of arginine and lysine side chains and compared for four different force fields.

Methods

Calculation of minor groove width

There were in total 1,031 crystal structures of protein-DNA complexes in the PDB as of June 1, 2008 in which the DNA was contacted by any amino acid side chain at a distance <6.0 Å from base atoms. Of these structures, 567 contained at least one helical turn, and no chemical modifications or deformations that prevent the calculation of minor groove width. Groove geometry was analyzed using Curves⁴⁴ and minor groove width was calculated as a function of base sequence by averaging all the Curves levels given for each nucleotide.

Statistical analysis of protein-DNA contacts

Of the 567 protein-DNA structures in our dataset, 392 have at least one minor groove contact defined by a distance of <6.0 Å between any base and side chain atoms. To avoid an oversampling bias, proteins in this dataset that shared ≥40% sequence identity were grouped to create 109 groups. The average number of contacts within each group was subsequently averaged over all 109 groups. These averages were divided by the sum of the average number of contacts for all amino acids to calculate the total minor groove contacts, contacts in non-narrow minor grooves (≥5.0 Å), and contacts in narrow minor grooves (<5.0 Å), for each amino acid.
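The two-level averaging described above (first within each sequence-identity group, then across groups, then normalization over amino acids) can be sketched as follows. This is one reading of the procedure under stated assumptions, not the authors' code: the function name `contact_fractions`, the input layout, and the toy counts are hypothetical, and every group is assumed to contain at least one protein.

```python
from collections import defaultdict


def contact_fractions(groups):
    """Fraction of minor groove contacts made by each amino acid type.

    groups maps a group id to a non-empty list of per-protein dicts of
    {residue_name: number_of_contacts}. Counts are averaged within each
    group first so that heavily sampled protein families do not dominate.
    """
    group_means = []
    for members in groups.values():
        totals = defaultdict(float)
        for counts in members:
            for residue, n in counts.items():
                totals[residue] += n
        group_means.append({r: t / len(members) for r, t in totals.items()})

    # Average the per-group means over all groups.
    overall = defaultdict(float)
    for means in group_means:
        for residue, value in means.items():
            overall[residue] += value / len(group_means)

    # Normalize by the summed average so the fractions add up to 1.
    norm = sum(overall.values())
    return {r: v / norm for r, v in overall.items()} if norm else {}


if __name__ == "__main__":
    toy = {  # made-up counts for illustration
        "group_1": [{"ARG": 2, "LYS": 0}, {"ARG": 4, "LYS": 2}],
        "group_2": [{"ARG": 1, "SER": 1}],
    }
    print(contact_fractions(toy))
```

Running the same function separately on contacts in narrow and non-narrow grooves would give the three sets of frequencies (total, narrow, not narrow) mentioned in the text.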

Hydrogen bond contacts between amino acid side chains and the DNA bases and phosphates, water molecules, and other protein atoms were identified with the HBplus program⁴⁶.

Structural annotation of DNA-binding proteins

The proteins in our dataset of protein-DNA complexes were classified in SCOP⁴⁷ superfamilies. Proteins for which SCOP annotations were not available were annotated manually or using the ASTRAL database⁴⁸.

Correlation of tetranucleotide structure and sequence

Tetranucleotides in free DNA and protein-DNA complexes were used to analyze the base sequence propensity of minor groove regions as a function of minor groove width. The minor groove width of a tetranucleotide was defined by the average of all Curves⁴⁴ levels for groove width of the second nucleotide and the first level of the third nucleotide, which describes groove width at the central base pair step. End regions and irregular tetranucleotides were excluded by requiring groove width definitions for at least one Curves level of each of the four nucleotides. Tetranucleotides from nucleosomal DNA were excluded from this analysis because the DNA is strongly deformed and the spacing between narrow regions is fixed at about one helical turn, thus adding a bias to the results. When applied to the 521 protein-DNA complexes in our dataset, these criteria allowed the analysis of all 136 possible unique tetranucleotides. When applied to the 88 free DNA structures in our dataset, the same criteria resulted in the analysis of 59 unique tetranucleotides. In order to increase coverage for the free DNA dataset, NMR structures were included if dipolar coupling data were used in the refinement.
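The figure of 136 unique tetranucleotides follows from treating a sequence and its reverse complement as the same double-stranded tetranucleotide: of the 256 possible sequences, 16 are their own reverse complement, giving (256 − 16)/2 + 16 = 136. The short sketch below checks this by enumeration; the helper names are illustrative and not taken from the original analysis.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_kmers(k):
    """One representative per {sequence, reverse complement} pair."""
    return sorted({
        min(s, reverse_complement(s))
        for s in map("".join, product("ACGT", repeat=k))
    })


if __name__ == "__main__":
    print(len(unique_kmers(4)))
```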

Propensity of sequence motifs in nucleosomes

The structural analysis of nucleosomes includes all 35 crystal structures in the PDB as of May 1, 2009. The sequence analysis was based on 23,076 nucleosome sequences of length 146–148 base pairs in a yeast in vivo dataset²⁹. These nucleosome sites were scanned for sequence motifs such as A-tracts of different length, TpA steps, or other AT-rich regions. A given motif contributed to a positive signal for any base pair that overlapped that motif, thus longer motifs contributed signals to more nucleotide positions. The frequencies of all motifs were symmetrized by analyzing both complementary strands.
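
A minimal sketch of this kind of motif scan is shown below. It is an illustration under stated assumptions rather than the procedure actually used: motifs are given as regular expressions, all sequences are assumed to have the same length (the real dataset spans 146–148 base pairs and would need alignment first), and symmetrization is implemented by pooling each sequence with its reverse complement. The function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_coverage(sequences, pattern):
    """Fraction of sequences in which each position is overlapped by the motif."""
    length = len(sequences[0])
    counts = [0] * length
    # A lookahead lets overlapping occurrences of the motif all be found.
    regex = re.compile(f"(?=({pattern}))")
    for seq in sequences:
        covered = [False] * length
        for match in regex.finditer(seq):
            for i in range(match.start(1), match.end(1)):
                covered[i] = True  # every base pair under the motif counts once
        for i, hit in enumerate(covered):
            counts[i] += hit
    return [c / len(sequences) for c in counts]


def symmetrized_coverage(sequences, pattern):
    """Coverage computed over each sequence and its reverse complement."""
    both_strands = list(sequences) + [reverse_complement(s) for s in sequences]
    return motif_coverage(both_strands, pattern)
```

Because every position under a match is marked, a longer motif contributes signal to more positions, as described in the text.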

Calculations of electrostatic potentials

Electrostatic potentials were obtained from solutions to the non-linear Poisson-Boltzmann equation at 0.145 M salt using the DelPhi program³¹,⁴⁵. Partial charges and atomic radii were taken from the Amber force field⁴⁹. The interior of the molecular surface of the solute molecule (calculated with a 1.4 Å probe sphere) was assigned a dielectric constant of ε=2 while the exterior aqueous phase was assigned a value of ε=80. Debye-Hückel boundary conditions and five focusing steps were used with a cubic grid size of 165 (a grid size of 185 was used for the nucleosome).
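
To give a sense of the screening implied by 0.145 M salt, the following sketch computes the Debye length that enters Debye-Hückel boundary conditions for a 1:1 electrolyte. It is a back-of-the-envelope illustration and not part of the DelPhi calculation itself; the temperature of 298.15 K is an assumption, since the text does not state one, and the function name is hypothetical.

```python
import math

# CODATA physical constants (SI units).
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
BOLTZMANN = 1.380649e-23                # J/K
ELEMENTARY_CHARGE = 1.602176634e-19     # C
AVOGADRO = 6.02214076e23                # 1/mol


def debye_length_angstrom(ionic_strength_molar, eps_r=80.0, temperature_k=298.15):
    """Debye screening length in Å for a monovalent (1:1) salt.

    eps_r=80 matches the solvent dielectric constant quoted in the text;
    temperature_k is an assumed value.
    """
    conc = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_squared = (
        2.0 * AVOGADRO * ELEMENTARY_CHARGE**2 * conc
        / (VACUUM_PERMITTIVITY * eps_r * BOLTZMANN * temperature_k)
    )
    return 1e10 / math.sqrt(kappa_squared)  # metres -> Å
```

Under these assumptions, `debye_length_angstrom(0.145)` comes out at roughly 8 Å, comparable to the groove widths discussed in this section.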

The electrostatic potential is reported at a reference point close to the bottom of the minor groove, approximately in the plane of base pair i. The reference point i is defined as the geometric midpoint between the O4' atoms of nucleotide i+1 in the 5'-3' strand and nucleotide i−1 in the 3'-5' strand¹². Where the DNA strongly bends into the major groove, the reference point can clash with the guanine amino group and cause large positive potentials (as seen in Figure 4a for three regions of the nucleosome).
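
The reference point defined above is a simple midpoint of two atomic positions. A minimal sketch, assuming the two O4' coordinates have already been extracted as (x, y, z) tuples in Å; the function and parameter names are hypothetical:

```python
def minor_groove_reference_point(o4_strand1_next, o4_strand2_prev):
    """Geometric midpoint between two O4' atoms.

    o4_strand1_next: O4' of nucleotide i+1 in the 5'-3' strand, as (x, y, z)
    o4_strand2_prev: O4' of nucleotide i-1 in the 3'-5' strand, as (x, y, z)
    """
    return tuple((a + b) / 2.0 for a, b in zip(o4_strand1_next, o4_strand2_prev))
```

The potential from the Poisson-Boltzmann solution would then be read off the grid at this point for each base pair i.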

Desolvation free energies were calculated with the DelPhi program³¹,⁴⁵ for the transfer of arginine and lysine side chains in extended conformations from water to a medium of dielectric constant ε=2. Transfer free energies were calculated for each of the two side chains based on charge distributions and atomic radii taken from Amber⁴⁹ and 3 other force fields (see Supplementary Table 3).

Acknowledgments

This work was supported by NIH grants GM54510 (R.S.M.) and U54 CA121852 (B.H. and R.S.M.). The authors thank Andrea Califano for many helpful conversations.

References

1. Garvie CW, Wolberger C. Recognition of specific DNA sequences. Mol Cell. 2001;8(5):937–946.

2. Seeman NC, Rosenberg JM, Rich A. Sequence-specific recognition of double helical nucleic acids by proteins. Proc Natl Acad Sci U S A. 1976;73(3):804–808.

3. Travers AA. DNA conformation and protein binding. Annu Rev Biochem. 1989;58:427–452.

4. Shakked Z, et al. Determinants of repressor/operator recognition from the structure of the trp operator binding site. Nature. 1994;368(6470):469–473.

5. Lu XJ, Shakked Z, Olson WK. A-form conformational motifs in ligand-bound DNA structures. J Mol Biol. 2000;300(4):819–840.

6. Hegde RS, Grossman SR, Laimins LA, Sigler PB. Crystal structure at 1.7 A of the bovine papillomavirus-1 E2 DNA-binding domain bound to its DNA target. Nature. 1992;359(6395):505–512.

7. Kim Y, Geiger JH, Hahn S, Sigler PB. Crystal structure of a yeast TBP/TATA-box complex. Nature. 1993;365(6446):512–520.

8. Kim JL, Nikolov DB, Burley SK. Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature. 1993;365(6446):520–527.

9. Otwinowski Z, et al. Crystal structure of trp repressor/operator complex at atomic resolution. Nature. 1988;335(6188):321–329.

10. Hizver J, Rozenberg H, Frolow F, Rabinovich D, Shakked Z. DNA bending by an adenine--thymine tract and its role in gene regulation. Proc Natl Acad Sci U S A. 2001;98(15):8490–8495.

11. Rohs R, Sklenar H, Shakked Z. Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure. 2005;13(10):1499–1509.

12. Joshi R, et al. Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell. 2007;131(3):530–543.

13. Burkhoff AM, Tullius TD. Structural details of an adenine tract that does not cause DNA to bend. Nature. 1988;331(6155):455–457.

14. Haran TE, Mohanty U. The unique structure of A-tracts and intrinsic DNA bending. Q Rev Biophys. 2009;42(1):41–81.

15. Crothers DM, Shakked Z. DNA bending by adenine-thymine tracts. In: Neidle S, editor. Oxford Handbook of Nucleic Acid Structures. Oxford University Press; London: 1999. pp. 455–470.

16. Passner JM, Ryoo HD, Shen L, Mann RS, Aggarwal AK. Structure of a DNA-bound Ultrabithorax-Extradenticle homeodomain complex. Nature. 1999;397(6721):714–719.

17. Li T, Jin Y, Vershon AK, Wolberger C. Crystal structure of the MATa1/MATalpha2 homeodomain heterodimer in complex with DNA containing an A-tract. Nucleic Acids Res. 1998;26(24):5707–5718.

18. Remenyi A, et al. Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol Cell. 2001;8(3):569–580.

19. Shen A, Higgins DE, Panne D. Recognition of AT-Rich DNA Binding Sites by the MogR Repressor. Structure. 2009;17:769–777.

20. Stefl R, Wu H, Ravindranathan S, Sklenar V, Feigon J. DNA A-tract bending in three dimensions: solving the dA4T4 vs. dT4A4 conundrum. Proc Natl Acad Sci U S A. 2004;101(5):1177–1182.

21. Tolstorukov MY, Colasanti AV, McCandlish DM, Olson WK, Zhurkin VB. A novel roll-and-slide mechanism of DNA folding in chromatin: implications for nucleosome positioning. J Mol Biol. 2007;371(3):725–738.

22. Watkins S, van Pouderoyen G, Sixma TK. Structural analysis of the bipartite DNA-binding domain of Tc3 transposase bound to transposon DNA. Nucleic Acids Res. 2004;32(14):4306–4312.

23. Aggarwal AK, Rodgers DW, Drottar M, Ptashne M, Harrison SC. Recognition of a DNA operator by the repressor of phage 434: a view at high resolution. Science. 1988;242(4880):899–907.

24. Davey CA, Sargent DF, Luger K, Maeder AW, Richmond TJ. Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 A resolution. J Mol Biol. 2002;319(5):1097–1113.

25. Trifonov EN, Sussman JL. The pitch of chromatin DNA is reflected in its nucleotide sequence. Proc Natl Acad Sci U S A. 1980;77(7):3816–3820.

27. Satchwell SC, Drew HR, Travers AA. Sequence periodicities in chicken nucleosome core DNA. J Mol Biol. 1986;191(4):659–675.

28. Travers AA, Klug A. Bending of DNA in nucleoprotein complexes. In: Cozzarelli NR, Wang JC, editors. DNA Topology and its Biological Effects. Cold Spring Harbor Press; Cold Spring Harbor, NY: 1990. pp. 57–106.

29. Field Y, et al. Distinct modes of regulation by chromatin encoded through nucleosome positioning signals. PLoS Comput Biol. 2008;4(11):e1000216.

30. Segal E, Widom J. Poly(dA:dT) tracts: major determinants of nucleosome organization. Curr Opin Struct Biol. 2009;19(1):65–71.

31. Honig B, Nicholls A. Classical electrostatics in biology and chemistry. Science. 1995;268(5214):1144–1149.

32. Jayaram B, Sharp KA, Honig B. The electrostatic potential of B-DNA. Biopolymers. 1989;28(5):975–993.

33. Tsai CJ, Lin SL, Wolfson HJ, Nussinov R. Studies of protein-protein interfaces: a statistical analysis of the hydrophobic effect. Protein Sci. 1997;6(1):53–64.

34. Nadassy K, Wodak SJ, Janin J. Structural features of protein-nucleic acid recognition sites. Biochemistry. 1999;38(7):1999–2017.

35. Luscombe NM, Laskowski RA, Thornton JM. Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res. 2001;29(13):2860–2874.

36. Kissinger CR, Liu BS, Martin-Blanco E, Kornberg TB, Pabo CO. Crystal structure of an engrailed homeodomain-DNA complex at 2.8 A resolution: a framework for understanding homeodomain-DNA interactions. Cell. 1990;63(3):579–590.

37. Meinke G, Sigler PB. DNA-binding mechanism of the monomeric orphan nuclear receptor NGFI-B. Nat Struct Biol. 1999;6(5):471–477.

38. Gearhart MD, Holmbeck SM, Evans RM, Dyson HJ, Wright PE. Monomeric complex of human orphan estrogen related receptor-2 with DNA: a pseudo-dimer interface mediates extended half-site recognition. J Mol Biol. 2003;327(4):819–832.

39. Rohs R, West SM, Liu P, Honig B. Nuance in the double-helix and its role in protein-DNA recognition. Curr Opin Struct Biol. 2009;19(2):171–177.

40. Richmond TJ, Davey CA. The structure of DNA in the nucleosome core. Nature. 2003;423(6936):145–150.

41. Locasale JW, Napoli AA, Chen S, Berman HM, Lawson CL. Signatures of protein-DNA recognition in free DNA binding sites. J Mol Biol. 2009;386(4):1054–1065.

42. Tolstorukov MY, Virnik KM, Adhya S, Zhurkin VB. A-tract clusters may facilitate DNA packaging in bacterial nucleoid. Nucleic Acids Res. 2005;33(12):3907–3918.

43. Parker SC, Hansen L, Abaan HO, Tullius TD, Margulies EH. Local DNA topography correlates with functional noncoding regions of the human genome. Science. 2009;324(5925):389–392.

44. Lavery R, Sklenar H. Defining the structure of irregular nucleic acids: conventions and principles. J Biomol Struct Dyn. 1989;6(4):655–667.

45. Rocchia W, et al. Rapid grid-based construction of the molecular surface and the use of induced surface charge to calculate reaction field energies: applications to the molecular systems and geometric objects. J Comput Chem. 2002;23(1):128–137.

46. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol. 1994;238(5):777–793.

47. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247(4):536–540.

48. Brenner SE, Koehl P, Levitt M. The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res. 2000;28(1):254–256.

49. Cornell WD, et al. A Second Generation Force-Field for the Simulation of Proteins, Nucleic-Acids, and Organic-Molecules. J Am Chem Soc. 1995;117(19):5179–5197.

50. Petrey D, Honig B. GRASP2: visualization, surface properties, and electrostatics of macromolecular structures and sequences. Methods Enzymol. 2003;374:492–509.

Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2793086/