However, as will be discussed in this Review, RNA structures are not static. A single RNA molecule with a well-defined sequence often has multiple accessible 2D and 3D structures that lie within a narrow range of folding free energies and are sampled as a consequence of thermal fluctuations or interactions with proteins and other cofactors that induce or capture specific RNA conformations. Schematic representation of the time scales that are relevant for RNA structural dynamics. In the upper part, sample structural changes are depicted. In the lower part, the typical time scales nowadays accessible to the techniques discussed in this Review are shown, including quantum mechanical calculations, atomistic explicit-solvent MD simulations, and coarse-grained models section 3.
Hardware and software improvements have yielded roughly an order-of-magnitude gain every few years. Dedicated hardware such as Anton 73 allows significantly longer time scales to be accessed in atomistic MD, and massively parallel approaches such as the Folding@home infrastructure 74, 75 allow one to accumulate large aggregate simulation times, although these are typically composed of a large number of short trajectories.
However, complex conformational changes still remain out of reach of MD simulations. Enhanced sampling techniques section 3. The bottom arrow shows typical free-energy barriers involved in these processes. Complex and biologically relevant molecular machines such as the ribosome utilize an endless spectrum of dynamical processes, extending from movements of single nucleotides up to large-scale movements of their whole subunits on a wide range of time scales section 4.
Despite the tremendous importance of RNA structural dynamics, its experimental characterization is even more challenging than obtaining static structural data. Thus, there have been intense efforts to complement the available experimental techniques by advanced MD simulation approaches. As we will discuss, the approximations required in MD sometimes do not allow experimental data to be quantitatively reproduced or predicted. Even in such cases, simulations can still be extremely useful in guiding chemical and physical intuition to design new experiments. Carefully designed MD simulations may often prevent incorrect interpretations of experimental data.
MD simulations may be complemented by quantum mechanical (QM) calculations, which can be used to assess the likelihood of specific chemical reaction pathways involving RNA enzymes and thereby to inspire new experimental tests or help interpret existing experimental data on RNA catalysis section 4. Nowadays, QM calculations are often interfaced with molecular mechanical (MM) treatments of distal layers of the RNA that are not directly involved in forming or breaking covalent bonds, to provide context and to capture the impact of conformational dynamics on the reaction probed by QM.
QM methods are also indispensable for the parameterization of the MM force fields used in MD simulations section 3. Moreover, larger scale dynamics and conformational changes not accessible via conventional atomistic MD simulations can be studied using various coarse-grained methods, all to be summarized here. The negative gradient of the MM potential energy defines a force for every possible Cartesian configuration of the system. To avoid confusion, we recall that molecular mechanics (MM) here indicates that the forces are computed using some empirical force field, as opposed to quantum mechanics (QM), whereas molecular dynamics (MD) indicates that the time dependence of the atomic positions is simulated.
When assessing a force field, one must consider its basic approximation separately from its specific parameterization. The basic approximation determines the principal physical limits of a given force-field form. The basic physical limits of MM are very substantial, but their consequences for practical simulations are unknowable a priori and so must be determined by performing simulations of diverse systems.
Within these basic limits, one tries to tune the force field via specific parameterizations to obtain the best possible performance for RNA molecules. As one approaches the basic limits of the force-field form, improvements in individual components will increasingly conflict with one another such that improving one aspect of the force field may have adverse effects on other aspects that outweigh the gains. The applicability of a given force-field version may vary between simulated biomolecules, and even between different parts of a single simulated molecule. A textbook example can be found in G-quadruplexes, whose single-stranded loops are described less accurately than the G-stems.
Unfortunately, many MD publications show a notable reluctance to report force-field failures, and to discuss limitations. At present, the vast majority of RNA simulations are performed using nonpolarizable force fields whose form is simple and has remained essentially unchanged for several decades see also ref A notable example is the form used in the force fields of the AMBER simulation package, 79 which are based on the work of Cornell et al.
While it would be possible to add higher-order terms to better represent coupled dynamics, such as bend-stretch coupling and nonharmonic terms, such a refinement appears unnecessary for biological molecules. Adding higher-order terms would also complicate the parameterization.
A consequence of the harmonic covalent bond description is, however, that covalent bond breaking and formation i. The Coulombic term is represented by point charges q_i in eq 1. The charges are localized at the atomic centers and have fixed values; that is, they do not change when the molecule changes conformation and do not respond to external electric fields, including those stemming from solvation.
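The terms described above can be illustrated with a minimal sketch. Eq 1 is not reproduced in this text, so the code assumes the standard Cornell-type pair-additive form (harmonic bonds, a cosine series for dihedrals, and Lennard-Jones plus Coulomb terms for nonbonded pairs). All function names and parameter values are hypothetical, and the Coulomb conversion factor is an approximate, commonly used value.

```python
import math

# Approximate conversion factor so that q_i*q_j/r is in kcal/mol when charges
# are in units of e and distances in Angstrom (assumed value, for illustration).
COULOMB_CONST = 332.06


def bond_energy(r, r0, k):
    """Harmonic bond term k*(r - r0)^2; it grows without bound, so the bond cannot break."""
    return k * (r - r0) ** 2


def dihedral_energy(phi, terms):
    """One-dimensional cosine series: sum of (Vn/2) * (1 + cos(n*phi - gamma)).

    `terms` is a list of (Vn, n, gamma) tuples; there is no coupling to other torsions.
    """
    return sum(0.5 * vn * (1.0 + math.cos(n * phi - gamma)) for vn, n, gamma in terms)


def nonbonded_energy(r, qi, qj, epsilon, rmin):
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair.

    The charges qi and qj are fixed numbers: nothing in this function lets them
    respond to conformation or to the surrounding electric field.
    """
    lj = epsilon * ((rmin / r) ** 12 - 2.0 * (rmin / r) ** 6)
    coulomb = COULOMB_CONST * qi * qj / r
    return lj + coulomb
```

In this sketch, stretching a bond far from its equilibrium length only increases the energy, which is why bond breaking is out of scope, and the electrostatic energy depends only on the fixed charges and the interatomic distance, which is the sense in which the model lacks polarization.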
This is the simplest meaningful form for an atomistic empirical potential. Notably, it lacks an explicit term for the polarization energy. Polarizable force fields i. In addition, despite tremendous efforts, the accuracy of polarizable DNA force fields does not currently surpass that of the best pair-additive alternatives, suggesting that tuning polarizable force fields will be a tedious task whose difficulty is comparable to that of tuning fast QM methods for biomolecular computations. Still, we consider the development of polarizable force fields to be of utmost importance because, in our opinion, pair-additive force fields are reaching the limits of their capabilities and are starting to restrict our ability to fully exploit continuing advances in computer power and sampling methods when studying nucleic acids.
The description provided by nonpolarizable force fields is unphysical and, by definition, neglects many effects, including all types of polarization and charge transfer. A large part of the neglected contributions is compensated for by tuning of dihedral force-field terms. Bond and angle parameters can be derived from equilibrium distances and angles observed in X-ray diffraction data, while microwave and IR spectroscopies can be used to obtain stretching and bending force constants.
The bonded terms can also be derived via high-level QM reference data. There are two commonly used approaches for parameterizing the electrostatic term: The most arbitrary task in force-field parameterization is the fitting of the dihedral potentials. Because the parameters of dihedral potentials are fitted after all of the other parameters, any significant change in the other terms would require a subsequent retuning of the existing dihedral potentials. Small changes in the anti and high- anti regions i. Copyright American Chemical Society.
Typical example of the formation of a ladder-like structure in ff99 or bsc0 RNA simulations. The force-field artifact is characterized by (A) a loss of helical twist accompanied by (B) a collective shift of the glycosidic torsions of all nucleotides from the anti to the high-anti region see section 3. Presently, dihedral potentials consist of one-dimensional potential energy profiles along individual torsions, so there is no explicit coupling between torsions. Although several parameterization procedures that will be discussed later see section 3. However, in real molecules, the neighboring torsions are nontrivially coupled through electron density redistributions upon rotations, which would imply nonadditivity of dihedral parameters.
Studies on the importance of these nonadditivities are rare. However, final CMAP corrections for proteins were obtained by empirically adjusting the map such that the MD simulations reproduce experimental data. Although the force-field terms look intuitive and are motivated by physics, they are in fact unphysical. Thus, they cannot be derived from the first principles of quantum mechanics, and they are not measurable by any experiment, even a hypothetical one.
Partial atomic charges are entirely arbitrary quantities that do not exist in nature. It is therefore pointless to discuss the values of individual atomic charges, although such discussions do occasionally appear in the literature. However, the complete set of atomic charges of a given molecular fragment can be tuned to reproduce the ESP around the fragment or some other real physical property of interest. ESP is an unambiguously defined physical property of the molecule that in principle is measurable. Use of ESP fitting of atomic charges is probably one of the key reasons for the success of the AMBER nucleic acid force fields based on the seminal parameterization of Cornell et al.
In addition, the backbone is a polyanion, which further complicates its description. A given fixed set of backbone point charges cannot simultaneously describe the ESPs for different nucleic acid backbone conformers backbone families. It thus appears that a balanced description of diverse nucleic acid backbone families perhaps cannot even be obtained simply by tuning dihedral potentials. Similarly, despite seeming well-defined at first glance, atomic radii are purely arbitrary parameters. The force-field radius of an atom is an empirical parameter that is selected as the best compromise with which to describe molecular interactions in different chemical situations.
For example, a hydrogen atom attached to an H-bond donor looks very small (hidden inside the H-bond donor and having zero radius) when involved in an H-bond but large when pointing toward a nonpolar chemical group. This unphysicality is why the point charges and atomic radii of a given atom in different force fields can differ significantly, and there is no way of determining a priori which of them is more realistic.
Dihedral potentials may be considered perhaps the most unphysical part of a force field. These formally intramolecular terms are used to compensate for major flaws in the description of the intermolecular nonbonded terms. Limitations of the force-field description can be demonstrated by comparing MM and QM calculations, because the latter can be considered to reflect properties of real molecular fragments. MM and QM i. For example, base pair formation at the MM level is a result of complementarity of the unperturbed ESPs, balanced against the van der Waals (vdW) term. Real base pairing is a rich phenomenon involving mutual electronic structure redistributions (adaptations of the interacting bases) and communication between their molecular orbitals.
Although force fields do allow for some elongation of the X-H bonds, its origin is different from that in the QM description of real H-bonds. In the MM description, X-H elongations arise only from the electrostatic interactions in the point charge approximation, which are counterbalanced by the harmonic springs used to model the X-H bonds.
Further, the monomers in base pairs come into such close proximity that divergence from the MM point charge model may become a significant problem. The currently used force-field approximations neglect all of these effects. Importantly, the statement that force fields are unphysical says nothing about their ability, after suitable calibration, to mimic certain physical properties of the studied systems.
The only thing that matters in this respect is the final performance of the complete force field. However, the lack of a physical basis means that a given type of force field has certain principal accuracy limits beyond which further tuning becomes impractical. This explains why improvements in force fields have arisen so slowly and intermittently over the last two decades, in contrast to the steady and very substantial improvements in QM methodologies. The simplicity of force fields affects the flexibility of simulated RNA molecules.
Their harmonic terms and pair-additivity produce potential energy surfaces with considerably less flexibility than those derived from the wave functions of the QM description. Thus, while we cannot currently directly compare MM and QM simulations of nucleic acids, we would expect MD simulations to underestimate the richness of the local dynamics within a given conformational basin when compared to real QM molecules.
The previously discussed deformation of monomers upon H-bonding illustrates this point. A hallmark of reported folding simulations on tetranucleotides and tetraloops see sections 4. This may indicate that the force field is not only unable to find the correct native basins, but also predicts too many simultaneously populated structures; it appears that current force fields tend not to adequately separate the global minimum from the rest of the folding landscape for many systems, although more research will be needed to confirm this observation.
In general, a more complex force field is more difficult to balance. It therefore remains to be seen whether the much-needed polarizable force fields will yield the desired improvements in simulation quality; their complexity may simply increase the occurrence of conflicting imbalances on the potential energy surfaces without adequately improving the physicality of the description.
This risk is underscored by the notoriously poor performance of fast low-cost QM methods for biomolecular fragments. Despite their weaknesses, we suggest that the atomistic force-field simulations will remain the only viable computational technique for studying the atomistic structural dynamics of biomolecules for the foreseeable future.
It is not even clear that more sophisticated polarizable force fields will surpass the performance of pair-additive force fields. On the other hand, coarse-grained methods see section 3. On the other hand, despite the impressive development of QM methods in recent decades, good quality QM methods remain prohibitively slow.
The sampling problem in large-scale QM calculations is in fact reminiscent of problems with early MM calculations before the arrival of MD. In addition, testing of available fast low-cost QM methods has shown that they are in many respects less accurate than well-calibrated force fields for studying nucleic acid fragments.
These issues underscore the point that the different available computational methods each have their own advantages, limitations, and ranges of applicability, and thus complement each other. It is possible that the conceptual underpinnings of fixed-charge models make them incapable of correctly describing different RNA backbone families.
This point is supported by several QM benchmark studies. When considering all of the available data, it is becoming evident that force fields have difficulties with properly balancing hydration against the diverse interactions that are important in RNA molecules, such as stacking, base pairing, and many other types of H-bonds such as base-phosphate interactions. As a result, some interactions appear to be understabilized while others are overstabilized. Moreover, the degree of over- or understabilization may be nonuniform and context-dependent. A concomitant effect of the overall lack of balance is the sensitivity of nucleic acid simulations to the chosen water model.
It is difficult to imagine that their refinement could fully compensate for the intrinsic inaccuracies of the solute biomolecular force field. For example, while the OPC water model was shown to somewhat improve the simulations of RNA tetranucleotides and free-energy computations of stacking in nicked B-DNA, the same water model appears to worsen the structural stability of short G-quadruplex stems. In view of all of the approximations, it was suggested that general force-field refinements could be complemented by structure-specific force-field modifications targeting selected RNA molecules see the hydrogen-bond fix potential in section 3.
Most widely used nucleic acid force fields are based on the seminal Cornell et al. AMBER parameterization, 80 which is commonly abbreviated as ff The success of this force field stems from its parameterization of the electrostatic term by fitting the charges to reproduce the electrostatic potential around nucleic acid building blocks section 3. The prime purpose of these modifications was to eliminate the under-twisting of B-DNA observed with ff94, but the changes were only moderately successful in this respect.
Nevertheless, many subsequent reparameterizations took ff99 as their starting point rather than ff Long simulations have shown that the ff94—ff99 force fields do not provide acceptably stable DNA and RNA trajectories. While most older studies were unaffected by this problem because of their short time scales, some papers have inevitably presented results based on corrupted trajectories that were either not noticed or not reported by the authors.
Increasing awareness of these problems prompted two key refinements of the AMBER dihedral potentials. Simulations of preQ 1 riboswitch aptamer starting from an X-ray structure upon removal of the ligand, showing the most severe artifacts that may occur in RNA MD simulations. Reprinted with permission from ref It was also later shown to eliminate ladder formation.
Alternative RNA force-field dihedral potential modifications have been proposed. First, Gil-Ley et al. Second, Cesari et al. In particular, the force field was trained to reproduce NMR data for nucleosides and dinucleoside monophosphates. The number of parameters used in the fit was significantly smaller than the number of available experimental data points to avoid overfitting.
The corrections were then validated on tetranucleotides, resulting also in this case in an improved agreement with NMR experimental data. Notably, both the training and the validation were performed using as reference solution NMR data, which is expected to describe structural dynamics of flexible motifs better than an individual crystallographic structure.
Possible problems in these torsions might be related to the occurrence of intercalated structures in simulations of tetranucleotides see section 4. Their modifications were tested on selected tetraloops, tetranucleotides, and RNA duplexes, with promising results, albeit with only limited sampling of the tetranucleotide simulations. A particularly interesting aspect of all of these works is that all dihedral potentials were fitted simultaneously, at variance with the usual procedure where one dihedral angle at a time is modified.
None of these new modifications has yet been tested extensively, and so all of these force fields should be considered experimental and only used with this caveat in mind. This divergence necessitated the development of separate Cornell et al. We therefore do not recommend its use in RNA simulations. This force field is also not recommended for use with RNA.
Reparameterizations of the remaining dihedrals achieved some additional improvements for DNA, and OL15 probably represents the upper limit of what can be achieved by reparameterizing uncoupled dihedrals for DNA. Despite these improvements, the force field is far from flawless. The fact that state-of-the-art force fields for RNA and DNA require different dihedral parameters is a further confirmation that these parameters are nonphysical and only used to compensate errors arising from other missing interactions.
Further tuning of the force field would require modification of the nonbonded terms and consideration of better solvent models. As a first attempt in this direction, Chen and Garcia modified the balance between stacking, H-bonding, and solvation. This involved i rescaling the vdW parameters of the nucleobases, and ii adjustment of vdW combination rules for base-water interactions nonbonded fix, NBfix.
The modified force field offered partial improvements in simulations of RNA tetraloops, albeit of a lesser magnitude than was originally suggested, and some side effects were later reported see section 4. Another work considering alternative vdW parameters was published by Bergonzo et al. The simulations showed a slightly better description of tetranucleotides than reported by Bergonzo et al.
However, the latter modification seems to have only marginal effects on the simulations, which become apparent after detailed scrutiny of the published data. However, none of these modifications were tested on a broad range of RNA structures, and their general applicability remains to be determined. In summary, most attempts to modify the vdW terms have required simultaneous modification of the vdW combination rules. It is therefore not clear whether RNA simulations can be improved by tuning the vdW term alone without using additional tricks such as NBfix, which effectively increase the number of parameters that can be tuned.
In addition, none of the reported attempts seems to have yielded a real breakthrough in the quality of RNA simulations. In addition to attempts to improve the general parameterization of the RNA force field, simulations of specific systems can be improved by structure-specific force-field modifications. The HBfix method (not to be confused with the NBfix-type modifications discussed above) adds a local spherical auxiliary potential supporting native hydrogen bonds. HBfix only indirectly promotes forward folding (corresponding to k_on in experiments) but directly increases the lifetime of folded structures i.
Note that when utilizing structure-specific potentials, one is typically limited by the functions that are implemented in the simulation codes; in practice, HBfix is constructed as a linear combination of two standard restraints. For this reason, HBfix has been so far implemented in a way that does not account for the directionality of the H-bonds.
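The idea of building a short-range, distance-only bias from two standard restraints can be sketched as follows. This is an illustrative construction under stated assumptions, not the published HBfix parameterization: the restraint shape (zero, then parabolic, then linear), the 2.0 to 3.0 Å window, the 1.0 kcal/mol plateau, and the function names are all hypothetical choices made for the example.

```python
def one_sided_restraint(r, start, end, k):
    """Standard-style one-sided restraint (assumed shape).

    Zero below `start`, parabolic k*(r - start)^2 up to `end`, and a linear
    continuation beyond `end` with the slope reached at `end`.
    """
    if r <= start:
        return 0.0
    if r <= end:
        return k * (r - start) ** 2
    width = end - start
    return k * width ** 2 + 2.0 * k * width * (r - end)


def hbfix_like(r, r_lo=2.0, r_hi=3.0, eta=1.0):
    """Sum of two restraints with opposite-sign force constants.

    The result is 0 below r_lo, rises smoothly between r_lo and r_hi, and is
    constant (equal to eta) beyond r_hi, so it exerts no force at long range.
    It depends only on the donor-hydrogen to acceptor distance r, not on angles.
    """
    mid = 0.5 * (r_lo + r_hi)
    half_width = mid - r_lo
    k = eta / (2.0 * half_width ** 2)
    rising = one_sided_restraint(r, r_lo, mid, k)
    # The negative force constant cancels the linear slope beyond r_hi.
    cancelling = one_sided_restraint(r, mid, r_hi, -k)
    return rising + cancelling
```

Because the two linear tails cancel, the bias lowers the energy of the bonded state relative to the unbound state by `eta` without pulling distant atoms together, which is consistent with a potential that stabilizes existing native H-bonds rather than steering the fold.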
For instance, the typical number of hydrogen bonds stabilized in an 8-mer including a GAGA tetraloop in the HBfix approach would be 9. Obviously, the use of structure-specific force-field adjustments may seem unsatisfactory. However, pragmatic use of such biases is legitimate, and, due to the persistent performance problems of the general force fields, their use may become increasingly common, or even inevitable. We suggest that the best approach is to first try to achieve the best possible performance with the general force field.
Once one has reached the point at which further tuning is unproductive, the native state(s) can be supported with gentle structure-specific biases rather than continuing with cumbersome force-field refinements that may cause many undesired side effects. In addition, in principle, the HBfix type of potential could be generalized in an interaction-specific manner.
It is not clear whether these contrasting reports reflect the use of somewhat different force-field implementations including a possible difference between periodic boundary and solvent sphere computations; see section 4. The robustness of this refinement is not yet fully clear; in our opinion, it has reduced but not eliminated the tendency toward fraying. As we noted in several other places of this review e.
However, there is always a potential risk that such structural changes may not be fully realistic. We reiterate that the CHARMM force-field developers are currently leading the efforts to derive polarizable force fields for nucleic acids. A very important issue closely related to the development of new force fields is their validation. In general, as we discussed, classical force fields contain nonphysical terms that might predict correct relative stabilities of multiple conformers as a consequence of error cancellation. This makes it very difficult to validate the force field from an ab initio perspective, by using for example QM benchmarks.
When possible, this comparison should be made in situations where the MD trajectory is ergodic see section 3. This analysis is computationally demanding when testing several force-field variants due to the fact that a full simulation has to be performed every time a force-field term is changed. An efficient alternative comes from reweighting the already available simulations to take into account changes in the force field.
This procedure is analogous to the free-energy perturbation method see section 3. Reweighting has been used, for instance, in ref to predict the effect of small perturbations applied on the dihedral angles on tetranucleotides and tetraloops, and in refs and to test full dihedral reparameterizations on tetranucleotides.
Unfortunately, this procedure, known as exponential averaging, is only effective when the fluctuations of the difference between the two potential energy functions are small. Effective sample size will be large only when the ensembles generated by the two force fields are significantly overlapping.
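A minimal sketch of this reweighting is given below. It assumes that the frames were sampled with force field U0, that both potential energies have been evaluated on every frame, and that the Kish formula is an acceptable measure of effective sample size; the function name and arguments are hypothetical.

```python
import math


def reweight(observable, u0, u1, kT):
    """Estimate the average of `observable` under U1 from frames sampled with U0.

    observable, u0, u1: per-frame values (same length); kT in the energy units
    of u0 and u1. Returns (reweighted average, Kish effective sample size).
    """
    # Per-frame log-weights: -(U1 - U0) / kT
    log_w = [-(b - a) / kT for a, b in zip(u0, u1)]
    shift = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    norm = sum(weights)
    weights = [w / norm for w in weights]
    average = sum(w * x for w, x in zip(weights, observable))
    # Equals the number of frames for uniform weights and approaches 1
    # when a single frame dominates.
    ess = 1.0 / sum(w * w for w in weights)
    return average, ess
```

When the energy difference fluctuates by many kT between frames, a handful of frames carry nearly all of the weight and the effective sample size collapses, which signals that the estimate should not be trusted.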
If some structure that is stabilized by the force field U1 is never visited by the force field U0, its effect on the ensemble averages cannot be estimated without running a new simulation using the force field U1. In addition, reweighting might be highly inefficient when charges are perturbed because, due to its long-range nature, the electrostatic energy can be heavily affected by very small changes in the charges.

MD simulations allow the equations of motion to be solved and the evolution of the system to be followed in real time.
This is achieved using a model empirical potential, the force field, which mimics the real interatomic forces acting on the simulated molecular system. Aside from the approximations inherent in the force field, which are discussed in sections 3. The time scales over which the conformational transformations of RNA occur are very heterogeneous. Slower processes, such as ligand-induced riboswitch folding, occur on scales of seconds to minutes or beyond. Such comprehensive simulations are clearly beyond the reach of current computers.
From a theoretical point of view, MD simulations can be seen as Markov chains see section 3. When used to compute ensemble averages such as populations of individual substates, MD simulations will suffer, as any method based on Markov chains, from a statistical error due to the finite length of the simulation. The former effect can be decreased by making a simulation longer, and the latter by discarding the initial equilibration part see, e. However, whenever an MD simulation remains stuck in a given conformation, one should try to understand whether this specific conformation corresponds to the global minimum of the free energy of the system or it is just a kinetically trapped local minimum.
Rigorously speaking, the only way to answer this question is to run a simulation capable of exploring the whole conformational space. In practice, one might try simulations starting from different conformations and see whether the results are independent of the starting point. Using state-of-the-art hardware and software, the only RNA systems for which a fully converged exploration of the conformational space can be achieved with plain molecular dynamics are probably nucleosides or dinucleotides.
However, using highly optimized hardware 73 or with the enhanced sampling techniques discussed in this section, tetranucleotides or even tetraloops see sections 4. Probably, neither plain MD nor enhanced sampling methods can completely sample the conformational space of larger systems. In these cases, one might only be able to sample different conformations that are in the vicinity of the initial structure.
Still, in some cases, it is possible to obtain relative populations of relevant substates that can be compared to experiments. In other words, whereas full convergence might be impossible to reach, one might be in the situation where multiple transitions between the locally available substates are seen and local exploration is virtually converged.
In addition, series of smartly designed simulations initiated in different parts of the conformational space and characterizing properties of different types of conformations present on the free-energy landscape may provide unique insights complementing the available experimental data even without simulating large-scale transitions.
In general, RNA tends to have multiple metastable states, and its folding landscape is very rugged. In other words, the metastable states may persist over diverse time scales and may include different backbone conformations, diverse patterns of directly bound ions, differences in base-pair geometries, or even entirely different folds. The definition of metastability depends on the observed time scale and the capability of experiments to detect the ruggedness of the folding landscape.
When reconformations are faster than the temporal resolution of the experiments, or in a bulk experiment where a macroscopic number of copies of the same molecule is present in a buffer, an experiment would probe some averaged ensemble property. MD is a fundamental method for studying the ruggedness of the RNA conformational landscape. In principle, MD simulations are not limited by experimentally detectable properties and temporal resolutions, which is important because many dynamic processes not resolvable by experiments may be critically important for the biochemical and biological functions of RNAs.
To tackle this issue, several groups have worked over the last few decades to develop methods that allow properties that emerge over long time scales to be investigated using relatively short simulations see, e. Scheme representing some of the methods discussed in this section. (A) In Markov state models, extensive simulations (usually sets of simulations) are analyzed, and the observed states are clustered.
A kinetic matrix is then constructed that provides the probability of observing transitions between pairs of clusters section 3. (B) In replica exchange simulations, numerous replicas of the system are simulated in parallel using different parameters e. From time to time, exchanges are attempted with a Monte Carlo procedure. Sampling in the reference unmodified replica is enhanced by the method section 3. (C) In metadynamics, a bias potential is added to compensate for the underlying free-energy barriers along a preselected collective variable (CV).
If the chosen CV is capable of discriminating the transition state, the transition probability is enhanced section 3. It is important to mention that enhanced sampling techniques would be almost wholly unnecessary if they could be replaced by straightforward MD simulations with sufficiently long time scales. Even more striking has been the effort made by the D. E. Shaw group, which has developed a dedicated machine for MD 73 that allows access to the millisecond time scale when simulating small proteins and DNA. Despite this remarkable progress, biologically relevant time scales are currently out of reach and will likely remain so for decades.
A wide array of methods has been developed over the years for obtaining information about biomolecular systems that emerges over long time scales from short simulations. In general, they have been tested more extensively on proteins than on nucleic acids, and fewer studies still have focused on RNA.
These methods are also difficult to classify because many of them combine elements of previous methods that were developed on the basis of diverse principles. The conceptually simplest way to obtain long time scale information is to combine multiple short trajectories (section 3). Techniques of this class, such as Markov state models (MSMs), mostly provide recipes for initializing the simulations in a way that maximizes the sampling of important events.
MSM methods can also be exceptionally useful for analyzing MD simulation results from huge amounts of simulation data, and visualizing them in a humanly comprehensible way.
Indeed, meaningfully analyzing raw MD trajectories is often insurmountably complex due to the vast amount of data they contain. Alternatively, one could use enhanced sampling methods to accelerate events. The methods of this kind that have been most widely used for exploring the conformational space of RNA molecules are based on the principles of annealing and replica exchange (section 3).
In this approach, a set of simulations (replicas) at different temperatures is performed in parallel, and exchanges among the replicas allow free-energy barriers to be crossed by coupling the cold replicas with the more ergodic hot ones. Other approaches are based on the principle of importance sampling, where an artificially modified ensemble is explored (section 3). In general, importance sampling techniques aim to derive properties of a particular ensemble (probability distribution of structures) of interest from samples generated from a different (biased) distribution.
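The idea of recovering properties of the ensemble of interest from samples of a biased one can be illustrated with a minimal sketch. It assumes that each sample was generated under a known, time-independent bias potential and that the bias energy of each sample has been stored; the function name, the units (kcal/mol), and the default temperature are illustrative choices, not taken from any specific package.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reweighted_average(values, bias_energies, temperature=300.0):
    """Estimate an unbiased ensemble average from biased samples.

    values[i]        : observable measured on biased sample i
    bias_energies[i] : bias potential V(x_i) in kcal/mol for the same sample
    Each sample receives the weight exp(+V/kT), which undoes the
    exp(-V/kT) factor that the bias introduced into the sampled distribution.
    """
    beta = 1.0 / (KB * temperature)
    # Subtract the largest bias before exponentiating for numerical stability;
    # the constant shift cancels in the normalized average.
    shift = max(bias_energies)
    weights = [math.exp(beta * (v - shift)) for v in bias_energies]
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, values)) / total
```

With a zero bias the function reduces to a plain average. In practice the statistical quality of such an estimate degrades quickly when a few samples carry most of the weight, so the weights themselves are worth inspecting.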
In these methods, a biasing force or a bias potential is used to accelerate sampling in a reduced low-dimensional space consisting of slow degrees of freedom or collective variables (CVs). The CVs should reflect the essence of the studied processes, resembling reaction coordinates in simple chemical reactions. These methods thus assume that the studied process can be described with sufficient realism using just a few degrees of freedom, which can be seen as a coarse graining of the full coordinate space, that is, a low-dimensional projection.
The remaining dynamics that is orthogonal to the space defined by the CVs is assumed to be unimportant to the studied process. CVs can be very complex functions of the Cartesian coordinates of the system. Enhanced sampling methods aim to flatten the Boltzmann distribution in the CV space by imposing a bias that allows sampling of the whole CV space.
The effect of this external potential is reweighted a posteriori using procedures similar to those discussed in section 3. Enhanced sampling methods are based on the idea of guiding the system in some way through unrealistically fast trajectories so as to observe the desired events. This idea can be taken to its logical extreme by considering a case in which the path consists of a transformation in which the chemical identity of the simulated molecules is changed, as is done in alchemical methods (section 3).
Because the free energy is a state function and the free-energy difference between two states is path-independent, these alchemical transitions may involve chemically unrealistic intermediate states in which, for example, atoms present in the initial system are simply removed. Finally, it is possible to remove solvent degrees of freedom to accelerate sampling (section 3). Although this procedure introduces some unavoidable approximations that might be critical in RNA systems, it is particularly attractive as it allows free energies to be computed from single conformations, bypassing the need to simulate reactive trajectories.
The following sections discuss these methods and their strengths and limitations in more detail, with a particular focus on those that have been applied to RNA systems. In recent years, the availability of massively parallel computing resources has increased exponentially. Modern supercomputing clusters make it possible to run hundreds of parallel simulations. However, advanced analysis techniques must be used to combine the data generated by multiple short, out-of-equilibrium simulations and to extract relevant information from them. The framework of Markov state models (MSMs) is perfectly suited for this task.
Each microstate consists of a number of structures that can be considered sufficiently similar to be indistinguishable (equivalent) in kinetic terms. Note that the T matrix is often used in its transposed form, in which case the meaning of the indexes i and j is interchanged. The resulting discrete MSM aims to approximate the continuous dynamics of the simulated system by a discrete process. Using a MSM, one can evolve either the individual stochastic trajectories time series of microstates, i.
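The two ways of using a transition matrix can be sketched as follows. This is a minimal illustration, assuming the row-stochastic convention in which T[i][j] is the probability of moving from microstate i to microstate j within one lag time (the transposed convention mentioned above would swap the indexes). The two-state matrix is a made-up example, not data from any RNA system.

```python
import random


def propagate(p, T):
    """Advance a probability distribution by one lag time.

    Implements p_j(t + tau) = sum_i p_i(t) * T[i][j] for a row-stochastic T.
    """
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]


def sample_trajectory(T, start, n_steps, rng):
    """Generate one stochastic time series of microstate indices."""
    traj = [start]
    for _ in range(n_steps):
        row = T[traj[-1]]
        # Draw the next microstate according to the current row of T.
        traj.append(rng.choices(range(len(row)), weights=row)[0])
    return traj


# Hypothetical two-state model used only for illustration.
T = [[0.9, 0.1],
     [0.2, 0.8]]

p = [1.0, 0.0]
for _ in range(200):
    p = propagate(p, T)  # relaxes toward the stationary distribution

traj = sample_trajectory(T, start=0, n_steps=50, rng=random.Random(0))
```

For this matrix the propagated distribution relaxes to populations of 2/3 and 1/3, which is the stationary distribution satisfying p = pT.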
The MSM approximation can be extremely accurate; that is, it can provide the same time-development picture as the MD simulations. The interpretation of the eigenvalues and eigenvectors of the transition matrix of an MSM is the following: MSMs can be fruitfully applied to help extract human-interpretable information from a single long MD trajectory that repeatedly and spontaneously samples the rare event under investigation. If the simulation samples the process sufficiently well, the MSM can be used to create a coarse-grained model of the process, providing much-needed insight into the otherwise overwhelming amount of raw simulation data.
Alternatively, MSMs can be used to merge together separate MD trajectories, by discretizing the ensemble spanned by all trajectories, and counting the transitions occurring in any of the trajectories. This enables one to rigorously combine the information coming from multiple trajectories in a single quantitative model. The first step in the construction of a MSM is the discretization of the phase space into microstates. This can be done employing different clustering methods as well as different metrics.
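Merging several trajectories by counting transitions can be sketched in a few lines. The sketch assumes that every trajectory has already been discretized into integer microstate labels and uses the simplest possible estimator (row normalization of the count matrix); production MSM software typically offers more refined estimators, for instance ones that enforce detailed balance. Function names and the toy trajectories are illustrative.

```python
def count_matrix(trajectories, n_states, lag=1):
    """Count transitions i -> j observed at the given lag in all trajectories."""
    C = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Transitions are counted within each trajectory only, never across
        # the boundary between two independent trajectories.
        for t in range(len(traj) - lag):
            C[traj[t]][traj[t + lag]] += 1
    return C


def transition_matrix(C):
    """Row-normalize a count matrix into transition probabilities."""
    T = []
    for row in C:
        total = sum(row)
        T.append([c / total if total else 0.0 for c in row])
    return T


# Two short, independent discretized trajectories (made-up data).
trajs = [[0, 0, 1, 1], [1, 0, 0]]
C = count_matrix(trajs, n_states=2)
T = transition_matrix(C)
```

Because counts from all trajectories enter the same matrix, a transition sampled in any one of them contributes to the final model.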
This allows the subsequent clustering to be done in a lower dimensionality, in the space of the leading TICA components, by projecting the simulation trajectories onto the largest TICA components. TICA can be viewed as a method analogous to principal component analysis (PCA), which has been conventionally used to process MD trajectories in the past. PCA identifies linear combinations of the input degrees of freedom with the highest variance, while TICA finds those with the highest autocorrelation times, that is, corresponding to the slowest processes occurring in the simulations.
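The difference between the two selection criteria can be made concrete with a toy example. The sketch below is not an implementation of TICA or PCA (both operate on linear combinations of many coordinates); it only computes, for two synthetic one-dimensional coordinates, the variance that PCA would favor and the lagged autocorrelation that TICA would favor. Both time series are artificial.

```python
def variance(x):
    """Population variance of a time series."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def autocorrelation(x, lag):
    """Normalized autocorrelation of a time series at the given lag."""
    mean = sum(x) / len(x)
    n_pairs = len(x) - lag
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n_pairs)) / n_pairs
    return cov / variance(x)


# Large-amplitude coordinate that flips sign at every frame (fast motion).
fast = [5.0 if t % 2 == 0 else -5.0 for t in range(1000)]
# Small-amplitude coordinate that switches only every 250 frames (slow motion).
slow = [1.0 if (t // 250) % 2 == 0 else -1.0 for t in range(1000)]

var_fast, var_slow = variance(fast), variance(slow)
ac_fast, ac_slow = autocorrelation(fast, 1), autocorrelation(slow, 1)
```

Here the fast coordinate has the larger variance, whereas the slow coordinate has the larger autocorrelation, so the two criteria rank the coordinates in opposite order.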
TICA can be performed starting from a description based on the Cartesian coordinates of all of the solute atoms, or using a description defined on some internal coordinates, for instance, dihedral angles or pairwise distances between relevant atoms. As with other dimensionality reduction methods, one must always be aware that some important pieces of information might be discarded, distorting the description of fast time-scale processes. Because of the remarkable results it has achieved, TICA has been recommended as a standard tool for coordinate transformation and dimensionality reduction of MD trajectory data.
Discretization of the phase space into a finite number of microstates is the main source of systematic error in a MSM. This step breaks the Markovianity of the system, that is, the assumption that the transition probabilities only depend on the current state of the system.
Thus, modeling the system as a Markov chain causes deviations from the true dynamics. It has been shown that these deviations can be reduced in two ways: In practice, when dealing with real finite-length simulations, both factors affect the quality of the computation. The lag time depends intrinsically on the Markovianity of the system and the desired temporal resolution.
Too short a lag time will make the model non-Markovian. As a rule of thumb, the interconversions among structures within each individual microstate must be fast compared to the lag time. The number of microstates should be large enough to avoid losing resolution due to coarse graining of the phase space but small enough for there to be a reasonable number of transitions between them. For simulations of medium-sized biomolecules with contemporary methods, one typically uses MSMs with at least 10^2 to 10^4 microstates.
Several methods exist to overcome this problem by exploiting the kinetic information provided by an MSM to construct an even coarser representation of the system, lumping the MSM microstates into a few metastable macrostates. A commonly used approach is Perron-cluster cluster analysis (PCCA), a method that exploits the sign structure of the eigenvectors of the transition matrix to define the optimal metastable partition of the MSM microstates.
These macrostates are not directly observable but are measured by looking at the microstate, which at every step is drawn from a probability distribution that depends on the hidden macrostate. Thus, one assumes that an additional hidden variable can be used to label the states, and its time series is inferred from the time series of the observed variables. The hidden Markov model (HMM) defines states without sharp boundaries, and a given conformation has a certain probability of belonging to each of several macrostates.
A key strength of the MSM approach is the observation that it is not necessary to assume global equilibration in the ensemble of trajectories provided that the MD is in local equilibrium within each microstate. This is what makes MSMs powerful tools for accessing long-time-scale kinetics.
In fact, by choosing smart initialization points for the simulations, one can obtain an ensemble of relatively short trajectories, each of them sampling transitions relevant to different steps of a complex and slow configurational change. By combining these trajectories in a MSM, it is, in principle, possible to reconstruct even processes that occur on a time scale longer than the span of any of the individual trajectories. The largest implied time scale can be of the same order of magnitude as the aggregate duration of all MD trajectories used to build the MSM.
There are various ways of selecting the starting points for MD simulations to be used in the construction of a MSM. If available, prior knowledge about the system can be used to initialize simulations in different positions along interesting conformational changes (for example, if experimental structures of multiple conformations are available). If this step is repeated recursively on each new simulation, it will produce a cascade of MD trajectories, sampling increasingly larger regions of the available phase space. By changing the criteria for identifying candidate starting points for new trajectories, it is possible to drive the system along the path of the conformational change of interest.
Another powerful approach is to extract the initial structures from an ensemble of configurations obtained with some different enhanced sampling technique. As with all methods designed to reduce the computational cost of MD simulations, a MSM may provide wrong results when used in a way that is inconsistent with its basic approximations and assumptions. It is therefore important to test the validity of the Markovian approximation before drawing any conclusions from a MSM. It is useful to point out that the common practice of showing the time scales on a logarithmic scale may give a false impression of convergence due to the concavity of the logarithm function.
It is important to note that the convergence of the implied time scales is a necessary but not sufficient condition for Markovianity. When convergence is reached, the slowest implied time scale should correspond to the slowest transition mode in the studied system. Its comparison with previous knowledge of the system can provide some hint on whether the full free-energy landscape has been sufficiently sampled or not.
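The convergence test on implied time scales can be illustrated on a toy model. The sketch uses the standard relation t = -tau / ln(lambda), where lambda is an eigenvalue of the transition matrix estimated at lag time tau; this relation is assumed here rather than quoted from the text. For an exactly Markovian two-state process, the matrix at lag 2*tau is the square of the matrix at lag tau, so the implied time scale must not depend on the lag. The matrix entries are made up.

```python
import math


def matmul(A, B):
    """Product of two small square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def second_eigenvalue_2x2(T):
    """Non-unit eigenvalue of a 2x2 stochastic matrix (its trace minus one)."""
    return T[0][0] + T[1][1] - 1.0


def implied_timescale(T, lag):
    """Implied time scale, in the units of `lag`, of the slowest process."""
    return -lag / math.log(second_eigenvalue_2x2(T))


T_tau = [[0.9, 0.1],
         [0.2, 0.8]]           # hypothetical model at lag tau = 1
T_2tau = matmul(T_tau, T_tau)  # what a Markovian process yields at lag 2

t1 = implied_timescale(T_tau, 1.0)
t2 = implied_timescale(T_2tau, 2.0)
```

In real data the transition matrices at different lags are estimated independently from the trajectories, and a time scale that keeps growing with the lag indicates that the Markovian approximation has not yet been reached.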
An implied time scale that is too short compared to experiment could indicate that important parts of the free-energy landscape are entirely missing in the simulation data set, and that the MSM characterizes only a local segment of the folding landscape. This could happen when simulations are too short or not initialized so as to sufficiently cover the relevant portions of the phase space. For example, in studies of the folding landscape, series of simulations may be initiated (seeded) from some unfolding pathway, obtained by forced unfolding or high-temperature simulation initiated from the folded state.
This may work well for molecules with fast folding via a funnel mechanism. Note, however, that as in all algorithms based on statistical sampling, there is no way to infer information about conformations that were never sampled in the simulated trajectory. Thus, these tests are not a panacea. As mentioned in the previous paragraph, in the case of kinetic partitioning, even sophisticated convergence tests would not reveal the lack of convergence because the free-energy basins corresponding to the misfolded states are entirely inaccessible to the simulations.
Thus, the investigators must always perform a rational overall appraisal of the studied process and not rely merely on the numbers provided by the computational procedures.
Another important source of uncertainty in the predictions of a MSM is the statistical error due to finite sampling. The results of these studies, with all of their limitations, are discussed in section 4. MSMs were used to characterize kinetic properties of very short RNA oligonucleotides: dinucleotides composed of combinations of adenine and cytidine, as well as tri- and tetranucleotides composed of adenines.
The coordinates used were the dihedral angles of the studied systems and the G-vectors introduced in ref , which take into account the formation and direction of stacking interactions. Thus, agreement with experiments was only obtained after manually removing the intercalated structures from the trajectory. MSM modeling has also been used to analyze conformational substates of conformationally restricted single-stranded three-nucleotide loops of DNA quadruplexes; 76 such approaches should be readily applicable also to various single-stranded RNA segments with restrained positions of the strand termini.
Figure: Schematic representation of a four-state hidden Markov model for the adenine trinucleotide. Shading indicates the distribution of the simulation data projected on the plane defined by the two leading TICA components.
Another study investigated the process of pairing and fraying of a terminal base pair of an RNA duplex with methods closely related to MSM. Interestingly, the authors identified a rate-determining trapped state, in which the base is stacked but the backbone assumes a non-native conformation.
MSM methodologies have developed rapidly over the past few years, and many procedures that were favored in the past have been surpassed by better alternatives. Having already been successfully applied to the study of many protein systems, they are now starting to carve out a place in the world of RNA as well.
The most widely used enhanced sampling method in biomolecular simulations is probably the parallel tempering (PT) method, which is also known as temperature replica-exchange MD (T-REMD). This is because the time required to cross a free-energy barrier depends exponentially on the height of the enthalpic part of the barrier divided by the temperature of the system. However, the temperature also affects the equilibrium populations of different conformations.
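The exponential dependence on the ratio of barrier height to temperature can be put into numbers with a short sketch. It assumes a simple Arrhenius-like form in which the crossing time scales as exp(dH / kT) with a temperature-independent prefactor; the 10 kcal/mol barrier and the two temperatures are hypothetical values chosen for illustration.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def crossing_speedup(barrier_kcal, t_low, t_high):
    """Ratio of barrier-crossing times at t_low and t_high.

    Assumes crossing time ~ exp(barrier / (KB * T)) with the same
    prefactor at both temperatures, so only the enthalpic barrier matters.
    """
    exponent = (barrier_kcal / KB) * (1.0 / t_low - 1.0 / t_high)
    return math.exp(exponent)


# Hypothetical enthalpic barrier of 10 kcal/mol, heated from 300 K to 400 K.
speedup = crossing_speedup(10.0, 300.0, 400.0)
```

Under these assumptions the crossing becomes faster by a factor of several tens. A purely entropic barrier would show no such gain, since it does not enter the exponent of this model.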
Therefore, running MD simulations at a high temperature would yield faster conformational transitions but might also result in extensive sampling of structures whose population is negligible at lower temperatures. High temperature simulations have sometimes been used qualitatively to enhance sampling of RNA systems. However, high temperature simulations alone cannot be used to directly estimate the values of experimental observables at physiological temperatures, and this approach is considered rather obsolete these days.
The idea of annealing is, after the transition, to slowly decrease the simulation temperature so as to gradually shift the system to explore a relevant region of its conformational space. The main problem of simulated annealing is that the results can depend heavily on the schedule used to reduce the temperature. In particular, final conformations might retain properties of the initial high temperature part of the simulation if the cooling is too fast. An important step forward was the introduction of the simulated tempering approach. The goal of the method is to perform a random walk across the temperature space, leading to multiple heating and cooling cycles.
Once a set of temperatures is chosen from a given range, a weight is assigned to each temperature state that determines the probability of visiting that state. If the weights are not chosen properly, the random walk in the temperature space will be confined to a subspace to some degree rather than fully exploring the entire space. Schemes for automatically adjusting these weights have been proposed. From time to time, an exchange of coordinates between two replicas in the temperature ladder is attempted and either accepted or rejected using a Monte Carlo (MC) procedure based on the potential energies of the simulated systems.
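The accept/reject step for a coordinate exchange can be sketched as follows. The sketch assumes the usual Metropolis criterion for temperature replica exchange, in which the acceptance probability is min(1, exp[(beta_i - beta_j)(U_i - U_j)]); the function names and the energies used below are illustrative and not tied to any simulation package.

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(u_i, u_j, t_i, t_j):
    """Metropolis acceptance probability for swapping two replicas.

    u_i, u_j : potential energies (kcal/mol) of the configurations currently
               held at temperatures t_i and t_j (K), respectively.
    """
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    delta = (beta_i - beta_j) * (u_i - u_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(u_i, u_j, t_i, t_j, rng):
    """Return True if the exchange is accepted in this Monte Carlo trial."""
    return rng.random() < exchange_probability(u_i, u_j, t_i, t_j)


# The colder replica holds the higher energy: the swap is always accepted.
p_always = exchange_probability(-100.0, -110.0, 300.0, 310.0)
# The colder replica holds the lower energy: the swap is accepted only sometimes.
p_partial = exchange_probability(-110.0, -100.0, 300.0, 310.0)
accepted = attempt_exchange(-110.0, -100.0, 300.0, 310.0, random.Random(1))
```

The acceptance probability depends on the overlap of the potential-energy distributions at the two temperatures, which is why neighboring temperatures in the ladder cannot be spaced too far apart.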
Because the number of simulations at each temperature is fixed i. T-REMD was originally introduced in studies on spin glasses and was subsequently used by the biomolecular research community in conjunction with Monte Carlo methods and then with MD. One important choice when setting up a T-REMD study is the temperature range spanned by the replicas. The range typically goes from the reference temperature to a temperature high enough for enthalpic barriers to be easily crossed usually between and K.
These high temperatures are far from the physiological conditions because they are well above the boiling point of water under simulated conditions. However, one should note that the water models used in MD simulations typically have higher boiling points than that of real water. Additionally, most T-REMD simulations are performed at constant volume and so are not subject to this issue. While this is not usually a problem, it should be accounted for properly whenever conformational changes are correlated with changes in the effective volume of the solute, as done by Garcia et al.
It should therefore be optimized to enable the observation of the greatest possible number of conformational changes. T-REMD is certainly a robust tool for overcoming enthalpic barriers. However, folding barriers often contain significant entropic contributions, and then the effect of high temperatures to enhance sampling is limited.
Paradoxically, the addition of more high temperature replicas could even make the algorithm less computationally efficient because the additional cost of the extra replicas might not be fully compensated by more effective sampling and corresponding shorter folding time. Moreover, the number of states that should be explored increases significantly with temperature, so the dimensionality of the generalized ensemble i.