4.3.3. Water Bridge analysis — MDAnalysis.analysis.hydrogenbonds.WaterBridgeAnalysis
- Author:
Zhiyi Wu
- Year:
2017-2018
- Copyright:
GNU Public License v3
- Maintainer:
Zhiyi Wu <zhiyi.wu@gtc.ox.ac.uk>, @xiki-tempula on GitHub
Given a Universe
(simulation
trajectory with 1 or more frames) measure all water bridges for each
frame between selections 1 and 2.
Water bridge is defined as a bridging water which simultaneously forms
two hydrogen bonds with atoms from both selection 1 and selection 2.
A water bridge can form between two hydrogen bond acceptors.
e.g. -CO2-:···H−O−H···:-O2C-
A water bridge can also form between two hydrogen bond donors.
e.g. -NH···:O:···HN- (where O is the oxygen of a bridging water)
A hydrogen bond acceptor and another hydrogen bond donor can be bridged by a water.
e.g. -CO2-:···H−O:···HN- (where H−O is part of H−O−H)
A higher order water bridge is defined as more than one water bridging hydrogen bond acceptor and donor. An example of a second order water bridge:
e.g. -CO2-:···H−O:···H−O:···HN- (where H−O is part of H−O−H)
The following keyword arguments are important to control the behaviour of the water bridge analysis:
water_selection (
resname SOL
): the selection string for the bridging waterorder the maximum number of water bridging both ends
donor-acceptor distance (Å): 3.0
Angle cutoff (degrees): 120.0
forcefield to switch between default values for different force fields
donors and acceptors atom types (to add additional atom names)
4.3.3.1. Theory
This module attempts to find multi-order water bridge by an approach similar to breadth-first search, where the first solvation shell of selection 1 is selected, followed by the selection of the second solvation shell as well as any hydrogen bonding partner from selection 1. After that, the third solvation shell, as well as any binding partners from selection 2, are detected. This process is repeated until the maximum order of water bridges is reached.
4.3.3.2. Output as Network
Since the waters connecting the two ends of the selections are by nature a
network. We provide a network representation of the water network. Water bridge
data are returned per frame, which is stored in
WaterBridgeAnalysis.results.network
. Each frame is represented as a
dictionary, where the keys are the hydrogen bonds originating from selection
1 and the values are new dictionaries representing the hydrogen bonds coming
out of the corresponding molecules making hydrogen bonds with selection 1.
As for the hydrogen bonds which reach the selection 2, the values of the corresponding keys are None. One example where selection 1 and selection 2 are joined by one water molecule (A) which also hydrogen bond to another water (B) which also hydrogen bond to selection 2 would be represented as
# (selection 1)-O:···H-O(water 1):···H-(selection 2)
# | :
# H·············O-H(water2)
# H
{(sele1_acceptor, None, water1_donor, water1_donor_heavy, distance, angle):
{(water1_acceptor, None, sele2_donor, sele2_donor_heavy,
distance, angle): None},
{(water1_donor, water1_donor_heavy, water2_acceptor, None,
distance, angle):
{(water2_acceptor, None, sele2_donor, sele2_donor_heavy,
distance, angle): None}
},
}
The atoms are represented by atom index and if the atom is hydrogen bond donor,
it is followed by the index of the corresponding heavy atom
(donor_proton, donor_heavy_atom)
.
If the atom is a hydrogen bond acceptor, it is followed by none.
4.3.3.3. Output as Timeseries
For lower order water bridges, it might be desirable to represent the
connections as WaterBridgeAnalysis.results.timeseries
. The results
are returned per frame and are a list of hydrogen bonds between the selection
1 or selection 2 and the bridging waters. Due to the complexity of the higher
order water bridge and the fact that one hydrogen bond between two waters can
appear in both third and fourth order water bridge, the hydrogen bonds in the
WaterBridgeAnalysis.results.timeseries
attribute are generated in a
depth-first search manner to avoid duplication. Example code of how
WaterBridgeAnalysis.results.timeseries
is generated:
def network2timeseries(network, timeseries):
'''Traverse the network in a depth-first fashion.
expand_timeseries will expand the compact representation to the
familiar timeseries representation.'''
if network is None:
return
else:
for node in network:
timeseries.append(expand_timeseries(node))
network2timeseries(network[node], timeseries)
timeseries = []
network2timeseries(network, timeseries)
An example would be.
results = [
[ # frame 1
[ <donor index>, <acceptor index>,
(<donor residue name>, <donor residue number>, <donor atom name>),
(<acceptor residue name>, <acceptor residue number>,
<acceptor atom name>),
<distance>, <angle>],
....
],
[ # frame 2
[ ... ], [ ... ], ...
],
...
]
Using the WaterBridgeAnalysis.generate_table()
method one can reformat
the results as a flat “normalised” table that is easier to import into a
database or dataframe for further processing.
4.3.3.4. Detection of water bridges
Water bridges are recorded if a bridging water simultaneously forms hydrogen bonds with selection 1 and selection 2.
Hydrogen bonds are detected based on a geometric criterion:
The distance between acceptor and hydrogen is less than or equal to distance (default is 3 Å).
The angle between donor-hydrogen-acceptor is greater than or equal to angle (default is 120º).
The cut-off values angle and distance can be set as keywords to
WaterBridgeAnalysis
.
Donor and acceptor heavy atoms are detected from atom names. The current defaults are appropriate for the CHARMM27 and GLYCAM06 force fields as defined in Table Default atom names for water bridge analysis.
Hydrogen atoms bonded to a donor are searched based on its distance to the donor. The algorithm searches for all hydrogens (name “H*” or name “[123]H” or type “H”) in the same residue as the donor atom within a cut-off distance of 1.2 Å.
group |
donor |
acceptor |
comments |
---|---|---|---|
main chain |
N |
O, OC1, OC2 |
OC1, OC2 from amber99sb-ildn (Gromacs) |
water |
OH2, OW |
OH2, OW |
SPC, TIP3P, TIP4P (CHARMM27,Gromacs) |
ARG |
NE, NH1, NH2 |
||
ASN |
ND2 |
OD1 |
|
ASP |
OD1, OD2 |
||
CYS |
SG |
||
CYH |
SG |
possible false positives for CYS |
|
GLN |
NE2 |
OE1 |
|
GLU |
OE1, OE2 |
||
HIS |
ND1, NE2 |
ND1, NE2 |
presence of H determines if donor |
HSD |
ND1 |
NE2 |
|
HSE |
NE2 |
ND1 |
|
HSP |
ND1, NE2 |
||
LYS |
NZ |
||
MET |
SD |
see e.g. [Gregoret1991] |
|
SER |
OG |
OG |
|
THR |
OG1 |
OG1 |
|
TRP |
NE1 |
||
TYR |
OH |
OH |
element |
donor |
acceptor |
---|---|---|
N |
N,NT,N3 |
N,NT |
O |
OH,OW |
O,O2,OH,OS,OW,OY |
S |
SM |
Donor and acceptor names for the CHARMM27 force field will also work for e.g. OPLS/AA or amber (tested in Gromacs). Residue names in the table are for information only and are not taken into account when determining acceptors and donors. This can potentially lead to some ambiguity in the assignment of donors/acceptors for residues such as histidine or cytosine.
For more information about the naming convention in GLYCAM06 have a look at the Carbohydrate Naming Convention in Glycam.
The lists of donor and acceptor names can be extended by providing lists of
atom names in the donors and acceptors keywords to
WaterBridgeAnalysis
. If the lists are entirely inappropriate
(e.g. when analysing simulations done with a force field that uses very
different atom names) then one should either use the value “other” for
forcefield to set no default values or derive a new class and set the
default list oneself:
class WaterBridgeAnalysis_OtherFF(WaterBridgeAnalysis):
DEFAULT_DONORS = {"OtherFF": tuple(set([...]))}
DEFAULT_ACCEPTORS = {"OtherFF": tuple(set([...]))}
Then simply use the new class instead of the parent class and call it with
`forcefield` = "OtherFF"
. Please also consider contributing the list of
heavy atom names to MDAnalysis.
References
Lydia M. Gregoret, Stephen D. Rader, Robert J. Fletterick, and Fred E. Cohen. Hydrogen bonds involving sulfur atoms in proteins. Proteins: Structure, Function, and Bioinformatics, 9(2):99–107, 1991. doi:https://doi.org/10.1002/prot.340090204.