TEMpy is a object-oriented Python library designed to help the user manipulate and analyse atomic structures and density maps from 3D EM. It is object oriented so it treats atoms, groups of atoms, densities, etc as different objects.
A typical usage pattern is to analyse the coordinates of a model with respect a target density maps from 3D EM.
The following example show how to fetch a Structure Instance:
from StructureParser import PDBParser
'fetch a structure PDB file'
structure_instance=PDBParser.fetch_PDB('1A5T','1A5T.pdb',hetatm=True,water=False)
NOTE: It is possible to create structure object from a PDB file or from the mmCIF file. For mmCIF file the last version Biopython (>1.40b) is required.
The following example show how to create a Structure Instance:
from StructureParser import PDBParser
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
The following example show how to create a Map Instance:
from MapParser import *
# Generate Structure Instance from File:
emmap=MapParser.readMRC(map_target) #read target map
The following example show how to create a 20Å resolution Map Instance from a Structure Instance using target map informations as template:
from StructureBlurrer import *
#Generate a Map instance based on a Gaussian blurring of a protein
blurrer = StructureBlurrer()
sim_map = blurrer.gaussian_blur(structure_instance, 20.,densMap=target_map)
NOTE To compare with a target map use densMap=target_map, unless specified the Map Instance dimensions will be based on the Structure Instance.
The following example show how to translate a Structure Instance of +4.3Å in the x-direction, 1Å in y and -55Å in z (translation vector):
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
# get the starting Centre of mass of the Structure Instance
structure_instance.CoM
# translate the Structure Instance
# note, this overwrites the existing position
structure_instance.translate(4.3, 1.0, -55)
# get the transformed Centre of mass of the Structure Instance
structure_instance.CoM
# reset transformation
structure_instance.reset_position()
The following example show how to do a selection from a list of segments:
# select two segments: 1st from Res 130 to 166, 2nd from Res 235 to 280
rigid_segment_list=[[130,166],[235,280]]
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
list_segments=b.break_into_segments(rigid_segment_list)
# First Segment from Residues 130 to 166
list_segments[0]
Alternative, the following example show how to do a selection from a list of residues:
# create a list of Structure Instance of selected segments from a list of residues.
rigid_list=[130,166,235,280]
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
list_segments=b.get_RB(rigid_list)
It is possible to add the selected segments to a Structure Instance as follow:
structure_instance.combine_structures(list_segments)
Alternative, it is possible to combine a list of selected segments in a unique Structure Instance (rigid body):
structure_instance.combine_SSE_structures(list_segments)
The following example show how to generate an ensemble of 10 Structure Instance rotated less than 90° and translated less than 5 Å:
from EnsembleGeneration import *
EnsembleGeneration=EnsembleGeneration()
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
ensemble_list=EnsembleGeneration.randomise_structs(structure_instance, 10, 5, 90)
The following example show how to generate an ensemble of 10 Structure Instance using Angular Sweep using a rotation angle of 100° around a specified rotation axis using a translation vector as before:
from EnsembleGeneration import *
EnsembleGeneration=EnsembleGeneration()
translation_vector=[4.3, 1.0, -55]
rotation_angle= 110
axis=[0.21949010788898163, -0.80559787935161753, -0.55030527207975843]
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
ensemble_list=EnsembleGeneration.anglar_sweep(structure_instance,axis, translation_vector, 10, rotation_angle, 'structure_instance_angular_sweep', atom_com_ind=False)
NOTE It is advisable to chose the number of structures for the ensemble accordingly with the angular increment step (rotation angle/number of structures) and/orthe translational increment step (translation vector/number of structures) to have a more homogeneous ensemble.
For more information on the performance of the difference Scoring Functions please read:
Vasishtan and Topf (2011) Scoring functions for cryoEM density fitting. J Struct Biol 174:333-343.
The cross-correlation function (CCF) is the most prevalent method of scoring the goodness-of-fit. The following example show how to calculate the CCF score between two Map Instance:
from ScoringFunctions import *
from MapParser import *
scorer = ScoringFunctions()
maptarget=MapParser.readMRC(map_target)
mapprobe=MapParser.readMRC(map_probe)
scorer.CCF(mapprobe,maptarget)
Based on (Chacon and Wriggers, 2002) One of the most promising scores for low resolution (⩾10 Å). The following example show how to calculate the Laplacian cross-correlation score between two Map Instance:
from ScoringFunctions import *
from MapParser import *
scorer = ScoringFunctions()
maptarget=MapParser.readMRC(map_target)
mapprobe=MapParser.readMRC(map_probe)
scorer.laplace_CCF(mapprobe,maptarget)
This score is used to quantify and compare the local quality of fits between the simulated map of a selected local segment of the fit and its corresponding target map.
For more information:
Pandurangan AP, Shakeel S, Butcher SJ, Topf M. Combined approaches to flexible fitting and assessment in virus capsids undergoing conformational change. J Struct Biol. 2013 Dec 12 The following example show how to calculate the SCCF score:
from PDBParser import *
from Bio.PDB import *
from MapParser import *
sim_res=20 #Target resolution of the outputted map.
sim_sigma_coeff=0.187 #Sigma width of the Gaussian used to blur the atomic structure.
domain_filename='PATH/TO/FILE.txt'
scorer = ScoringFunctions()
emmap=MapParser.readMRC(map_target) #read target map
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
rigid_list=[[130,166],[235,280]]# select two segments: 1st from Res 130 to 166, 2nd from Res 235 to 280
score_SCCC=scorer.SCCC(emmap,sim_res,sim_sigma_coeff,structure_instance,rb=rigid_list)
NOTE Different way of segment selection are implemented in TEMpy. See the segment selection exemple for more information. In this example the segment selection is defined from a rigid body list.Alternatively the user can specify rb in Flex-EM format (text file, rb=rb_filename) using residue numbers. Each line describes one rigid body by specifying the initial and final residue of each of the segments in that rigid body (eg, ‘2 6 28 30’ means that residues 2-6 and 28-30 will be included in the same rigid body). We recommend to use the RIBFIND server for accurately identifying Rigid Bodys in a protein structures.
The mutual information was amongst the best scores tested with robustness to changes in the resolution and in the sigma coefficients, making it one of the most promising scores for low resolution. This score, by calculation of its ratio to the total entropy of the system, can formulate an easily understood and statistically meaningful value. The following example show how to calculate the MI score:
from ScoringFunctions import *
from MapParser import *
scorer = ScoringFunctions()
maptarget=MapParser.readMRC(map_target)
mapprobe=MapParser.readMRC(map_probe)
scorer.MI(mapprobe,maptarget)
The envelope score is the most sensitivity to the resolution of the target maps. It is the fastest of the available scores and so it could be used in screening possible fits in large assemblies. The following example show how to calculate the ENV score:
from PDBParser import *
from Bio.PDB import *
from MapParser import *
scorer = ScoringFunctions()
target_map=MapParser.readMRC(map_target) #read target map
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
min_thr=target_map.get_min_threshold(structure_instance.get_prot_mass_from_atoms(), target_map.min(), target_map.max()) #minimum density value based on protein molecular weight.
scorer.envelope_score(target_map, min_thr, structure_instance,norm=True)
NOTE The correcte definition of the volume threshold can effect the performance of this score
The Normal vector score does not rely as heavily on the absolute (coordinate) positions of the calculated surface voxels. An important advantage of this score is that it can be applied to any subsection of the surface of a map and be relatively free from ‘contamination’ from other subunits. For this reason, it is probably the most useful score in sequentially fitting single subunits into maps of large assemblies.
The following example show how to calculate the NV score:
from PDBParser import *
from Bio.PDB import *
from MapParser import *
from StructureBlurrer import *
scorer = ScoringFunctions()
blurrer = StructureBlurrer()
target_map=MapParser.readMRC(map_target) #read target map
structure_instance=PDBParser.read_PDB_file("TEST", pdb_file_test)
probe_map = blurrer.gaussian_blur(structure_instance, 20.,densMap=target_map)
min_thr=probe_map.get_min_threshold(structure_instance.get_prot_mass_from_atoms(), probe_map.min(), probe_map.max())
points= round((probe_map.map_size())*0.01)
max_thr=probe_map.get_max_threshold(min_thr, points, min_thr, probe_map.max(),err_percent=1)
scorer.normal_vector_score(target_map,probe_map, min_thr, max_thr)
NOTE Using too smal (0.187×resolution) or large (0.5×resolution) sigma values to produce the probe map appeared to disrupt the accuracy of the normal vector score.
The source code distribution comes with a `Example_TEMPY01`_ file that contains a number of code snippets that show how to use certain aspects of TEMpy to retrieve informations regarding the Structure and Map Instances.