Biotechies: February 2011

Friday, February 25, 2011

Calculating distances with VMD Tk scripting

This post shows how to calculate distances between atoms using Tk scripting.

Assuming you have loaded the molecule 1NEY in VMD we are going to calculate the distance between the atom OE1 of chain A with all of the residues of the protein. We will consider the alpha-Carbon atom to be arbitrary the center of a residue.

Open the Tk Console with menu Extensions > Tk Console.

The following two lines store the coordinates of oxygen named OE1:

% set oe1 [atomselect top "name OE1 and resid 165 and chain A"]
% set OE1coord [lindex [$oe1 get {x y z}] 0]

This will store all alpha-Carbon carbons in the list CASel:

% set CASel [atomselect top "type CA"]

Then to compute the distance between OE1coord an each element of CASel, you just run this script:

% foreach ca [$CASel get {x y z}] {

set distance [veclength [vecsub $OE1coord $ca]]

echo $ca : $distance

}

Displaying protein chains with VMD

VMD is a molecular visualization program that can display static or animated systems. It has a built-in programming language called Tk. The version used in this post is 1.8.7.

We are going to show some common operations on the protein Triosephosphate Isomerase in Complex with DHAP using this modified PDB file.

We start by displaying the protein chains using different colors.

start VMD program
menu File > New Molecule, browse to the downloaded file 1NEY.pdb, then click Load button. You should see something like this:
menu Graphics > Representations
select the line in the list of representations and delete it by clicking Delete Rep button,
click twice the Create Rep button to create one graphical representation for each chain,
select the first graphical representation line,
- for Drawing Method choose for instance NewRibbons
- for Coloring Method choose ColorID and then choose a color in the list of IDs
- in the Selected Atoms field enter chain A
- then click the button Apply
select the second graphical representation line,
- for Drawing Method choose for instance NewRibbons
- for Coloring Method choose ColorID and then choose another color in the list of IDs
- in the Selected Atoms field enter chain B
- then click the button Apply

You should get this output:

Remember the keyboard shortcuts to toggle the mouse pointer mode:

r for rotating
t for translating
s for scaling

Friday, February 18, 2011

Python and Doxygen

Dogygen produces a professional documentation of your source code. Originally wrote for C/C++ source code, it now supports a broader range of language: C, C++, Java, PHP, Python, etc.

Here is a quick howto to get your Python project documented:

download a binary distribution and install it in your system
generate a configuration file template running in a command line console:

doxygen -g my_project.doxyfile
update the configuration template to match your project folder, and options. So far I have configured the following options only:

PROJECT_NAME
PROJECT_NUMBER
OUTPUT_DIRECTORY (where you want the documentation files to be written)
EXTRACT_ALL = YES
EXTRACT_PRIVATE = YES
EXTRACT_STATIC = YES
FILE_PATTERNS = *.py
RECURSIVE = YES
GENERATE_LATEX = NO
then simply run doxygen with the configuration file as parameter:

doxygen my_project.doxyfile

Here's some HTML output:

Wednesday, February 16, 2011

Adding Biopython dependency to your Python project in Eclipse

This short post helps you to configure your Python project in Eclipse to use of biopython library.

First, you have to install separately NumPy (biopython required dependency) and biopython itself as explained in the biopython installation help.

Then, in Eclipse, open your Python project properties (Alt+ENTER when the project is selected).

Select the section PyDev - PYTHONPATH, then the tab External Libraries and make sure you have the source folder '...\site-packages' (or any other root folder you may have chosen to install Biopython). You should get the following screen:

Then you should be able to run a program such as the following:

parser = PDBParser()

structure = parser.get_structure("2BEG", "C:\\svn\\sporna\\2BEG.pdb")

for atom in structure.get_atoms():

print atom.get_name(), " coord:", atom.get_coord()

Tested on:

Eclipse Helios Service Release 1
Python 2.7
Numerical Python library (NumPy) v1.5.1
Biopython v1.56

Wednesday, February 9, 2011

Jmol applet (quick) howto

Here is a short post to include Jmol rendering in your blogger posts.

First of all, you need to have the applet and its related jar and resources files available somewhere behind a HTTP server. In my configuration I have a folder called JmolFolder with these 19 files:

JmolApplet0_Console.jar

JmolApplet0.jar

JmolApplet0_Minimize.jar

JmolApplet0_Popup.jar

JmolApplet0_ReadersCifPdb.jar

JmolApplet0_ReadersMolXyz.jar

JmolApplet0_ReadersMore.jar

JmolApplet0_ReadersQuantum.jar

JmolApplet0_ReadersSimple.jar

JmolApplet0_ReadersXml.jar

JmolApplet0_ReadersXtal.jar

JmolApplet0_ShapeBio.jar

JmolApplet0_ShapeSpecial.jar

JmolApplet0_ShapeSurface.jar

JmolApplet0_Smiles.jar

JmolApplet0_Symmetry.jar

JmolApplet.jar

Jmol.jar

Jmol.js

Note that I removed all the internationalization files from the original distribution (I hope it makes it start faster).

In blogger/blogspot post you have to insert the following as HTML:

<form>
<script src="http://biotechies.site40.net/jmol/JmolFolder/Jmol.js" type="text/javascript"></script> <script type="text/javascript"> jmolInitialize("http://biotechies.site40.net/jmol/JmolFolder"); jmolApplet(400, "load http://biotechies.site40.net/jmol/alphahelix.pdb.gz;");jmolBr();</script>
</form>

The result is:

First protein modeled

Also this post is not rocket-science, here is my firt protein modeled by Homology modeling. I tried to describe in details the process followed here.

First of all, this the FASTA sequence of the protein to be modeled:

>sequence   DLWNAIDPAIIADDHGQVWMSFGSFWGGLKLFKLNDDLTRPAEPQEWHSIAKLERSVLMDDSQAGSAQIEAPFILRKGDYYYLFASWGLCCRKGDSTYHLVVGRSKQVTGPYLDKTGRDMNQGGGSLLIKGNKRWVGLGHNSAYTWDGKDYLVLHAYEAADNYLQKLKILNLHWDGEGWPQVDEKELDSYIQVDVHDPVMTREGDTWYLFSTGPGITIYSSKDRVNWRYSDRAFATEPTWAKRVSPSFDGHLWAPDIYQHKGLFYLYYSVSAFGKNTSAIGVTVNKTLNPASPDYRWEDKGIVIESVPQ

Selecting homologs

I picked up the following homologs from a from a search result of the sequence with blastp:

1GYD_B
1GYH_D
1WL7_A

Note that those are chain of proteins. For each of them, I donwloaded the corresponding protein .pdb file (1GYD.pdb, 1GYH.pdb, 1WL7.pdb from PDB.

Then I splitted the chains from the PDB file. Here we use a homemade Perl script called PDBtoSplitChain.pl but I guess you may use splitpdb also. This will generate a file for each chain found in the PDB file.

Multiple Sequence Alignment of the homologs

Then we concatenate all the selected chains into the same file to run a MSA:

$ cat sequence.fa > homologs.fa

$ cat 1GYDB.fa >> homologs.fa

$ cat 1GYHD.fa >> homologs.fa

$ cat 1WL7A.fa >> homologs.fa

$ cat 3C2UA.fa >> homologs.fa

$ clustalw homologs.fa

The output file of clustaw (*.aln file) should be converted to a PIR file:

$ aconvertMod2.pl -in c -out p <homologs.aln >homologs.ali

Creation of the model

We are going to use Modeller 9v7 to create our model.

First the python script model-default.py for Modeller is edited:

from modeller import *

from modeller.automodel import *

log.verbose()

env = environ()

env.io.atom_files_directory = ['.', '../atom_files']

a = automodel(env,

alnfile = 'homologs.ali',

knowns = ('1GYDB', '1GYHD', '1WL7A'),

sequence = 'sequence')

a.starting_model= 1

a.ending_model = 1

a.make()

And the model is actually calculated with the following:

$ mod9v7 model-default.py

In case of error, check out the file model-default.log that is produced in the current directory.

Alignment of protein sequences based on 3D structure (STAMP)

We need to create a file called (for instance) domains.data referencing the domains of our template proteins plus the model we've just calculated in the previous step:

./1GYDB.pdb 1GYDB {all}

./1GYHD.pdb 1GYHD {all}

./1WL7A.pdb 1WL7A {all}

./sequence.B99990001.pdb mod1 {all}

Then we run stamp to superpose our model with the template proteins:

$ stamp -n 2 -rough -prefix sequence -f domains.dat

$ transform -f sequence.3 -g -o sequence.3.pdb

Nota Bene: with transform use the file sequence.[number] with the highest [number] generated by stamp (it depends on the set of sequences you used in model-default.py)

Here is a graphical rendering with Jmol of the superposition (you can move the model using drag n' drop, or use Jmol contextual menu with right click):