Wednesday, February 9, 2011

First protein modeled

Also this post is not rocket-science, here is my firt protein modeled by Homology modeling. I tried to describe in details the process followed here.

First of all, this the FASTA sequence of the protein to be modeled:
>sequence   DLWNAIDPAIIADDHGQVWMSFGSFWGGLKLFKLNDDLTRPAEPQEWHSIAKLERSVLMDDSQAGSAQIEAPFILRKGDYYYLFASWGLCCRKGDSTYHLVVGRSKQVTGPYLDKTGRDMNQGGGSLLIKGNKRWVGLGHNSAYTWDGKDYLVLHAYEAADNYLQKLKILNLHWDGEGWPQVDEKELDSYIQVDVHDPVMTREGDTWYLFSTGPGITIYSSKDRVNWRYSDRAFATEPTWAKRVSPSFDGHLWAPDIYQHKGLFYLYYSVSAFGKNTSAIGVTVNKTLNPASPDYRWEDKGIVIESVPQ
Selecting homologs

I picked up the following homologs from a from a search result of the sequence with blastp:
  • 1GYD_B
  • 1GYH_D
  • 1WL7_A
Note that those are chain of proteins. For each of them, I donwloaded the corresponding protein .pdb file (1GYD.pdb, 1GYH.pdb, 1WL7.pdb from PDB.

Then I splitted the chains from the PDB file. Here we use a homemade Perl script called PDBtoSplitChain.pl but I guess you may use splitpdb also. This will generate a file for each chain found in the PDB file.

Multiple Sequence Alignment of the homologs

Then we concatenate all the selected chains into the same file to run a MSA:

$ cat sequence.fa > homologs.fa
$ cat 1GYDB.fa >> homologs.fa
$ cat 1GYHD.fa >> homologs.fa
$ cat 1WL7A.fa >> homologs.fa
$ cat 3C2UA.fa >> homologs.fa
$ clustalw homologs.fa

The output file of clustaw (*.aln file) should be converted to a PIR file:

$ aconvertMod2.pl -in c -out p <homologs.aln >homologs.ali

Creation of the model

We are going to use Modeller 9v7 to create our model.

First the python script model-default.py for Modeller is edited:

from modeller import *
from modeller.automodel import *
log.verbose()
env = environ()
env.io.atom_files_directory = ['.', '../atom_files']
a = automodel(env,
alnfile = 'homologs.ali',
knowns = ('1GYDB', '1GYHD', '1WL7A'),
sequence = 'sequence')
a.starting_model= 1
a.ending_model = 1
a.make()

And the model is actually calculated with the following:

$ mod9v7 model-default.py

In case of error, check out the file model-default.log that is produced in the current directory.

Alignment of protein sequences based on 3D structure (STAMP)

We need to create a file called (for instance) domains.data referencing the domains of our template proteins plus the model we've just calculated in the previous step:
./1GYDB.pdb 1GYDB {all}
./1GYHD.pdb 1GYHD {all}
./1WL7A.pdb 1WL7A {all}
./sequence.B99990001.pdb mod1 {all}

Then we run stamp to superpose our model with the template proteins:

$ stamp -n 2 -rough -prefix sequence -f domains.dat
$ transform -f sequence.3 -g -o sequence.3.pdb

Nota Bene: with transform use the file sequence.[number] with the highest [number] generated by stamp (it depends on the set of sequences you used in model-default.py)

Here is a graphical rendering with Jmol of the superposition (you can move the model using drag n' drop, or use Jmol contextual menu with right click):


No comments:

Post a Comment