Biotechies: First protein modeled

Also this post is not rocket-science, here is my firt protein modeled by Homology modeling. I tried to describe in details the process followed here.

First of all, this the FASTA sequence of the protein to be modeled:

>sequence   DLWNAIDPAIIADDHGQVWMSFGSFWGGLKLFKLNDDLTRPAEPQEWHSIAKLERSVLMDDSQAGSAQIEAPFILRKGDYYYLFASWGLCCRKGDSTYHLVVGRSKQVTGPYLDKTGRDMNQGGGSLLIKGNKRWVGLGHNSAYTWDGKDYLVLHAYEAADNYLQKLKILNLHWDGEGWPQVDEKELDSYIQVDVHDPVMTREGDTWYLFSTGPGITIYSSKDRVNWRYSDRAFATEPTWAKRVSPSFDGHLWAPDIYQHKGLFYLYYSVSAFGKNTSAIGVTVNKTLNPASPDYRWEDKGIVIESVPQ

Selecting homologs

I picked up the following homologs from a from a search result of the sequence with blastp:

1GYD_B
1GYH_D
1WL7_A

Note that those are chain of proteins. For each of them, I donwloaded the corresponding protein .pdb file (1GYD.pdb, 1GYH.pdb, 1WL7.pdb from PDB.

Then I splitted the chains from the PDB file. Here we use a homemade Perl script called PDBtoSplitChain.pl but I guess you may use splitpdb also. This will generate a file for each chain found in the PDB file.

Multiple Sequence Alignment of the homologs

Then we concatenate all the selected chains into the same file to run a MSA:

$ cat sequence.fa > homologs.fa

$ cat 1GYDB.fa >> homologs.fa

$ cat 1GYHD.fa >> homologs.fa

$ cat 1WL7A.fa >> homologs.fa

$ cat 3C2UA.fa >> homologs.fa

$ clustalw homologs.fa

The output file of clustaw (*.aln file) should be converted to a PIR file:

$ aconvertMod2.pl -in c -out p <homologs.aln >homologs.ali

Creation of the model

We are going to use Modeller 9v7 to create our model.

First the python script model-default.py for Modeller is edited:

from modeller import *

from modeller.automodel import *

log.verbose()

env = environ()

env.io.atom_files_directory = ['.', '../atom_files']

a = automodel(env,

alnfile = 'homologs.ali',

knowns = ('1GYDB', '1GYHD', '1WL7A'),

sequence = 'sequence')

a.starting_model= 1

a.ending_model = 1

a.make()

And the model is actually calculated with the following:

$ mod9v7 model-default.py

In case of error, check out the file model-default.log that is produced in the current directory.

Alignment of protein sequences based on 3D structure (STAMP)

We need to create a file called (for instance) domains.data referencing the domains of our template proteins plus the model we've just calculated in the previous step:

./1GYDB.pdb 1GYDB {all}

./1GYHD.pdb 1GYHD {all}

./1WL7A.pdb 1WL7A {all}

./sequence.B99990001.pdb mod1 {all}

Then we run stamp to superpose our model with the template proteins:

$ stamp -n 2 -rough -prefix sequence -f domains.dat

$ transform -f sequence.3 -g -o sequence.3.pdb

Nota Bene: with transform use the file sequence.[number] with the highest [number] generated by stamp (it depends on the set of sequences you used in model-default.py)

Here is a graphical rendering with Jmol of the superposition (you can move the model using drag n' drop, or use Jmol contextual menu with right click):

Biotechies

Wednesday, February 9, 2011

First protein modeled

No comments:

Post a Comment