Also this post is not rocket-science, here is my firt protein modeled by
Homology modeling. I tried to describe in details the process followed here.
First of all, this the FASTA sequence of the protein to be modeled:
>sequence DLWNAIDPAIIADDHGQVWMSFGSFWGGLKLFKLNDDLTRPAEPQEWHSIAKLERSVLMDDSQAGSAQIEAPFILRKGDYYYLFASWGLCCRKGDSTYHLVVGRSKQVTGPYLDKTGRDMNQGGGSLLIKGNKRWVGLGHNSAYTWDGKDYLVLHAYEAADNYLQKLKILNLHWDGEGWPQVDEKELDSYIQVDVHDPVMTREGDTWYLFSTGPGITIYSSKDRVNWRYSDRAFATEPTWAKRVSPSFDGHLWAPDIYQHKGLFYLYYSVSAFGKNTSAIGVTVNKTLNPASPDYRWEDKGIVIESVPQ
Selecting homologs
I picked up the following homologs from a from a search result of the sequence with
blastp:
Note that those are chain of proteins. For each of them, I donwloaded the corresponding protein .pdb file (1GYD.pdb, 1GYH.pdb, 1WL7.pdb from
PDB.
Then I splitted the chains from the PDB file. Here we use a homemade Perl script called
PDBtoSplitChain.pl but I guess you may use
splitpdb also. This will generate a file for each chain found in the PDB file.
Multiple Sequence Alignment of the homologs
Then we concatenate all the selected chains into the same file to run a MSA:
$ cat sequence.fa > homologs.fa
$ cat 1GYDB.fa >> homologs.fa
$ cat 1GYHD.fa >> homologs.fa
$ cat 1WL7A.fa >> homologs.fa
$ cat 3C2UA.fa >> homologs.fa
$ clustalw homologs.fa
The output file of clustaw (*.aln file) should be converted to a PIR file:
$ aconvertMod2.pl -in c -out p <homologs.aln >homologs.ali
Creation of the model
First the python script model-default.py for Modeller is edited:
from modeller import *
from modeller.automodel import *
log.verbose()
env = environ()
env.io.atom_files_directory = ['.', '../atom_files']
a = automodel(env,
alnfile = 'homologs.ali',
knowns = ('1GYDB', '1GYHD', '1WL7A'),
sequence = 'sequence')
a.starting_model= 1
a.ending_model = 1
a.make()
And the model is actually calculated with the following:
$ mod9v7 model-default.py
In case of error, check out the file model-default.log that is produced in the current directory.
Alignment of protein sequences based on 3D structure (STAMP)
We need to create a file called (for instance) domains.data referencing the domains of our template proteins plus the model we've just calculated in the previous step:
./1GYDB.pdb 1GYDB {all}
./1GYHD.pdb 1GYHD {all}
./1WL7A.pdb 1WL7A {all}
./sequence.B99990001.pdb mod1 {all}
Then we run stamp to superpose our model with the template proteins:
$ stamp -n 2 -rough -prefix sequence -f domains.dat
$ transform -f sequence.3 -g -o sequence.3.pdb
Nota Bene: with transform use the file sequence.[number] with the highest [number] generated by stamp (it depends on the set of sequences you used in model-default.py)
Here is a graphical rendering with Jmol of the superposition (you can move the model using drag n' drop, or use Jmol contextual menu with right click):