tag:blogger.com,1999:blog-20123994292999358242024-03-13T13:06:03.155-07:00BiotechiesPascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.comBlogger19125tag:blogger.com,1999:blog-2012399429299935824.post-26372033893124980802012-03-13T14:36:00.000-07:002012-03-13T14:43:55.589-07:00Biotech companies map - Denmark and Sweden<div dir="ltr" style="text-align: left;" trbidi="on">
A Google map displaying Biotech companies located in Sweden and in Denmark:<br />
<br />
<center style="text-align: left;"><small><span style="color: red; font-size: small;">Thank you very much to <a href="https://twitter.com/#!/TommyCarstensen">@TommyCarstensen</a> for his great help and advices in selecting the companies displayed on this map!</span></small></center><br />
<center><iframe frameborder="0" height="350" marginheight="0" marginwidth="0" scrolling="no" src="http://maps.google.com/maps/ms?msa=0&msid=202121346235939095426.0004bb26790b2af926481&ie=UTF8&t=m&ll=57.06463,11.689453&spn=4.182511,10.986328&z=6&output=embed" width="500"></iframe><br /><small>View <a href="http://maps.google.com/maps/ms?msa=0&msid=202121346235939095426.0004bb26790b2af926481&ie=UTF8&t=m&ll=57.06463,11.689453&spn=4.182511,10.986328&z=6&source=embed" style="color: blue; text-align: left;">Biotech Companies 2012 - Denmark and Sweden</a> in a larger map</small><br />
<small>Use <a href="https://docs.google.com/spreadsheet/viewform?formkey=dEpHZE5SZnFYc2JubldfUVNmR21abEE6MQ" target="">this form</a> if you want to suggest a correction to your company information.</small>
</center><center style="text-align: left;"><span style="font-size: x-small;"><br /></span></center><center style="text-align: left;"><small><b><span style="font-size: small;">Legend:</span></b></small></center>
<br />
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIoSLSaLFg-ftSxHILQ0xkEhMDEzsGiKbaeC0TTcxJldPRCsd0Xlj4iDHC0lprgyPovkJnhipuFEynue_7BrXdrng9FjP44pgA4sfqrhM_b9gGU8u9eDMnRzJQou4RiClDWQ0r5saRn1Q/s1600/legend_popularity_markers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIoSLSaLFg-ftSxHILQ0xkEhMDEzsGiKbaeC0TTcxJldPRCsd0Xlj4iDHC0lprgyPovkJnhipuFEynue_7BrXdrng9FjP44pgA4sfqrhM_b9gGU8u9eDMnRzJQou4RiClDWQ0r5saRn1Q/s1600/legend_popularity_markers.png" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<i><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOT5dhvqYBGKGIbVuOdzgMKKhy-d9DuTdW0vjNstFhiIO7aC7aWvx3JdvCDjGj6shAhYUzDxK_kyrGhiUusgEpOrJhS2wsV2a2oHVKSFQpraSra7m73EGLrCdISB4JvLB9-5Jyshk-yYg/s1600/red-dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOT5dhvqYBGKGIbVuOdzgMKKhy-d9DuTdW0vjNstFhiIO7aC7aWvx3JdvCDjGj6shAhYUzDxK_kyrGhiUusgEpOrJhS2wsV2a2oHVKSFQpraSra7m73EGLrCdISB4JvLB9-5Jyshk-yYg/s1600/red-dot.png" /></a>the marker shows the precise location of the company</i><i> </i></div>
<div class="separator" style="clear: both; text-align: left;">
<i><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS6ozKHTLHpebCWQkaM8IQmJZVIeU7ELzOx4oL1bYylsX-XLoMwGpT5IK6HLe4EQPTNwptVhMwNpNkoFzLjXHkkeoO2E002BdRFvCsfpHwfU6zrQwUi3Wh6bM0Z_0wkxHyUnea4DQxZng/s1600/red.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS6ozKHTLHpebCWQkaM8IQmJZVIeU7ELzOx4oL1bYylsX-XLoMwGpT5IK6HLe4EQPTNwptVhMwNpNkoFzLjXHkkeoO2E002BdRFvCsfpHwfU6zrQwUi3Wh6bM0Z_0wkxHyUnea4DQxZng/s1600/red.png" /></a>the marker shows the location <u>area</u> of the company</i></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div style="text-align: left;">
<b><br /></b></div>
</div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-49212505378550976712012-03-04T10:26:00.000-08:002012-03-04T10:26:38.700-08:00Biotech Companies map - Germany<div dir="ltr" style="text-align: left;" trbidi="on">
A Google map displaying Biotech companies located in Germany (see also <a href="http://biotechies.blogspot.com/2012/02/view-test-in-larger-map.html">Biotechs in Spain</a> and <a href="http://biotechies.blogspot.com/2012/02/biotech-companies-map-france.html">Biotechs in France</a> maps):<br />
<br />
<center>
<iframe frameborder="0" height="350" marginheight="0" marginwidth="0" scrolling="no" src="http://maps.google.com/maps/ms?msa=0&msid=202121346235939095426.0004ba6e6e70483fa6096&ie=UTF8&t=m&ll=50.912384,10.041057&spn=6.520831,7.909438&output=embed" width="500"></iframe><br />
<small>View <a href="http://maps.google.com/maps/ms?msa=0&msid=202121346235939095426.0004ba6e6e70483fa6096&ie=UTF8&t=m&ll=50.912384,10.041057&spn=6.520831,7.909438&source=embed" style="color: blue; text-align: left;">Biotech Companies 2012 - Germany</a> in a larger map</small>
</center>
<br />
<div style="text-align: center;">
<small>Use <a href="https://docs.google.com/spreadsheet/viewform?formkey=dEpHZE5SZnFYc2JubldfUVNmR21abEE6MQ" target="">this form</a> if you want to suggest a correction to your company information.</small><br />
<small><br /></small><br />
<div style="text-align: left;">
<b>Legend:</b></div>
<div style="text-align: left;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIoSLSaLFg-ftSxHILQ0xkEhMDEzsGiKbaeC0TTcxJldPRCsd0Xlj4iDHC0lprgyPovkJnhipuFEynue_7BrXdrng9FjP44pgA4sfqrhM_b9gGU8u9eDMnRzJQou4RiClDWQ0r5saRn1Q/s1600/legend_popularity_markers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIoSLSaLFg-ftSxHILQ0xkEhMDEzsGiKbaeC0TTcxJldPRCsd0Xlj4iDHC0lprgyPovkJnhipuFEynue_7BrXdrng9FjP44pgA4sfqrhM_b9gGU8u9eDMnRzJQou4RiClDWQ0r5saRn1Q/s1600/legend_popularity_markers.png" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<i><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOT5dhvqYBGKGIbVuOdzgMKKhy-d9DuTdW0vjNstFhiIO7aC7aWvx3JdvCDjGj6shAhYUzDxK_kyrGhiUusgEpOrJhS2wsV2a2oHVKSFQpraSra7m73EGLrCdISB4JvLB9-5Jyshk-yYg/s1600/red-dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOT5dhvqYBGKGIbVuOdzgMKKhy-d9DuTdW0vjNstFhiIO7aC7aWvx3JdvCDjGj6shAhYUzDxK_kyrGhiUusgEpOrJhS2wsV2a2oHVKSFQpraSra7m73EGLrCdISB4JvLB9-5Jyshk-yYg/s1600/red-dot.png" /></a>the marker shows the precise location of the company</i><i> </i></div>
<div class="separator" style="clear: both; text-align: left;">
<i><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS6ozKHTLHpebCWQkaM8IQmJZVIeU7ELzOx4oL1bYylsX-XLoMwGpT5IK6HLe4EQPTNwptVhMwNpNkoFzLjXHkkeoO2E002BdRFvCsfpHwfU6zrQwUi3Wh6bM0Z_0wkxHyUnea4DQxZng/s1600/red.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS6ozKHTLHpebCWQkaM8IQmJZVIeU7ELzOx4oL1bYylsX-XLoMwGpT5IK6HLe4EQPTNwptVhMwNpNkoFzLjXHkkeoO2E002BdRFvCsfpHwfU6zrQwUi3Wh6bM0Z_0wkxHyUnea4DQxZng/s1600/red.png" /></a>the marker shows the location <u>area</u> of the company</i></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div style="text-align: left;">
<b><br /></b></div>
</div>
</div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-1394357131688826052012-02-19T07:04:00.000-08:002012-02-26T07:30:42.552-08:00Biotech Companies map - France<div dir="ltr" style="text-align: left;" trbidi="on">
Here is a new map displaying Biotech companies located in France, following the previous post about <a href="http://biotechies.blogspot.com/2012/02/view-test-in-larger-map.html">Spanish Biotech companies</a>.<br />
<br />
<center>
<iframe frameborder="0" height="400" marginheight="0" marginwidth="0" scrolling="no" src="http://maps.google.com/maps/ms?msa=0&msid=202121346235939095426.0004b951dbc46f6a47df3&ie=UTF8&t=m&ll=46.55886,2.197266&spn=12.089284,21.928711&z=5&output=embed" width="500"></iframe></center><br />
<div style="text-align: center;">
<small>View <a href="http://maps.google.com/maps/ms?msa=0&msid=202121346235939095426.0004b951dbc46f6a47df3&ie=UTF8&t=m&ll=46.55886,2.197266&spn=12.089284,21.928711&z=5&source=embed" style="color: blue; text-align: left;">Biotech Companies 2012 - France</a> in a larger map.<br />
Use <a href="https://docs.google.com/spreadsheet/viewform?formkey=dEpHZE5SZnFYc2JubldfUVNmR21abEE6MQ" target="">this form</a> if you want to suggest a correction to your company information.</small><br />
<small><br /></small><br />
<div style="text-align: left;">
<b>Legend:</b></div>
<div style="text-align: left;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIoSLSaLFg-ftSxHILQ0xkEhMDEzsGiKbaeC0TTcxJldPRCsd0Xlj4iDHC0lprgyPovkJnhipuFEynue_7BrXdrng9FjP44pgA4sfqrhM_b9gGU8u9eDMnRzJQou4RiClDWQ0r5saRn1Q/s1600/legend_popularity_markers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIoSLSaLFg-ftSxHILQ0xkEhMDEzsGiKbaeC0TTcxJldPRCsd0Xlj4iDHC0lprgyPovkJnhipuFEynue_7BrXdrng9FjP44pgA4sfqrhM_b9gGU8u9eDMnRzJQou4RiClDWQ0r5saRn1Q/s1600/legend_popularity_markers.png" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<i><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOT5dhvqYBGKGIbVuOdzgMKKhy-d9DuTdW0vjNstFhiIO7aC7aWvx3JdvCDjGj6shAhYUzDxK_kyrGhiUusgEpOrJhS2wsV2a2oHVKSFQpraSra7m73EGLrCdISB4JvLB9-5Jyshk-yYg/s1600/red-dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOT5dhvqYBGKGIbVuOdzgMKKhy-d9DuTdW0vjNstFhiIO7aC7aWvx3JdvCDjGj6shAhYUzDxK_kyrGhiUusgEpOrJhS2wsV2a2oHVKSFQpraSra7m73EGLrCdISB4JvLB9-5Jyshk-yYg/s1600/red-dot.png" /></a>the marker shows the precise location of the company</i><i> </i></div>
<div class="separator" style="clear: both; text-align: left;">
<i><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS6ozKHTLHpebCWQkaM8IQmJZVIeU7ELzOx4oL1bYylsX-XLoMwGpT5IK6HLe4EQPTNwptVhMwNpNkoFzLjXHkkeoO2E002BdRFvCsfpHwfU6zrQwUi3Wh6bM0Z_0wkxHyUnea4DQxZng/s1600/red.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS6ozKHTLHpebCWQkaM8IQmJZVIeU7ELzOx4oL1bYylsX-XLoMwGpT5IK6HLe4EQPTNwptVhMwNpNkoFzLjXHkkeoO2E002BdRFvCsfpHwfU6zrQwUi3Wh6bM0Z_0wkxHyUnea4DQxZng/s1600/red.png" /></a>the marker shows the location <u>area</u> of the company</i></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div style="text-align: left;">
<b><br /></b></div>
</div>
</div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-32305787002427240472012-02-16T13:56:00.001-08:002012-02-20T06:17:58.831-08:00Biotechs companies map - Spain<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: center;">
<div style="text-align: left;">
This is a first step in building a map of European Biotech companies. Here we start with Spanish companies only. The info box (when you click a marker) should display the name, web, employees and location of the company. </div>
<br />
<center>
<iframe frameborder="0" height="400" marginheight="0" marginwidth="0" scrolling="no" src="http://maps.google.com/maps/ms?msa=0&msid=202121346235939095426.0004b951e0b2462b7e47f&ie=UTF8&t=m&ll=40.513799,-4.702148&spn=13.355432,22.016602&z=5&output=embed" width="500"></iframe><br /><small>View <a href="http://maps.google.com/maps/ms?msa=0&msid=202121346235939095426.0004b951e0b2462b7e47f&ie=UTF8&t=m&ll=40.513799,-4.702148&spn=13.355432,22.016602&z=5&source=embed" style="color: blue; text-align: left;">Biotech Companies 2012 - Spain</a> in a larger map.<br />Use <a href="https://docs.google.com/spreadsheet/viewform?formkey=dEpHZE5SZnFYc2JubldfUVNmR21abEE6MQ">this form</a> if you want to suggest a correction to your company information</small>
</center>
</div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
<b><u>LEGEND:</u></b></div>
<div class="separator" style="clear: both; text-align: left;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCxTtNHlzeherhNRT0Pd65SKat-YI2Fn6S6trkibcxGC2ZAXz1QUrltSsWWd2lBXGNu-Wxr_KVfsEdPXe7ely23QMXFtiBhGnLKff609E8FYNs73AHdlKytF_GF4tiUjEfut6OYxmBGwU/s1600/legend_popularity_markers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCxTtNHlzeherhNRT0Pd65SKat-YI2Fn6S6trkibcxGC2ZAXz1QUrltSsWWd2lBXGNu-Wxr_KVfsEdPXe7ely23QMXFtiBhGnLKff609E8FYNs73AHdlKytF_GF4tiUjEfut6OYxmBGwU/s1600/legend_popularity_markers.png" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: left;">
<i><a href="http://maps.google.com/mapfiles/ms/micons/red-dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://maps.google.com/mapfiles/ms/micons/red-dot.png" /></a>the markers shows the exact location of the company,</i></div>
<div class="separator" style="clear: both; text-align: left;">
<i><a href="http://maps.google.com/mapfiles/ms/micons/red.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://maps.google.com/mapfiles/ms/micons/red.png" /></a>the marker shows the <u>area</u> of the company</i></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: left;">
<br /></div>
</div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-70491143443387629372012-01-19T00:21:00.000-08:002012-02-16T13:57:49.294-08:00Genomic Structural Variation: Methods & Protocols<div dir="ltr" style="text-align: left;" trbidi="on">
Book of the month about Structural Variation:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.springerlink.com/content/978-1-61779-506-0/"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjLq7itc__OqM9BWwQMVv8rtZDxTSuWDcnB0GjiXx3NhGgyDZPQTbbFgqBoiNGHPRk6I9ye53kgI5aMtRr2gTv7j3kKfzC-PTYoX819p3sK1qqtGSpiIHZKMx1uYuCHBA5-kttUI6-mt4/s1600/cover-trimmed-210x300.jpg" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.springerlink.com/content/978-1-61779-506-0/">Springer book entry</a></div>
<br />
<b><a href="http://www.springerlink.com/content/978-1-61779-506-0/front-matter.pdf">Content:</a></b><br />
<i>1. What Have Studies of Genomic Disorders Taught Us About Our Genome?</i><br />
<i>2. Microdeletion and Microduplication Syndromes</i><br />
<i>3. Structural Genomic Variation in Intellectual Disability</i><br />
<i>4. Copy Number Variation and Psychiatric Disease Risk</i><br />
<i>5. Detection and Characterization of Copy Number Variation in Autism Spectrum Disorder</i><br />
<i>6. Structural Variation in Subtelomeres</i><br />
<i>7. Array-Based Approaches in Prenatal Diagnosis</i><br />
<i>8. Structural Variation and Its Effect on Expression</i><br />
<i>9. The Challenges of Studying Complex and Dynamic Regions of the Human Genome</i><br />
<i>10. Population Genetic Nature of Copy Number Variation</i><br />
<i>11. Detection and Interpretation of Genomic Structural Variation in Mammals</i><br />
<i>12. Structural Genetic Variation in the Context of Somatic Mosaicism</i><br />
<i>13. Online Resources for Genomic Structural Variation</i><br />
<i>14. Algorithm Implementation for CNV Discovery Using Affymetrix and Illumina SNP Array Data</i><br />
<i>15. Targeted Screening and Validation of Copy Number Variations</i><br />
<i>16. High-Resolution Copy Number Profiling by Array CGH Using DNA Isolated from Formalin-Fixed, Paraffin-Embedded Tissues</i><br />
<i>17. Characterizing and Interpreting Genetic Variation from Personal Genome Sequencing</i><br />
<i>18. Massively Parallel Sequencing Approaches for Characterization of Structural Variation</i><br />
<div style="font-weight: bold;">
<br /></div>
</div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-81855883350485865242011-12-08T09:17:00.001-08:002012-01-16T03:09:36.022-08:00Pipelines to simulate and detect indels<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
This post describes a pipeline to simulate short reads using <a href="https://github.com/lh3/wgsim">wgsim</a>, align the produce short reads using <a href="http://bio-bwa.sourceforge.net/">BWA</a> or <a href="http://bowtie-bio.sourceforge.net/bowtie2/index.shtml">Bowtie2</a> and finally detects indels using <a href="http://www.broadinstitute.org/gsa/wiki/index.php/Unified_genotyper">GATK UnifiedGenotyper</a> (Broad Institute) or <a href="http://www.sanger.ac.uk/resources/software/dindel/">Dindel</a> (Welcome Trust Sanger Institute).<br />
<br />
<span class="Apple-style-span" style="font-size: large;"><b>Step 1 - Get a reference genome</b></span><br />
<div>
<br />
I chose to get the human reference genome v37 from <i>1000 Genomes Project: </i><a href="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz">download link to human_g1k_v37.fasta.gz file</a> (851MB). You can extract the chromosome of your choice (to avoid working on the whole genome) with samtools:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">samtools faidx human_g1k_v37.fasta 20 > human_g1k_v37_chr20.fasta</span>
<br />
<br />
<br />
<b><span class="Apple-style-span" style="font-size: large;">Step 2 - Simulate short reads</span></b><br />
<div>
<br />
There are several short reads simulators (e.g. <a href="http://maq.sourceforge.net/">MAQ</a>, <a href="https://github.com/jstjohn/SimSeq">SimSeq</a>, <a href="http://flux.sammeth.net/simulator.html">FluxSimulator</a>). Here we are going to use wgsim that was once included in samtools and later extracted as a standalone project.</div>
<div>
<br /></div>
<div>
Wgsim project homepage: <a href="https://github.com/lh3/wgsim">https://github.com/lh3/wgsim</a></div>
<div>
<br />
Before running the simulator we need to calculate the number of reads we need. Remember <a href="http://en.wikipedia.org/wiki/Shotgun_sequencing#Coverage">the formula to calculate the coverage</a>:<br />
<br />
<b>Coverage = N x L / G
</b><br />
<div style="text-align: center;">
<div style="text-align: left;">
<i>(where N: number of reads - L: read length - G: genome size)</i></div>
</div>
<div style="text-align: center;">
<i><br /></i></div>
To detect indels we aim to have a "good" coverage of the genome, let's aim to get a coverage around 20. Then, if we produce 70-bp reads on human chromosome 20 (<a href="http://en.wikipedia.org/wiki/Chromosome_20_(human)">63Mbp long</a>), it is easy with the formula above to find that we need at least 18.000.000 reads to achieve this coverage.<br />
<br /></div>
<div>
Once installed one run wgsim with the following command to generate the two files of paired reads:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">wgsim -N 20000000 -X 0.95 human_g1k_v37_chr20.fasta out.read1.fq out.read2.fq > wgsim.out</span>
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';"><br /></span><br />
Note the use of the -X option to increase the probability to extend an indel. The file wgsim.out will contain the indels generated where the column 1 is the chromosome, column 2 is the position, column 3 is the original base, column 4 is the new base (following the <a href="http://www.bioinformatics.org/sms/iupac.html">IUPAC codes</a>) and column 5 is genomic copy/haplotype. The files out.read1.fq and out.read2.fq will contains the two reads of the paired reads (remember we are working with PEM here).</div>
<div>
<br /></div>
<div>
A limitation of wgsim is that it generates bases with a fixed quality of Q17 (encoded as "2" in fastq file) and the UnifiedGenotyper to be used below throws away everything that is below Q20. Hence we can reprocess the fastq files to uniformly increase base qualities with a awk command (Credit: <a href="http://biostar.stackexchange.com/questions/15205/how-to-control-wgsim-base-qualities">biostar contribution</a>):<br />
<br /></div>
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">
awk ' /222/ { gsub("2", "I"); print $0; next } { print } ' out.read1.fq > out.read1b.fq<br />
awk ' /222/ { gsub("2", "I"); print $0; next } { print } ' out.read2.fq > out.read2b.fq
</span>
<br />
<br />
<b><span class="Apple-style-span" style="font-size: large;">Step 3 - Aligning short reads</span></b><br />
Whatever the alignment method you choose, be sure to apply "Post-processing of alignment file" before going ahead with next step.<br />
<br />
<b><i>3.1. Aligning using BWA</i></b><br />
Please consult <a href="http://bio-bwa.sourceforge.net/bwa.shtml">BWA Manual Reference pages</a> to get more details on the commands used below to <i>index database sequences in the FASTA format </i>(index)<i>, Find the SA coordinates of the input reads</i> (aln), and finally <i>generate alignments in the SAM format given paired-end reads</i> (sampe):<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">bwa index human_g1k_v37_chr20.fasta</span>
<br />
<br />
<i><b>Important:</b> the default BWA algorithm for constructing index, IS, is fast but will not work on database (fasta file) larger than 2GB. For larger database, use the option "-a bwtsw". </i><br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">bwa aln human_g1k_v37_chr20.fasta out.read1b.fq -f out1.sai<br />
bwa aln human_g1k_v37_chr20.fasta out.read2b.fq -f out2.sai
</span>
<br />
<i><b><br /></b></i><br />
<i><b>Note:</b> if your computer has several CPUs, you probably want to use them all through the "-t" option (see <a href="http://bio-bwa.sourceforge.net/bwa.shtml">BWA Manual Reference pages</a>). </i><br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">bwa sampe -f out.sam -r '@RG\tID:foo\tSM:bar' human_g1k_v37_chr20.fasta out1.sai out2.sai out.read1b.fq out.read2b.fq</span><br />
<span style="background-color: #f3f3f3; font-family: 'courier new';">samtools view -hbS out.sam -o out.bam </span><br />
<div style="text-align: -webkit-auto;">
<br />
<i><b>Note:</b> </i>
<i>as an optimization the command "bowtie2..." could be pipedto the "samtools view..." command in order to generate directly the binary version of the alignment file (.bam instead of .sam) and avoid the creation of an intermediary .sam file.</i><i> This is an optimization proposed by <a href="http://seqanswers.com/forums/showthread.php?p=59227&posted=1#post59227">swbarnes2 in SEQanswers</a>.</i><br />
<br />
<i><b>Note:</b> as suggested by <a href="http://seqanswers.com/forums/showthread.php?p=59334#post59334">Heisman in SEQanswers</a>, the previous command may also be optimized using the optional parameter -u to request samtools not to compress the .BAM file as this will be done by the "samtools sort" command below.</i><br />
<br /></div>
<b><i>3.2. Aligning using Bowtie</i></b><br />
As an alternative to BWA, we can use Bowtie 2 to align short reads to the genome. First, call bowtie-build to index our reference genome:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">bowtie2-build human_g1k_v37_chr20.fasta homo_chr20</span>
<br />
<br />
Then the call to actually create the alignment:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">bowtie2 -t homo_chr20 -X 700 -1 out.read1b.fq -2 out.read2b.fq -S out.sam</span><br />
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">samtools view -hbS out.sam -o out.bam</span><br />
<br />
<i style="text-align: -webkit-auto;"><b>Note:</b> as an optimization the command "bowtie2..." could be pipedto the "samtools view..." command in order to generate directly the binary version of the alignment file (.bam instead of .sam) and avoid the creation of an intermediary .sam file. This is an optimization proposed by <a href="http://seqanswers.com/forums/showthread.php?p=59227&posted=1#post59227">swbarnes2 in SEQanswers</a>.</i><br />
<br />
<div style="text-align: -webkit-auto;">
<i><b>Note:</b> as suggested by <a href="http://seqanswers.com/forums/showthread.php?p=59334#post59334">Heisman in SEQanswers</a>, the previous command may also be optimized using the optional parameter -u to request samtools not to compress the .BAM file as this will be done by the "samtools sort" command below.</i></div>
<div>
<i><br /></i></div>
<br />
<i><b>3.3. Post-processing of alignment file</b></i><br />
Before calling indels, the alignment file will be converted in binary format, sorted then indexed using samtools:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">samtools sort out.bam out_sorted<br />
samtools index out_sorted.bam</span>
<br />
<div style="text-align: -webkit-auto;">
</div>
<br />
<br />
<br />
<span class="Apple-style-span" style="font-size: large;"><b>Step 4 - Calling indels</b></span><br />
<br />
<b><i>4.1. Calling indels with GATK UnifiedGenotyper</i></b><br />
<br />
<div style="text-align: -webkit-auto;">
<i><b>Important:</b> running the local realignment as described <a href="http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels">here</a> is highly recommended to improve accuracy prior to run UnifiedGenotyper.</i></div>
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">java -jar /home/pmaugeri/tools/GenomeAnalysisTK-1.3-21-gcb284ee/GenomeAnalysisTK.jar -R human_g1k_v37_chr20.fasta -T UnifiedGenotyper -I out_sorted.bam -o gatk_indels.vcf -glm INDEL</span>
<br />
<div style="text-align: -webkit-auto;">
<br />
Indels called by the UnifiedGenotyper can be found in the file specified by the -o option (file "gatk_pipeline_indels.vcf").<br />
<br />
Here's an example of a VCF file generated with GATK with this pipeline: <a href="http://dl.dropbox.com/u/53650620/blog/biotechies/gatk.vcf">gatk.vcf</a> (~2MB)<br />
<br /></div>
<b><i>4.2. Calling indels with Dindel</i></b><br />
Calling indels using Sanger software is done in 4 steps where the first and the third are the longest ones:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">dindel --ref human_g1k_v37_chr20.fasta --outputFile 1 --bamFile out_sorted.bam --analysis getCIGARindels</span>
<br />
<div style="text-align: -webkit-auto;">
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000</span>
<br />
<div>
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">dindel --analysis indels --doDiploid --bamFile out_sorted.bam --ref human_g1k_v37_chr20.fasta --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err</span>
<br />
<div>
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">echo 3.glf.txt > 3.list<br />
python mergeOutputDiploid.py -i 3.list -o dindel_indels.vcf -r human_g1k_v37_chr20.fasta</span>
<br />
<div>
<br />
Indels called by Dindel are found in file specified by -o option.<br />
<br />
Here's an example of a VCF file generated with Dindel with this pipeline: <a href="http://dl.dropbox.com/u/53650620/blog/biotechies/dindel.vcf">dindel.vcf</a> (~300KB)<br />
<br />
<br />
<b>Thanks a lot to Alex Renwick, swbarnes2, Heisman and vyellapa for their great suggestions in <a href="http://seqanswers.com/forums/showthread.php?p=59227&posted=1#post59227">SEQanswers forum</a> to improve this post.</b><br />
<b><br /></b><br />
<b><br /></b><br />
<b><br /></b></div>
</div>
</div>
</div>
</div>
</div>
</div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-24980331807739933242011-11-29T03:16:00.001-08:002011-12-09T08:39:12.177-08:00Genome Structural Variation on simulated genome<div dir="ltr" style="text-align: left;" trbidi="on">
Here's a brief <i>memo</i> to simulate a read alignment (and indels) based on a human genome chromosome and detect Structural Variations.<br />
<b><br /></b><br />
<b>Step 1 - Get a reference genome</b><br />
<div>
I chose to get the human reference genome v37 from <i>1000 Genomes Project: </i><a href="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz">download link to human_g1k_v37.fasta.gz file</a> (851MB).</div>
<div>
<br /></div>
<div>
You can extract the chromosome of your choice (to avoid working on the whole genome) with samtools:</div>
<div>
<br /></div>
<div>
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'courier new';">samtools faidx human_g1k_v37.fasta 20 > human_g1k_v37_chr20.fasta</span></div>
<div>
<br /></div>
<div>
<b>Step 2 - Simulate reads and indels</b></div>
<div>
There are several short reads simulators (e.g. <a href="https://github.com/jstjohn/SimSeq">SimSeq</a>, <a href="http://flux.sammeth.net/simulator.html">FluxSimulator</a>). Here we are going to use wgsim that was once included in samtools and later extracted as a standalone project.</div>
<div>
<br /></div>
<div>
Wgsim project homepage: <a href="https://github.com/lh3/wgsim">https://github.com/lh3/wgsim</a></div>
<div>
<br />
Before running the simulator we need to calculate the number of reads we need. Remember <a href="http://en.wikipedia.org/wiki/Shotgun_sequencing#Coverage">the formula to calculate the coverage</a>:<br />
<br />
<div style="text-align: center;">
<i style="background-color: yellow;">Coverage = N x L / G</i></div>
<div style="text-align: center;">
<i>(where N: number of reads - L: read length - G: genome size)</i></div>
<div style="text-align: center;">
<i><br /></i></div>
To detect indels we aim to have a "good" coverage of the genome, let's aim to get a coverage around 20. Then, if we produce 70-bp reads on human chromosome 20 (<a href="http://en.wikipedia.org/wiki/Chromosome_20_(human)">63Mbp long</a>), it is easy with the formula above to find that we need at least 18.000.000 reads to achieve this coverage.<br />
<br /></div>
<div>
Once installed you run wgsim with the following command to generate the two files of paired reads:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">wgsim -N 20000000 -X 0.95 human_g1k_v37_chr20.fasta out.read1.fq out.read2.fq > wgsim.out</span><br />
<br />
Note the use of the -X option to increase the probability to extend an indel. The file wgsim.out will contain the indels generated where the column 1 is the chromosome, column 2 is the position, column 3 is the original base, column 4 is the new base (following the <a href="http://www.bioinformatics.org/sms/iupac.html">IUPAC codes</a>) and column 5 is genomic copy/haplotype. The files out.read1.fq and out.read2.fq will contains the two reads of the paired reads (remember we are working with PEM here).<br />
<br />
<b>Step 3 - create the alignment file</b><br />
With the paired reads files created in previous files we can align these reads against the reference genome and create an alignment file.<br />
<br />
Again here, several tools are available and we will use <a href="http://bowtie-bio.sourceforge.net/bowtie2/index.shtml">bowtie2</a>.<br />
<br />
First, call bowtie-build to index our reference genome<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">bowtie2-build human_g1k_v37_chr20.fasta homo_chr20</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br /></span><br />
Then the call to actually create the alignment:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">bowtie2 -t homo_chr20 -X 700 -1 out.read1.fq -2 out.read2.fq -S homo_chr20.sam</span><br />
<br />
Convert the alignment file to a binary alignment file (.SAM -> .BAM), sort and index using samtools:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">samtools view -bS homo_chr20.sam > homo_chr20.bam</span><br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">samtools sort homo_chr20.bam homo_chr20_sorted</span><br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">samtools index homo_chr20_sorted.bam</span><br />
<br />
<b>Step 4 - detect Structural Variants</b><br />
The following command calls shows how to detect indels generated in step 1:<br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">dindel --ref human_g1k_v37_chr20.fasta --outputFile 1 --bamFile homo_chr20_sorted.bam --analysis getCIGARindels</span></span><br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000</span></span><br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">dindel --analysis indels --doDiploid --bamFile homo_chr20_sorted.bam --ref human_g1k_v37_chr20.fasta --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err</span><br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">echo 3.glf.txt > 3.list</span><br />
<br />
<span class="Apple-style-span" style="background-color: #f3f3f3;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">python mergeOutputDiploid.py -i 3.list -o 4.vcf -r </span></span><span class="Apple-style-span" style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace;">human_g1k_v37_chr20.fasta</span><br />
<br />
<br /></div>
</div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com2tag:blogger.com,1999:blog-2012399429299935824.post-7569074008028219722011-04-17T13:33:00.001-07:002011-04-26T13:10:49.346-07:00Using XSL to filter biomedical data<div><div>This post gives a short introduction to XSLT (for Extended Stylesheet Language Transformation) that will be used to transform an XML file into <i>something else</i>. This <i>something else</i> could be another XML file with a different structure, a CSV file or any other text content. </div><div><br /></div><div>This is a very tiny tutorial intended to help you in setting your environment and doing a first transformation using biomedical data, so I won't go into details of the language nor try to provide an exhaustive list of available operations, there is already plenty of good literature and reference on this subject (see last paragraph "Further reading").</div><div><br /></div><div>I used this technology some years ago to dynamically transform a stream of data into either a HTML page or a WAP page (for mobile phone internet browser) depending on the client being detected. But here we are going to use XSL for handling and transforming a XML document resulting from a search in UniProt (list of proteins).</div><div><br /></div><div><b>Objective:</b> let's say we want to transform the XML document downloaded from UnitProt into a nicer CSV file with two columns: the protein name and the gene name. </div><div><br /></div><div>You will see that the corresponding filter could be easily adapted to your needs or to any other biomedical source of data (MeSH, Drugbank, etc.).</div><div><br /></div><b>Requirements/Environment</b><br />In order to run XSL in your computer you need the following software installed and properly configured:</div><div><ul><li>Java 1.6 or higher</li><li>Download and install <a href="http://xml.apache.org/xalan-j/downloads.html#latest-release">xalan-j_2_7_1-bin</a></li><li>once you have Xalan installed, define the environment variable <i>XALAN_HOME</i> that will point to Xalan root folder (e.g. <span class="Apple-style-span" style="font-family: Verdana; font-size: 13px; ">XALAN_HOME=C:\java\xalan-j_2_7_1)</span></li><li>update your Java classpath, it should at least include:<br /><br />CLASSPATH=%XALAN_HOME%\xalan.jar;%XALAN_HOME%\xalan.jar;%XALAN_HOME%\serializer.jar;%XALAN_HOME%\xml-apis.jar;%XALAN_HOME%\xercesImpl.jar</li></ul><div><b>Input data</b></div><div>We will use the XML data produced by UniProtKB when searching for proteins related to organism "Moloney murine leukemia virus". <u>This is random example</u> to demonstrate the technique on a small XML file, but of course the same could be applied on different data.</div><div><br /></div><div>Follow these steps to get the XML file:</div><div><ol><li>go to <a href="http://www.uniprot.org/">http://www.uniprot.org</a></li><li>run a search using the query:<br />organism:"Moloney murine leukemia virus"</li><li>in the search result page click the Download link</li><li>download the file "Complete Data in XML format" and save it using the filename uniprot.xml</li></ol><div>So we have a XML file to play with. You may open it in a text editor to understand the structure of the XML content before doing the XSL script. Note that Uniprot returned around 22 proteins, so you should find in the XML file 22 <entry/> blocks </div></div><div><b><br /></b></div><div><b>Write your XSL script</b></div></div><div>We will write a simple XSL script that will output 2 columns of data: a column with the "Protein fullname" and a column with the "Gene name". Both columns will be separated with a TAB character.</div><div><br /></div><div>Open a text editor and create a file uniprot.xsl.</div><div><br /></div><div>A XSL script is itself an XML file! So it should logically start by the header:</div><div><span class="Apple-tab-span" style="white-space:pre"><br /></span></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><?xml version="1.0" encoding="ISO-8859-1"?></div><div><span class="Apple-tab-span" style="white-space:pre"><br /></span></div><div>Then the whole XML document should be written between the "stylesheet" tag:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span></div><div><span class="Apple-tab-span" style="white-space:pre"><span class="Apple-tab-span" style="white-space:pre"> </span></span><span class="Apple-style-span" style="white-space: pre; "><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"></span></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>...</div><div><span class="Apple-tab-span" style="white-space:pre"> </xsl:stylesheet></span></div><div><br /></div><div>The next line will say to the XSL interpreter to match the root of the XML document:</div><div><br /></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-style-span" style="white-space: pre;"><xsl:template match="/"></span></div><div><br /></div><div>Then this prints the column header of our CSV file (a TAB between two headers and a line feed character at the end):</div><div><span class="Apple-style-span" style="white-space: pre;"><br /></span></div><quote><div><span class="Apple-tab-span" style="white-space:pre"> </span><!-- Print columns headers --></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>Accession</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>&#x9;</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>Gene name</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>&#10;</xsl:text></div><div><br /></div></quote><div>The following block is a bit more sophisticated. It is a loop on each "entry" tag of your XML file. For each entry it will print the full name of the protein (follow the XML path), and the gene name:</div><div><br /></div><div><div><div><span class="Apple-style-span" style="white-space: pre;"> <xsl:for-each select="uniprot/entry"></span></div><div><span class="Apple-style-span" style="white-space: pre;"> <xsl:value-of select="accession"/></span></div><div><span class="Apple-style-span" style="white-space: pre;"> <xsl:text>&#x9;</xsl:text> </span></div><div><span class="Apple-style-span" style="white-space: pre;"> <xsl:value-of select="gene/name"/></span></div><div><span class="Apple-style-span" style="white-space: pre;"> <xsl:text>&#x9;</xsl:text></span></div><div><span class="Apple-style-span" style="white-space: pre;"> <xsl:text>&#10;</xsl:text></span></div><div><span class="Apple-style-span" style="white-space: pre;"> </xsl:for-each> </span></div></div></div><div><span class="Apple-style-span" style="white-space: pre;"><br /></span></div><div>For your convenience here is the complete XSL file content:</div><div><br /></div><div><div><div><?xml version="1.0" encoding="ISO-8859-1"?></div><div><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:template match="/"></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><!-- Print columns headers --></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>Accession</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>&#x9;</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>Gene name</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>&#10;</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:for-each select="uniprot/entry"></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:value-of select="accession"/></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>&#x9;</xsl:text><span class="Apple-tab-span" style="white-space:pre"> </span></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:value-of select="gene/name"/></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>&#x9;</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span><xsl:text>&#10;</xsl:text></div><div><span class="Apple-tab-span" style="white-space:pre"> </span></xsl:for-each></div><div><span class="Apple-tab-span" style="white-space:pre"> </span></xsl:template><span class="Apple-tab-span" style="white-space:pre"> </span></div><div></xsl:stylesheet></div></div></div><div><br /></div><div><br /></div><div><b>Run your XSL script</b></div><div>Once you have your input XML and your XSL written, you are ready to try it. The following will parse the input XML file and output the desired data defined previously in CSV file that you may us in Excel:</div><div><br /></div><div><span class="Apple-style-span" style="white-space:pre"> java org.apache.xalan.xslt.Process -IN uniprot.xml -XSL uniprot.xsl -OUT uniprot.csv</span></div><div><span class="Apple-style-span" style="white-space:pre"><br /></span></div><div><span class="Apple-style-span" style="white-space:pre"><br /></span></div><div><span class="Apple-style-span" style="white-space:pre"><span class="Apple-style-span" style="white-space: normal; "><b>Tip:</b> remember that if you manipulate large XML file, your Java runtime may complain about the heap space. You can increase the heap space by adding the parameter -Xms<new max="" heap="" size=""> to the java command line.</new></span></span></div><div><span class="Apple-style-span" style="white-space:pre"><span class="Apple-style-span" style="white-space: normal; "><br /></span></span></div><div><b>Further reading:</b></div><div><ul><li><a href="http://www.w3schools.com/xsl/default.asp">http://www.w3schools.com/xsl/default.asp</a></li><li><a href="http://www.xml.com/pub/a/2000/08/holman/index.html">What is XSLT?</a></li></ul></div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-356195834052483102011-04-16T00:14:00.000-07:002011-04-16T04:34:58.187-07:00Development of Drugs in Public-Private Partnerships environmentsThis week we organized a debate about the <b>Development of Drugs in Public-Private Partnerships (PPP) environments</b>. It raised some passionate interventions by the participants.<div><br /></div><div>Here are the slides we used to introduce this subject. We are using <a href="http://en.wikipedia.org/wiki/Rare_disease">Orphan Diseases</a> as a potential field where PPP could be fruitful and we present the project <a href="http://www.grants4targets.com/scripts/pages/en/index.php">Grants4Targets</a> as an example of PPP.</div><div><br /></div><div>If you are looking for data for your market research on drug development I suggest you have a look to the <i>Reference</i>s slide (slide #18), it contains interesting materials.</div><center><br /><iframe src="https://docs.google.com/present/embed?id=dg2r6jh7_1313dr3dwqmb" frameborder="0" width="410" height="342"></iframe></center>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-3371224559667560352011-03-26T11:57:00.000-07:002011-04-02T13:18:50.968-07:00Recommended book<div>I have recently read "Molecular Modeling: Basic Principles and Applications". It gives a very good introduction for Structural Bioinformatics and also some bits of information for Molecular Simulation. It is a small, <b>light</b> and clearly written book.</div><div><div><br /></div></div><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 282px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSMI1ipDfIjHK0J-NgM6aM_tMJV2kY5w0hYmmN3epMGJ7JrQJg2Uy6Hv8CPn1_bCWO1zFPTBoXSbbBzsiXD1cRAkYIySsqqTGlTRw-aoA6kI9-JiUPs75bDvF-EGuEKSZQej1hYKapZyE/s400/3527315683.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5588465090303851282" /><div style="text-align: center;"><b>[ <a href="http://eu.wiley.com/WileyCDA/WileyTitle/productCd-3527315683.html">Wiley</a> | <a href="http://www.amazon.co.uk/Molecular-Modeling-Basic-Principles-Applications/dp/3527315683/ref=sr_1_1?ie=UTF8&s=books&qid=1301166522&sr=8-1">Amazon</a> ]</b></div><div><br /><div>The new chapter "Chemogenomic Approaches to Rational Drug Design" has been added to the third edition (I have actually read the second one).</div><div><br /><div>Table of contents:</div><div><div><ol><li><i>Introduction.</i></li><li><i>Small Molecules.</i></li><li><i>A Case Study for Small Molecule Modeling: Dopamine D3 Receptor Antagonists.</i></li><li><i>Introduction to Comparative Protein Modeling.</i></li><li><i>Virtual Screening and Docking.</i></li><li><i>Scope and Limits of Molecular Docking.</i></li><li><i>Chemogenomic Approaches to Rational Drug Design.</i></li><li><i>A Case Study for Protein Modeling: the Nuclear Hormone Receptor CAR as an example for Comparative Modeling and the Analysis of Protein-Ligand Complexes.</i></li></ol></div></div><div><br /></div><div><p style="font-family: Arial, Helvetica, sans-serif; font-size: small; "><br /></p></div></div></div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-80713382965158744862011-03-25T13:46:00.000-07:002011-04-16T01:04:42.007-07:00Structural introduction to HLA/MHCAn introduction to HLA protein group giving some nice pictures on structures of HLA/MHC.<div><br /><center><iframe src="https://docs.google.com/present/embed?id=dg2r6jh7_1262gt28r8ds" frameborder="0" width="410" height="342"></iframe></center><br /><div style="text-align: center;"><b><br /></b></div><div><div>Here are some pictures from the slideset:</div><div><div><br /></div><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhk1qvaTam4NkrZ7MZTlfQyWP04saJb_G3lFaImg7qaWEdDrBV2tPm9er8VG3eQqGAvNBXPSIi5mIYbj7MdDK9e1bc2wsgff48aWysSf5odY_plU3-49d1VIPH5eUDs1L2-2kGXdIMbCvs/s400/1HSA_peptide_binding_floor_from_top.png" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 344px;" border="0" alt="" id="BLOGGER_PHOTO_ID_5588122528284168882" /><div style="text-align: center;"><i>View of a peptide binding site of MHC Class-I (HLA-B27, PDB: 1HSA</i>)</div></div><div><br /></div><div><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijkjIZCymiKZSyv8WPrU0T1cWjwDKGKpGKjLQ915-tWBBL8vYZThhhlW15G5Xyj4s2TlSRP1ihTy5TYPi7Mj49Ag0KhAYgZBibDP2mOiC8DWuY7t2pRGdMMT1BJ9li03IYyFWPy81bJ3g/s400/3PWP+with+TCR.jpg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 386px;" border="0" alt="" id="BLOGGER_PHOTO_ID_5588122539227965042" /></div><div><div style="text-align: center;"><i>MHC HLA-A2 (in blue) bound to TCR (in red)</i></div><div style="text-align: center;"><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: 20px; line-height: 18px; "><i><br /></i></span></span></div><div style="text-align: center;"><br /></div><br /></div></div></div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-1220075486847101142011-03-10T12:39:00.000-08:002011-03-10T12:52:14.277-08:00Visit of Parc Cientific Barcelona<div style="text-align: left;">Today, I had the great opportunity to visit <a href="http://www.pcb.ub.edu/homePCB/live/ct/p1.asp">Parc Cientific Barcelona</a>. A great cocktail of biomedicine labs and companies. It shows Catalunya is investing hard in the sector.</div><div style="text-align: center;"><br /></div><div>The labs are cleaned, well organized and equiped with brand new equipments.</div><div style="text-align: center;"><br /></div><div>Some pictures of the "tour" here...</div><div><br /></div><div>In the basement, some zebra fishes waiting for being fished by lab technicians (there were also a bunch of frogs behind :->):</div><div style="text-align: center;"><br /></div><div><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgh8PrRDydV86Rs-QM8o5HtbLiSBarZF1SWHZlL-m1ckcMUqdj6drk5VNH4ByGoyOz3j-oyWmjjWL5U_yoN0E0x8sFZs_Lj8KM2Z0McAB2EcTtH_pbos63iAL7t5VDecB4HCmdPRCS4BmM/s400/IMG_20110310_183112.jpg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" border="0" alt="" id="BLOGGER_PHOTO_ID_5582555054426173298" /><div style="text-align: center;"><br /></div></div><div>a peptide synthetizer (?):</div><div><br /></div><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxSQcNUEb3j5xbdm2QjrfU64yIgW6WTCG6W306nqLYdXbphlHwN6OfidgrWjvDiwQoCp94MwwbCbDEfKi92AVdIkJgOm5EerrJIW18Gwq7M4yHI-X1tDXj2FifSzm0YaGGYqLr_2CcUp8/s400/IMG_20110310_181406.jpg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 300px; height: 400px;" border="0" alt="" id="BLOGGER_PHOTO_ID_5582555822427689954" /><div>An old "electron microscope" was waiting for its turn to go to museum:</div><div><br /></div><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj78CZ751mCkpfBM3a0NxOKosO8pJmYbKN8j2lWtq4ck2DEVUZs3jvqhyphenhyphenKtEuazQw_vmL2aK8ZsxoZfqKu2Adfhrd_nM0OoV6KAqFOGO4PonMxpRexiY7UE96KnPMLSnULUPPo8be06BsY/s1600/IMG_20110310_175556.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj78CZ751mCkpfBM3a0NxOKosO8pJmYbKN8j2lWtq4ck2DEVUZs3jvqhyphenhyphenKtEuazQw_vmL2aK8ZsxoZfqKu2Adfhrd_nM0OoV6KAqFOGO4PonMxpRexiY7UE96KnPMLSnULUPPo8be06BsY/s400/IMG_20110310_175556.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5582556472340757826" /></a>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-19576016289577298322011-03-03T12:53:00.001-08:002011-03-04T10:41:08.330-08:00Fixing errors in a modelHere is an interesting exercice (and solution) we did today in Structural Bioinformatics class.<div><br /></div><div>The goal is to fix an error in the following model:</div><div><br /></div><div><script src="http://biotechies.site40.net/jmol/JmolFolder/Jmol.js" type="text/javascript"></script> <script type="text/javascript"> jmolInitialize("http://biotechies.site40.net/jmol/JmolFolder"); jmolApplet(400, "load http://biotechies.site40.net/pdb/wrong-model.pdb;ribbons;color chains;wireframe off;spacefill off;");jmolBr();</script></div><br /><br /><div>Did you see this loop in the middle of the alpha-helix ?!</div><div><br /></div><div>Before fixing it, these are <i>Prosa</i> plots of this model which confirm the error:</div><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdAyyntl2jn6BOEOgyWNPdlBRR3SmMWG0W7JaTKJ8YKiNMfEW_mBGqjkTVJA4suuQNyJvdq2abP81j10gsbprAzoY_BLhyphenhyphenoy7XplxHs-6g9-K3z-drjBqLYnQICoEXTiibXx77rocLbSk/s1600/prosa2.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdAyyntl2jn6BOEOgyWNPdlBRR3SmMWG0W7JaTKJ8YKiNMfEW_mBGqjkTVJA4suuQNyJvdq2abP81j10gsbprAzoY_BLhyphenhyphenoy7XplxHs-6g9-K3z-drjBqLYnQICoEXTiibXx77rocLbSk/s400/prosa2.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5579962592595729826" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiZ_Lhi0Wfr2LQLHge5M5lDbkMKBdmzQNc8-NggpW8WcDTVLkwLsTnkqfthtPwwu9pNbma6nDSe23q0YyOdQcxOrQNql8T99X33K5zYYxcycznzvI0dn8zHxnC87io_8etqNd49OHy-mc/s1600/prosa1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiZ_Lhi0Wfr2LQLHge5M5lDbkMKBdmzQNc8-NggpW8WcDTVLkwLsTnkqfthtPwwu9pNbma6nDSe23q0YyOdQcxOrQNql8T99X33K5zYYxcycznzvI0dn8zHxnC87io_8etqNd49OHy-mc/s400/prosa1.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5579962586085451442" /></a><div><div>So here are the steps we followed to fix it. There are also interesting tricks to know to manipulate the data.</div><div><br /></div></div><div><b>Getting the Amino Acids sequence</b></div><div><b><br /></b></div><div>In order to get the corresponding AA sequence out of the PDB file, we simply splitted the PDB using our local tool which also generates the FASTA file out of PDB:</div><div><br /></div><div>PDBtoSplitChain.pl -i wrong-model.pdb</div><div><br /></div><div>You may also use this public tool: <a href="http://swift.cmbi.ru.nl/servers/html/soupir.html">Make sequence file from PDB file</a></div><div><br /></div><div><b>Prepare the alignment for modeler</b></div><div><span class="Apple-tab-span" style="white-space:pre"><br /></span></div><div><b></b>Then we produced an alignment between our target sequence against itself:</div><div><span class="Apple-tab-span" style="white-space:pre"><br /></span></div><div>cat wrong-model.fa > homologs.fa</div><div>cat wrong-model.fa >> homologs.fa</div><div><span class="Apple-tab-span" style="white-space:pre"><br /></span></div><div>Edit the homologs.fa file and name the first sequence "seq" and the second "model".</div><div><span class="Apple-tab-span" style="white-space:pre"><br /></span></div><div>Create a clustal alignment with homologs.fa</div><div><span class="Apple-tab-span" style="white-space:pre"><br /></span></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>clustalw <span class="Apple-tab-span" style="white-space:pre"> </span>homologs.fa</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>(or do it online through <a href="http://www.ebi.ac.uk/Tools/msa/clustalw2/">http://www.ebi.ac.uk/Tools/msa/clustalw2/</a>)</div><div><br /></div><div>We have to locate the error (the loop of the alpha helix) in this alignment. For that purpose, we used Jmol and doing clicks on the AA of the loop we found the sequence Serine->Serine->Valine->Glutamic->Glutamic->Leucine->Leucine or ...<span class="Apple-style-span" style="font-family: Arial; font-size: 15px; white-space: pre-wrap; "><b>SSVEELL...</b></span></div><div><br /></div><div>We edit the alignment file and shift this subsequence to the right:</div><div><br /></div><div><div>seq [...]QVAKSSVEELLLSQNSVKSL</div><div>model [...]QVAKSSVEELLLSQNSVKSL</div><div><br /></div></div><div>Becomes: </div><div><br /></div><div><div>seq [...]QVAKSSVEELLLSQNSVKSL-------</div><div>model [...]QVAK-------SSVEELLLSQNSVKSL</div></div><div><br /></div><div><br /></div><div>With this alignment we can produce a new model.</div><div><br /></div><div><b>Generate a better model</b></div><div><br /></div><div>Based on the alignment modified in previous section, we are going to produce a model and assess whether we improve it or not.</div><div><br /></div><div>Convert the Clustal alignment file into a PIR file.</div><div><br /></div><div>For that purpose you may try the online tool: <a href="http://genome.nci.nih.gov/tools/reformat.html">Sequence Format Converter</a>.</div><div><br /></div><div>With the alignment in PIR format ready prepare the Modeler script:</div><div><br /></div><div><div><i>from modeller import * # Load standard Modeller classes</i></div><div><i>from modeller.automodel import * # Load the automodel class</i></div><div><i><br /></i></div><div><i>log.verbose() # request verbose output</i></div><div><i>env = environ() # create a new MODELLER environment to build this model in</i></div><div><i><br /></i></div><div><i># directories for input atom files</i></div><div><i>env.io.atom_files_directory = ['.', '../atom_files']</i></div><div><i><br /></i></div><div><i>a = automodel(env,</i></div><div><i> alnfile = 'homologs.ali', # alignment filename</i></div><div><i> knowns = ('seq'), # codes of the templates</i></div><div><i> sequence = 'model') # code of the target</i></div><div><i>a.starting_model= 1 # index of the first model</i></div><div><i>a.ending_model = 3 # index of the last model</i></div><div><i> # (determines how many models to calculate)</i></div><div><i>a.make() # do the actual homology modeling</i></div></div><div><br /></div><div>and create the model:</div><div><br /></div><div><span class="Apple-tab-span" style="white-space:pre"> mod9v7 model-default.py</span></div><div><br /></div><div><b>Assessment of the new model</b></div><div><br /></div><div>This is the new model we just produced. Observed that the loop does not appear in the alpha helix:</div><br /><div><script src="http://biotechies.site40.net/jmol/JmolFolder/Jmol.js" type="text/javascript"></script> <script type="text/javascript"> jmolInitialize("http://biotechies.site40.net/jmol/JmolFolder"); jmolApplet(400, "load http://biotechies.site40.net/pdb/model.B99990001.pdb;ribbons;color chains;wireframe off;spacefill off;");jmolBr();</script></div><br /><div>And indeed this new model gives a much better result in Prosa:</div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEd_aJ43ynr8ImW3wD_GBbHkbEsq8cWRvF_U6KADm_1JE_1c5EPq9So_8OmpZ3SyR8uqGfhaJEaBmzmnU5ach2-dMHKQHv5dmqvxYfQLVU31n55UevcjaRF3fiqV0Ovi7iSWkW_31f2qs/s1600/prosa1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEd_aJ43ynr8ImW3wD_GBbHkbEsq8cWRvF_U6KADm_1JE_1c5EPq9So_8OmpZ3SyR8uqGfhaJEaBmzmnU5ach2-dMHKQHv5dmqvxYfQLVU31n55UevcjaRF3fiqV0Ovi7iSWkW_31f2qs/s400/prosa1.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5579974918956935458" /></a>This manual modification of the alignment can be iteratively done to improve even more the model.Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-69972265785421822652011-02-25T12:15:00.000-08:002011-02-25T12:27:25.021-08:00Calculating distances with VMD Tk scriptingThis post shows how to calculate distances between atoms using Tk scripting.<div><br /></div><div>Assuming you have loaded the molecule <a href="http://biotechies.heliohost.org/pdb/1NEY.pdb">1NEY</a> in VMD we are going to calculate the distance between the atom OE1 of chain A with all of the residues of the protein. We will consider the alpha-Carbon atom to be arbitrary the center of a residue.</div><div><br /></div><div>Open the Tk Console with menu <b>Extensions > Tk Console</b>.</div><div><br /></div><div>The following two lines store the coordinates of oxygen named OE1:</div><div><br /></div><div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; background-color: transparent; font-size: medium; "><span class="Apple-style-span" ><span id="internal-source-marker_0.9904203857295215" style="font-size: 11pt; color: rgb(0, 0, 0); background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">% set oe1 [atomselect top "name OE1 and resid 165 and chain A"]</span><br /><span style="font-size: 11pt; color: rgb(0, 0, 0); background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">% set OE1coord [lindex [$oe1 get {x y z}] 0]</span></span></div></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; background-color: transparent; font-size: medium; "><span class="Apple-style-span" ><span style="font-size: 11pt; color: rgb(0, 0, 0); background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "><br /></span></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; background-color: transparent; "><span class="Apple-style-span"><span style="background-color: transparent; vertical-align: baseline; "><div style="font-weight: normal; font-family: Georgia, serif; color: rgb(0, 0, 0); font-size: 16px; font-style: normal; white-space: normal; text-decoration: none; ">This will store all alpha-Carbon carbons in the list CASel:</div><div style="font-weight: normal; font-family: Georgia, serif; color: rgb(0, 0, 0); font-size: 16px; font-style: normal; white-space: normal; text-decoration: none; "><br /></div><div style="color: rgb(0, 0, 0); font-family: 'courier new'; font-size: 16px; font-style: normal; white-space: normal; text-decoration: none; "><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; background-color: transparent; font-size: medium; "><span id="internal-source-marker_0.9904203857295215" style="font-size: 11pt; color: rgb(0, 0, 0); background-color: transparent; font-style: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">% set CASel [atomselect top "type CA"]</span></div></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; background-color: transparent; font-size: medium; color: rgb(0, 0, 0); font-family: 'courier new'; font-style: normal; white-space: normal; text-decoration: none; "><span id="internal-source-marker_0.9904203857295215" style="font-size: 11pt; color: rgb(0, 0, 0); background-color: transparent; font-style: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "><br /></span></div><div style="font-weight: normal; font-family: Georgia, serif; color: rgb(0, 0, 0); font-size: 16px; font-style: normal; white-space: normal; text-decoration: none; ">Then to compute the distance between OE1coord an each element of CASel, you just run this script:</div><div style="font-weight: normal; font-family: Georgia, serif; color: rgb(0, 0, 0); font-size: 16px; font-style: normal; white-space: normal; text-decoration: none; "><br /></div><div><div><span class="Apple-style-span" >% foreach ca [$CASel get {x y z}] { </span></div><div><span class="Apple-style-span" > set distance [veclength [vecsub $OE1coord $ca]] </span></div><div><span class="Apple-style-span" > echo $ca : $distance</span></div><div><span class="Apple-style-span" >}</span></div></div><div><span class="Apple-style-span" ><br /></span></div><div><span class="Apple-style-span" ><span class="Apple-style-span" style="font-family: Georgia, serif; "><div style="font-weight: normal; font-family: Georgia, serif; color: rgb(0, 0, 0); font-size: 16px; font-style: normal; white-space: normal; text-decoration: none; "><br /></div><div><br /></div></span></span></div></span></span></div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com1tag:blogger.com,1999:blog-2012399429299935824.post-67323462361098991722011-02-25T10:47:00.000-08:002011-02-25T12:15:08.036-08:00Displaying protein chains with VMD<div style="text-align: left;"><a href="http://www.ks.uiuc.edu/Research/vmd/">VMD</a> is a molecular visualization program that can display static or animated systems. It has a built-in programming language called Tk. The version used in this post is 1.8.7.</div><div><br /></div><div>We are going to show some common operations on the protein <i>Triosephosphate Isomerase in Complex with DHAP</i> using this <a href="http://biotechies.heliohost.org/pdb/1NEY.pdb">modified PDB file</a>.</div><div><br /></div><div>We start by displaying the protein chains using different colors.</div><div><ol><li>start VMD program</li><li>menu <b>File > New Molecule, </b>browse to the downloaded file <a href="http://biotechies.heliohost.org/pdb/1NEY.pdb">1NEY.pdb</a>, then click <b>Load</b> button. You should see something like this:<br /><br /><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3qB6Np03_UzGfEoaQS2BMJz109xHTdx7KHzYMk0azlR98BTFY0Huy20G3bjVqaAnYd7owJ52Shs7NLmBERgL7BnC6x7RVG2vpmN-mMetEoBvr0Hl9sBWsyb_HP7wrypRMwoeJHeYX3Xc/s200/vmd.jpg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 191px;" border="0" alt="" id="BLOGGER_PHOTO_ID_5577707817299528434" /><br /></li><li>menu <b>Graphics > Representations</b></li><li>select the line in the list of representations and delete it by clicking <b>Delete Rep </b>button,</li><li>click twice the <b>Create Rep</b> button to create one graphical representation for each chain,</li><li>select the first graphical representation line,<br />- for <b>Drawing Method </b>choose for instance <b>NewRibbons</b><br />- for<span class="Apple-tab-span" style="white-space:pre"> </span> <b>Coloring Method </b>choose <b>ColorID</b> and then choose a color in the list of IDs<br />- in the <b>Selected Atoms</b> field<b> </b>enter <b>chain A</b><br />- then click the button <b>Apply</b></li><li><b><span class="Apple-style-span" style="font-weight: normal; ">select the second graphical representation line,<br />- for <b>Drawing Method </b>choose for instance <b>NewRibbons</b><br />- for<span class="Apple-tab-span" style="white-space: pre; "> </span> <b>Coloring Method </b>choose <b>ColorID</b> and then choose another color in the list of IDs<br />- in the <b>Selected Atoms</b> field<b> </b>enter <b>chain B</b><br />- then click the button <b>Apply</b></span></b></li></ol><div>You should get this output:</div></div><div><br /></div><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipbcVhPsnUJch6DCJQzDo40irpdvwhIfBm0znA7pj2whyphenhyphenyvpTGNZ5Q5oWmMV3cDm5qofqEqEAXyBVALiAsRPQMKy8EyzZpl-tEGylKPBgb5LwU84JRN1h9_4MWxBk1GMcGrW706GHmrAs/s1600/vmd.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 382px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipbcVhPsnUJch6DCJQzDo40irpdvwhIfBm0znA7pj2whyphenhyphenyvpTGNZ5Q5oWmMV3cDm5qofqEqEAXyBVALiAsRPQMKy8EyzZpl-tEGylKPBgb5LwU84JRN1h9_4MWxBk1GMcGrW706GHmrAs/s400/vmd.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5577710545061356850" /></a>Remember the keyboard shortcuts to toggle the mouse pointer mode:<div><ul><li><b>r</b> for rotating</li><li><b>t </b>for translating</li><li><b>s</b> for scaling</li></ul></div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-5335075654908475762011-02-18T13:28:00.001-08:002011-02-18T13:50:32.644-08:00Python and Doxygen<a href="http://www.stack.nl/~dimitri/doxygen/index.html">Dogygen</a> produces a professional documentation of your source code. Originally wrote for C/C++ source code, it now supports a broader range of language: C, C++, Java, PHP, Python, etc.<div><br /></div><div>Here is a quick howto to get your Python project documented:</div><div><ol><li>download a <a href="http://www.stack.nl/~dimitri/doxygen/download.html#latestsrc">binary distribution</a> and install it in your system<br /><br /></li><li>generate a configuration file template running in a command line console:<br /><br />doxygen -g my_project.doxyfile<br /><br /></li><li>update the configuration template to match your project folder, and options. So far I have configured the following options only:<br /><br />PROJECT_NAME<br />PROJECT_NUMBER<br />OUTPUT_DIRECTORY (where you want the documentation files to be written)<br />EXTRACT_ALL = YES<br />EXTRACT_PRIVATE = YES<br />EXTRACT_STATIC = YES<br />FILE_PATTERNS = *.py<br />RECURSIVE = YES<br />GENERATE_LATEX = NO<br /><br /></li><li>then simply run doxygen with the configuration file as parameter:<br /><br /><span class="Apple-style-span">doxygen my_project.doxyfile</span></li></ol><div>Here's some HTML output:</div></div><div><br /></div><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhju6dgjhU6FafC5mrlyys2bFbYowzH1ROfxuPxbecM3mQ72N_R0lo3Utr__yQcjWXT1SSMKKJBM21z7gZw3H7XyOzA5twtKuyRKb0P-X-jXLbonBucZcyU-fBU8UGclSUHWRO4X2f19og/s1600/biotechies.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 267px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhju6dgjhU6FafC5mrlyys2bFbYowzH1ROfxuPxbecM3mQ72N_R0lo3Utr__yQcjWXT1SSMKKJBM21z7gZw3H7XyOzA5twtKuyRKb0P-X-jXLbonBucZcyU-fBU8UGclSUHWRO4X2f19og/s400/biotechies.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5575147992992293474" /></a>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-23847659881248629292011-02-16T10:34:00.001-08:002011-02-16T13:10:25.478-08:00Adding Biopython dependency to your Python project in Eclipse<div style="text-align: left;">This short post helps you to configure your Python project in Eclipse to use of biopython library.</div><div><div style="text-align: left;"><br /></div><div style="text-align: left;">First, you have to install separately <a href="http://numpy.scipy.org/">NumPy</a> (biopython required dependency) and <a href="http://biopython.org/wiki/Main_Page">biopython</a> itself as explained in the <a href="http://biopython.org/DIST/docs/install/Installation.html">biopython installation help</a>.</div></div><div style="text-align: left;"><br /></div><div style="text-align: left;">Then, in Eclipse, open your Python project properties (Alt+ENTER when the project is selected).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Select the section <b>PyDev - PYTHONPATH</b>, then the tab <b>External Libraries </b>and make sure you have the source folder '...\site-packages' (or any other root folder you may have chosen to install Biopython). You should get the following screen:</div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjT7uxFmvXyzztaz62m58ruhtKt0aUMxEFANhnj-Kf_Grtr1ICluvooeNwdVykx7bd9O3Iq71o8JT_45bzwTT67dNA04GWz34gJ2Z9C7qw_-6_30NinOfqSo9Bt3RR-hRUChOkKwBj7Z9s/s1600/biotechies.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 280px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjT7uxFmvXyzztaz62m58ruhtKt0aUMxEFANhnj-Kf_Grtr1ICluvooeNwdVykx7bd9O3Iq71o8JT_45bzwTT67dNA04GWz34gJ2Z9C7qw_-6_30NinOfqSo9Bt3RR-hRUChOkKwBj7Z9s/s400/biotechies.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5574362239162635330" /></a><br /><div>Then you should be able to run a program such as the following:</div><div><br /></div><div><div><span class="Apple-style-span"> parser = PDBParser()</span></div><div><span class="Apple-style-span"> structure = parser.get_structure("2BEG", "C:\\svn\\sporna\\2BEG.pdb")</span></div><div><span class="Apple-style-span"> for atom in structure.get_atoms():</span></div><div><span class="Apple-style-span"> print atom.get_name(), " coord:", atom.get_coord()</span></div></div><div><br /></div><div><br /></div><div><b>Tested on:</b></div><div><ul><li>Eclipse Helios Service Release 1</li><li>Python 2.7</li><li><span class="Apple-style-span" style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Numerical Python library (NumPy) v1.5.1 </span></li><li><span class="Apple-style-span" style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Biopython v1.56</span></li></ul></div><div><br /></div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com1tag:blogger.com,1999:blog-2012399429299935824.post-7070250963391812522011-02-09T09:25:00.002-08:002011-03-04T10:43:44.641-08:00Jmol applet (quick) howtoHere is a short post to include Jmol rendering in your blogger posts.<div><br /></div><div>First of all, you need to have the applet and its related jar and resources files available somewhere behind a HTTP server. In my configuration I have a folder called <i>JmolFolder </i>with these 19 files:</div><div><br /></div><div><div><div>JmolApplet0_Console.jar </div><div>JmolApplet0.jar </div><div>JmolApplet0_Minimize.jar </div><div>JmolApplet0_Popup.jar </div><div>JmolApplet0_ReadersCifPdb.jar </div><div>JmolApplet0_ReadersMolXyz.jar </div><div>JmolApplet0_ReadersMore.jar </div><div>JmolApplet0_ReadersQuantum.jar </div><div>JmolApplet0_ReadersSimple.jar </div><div>JmolApplet0_ReadersXml.jar</div><div>JmolApplet0_ReadersXtal.jar</div><div>JmolApplet0_ShapeBio.jar</div><div>JmolApplet0_ShapeSpecial.jar</div><div>JmolApplet0_ShapeSurface.jar</div><div>JmolApplet0_Smiles.jar</div><div>JmolApplet0_Symmetry.jar</div><div>JmolApplet.jar</div><div>Jmol.jar</div><div>Jmol.js</div></div></div><div><br /></div><div>Note that I removed all the internationalization files from the original distribution (I hope it makes it start faster).</div><div><br /></div><div>In blogger/blogspot post you have to insert the following <b>as HTML</b>:</div><pre><form><br /><script src="http://biotechies.site40.net/jmol/JmolFolder/Jmol.js" type="text/javascript"></script> <script type="text/javascript"> jmolInitialize("http://biotechies.site40.net/jmol/JmolFolder"); jmolApplet(400, "load http://biotechies.site40.net/jmol/alphahelix.pdb.gz;");jmolBr();</script><br /></form></pre>The result is:<br /><form><br /><script src="http://biotechies.site40.net/jmol/JmolFolder/Jmol.js" type="text/javascript"></script> <script type="text/javascript"> jmolInitialize("http://biotechies.site40.net/jmol/JmolFolder"); jmolApplet(400, "load http://biotechies.site40.net/pdb/alphahelix.pdb.gz;");jmolBr();</script><br /></form>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0tag:blogger.com,1999:blog-2012399429299935824.post-2016904544972827322011-02-09T05:20:00.001-08:002011-03-19T06:22:19.718-07:00First protein modeled<div>Also this post is not rocket-science, here is my firt protein modeled by <a href="http://en.wikipedia.org/wiki/Homology_modeling">Homology modeling</a>. I tried to describe in details the process followed here.</div><div><br /></div><div>First of all, this the FASTA sequence of the protein to be modeled:</div><div><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><pre style="word-wrap: break-word; white-space: pre-wrap; ">>sequence DLWNAIDPAIIADDHGQVWMSFGSFWGGLKLFKLNDDLTRPAEPQEWHSIAKLERSVLMDDSQAGSAQIEAPFILRKGDYYYLFASWGLCCRKGDSTYHLVVGRSKQVTGPYLDKTGRDMNQGGGSLLIKGNKRWVGLGHNSAYTWDGKDYLVLHAYEAADNYLQKLKILNLHWDGEGWPQVDEKELDSYIQVDVHDPVMTREGDTWYLFSTGPGITIYSSKDRVNWRYSDRAFATEPTWAKRVSPSFDGHLWAPDIYQHKGLFYLYYSVSAFGKNTSAIGVTVNKTLNPASPDYRWEDKGIVIESVPQ</pre></span></div><div><b>Selecting homologs</b></div><div><br /></div>I picked up the following homologs from a from a search result of the sequence with <a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome">blastp</a>:<div><ul><li>1GYD_B</li><li>1GYH_D</li><li>1WL7_A</li></ul><div>Note that those are chain of proteins. For each of them, I donwloaded the corresponding protein .pdb file (1GYD.pdb, 1GYH.pdb, 1WL7.pdb from <a href="http://www.blogger.com/www.pdb.org">PDB</a>.</div><div><br /></div><div>Then I splitted the chains from the PDB file. Here we use a homemade Perl script called <span class="Apple-style-span" style="font-family: 'Courier New'; font-size: 15px; white-space: pre-wrap; "><i>PDBtoSplitChain.pl</i></span> but I guess you may use <a href="http://structure.usc.edu/splitpdb/splitpdb">splitpdb</a> also. This will generate a file for each chain found in the PDB file.</div><div><br /></div><div><b>Multiple Sequence Alignment of the homologs</b></div><div><br /></div><div>Then we concatenate all the selected chains into the same file to run a MSA:</div><div><br /></div><div><div><span class="Apple-style-span">$ cat sequence.fa > homologs.fa</span></div><div><span class="Apple-style-span">$ cat 1GYDB.fa >> homologs.fa</span></div><div><span class="Apple-style-span">$ cat 1GYHD.fa >> homologs.fa</span></div><div><span class="Apple-style-span">$ cat 1WL7A.fa >> homologs.fa</span></div><div><span class="Apple-style-span">$ cat 3C2UA.fa >> homologs.fa</span></div></div><div><span class="Apple-style-span">$ clustalw homologs.fa</span></div><div><br /></div><div>The output file of clustaw (*.aln file) should be converted to a PIR file:</div><div><br /></div><div><span class="Apple-style-span">$ aconvertMod2.pl -in c -out p <homologs.aln >homologs.ali</span></div><div><br /></div><div><b>Creation of the model</b></div><div><br /></div><div>We are going to use <a href="http://salilab.org/modeller/">Modeller 9v7</a> to create our model.</div><div><br /></div><div>First the python script <i>model-default.py </i>for Modeller is edited:</div><div><br /></div><div><div><span class="Apple-style-span" style="font-family: 'courier new'; ">from modeller import *</span></div><div><span class="Apple-style-span">from modeller.automodel import *</span></div><div><span class="Apple-style-span">log.verbose()</span></div><div><span class="Apple-style-span">env = environ()</span></div><div><span class="Apple-style-span">env.io.atom_files_directory = ['.', '../atom_files']</span></div><div><span class="Apple-style-span">a = automodel(env,</span></div><div><span class="Apple-style-span"> alnfile = 'homologs.ali',</span></div><div><span class="Apple-style-span"> knowns = ('1GYDB', '1GYHD', '1WL7A'),</span></div><div><span class="Apple-style-span"> sequence = 'sequence') </span></div><div><span class="Apple-style-span">a.starting_model= 1</span></div><div><span class="Apple-style-span">a.ending_model = 1</span></div><div><span class="Apple-style-span">a.make()</span></div></div><div><br /></div><div>And the model is actually calculated with the following:</div><div><br /></div><div><span class="Apple-style-span">$ mod9v7 model-default.py</span></div><div><br /></div><div>In case of error, check out the file <i>model-default.log</i> that is produced in the current directory.</div><div><br /></div><div><b>Alignment of protein sequences based on 3D structure (STAMP)</b></div><div><br /></div><div>We need to create a file called (for instance) <i>domains.data </i>referencing the domains of our template proteins plus the model we've just calculated in the previous step:</div><div><div><span class="Apple-style-span">./1GYDB.pdb 1GYDB {all}</span></div><div><span class="Apple-style-span">./1GYHD.pdb 1GYHD {all}</span></div><div><span class="Apple-style-span">./1WL7A.pdb 1WL7A {all}</span></div><div><span class="Apple-style-span">./sequence.B99990001.pdb mod1 {all}</span></div></div><div><br /></div><div>Then we run <i>stamp</i> to superpose our model with the template proteins:</div><div><br /></div><div><div><span class="Apple-style-span">$ stamp -n 2 -rough -prefix sequence -f domains.dat</span></div><div><span class="Apple-style-span">$ transform -f sequence.3 -g -o sequence.3.pdb</span></div></div><div><br /></div><div>Nota Bene: with transform use the file <i>sequence.[number]</i> with the highest <i>[number]</i> generated by stamp (it depends on the set of sequences you used in model-default.py)</div><div><br /></div><div>Here is a graphical rendering with Jmol of the superposition (you can move the model using drag n' drop, or use Jmol contextual menu with right click):</div><br /><form><br /><script src="http://biotechies.site40.net/jmol/JmolFolder/Jmol.js" type="text/javascript"></script> <script type="text/javascript"> jmolInitialize("http://biotechies.site40.net/jmol/JmolFolder"); jmolApplet(400, "load http://biotechies.site40.net/pdb/sequence.2.pdb.gz;ribbons;color chains;wireframe off;spacefill off;");jmolBr();</script></form></div>Pascalhttp://www.blogger.com/profile/09612362594213376341noreply@blogger.com0