Science and Environment

Going loco over bioinfo

STAR SCIENCE By Maria Pamela C. David

October 14, 2004 | 12:00am

Bioinformatics? How (un?)interesting does it sound to you? How interesting could it be to look at and analyze letters upon letters of DNA and protein sequences and give meaning to them, or to find patterns or consensus among thousands of sequences and deduce their significance, or to define the proximity or distance between organisms, or to find the significance of a peptide from a sea of proteins – perhaps even to trace the lineage of a creature? These are just some of the diverse tasks that the field of bioinformatics aims to perform – quickly. With the influx of millions of sequences from experimental efforts, it is no wonder that bioinformatics – a marriage of the computational and biological sciences – is emerging as one of the most important fields today. Basic knowledge of it – at the very least, knowledge of sequence alignment and related applications – is fast becoming a necessity in the biological field. Without it, much of the data from hard experimental work might never be managed, explored and used to its full potential.

Despite this knowledge, my involvement in the bioinformatics effort of our lab was initially so-so. I was partly, but not wholly, interested in it. It just seemed too dry – literally and figuratively speaking – possibly a mere academic exercise whose practical applications seemed far, far away. Modeling. Mathematics. Maybe phylogenetics. These were simply not my stuff, in street parlance. Apparently, most of those who perform lab experiments in biology or molecular biology, including myself, were more interested in wet lab than dry lab. Even as a student, I pretty much shrugged off bioinformatics as an accessory, rather than considering it as an emerging necessity.

Everything changed when a challenge came. Our group was tasked to analyze a set of protein sequences for mutations and to attempt to find general rules for them or, at least, mutational preferences in them. It was not a matter of similarity or homology searching, or even structure prediction, at which most in our group were adept. It was more a matter of looking at particular mutations at specific positions and tallying them in accordance with certain criteria. Does this mutation occur at a buried or an exposed residue? How many were there of a particular mutation? How many of the other? What were the implications of each? What are the mutations that occur at a given position? These were just some of the questions that we were supposed to answer and, with hundreds of sequences, we hardly had a clue. There was no beating around the bush: no pre-existing software, legal or illegal, licensed or cracked, could do the task. You could do the analysis manually (with a little help from spreadsheet applications), get very cross-eyed, develop carpal tunnel syndrome, eventually discover mistakes that would make you wonder whether you had made others farther back, and be so frustrated that you wanted to give the pile of sequences a vicious kick; considering that every error matters, this is a difficult and even dangerous option. Or you could write a program. The right decision appears obvious and easy, except that we – I, in particular – didn’t know how to use Perl, the only free programming language available to us. It’s one thing to learn a language; it’s another to write a program that performs several tasks in a week’s time. Initially, we had to perform manual analysis while learning the language, since both the initial results of the analysis and the program were supposed to be finished or nearly finished within a week. In our case, we were lucky: we managed to make the program, and it yielded the same results as our manual analysis.

I had a former classmate who learned the language before I did, and he helped me considerably by giving examples. It was easier to learn that way than strictly by the book. Through sample programs, at least in Perl, you can see what works in one manner and what works in another. It helped that the syntax of Perl was relatively easy to understand: a line, for instance, that says, "while ($protein =~ /A/ig) {$A++}" searches the protein sequence stored in $protein, in which each letter represents an amino acid; every time it finds an "A" (for alanine), in either upper or lower case, it adds one to the variable $A. As such, it counts all alanine residues in a given sequence.
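
For readers who want to try the idea, the counting step can be sketched roughly as follows. This is an illustrative Python version, not the Perl used in the lab; the function name count_residue and the sample sequence are invented for the example.

```python
import re


def count_residue(protein: str, residue: str = "A") -> int:
    """Count occurrences of a one-letter amino acid code, ignoring case."""
    # re.escape guards against characters that are special in a regex;
    # re.IGNORECASE plays the role of the /i flag in the Perl line.
    return len(re.findall(re.escape(residue), protein, flags=re.IGNORECASE))


# Hypothetical sequence, for illustration only.
sequence = "MKTAYIAKQRaLA"
print(count_residue(sequence))       # alanine count
print(count_residue(sequence, "K"))  # lysine count
```

Counting every match in one pass, instead of looping and incrementing a counter, is simply a matter of taste here; both approaches give the same number.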

Since then, we have improved the program by giving it more tasks. It can now analyze particular segments of a set of sequences and give results for each, all with only a few additions to the code. And we are getting interesting results. It’s the beauty of programming: you can have so many tasks done by invoking just a few commands.
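
To give a feel for what tallying mutations per position over a chosen segment might look like, here is a minimal sketch in Python rather than Perl. Everything in it is hypothetical: the function name tally_mutations, the assumption that the sequences are already aligned to a reference of the same length with no gaps, and the toy sequences. It is not our lab's actual program.

```python
from collections import Counter, defaultdict


def tally_mutations(reference, sequences, start=0, end=None):
    """Tally substitutions per position within reference[start:end].

    Assumes every sequence is already aligned to the reference and has
    the same length (no gaps). Positions are reported 1-based.
    """
    end = len(reference) if end is None else end
    tallies = defaultdict(Counter)
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must match the reference length")
        for i in range(start, end):
            if seq[i] != reference[i]:
                # Record the change, e.g. "A->V", under position i + 1.
                tallies[i + 1][f"{reference[i]}->{seq[i]}"] += 1
    return {pos: dict(counts) for pos, counts in sorted(tallies.items())}


# Toy data, invented for illustration.
reference = "MKTAYIAK"
variants = ["MKTVYIAK", "MKTVYIGK", "MRTAYIAK"]

print(tally_mutations(reference, variants))                  # whole sequence
print(tally_mutations(reference, variants, start=2, end=5))  # one segment
```

Restricting the analysis to a segment takes only the two extra arguments, which is the sort of small addition that lets one program answer several questions.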

Probably, the key to appreciating bioinformatics and programming in general lies in organization, logic and an imaginative outlook. It was, at least for me, a matter of envisioning the kind of data that you want to get, and outlining the ways by which you could get it. It’s best to come up with realistic and defined targets, and to attempt to find a variety of simple solutions for these. Coming up with a variety of approaches to a problem would give one back-ups, and would make things move more quickly and easily in case of failure. Once you know the syntax of a programming language, in much the same way that you have mastered the rudiments of Filipino or English, it’s somewhat easy to translate your logic into the language. Of course, one would initially encounter lots of syntax errors that would require debugging, but making a program is not an impossible task. Just don’t let it eat you up. Usually, the simpler, the better: there would be less room for error, and debugging would be considerably easier. It may be as frustrating as manual analysis, but once you get over the hurdles, the results are often rewarding. Even if you don’t get anything really useful from a program for which you had high expectations, at least you could be proud of the fact that your program worked. And at least you would know that your data set can no longer be expected to yield the information that you wanted, in which case you could more easily move on to another data set. Sadly, the frustration of having to repeat a series of experiments to generate that other data set is another matter.

Having sat for a considerable time in front of a computer doing bioinformatics work, I have found a degree of unexpected fulfillment in it that equals the fulfillment that I have experienced in performing or participating in a well-done experiment. It’s probably only a matter of time before more people, especially biologists, appreciate bioinformatics, too, since it’s slowly becoming an indispensable tool and, as such, will very likely be inescapable as well. Many scientists are learning to make their own programs. Better to find some fun and interest in it while you can than to be forced to learn it by circumstance. Learning bioinformatics and allied programming applications at your own pace is much easier than being forced to learn them, though a little pressure may help. Do not be intimidated by Perl, the bioinformatics language of choice, since it is quite a logical language; as a high school student some years ago, I nearly flunked the first quarter of Pascal, yet I can now work with Perl. If programming is really something that gets on your nerves, at least try to familiarize yourself with pre-existing applications that could be applied to your field of science. For a start, there’s DNAsis for DNA and protein sequence analysis, QSAR tools for binding kinetics and drug design, NCBI-BLAST for sequence alignment, and the Swiss-PDB viewer and PSIPRED server for protein structural analysis, structure prediction and modeling. For most standard applications, there would usually be pre-existing programs that one could play around with. Some may be bought for a song and, usually, if one really, really needs an application, the urge (and the necessity) to learn programming follows. A few words of advice: an academic e-mail address is usually, but not always, a passport to the free use of some Web-based bioinformatics tools, and cracking a licensed program would still not allow you to use it for a publication.

Before totally shutting out bioinformatics, try to find a niche that excites you. It has a lot to offer. Who knows? Maybe what you unravel will earn you a Ph.D.!

* * *

Maria Pamela C. David graduated from the National Institute of Molecular Biology and Biotechnology, UP Diliman, with a Bachelor’s degree in 2003. She is currently a researcher at the Marine Natural Products Laboratory of the Marine Science Institute, where she works on a number of immunology projects, including immunosuppression research and autoantibody sequence analysis. She may be reached through [email protected].