Importance of Perl in Bioinformatics
Advances in biology have generated an ocean of data that requires thorough computational analysis to extract useful information. Computation and mathematics are used to interpret biological data such as DNA sequences, protein sequences, and three-dimensional structures, and that interpretation depends on the ability to integrate data of different types. Perl is well suited to this task.
The ability to rapidly develop scripts for scanning or transforming large amounts of data is an important practical skill in bioinformatics. Because of its compact syntax, wide range of built-in functions, and orientation toward data, Perl is an excellent scripting language.
The following are a few of the attributes of Perl that make it an attractive choice.
- Perl offers powerful ways of matching and manipulating strings by using regular expressions. Converting files from one format to another largely comes down to string manipulation.
- Perl is modular, which makes it easy to write programs as libraries (referred to as modules).
- Perl's system calls and pipes can be used to incorporate external programs.
- Perl's dynamic loader helps extend Perl with C code by making compiled libraries available to the Perl interpreter.
- Perl is a good and easy-to-code prototyping language. New algorithms can be tested easily in Perl before being rewritten in a stricter language.
- Perl is excellent for writing CGI scripts to interface with the Web.
- Perl provides support for object-oriented program development.
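The first attribute in the list, regular expressions applied to format conversion, can be illustrated with a minimal sketch. The sub names tab_to_fasta and motif_positions, and the tab-delimited "id, sequence" input layout, are assumptions made for this example and do not come from any particular library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert one tab-delimited record ("id<TAB>sequence") into FASTA text,
# wrapping the sequence at $width characters per line (default 60).
sub tab_to_fasta {
    my ($line, $width) = @_;
    $width //= 60;
    chomp $line;
    my ($id, $seq) = split /\t/, $line, 2;
    $seq =~ s/\s+//g;                  # drop stray whitespace
    $seq = uc $seq;                    # normalise to upper case
    $seq =~ s/(.{1,$width})/$1\n/g;    # add a newline after every chunk
    return ">$id\n$seq";
}

# Return the 1-based start positions of each non-overlapping match
# of a motif pattern within a sequence.
sub motif_positions {
    my ($seq, $pattern) = @_;
    my @positions;
    while ($seq =~ /$pattern/g) {
        push @positions, $-[0] + 1;    # $-[0] is the 0-based match start
    }
    return @positions;
}

print tab_to_fasta("seq1\tatgcgtacgttagc", 10);
print join(",", motif_positions("ATGCGTACGTTAGC", "CGT")), "\n";
```

The conversion itself is three substitutions and a split, which is the kind of compactness the list refers to. Real file formats carry more fields and edge cases than this sketch handles.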
Projects like human genome sequencing generate large volumes of textual information. In its initial stages, the Human Genome Project faced data interchange problems between software groups, and Lincoln Stein described how Perl came to the rescue (Stein, 1996). Perl is not the only language with these positive characteristics; many of them can be found in other scripting languages like Python.
Why use Perl in Bioinformatics?
The current programming landscape offers a large number of languages to choose from. The distinction between compiled and interpreted languages is sufficiently explained in other articles of this series (Sanner, 2004), so it is enough to say that compiled languages are usually chosen for their better performance, as reflected in the C/C++ implementations of compute-intensive algorithms such as BLAST, ClustalW, or HMMER, and for specific capabilities such as co-ordination.
Languages like Perl, Python, PHP, Ruby, and Tcl are typically used for rapid prototyping of scripts, since the development cycle involves no recompilation of code. As the computing power available for bioinformatics tasks grows (Moore’s law; Moore, 1965), performance concerns lose significance compared with the effort of writing and maintaining code. The search for an appropriate programming language also runs into the distinction between object-oriented and procedural programming, along with opinions about which language is best suited to each of these styles.
Until recently, the most common programming languages (e.g. FORTRAN, BASIC, and C) were procedural, which is to say that a program is an ordered set of steps required to produce a result. Object-oriented programming (OOP) differs in that it combines data and code into discrete modules, called objects, which are abstract representations of “real-life” entities. These objects model all of the application’s information and functionality. OOP languages offer features that increase an object’s power and flexibility: classes, inheritance, and polymorphism.
In bioinformatics, a DNA sequence may be viewed as an object that inherits the characteristics common to all biological sequences from a more general class. In OOP, this object is coded by defining its properties, such as its length, its checksum, and of course the string of characters itself. You then implement accessor methods to get or set these properties, as well as more complex methods such as transcribe(), which takes an organism-specific codon matrix as an argument and turns the DNA object into an RNA object.
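The sequence object described above can be sketched in Perl's native object system. The package names BioSeq, DNASeq, and RNASeq are hypothetical and chosen only for this example. The checksum is a plain 16-bit character sum picked for illustration, and transcribe() here only replaces T with U, taking no codon-matrix argument, which is a simplification of the method described in the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class holding what every biological sequence shares.
package BioSeq;

sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => uc($args{seq} // '') };
    return bless $self, $class;
}

# Accessors: return a property, or set it first when an argument is given.
sub id  { my $self = shift; $self->{id}  = shift     if @_; return $self->{id};  }
sub seq { my $self = shift; $self->{seq} = uc(shift) if @_; return $self->{seq}; }

# Derived properties computed from the stored string.
sub length   { my $self = shift; return CORE::length($self->{seq}); }
sub checksum { my $self = shift; return unpack('%16C*', $self->{seq}); }

# RNA and DNA sequences inherit everything above from BioSeq.
package RNASeq;
our @ISA = ('BioSeq');

package DNASeq;
our @ISA = ('BioSeq');

# Simplified transcription: copy the sequence and replace T with U,
# returning a new RNASeq object.
sub transcribe {
    my $self = shift;
    (my $rna = $self->seq) =~ tr/T/U/;
    return RNASeq->new(id => $self->id, seq => $rna);
}

package main;

my $dna = DNASeq->new(id => 'demo', seq => 'atggcctaa');
my $rna = $dna->transcribe;
print ref($rna), ' ', $rna->seq, ' ', $rna->length, "\n";
```

Because DNASeq and RNASeq both list BioSeq in @ISA, the accessors and the length and checksum methods are written once and shared, while transcribe() exists only where it makes sense.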
Although all of these things certainly matter when choosing a language, our experience is that bioinformatics programming is mostly performed not by computer scientists or professional programmers, but by life scientists who seek to make sense of their experiments. The priority is fast and easy programming rather than the development of elegant, maintainable code. As Larry Wall puts it, you can in theory get your work done in any “complete” computer language, but experience shows that computer languages differ “not so much in what they make possible, but in what they make easy.”
Perl's built-in high-level data structures scale to any size, limited only by the operating system and the amount of memory on the host computer. Perl supports both procedural and object-oriented programming and runs on almost all Unix and Linux flavours. Furthermore, ActivePerl from ActiveState allows Perl to run on Windows computers, and MacPerl runs on Apple computers with Mac OS 9 and earlier (http://www.activestate.com/Products/ActivePerl/, 2005).
Perl can also help you with:
- Numerical calculations
- Regular expressions – Perl’s most famous strong point
- File handling and databases
- CPAN – the on-line Perl code library
- Perl and the World Wide Web
- XML processing
- Ease of Programming
- Rapid Prototyping
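Two items from the list above, file handling and numerical calculations, can be combined in a short sketch that reads FASTA records and computes their GC content. The sub names read_fasta and gc_content are assumptions for this example, and the input is an in-memory string opened as a filehandle so the sketch runs without an external file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from an open filehandle into a hash: id => sequence.
sub read_fasta {
    my ($fh) = @_;
    my (%seqs, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;              # skip blank lines
        if ($line =~ /^>(\S+)/) {              # header line starts a record
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {                  # sequence line extends it
            $seqs{$id} .= uc $line;
        }
    }
    return %seqs;
}

# Fraction of G and C bases in a sequence (0 for an empty sequence).
sub gc_content {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GC//);                # tr counts without modifying
    return $gc / length($seq);
}

my $fasta = ">seqA\nATGC\nGGCC\n>seqB\nATAT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my %seqs = read_fasta($fh);
close $fh;

printf "%s\t%.2f\n", $_, gc_content($seqs{$_}) for sort keys %seqs;
```

Replacing the in-memory string with a file name in the open call is all it takes to run the same code on a real FASTA file.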
Bioinformatics continues to advance, and with it grows the need for tools to analyse biological data. Developing prototype scripts or suites of reusable modules in Perl lets one contribute to the creation and sharing of programs across a broad research community. As with every other programming language, fluency comes with experience; the quality of code depends largely on the programmer and their use of good programming practices. Given the abundance of code already available, Perl is and will remain a language of considerable importance in computational biology for years to come.
Perl is a “feel good” language that does not force a certain style on you but supports you in your own. Perl thus promotes the three virtues of a programmer as stated in the Camel book (Wall et al., 2000): Laziness, Impatience, and Hubris.