Data Mining & Bioinformatics
Rapid developments in genomics and proteomics have produced a large amount of biological data in recent years.
Sophisticated computational analyses are required to draw conclusions from these data. Bioinformatics, or computational biology, is the interdisciplinary science of interpreting biological data using information technology and computer science. The significance of this field of inquiry will increase as large amounts of genomic, proteomic and other data are produced and integrated.
The application and development of data mining techniques to solve biological problems is an especially active area of research in bioinformatics. Analysing large biological data sets requires making sense of the data by inferring structure or generalisations from them.
Examples include protein structure prediction, gene classification, microarray-based cancer classification, clustering of gene expression data, and statistical modelling of protein-protein interactions. The interaction between data mining and bioinformatics is therefore growing stronger.
What is Data Mining in Bioinformatics?
Data mining is the process of deriving information from large datasets by learning patterns and models. It draws on machine learning, analytics, artificial intelligence, databases, pattern recognition and visualisation. Also known as Knowledge Discovery in Databases (KDD) or Intelligent Data Analysis (IDA) (Raza, n.d.), data mining is not limited to bioinformatics and is used to provide data intelligence in many different industries.
“Machine learning systems may be rules, functions, relations, equation systems, probability distributions and other knowledge representations.”
This data mining intelligence, or knowledge discovery, has a broad range of uses, including forecasting, testing, diagnosis and simulation (Guillet, 2007). In practice, the knowledge discovery process involves storing and processing data, applying algorithms, and displaying and interpreting the results.
It is important to note that the data mining or KDD process encompasses a variety of techniques, including machine learning. The process therefore involves a number of steps that must be streamlined and repeatable in order to ensure accurate results in data analytics.
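The stages named above (storing and processing data, applying an algorithm, and interpreting the results) can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the gene names, expression values, the choice of Pearson correlation as the mining step, and the 0.9 threshold are all hypothetical and are not taken from the cited sources.

```python
from statistics import mean, pstdev

# Hypothetical toy data: expression levels of three genes across four samples.
raw = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [1.0, 2.0, 3.0, 4.0],
    "geneC": [9.0, 7.0, 4.0, 2.0],
}

def preprocess(table):
    """Stage 1 (store/process): z-score each gene so scales are comparable."""
    out = {}
    for name, values in table.items():
        mu, sigma = mean(values), pstdev(values)
        out[name] = [(v - mu) / sigma for v in values]
    return out

def mine(table):
    """Stage 2 (apply an algorithm): Pearson correlation for every gene pair."""
    names = sorted(table)
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # For z-scored vectors, Pearson r is the mean of the products.
            result[(a, b)] = mean(x * y for x, y in zip(table[a], table[b]))
    return result

def interpret(correlations, threshold=0.9):
    """Stage 3 (display/interpret): keep strongly co-varying gene pairs."""
    return [(pair, round(r, 3)) for pair, r in correlations.items()
            if abs(r) >= threshold]

report = interpret(mine(preprocess(raw)))
for pair, r in report:
    print(pair, r)
```

Keeping each stage as its own function makes the process repeatable: the same steps can be rerun unchanged on a new table.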
The main tasks for data mining are:
1. Classification: Assigning a data item to a predefined class
2. Estimation: Determining a value for an unknown continuous variable
3. Prediction: Classifying records according to estimated future behaviour
4. Association: Identifying items that occur together
5. Clustering: Dividing a population into subgroups or clusters
6. Description & Visualisation: Representing the data
Generally speaking, these tasks together define data mining as a process of extracting knowledge from data.
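Two of the tasks listed above, classification and clustering, can be illustrated with a short sketch. It assumes a hypothetical toy dataset of two-gene expression profiles labelled "tumour" and "normal". It uses a nearest-centroid rule for classification and plain k-means for clustering, which are only two of many possible techniques.

```python
from math import dist  # available in Python 3.8+

# Hypothetical toy expression profiles (two genes measured per sample).
labelled = {
    "tumour": [(8.0, 1.0), (9.0, 2.0), (7.5, 1.5)],
    "normal": [(2.0, 6.0), (1.0, 7.0), (2.5, 6.5)],
}

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def classify(sample, training):
    """Classification: assign the sample to the class with the nearest centroid."""
    centroids = {label: centroid(pts) for label, pts in training.items()}
    return min(centroids, key=lambda label: dist(sample, centroids[label]))

def kmeans(points, centres, iterations=10):
    """Clustering: plain k-means starting from the given initial centres."""
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centres = [centroid(g) if g else c for g, c in zip(groups, centres)]
    return groups

# Classification uses the labels; clustering ignores them.
print(classify((8.5, 1.2), labelled))
unlabelled = [p for pts in labelled.values() for p in pts]
clusters = kmeans(unlabelled, centres=[unlabelled[0], unlabelled[-1]])
print(clusters)
```

The contrast is the point of the example: classification needs predefined classes to learn from, whereas clustering discovers the subgroups from the data alone.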
Application of Data Mining in Bioinformatics
- Bioinformatics is an increasingly data-rich field, so data mining techniques can support proactive analysis in several areas of the biomedical industry. This helps researchers to better understand biological processes and to explore new therapies in healthcare and the life sciences.
- Applications of data mining in bioinformatics include gene identification, domain discovery, motif discovery, inference of protein structure, disease diagnosis, disease prediction, optimisation of disease treatment, reconstruction of protein and gene networks, data cleansing, and prediction of the subcellular location of proteins.
- Microarray technologies, for example, are used to predict patient outcomes. Patients’ survival and risk of tumour metastasis or recurrence can be estimated on the basis of their genotypic microarray results.
- Machine learning can be used to identify peptides in mass spectrometry. Correlations among fragment ions within a tandem mass spectrum are crucial for reducing stochastic mismatches when peptides are identified by database searching. A sophisticated and detailed scoring algorithm that takes this correlation information into account is highly desirable.
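To make the peptide-scoring point concrete, here is a minimal sketch of a naive baseline: a shared peak count. It counts how many of a candidate's theoretical fragment peaks are matched by an observed peak within a tolerance. The m/z values, candidate names and the 0.5 tolerance are invented for illustration and are not real fragment masses. This baseline treats every peak independently, so it ignores the correlations among fragment ions described above. That is the gap a more sophisticated scoring algorithm would aim to close.

```python
def shared_peak_count(observed_mz, theoretical_mz, tolerance=0.5):
    """Count theoretical fragment peaks matched by an observed peak within the tolerance."""
    return sum(
        any(abs(obs - theo) <= tolerance for obs in observed_mz)
        for theo in theoretical_mz
    )

def rank_candidates(observed_mz, candidates, tolerance=0.5):
    """Rank candidate peptides by shared peak count, best first."""
    scores = {
        name: shared_peak_count(observed_mz, peaks, tolerance)
        for name, peaks in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up m/z values for illustration only; not real fragment masses.
observed = [175.1, 262.2, 375.3, 488.3, 603.4]
candidates = {
    "candidate_1": [175.0, 262.0, 375.0, 488.0],
    "candidate_2": [147.0, 262.0, 401.0, 530.0],
}
ranking = rank_candidates(observed, candidates)
print(ranking)
```

Because a chance match counts as much as a meaningful one in this score, random candidates can rank deceptively well on noisy spectra.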
Conclusion and challenges
Data mining methods are well suited to bioinformatics, because bioinformatics is rich in data but lacks a comprehensive theory of life at the molecular level.
Data mining in bioinformatics is, however, hampered by several characteristics of biological databases, including their size, number, complexity and the lack of a standard ontology for querying them, as well as the heterogeneous content and origin of their data. The range of skill levels among potential users is also a concern, since database curators can hardly provide a single method of access suitable for all.
Another issue is the integration of biological databases. Bioinformatics and data mining are rapidly growing fields of study. The main research questions in bioinformatics need to be examined, and new data mining tools need to be designed for scalable and efficient analysis.