High-throughput technologies make it necessary to mine large amounts of gene annotations from diverse databanks and to integrate the resulting data. Most databanks can be queried only via the Web, one gene at a time, and query results are generally available only in HTML format. Although some databanks provide batch retrieval of data via FTP, this requires expertise and resources to reimplement the databank locally.
Scientists at the Bioengineering Department, Politecnico di Milano; IFOM – FIRC Institute of Molecular Oncology, Milano; and IEO – European Institute of Oncology, Milano, Italy, developed MyWEST. The work was published in the 12 December issue of Bioinformatics, a scientific journal published by Oxford University Press and the International Society for Computational Biology.
MyWEST is a software package for effective mining of web-interfaced biomolecular databanks.
It provides an intuitive visual interface for building templates that define which information should be extracted from the HTML pages of web databanks. It then uses these templates to mine information from multiple web pages of different databanks, stores and aggregates the extracted data in a common database, and supports complex queries on the aggregated data to identify hidden, significant biological information.
A template configuration module lets users visually define the information to mine on selected reference HTML pages of the web-interfaced databanks of interest, and create the corresponding extraction templates. It also allows the definition of access parameters, both for the web-accessible databanks of interest and for a relational database that stores all extracted data.
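To make the idea of an extraction template concrete, here is a minimal Python sketch. It is not MyWEST's implementation or template format, which the article does not describe. It assumes a template is simply a mapping from a field name to the label of a table cell on the reference page, and that the value to extract sits in the cell immediately after that label. The names `CellCollector`, `extract`, `SAMPLE_PAGE` and `TEMPLATE` are hypothetical.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Collapse runs of whitespace inside the cell text.
            self.cells.append(" ".join("".join(self._buf).split()))
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buf.append(data)


def extract(html, template):
    """Apply a template {field_name: label_text} to one HTML page.

    Assumption: the value of interest is the cell right after the label cell.
    Fields whose label is not found are returned as None.
    """
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = parser.cells
    record = {}
    for field, label in template.items():
        record[field] = None
        for i, text in enumerate(cells[:-1]):
            if text == label:
                record[field] = cells[i + 1]
                break
    return record


# Hypothetical databank page and template, for illustration only.
SAMPLE_PAGE = """
<html><body><table>
<tr><th>Symbol</th><td>TP53</td></tr>
<tr><th>Chromosome</th><td>17</td></tr>
<tr><th>Description</th><td>tumor protein   p53</td></tr>
</table></body></html>
"""
TEMPLATE = {"symbol": "Symbol", "description": "Description"}

record = extract(SAMPLE_PAGE, TEMPLATE)
print(record)
```

Because the template is defined once against a reference page, the same `extract` call could be applied to every page of the same databank that shares that layout, which is the property that makes batch mining possible.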
In a data extraction module, users provide the identification codes of the nucleotide or amino acid sequences of interest and use the created templates to mine the available annotations automatically, in batch mode, from several web-interfaced databanks at once. The mined information is stored both in a text file format readable by Excel, for easy and immediate use, and in a relational database. In the database, all extracted data are aggregated and structured so that complex queries can be run for further comprehensive mining.
A purpose-built updating software agent automatically updates all of the mined information held in the database.
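The aggregation, querying and updating steps can be sketched in a few lines of Python using the standard `sqlite3` module. This is an illustration under stated assumptions, not MyWEST's actual schema or update agent: the `annotation` table, the `store` function, and the databank names `databank_A` and `databank_B` are all hypothetical, and the sample rows stand in for records mined from two different databanks.

```python
import sqlite3

# Hypothetical mined records: (sequence_id, source databank, field, value).
mined = [
    ("NM_000546", "databank_A", "symbol", "TP53"),
    ("NM_000546", "databank_B", "chromosome", "17"),
    ("NM_007294", "databank_A", "symbol", "BRCA1"),
    ("NM_007294", "databank_B", "chromosome", "17"),
    ("NM_000059", "databank_A", "symbol", "BRCA2"),
    ("NM_000059", "databank_B", "chromosome", "13"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE annotation (
           sequence_id TEXT NOT NULL,
           source      TEXT NOT NULL,
           field       TEXT NOT NULL,
           value       TEXT,
           PRIMARY KEY (sequence_id, source, field)
       )"""
)


def store(conn, rows):
    """Insert new annotations, or refresh existing ones with the same key."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO annotation VALUES (?, ?, ?, ?)", rows
        )


store(conn, mined)

# An update pass re-mines the pages and stores the rows again;
# rows with an existing key are replaced rather than duplicated.
store(conn, [("NM_000059", "databank_B", "chromosome", "13")])

# Query across databanks: symbols (from A) of sequences on chromosome 17 (from B).
query = """
    SELECT s.value
    FROM annotation AS s
    JOIN annotation AS c ON s.sequence_id = c.sequence_id
    WHERE s.field = 'symbol' AND c.field = 'chromosome' AND c.value = ?
    ORDER BY s.value
"""
symbols_on_17 = [row[0] for row in conn.execute(query, ("17",))]
print(symbols_on_17)
```

Keying each row on sequence, source and field is one simple way to let data from different databanks sit side by side while keeping repeated update runs idempotent.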