-->

Biological Database | Online Biotech Notes


Biological Databases 

Databases are effectively electronic filing cabinets: a convenient and efficient way to store biological data in an electronic format. Common types of biological data include: 

  • Nucleotide sequences (genes and genomes) 
  • Protein sequences 
  • Macromolecular structures 
  • Metabolic pathways 
  • Gene expression data 
  • Protein-protein interactions 
  • Literature 

The amount of biological data is increasing exponentially, driven by sequencing projects and by new technologies that generate large-scale genomic and proteomic datasets. 

Probably the first published biological sequence database was the Atlas of Protein Sequence and Structure (1965), by Margaret Dayhoff et al. 

Characteristics of Biological Databases

Each biological database uses its own standardized format, but all databases share two main features: 

Non-Redundancy 

  • Each entry occurs only once in the database, i.e. duplication of an entry within the same database is not allowed. 

Data Sharing 

  • Biological data in databases are shared so that the scientific community can examine and inspect them.

So, we can define a biological database as a collection of data that is structured, searchable, periodically updated and cross-referenced. 

Classification of Biological Databases 

There are many different database types, depending both on the nature of the biological data and on the manner in which the data are stored. Here, we will discuss only the classification based on the nature of the information being stored. 

1. Sequence Databases 

Primary Sequence Databases 

Primary databases are repositories for raw sequence data generated through laboratory experiments. They can be described in two subsections: 

1. Nucleotide Database 

The sequences are collected from published sources or by direct author submission. There are three major nucleotide sequence databases. 

(a) European Molecular Biology Laboratory (EMBL) 

It is maintained by the European Bioinformatics Institute (EBI). The Sequence Retrieval System (SRS) is used to retrieve information; it links the sequence databases with other databases, including the MEDLINE facility. 

(b) DNA DataBank of Japan (DDBJ) 

It is maintained by the National Institute of Genetics, Japan. Sequences may be submitted from all over the world through a web-based data submission tool. 

(c) Genetic Sequence DataBank (GenBank) 

This is maintained by the National Center for Biotechnology Information (NCBI).

Entrez, an integrated retrieval system, is used to retrieve information from GenBank. Each entry in GenBank follows a flat-file format. The three nucleotide databases work collaboratively and exchange data with one another on a daily basis. 
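A flat-file entry is plain text in which a keyword at the start of a line (LOCUS, DEFINITION, ACCESSION and so on) introduces a field, the sequence follows the ORIGIN line, and a line containing // ends the record. The following is a minimal sketch of reading such an entry in Python. The record is a made-up toy example, not a real GenBank entry, and the function name parse_flat_file is hypothetical; the sketch also ignores multi-line fields and the feature table.

```python
# Toy record in the GenBank flat-file layout (invented for illustration).
TOY_RECORD = """\
LOCUS       TOY0001                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up example sequence.
ACCESSION   TOY0001
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_flat_file(text):
    """Return a dict with the single-line header fields and the sequence."""
    entry = {"sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if line.startswith("ORIGIN"):    # sequence lines follow this keyword
            in_sequence = True
            continue
        if in_sequence:
            # Drop the leading base count and the spaces between blocks.
            parts = line.split()
            entry["sequence"] += "".join(parts[1:]).upper()
        elif line[:1].isalpha():
            # A keyword starts in the first column; its value follows.
            keyword, _, value = line.partition(" ")
            entry[keyword] = value.strip()
    return entry


entry = parse_flat_file(TOY_RECORD)
print(entry["ACCESSION"], len(entry["sequence"]))
```

For real entries an established parser is the safer choice; the sketch only shows why a keyword-based text layout is easy to read by program.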

2. Protein Sequence Databases 

(a) Protein Information Resource (PIR) 

This database was developed by the National Biomedical Research Foundation (NBRF). It is divided into four sub-sections: 

(i) PIR1 It contains fully classified and annotated entries. 

(ii) PIR2 It includes preliminary entries and may contain redundancy. 

(iii) PIR3 It contains unverified entries. 

(iv) PIR4 It may contain conceptual translations of artefactual sequences, genetically engineered sequences, and sequences that are not transcribed or translated. 

(b) SWISS-PROT 

It was produced collaboratively by the Department of Medical Biochemistry at the University of Geneva and EMBL. It is now maintained by the Swiss Institute of Bioinformatics (SIB). SWISS-PROT has minimal redundancy and a high level of annotation.

(c) Translated EMBL (TrEMBL) 

  • The database contains translations of all coding sequences (CDS) in EMBL. It is divided into two sections. 

(i) SWISS-PROT TrEMBL (SP-TrEMBL)

  • It contains entries incorporated into SWISS-PROT 

(ii) Remaining TrEMBL (REM-TrEMBL) 

  • It contains the entries not incorporated into SWISS-PROT. 

NRL-3D 

  • This database is produced by PIR. The ATLAS retrieval system is used to access its information. 

Martinsried Institute for Protein Sequences (MIPS) 

  • This database is distributed with PATCHX, and access to it is provided through its web server. 

Composite Databases 

  • A composite database amalgamates a variety of different primary database sources, which obviates the need to search multiple resources. 

Secondary Sequence Database

The source of data for secondary databases is the primary databases. Secondary databases differ from one another and have their own formats. The information stored in each secondary database concerns conserved regions, obtained through multiple sequence alignment. 

PROSITE 

  • This database is maintained by the Swiss Institute of Bioinformatics. A protein family can often be characterized by a single conserved motif that is responsible for a key biological function. Such motifs are represented as regular expressions. 

  • A regular expression is a consensus description of a motif. Within a pattern, any one of the residues listed inside square brackets can occupy that position. 
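Because PROSITE patterns are regular expressions, they can be translated into the regular-expression syntax of a programming language and searched against a sequence. The sketch below assumes the usual PROSITE notation: elements separated by hyphens, x for any residue, square brackets for allowed residues, curly brackets for excluded residues, and a repeat count in parentheses. The function name prosite_to_regex is hypothetical, and the translator covers only these elements. The pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern; the test sequence is invented.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        # Split off optional anchors and a repeat count such as (2) or (2,4).
        match = re.fullmatch(
            r"(<?)([^()<>]+)(?:\((\d+(?:,\d+)?)\))?(>?)", element
        )
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        start, core, repeat, end = match.groups()
        if core == "x":
            core = "."                        # x: any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # {..}: any residue except these
        # [..] and single letters are already valid regex syntax.
        piece = core + ("{" + repeat + "}" if repeat else "")
        regex.append(("^" if start else "") + piece + ("$" if end else ""))
    return "".join(regex)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
hit = re.search(regex, "MKANGSALL")   # invented test sequence
print(regex, hit.group() if hit else None)
```

A short pattern like this one matches many sequences by chance, so a hit is a hint to be checked rather than proof of function.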

PRINTS

  • In the PRINTS database, a protein family is represented by a signature, or fingerprint. The fingerprint is a consensus description of several conserved motifs within the sequences of a particular protein family.

Blocks 

  • In this database, the motifs or blocks are created by automatically detecting the most highly conserved regions of each protein family. 

Pfam 

  • This database is maintained by the Sanger Centre. It is a collection of Hidden Markov Models (HMMs) for protein domains. An HMM is a statistically based mathematical treatment consisting of a linear chain of match, delete and insert states, which encodes the conserved regions within an aligned family. 

Profiles 

  • In this database, a protein family is characterized by a profile. The profile indicates the positions where insertions and deletions (INDELs) are allowed and where the conserved regions lie. These profiles are also known as weight matrices. 
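The idea of a weight matrix can be illustrated with a small sketch: each column of an alignment is turned into a table of scores, so that residues common in that column score high and rare ones score low. The code below is a simplified illustration, not the scheme used by any particular profile database. It assumes a uniform background frequency and a pseudocount of 1, uses an invented four-sequence alignment, and leaves out the position-specific gap penalties that a full profile carries. The names build_profile and score are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column."""
    background = 1.0 / len(ALPHABET)   # assumed uniform background frequency
    profile = []
    for column in zip(*alignment):
        # Count residues in this column, skipping gap characters.
        counts = Counter(res for res in column if res in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            res: math.log2((counts[res] + pseudocount) / total / background)
            for res in ALPHABET
        })
    return profile


def score(profile, segment):
    """Sum the column scores for an ungapped segment of the same length."""
    if len(segment) != len(profile):
        raise ValueError("segment and profile lengths differ")
    return sum(column[res] for column, res in zip(profile, segment))


toy_alignment = ["ACDK", "ACEK", "ACDR", "GCDK"]   # invented example
profile = build_profile(toy_alignment)
print(score(profile, "ACDK") > score(profile, "WWWW"))
```

A segment that resembles the family scores higher than an unrelated one, which is the property a profile search relies on.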

IDENTIFY 

  • This database is derived from BLOCKS and PRINTS. In IDENTIFY, eMOTIF is used as the search software to assess protein function. 

2. Structural Databases 

We can classify proteins on the basis of structure, as many proteins share structural similarities. Sometimes, during evolution, protein function remains the same while the secondary structural environment varies. To understand the relationships between structure and sequence, a variety of structural classifications have been developed. Some classification schemes are as follows:
 

Structural Classification of Proteins (SCOP) 

  • This database is based on evolutionary studies of all proteins of known structure. The levels of hierarchical classification in SCOP are class, fold, superfamily, family, proteins and sequences. Here, the four major levels are described. 

Class 

  • Proteins of similar secondary structure are said to belong to the same class. The classes are all-α proteins, all-β, α/β, α+β (segregated α and β regions), multi-domain, small proteins and membrane proteins. 

Fold 

  • The fold describes the arrangement and topological connections of the major secondary structures in a protein. Proteins in the same fold category may not have a common evolutionary origin. 

Super-family 

  • Proteins in the same superfamily show low sequence identity, but they probably share a common ancestor, as their structures and functions have common characteristics. 

Family 

  • Members of the same family are clearly evolutionarily related and show 30% or more sequence identity. 
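A threshold such as 30% identity presupposes a way of counting identity. The sketch below uses one common convention, assumed here and not stated in the text: the two sequences are already aligned, and identity is the fraction of matching residues over the columns where neither sequence has a gap. The function name percent_identity and both sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


pid = percent_identity("MKT-AYIAKQR", "MKSLAYLA-QR")   # invented sequences
print(round(pid, 1), pid >= 30.0)
```

Other conventions divide by the alignment length or by the shorter sequence length, so the figure depends on which definition is used.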

Class, Architecture, Topology, Homology (CATH) 

  • It is a hierarchical domain classification of the protein structures in the Protein Data Bank. Only structures with a resolution better than 3.0 angstroms are considered. There are four basic levels of classification. 

Class 

  • This is determined according to the secondary structure composition and packing within the structure. Four major classes are recognized: (i) mainly α, (ii) mainly β, (iii) α-β (α/β and α+β) and (iv) low secondary structure content. 

Architecture 

  • This describes the overall shape of the domain structure, ignoring the connectivities between the secondary structures, e.g. barrel, sandwich, etc. 

Topology 

  • The proteins are grouped into fold families depending on both the overall shape and the connectivity of the secondary structures. Proteins that have 60% or more identity share the same topology. 

Homology 

  • This indicates that the structures share a common ancestor and show high structural and functional similarity. 

PDBsum 

  • It provides summaries and analysis of all structures in the protein data bank. 

3. 3D Structure Databases 

Protein Data Bank (PDB) 

The PDB was established at Brookhaven National Laboratory, USA, and is now maintained by the Research Collaboratory for Structural Bioinformatics (RCSB). It is the repository for macromolecular structures determined experimentally by X-ray crystallography, NMR, neutron diffraction and cryo-electron microscopy. 
The AutoDep Input Tool (ADIT) is used to deposit structures in the PDB. It checks the coordinate format and runs validation tests on a structure prior to deposition. 
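Coordinate format matters because, in the classic PDB text format, each ATOM record stores its fields in fixed columns, so a value shifted by one column is misread. The sketch below builds one ATOM line and reads it back by column position. It assumes the standard fixed-column layout of the legacy PDB format and handles only the fields up to the coordinates. The function names and the coordinate values are invented, and this is not a description of how ADIT itself works.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build an ATOM line in the fixed PDB column layout (up to the coordinates)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom(line):
    """Read the fixed-width fields of an ATOM record (comments give 1-based columns)."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


line = format_atom(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614)  # invented values
atom = parse_atom(line)
print(atom["name"], atom["res_name"], atom["chain"], atom["x"])
```

Reading by column position rather than splitting on spaces is deliberate, since adjacent fields in a PDB line may touch with no space between them.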

Molecular Modelling Database (MMDB) 

  • NCBI has two 3D structure databases: MMDB and CDD (Conserved Domain Database). MMDB is a retrieval version of the PDB structures, containing experimentally determined biomolecular structures. 

Conserved Domain Database (CDD) 

  • This provides a directory of the sequence and structure alignments representing conserved functional domains within proteins. Conserved domains (CDs) are displayed in MMDB structure summaries and link to a sequence alignment.

4. Literature Databases 

  • PubMed Central (PMC) is a literature database that provides researchers with access to full-text articles and journals. 

5. Gene Expression Databases 

  • These databases can be described under the following sub-headings: 

SAGEmap 

  • The repository of serial analysis of gene expression (SAGE) data. 

GEO 

  • The repository and retrieval system for any high-throughput gene expression data. 

GENSAT 

  • A database of data on the mouse central nervous system, produced by the National Institute of Neurological Disorders and Stroke, USA. 

Probe 

  • This database holds entries for probe sequences. The entries indicate the intended experimental applications and include the experimental results generated by using the probe. 

6. Chemical Databases 

  • PubChem is the popular chemical database of NCBI. It contains structural, chemical and biological information on small molecules and their diagnostic and therapeutic applications. 

Other Databases 

dbEST - This database holds information about Expressed Sequence Tags (ESTs). 
UniGene - A database of ESTs that focuses on human. 
SGD - This is the genome database for different strains of yeast. 



THANKYOU!!!


Bhanu prakash
