bioData: FAQs

Notes on bioData

What is bioData?
What biological databases are currently queryable through bioData?
How often are the databases updated?
What are the different types of databases in bioData?
How do I query a database?
Sample Query: Find non-coding RNAs in the region chr17:75700000-85700000 of the human genome

What is bioData?

A vast amount of biological information is available in many formats and scattered across many resources. At ABCC, we download and maintain many of these publicly available databases for ease of access by the Frederick National Laboratory for Cancer Research (FNLCR) and NCI-Frederick community. bioData is a web interface that provides extensive documentation and query options for all of the bioinformatics databases maintained at ABCC.

What biological databases are currently queryable through bioData?

Most of the actively maintained databases at ABCC are automatically parsed and loaded into relational tables in an Oracle data warehouse. Exceptions include Medline and the complete UniProt database, which are stored in NoSQL data stores because of their complex formatting. Every database currently in the Oracle data warehouse is available through bioData. Because most of bioData's query options are generated automatically, any new database added to the back-end Oracle warehouse automatically becomes available through the web interface.
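Automatic availability of new tables suggests that the query pages are built from the warehouse's own catalog rather than hard-coded. The FAQ does not describe the mechanism, so the following is only a minimal sketch of that idea under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for the Oracle warehouse (in Oracle, the corresponding catalog views would be `ALL_TABLES` and `ALL_TAB_COLUMNS`). The helper name `build_menu`, the cut-down `ENS_GENE_MAIN` columns, and the `NEW_SOURCE` table are hypothetical.

```python
import sqlite3


def build_menu(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {table_name: [column names]} by reading the database catalog.

    Hypothetical helper: an Oracle back end would query ALL_TABLES and
    ALL_TAB_COLUMNS instead of sqlite_master and PRAGMA table_info.
    """
    menu: dict[str, list[str]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        menu[table] = [col[1] for col in columns]
    return menu


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Stand-in table with only the columns used in this FAQ's sample query.
    conn.execute(
        "CREATE TABLE ENS_GENE_MAIN (ORGANISM TEXT, BIOTYPE TEXT, CHR TEXT,"
        " CHR_START INTEGER, CHR_END INTEGER)"
    )
    print(build_menu(conn))
    # A newly loaded table shows up without any change to build_menu.
    conn.execute("CREATE TABLE NEW_SOURCE (ID TEXT)")
    print(build_menu(conn))
```

The design point is that the interface code never lists tables itself; it asks the database, so loading a new source is enough to expose it.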

How often are the databases updated?

Wherever practical, our databases are updated daily by scheduled cron jobs. These jobs check the public data sources for updates, download any new versions, parse the files with custom scripts, and load them into our development Oracle servers. Validation scripts then check for discrepancies between the new and old versions of each database. Versions that pass validation are loaded into our production servers on a weekly schedule.
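The FAQ does not say which discrepancy checks the validation scripts run. As an illustration only, the sketch below assumes two simple checks on the primary keys of the old and new versions: the row count must not fall by more than a set fraction, and not too many previously known keys may disappear. The function name, the `ValidationReport` type, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ValidationReport:
    passed: bool
    problems: list[str]


def validate_new_version(
    old_keys: set[str],
    new_keys: set[str],
    max_row_loss: float = 0.05,      # hypothetical: tolerate up to 5% fewer rows
    max_dropped_keys: float = 0.10,  # hypothetical: tolerate up to 10% of old keys vanishing
) -> ValidationReport:
    """Compare the primary keys of the current and the freshly parsed version."""
    if not new_keys:
        return ValidationReport(False, ["new version is empty"])
    problems: list[str] = []
    if old_keys:
        # Net shrinkage of the table.
        row_loss = (len(old_keys) - len(new_keys)) / len(old_keys)
        if row_loss > max_row_loss:
            problems.append(f"row count fell by {row_loss:.1%}")
        # Keys present before but missing now (catches swaps that keep counts stable).
        dropped = len(old_keys - new_keys) / len(old_keys)
        if dropped > max_dropped_keys:
            problems.append(f"{dropped:.1%} of previous keys are missing")
    return ValidationReport(not problems, problems)


if __name__ == "__main__":
    old = {"g1", "g2", "g3", "g4"}
    print(validate_new_version(old, old | {"g5"}))  # a small addition passes
    print(validate_new_version(old, {"g1"}))        # a truncated download fails
```

In this sketch, a passing version would be queued for the weekly promotion to production, while a failing one would stay on the development server for review.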

What are the different types of databases in bioData?

We download a wide variety of biological databases to meet the different needs of the FNLCR/NCI-Frederick community. These include, but are not limited to: gene annotations from EntrezGene and Ensembl; protein information from UniProt, Ensembl, and RefSeq; microarray chip annotations from Affymetrix and Agilent; functional annotations from GO; and pathway data from KEGG, Biocarta, and Reactome.

How do I query a database?

  1. Click on "Query Databases" in the top menu.
  2. Select a database from the drop-down list.
  3. Select a table to query from the "Tables" drop-down list.
  4. Build your query either by setting the options available for each of the table's columns, or by writing your own SQL statement after clicking the 'Edit' button next to the 'Query' text box at the bottom of the page.
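Behind the scenes, the column options in step 4 have to become a SQL statement, which is also what you would type after clicking 'Edit'. A minimal sketch of that translation, assuming that each column option becomes one `WHERE` condition joined with `AND`, and that "does" means "matches", with a `*` wildcard turned into SQL's `%` for a `LIKE` test. The function name `build_query` and the operator table are hypothetical, and the `?` placeholder style is the one Python's `sqlite3` uses (Oracle drivers use named or numbered binds instead).

```python
import re

# Hypothetical mapping from the interface's comparison choices to SQL operators.
_OPERATORS = {">=": ">=", "<=": "<=", "=": "=", ">": ">", "<": "<"}
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_query(
    table: str, filters: list[tuple[str, str, str]]
) -> tuple[str, list[str]]:
    """Build a parameterized SELECT from (column, operator, value) triples.

    'does' with a '*' in the value becomes LIKE ('*' -> '%');
    'does' without a wildcard becomes an equality test.
    """
    # Table and column names cannot be bound as parameters, so reject anything odd.
    for name in [table] + [col for col, _, _ in filters]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    clauses: list[str] = []
    params: list[str] = []
    for column, op, value in filters:
        if op == "does":
            if "*" in value:
                clauses.append(f"{column} LIKE ?")
                params.append(value.replace("*", "%"))
            else:
                clauses.append(f"{column} = ?")
                params.append(value)
        elif op in _OPERATORS:
            clauses.append(f"{column} {_OPERATORS[op]} ?")
            params.append(value)
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    print(build_query("ENS_GENE_MAIN", [("ORGANISM", "does", "hsapiens"),
                                        ("BIOTYPE", "does", "*ncRNA*")]))
```

Binding values as parameters keeps user input out of the SQL text. A real interface would more likely validate column names against the catalog than against a regular expression.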


Sample Query: Find non-coding RNAs in the region chr17:75700000-85700000 of the human genome

Click on "Query Databases" in the top menu, then:

  1. Select Ensembl from the database list.
  2. Select "ENS_GENE_MAIN" from the "Tables" selection box.
     [Screenshot: Query Page 1]
  3. On the resulting query page:
    1. Select "hsapiens" in the "ORGANISM" column.
    2. In the "BIOTYPE" column, select "does" and type "*ncRNA*" (this matches both lincRNA and ncRNA).
    3. In the "CHR" column, select "does" and type "17".
    4. In the "CHR_START" column, select ">= (greater than or equal to)" and type "75700000".
    5. In the "CHR_END" column, select "<= (less than or equal to)" and type "85700000".
    6. Click the "Submit Query" button.
     [Screenshot: Query Results Page]
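For reference, the form above corresponds roughly to the SQL statement below, which could also be entered directly through the 'Edit' button. This is a sketch under assumptions: it assumes "does" with `*` wildcards maps to `LIKE` with `%`, and it runs the statement against an in-memory SQLite table that mimics only the five `ENS_GENE_MAIN` columns used here plus a hypothetical `GENE_ID` column. The rows are made-up placeholders, not Ensembl data. Because of the `CHR_START`/`CHR_END` conditions, only features lying entirely inside the region are returned.

```python
import sqlite3

# Assumed translation of the sample form into SQL.
# Note: SQLite's LIKE is case-insensitive for ASCII; Oracle's LIKE is case-sensitive.
NCRNA_QUERY = """
SELECT *
FROM ENS_GENE_MAIN
WHERE ORGANISM = 'hsapiens'
  AND BIOTYPE LIKE '%ncRNA%'
  AND CHR = '17'
  AND CHR_START >= 75700000
  AND CHR_END <= 85700000
"""


def run_sample(rows: list[tuple]) -> list[str]:
    """Load placeholder rows into a stand-in table, run the query, return GENE_IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ENS_GENE_MAIN (GENE_ID TEXT, ORGANISM TEXT, BIOTYPE TEXT,"
        " CHR TEXT, CHR_START INTEGER, CHR_END INTEGER)"
    )
    conn.executemany("INSERT INTO ENS_GENE_MAIN VALUES (?, ?, ?, ?, ?, ?)", rows)
    return [row[0] for row in conn.execute(NCRNA_QUERY)]


if __name__ == "__main__":
    placeholder_rows = [
        ("gene_a", "hsapiens", "lincRNA", "17", 76000000, 76010000),         # matches
        ("gene_b", "hsapiens", "protein_coding", "17", 76000000, 76050000),  # wrong biotype
        ("gene_c", "hsapiens", "ncRNA", "17", 75000000, 75001000),           # outside region
        ("gene_d", "mmusculus", "lincRNA", "17", 76000000, 76010000),        # other organism
    ]
    print(run_sample(placeholder_rows))  # ['gene_a']
```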