PEPR Background

The design and implementation of PEPR version 1 (2002-2004).

The original implementation of PEPR was supported primarily by a grant from the NIH NHLBI Programs for Genomic Applications (see both the HOPGENE site and the NHLBI PGA site), and was developed by Josephine Chen in the Hoffman laboratory at Children's National Medical Center.

Original PEPR implementation goals and features

The original implementation of the Public Expression Profiling Resource (PEPR) provided centralized Affymetrix expression profiling data to the public research community. PEPR was, and remains, a web-based solution that gives researchers seamless access to the Affymetrix expression profiling database through a web browser, without Affymetrix software. The web interface also enabled users to export all forms of data associated with any particular profile, including raw image files (.dat), processed image files (.cel), and interpretation files (.txt). It also allowed researchers to perform a variety of on-line queries of all expression profiles by any number of experimental variables (tissue, species, chip type, etc.). Other built-in functions included searching by GenBank Accession ID and gene name (gene-based cross-profile search). These search functions returned Avg Diff values and Present/Absent Calls for all profiles in PEPR. In addition to the basic search functions, the website provided the Single Gene Query Tool (SGQT), developed in-house, which allowed a user to search for a specific gene and returned a graph showing the replicates and their average, mouse-over meta-data for each data point, and links to both internal and several external genome database resources. In addition, an automated backend process disseminated all available PEPR profile data into the NCBI Gene Expression Omnibus (GEO) database without any user interaction. Public users could easily access the deposited data in GEO, and could reach the CRI PEPR data through a corresponding link created during the direct deposit process.
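The gene-based cross-profile search and the replicate averaging behind the SGQT graph can be illustrated with a minimal Python sketch. This is not the original PEPR code: the `ProbeResult` record, its field names, and the function names are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ProbeResult:
    """One measurement of one gene in one profile (hypothetical layout)."""
    profile_id: str   # identifier of the expression profile
    time_point: str   # experimental variable used to group replicates
    accession: str    # GenBank Accession ID
    gene_name: str
    avg_diff: float   # Avg Diff value
    call: str         # "P" (Present) or "A" (Absent)


def search_gene(results, query):
    """Return all results whose accession ID or gene name equals the query."""
    q = query.lower()
    return [r for r in results
            if q in (r.accession.lower(), r.gene_name.lower())]


def summarize_replicates(hits):
    """Group hits by time point and average the replicate Avg Diff values."""
    groups = defaultdict(list)
    for r in hits:
        groups[r.time_point].append(r.avg_diff)
    return {tp: mean(values) for tp, values in groups.items()}
```

A plotting layer could then draw the individual replicates from `search_gene` and the per-group averages from `summarize_replicates`.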

Original PEPR workflow

The following diagram depicts the workflow and infrastructure of PEPR as implemented in the 2002-2004 version (first generation).

  1. An external or internal investigator begins a new project and delivers his/her samples to the CRI Microarray Center. All communications are exchanged by email, meeting notes, or phone conversations.
  2. A CRI scientist gathers the delivered samples, completes the lab work, and enters minimal experimental information into the Affymetrix LIMS database. The technician then scans the microarray chip, and the data is deposited into the Affymetrix LIMS system.
  3. The CRI scientist publishes the finished experiment data. An automated process is triggered to replicate the data to PEPR and then direct-deposit the PEPR data into the GEO database.
  4. The CRI database administrator collects more experimental information from the investigator and the CRI scientist, and then associates the information with the proper data set.
  5. A public user accesses and exports data through the PEPR microarray website and uses other features of the website, such as performing a single gene query and downloading experiment data.
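Step 3 of the workflow (replicate, then direct deposit) might be sketched in Python as follows. This is an illustrative assumption, not the original backend: the function names and the sample dictionary layout are hypothetical, and the SOFT-style output is deliberately simplified, so the attribute and column names should be checked against GEO's own submission documentation before any real use.

```python
def to_soft_sample(sample_name, title, rows):
    """Render one sample as simplified SOFT-style text.

    rows is an iterable of (id_ref, value, call) tuples.
    Column names below are illustrative assumptions.
    """
    lines = [
        f"^SAMPLE = {sample_name}",
        f"!Sample_title = {title}",
        "#ID_REF = probe set identifier",
        "#VALUE = Avg Diff signal",
        "#ABS_CALL = Present/Absent call",
        "!sample_table_begin",
        "ID_REF\tVALUE\tABS_CALL",
    ]
    lines += [f"{id_ref}\t{value}\t{call}" for id_ref, value, call in rows]
    lines.append("!sample_table_end")
    return "\n".join(lines) + "\n"


def publish_experiment(samples, replicate, deposit):
    """For each sample, replicate it to the repository, then deposit its SOFT text.

    replicate and deposit are caller-supplied callables (hypothetical hooks).
    Returns the names of the samples that were processed, in order.
    """
    processed = []
    for sample in samples:
        replicate(sample)
        deposit(to_soft_sample(sample["name"], sample["title"], sample["rows"]))
        processed.append(sample["name"])
    return processed
```

Passing the two steps in as callables keeps the ordering logic separate from the storage and network details, which makes the sequence easy to test with simple stand-ins.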

PEPR challenges as experienced 2002-2004.

  • Incomplete meta-data collection process
  • Although the current PEPR makes data sharing through web browsers very convenient, the existing infrastructure and technology practices impose several limitations. At most pharmaceutical companies, experiment samples originate within the organization; CRI, by contrast, receives experiment samples from around the world. A scientist often initiates a project at a remote facility and then delivers the experimental samples to the CRI Microarray Center. A CRI scientist then processes the samples and generates expression profiling data in house. Information collection runs in multiple directions, and the process is rarely completed. The experiment information is exchanged among several parties without any consistent records, so CRI scientists often struggle to place accurate and proper experimental information into the CRI LIMS database. Moreover, CRI scientists enter the experimental information through Affymetrix Microarray Suite (MAS), whose application interface provides little guidance to the end user and lacks well-defined data integrity rules. Most data fields do not use controlled vocabularies. To provide well-annotated experiment results to researchers, extra manual steps are added to compensate for the incomplete information collection process. Ambiguous and inconsistent data becomes useless afterwards.
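As an illustration of what data integrity rules and controlled vocabularies could look like at entry time, here is a minimal Python sketch. The field names and the allowed terms are assumptions chosen for the example, not the vocabulary PEPR or the LIMS actually used.

```python
# Illustrative vocabulary only; a real deployment would load curated term lists.
CONTROLLED_VOCAB = {
    "species": {"Homo sapiens", "Mus musculus", "Rattus norvegicus"},
    "tissue": {"skeletal muscle", "heart", "lung"},
    "chip_type": {"HG-U133A", "MG-U74Av2"},
}
REQUIRED_FIELDS = ("species", "tissue", "chip_type")


def validate_metadata(record):
    """Return a list of problems found in one metadata record (empty if valid)."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = (record.get(field) or "").strip()
        if not value:
            errors.append(f"{field}: missing")
        elif value not in CONTROLLED_VOCAB[field]:
            errors.append(f"{field}: '{value}' is not an allowed term")
    return errors
```

Rejecting a record such as `{"species": "mouse"}` at submission time, rather than repairing it by hand later, is the kind of check the entry interface described above did not offer.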

  • Lack of efficient framework and inflexible procedural programming
  • PEPR was developed two years ago without a blueprint. Without going through design and requirement review stages, the application was developed based on “at-the-moment” needs. It evolved from a few small procedures (e.g. searching experiments by different criteria and downloading raw experiment data) to several lengthy procedures (e.g. direct data deposit in GEO SOFT format and single gene query across different experimental projects). Development progressed with little consideration of expandability and robustness. As user needs and application requirements increase, the current application architecture, along with the inconsistent meta-data stored in the LIMS system, hinders any future development.

  • Database schema dependency
  • PEPR utilized the Affymetrix Analysis Data Model (AADM), the relational database schema Affymetrix uses to store experiment results. AADM was designed mainly for the Affymetrix Data Mining Tool. PEPR used the AADM database as its fundamental data access layer because this was the solution that used the least computing resources and took the shortest amount of time. However, direct access to AADM soon became a development obstacle as requirements increased:

    • Inefficiency of AADM schema: Because PEPR's goals differ from those AADM was designed for, the AADM schema is inefficient for PEPR to use. For example, in the previous PEPR architecture, a single database query could access several AADM tables even though the needed data was stored in only one or two columns. This design inefficiency lengthens query run time.
    • Vendor dependency and maintenance complexity: Without any alternative, the current PEPR depends heavily on the AADM schema. Any AADM change requires corresponding changes in the existing PEPR. This rigid relationship creates maintenance difficulties for application developers whenever there is a version upgrade or schema modification.
  • Limited application functions
  • Because of the drawbacks stated above, further expanding the application's features becomes an unachievable task. Without complete knowledge of each experimental project and its samples, it is difficult to design and develop on-line, real-time data mining tools and other advanced search functions. For example, the single gene query tool currently applies only to time course studies with a single control time point sample; it does not work with more complex time course designs, such as multiple controls at different time points, because of the data inconsistencies combined with the inflexible data schema. Furthermore, the current LIMS system collects only limited information about each experimental project and sample. To present biologically meaningful results to researchers, scientists need to provide more unambiguous information in a fixed, standardized format. That data will then assist the development of the PEPR repository and on-line data mining tools. Another key PEPR deficiency is in data export. PEPR does not offer researchers a multiple-file download option. Single-file download not only slows the system's response but also inconveniences researchers, who most of the time want to download an entire project rather than one experimental file.
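The missing multiple-file download could, for example, be provided by bundling a project's files into a single archive. The following Python sketch uses the standard `zipfile` module; the function name and the in-memory `{filename: bytes}` input are assumptions for illustration, not a description of how PEPR stores its files.

```python
import io
import zipfile


def bundle_project(files):
    """Pack a {filename: bytes} mapping into one in-memory ZIP archive.

    Returns the archive as bytes, ready to be sent as a single download.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in sorted(files.items()):
            archive.writestr(name, payload)
    return buffer.getvalue()
```

For large raw image files, a production version would more likely stream the archive to disk or to the response instead of building it in memory.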

    For all the above reasons, we began a complete re-design of PEPR (see PEPR v2 Design Goals).

    Copyright © 2006 PEPR
    Version 2.0.0
    Site designed and built by Eric Hoffman & Josephine Chen