The re-design described below enables rich metadata search functions (e.g. search by experiment design type or by the animal model's age or sex); a web-interface data input system is used to capture experiment information. Unlike other profiling packages currently in use, our web-based data submission process offers great flexibility in capturing the desired experiment metadata (e.g. addition of experiment design type) for analysis and visualization. It provides a mechanism to enforce data input consistency and validation, and eliminates the current accessory tables and the batch process used to filter data. The resulting data consistency expands the search and visualization capabilities.
The Affymetrix GCOS software and AADM database are provided with all Affymetrix packages. However, rather than accessing the AADM database directly, our application uses the Affymetrix GCOS and GDAC SDK (software development kit) to retrieve and parse experiment-related data (e.g. .chp, .cel files). It preprocesses all published chip files to improve data download performance, and it eliminates the existing process of transferring large sets of experiment data from the lab database to the public database. With the GCOS and GDAC SDK, only a small subset of the data is extracted and placed in the public database for analysis at any point in time. This approach also eliminates the AADM dependency (the application need not change if the AADM schema changes). Indeed, the often-changing AADM schema resulted in chronic compatibility problems with the first-generation PEPR resource.
PEPR also uses our newly implemented GEO submission and update APIs to submit new experiments or revise previously published experiment data. PEPR incorporates a custom-designed Probe Profiler API (funded by a Department of Defense grant for PEPR to Dr. Hoffman) that offers four additional data algorithms (DCHP Diff, DCHP PMOnly, RMA, and PCA) alongside the built-in MAS algorithm values for data analysis and visualization. Finally, PEPR provides off-line batch data exportation that allows the researcher to download or export a series of large data sets while continuing to navigate the site. The generation of .chp, .dat and .cel data files is processed during off-peak hours.
Our previous design and implementation of PEPR was supported by an NHLBI Programs in Genomic Applications grant and an NINDS Spinal Cord Trauma grant (the latter being the single NIH award for this contract). While we have only very recently reported our initial implementation of PEPR (Almon et al. 2003; Chen et al. 2004), we feel our new re-design (funded by the Department of Defense and an R21/R33 NHGRI grant) makes substantial improvements over our previous version, and over any other dynamic query resource for massively parallel and multi-dimensional biological datasets available elsewhere.
The major improvements of PEPR over the previous application include:
- proposal submission/approval workflow
- expanded search
- expanded data visualization
- data retrieval preprocess through GCOS and GDAC SDK
- GEO publishing addition and update
- Off-line batch data exportation
The major benefits of PEPR over the previous application are:
- The workflow and central repository improve collaboration between researchers and investigators.
- Enhanced search features offer better data sharing and navigation
- Enhanced visualization offers better assistance to researchers
- GCOS and GDAC SDK utilization eliminates the AADM dependency
- GEO publishing update completes the existing GEO publishing process (experiment addition and modification) through a browser-based interface, empowering scientists to manage their own experiment data
- Off-line batch data exportation provides faster system response to researchers
- Data validation and consistency make database maintenance and operation easier
- Object-oriented design (OOD) makes maintenance and future enhancement easier
The PEPR process architecture design and implementation
PEPR is a three-tier Java enterprise application, composed of a Web Tier, Middle Tier and Back-End Tier. A schematic of the overall design is provided on the next page of this text. Note that the current version of PEPR (http://microarray.cnmcresearch.org) (Chen et al. 2004) will be replaced with the version described below, at http://pepr.cnmcresearch.org, over the next few weeks (prior to the meeting of the study section).
Web Tier
The Web Tier includes a web server, a Tomcat application server and various web components that provide front-end functions such as navigation, data browsing, data searching, project submission, project publishing, the gene query tool and user notification. Most web components interface transparently with the PEPR back-end databases. This tier's interface also allows users to trigger the Middle Tier applications.
Middle Tier
The Middle Tier is integrated with several third-party services (Popchart, Lucene, Affymetrix SDK and Corimbia Probe Profiler SDK); for some we purchased enterprise versions of pre-existing software, and others we wrote or contracted specifically for PEPR. The tier is designed to handle time-consuming processes, such as Affymetrix data extraction and offline data downloading, while allowing the user to navigate the site without waiting for the process to complete. The Middle Tier applications require intensive computing resources and are responsible for chart visualization generation, offline data download, metadata indexing for keyword search, NCBI GEO data submission, Affymetrix data file extraction and transformation, and generation of data from the several Probe Profiler algorithms.
Most processes in this tier do not require a synchronous response from the PEPR front end. In addition to conventional click-and-wait web application features, PEPR allows a user to submit a request without waiting for the process to complete, while still guaranteeing that the process will be completed. To achieve this asynchronous operation reliably, the PEPR implementation introduces an Open JMS queue server, which extends the application's functionality. JMS is designed to handle message delivery between components. When a user submits a request to download a large set of data in PEPR, a web component in the Tomcat application server packages the user's request into a message and drops the message into the JMS Queue. The JMS Queue receives and delivers the message, acting as a specialized router that looks at the message's address and delivers it to the appropriate party (in this case, the Offline Data Download process in the chart). The Offline Data Download process then parses and handles the download request: it searches for and compresses the requested data, and then sends a notification containing the download URL to the user. During this process, the user does not have to wait for the lengthy file compression to complete. The JMS Queue thus makes batch download possible.
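The request flow described above can be illustrated with a minimal sketch. It is not the PEPR implementation (which is Java with a JMS queue server); it uses Python's standard-library `queue.Queue` and a worker thread as stand-ins for the JMS Queue and the Offline Data Download process. All names, the in-memory `data_store`, and the `example.org` URL scheme are hypothetical.

```python
import queue
import threading
import zlib


def submit_download_request(jms_queue, user, file_names):
    """Web Tier side: package the request as a message and return at once."""
    message = {"user": user, "files": list(file_names)}
    jms_queue.put(message)
    return "queued"


def offline_download_worker(jms_queue, data_store, notifications):
    """Middle Tier side: consume messages, compress the data, notify the user."""
    while True:
        message = jms_queue.get()
        if message is None:  # sentinel used in this sketch to stop the worker
            break
        payload = b"".join(data_store[name] for name in message["files"])
        archive = zlib.compress(payload)
        # Hypothetical download URL, for illustration only.
        url = "https://example.org/downloads/{}-{}.z".format(
            message["user"], len(archive)
        )
        notifications.append((message["user"], url))


def run_demo():
    jms_queue = queue.Queue()  # stand-in for the JMS Queue
    notifications = []         # stand-in for notifications sent to users
    data_store = {"a.cel": b"A" * 1000, "b.chp": b"B" * 1000}
    worker = threading.Thread(
        target=offline_download_worker,
        args=(jms_queue, data_store, notifications),
    )
    worker.start()
    # The submitter gets "queued" back without waiting for compression.
    status = submit_download_request(jms_queue, "researcher1", ["a.cel", "b.chp"])
    jms_queue.put(None)
    worker.join()
    return status, notifications
```

The point of the sketch is that `submit_download_request` returns as soon as the message is enqueued; the compression and notification happen later on the consumer side.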
The JMS Queue service is important to PEPR for the following reasons:
- Asynchronous communication: The JMS Queue serves as an asynchronous communication channel between the Web Tier and the Middle Tier components. When a PEPR administrator issues a GDAC data export command, the interface drops a message into the JMS Queue, which triggers the Affymetrix GDAC process; that process then loads data into the PEPR database while the administrator continues to perform other tasks.
- Reliable messaging communication: The JMS Queue stores all messages persistently in the Oracle database. If the Middle Tier processes shut down because of an unexpected software failure, the JMS Queue continues to store and buffer the messages delivered from the Tomcat application server. The JMS Queue then delivers the stored messages to the appropriate process when the Middle Tier applications restart. This persistence gives PEPR high availability.
- Distributed computing: The Probe Profiler API process requires intensive computing resources. PEPR uses the JMS Queue to distribute the computing load across different servers. The JMS Queue is used to communicate remotely with the Probe Profiler API process (residing on CRI7), allowing the remote process to receive messages and start its own calculation.
- Sequence process control: The Probe Profiler API is designed on a single-thread model; it can process only one request at a time. If more than one Probe Profiler process were triggered at the same time, the second request would be dropped. The JMS Queue guarantees that messages arrive and are delivered sequentially to the Probe Profiler API process on a first-come, first-served basis.
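The persistence and sequencing properties listed above can be sketched as follows. This is an illustrative assumption, not the PEPR code: SQLite (via Python's standard `sqlite3` module) stands in for the Oracle-backed JMS store, and the class and function names are hypothetical. Messages are committed to the database before `send` returns, so they survive a consumer shutdown, and a single consumer drains them in insertion order.

```python
import os
import sqlite3
import tempfile


class PersistentQueue:
    """FIFO message queue backed by a database table (SQLite as a stand-in)."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.conn.commit()

    def send(self, body):
        """Store the message durably before returning to the sender."""
        self.conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        self.conn.commit()

    def receive(self):
        """Return and delete the oldest message, or None if the queue is empty."""
        row = self.conn.execute(
            "SELECT id, body FROM messages ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.conn.execute("DELETE FROM messages WHERE id = ?", (row[0],))
        self.conn.commit()
        return row[1]

    def close(self):
        self.conn.close()


def drain(q, handler):
    """Single-threaded consumer: handle one message at a time, in order."""
    results = []
    while True:
        body = q.receive()
        if body is None:
            return results
        results.append(handler(body))


def demo():
    path = os.path.join(tempfile.mkdtemp(), "queue.db")
    producer = PersistentQueue(path)
    for name in ["exp1", "exp2", "exp3"]:
        producer.send(name)
    producer.close()  # simulate the consumer side being down, then restarting
    consumer = PersistentQueue(path)
    handled = drain(consumer, str.upper)
    consumer.close()
    return handled
```

In this sketch a message is deleted before its handler runs; a production broker would normally remove a message only after the consumer acknowledges it, so that a crash during processing does not lose the request.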
Figure. PEPR architecture.