Advanced Query Processing

A key feature of a database management system (DBMS) is its ability to process declarative queries. Users pose queries to the database system which specify what the desired query result should be, not how to compute this result. For any user query, there are typically many different query execution plans consisting of different operators combined in different orders. The query optimizer of a database system must choose the most efficient query execution plan by enumerating candidate plans and estimating their cost using a cost model, which relies on statistics that capture the different properties of the data in the database.

Emerging applications that require database support (e.g., multimedia systems, Web databases) incorporate notions of proximity, ranking, and user feedback that have serious impact on the query model, which cannot be adequately handled by current query processing approaches embodied in traditional DBMSs. Our work focuses on enriching query processors/optimizers to handle Information Retrieval (IR) style queries that are common in these applications. These queries do not have exact answers, requiring the ranking of the results according to their suitability as an answer. Some examples of these types of queries are the following.

Consider a video database that contains several extracted video features, e.g., color histograms, texture, and edge orientation. These features are extracted for each video frame and are stored in separate tables. Each feature is indexed using a high-dimensional index for a faster query response. An example query is one that retrieves the top- k frames most similar to a given image based on both color and texture. Alternatively, consider a family interested in buying a house with a nearby school with the objective of minimizing the total cost. For simplicity, consider a cost function that sums the price of the house and 5-year school tuition. Searching against two web databases, HOUSES and SCHOOLS, the family may wish to find the top 10 houses and schools, ranked in decreasing combination of price of the house and the school tuition such that the distance between the house and the school is less than some value.

Since, in many emerging applications such as the above, "ranking" the query results according to some user criteria is an integral part of the query semantics, it is essential to efficiently support ranking by database systems. Furthermore, as more of these applications migrate to hand-held devices, sensor network and the Internet, query processing paradigms have to adopt to the continuous changes and fluctuations in the computing environments that are typical of these platforms. Consequently, it is necessary to provide adaptive processing mechanisms for ranking queries.

The goal of this line of research is to develop a formal and generalized ranked-retrieval framework that adapts to non-traditional processing environments and is applicable to a variety of data models. We are primarily interested in developing algorithms and frameworks to leverage the current capabilities of database systems to handle ranked retrieval efficiently and adaptively. Current research is along three lines: (1) enabling ranked retrieval in current relational query processors as a first-class functionality; (2) extending rank-aware query processing beyond relational data models to text and XML databases, which are becoming an integral part of current applications infrastructure; and (3) introducing new rank-aware query processing paradigms that offer a high degree of adaptability and scalability.

We have proposed several practical query operators with various physical implementations to be integrated in relational query processors ([1], [2] , [3]). We have studied developing novel algebraic frameworks and cost estimation models to fully integrate the proposed operators and algorithm in real-world database engines ([4] , [5]). We developed a prototype, RankSQL, based on PostgreSQL that integrate all the proposed algorithms and frameworks, RankSQL has bee demonstrated at the 2005 International Conference on Very Large Databases [6].

References

  1. W. G. Aref, A. Catlin, A. Elmagarmid, J. Fan, M. Hammad, I. F. Ilyas, M. Marzouk, S. Prabhakar, and X. Zhu. "VDBMS: A Testbed Facility for Research in Video Database Benchmarking,"ACM Multimedia Systems Journal , Special Issue on Multimedia Document Management Systems, 9(6): 575-585, 2004.
  2. I. F. Ilyas, W. G. Aref, and A. K. Elmagarmid. "Joining Ranked Inputs in Practice," In Proceedings of the 28th International Conference on Very Large Databases , September 2002, pages 950-961.
  3. I. F. Ilyas, W. G. Aref, and A. K. Elmagarmid. "Supporting Top-k Join Queries in Relational Databases," In Proceedings of the 29 th International Conference on Very Large Databases , 2003, pages 754-765. Also appears in The VLDB Journal Special Issue for selected papers of VLDB 2003, 13(3): 207-221, 2004.
  4. I. F. Ilyas, R. Shah, W. G. Aref, J. S. Vitter, and A. K. Elmagarmid. "Rank-aware Query Optimization," In Proceedings of the ACM SIGMOD Conference on Management of Data , June 2004, pages 203-214.
  5. C. Li, K. C-C Chang, I. F. Ilyas, and S. Song. "Query Algebra and Optimization for Relational Top-k Queries," In Proc. ACM SIGMOD International Conference on Management of Data, 2005, pages 131-142.
  6. C. Li, M. A. Soliman, K. C-C Chang, and I. F. Ilyas. "RankSQL: Supporting Ranking Queries in Relational Database Management Systems," In Proceedings of the 31st International Conference on Very Large Databases, 2005, pages 1342-1345.

Related Links

Ihab Ilyas's research



Campaign Waterloo

Database Research Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Contact | Feedback: db-webmaster@cs.uwaterloo.ca | Database Research Group


Valid HTML 4.01!Valid CSS! Last modified: Wednesday, 14-Jun-2006 03:20:23 EDT


Menu:ShowHide