The 21st International Conference on Data Engineering (ICDE 2005)
Advanced Technology Seminars
Advanced Technology Seminar 1
XQuery Midflight: Emerging Database-Oriented Paradigms and a Classification of Research Advances
Ioana Manolescu (INRIA)
Yannis Papakonstantinou (University of California, San Diego)
April 5th (Tue), 11:15-12:45/14:00-15:30, Hall
XQuery processing is one of the prime research topics of the database community, as is evident from the number of systems and publications.
Systems, architectures, principles, and algorithms are rapidly emerging for all its incarnations, be it in message/file transformations, XML caching, XML content management, or XML-based publishing and mediator systems.
At the same time, XQuery research is still in a "pre-paradigmatic" stage, where the conventional symptoms of the stage are observed: it is hard to piece together point efforts into a big picture, and similarities and interplay opportunities between parallel efforts are "lost in translation" across the different paradigms. This is a natural stage in the evolution of most science and technology topics, and our references to Kuhn's classic 1960s work on the development and evolution of science will ensure that the audience is relieved of any guilt we may accidentally create. Nevertheless, the time is ripe for the next stage:
The goal of this tutorial is to "federate" the plethora of existing work, categorizing existing work and future topics along a few reference paradigms that fuse existing work around a reference architecture.
The focus will be on database-oriented issues, drawing parallels with the principles and techniques of database systems, as follows:
We will provide a quick overview of relevant standards as abstractions, such as the XQuery/XPath data model and its labeled-tree counterpart, a classification of XQuery usages, and an outline of an XQuery processing reference architecture. Then, we will delve into the details of logical-level XQuery optimization, discussing tree-pattern style abstractions, sub-expression factorization, and view-based query answering. We will then turn to physical-level optimization, briefly discussing storage, indexing and relational shredding schemes for XML.
Ioana Manolescu
Ioana Manolescu is a researcher in the Gemo group at INRIA Futurs, France.
Ioana obtained her PhD in 2001 from the University of Versailles and INRIA, France, working on query optimization for distributed databases and XML.
Her thesis work on distributed query optimization was incorporated into the French start-up Medience. She has worked as a post-doc at Politecnico di Milano, Italy, extending the WebML Web modelling language to cope with Web services and workflow specification. Her current research topics include XML data storage, query algebras and query processing, XML compression, XML data cleaning, and distributed data and process management based on Web services. Ioana has (co-)authored several tutorials and advanced courses for the EDBT database summer school. More information on Ioana's projects can be found at
http://www-rocq.inria.fr/~manolesc.
Yannis Papakonstantinou
Yannis Papakonstantinou is an Associate Professor of Computer Science and Engineering at the University of California, San Diego. His research is at the intersection of database and Internet technologies. Yannis has published over fifty research articles in scientific conferences and journals, given tutorials at major conferences, and served on journal editorial boards and program committees for numerous international conferences and symposiums. He was the co-Chair of WebDB 2002, the co-Chair of XIME-P 2004, the General Chair of ACM SIGMOD 2003 and the Vice PC Chair for the "XML, Metadata and Semistructured Data" track of IEEE ICDE 2004.
In 1998, Yannis received the NSF CAREER award for his work on integrating heterogeneous data. In 2000 Yannis founded Enosys Software, which built the first generally available distributed XQuery processor, along with software for XML-based integration of distributed sources, and was sold in 2003 to BEA Systems. Yannis holds a Diploma of Electrical Engineering from the National Technical University of Athens and MS and Ph.D. in Computer Science from Stanford University (1997). His complete bio is available at
http://www.db.ucsd.edu/people/yannis.htm
Advanced Technology Seminar 2
Rank-Aware Query Processing and Optimization
Ihab F. Ilyas (University of Waterloo)
Walid G. Aref (Purdue University)
April 5th (Tue), 16:00-17:30, Room A
Efficient execution of ranking queries is increasingly becoming a major challenge for database technology. Nowadays, many applications have requirements that can only be matched by a combination of information retrieval systems and DBMSs. DBMSs provide efficient update, indexing, concurrency and recovery. On the other hand, IR on text and multimedia requires techniques involving uncertainty and ranking for effective retrieval. A true integration is likely to require significant changes in the standard database techniques for indexing and query optimization and may require new query languages.
The main goal of this seminar is to give an in-depth look at support for ranking queries, an increasingly interesting area of research. We cover the state-of-the-art techniques in research prototypes and industrial-strength database engines for efficient handling of ranking and top-k queries. We give a broad background on ranking, voting and rank-aggregation algorithms. Then we give detailed coverage of ranking query models, covering top-k selection and top-k join queries, and the various approaches recently proposed by researchers to support these queries in database systems. We focus primarily on how to integrate ranking as a new query processing and optimization dimension, with the aim of supporting ranking queries as a basic and core functionality. The seminar identifies several challenges that need to be addressed to achieve true support for ranking and effective retrieval in database management systems.
This seminar is targeted at general database researchers. The seminar is also of interest to audiences with an industrial background, as it describes and summarizes different attempts to integrate new functionalities in industrial database management systems. The seminar gives several motivating examples and challenging applications that are in real need of efficient handling of ranking queries. More importantly, the seminar highlights some interesting challenges in rank-aware query processing and optimization.
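As a rough illustration of what a top-k selection query computes, the following Python sketch returns the k best records under a scoring function. It is not taken from the seminar material: the record schema, the `example_score` function, and the sample data are all hypothetical, and a real rank-aware engine would push ranking into the query plan rather than scan every record.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# Hypothetical schema for illustration only: (name, price, rating).
Record = Tuple[str, float, float]

def top_k(records: Iterable[Record], k: int,
          score: Callable[[Record], float]) -> List[Record]:
    """Return the k highest-scoring records, best first."""
    # heapq.nlargest keeps only k candidates in memory while scanning.
    return heapq.nlargest(k, records, key=score)

def example_score(r: Record) -> float:
    # Hypothetical ranking function: prefer high rating and low price.
    return r[2] - 0.01 * r[1]

hotels = [("A", 120.0, 4.5), ("B", 80.0, 3.9),
          ("C", 200.0, 4.8), ("D", 60.0, 3.0)]
best_two = top_k(hotels, 2, example_score)
```

Here `best_two` holds the two records with the highest score. The point of the sketch is only that the query asks for a bounded, ordered answer instead of the full result set.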
Ihab F. Ilyas
Ihab F. Ilyas is an assistant professor at the School of Computer Science, University of Waterloo. He obtained his Ph.D. in 2004 from Purdue University, and an M.Sc. and a B.Sc. from the University of Alexandria, Egypt. His main research interests include advanced query processing and optimization, self-managing and adaptive computing, and non-traditional database systems (e.g., multimedia and spatial databases). In his recent Ph.D. thesis, he introduced novel query processing and optimization techniques for top-k queries in relational database systems. For more information, visit http://db.uwaterloo.ca/~ilyas
Walid G. Aref
Walid G. Aref is an associate professor of computer science at Purdue University. His research interests are in developing database technologies for emerging applications, e.g., spatial, multimedia, genomics, and sensor databases. He is also interested in indexing, data mining, and geographic information systems (GIS). His research has been supported by the NSF, Purdue Research Foundation, CERIAS, Panasonic, and Microsoft Corp. In 2001, he received the CAREER Award from the National Science Foundation. He is on the editorial board of the VLDB Journal and is a member of the ACM and the IEEE. For more information, visit http://www.cs.purdue.edu/~aref
Advanced Technology Seminar 3
Data Stream Query Processing
Nick Koudas (AT&T Labs-Research)
Divesh Srivastava (AT&T Labs-Research)
April 6th (Wed), 11:00-12:30/14:00-15:30, Hall
Measuring and monitoring complex, dynamic phenomena --
traffic evolution in internet and telephone communication
infrastructures, usage of the web, email and newsgroups,
movement of financial markets, atmospheric conditions --
produces highly detailed stream data, i.e., data that
arrives as a series of "observations", often very rapidly.
With traditional data feeds, one modifies and augments
underlying databases and data warehouses: complex queries
over the data are performed in an offline fashion, and real
time queries are typically restricted to simple filters.
However, the monitoring applications that operate on modern
data streams require sophisticated real time queries (often
in an exploratory mode) to identify, e.g., unusual/anomalous
activity (such as network intrusion detection or telecom
fraud detection), based on intricate relationships between
the values of the underlying data streams.
Stream data are also generated naturally by (message-based)
web services, in which loosely coupled systems interact by
exchanging high volumes of business data (e.g., purchase
orders, retail transactions) tagged in XML (the lingua
franca of web services), forming continuous XML data
streams.
The objective of this tutorial is to provide a comprehensive
and cohesive overview of the key research results in the
area of data stream query processing, both for SQL-like and
XML query languages.
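To make the contrast between a simple filter and a stateful real-time query concrete, here is a minimal Python sketch of a sliding-window check over a numeric stream. It is an assumption-laden illustration and not a technique from the tutorial: the function name `flag_anomalies`, the `window` and `factor` parameters, and the sample readings are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

def flag_anomalies(stream: Iterable[float], window: int,
                   factor: float) -> Iterator[Tuple[int, float]]:
    """Yield (position, value) when a value exceeds `factor` times
    the mean of the previous `window` observations."""
    recent = deque(maxlen=window)  # bounded state: last `window` values
    total = 0.0                    # running sum of the values in `recent`
    for i, x in enumerate(stream):
        full = len(recent) == window
        if full and x > factor * (total / window):
            yield i, x
        if full:
            total -= recent[0]     # value about to be evicted by append
        recent.append(x)
        total += x

# Hypothetical readings; one pass, constant memory per stream.
readings = [10, 11, 9, 10, 50, 10, 12]
alerts = list(flag_anomalies(readings, window=3, factor=3.0))
```

Unlike a stateless filter, the check depends on recent history, yet it still processes each observation once and keeps only a fixed-size window in memory.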
Nick Koudas
Nick Koudas is a Principal Technical Staff Member at AT&T
Labs-Research. He holds a Ph.D. from the University of
Toronto, an M.Sc. from the University of Maryland at College
Park, and a B.Tech. from the University of Patras in Greece.
He serves as an associate editor for the Information Systems
journal and the IEEE TKDE journal. He is the recipient of
the 1998 ICDE Best Paper award. His research interests
include core database management, metadata management and
its applications to networking.
Divesh Srivastava
Divesh Srivastava is the head of the Database Research
Department at AT&T Labs-Research. He received his Ph.D. from
the University of Wisconsin, Madison, and his B.Tech. from
the Indian Institute of Technology, Bombay, India. He was a
vice-chair of ICDE 2002, and is on the editorial board of
the ACM SIGMOD Digital Review. His current research
interests include XML databases, IP network data management,
and data quality.
Advanced Technology Seminar 4
Online Mining Data Streams: Problems, Applications, Techniques and Progress
Haixun Wang (IBM T.J. Watson Research Center)
Jian Pei (Simon Fraser University)
Philip S. Yu (IBM T.J. Watson Research Center)
April 7th (Thu), 11:00-12:30, Hall
In many emerging data intensive applications, including applications
in sensor networks, stock market analysis, network communication
management and intrusion detection, a tremendous volume of data
arrives in the form of continuous streams. Online mining of data
streams for knowledge discovery has become a novel and rapidly growing
research direction in the last couple of years. Recently, a few
exciting results have been published in this area, while at the same
time, even more challenging problems have been identified. The
seminar will present a brief tutorial on the inherent challenges in
mining data streams, a survey on the latest results in this line of
research, and an introduction to some real-life applications.
Haixun Wang
Haixun Wang received the Ph.D. degree in computer science from the
University of California at Los Angeles in 2000. He also holds the
B.S. and the M.S. degrees, both in computer science, from Shanghai Jiao
Tong University. He is currently a research staff member at IBM Thomas
J. Watson Research Center.
His research interests include data mining, machine learning, database
languages and systems, database indexing techniques, XML, and
bioinformatics. He has published more than 50 research papers in
refereed international journals, conferences, and workshops. He has
served on the program committees of international conferences and
workshops, including SIGKDD'04, ICDM'04, ICDE'04, and SIAM Data
Mining'04. He has been a reviewer for some leading academic journals,
including ACM Transactions on Database Systems, IEEE Transactions on
Knowledge and Data Engineering, Data Mining and Knowledge Discovery,
and Knowledge and Information Systems. He is a member of the ACM, the
ACM SIGMOD, the ACM SIGKDD and the IEEE Computer Society.
Jian Pei
Jian Pei received the B. Eng. and the M. Eng. degrees, both in Computer
Science, from Shanghai Jiao Tong University, China, in 1991 and 1993,
respectively, and the Ph.D. degree in Computing Science from Simon Fraser
University, Canada, in 2002. He was a Ph.D. candidate at Peking University
from 1997 to 1999.
He is currently an Assistant Professor of Computing Science at Simon
Fraser University, Canada. From 2002 to 2004, he was an Assistant Professor
of Computer Science and Engineering at the State University of New York at
Buffalo, USA.
His research interests can be summarized as developing advanced data
analysis techniques for emerging applications. Particularly, he is
currently interested in various techniques of data mining, data
warehousing, online analytical processing, and database systems, as well
as their applications in bioinformatics. His current research is supported
in part by the National Science Foundation (NSF).
He has published over 50 research papers in refereed journals,
conferences, and workshops, has served on the program committees of over
40 international conferences and workshops, and has been a reviewer for
some leading academic journals. He is a member of the ACM, the ACM SIGMOD,
the ACM SIGKDD, the IEEE Computer Society and Sigma Xi.
Philip S. Yu
Philip S. Yu is the manager of the Software Tools and Techniques group
at the IBM Thomas J. Watson Research Center. The current focuses of
the project include the development of advanced algorithms and
optimization techniques for data mining, anomaly detection and
personalization, and the enabling of Web technologies to facilitate
E-commerce and pervasive computing.
Dr. Yu's research interests include data mining, Internet applications
and technologies, database systems, multimedia systems, parallel and
distributed processing, disk arrays, computer architecture,
performance modeling and workload analysis. Dr. Yu has published more
than 340 papers in refereed journals and conferences. He holds or has
applied for more than 200 US patents. Dr. Yu is an IBM Master
Inventor.
Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He is the
Editor-in-Chief of IEEE Transactions on Knowledge and Data
Engineering. He is an associate editor of ACM Transactions on
Internet Technology and of the Knowledge and Information Systems
journal. He is a member of the IEEE Data Engineering steering
committee. He also serves on the steering committee of IEEE Intl.
Conference on Data Mining. He received an IEEE Region 1 Award for
"promoting and perpetuating numerous new electrical engineering
concepts", and the IEEE 2003 ICDM Innovation Award.
Philip S. Yu received the B.S. degree in E.E. from National Taiwan
University, Taipei, Taiwan, the M.S. and Ph.D. degrees in E.E. from
Stanford University, and the M.B.A. degree from New York University.
Advanced Technology Seminar 5
Web Service Coordination and Emerging Standards
Fabio Casati (HP Labs)
Gustavo Alonso (ETH Zürich)
April 7th (Thu), 14:30-16:00/16:30-18:00, Hall
Web services, and more generally service-oriented architectures (SOAs), are emerging as the technologies and architectures of choice for implementing distributed systems and performing application integration within and across companies' boundaries. In this tutorial we describe Web services from an evolutionary perspective, with an emphasis on their utilization for enterprise application integration and service-oriented architectures. The tutorial covers basic middleware problems and shows how the solutions to these problems have finally evolved into what we today call Web services.
The first part of the tutorial is intended to put Web services in the right perspective, particularly in terms of what can be and what cannot be done today. This is an important aspect of the tutorial as the almost daily appearance of new specifications and self-proclaimed standards has led to a very confusing set of ideas around Web services.
The tutorial then focuses on two of the most important and innovative aspects of Web services: business protocols and service composition. The tutorial discusses the need for business protocols and the opportunities they bring, and presents different approaches to protocol modeling in terms of languages, formalisms, and expressive power. It also stresses the need for a protocol algebra and shows how protocol modeling and protocol algebras can together form the basis for supporting and automating many aspects of service development and execution. With respect to service composition, the tutorial motivates the need and opportunity for service composition models and technologies, compares different approaches, and analyzes standardization proposals.
Fabio Casati
Fabio Casati is a senior researcher at HP Labs, Palo Alto. He received his PhD from Politecnico di Milano (Italy) in 1999. His research interests include business processes, Web services, business-aware application management, and "middleware intelligence" (embedding data mining technologies into the middleware). He has led the development of several applications and is the author of more than 60 papers in international conferences and journals. He initiated the research in Business Process Intelligence and organized the first conferences and journal issues on e-services. He is also co-author of a book on Web services.
Fabio has also served as chair and PC member for dozens of conferences in the areas of databases, information systems, and Web services.
Gustavo Alonso
Gustavo Alonso is a professor of Computer Science at the Swiss Federal Institute of Technology in Zurich (ETHZ). Gustavo Alonso holds degrees in Telecommunications Engineering from the Madrid Technical University (1989) and in Computer Science (M.S. 1992, Ph.D. 1994) from the University of California at Santa Barbara. After graduating, he was a visiting scientist at the IBM Almaden Research Laboratory in San Jose, California. Currently, Gustavo Alonso leads the Information and Communication Systems Research Group at ETH Zurich. His research interests include Web Services, grid and cluster computing, databases, workflow management, scientific applications of database and workflow technology, pervasive computing and dynamic aspect-oriented programming.
Gustavo Alonso is co-author of a recently published book on Web Services (Springer Verlag, Berlin 2004, ISBN 3-540-44008-9) and has participated in numerous conferences, panels and projects related to the topic. He also regularly works as an independent consultant in enterprise application integration, Web Services, and middleware projects.
Advanced Technology Seminar 6
Database Architectures for New Hardware
Anastassia Ailamaki (Carnegie Mellon University)
April 8th (Fri), 9:00-10:30/11:00-12:30, Hall
Thirty years ago, DBMSs stored data on disks and cached recently used data in main memory buffer pools, while designers worried about improving I/O performance and maximizing main memory utilization. Today, however, databases live in multi-level memory hierarchies that include disks, main memories, and several levels of processor caches. Recent research shows that database performance is directly influenced by all levels of the underlying computer hardware and devices.
Four (often correlated) factors have shifted the performance bottleneck of data-intensive commercial workloads from I/O to the processor and memory subsystem. First, storage systems are becoming faster and more intelligent (disks now come complete with their own processors and caches). Second, modern database storage managers aggressively improve locality through clustering, hide I/O latencies using prefetching, and parallelize disk accesses using data striping. Third, main memories have become much larger and often hold the application's working set. Finally, the increasing memory/processor speed gap has made processor caches more important to database performance.
This tutorial aims at (a) explaining why database performance depends on modern processor and memory microarchitectures, (b) surveying and contrasting research on the topic over the past decade, and (c) discussing future research challenges. We will first survey the computer architecture and database literature on understanding and evaluating database application performance on modern hardware. We will motivate the problem of database performance on modern hardware by discussing how database and computer microarchitecture technologies have evolved over the past three decades, and present approaches and methodologies for characterizing database workloads on modern processors. Then, we will present techniques proposed in the literature to alleviate the problem and their evaluation. Finally, we will discuss open problems and future directions.
Anastassia Ailamaki
Anastassia Ailamaki received a B.Sc. degree in Computer Engineering from the Polytechnic School of the University of Patras, Greece, M.Sc. degrees from the Technical University of Crete, Greece and from the University of Rochester, NY, and a Ph.D. degree in Computer Science from the University of Wisconsin-Madison. In 2001, she joined the Computer Science Department at Carnegie Mellon University as an Assistant Professor. Her research interests are in the broad area of database systems and applications, with emphasis on database system behavior on modern processor hardware and disks. Her projects at Carnegie Mellon (including Staged Database Systems, Cache-Resident Data Bases, the Fates Storage Manager, and PUMA2) aim at building systems to strengthen the interaction between the database software and the underlying hardware and I/O devices. Her other research interests include automated database design for scientific databases, storage device modeling, and internet querying. She has received three best-paper awards (VLDB 2001, Performance 2002, and ICDE 2004), an NSF CAREER award (2002), and IBM Faculty Partnership awards in 2001, 2002, and 2003. She is a member of IEEE and ACM.
Advanced Technology Seminar 7
Data Mining Techniques for Microarray Datasets
Lei Liu (University of Illinois at Urbana-Champaign)
Jiong Yang (Case Western Reserve University)
Anthony K. H. Tung (National University of Singapore)
April 8th (Fri), 9:00-10:30/11:00-12:30, Room A
Developments in microarray technology have resulted in revolutionary changes in biological research. Using microarrays, the expression levels of thousands of genes can be monitored simultaneously, providing biologists with new ways to gain insight into the complex interactions in living organisms. To do so, however, biologists must first overcome the challenges involved in analyzing the large and complex datasets that are generated from microarray experiments. Data mining research, which focuses on scalable and effective knowledge discovery from databases, can provide timely solutions for biologists in these respects. In this proposed seminar, we aim to provide a platform in which various aspects of microarray data analysis will be introduced. In the first part of the seminar, we will discuss in layman's terms how microarray datasets are generated and used in biological research. We will use examples from the real projects that we participate in to illustrate the potential of different technologies. In the second part of the tutorial, we will discuss existing data mining tools and methods used for analyzing microarray datasets and their biological implications. Finally, we will present a set of open problems and future research directions for microarray data analysis.
Lei Liu
As the founding director of the bioinformatics unit, Dr. Lei Liu joined the W. M. Keck Center for Comparative and Functional Genomics in 1999. Prior to coming to the University of Illinois, he worked as a postdoctoral fellow for two years at the Department of Computer Science and Engineering at the University of Connecticut, where he also received a Ph.D. in cell biology. His expertise is in the areas of comparative genomics, biological databases, and data mining. He has been working in microarray analysis and data mining for more than three years and has co-authored several papers in that area. He has organized and participated in many workshops on microarray analysis at the University of Illinois. He recently participated in the international workshop on "Statistical Methods in Microarray Analysis" in Singapore and presented a talk on multiple platform comparison. He collaborates with computer scientists and statisticians on developing new algorithms for microarray data mining.
Jiong Yang
Dr. Jiong Yang received his Ph.D. degree from the computer science department of UCLA in 1999. After graduation, he joined the IBM T. J. Watson Research Center as a research staff member. Later, Dr. Yang worked as a visiting assistant professor at the UIUC computer science department. Since July 2004, he has been the Schroeder Assistant Professor of the EECS department at Case Western Reserve University. Dr. Yang has published more than fifty research articles in international conferences and journals. His recent research interests include mining biological and mobile data. Dr. Yang has served on the program committees of various international conferences and workshops. He was also the guest editor of the special issue of IEEE TKDE on mining biological data.
Anthony K. H. Tung
Dr. Anthony K. H. Tung is currently an Assistant Professor in the Department of Computer Science, National University of Singapore (NUS). He received both his B.Sc. and M.Sc. in computer science from the National University of Singapore in 1997 and 1998 respectively. In 2001, he received the Ph.D. in computer science from Simon Fraser University (SFU). His research interests involve various aspects of databases and data mining (KDD) including buffer management, frequent pattern discovery, spatial clustering, outlier detection and classification analysis. His recent interests also include data mining for microarray data and 3D protein structures, spatial indexing and sequence search.