Center for Information Systems Integration and Evolution

  • This page is provided for historical purposes, listing the projects and publications of the former Center for Information Systems Integration and Evolution.

Knowledge Rovers

Participating Faculty: Kerschberg, Gomaa, Jajodia, Motro
Visiting Scholars: Len Seligman, Jong Pil Yoon


Knowledge Rovers are a family of cooperating intelligent agents that may be configured to support enterprise tasks, scenarios, and decision-makers. These rovers play specific roles within an enterprise information architecture: supporting users, maintaining active views, mediating between users and heterogeneous data sources, refining data into knowledge, and roaming the Global Information Infrastructure to seek, locate, negotiate for, and retrieve data and knowledge specific to their mission. The concept of Knowledge Rovers thus serves as a metaphor for the family of agents that supports an enterprise's information architecture. The goal is to configure rovers automatically with appropriate knowledge bases (ontologies), task-specific information, and negotiation and communication protocols for specific scenarios.

The family of rovers supports the data and information infrastructure by providing specialized information mediation services such as: (1) approximate consistency services, which monitor deviations between cached objects and the underlying databases and take actions based on user-specified consistency conditions; (2) object replication services, which ensure object availability, reliability, performance, and survivability; and (3) information repository services, consisting of ontology services, object location services, and domain servers that integrate heterogeneous types of data obtained from diverse sources, including the Internet.
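
As a simple illustration of the first of these services, the following Python sketch shows how an approximate consistency service might propagate updates from an underlying database to a cached copy only when a user-specified tolerance is exceeded. The class name, fields, and numeric tolerance are hypothetical assumptions made for the example, not part of the CISE implementation.

    # Hypothetical sketch of an approximate consistency service: updates from the
    # underlying database are propagated to a cached (materialized) copy only when
    # the deviation violates a user-specified condition.

    class ApproximateConsistencyService:
        def __init__(self, initial_value, tolerance):
            self.cached_value = initial_value   # value currently served to users
            self.source_value = initial_value   # latest value in the underlying database
            self.tolerance = tolerance          # user-specified consistency condition

        def on_source_update(self, new_value):
            """Called when the underlying database changes; refresh the cache lazily."""
            self.source_value = new_value
            if abs(self.source_value - self.cached_value) > self.tolerance:
                self.cached_value = self.source_value   # deviation too large: refresh

        def read(self):
            return self.cached_value                    # "good enough" cached answer

    service = ApproximateConsistencyService(initial_value=100.0, tolerance=5.0)
    service.on_source_update(103.0)   # within tolerance: cache is left slightly stale
    service.on_source_update(110.0)   # deviation of 10 exceeds 5: cache is refreshed
    print(service.read())             # 110.0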

[1] S. Jajodia and L. Kerschberg, eds., Advanced Transaction Models and Architectures, Kluwer Academic Publishers, 1997, 381 pages.

[2] L. Kerschberg, "Knowledge Rovers: Cooperative intelligent agent support for enterprise information architectures,'' Springer-Verlag LNCS, Vol. 1202, 1997, pp. 79-100.

[3] L. Kerschberg, "The role of intelligent software agents in advanced information systems,'' Springer-Verlag LNCS, Vol. 1271, 1997, pp. 1-22.

[4] O. Wolfson, S. Jajodia, and Y. Huang, "An adaptive data replication algorithm,'' ACM TODS, 22(2)1997, pp. 255-314.

[5] L. Seligman and L. Kerschberg, "Federated Knowledge and Database Systems: A New Architecture for Integrating AI and Database Systems," in Advances in Databases and Artificial Intelligence, Vol. 1: The Landscape of Intelligence in Database and Information Systems, L. Delcambre and F. Petry, Eds. JAI Press, 1995.

[6] L. Seligman and L. Kerschberg, "A Mediator for Approximate Consistency: Supporting "Good Enough" Materialized Views," Journal of Intelligent Information Systems, vol. 8, pp. 203 - 225, 1997.


Data Mining and Knowledge Discovery in Databases

Participating Faculty: Kerschberg
Research Associates: Jong Pil Yoon, Sookmyung Women's University, Seoul, Korea
Students: Linda Tischer, Trish Carbone, Sook-wong Lee, Jim Ribeiro

Data Mining (DM) and its associated field of Knowledge Discovery in Databases (KDD) have grown in importance as organizations realize that the large storehouses of data collected over many years hold valuable information that, if properly mined, distilled, curated, and managed, can be used for strategic advantage. There are many examples of successful data mining projects in the financial, governmental, scientific, and business areas.

Research and development activities within CISE focus on the use of Data Mining and KDD in large databases. We have developed, in conjunction with Dr. Michalski's Machine Learning and Inference Laboratory, a multi-strategy learning system called INLEN. In addition, we are exploring ways to mine data from multiple information sources, on the premise that the association and correlation of information from multiple sources will provide interesting and useful results. We are also exploring the use of Cooperative Information Agents to obtain information from multiple sources, including the World Wide Web, in an effort to mine information from public information sources.

One very important research issue is the application of DM&KDD to very large databases. Here one needs to perform the data mining and KDD tasks within the database proper, and not by simply extracting a subset of the database into main memory. We have developed a series of algorithms to perform DM&KDD within the database.
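
As a hedged illustration of what mining "within the database proper" can mean, the sketch below pushes the support counting needed for association-rule discovery into a SQL GROUP BY query (SQLite driven from Python) instead of extracting the raw records into main memory. The purchases(tid, item) schema and the support threshold are assumptions made for the example; this is not one of the CISE algorithms.

    # Hypothetical illustration of in-database mining: support counts for item
    # pairs are computed by the DBMS itself rather than in application memory.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)",
                     [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"),
                      (3, "bread"), (3, "milk"), (3, "butter"), (4, "milk")])

    min_support = 2   # minimum number of transactions containing the item pair

    # Support counts for all item pairs, computed entirely inside the database.
    frequent_pairs = conn.execute("""
        SELECT a.item, b.item, COUNT(*) AS support
        FROM purchases a JOIN purchases b
          ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
    """, (min_support,)).fetchall()

    print(frequent_pairs)   # e.g. [('bread', 'butter', 2), ('bread', 'milk', 2)]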

We have applied our data mining techniques to the area of Precision Agriculture, funded by a grant from INEEL, the Idaho National Engineering and Environmental Laboratory. Here we investigated the use of DM&KDD tools and techniques to determine the factors influencing wheat yields, including soil composition and nutrients, pesticide and fertilizer applications, and irrigation. The visualization of the relevant factors discovered by the DM&KDD tools was supported by a Geographic Information System, which provided a two-dimensional mapping of the knowledge onto the topography of the farmland and was an important instrument in conveying complex relationships to experts and farmers in a natural and intuitive format.

Our most recent research involves a methodology and tools for Query-Driven DM&KDD within the database management system. In this approach, the user specifies an SQL query against a relational database. The query result, taken over the relational tables accessed, forms the set of positive examples, while its complement is used to construct the set of negative examples. Association rules are then mined and analyzed for their relevance and interestingness with respect to the original query, and the results are presented to the user for feedback and tuning.
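
The following hypothetical sketch illustrates the query-driven idea in miniature: the result of the user's SQL query supplies the positive examples, its complement over the same table supplies the negative examples, and simple one-attribute associations are scored against both sets. The fields(...) schema and the scoring are illustrative simplifications, not the published spanning and association operations.

    # Hypothetical sketch of query-driven rule discovery: positives come from the
    # user's query, negatives from its complement over the same relation.

    import sqlite3
    from collections import Counter

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fields (id INTEGER, soil TEXT, irrigated TEXT, yield_class TEXT)")
    conn.executemany("INSERT INTO fields VALUES (?, ?, ?, ?)", [
        (1, "loam", "yes", "high"), (2, "loam", "yes", "high"),
        (3, "clay", "no",  "low"),  (4, "clay", "yes", "low"),
        (5, "loam", "no",  "high"),
    ])

    user_query = "SELECT * FROM fields WHERE yield_class = 'high'"
    positives = conn.execute(user_query).fetchall()
    negatives = conn.execute("SELECT * FROM fields EXCEPT " + user_query).fetchall()

    def attribute_value_counts(rows):
        # Count (attribute, value) pairs over the descriptive attributes.
        return Counter((attr, row[i]) for row in rows
                       for i, attr in ((1, "soil"), (2, "irrigated")))

    pos_counts, neg_counts = attribute_value_counts(positives), attribute_value_counts(negatives)
    for (attr, value), count in pos_counts.items():
        confidence = count / (count + neg_counts[(attr, value)])
        print(f"{attr} = {value}  ->  high yield   (confidence {confidence:.2f})")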

Publications in Data Mining and Knowledge Discovery in Databases

[1] R. S. Michalski, L. Kerschberg, K. Kaufman, and J. Ribeiro, "Mining for Knowledge in Databases: The INLEN Architecture, Initial Implementation and First Results," Journal of Intelligent Information Systems, vol. 1, pp. 85-113, 1992.

[2] J. P. Yoon and L. Kerschberg, "A Framework for Knowledge Discovery and Evolution in Databases," IEEE Transactions on Knowledge and Data Engineering, 1993.

[3] J. Ribeiro, K. Kaufman, and L. Kerschberg, "Knowledge Discovery in Multiple Databases," presented at the First International Conference on Knowledge Discovery and Data Mining, Montreal, Canada, 1995.

[4] L. Kerschberg, "Knowledge Rovers: Cooperative Intelligent Agent Support for Enterprise Information Architectures," in Cooperative Information Agents, vol. 1202, Lecture Notes in Artificial Intelligence, P. Kandzia and M. Klusch, Eds. Berlin: Springer-Verlag, 1997, pp. 79-100.

[5] L. Kerschberg, "The Role of Intelligent Agents in Advanced Information Systems," in Advanced in Databases, vol. 1271, Lecture Notes in Computer Science, C. Small, P. Douglas, R. Johnson, P. King, and N. Martin, Eds. London: Springer-Verlag, 1997, pp. 1-22.

[6] L. Kerschberg, S. W. Lee, and L. Tischer, "A Methodology and Life Cycle Model for Data Mining and Knowledge Discovery for Precision Agriculture," Center for Information Systems Integration and Evolution, George Mason University, Fairfax, Final Report 1997.

[7] J. P. Yoon and L. Kerschberg, "A Query-Driven Rule Discovery Method Using Association and Spanning Operations," Center for Information Systems Integration and Evolution, George Mason University, Fairfax, Technical Report (Submitted for Publication), 1998. 

Domain Engineering

Dr. Gomaa and Dr. Kerschberg have been conducting research in the area of domain modeling for software reuse. The project centers on the concept of an Evolutionary Domain Life Cycle (EDLC) Model, which is a highly iterative software life cycle model that eliminates the distinction between development and maintenance and addresses the development of a family of systems. The EDLC involves domain analysis, domain specification, and domain design for a family of systems.

We have implemented a proof-of-concept demonstration of the Evolutionary Domain Life Cycle Model, called the Knowledge-Based Software Engineering Environment (KBSEE), using both commercial-off-the-shelf software (COTS) as well as custom-developed software. The configuration uses: 1) Software through Pictures as the multiple viewpoint graphical editor, 2) an Eiffel-based Object Repository which provides a persistent object store and checks for consistency among object specifications, 3) a Feature/Object Editor that allows the Domain Analyst to represent the dependencies among features and the objects supporting those features, and 4) a knowledge-based requirements elicitation tool, implemented in NASA's CLIPS shell, to assist the target system requirements engineer in selecting those features and objects that will constitute the target system.

The Object Repository manages the evolution of the object specifications, maintains consistency according to the information model, and provides data/knowledge translation services to other knowledge-based tools. One such tool is the knowledge-based requirements elicitation tool (KBRET), which elicits target system requirements by having users select desired features and object types from the domain model. KBRET accesses the object repository to obtain the knowledge required for its reasoning. In addition, KBRET has several domain-independent knowledge sources for browsing, for selecting target system features, and for generating a target system. In particular, KBRET has special reasoning methods for non-monotonic reasoning; users may select a feature and its dependent object types, and then retract that feature, causing many dependent objects to be retracted from the target system specification. The Target System Specification is stored in the object repository, and another translator provides the appropriate knowledge representation for a program to tailor the multiple viewpoints of the domain model to reflect the object types in the target system specification.
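
A minimal sketch of the non-monotonic selection behavior described above follows: selecting a feature adds its dependent object types to the target system specification, and retracting the feature removes those objects unless another selected feature still requires them. The feature and object names are invented for the example; this is plain Python, not KBRET's CLIPS knowledge sources.

    # Hypothetical sketch of non-monotonic feature selection over a toy domain model.

    DEPENDENCIES = {
        "telemetry_display": {"TelemetryStream", "DisplayPanel"},
        "command_uplink":    {"TelemetryStream", "CommandQueue"},
    }

    selected_features = set()

    def select(feature):
        selected_features.add(feature)

    def retract(feature):
        selected_features.discard(feature)

    def target_system_objects():
        """Object types in the target system specification, derived from the selected features."""
        return set().union(*(DEPENDENCIES[f] for f in selected_features)) if selected_features else set()

    select("telemetry_display")
    select("command_uplink")
    print(target_system_objects())   # {'TelemetryStream', 'DisplayPanel', 'CommandQueue'}

    retract("telemetry_display")     # DisplayPanel is retracted; TelemetryStream remains,
    print(target_system_objects())   # because command_uplink still depends on it.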

We have applied this framework (the EDLC and its associated environment) to modeling Payload Operations Control Centers for NASA; to evolutionary life cycle models for the specification of software process models based on risk mitigation, for the Software Productivity Consortium; and to the Earth Observing System Data and Information System (EOSDIS) for NASA and Hughes Applied Information Systems.

[1] C. Bosch, H. Gomaa, and L. Kerschberg, "Design and Construction of a Software Engineering Environment: Experiences with Eiffel," In IEEE Readings in Object-Oriented Systems and Applications, D. Rine, Ed. Piscataway, NJ: IEEE Computer Society Press, 1995.

[2] H. Gomaa, "A reuse-oriented approach for structuring and configuring distributed applications," Software Engineering Journal, pp. 61-71, 1993.

[3] H. Gomaa, Software Design Methods for Concurrent and Real-Time Systems: Addison-Wesley Publishing Company, 1993.

[4] H. Gomaa, "Software Design Methods for the Design of Large-Scale Real-Time Systems," Journal of Systems and Software, 1994.

[5] H. Gomaa, "Reusable Software Requirements and Architectures for Families of Systems," Journal of Systems and Software, 1995.

[6] H. Gomaa, R. Fairley, and L. Kerschberg, "Towards an Evolutionary Domain Life Cycle Model," Proc. Workshop on Domain Modeling for Software Engineering, OOPSLA, New Orleans, 1989.

[7] H. Gomaa and G. K. Farrukh, "An Approach for Configuring Distributed Applications from Reusable Architectures," IEEE International Conference on Engineering of Complex Computer Systems, Montreal, Canada, 1996.

[8] H. Gomaa, L. Kerschberg, and V. Sugumaran, "A Knowledge-Based Approach for Generating Target System Specifications from a Domain Model," IFIP World Computer Congress, Madrid, Spain, 1992.

[9] H. Gomaa, L. Kerschberg, and V. Sugumaran, "A Knowledge-Based Approach to Domain Modeling: Application to NASA’s Payload Operations Control Centers," Journal of Telematics and Informatics, vol. 9, 1992.

[10] H. Gomaa, L. Kerschberg, V. Sugumaran, C. Bosch, and I. Tavakoli, "A Prototype Domain Modeling Environment for Reusable Software Architectures," International Conference on Software Reuse, Rio de Janeiro, Brazil, 1994.

[11] H. Gomaa, L. Kerschberg, V. Sugumaran, I. Tavakoli, and L. O'Hara, "A Knowledge-Based Software Environment for Reusable Software Requirements and Architectures," Journal of Automated Software Engineering, vol. 3, 1996.

[12] H. Gomaa, D. Menascé, and L. Kerschberg, "A Software Architectural Design Method for Large-Scale Distributed Information Systems," Journal of Distributed Systems Engineering, 1996.

[13] H. Gomaa, "Use Cases for Distributed Real-Time Software Architectures," presented at IEEE International Workshop on Parallel and Distributed Real-Time Systems, Geneva, 1997.

[14] H. Gomaa and G. K. Farrukh, "An Approach for Generating Executable Distributed Applications from Reusable Software Architectures," presented at IEEE International Conference on Engineering of Complex Computer Systems, Montreal, Canada, 1996.

[15] H. Gomaa and G. K. Farrukh, "Automated Configuration of Distributed Applications from Reusable Software Architectures," presented at IEEE International Conference on Automated Software Engineering, Lake Tahoe, 1997.

[16] H. Gomaa and G. K. Farrukh, "A Software Engineering Environment for Configuring Distributed Applications from Reusable Software Architectures," presented at IEEE International Workshop on Software Technology and Practice, London, 1997.

[17] H. Gomaa and R. Pettit, "Integrating Petri Nets with Design Methods for Concurrent and Real-Time Systems," presented at IEEE Workshop on Real-Time Applications, Montreal, 1996.

[18] H. Gomaa and K. Mills, "A Knowledge-based Approach for Automating a Design Method for Concurrent and Real-Time Systems", Proceedings Eighth International Conference on Software Engineering and Knowledge Engineering, Lake Tahoe, CA, June 1996.

[19] H. Gomaa, "A Software Architecture for Earth Observing Systems", International Conference on Earth Observation and Environmental Information, Alexandria, Egypt, October 1997.

[20] H. Gomaa and E. O'Hara, "Dynamic Navigation in Multiple View Software Specifications and Designs," To be published in the Journal of Systems and Software, 1998.

[21] D. A. Menascé, H. Gomaa, L. Kerschberg, "A performance-oriented design methodology for large-scale distributed data intensive information systems," Proc. 1st IEEE International Conf. on Engineering of Complex Computer Systems, 1995.


Large-Scale Scientific Database Systems

Participating faculty: Gomaa, Kerschberg, Wang, Menasce, Kafatos, Michaels
Visiting Scholar: Jong Pil Yoon


The Center for Information Systems Integration and Evolution (CISE) participated in the NASA-sponsored Independent Architecture Study of the Earth Observing System Data and Information System. This study led to the development of the GMU Federated Client-Server Architecture, which was based on the federated approach developed for DARPA's Intelligent Integration of Information Program (I*3).

Dr. Kerschberg and Dr. Michaels teach GMU's Scientific Database Course - one of very few such courses taught world-wide - and several research papers appeared at the 1997 IEEE Statistical and Scientific Database Management Conference (SSDBM). The EOSDIS- and SSDBM-related publications are listed below:

[1] L. Kerschberg, H. Gomaa, D. A. Menasce, J. P. Yoon, "Data and information architectures for large-scale distributed data intensive information systems,'' Proc. IEEE Int'l. Conf. Scientific & Statistical Database Management, 1996.

[2] M. Kafatos, X. S. Wang, et al., "The virtual domain application data center: Serving interdisciplinary earth scientists," Proc. IEEE Int'l. Conf. Scientific & Statistical Database Management, 1997.

[3] D. A. Menasce, H. Gomaa, L. Kerschberg, "A performance-oriented design methodology for large-scale distributed data intensive information systems," Proc. 1st IEEE Int'l. Conf. on Engineering of Complex Computer Systems, 1995.

[4] K. Massey, L. Kerschberg, and G. Michaels, "VANILLA: A Dynamic Data Model for a Generic Scientific Database," International Conference on Statistical and Scientific Database Management, SSDBM, Olympia, WA, 1997.

[5] L. J. Milask, T. Guynup, C. Hammel, L. Kerschberg, and G. Michaels, "An Integrated Scientific Database System and Value-Added Support Center: Application to Ecological Research of the Forest Canopy and Biosphere Interface," International Conference on Statistical and Scientific Database Management, SSDBM, Olympia, WA, 1997.

[6] H. Gomaa, D. Menascé, and L. Kerschberg, "A Software Architectural Design Method for Large-Scale Distributed Information Systems," Journal of Distributed Systems Engineering, 1996.

Integrating Heterogeneous and Inconsistent Information

Participating Faculty: Motro

The integration of information from multiple databases has been an enduring subject of research for almost 20 years, and many different solutions have been attempted or proposed. The major goals of this project, called Multiplex, are to (1) define a formal model of multidatabases; (2) provide simple, rich, and flexible support for heterogeneity; and (3) provide approximate answers in situations where single, authoritative answers are not feasible, either because there is "too little information" (e.g., an information source went off-line) or "too much information" (e.g., there are multiple, mutually inconsistent answers). The present version of Multiplex is available on the Internet as an integration server: after defining a new database scheme, users need only specify links to sources that deliver views of that scheme. Queries (in SQL) submitted to the server are answered transparently from the available sources. The present approach to the resolution of inconsistencies is based on majority votes; our current work aims to strengthen this capability by allowing different schemes for resolving inconsistencies and flexible user control over these schemes.
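
The following sketch illustrates majority-vote resolution in its simplest form: when several sources return conflicting values for the same key, the value reported by the most sources is chosen. The data and function names are hypothetical and are not the Multiplex implementation.

    # Hypothetical sketch of majority-vote inconsistency resolution across sources.

    from collections import Counter

    def resolve_by_majority(answers_per_source):
        """answers_per_source: one dict per source, mapping a key to that source's value."""
        resolved = {}
        for key in set().union(*answers_per_source):
            votes = Counter(src[key] for src in answers_per_source if key in src)
            resolved[key] = votes.most_common(1)[0][0]   # ties broken arbitrarily here
        return resolved

    sources = [
        {"Fairfax": 21000, "Arlington": 190000},
        {"Fairfax": 21000},
        {"Fairfax": 23000, "Arlington": 190000},
    ]
    print(resolve_by_majority(sources))   # Fairfax -> 21000, Arlington -> 190000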

[1] A. Motro, "Multiplex: A formal model for multidatabases and its implementation,'' Technical Report, ISSE Dept., 1995.

Information Quality and Uncertainty

Participating Faculty: Motro

With more and more electronic information sources becoming widely available, the issue of the quality of these often-competing sources has become germane. Since 1993 we have been exploring this relatively neglected subject. We have proposed a new standard for rating information sources with respect to their quality. This standard, based on the concepts of soundness and completeness, attempts to gauge the distance of the information in a database from the truth, and is implemented by combining manual verification with statistical methods. Once a source has been rated for quality, the quality of arbitrary queries is estimated with an appropriately extended relational algebra. At present, we are experimenting with this methodology. We plan to incorporate information quality considerations into Multiplex, as a strategy for resolving information inconsistencies. We also plan to address the issue of adjusting quality specifications to reflect changes in the information.

As models of the real world, databases are often permeated with various forms of uncertainty, including imprecision, incompleteness, vagueness, inconsistency, and ambiguity. Ever since our work on the Vague database interface (1988), we have sustained continued interest in this area and have been advocating the adaptation of various uncertainty theories developed within the AI community to the needs of practical information systems. At present, we are developing our soundness and completeness model of uncertainty, which is based on the proximity between a stored database instance and the real-world instance it attempts to approximate.
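
As a small illustration of the two measures, the sketch below computes soundness (the fraction of stored tuples that are correct) and completeness (the fraction of real-world tuples that are stored) for a toy relation checked against a verified sample. The data are invented, and the published model extends these measures through relational algebra operators, which the sketch does not attempt.

    # Toy illustration of soundness and completeness against a verified sample.

    stored     = {("Smith", "CS"), ("Jones", "EE"), ("Lee", "CS")}                  # database instance
    real_world = {("Smith", "CS"), ("Jones", "CS"), ("Lee", "CS"), ("Kim", "ME")}   # verified sample

    soundness    = len(stored & real_world) / len(stored)       # 2/3: Jones is recorded incorrectly
    completeness = len(stored & real_world) / len(real_world)   # 2/4: Jones and Kim are missing or wrong

    print(f"soundness = {soundness:.2f}, completeness = {completeness:.2f}")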

[1] A. Motro and I. Rakov, "Not all answers are equally good: Estimating the quality of database answers,'' In Flexible Query-Answering Systems (T. Andreasen et al., Editors). Kluwer Academic Publishers, 1997.

[2] A. Motro and P. Smets, Editors. Uncertainty Management in Information Systems: from Needs to Solutions, Kluwer Academic Publishers, 1996, 480 pages.


Cooperative Databases

Participating Faculty: Motro

This area of interest focuses on database retrieval methods that offer alternatives to formal querying (such as SQL). Research in this area has been going on for over 10 years, resulting in several new retrieval paradigms and user interfaces. Highlights include Baroque, an early browser for relational databases (1986); Vague, a user interface to relational databases that permits weakly specified queries (1988); Flex, a tolerant and cooperative query system that can be used satisfactorily by users with different levels of expertise (1990); ViewFinder, a graphical object-oriented database browser (1993, with D'Atri and Tarantino from the University of L'Aquila); and, most recently, Panorama, a database system that annotates its answers to queries with their properties (1996).

[1] A. Motro, "Intensional answers to database queries,'' IEEE TKDE, 6(3)1994, pp. 444-454.

[2] A. Motro, "Cooperative database systems,'' Int'l. Jour. of Intelligent Systems, 11(10)1996, pp. 717-732.

[3] A. Motro, "Panorama: A database system that annotates its answers to queries with their properties,'' Jour. of Intelligent Information Systems, 7(1)1996, pp. 51-73.

Constraint Databases

Participating faculty: Brodsky

Constraints provide a flexible and uniform way to represent and manipulate diverse data capturing spatio-temporal behavior, complex modeling requirements, partial and incomplete information, etc. They have been used in a wide variety of application domains. Constraint databases (CDBs) have recently emerged as a tool for deep integration of heterogeneous data captured by constraints in databases.

This project involves research on incorporating successful constraint technology (including such aspects as arithmetic constraints over the reals, interval constraint propagation, and combinatorial optimization over finite domains) into database technology, within the CDB framework, and is aimed especially at two broad application areas:

  1. Spatial and temporal applications, in which there is a need to represent data such as complex objects in a low-dimensional space (typically up to 4-5 dimensions), movement of objects in 3D space, transformations among various (possibly polar) coordinate systems, and patterns of behavior in space over time. In these applications, low dimensionality and domain-specific properties of the data can be exploited in developing efficient data structures and algorithms.
  2. Applications requiring mathematical optimization, such as linear programming, in the presence of large amounts of data. Mathematical optimization techniques are used to facilitate query evaluation; even more importantly, database set-at-a-time processing, indexing, and the ability to keep very large intermediate results can be exploited to facilitate mathematical (combinatorial) optimization.

Examples of spatio-temporal applications include CAD/CAM systems, GIS and environmental systems, and command and control systems (e.g., maneuver planning, data fusion, and sensor management). Examples of applications requiring combinatorial optimization and search include manufacturing and warehouse support systems, financial systems such as electronic trading, and many traditional mathematical programming problems involving large amounts of data. We strongly believe that CDB technology will have a significant impact on these application areas. Moreover, it has the potential to become an integral part of a new generation of DBMSs.

The major aspects of this project are (1) constraint modeling, canonical forms, and algebras; (2) data models and query languages; (3) indexing and approximation-based filtering; (4) constraint algebra algorithms and global optimization; and, most importantly, (5) building a system and demonstrating the feasibility of CDB technology by means of case studies. The more theoretical work on aspects (1)-(4) led to the large-scale development of CCUBE, the first constraint object-oriented database system, developed at GMU. The challenge is to achieve both declarative and efficient querying of large data sets involving constraints. A successful integration of constraint programming techniques with object-oriented or relational database systems is possible given the current programming and database state of the art, but it is also challenging, given the demands for high-level specification and efficiency.
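
The sketch below illustrates, in hypothetical Python, two of the ideas mentioned above: representing a spatial object as a conjunction of linear arithmetic constraints, and using a cheap interval (bounding-box) approximation as a filter before the exact constraint test. The classes and data are invented for the example and are unrelated to the CCUBE code.

    # Hypothetical sketch of constraint objects with approximation-based filtering.

    class LinearConstraint:
        """Represents a*x + b*y <= c."""
        def __init__(self, a, b, c):
            self.a, self.b, self.c = a, b, c

        def holds(self, x, y):
            return self.a * x + self.b * y <= self.c

    class ConstraintObject:
        def __init__(self, constraints, bounding_box):
            self.constraints = constraints        # exact description (conjunction)
            self.bounding_box = bounding_box      # (xmin, xmax, ymin, ymax) approximation

        def contains(self, x, y):
            xmin, xmax, ymin, ymax = self.bounding_box
            if not (xmin <= x <= xmax and ymin <= y <= ymax):
                return False                      # filtered out by the approximation
            return all(c.holds(x, y) for c in self.constraints)   # exact test

    # A triangle with vertices (0,0), (4,0), (0,4): x >= 0, y >= 0, x + y <= 4.
    triangle = ConstraintObject(
        [LinearConstraint(-1, 0, 0), LinearConstraint(0, -1, 0), LinearConstraint(1, 1, 4)],
        bounding_box=(0, 4, 0, 4),
    )
    print(triangle.contains(1, 1))   # True
    print(triangle.contains(5, 5))   # False (rejected by the bounding box alone)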

[1] A. Brodsky and Y. Kornatzky, "The Lyric language: Querying constraint objects," Proc. ACM SIGMOD, 1995.

[2] A. Brodsky, V.E. Segal, J. Chen and P.A. Exarkhopoulo, "The CCUBE constraint object-oriented database system,'' Constraints, An Int'l. Journal, To appear.

[3] A. Brodsky, C. Lassez, J.-L. Lassez, and M. J. Maher, "Separability of polyhedra for optimal filtering of spatial and constraint data," Proc. ACM PODS, 1995.

[4] A. Brodsky, J. Jaffar, M. Maher, "Toward practical constraint databases,'' Constraints, An Int'l. Journal, To appear. A preliminary version also appeared in Proc. VLDB Conf., 1993.

[5] A. Brodsky and X. S. Wang, "On Approximation-based Query Evaluation, Expensive Predicates and Constraint Objects," Proc. Workshop on Constraints, Databases, and Logic Programming, December 1995.

On-Line Analytical Processing (OLAP)

Participating Faculty: Barbara, Wang

OLAP Data Model. We formalize a multidimensional data (MDD) model for OLAP and develop an algebraic query language called the grouping algebra. The basic component of the MDD model is the multidimensional cube, a popular abstraction for multidimensional data. A cube is simply a multidimensional structure that contains in each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In order to express user queries, relational algebra expressions are extended to operate on basic groupings, yielding complex groupings, including order-oriented groupings (for expressing, e.g., cumulative sums). We then consider the environment where the multidimensional cubes are materialized views derived from base data situated at remote sites. A multidimensional cube algebra is introduced to facilitate this data derivation. We have also studied optimization issues.
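
A toy sketch of the basic cube notion follows: each cell of a two-dimensional cube holds the value of an aggregate function (here SUM) applied to the tuples of an underlying relation that fall in that cell. The relation and dimensions are assumptions made for the example; the grouping algebra itself is not shown.

    # Hypothetical sketch of a sparse two-dimensional cube built from a relation.

    from collections import defaultdict

    # Underlying relation: (product, region, sales)
    sales = [
        ("widget", "east", 10), ("widget", "west", 5),
        ("gadget", "east", 7),  ("widget", "east", 3),
    ]

    cube = defaultdict(float)                   # cell (product, region) -> aggregate value
    for product, region, amount in sales:
        cube[(product, region)] += amount       # SUM as the aggregate function

    print(cube[("widget", "east")])             # 13.0
    print(cube[("gadget", "west")])             # 0.0 (empty cell: the cube is sparse)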

Quasi-Cubes: Exploiting Approximations. Even though vendors have been selling products that support data cubes for a while, it is widely accepted that OLAP products do not scale to large datasets or high dimensionality.

There are two obstacles that make scaling difficult. First, there is the issue of database explosion: even though the multidimensional cube is usually sparse, materializing every cell is very often prohibitive. Second, the demands on query performance in OLAP are strict (analysts need answers quickly, so they can figure out the next question to ask the system). So, even if all cells are materialized, there is a need to support a large variety of queries efficiently.

In this project we investigate techniques for efficiently scaling cubes. These techniques are based on a variety of statistical tools and aim to provide approximations to query answers, trading off errors for better space management or query response. The main idea is to describe regions of the cube by statistical models that can be represented succinctly. In doing so, one relies on the models to reconstruct some of the cells in the cube, incurring errors in the process. To keep the errors under control, some cell values (namely, the outliers for the models) must be retained. We call these approximated cubes Quasi-Cubes. Our preliminary results show that this technique is feasible and provides an excellent way of reducing the storage needs for the cube. Even if the use of approximations is not acceptable for a given application, the modeling techniques enable the implementation of systems that provide answers that are progressively refined on-line (until the exact answer is given), eliminating the traditional latency that users experience when they pose queries. This is possible because the models provide the designer with a good classification method for the cells of the cube. Each cell can be placed in an error bin, according to the error one would incur if the cell value were estimated by the model. When answering a query, the cells in the higher error bins are retrieved first, while the other cell values are estimated by the models. To refine the answer, some of the estimated cell values are replaced by the real ones by retrieving the cells in the next error bin, and so on.
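
The sketch below illustrates the error-bin idea on a toy scale: a simple model (the mean of a cube region) estimates each cell, cells are ranked by the error of that estimate, the worst cells (the outliers) are stored exactly, and the remaining cells are reconstructed from the model. The data, model, and retention fraction are illustrative assumptions, not the published Quasi-Cube algorithms.

    # Hypothetical sketch of approximate cell storage with outlier retention.

    region = {            # cell coordinates -> exact value for one region of the cube
        (0, 0): 10.0, (0, 1): 11.0, (1, 0): 9.0, (1, 1): 42.0,
    }

    model_estimate = sum(region.values()) / len(region)          # a one-parameter model: 18.0

    # Rank cells by the error the model would incur, and retain the worst 25% exactly.
    by_error = sorted(region, key=lambda cell: abs(region[cell] - model_estimate), reverse=True)
    retained = {cell: region[cell] for cell in by_error[: max(1, len(region) // 4)]}

    def answer(cell):
        """Approximate cell lookup: exact if retained, model-based otherwise."""
        return retained.get(cell, model_estimate)

    print(answer((1, 1)))   # 42.0  (outlier, stored exactly)
    print(answer((0, 0)))   # 18.0  (reconstructed from the model, with some error)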

[1] D. Barbara and M. Sullivan, "Quasi-Cubes: A space-efficient way to support approximate multidimensional databases,'' Technical Report, ISSE Dept., September 1997.

[2] C. Li and X. S. Wang, "A Data Model for Supporting On-Line Analytical Processing'', Proc. CIKM Conf., 1996.

[3] C. Li and X. S. Wang, "Optimizing Statistical Queries by Exploiting Orthogonality and Interval Properties of Grouping Relations,'' Proc. Int'l. Conf. Scientific and Statistical Database Management, June 1996.

Semantic-based Transaction Processing

Participating Faculty: Ammann, Jajodia

The traditional correctness criterion of serializability forces database designers into tradeoffs among design objectives. For example, in multidatabases, the designer balances the objectives of local design and execution autonomy, decentralized management of global transactions, maintenance of global integrity constraints, and execution history correctness. The last objective is typically assessed with respect to some variant of conflict serializability. Switching to a semantics-based notion of correctness can greatly reduce the conflict among the remaining objectives. In the case of multidatabases, the conflict can be entirely avoided for certain applications.

We are utilizing the semantics-based perspective in three distinct application areas: multidatabases, secure multilevel databases, and long duration transactions. Additionally, the method holds promise for such areas as database recovery and survivability. The cost of the semantics-based approach is additional off-line transaction analysis early in the lifecycle of a system; however, this up-front cost is amortized over the numerous transaction invocations during the system's lifetime.
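
As a small, hypothetical illustration of why application semantics helps, the sketch below checks whether two operations commute semantically; commuting operations (two deposits to the same account) may be interleaved even though they both write the balance and therefore conflict under conventional serializability. The toy model ignores failure conditions and is not the formal decomposition method of the publications below.

    # Toy commutativity check used to justify interleaving under semantic correctness.
    # The operation encoding is an assumption made for this example.

    def commutes(op1, op2):
        """Two operations commute if applying them in either order yields the same state."""
        # In this toy model, only deposits are known to commute unconditionally.
        return op1[0] == "deposit" and op2[0] == "deposit"

    print(commutes(("deposit", 50), ("deposit", 20)))      # True: safe to interleave
    print(commutes(("deposit", 50), ("set_balance", 0)))   # False: order matters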

[1] P. Ammann, S. Jajodia, and I. Ray, "Applying formal methods to semantic-based decomposition of transactions,'' ACM TODS, 22(2) 1997, pp. 215-254.

[2] P. Ammann, S. Jajodia, and I. Ray, "Ensuring atomicity of multilevel transactions,'' Proc. IEEE Symp. Security and Privacy, 1996, pp. 74-84.

Copyright 2009-2016 by Larry Kerschberg. All rights reserved.