Object-Oriented vs. Relational

Today, relational databases dominate the market; however, alternative solutions have never died, and they are gradually gaining strength. Still, many people are entirely unaware of such alternatives, and those who have heard of them tend to treat them as impractical experimental toys with no real future. Lotus Notes/Domino is the only example of a long-lived commercial system. However, since it was bought by IBM, there has been no technological progress; a forced adaptation of Notes/Domino to native IBM technologies and common Web platforms has already blurred the original idea, and the prospects for its development remain very poor. It seems that Lotus databases will soon be abandoned entirely in favor of DB2. In the early 1990s, the Jasmine database tried to compete with SQL-based products, and the start was quite promising. Unfortunately, the project was abandoned by its authors and repurposed as an integration platform rather than a database management system. That was the end of the story.

However, there is some hope. The dominance of the relational approach is rooted in the very concept of a database as a kind of warehouse for essentially static entities. Such a store was predominantly used to extract data and feed them to some presentation software serving end-user requests. In the early days of computing, the only form of report that most computer users understood was a table (perhaps enhanced with a number of standard diagrams). It is quite logical that the basic structures were chosen to reflect this common need and tuned for maximum performance in table generation. All other processing occurred outside the DBMS and did not influence its functionality. Ray Ozzie's Lotus Notes was the first breach in the wall. It was a groupware system, workflow-oriented rather than data-oriented. Its emphasis on collaborative work, parallel data processing, individualization and secure transactions demanded an entirely different paradigm, since the lack of support for multiple synchronized update threads has always been the weak spot of relational databases. The focus thus shifted from the database to the application, combining diverse data in a manner transparent to the user.

Today, when all commercial DBMSs are enhanced with collaborative tools and means of workflow automation, the dilution of the strictly relational paradigm is evident. It will take time, however, to recognize the need for a quite different organization of data storage and retrieval. As soon as plain table reports give way to interactive multimedia presentations that unfold into different structures depending on the current context and user needs, tables on disk will become too restrictive. This is clearly the general trend in Web development; most modern databases are used as back-end storage for Web applications, and the relational model is already hindering progress in that area.

Well, let us set aside the commercial aspects and discuss the principal ideas. And the first question is whether there is anything special about object databases (ODB) that would distinguish them from relational databases (RDB).

Basically, an object-oriented database is intended to store objects rather than data; that is, an ODB record may contain both data fields and the code required to adapt access to the data to a particular environment. Since this is also the principal idea behind object-oriented programming, ODBs may seem more convenient for integration with most programming languages, as they directly implement the same ideology. In this sense, RDBs reflect the tradition of conventional structured programming, with code kept well separated from data.

However, the reality is somewhat less reassuring. Most object-oriented programming languages are statically typed; this means that the code (class methods implemented as dynamically linked libraries) is always effectively separated from the object data (an instance structure reproduced at allocation), and there is no real difference between the imperative and functional styles. On the other hand, there are object-oriented interfaces to RDBs, such as object-relational mappers, that eliminate any need for closer integration.

Similarly, it could be argued that there is no truly object-oriented storage system, since any record is, in fact, a simple structure (a list of fields), while methods are usually defined for a particular presentation form (access method) rather than for the record itself. That is, an ODB can be projected onto an RDB equipped with appropriate event handlers; this is the approach taken by the DB2 ODS for Lotus Notes/Domino.
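To make the projection concrete, here is a minimal Python sketch using the standard sqlite3 module. It assumes a hypothetical table documents in which every object is a plain row, a trigger playing the role of an event handler (bumping a version counter on every edit), and a hypothetical FORM_METHODS registry that attaches methods to the access form rather than to the record. It illustrates the idea only and makes no claim about how the DB2 ODS is actually implemented.

```python
import sqlite3

# Hypothetical schema: every "object" is a plain row; the form name selects
# which methods apply, much as a Notes document is rendered through a form.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE documents (
    id      INTEGER PRIMARY KEY,
    form    TEXT NOT NULL,
    subject TEXT,
    amount  REAL,
    version INTEGER NOT NULL DEFAULT 1
);
-- Event handler inside the RDB: editing the payload bumps the version.
-- The trigger only watches subject/amount, so its own update of version
-- does not fire it again.
CREATE TRIGGER bump_version AFTER UPDATE OF subject, amount ON documents
BEGIN
    UPDATE documents SET version = OLD.version + 1 WHERE id = NEW.id;
END;
""")

# Hypothetical registry: methods belong to the access form, not to the row.
FORM_METHODS = {
    "Invoice": {"display": lambda row: f"Invoice {row['subject']}: {row['amount']:.2f}"},
    "Memo": {"display": lambda row: f"Memo: {row['subject']}"},
}

def call(conn, doc_id, method):
    """Load the plain record and dispatch the method through its form."""
    row = conn.execute("SELECT * FROM documents WHERE id = ?", (doc_id,)).fetchone()
    return FORM_METHODS[row["form"]][method](row)

conn.execute("INSERT INTO documents (form, subject, amount) VALUES ('Invoice', 'INV-1', 120.5)")
conn.execute("INSERT INTO documents (form, subject) VALUES ('Memo', 'Kick-off')")
conn.execute("UPDATE documents SET amount = 99.0 WHERE id = 1")  # fires the trigger

print(call(conn, 1, "display"))  # Invoice INV-1: 99.00
print(call(conn, 2, "display"))  # Memo: Kick-off
```

The same rows could be given a different set of methods simply by registering another form, which is the point of attaching behaviour to the access method rather than to the stored record.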

On the other hand, there is no such thing as a purely relational database either. No real DBMS is relational in the strict sense of the word; they all incorporate numerous non-relational features, and it is exactly this inconsistency that makes them usable and competitive. Since the actual on-disk structure of such systems is often far from mere tables (it is tuned for disk usage and access speed), the only link to the relational approach is the query language, which is usually some dialect of SQL. That is, we don't have any RDBs; we only have SQL databases.

With all that in view, future growth of the ODB share in data storage technologies is not entirely implausible, though, of course, one could also expect a third way, different from both ODB and RDB and combining their advantages. Still, to the extent of our present knowledge, it is worth considering a few common prejudices.

Prejudice 1. Object-oriented data structures are entirely different from tables.
This is not true. First of all, a table is an object too, with its own properties and methods, and any combination of tables can be expressed in an object-oriented (dynamic) language no worse than in (essentially static) SQL. Conversely, any object collection can be treated as a table, with one row for each object and a separate column for each of the object's properties. This possibility was used, for example, in Lotus Notes databases to enable plain SQL queries based on a particular view or form. Treating methods as a kind of property (references to code), one obtains a quite relational structure.
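The row-per-object projection can be sketched in a few lines of Python with the standard sqlite3 module. The Employee class, its annual method and the helpers as_table and expose_method are hypothetical; the sketch only assumes that an object's declared properties become columns and that a method can be exposed to SQL as a function computed from the row, i.e. a "reference to code". It is not a description of how Lotus Notes implemented SQL access.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

@dataclass
class Employee:  # hypothetical object class
    name: str
    dept: str
    salary: float  # monthly

    def annual(self) -> float:
        return self.salary * 12

def as_table(conn, cls, objects):
    """Project a collection of objects onto a table: one row per object,
    one column per declared property. Names come from trusted class code."""
    cols = [f.name for f in fields(cls)]
    table = cls.__name__
    conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})",
                     [astuple(o) for o in objects])
    return table

def expose_method(conn, cls, method_name):
    """Treat a method as a computed column: rebuild the object from its row
    values inside SQL and call the method on it."""
    n = len(fields(cls))
    conn.create_function(method_name, n,
                         lambda *values: getattr(cls(*values), method_name)())

conn = sqlite3.connect(":memory:")
staff = [Employee("Ann", "R&D", 5000.0),
         Employee("Bob", "R&D", 4000.0),
         Employee("Cid", "Sales", 3000.0)]
as_table(conn, Employee, staff)
expose_method(conn, Employee, "annual")

rows = conn.execute(
    "SELECT dept, SUM(annual(name, dept, salary)) FROM Employee "
    "GROUP BY dept ORDER BY dept"
).fetchall()
print(rows)  # [('R&D', 108000.0), ('Sales', 36000.0)]
```

Here the plain SQL query neither knows nor cares that its rows started life as objects, and the method participates in the query like any other column.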

Prejudice 2. Object-oriented organization is too cumbersome for flexible reporting.
This opinion is based on experience with the currently available ODBs, primarily Lotus Notes and Jasmine; however, the absence of an efficient implementation does not mean that such implementations are impossible. Of course, if tables are all we need, ODBs must be enhanced with additional tools for quick data extraction along a specific slice, just as relational databases have to be extended with workflow support tools. However, with the development of data mining techniques, knowledge management, multi-aspect analysis and semantic modeling, the benefits of the object-oriented approach will become obvious. Instead of browsing through a huge table in the vain hope of spotting some regularity, the user will observe a few global features, unfolding them to get more details if necessary. Hierarchical databases that can be mined starting from an arbitrarily chosen element are much better suited to such reporting than rigid tables.
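The "unfold on demand" style of reporting can be illustrated with a small tree in Python. The Node class and the unfold function are hypothetical names chosen for this sketch, and the data are made up; totals are aggregated bottom-up, and a report can start from any node and expand only as many levels as requested.

```python
from dataclasses import dataclass, field

@dataclass
class Node:  # hypothetical element of a hierarchical store
    name: str
    value: float = 0.0
    children: list = field(default_factory=list)

    def total(self) -> float:
        """Global feature of a subtree: its own value plus all descendants."""
        return self.value + sum(c.total() for c in self.children)

def unfold(node: Node, depth: int, indent: int = 0) -> list:
    """Summarize a subtree, unfolding only `depth` levels below `node`."""
    lines = [f"{'  ' * indent}{node.name}: {node.total():g}"]
    if depth > 0:
        for child in node.children:
            lines.extend(unfold(child, depth - 1, indent + 1))
    return lines

sales = Node("Sales", children=[
    Node("Europe", children=[Node("France", 120), Node("Germany", 80)]),
    Node("Asia", children=[Node("Japan", 150)]),
])

print("\n".join(unfold(sales, 1)))               # overview first
print("\n".join(unfold(sales.children[0], 1)))   # drill down from any element
```

The user sees the top-level picture first and opens only the branch that looks interesting, instead of scanning every row of a flat table.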

In real life, relational databases are not as user-friendly as they are claimed to be. It takes a qualified user and hard work to construct a correct SQL query that produces a nontrivial report combining different tables. Similarly, in the Lotus Notes client, one can easily design customized views to produce simple table-like reports, while a skilled programmer is needed to build a view in a more complex case. A carefully designed ODB is much more comprehensible, since the user only deals with the natural properties of the objects rather than with abstract data fields. However, as soon as the user needs a non-standard structure, effectively corresponding to a different kind of object, the transition from the existing (predefined) structures to a new one may cause problems. A normalized design is deemed to be the correct choice when many irregular queries are expected; unfortunately, it is precisely a high degree of normalization that makes RDBs so unfriendly, with all those interrelated tables and nobody knowing where to find what. As a true solution, one could dream of a hierarchical database allowing dynamic structure formation, with automated adjustment of the on-disk structure when needed.

Prejudice 3. Object-oriented databases are huge.
Once again, this merely reflects the sad experience of the available implementations. It is well known that relational systems have come a long way from the simple tables in the dBASE format, whose files were far larger than a typical Lotus NSF file. With powerful modern computers, RDBs employ all kinds of on-the-fly compression and decompression, which was impossible in the early days of data management. Obviously, file size depends on the complexity of the data. Thus, a static item requires no more space than is needed to store its value; by contrast, a workflow element requires additional information about modification date/time, versioning, encryption, digital signatures, etc. A perfect ODS would allow complete control over the parameters (fields, properties) to store, implementing various space-saving techniques.

For example, consider the overhead due to the necessity of maintaining multiple dependencies, and hence the appearance of reference fields in every record, in both ODBs and RDBs. With growing complexity, the volume of such auxiliary data threatens to outweigh the meaningful content. To overcome this difficulty, we could store the link structure as a separate object, dynamically creating any structures over the same data set. Even greater economy could be achieved by creating shortcuts to common object classes and aliases for common values. That is, huge files are mainly due to the lack of object-oriented technologies rather than to their use.
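A minimal Python sketch of both ideas, assuming purely in-memory structures: ValuePool keeps each distinct value once and hands out small integer aliases (a form of dictionary encoding), while LinkStore keeps links outside the records, grouped by a named structure, so that several structures can be laid over the same data set. All names are hypothetical, and persistence and concurrency are ignored.

```python
from collections import defaultdict

class ValuePool:
    """Aliases for common values: each distinct value is stored once and
    records keep only a small integer alias (dictionary encoding)."""
    def __init__(self):
        self._ids = {}
        self._values = []

    def alias(self, value):
        if value not in self._ids:
            self._ids[value] = len(self._values)
            self._values.append(value)
        return self._ids[value]

    def value(self, alias):
        return self._values[alias]

    def __len__(self):
        return len(self._values)  # number of distinct values actually stored

class LinkStore:
    """Links kept outside the records, grouped by named structure, so new
    structures can be added without touching the records themselves."""
    def __init__(self):
        self._links = defaultdict(lambda: defaultdict(list))

    def link(self, structure, parent, child):
        self._links[structure][parent].append(child)

    def children(self, structure, parent):
        return list(self._links[structure].get(parent, []))

pool = ValuePool()
records = {  # record id -> content only, no reference fields
    1: {"title": "Q1 report", "status": pool.alias("approved")},
    2: {"title": "Q2 report", "status": pool.alias("approved")},
    3: {"title": "Budget", "status": pool.alias("draft")},
}

links = LinkStore()
links.link("by_kind", "reports", 1)
links.link("by_kind", "reports", 2)
links.link("by_topic", "finance", 3)
links.link("by_topic", "finance", 1)  # record 1 lives in two structures

print(links.children("by_topic", "finance"))  # [3, 1]
print(pool.value(records[2]["status"]))       # approved
print(len(pool))                              # 2
```

Dropping a structure such as by_topic would leave the records untouched, and a value repeated across thousands of records costs one stored copy plus a small alias per record.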

Prejudice 4. Object-oriented databases are too complex to manage efficiently.
It depends. Indeed, an RDB is based on a limited set of universal operations, while an ODB needs specific "drivers" for each object class. However, all modern software is extensible through standard integration features. It is enough to implement such integrators in the core of an ODB, along with a few basic classes, so that everything else can be added through third-party components (disposable when no longer needed). A properly organized ODB would be much faster than any RDB in the long run, since ODB processing time does not depend much on the complexity of the query.
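The core-plus-drivers arrangement can be sketched as a plugin registry. ObjectStore, register and unregister are hypothetical names; the sketch assumes that a driver is simply a callable that knows how to present one object class, and that the core falls back to a generic representation when no driver is installed.

```python
from typing import Callable, Dict, List, Tuple

class ObjectStore:
    """Minimal hypothetical core: stores raw records and delegates
    class-specific behaviour to drivers plugged in at run time."""
    def __init__(self):
        self._drivers: Dict[str, Callable[[dict], str]] = {}
        self._records: List[Tuple[str, dict]] = []

    def register(self, class_name: str, driver: Callable[[dict], str]) -> None:
        self._drivers[class_name] = driver

    def unregister(self, class_name: str) -> None:
        # Third-party components are disposable: removing one is harmless.
        self._drivers.pop(class_name, None)

    def add(self, class_name: str, data: dict) -> None:
        self._records.append((class_name, data))

    def render_all(self) -> List[str]:
        out = []
        for class_name, data in self._records:
            driver = self._drivers.get(class_name, self._fallback)
            out.append(driver(data))
        return out

    @staticmethod
    def _fallback(data: dict) -> str:
        return repr(data)  # generic behaviour built into the core

store = ObjectStore()
store.register("Contact", lambda d: f"{d['name']} <{d['email']}>")  # plug-in driver
store.add("Contact", {"name": "Ann", "email": "ann@example.com"})
store.add("Note", {"text": "hi"})  # no driver installed for this class

first = store.render_all()
store.unregister("Contact")
second = store.render_all()
print(first)   # ['Ann <ann@example.com>', "{'text': 'hi'}"]
print(second)  # ["{'name': 'Ann', 'email': 'ann@example.com'}", "{'text': 'hi'}"]
```

The core never needs to know about Contact in advance; removing the driver degrades presentation gracefully instead of breaking the store.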

Changes in data organization are much more tractable in ODBs. In an SQL system, any rearrangement of the tables is a real headache, since all the queries must be modified, and the users (or application programmers) have to learn where the data they need is located in the new infrastructure. Sometimes a trivial change (like allowing a longer key string) raises almost insurmountable difficulties. In a hierarchical database, any change is transparent to the user, since data are separated from links, and the same front-end organization can correspond to any back-end representation.
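The independence of the front end from the back end can be illustrated in Python with a structural interface. The sketch assumes two hypothetical back-ends holding the same records, a wide row-per-record table and a set of (id, field, value) triples, and one front-end query written only against logical field names; switching the physical layout leaves the query untouched. It shows the principle only, not a hierarchical database.

```python
from typing import Iterable, Protocol

class Backend(Protocol):
    """What the front end relies on: logical field access, nothing else."""
    def get(self, record_id: int, field: str) -> object: ...
    def ids(self) -> Iterable[int]: ...

class WideTable:
    """Hypothetical back-end 1: one row per record, one slot per column."""
    def __init__(self, columns, rows):
        self._index = {c: i for i, c in enumerate(columns)}
        self._rows = rows  # dict: record id -> tuple of values

    def get(self, record_id, field):
        return self._rows[record_id][self._index[field]]

    def ids(self):
        return self._rows.keys()

class Triples:
    """Hypothetical back-end 2: (id, field, value) triples."""
    def __init__(self, triples):
        self._data = {(i, f): v for i, f, v in triples}
        self._ids = sorted({i for i, _, _ in triples})

    def get(self, record_id, field):
        return self._data[(record_id, field)]

    def ids(self):
        return self._ids

def titles_by_owner(backend: Backend, owner: str) -> list:
    """Front-end query written once, against logical field names only."""
    return [backend.get(i, "title") for i in backend.ids()
            if backend.get(i, "owner") == owner]

wide = WideTable(["title", "owner"],
                 {1: ("Plan", "ann"), 2: ("Memo", "bob"), 3: ("Spec", "ann")})
eav = Triples([(1, "title", "Plan"), (1, "owner", "ann"),
               (2, "title", "Memo"), (2, "owner", "bob"),
               (3, "title", "Spec"), (3, "owner", "ann")])

print(titles_by_owner(wide, "ann"))  # ['Plan', 'Spec']
print(titles_by_owner(eav, "ann"))   # ['Plan', 'Spec']
```

Reorganizing storage then means writing a new back-end class, while every front-end query keeps working as before.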

Prejudice 5. ODBs are not as scalable as RDBs.
It is often said that high transaction volumes and user counts are incompatible with ODBs. This is a superficial opinion derived from comparing ODBs and RDBs in incomparable respects. As long as we compare both approaches within the same functionality, the difference disappears. With RDBs, most workflow and presentation tasks are performed externally, by other software querying the RDB for data. To compare the performance of such systems with ODBs, we need to include these outer functions in the comparison as well. As mere data stores, that is, with simple data structures that do not assume any internal processing, ODBs are as scalable as RDBs. Both are equally inefficient when it comes to storing very large documents (e.g. multimedia); such records are better kept as separate files.

Separating data from its processing may be an advantage in one case and an obstacle in another. In an ideal (hierarchical) database, the level of separation would be controllable and adjustable to the application and the current operating environment.

Prejudice 6. ODBs are still far away, while RDBs are here and now.
This is a serious argument. The race for quick profit holds software companies back from innovative development. However, the pressure of objective necessity drives the market toward more modularity, uniformity and flexibility; so far, these goals are being achieved by extending traditional SQL-based systems, which are moving farther and farther away from pure RDBs. Experiments with new technologies in data and knowledge management, as well as with new cooperation and workflow paradigms, are going on. Quite possibly, the breakthrough is near at hand, and it will soon be pushed into the market by one of the big players trying to seize the promising niche.
