Introduction to Relational Database Design

A carefully designed database is the foundation for a system that users and programmers can easily access and that can accept insertions and deletions while remaining error-free. Proper database design is critical to the system development process.

A carefully designed database is the foundation for a system that users and programmers can easily access and that can accept insertions and deletions while remaining error-free. Proper database design is critical to the system development process. Without some guidelines, however, database design can be difficult. Fortunately, others who have gone before you have found solutions to many of the problems you might encounter.

Several approaches to organizing and manipulating database data, called data models, have evolved in the past twenty years. One important development has been the relational database model. The relational model is based on the concept that data is organized and stored in two-dimensional tables called relations. You can think of a relation as a file, and of each row in the relation as a record. Just as a record is a collection of data items, a row is made up of many attributes. Although the terms are different, the concepts are similar. We will say more about relations and relational terminology shortly. Based on the user's requirements, we need to decide which attributes (or data items) should be collected together and stored in the same relation. This is one aspect of logical database design.

The relational model is important for design even if a different model is used for implementation. Consequently, you need to become familiar with relational terminology, the relational database model is based on the concept that data is stored in two-dimensional tables, called relations. Each row in the table represents a record. Each column represents a field. The entire table is roughly equivalent to a file. A row is called a tuple . A column is called an attribute. Certain restrictions are imposed on relations. First, attributes are single-valued; neither repeating groups nor arrays are allowed. Second, entries in any column are all of the same kind. For example, one column may contain employee numbers, and another may contain employee names. Further, each attribute has a unique name, and attribute positions are insignificant. Finally, no two tuples in a relation may be identical, and the order of the tuples is also insignificant. The terms relation, tuple, and attribute come from the field of mathematics and are alien to most people. You might find the terms file, record, and field more comfortable. Users, on the other hand, might prefer the terms table, row, and column when referring to the structure of stored data. In keeping with industry usage, we will use the terms relation, row, and attribute when discussing logical database design. Tuple, although technically correct and consistent with the other terms, is an awkward word and is seldom used in industry.

Classes of Relations

Relations can be classified by the types of modification anomalies to which they are vulnerable. In the 1970s relational theorists chipped away at these types. Someone would find an anomaly classify it, and think of a way to prevent it. Each time this happened, the criteria for designing relations improved. The classes of relations and techniques for preventing anomalies are called normal forms. Depending on its structure, a relation might be in first normal form, second normal form, or some other normal form. E. F. Codd, in his landmark 1970 paper, defined first, second, and third normal forms (INF, 2NF, 3NF). Later, Boyce-Codd normal form (BCNF) was postulated, and then fourth and fifth normal forms were defined. Thus, a relation that is in second normal form is also in first normal form. A relation in 5NF (fifth normal form) is also in 4NF, BCNF, 3NF, 2NF, and INF.  These normal forms were helpful, but they had a serious limitation. No theorist was able to guarantee that any of these forms would eliminate all anomalies. In fact, each form would eliminate only certain ones. This situation changed, however, in 1981 when R. Fagin defined a new normal form called domain/key normal form (DK/NF). Fagin's proof, however, greatly simplified the situation. If we can put a relation in DK/NF, then we are guaranteed it will have no anomalies, period. The trick is to know how to put relations in DK/NF.

Imran Zafar writes articles about computer basics and database management such as functions of database administration and free computer dictionary.

License: You have permission to republish this article in any format, even commercially, but you must keep all links intact. Attribution required.