Friday, July 4, 2008

A Brief introduction on Database Design

The words used in database design may be puzzling, at times: That is why I give this database glossary for you: If you don't discover the term you are uncertain about here, use my Contact page for a message, and I will include the term here.

This is a "live" glossary: Whenever there exists relevant information about a term on some other page around here, the term appears as a direct link to that page. Some of the pages may be articles, short or long, others may lead you to a free eBook, and others again may offer you a payable eBook, if the term is a complex and/or lengthy one.

If I do not have a relevant page for a given term, you can blame it on me: Not writing fast enough. I will, however, put the term in bold and give a short explanation.

Alternate (candidate) Key

A column or combination of columns, other than the primary key column(s), which may be used to uniquely identify a row in a table. The primary key is chosen from one of the candidate keys. As you can read from the article behind this term; if you have many candidate keys, you have a problem of selecting the best fit.

Attribute

An attribute is a part of the description of the entity. The entity itself is described by one or more attributes; together, they describe all things of importance about the entity. Example: Typhical attributes for a customer would be name, address, telephone, etc.

Business Rules

Specific business-related information that is associated with database objects. The information can be business restrictions (allowable values), facts, or calculation rules for given business situations, e.g VAT shall be added to all products. Business rules should be applied in the completed database, either as triggers/stored procedures, or (absolutely worst case) implemented in the application code. However, if you do that, you will make your business dependent on a specific application...

Candidate keys

Each entity should have at least one candidate (that is: unique) key defined. This is in order to being able to identify each row in the table at hand. Read the above link about candidate keys: It will teach you the importance of the issue.

Cardinality

The cardinality of an entity indicates the number of instances (zero or many) of an entity.

Many describe the cardinality through relationships: One-to-one, One-to-many, Many-to-many. In addition, each end of a relationship may be optional or mandatory. The cardinality term is not restricted to relationships, however: We also try to distinguish between low and high cardinality on attributes: an attribute for GENDER will have a low cardinality (2); Male or Female. This may inflict on how we design indexes in the database.

Another meaning of cardinality is to describe how many rows you have in a specific table.

Column

The physical equivalence to an attribute: In a database table, the table consists of one or more columns; together they describe all we want to know about each occurrence (record) in the table.

Data Definition Language (DDL)

The language used to define objects in a database: CREATE TABLE, CREATE INDEX, and so on. DDL provides the semantics for administering all the physical objects in your database. It is based on a given standard, but may deviate some from vendor to vendor. All of the objects created together, form the 'schema' of a given application.

Data Manipulation Language (DML)

The language used to manipulate objects in a relational database: We only have four statements in DML: INSERT, SELECT, UPDATE and DELETE. DML provides the semantics for manipulating one or more occurrences (record) in a table in our database. It is based on a given standard, but may deviate some from vendor to vendor, especially through ‘extensions’ found valuable by the given vendor.

Data model

A data model is not the model (conceptual/logical/physical) of your future or existing database: It is a generic model which you base your analysis and design upon. The relational model is one such data model. To my knowledge, there are no other data models that reflect the reality as we have to deal with it. I can not provide you with an OO-oriented data model, or an XML data model; such complete models do not exist, although some vendors are trying to make you think otherwise.

Database model

A Database model is the logical conversion of an Entity Relationship model. While an ER model reflects the business case, the Database model visualizes the logical model, which in turn is the basis for the physical implementation in the form of database tables, indexes and other mechanisms in the database, necessary to construct the database in question.

Database Normalization

Database Normalization is the process of analyzing your database model to ensure that information is stored only in one place in the database, and that there are no derivates of the information within the database. A perhaps more understandable explanation of the 5 Normal Forms can be found in my eBook on Database Normalization

Data type

Identifies the kind of information that an Attribute/column in an entity/table on a specific database platform represents. These are actual physical representations and are dependent on the actual RDBMS. The data type should be given by the domain definition for that attribute/column.

Domain

A standardized definition which applies to many attributes/columns.

For example, the domain MONEY may be specified as NUMBER, 15 digits long, with 2 decimals, not allowed to have a value of 0.00, and so on. Applying domain definitions to every attribute/column, eases the implementation of changes, and assures that the same kind of information is treated equally throughout the system.

Entity

Any kind of information of importance to the business: Customers, Orders, Products, or whatever information we need in order to perform a certain task.Used in the building of the conseptial (business) model

Foreign Key

Column or combination of columns in a table, whose values are related to a primary key in another table.

Index

An index is a physical mechanism applied to one (or a combination of) column(s). The purpose of the index is for the database system to use the index as a look-up mechanism instead of reading the whole row. Indexes are a prime resource for optimalization (and thereby increasing speed) of searches in the database.

Join Relationship

A join relationship is a collection of information from two or more tables. The join is performed by relating columns which are foreign key columns in one table with equivalent columns which are primary key columns in the other table.

Metadata

'Data about Data'. This is the documentation stored in the database repository, and which holds information about your database objects. In Oracle, for example, the table USER_TABLES holds vital information about your tables.

NULL values

The concept of NULL, which means "unknown value", (not the digit zero) is not a part of the relational model, but has been introduced by the different vendors. It can create unexpected problems when accessing the database (read the linked article).

Occurrence

Each row in a database table. If you have 100 customers in your CUSTOMERS table, the table has 100 occurrences.

Open DataBase Connectivity (ODBC)

A general interface for communication with different vendor-specific Relational Database Systems.

Primary Key

A column (or combination of columns) whose value(s) uniquely identify a row in a table. This has been a central consept in Relational Theory, and crucial to both identification and performance. Lately, the concept has been questioned, saying that a table should have one or more candidate keys only; however, for all practical reasons, a table should never be created before its (unique) primary key has been determined.

Ref cursor

A ref cursor is a pointing feature for use in a procedural language. The use of a ref cursor, explicit, gives you the power to navigate a result set from a SELECT statement one record at a time. It is a basic concept in Oracle, but can also be used from other procedural languages such as Java.

Referential Integrity

Referential integrity deals with the governing of data consistency. We mostly think of it as keeping the relations between tables valid; that is, an order may not have a customer id that does not exist; a transaction can not be posted for an illegal (non-existent) account.

Relationship

A relationship is an association between two entities. For example, the relationship between the ORDERS and CUSTOMERS tables will normally be that an order is placed by ONE AND ONLY ONE customer, while a given customer MAY HAVE placed ONE OR MORE orders. This will create a link between orders and customers, and thereby generating a business rule inside the database to enforce that relationship.

Structured Query Language (SQL)

SQL is the industry standard language for designing and communicating with relational databases.

Stored Function Link BUilding Company

A stored function is SQL (and procedural code, in most cases), placed in the database itself. It masks the business logic from the programmer. In addition, stored functions return only one value to the calling program, and can be used as part of DML statements and within calculations and conditional statements as a parameter.

Stored Procedure

A stored procedure is SQL (and procedural code, in most cases), placed in the database itself. It masks the business logic from the programmer. In addition, stored procedures represent a powerful tool to let all programmers have a generic interface to different access mechanisms to each table in the database.

Table

The physical implementation of an entity. This is where the actual data is stored as customers, orders, products, or whatever. Each table consists of one or more columns.

Trigger

A trigger is a stored procedure assigned to a given table. It ‘fires’ whenever you do an operation on that table (BEFORE/AFTER INSERT/UPDATE/DELETE etc.) Triggers are powerful, performance-enhancing mechanisms in the database.

Unique Identifier

The equivalent of a primary key. It may also be a candidate for creating unique indexes in addition to the primary key.

View

An imaginary table: A view may be constructed to give the user/programmer access to a limited resultset from one or more tables. It is often used for security reasons; restricting access through views.
However; it may also be a signal of insufficient design: Making a view to solve a problem with a query which is difficult to do in a single SELECT.