Take a look at most production databases. The best you will find that the database has been implemented using Third normal form (3NF). Very few databases reflect higher normal forms, such as Boyce-Codd normal form (BCNF), the Fourth normal form (4NF), and the Fifth normal form (5NF). So, why don't most database designers go beyond the 3NF?
I am not going deep into the normalization and their definition. A google search will bring out all those definitions so I opted to keep the definition out of this post.
How far should you go with normalization?
To be sure, each progressive step may impact upon overall performance. I have seen normalization taken to absurd lengths. In one of the recent discussion one person came out with the idea of different financial document types. As though the world of accounting is going to change. I had to remind him that Accounting is just the recording of historical events :)
A while ago I was reading an article and there the author mentioned about the normalization. Over a period of time I learnt to ask few questions before I decide how much normalization is required.
- What is the nature of the system. Is it an OLTP or OLAP system?
- What is the nature of DB Query. Are they mostly Insert or Retrieve?
- For Part of DB where the inserts are more, its better to have the Data in 3rd normal form.
- For system where Retrieve operation is more than Inserts, 2nd Normal form is the best.
Why most designers don't go beyond the 3NF?
There are few factors which affect the level of normalization we choose for the database. Most of my decision to design the database in past were based on the answers I got back after asking these questions:
- What is the insert/retrieve ratio?
- What is the database used for Transaction Recording/Processing or Decision making?
- What is the response time we are looking for in case of Insert, Update, Delete and Retrieve?
- What is the estimated peak transaction load on the database also the off-peak hour transaction load.
- What is the database deployment strategy Centralized or Distributed?
- What is the transaction control strategy. Whether it is Single-Phase commit or Multi-Phase commit?
- Is there a temporary Cache implemented or required?
- Do we need to maintain the user session or user interaction with business layer is stateless?
Sometime in coming days I will write a post on how to optimize database and what are the common mistakes database designer make. Keep watching this space for the post.
Until Next Time... :)