Saturday, August 18, 2007

Normalization: Where to Stop?

One of the most important things we learn in a DBMS course is normalization. Whether you are a data modeler, DBA, or SQL developer, it is one of those topics we all learn, either at work or during a formal IT degree. We are taught that there are five normal forms, what each one requires, and so on. But how much do we actually use them?

Take a look at most production databases. At best you will find that the database has been implemented in third normal form (3NF). Very few databases reflect higher normal forms, such as Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF). So why don't most database designers go beyond 3NF?

I am not going to go deep into the normal forms and their definitions. A Google search will bring up all of them, so I have opted to keep the definitions out of this post.

How far should you go with normalization?
To be sure, each progressive step may affect overall performance. I have seen normalization taken to absurd lengths. In a recent discussion, one person came up with the idea of different financial document types, as though the world of accounting were going to change. I had to remind him that accounting is just the recording of historical events :)

A while ago I read an article in which the author discussed normalization. Over time I have learnt to ask a few questions before I decide how much normalization is required.
  1. What is the nature of the system? Is it an OLTP or an OLAP system?
  2. What is the nature of the DB queries? Are they mostly inserts or retrievals?
  3. For the parts of the DB where inserts dominate, it is better to keep the data in third normal form.
  4. For systems where retrievals outnumber inserts, second normal form is the better fit.
Where you draw the line in the sand is ultimately up to you, but you will be better equipped to draw it with a sound understanding of the various normal forms and the risks of not going far enough. And do not forget the business requirements. After all, it's "Not Your Software": it is the users who will be using it, not the developers who write it.
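
To make the trade-off a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (customer, invoice, invoice_report) and their data are hypothetical and exist only for illustration. The normalized pair stores customer details once, so a change touches a single row but reads need a join; the denormalized report table reads without a join but repeats the customer details on every row.

```python
import sqlite3

# In-memory database; all table names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized layout: customer details live in exactly one place.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );

    -- Denormalized layout: customer details are repeated on every invoice row.
    CREATE TABLE invoice_report (
        invoice_id    INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        customer_city TEXT NOT NULL,
        amount        REAL NOT NULL
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'Sydney')")
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                 [(10, 1, 100.0), (11, 1, 250.0)])
conn.executemany("INSERT INTO invoice_report VALUES (?, ?, ?, ?)",
                 [(10, "Acme", "Sydney", 100.0), (11, "Acme", "Sydney", 250.0)])

# Normalized: the customer moves, so exactly one row changes...
conn.execute("UPDATE customer SET city = 'Brisbane' WHERE customer_id = 1")
# ...but every read of invoice details pays for a join.
normalized_rows = conn.execute("""
    SELECT i.invoice_id, c.name, c.city, i.amount
    FROM invoice AS i
    JOIN customer AS c ON c.customer_id = i.customer_id
    ORDER BY i.invoice_id
""").fetchall()

# Denormalized: the same change must touch every invoice row for that customer...
cursor = conn.execute(
    "UPDATE invoice_report SET customer_city = 'Brisbane' WHERE customer_name = 'Acme'"
)
rows_touched = cursor.rowcount
# ...but reads need no join at all.
report_rows = conn.execute("""
    SELECT invoice_id, customer_name, customer_city, amount
    FROM invoice_report
    ORDER BY invoice_id
""").fetchall()
```

Both queries return the same rows; what differs is where the cost falls, on the write side or on the read side.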

Why don't most designers go beyond 3NF?
There are a few factors which affect the level of normalization we choose for a database. Most of my past database design decisions were based on the answers I got back after asking these questions:
  1. What is the insert/retrieve ratio?
  2. What is the database used for: transaction recording/processing or decision making?
  3. What response time are we looking for on insert, update, delete and retrieve?
  4. What is the estimated peak transaction load on the database, and what is the off-peak load?
  5. What is the database deployment strategy: centralized or distributed?
  6. What is the transaction control strategy: single-phase commit or multi-phase commit?
  7. Is a temporary cache implemented or required?
  8. Do we need to maintain the user session, or is user interaction with the business layer stateless?
Asking these questions gives an overview of how the system will look, so that the strategy which best suits the need can be formed.
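
As a rough illustration, the first two questions can be turned into a small heuristic. This is only a sketch of the rule of thumb from this post (3NF where inserts dominate, 2NF where retrievals dominate); the names WorkloadProfile and suggest_normal_form are hypothetical, and the other six questions are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Answers to questions 1 and 2 above (hypothetical structure)."""
    inserts_per_hour: int
    retrieves_per_hour: int
    purpose: str  # assumed values: "transaction" or "decision"


def suggest_normal_form(profile: WorkloadProfile) -> str:
    """Apply the post's rule of thumb; a starting point, not a verdict."""
    if profile.inserts_per_hour > profile.retrieves_per_hour:
        return "3NF"  # insert-heavy: keep the data well normalized
    if profile.inserts_per_hour < profile.retrieves_per_hour:
        return "2NF"  # retrieve-heavy: accept some redundancy for faster reads
    # Evenly balanced load: fall back on what the database is used for.
    return "3NF" if profile.purpose == "transaction" else "2NF"


order_entry = WorkloadProfile(inserts_per_hour=5000, retrieves_per_hour=500,
                              purpose="transaction")
reporting = WorkloadProfile(inserts_per_hour=200, retrieves_per_hour=9000,
                            purpose="decision")
print(suggest_normal_form(order_entry))
print(suggest_normal_form(reporting))
```

The sample numbers are invented; in practice the answers to the remaining questions (response time, peak load, deployment, caching) would shift the result.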

Sometime in the coming days I will write a post on how to optimize a database and the common mistakes database designers make. Keep watching this space.

Until Next Time... :)

Thursday, August 16, 2007

Mandatory and Essential

Last night when I logged onto Yahoo Messenger, I saw this as one of my friends' status messages: "What is the difference between Mandatory and Essential". At first I thought, what's the big deal? Are they different? Are they not the same? At one point I went to the extreme of thinking that his PhD course had taken its toll and he was about to enter the second phase of a PhD degree, where one becomes grumpy, tired and insane :) But the more I thought about it, the more evident it became that Mandatory and Essential are NOT the same thing. They are two very different words when it comes to their real meaning.

I remember doing a post earlier this year on Atomicity and Knowledge Representation, in which I discussed the Mandatory, Additional and Optional Attributes of an object. I am not going to go deep into the philosophical aspect of it; I would rather stay in the context of Software Engineering and the Semantic Web.

Mandatory:
Mandatory Attributes signify the bare minimum required for existence. They are the basic definition of any thing (object) in this world. In terms of my earlier post on Atomicity, they are the Mandatory Attributes of an object. The bare minimum requirement for any object which exists in any space is a Type and a unique attribute (most of the time a Name) which distinguishes the object from others of its kind in the same space, or in any other space where the object is likely to be found.

Essential:
Essential Attributes of an entity (object) are the must-have attributes for the object in a given context, wherever it has to be used or play an important role. More often than not, an essential attribute is also a mandatory attribute of the object. Referring back to the post I did earlier this year, I see Essential Attributes as nothing but the Additional Attributes of an object which are required for it to participate in a certain activity or to be used in a context.

In the context of Knowledge Representation we can always rely on Mandatory Attributes being present, whereas Essential Attributes are (due to their contextual nature) guaranteed to be available only when we are considering the object in a given context.

I would love to hear your thoughts on what you think the Mandatory and Additional attributes are: whether you see them as the same or different, and why.
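
A minimal sketch of this distinction, assuming a simple object model: Type and Name are mandatory and always present, while essential attributes are looked up per context. The Entity class, the ESSENTIAL_BY_CONTEXT table and the two contexts in it are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Mandatory attributes: an entity cannot be constructed without them.
    type: str
    name: str
    # Everything else is additional and may or may not be present.
    attributes: dict = field(default_factory=dict)


# Hypothetical contexts and the attributes that are essential in each one.
ESSENTIAL_BY_CONTEXT = {
    "payroll": {"salary", "bank_account"},
    "mailing_list": {"email"},
}


def missing_essentials(entity: Entity, context: str) -> set:
    """Return the essential attributes the entity lacks for this context."""
    required = ESSENTIAL_BY_CONTEXT.get(context, set())
    return required - set(entity.attributes)


alice = Entity(type="Person", name="Alice",
               attributes={"email": "alice@example.com"})

# Alice can take part in the mailing list context but not in payroll.
print(missing_essentials(alice, "mailing_list"))
print(missing_essentials(alice, "payroll"))
```

The same entity is complete in one context and incomplete in another, while its Type and Name are there in both.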

Until Next Time... :)

Tuesday, August 07, 2007

What is beyond Web 3.0?

Recently there has been talk about what Web 3.0 should be, what it will do, and so on. Go to Facebook, MySpace, Orkut or any other site where techies come together, and you will find most of them talking about Web 2.0 or Web 3.0. But among them there was one gentleman who said that Web 3.0 is going to be about Relationship Economics: it is the user who will drive the next generation of advancement. Let's see.

Web 1.0 and 2.0 have been predominantly technology oriented. From a command-line browser that pulled text off a server when given a URL, to websites built to facilitate social interaction, the journey has been quite long (almost 17 years). Along the way we discovered many technologies; some we still keep with us years later, and some we left behind.

Web 3.0 is about deep and meaningful relationships and effective networking. If Web 2.0 is about breadth, i.e. adding more and more features and technology advancements, Web 3.0 is about depth, i.e. creating an understanding of why and how we can leverage them.

Most significantly, Web 3.0 is the turning point where the 100 year cycle of transaction economics is superseded by the next great cycle, usually described as relationship economics. Web 3.0 transcends and includes both 1.0 and 2.0 to provide the platform that provides lift off for this great shift in economic, organisational, social and cultural structures. Web 3.0 is likely to be born out of a major crisis affecting current structures. If you are part of Web 3.0 you will be carried by a great wave. If you're not, you may find it difficult to survive.

But then the thought came to me: what is going to come after Web 3.0? Well, how about Web 4.0? It seems like a light bulb glowing :) but yes, what advancement will we seek in Web 4.0? The way I see it, Web 4.0 will go one level further and take the man-machine relationship to the next level. Machines will be able to understand human language, and there will be a change in our behavior towards machines and vice versa.

If we call Web 3.0 a paradigm shift then Web 4.0 is going to be the Dimension Shift. I would love to hear your thoughts on this.

Until Next Time... :)

Friday, August 03, 2007

WWW - Past, Present and Future

Lately there has been a lot of talk about the World Wide Web. Some people are giving it their best shot to describe what the web is today and where it is going in the future.

When I started to look at where the web began in the early 1990s, where it is today and how it evolved, I found the whole phenomenon quite interesting, and the trend along which it is developing is quite interesting too. This article is my attempt to capture the Past, Present and Future of the World Wide Web. Special emphasis is put on the future trends of the Web and how I see it evolving.


Past
It was back in 1991 that Sir Tim Berners-Lee, along with his colleagues Robert Cailliau and Nicola Pellow, developed the first web browser. It was a command-line interface which could read content from a given location (a server). These browsers were capable of pulling content (text only) from a server. But that was just the beginning of a whole new era for humanity.

It wasn't until 1993 that Marc Andreessen and Eric Bina invented the "<img>" tag for HTML, which revolutionized the way web pages look and allowed developers and artists to unleash their creativity. In the same year MIT developed the technology to index and count web servers, which took the architecture of the Web to the next level.


Present
Today when we talk about the Web we talk about many things: dynamic web sites, blogs, animations, social networking and so on. In a nutshell, we have everything we need to meet our requirements. If we have to summarize the Web today, we can describe it through the following characteristics.

Portability: Today we are not limited to browsing the web from our desktops. We also access it from mobile phones, hand-held devices, TVs and digital cameras. The choice of input device is not limited to keyboard and mouse either. We also use a stylus, voice recognition and so on to interact with the web.

Diversity: When the web first started, pages were available only in English. Today web pages are available in many languages. The Google search engine is available in 100+ languages and the number is increasing. Though the percentage of pages in English is still higher than that of any other language, pages in other languages are increasing their share of the Web.

Distributed Technologies: The Web is becoming distributed. We have technologies like Web Services which allow users to share applications over the Internet (not the Web). We can now build a new application by using services which are already available out there.

Collaborative: If we have to name one thing in today's web which has been evolutionary, it is collaboration. Wikipedia is the largest online encyclopedia available today and contains information about almost everything. The wiki is the best example of what collaboration can produce.

Social Networking: Today we have websites which facilitate social networking and allow users to interact with other users of the site. MySpace, Orkut and Facebook are a few on this list that have a large share of this market. There are other sites, like LinkedIn, where one can build a professional network. So in a nutshell, the web is no longer just publish and read.

Consumer and Producer: Today the user is not just a consumer of the information published on the web. The user is publisher and consumer at the same time. A web user can publish his thoughts and ideas in the form of a blog, which may include text that has been published on some other site or a link to an external site.


Future
With so many technology choices available today, one might well start to wonder what more we need on the web. We have tools and technologies which are pretty much sufficient for our requirements. The current buzzwords in the industry are Web 2.0 (the technologies available today) and Web 3.0.

The evolution of Web 2.0 will lead us into Web 3.0. Web 2.0 has enabled users to start thinking in a new paradigm (which is really an old paradigm): collaboration. This was the original reason the World Wide Web was created in the first place. Web 3.0 (aka the Semantic Web) is really just an extension of the original collaborative concepts that were the main driving factor behind the World Wide Web. But it connects data in a way that is more relevant to the user, i.e. based on context.

With the advent of Web 2.0 we have seen the rise of user-generated taxonomies, folksonomies, and other related information architecture methodologies. However, they are still limited in Web 2.0. The classic example is that the taxonomy of one source does not match the taxonomy of another. With the growing usage of open APIs, though, we are starting to see taxonomies shared on a collaborative level between systems. In the Web 3.0 conceptualization, data will be shared not through APIs but through the structure of the data itself, i.e. through its meaning and context.

In the Semantic Web, data (entities) will be linked to each other at a deeper level, most likely in their raw form. The links among the data will not be predefined; rather, they will be based on aspects such as the context in which the data exist, their meaning, and any other similar behavior they might exhibit. A few people I have discussed this with feel that some kind of standard will emerge for linking the data. But as we have witnessed in the past, standards are not always the best approach either. More often than not they are heavily influenced by vendors and their proprietary technologies.

But then the question arises: how will this work out, and how will we be able to achieve the Semantic Web? Will we ever be able to achieve it at all? One of my friends (Nathan McCosker), with whom I regularly discuss the Semantic Web, thinks that it is going to be an iterative and evolving process. Initially a few groups will come together to form a standard that we may like to call Web 3.0, but then there will be further corrections, the standard will evolve, and we may well end up versioning it as 3.1, 3.2, 3.x and so on.

But this is not where it ends. Current research places little or no emphasis on what the user experience of the Semantic Web is going to be. Research is useful to and accepted by the masses only when it is done for the masses. Research labs around the world have started to throw technologies such as RDF, RDFS, OWL, RUL and RVL into the basket, but they have not touched the nerve, i.e. the problem they are trying to address or the pain they are trying to relieve. Technology is not the bottleneck here. Today or tomorrow the experts will find the technical solution to make things happen; the biggest hurdle is the why. Just as the characters in "The Matrix" keep talking about the reason or purpose behind everything that happens, we need to apply similar logic here and find out why we need the Semantic Web. When (not if) that is defined, it will no longer be a theoretical discussion.
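
To make linking by meaning a little more concrete, here is a minimal sketch that stores facts as (subject, predicate, object) triples, the basic shape that RDF also uses. It relies on plain Python tuples, not on any RDF library, and every identifier in it (alice, post42, the match and authors_on helpers) is hypothetical. The point is that nothing links an author to a topic in advance; the link is found by following the meaning of the statements.

```python
# Each fact is a (subject, predicate, object) triple; all data is made up.
TRIPLES = [
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("post42", "type", "BlogPost"),
    ("post7", "type", "BlogPost"),
    ("post99", "type", "BlogPost"),
    ("alice", "wrote", "post42"),
    ("bob", "wrote", "post7"),
    ("bob", "wrote", "post99"),
    ("post42", "about", "SemanticWeb"),
    ("post7", "about", "SemanticWeb"),
    ("post99", "about", "Normalization"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def authors_on(triples, topic):
    """Find authors of a topic by chaining 'about' and 'wrote' statements."""
    posts = {s for (s, _, _) in match(triples, predicate="about", obj=topic)}
    writers = {s for (s, _, o) in match(triples, predicate="wrote") if o in posts}
    return sorted(writers)


print(authors_on(TRIPLES, "SemanticWeb"))
print(authors_on(TRIPLES, "Normalization"))
```

Adding a new kind of statement needs no schema change and no new API; a query simply follows whichever statements exist.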


Suggested Reading

Those who are interested in knowing more about Web 3.0 and beyond may like to read through the following links.

http://www.iht.com/articles/2006/05/23/business/web.php
http://www.pcmag.com/article2/0,1895,2102852,00.asp
http://www.androidtech.com/knowledge-blog/2006/11/web-30-you-aint-seen-nothing-yet.html
http://www.roughtype.com/archives/2006/11/welcome_web_30.php
http://www.alistapart.com/articles/web3point0

Conclusion
The Semantic Web is becoming the next buzzword in the industry, and without knowing in detail what it is, many people have started to claim that their products are suited to it. So far the approach has been purely theoretical, and those who claim that their products are based on Semantic Web technologies are still on the way. They are not there yet.

Until Next Time... :)