Wednesday, December 02, 2009

Search : Past, Present and Future

In my previous post on semantic search I discussed the drawbacks of current search engines and mentioned that it takes an average of three Google searches to get the desired result.

Recently I read a paper on search 3.0 and the evolution of search. The paper describes how search has evolved over time. This is what the author has to say:
"In the coming third decade of the Web, Web 3.0 (2009 - 2019), there will be another shift in the search paradigm. This is a shift from the past to the present, and from the social to the personal, and from the generic to the precise."

In short, the next generation of search will return results based on information supplied by the user. This means the user's data has to be available to the search engine, or the user will publish personal information (a virtual card) along with every request they submit. These details will be metadata driven and will be used by search engines to filter the results and tailor them to the user's requirements and level of expertise.

This means every piece of content published on the web must also publish metadata that describes what the content is. The metadata must contain sufficient detail about the content and must be in a form that search engines can interpret. But metadata is just one side of the story. The search algorithms must be modified to make use of this metadata and to produce results that take the (published) user information into account. While some searches will be location independent, a few will need to be location sensitive, and their results must be valid for the user's current location.
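
To make the idea concrete, here is a minimal sketch in Python of how a search engine might combine content metadata with a user's virtual card. Everything in it is an assumption made for illustration: the `UserCard` and `Document` classes, their field names, and the sample data are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserCard:
    """Hypothetical 'virtual card' a user submits along with a query."""
    expertise: str           # e.g. "beginner" or "expert"
    location: Optional[str]  # e.g. "Sydney"; None if the user does not share it


@dataclass
class Document:
    """A page plus the metadata it publishes about itself (assumed fields)."""
    title: str
    keywords: List[str]
    level: str                      # expertise level the content is written for
    location: Optional[str] = None  # None means the content is location independent


def search(query: str, docs: List[Document], user: UserCard) -> List[str]:
    """Match on keywords first, then filter on the user's card."""
    hits = []
    for doc in docs:
        if query.lower() not in (k.lower() for k in doc.keywords):
            continue
        if doc.level != user.expertise:
            continue
        # Location-sensitive content must be valid where the user is.
        if doc.location is not None and doc.location != user.location:
            continue
        hits.append(doc.title)
    return hits


docs = [
    Document("Intro to Python", ["python"], "beginner"),
    Document("Python internals", ["python"], "expert"),
    Document("Python meetup in Sydney", ["python"], "beginner", "Sydney"),
    Document("Python meetup in Perth", ["python"], "beginner", "Perth"),
]
user = UserCard(expertise="beginner", location="Sydney")
results = search("python", docs, user)
print(results)
```

The same keyword returns a different list for a different card, which is the shift from generic to precise that the quote describes.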

Until Next Time...

Wednesday, November 11, 2009

Representing Uncertainty

In one of my earlier blog posts, Is AI a Possibility, I discussed the need for a 3rd state. For the past few days I have been thinking about scenarios where just returning a Boolean value, i.e. True or False, is not good enough.

Normally a function evaluates to either "True" or "False" based on whether the attributes of the entity meet the conditions defined in the rule. But at times the entity may not contain the attributes the rule needs in order to be evaluated properly. In that case we need a 3rd (Not Available) and a 4th (Not Applicable) state as the rule outcome. When a function returns Not Available, it implies that the entity does not contain the attribute the rule needs in order to execute or process the object. Not Applicable, on the other hand, means the rule does not apply to the type of the entity in context.

So in total we have 4 return values for the function:
- True
- False
- Not Applicable
- Not Available
A Boolean outcome is not sufficient here, so we need an alternate representation for the function result. Not all functions can evaluate to True/False. For those rules (functions) that cannot be evaluated we need to find out which state applied (Not Applicable or Not Available). When we apply multiple rules to the same object in a sequence (workflow) the outcome is a set of conclusions, and that set must record which rules were evaluated and which could not be.

I am still puzzled as to how to represent the two additional states, considering that internally everything is represented as either 1 or 0, which seems to leave no room for representing uncertainty.
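
One possible way to represent the four outcomes is an enumeration, sketched below in Python. This is only an illustration under my own assumptions: the `Outcome` enum, the `make_rule` helper, and the sample rules and entity are hypothetical names, not part of any existing rule engine.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Outcome(Enum):
    TRUE = "True"
    FALSE = "False"
    NOT_APPLICABLE = "Not Applicable"  # rule does not apply to this entity type
    NOT_AVAILABLE = "Not Available"    # entity lacks the attribute the rule needs


def make_rule(applies_to: str, attribute: str,
              test: Callable[[object], bool]) -> Callable[[dict], Outcome]:
    """Build a rule that checks type, then attribute presence, then the condition."""
    def rule(entity: dict) -> Outcome:
        if entity.get("type") != applies_to:
            return Outcome.NOT_APPLICABLE
        if attribute not in entity:
            return Outcome.NOT_AVAILABLE
        return Outcome.TRUE if test(entity[attribute]) else Outcome.FALSE
    return rule


def run_workflow(entity: dict,
                 rules: Dict[str, Callable[[dict], Outcome]]) -> List[Tuple[str, Outcome]]:
    """Apply every rule in sequence and record each rule's outcome by name."""
    return [(name, rule(entity)) for name, rule in rules.items()]


rules = {
    "is_adult": make_rule("person", "age", lambda age: age >= 18),
    "has_email": make_rule("person", "email", lambda email: "@" in email),
    "is_roadworthy": make_rule("car", "inspection_passed", bool),
}
person = {"type": "person", "age": 30}
conclusions = run_workflow(person, rules)
for name, outcome in conclusions:
    print(name, "->", outcome.value)
```

The set of conclusions records which rules were evaluated and why the others were not. As for the bits, four distinct states need only two of them, so the limit may lie in the Boolean return type rather than in 1 and 0 themselves.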

Until Next Time....

Monday, October 26, 2009

Representing Frequency

While reading about the Tree of Porphyry (proposed by Ramon Lull in 1272) I learnt about the 10 questions that can be asked of any entity. But one thing missing from this list is a way to represent frequency. Suppose some process A takes place every 2 days. We need a mechanism to represent the repetition and the frequency at which it occurs. Ramon Lull describes When as the question that represents the date and time related attributes of the object.

What I propose is extending the 10 questions listed in the Tree of Porphyry by adding another question to the list: How Often. The purpose of How Often is to represent the frequency of a repetitive attribute of the object. It will have a few sub-attributes, such as a Value (How Much) and a Unit (What Kind). Together these describe the nature of the repetition.
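
A rough sketch of How Often in Python might look like the following. The class name `HowOften`, the supported units, and the `occurrences` helper are my own assumptions for the sake of the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Assumed set of units; a real system would need months, years and so on.
_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400, "week": 604800}


@dataclass(frozen=True)
class HowOften:
    value: int  # How Much
    unit: str   # What Kind

    def interval(self) -> timedelta:
        """Length of one repetition."""
        return timedelta(seconds=self.value * _UNIT_SECONDS[self.unit])

    def occurrences(self, start: datetime, count: int) -> List[datetime]:
        """The first `count` occurrences, beginning at `start` (the When)."""
        return [start + i * self.interval() for i in range(count)]


# Process A takes place every 2 days.
every_two_days = HowOften(value=2, unit="day")
dates = every_two_days.occurrences(datetime(2009, 10, 26), 3)
print(dates)
```

When still answers the starting point; How Often adds the repetition on top of it.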

Until Next Time...!!!

Wednesday, October 21, 2009

Finding the Right Data Structure for Knowledge Representation

The most commonly used data structure today is the row-based data structure, in which one row represents the details of an instance of an entity type. But in the real world the representation of an entity is not flat. An object's representation contains several sub-attributes, and those sub-attributes may have their own sub-attributes that together make up the entire object. If we go by the row-based representation of the object we cannot represent the sub-attributes and their relation to the main object.

Consider an example where a Person has FirstName, Surname, Home Address (Street, Suburb, State, Post Code) and Work Address (Company Name, Street, Suburb, State, Post Code). If we use a row-based representation here then we find that our records look like this:

FirstName, Surname, HomeStreet, HomeSuburb, HomeState, HomePostCode, CompanyName, WorkStreet, WorkSuburb, WorkState, WorkPostCode.

The limitation here is that unless we uniquely name the Street, Suburb, State and PostCode attributes for both the Home and Work Address we will not be able to distinguish their real meaning. On the other hand consider a structure like this:
  - Name
    - First Name
    - Surname
  - Home Address
    - Street
    - Suburb
    - State
    - Post Code
  - Work Address
    - Company Name
    - Street
    - Suburb
    - State
    - Post Code

By looking at this structure we can easily tell that the Home Address is made up of 4 sub-attributes and the Work Address of 5. These in turn can have their own sub-attributes that define them in more detail.

It is evident that the hierarchical data structure provides more flexibility and room to grow than the flat row-based structure for representing a real-world object.
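
As a small illustration, the Person example can be written as a nested dictionary in Python. The sample values and the `flatten` helper are made up for this sketch.

```python
# Hierarchical representation: each address keeps its own sub-attributes.
person = {
    "Name": {"First Name": "Jane", "Surname": "Doe"},
    "Home Address": {
        "Street": "1 Example Street",
        "Suburb": "Hometown",
        "State": "NSW",
        "Post Code": "2000",
    },
    "Work Address": {
        "Company Name": "Example Pty Ltd",
        "Street": "99 Sample Road",
        "Suburb": "Worktown",
        "State": "NSW",
        "Post Code": "2001",
    },
}


def flatten(node: dict, prefix: str = "") -> dict:
    """Collapse the hierarchy into one row by prefixing each attribute name."""
    row = {}
    for key, value in node.items():
        name = prefix + key.replace(" ", "")
        if isinstance(value, dict):
            row.update(flatten(value, name))
        else:
            row[name] = value
    return row


flat = flatten(person)
print(sorted(flat))

# In the hierarchy, "Street" needs no special name to be unambiguous.
print(person["Home Address"]["Street"])
print(person["Work Address"]["Street"])
```

Going from the hierarchy to a flat row is mechanical, but the flat row only stays meaningful because the parent's name has been folded into every column name.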

Until Next Time...!!!

Tuesday, September 15, 2009

How do we achieve Artificial Intelligence?

Artificial Intelligence: I am sure many readers are familiar with this buzzword, created by research groups around the world not so long ago. We started dreaming about the many things a machine could do that we do in our day-to-day life, and that would in turn make our lives simple and easy. But what happened to most of those projects? They are either shelved or have very limited usability in our day-to-day life, though there are a few outcomes that we did find useful.

When I read about artificial intelligence, I ask myself: what went wrong, and where did it all go wrong?

Let's define intelligence. Intelligence is the art of making the best choices based on what we know (or rather don't know). But what determines whether we know something or not? It is our ability to recall something we learnt in the past. Learning is associating facts with a context. Context defines how the entities are linked together. The linking does not have to be static.

So in a nutshell, in order to show some sort of intelligent behavior we need to build a system that can:
  • Understand the context in which a particular fact is stated.
  • Retrieve the most appropriate rule that can be applied to the available facts.
The retrieve operation depends on how the raw data is structured.

In my opinion it all comes down to how the data is structured (represented) and the reasoning mechanism that works on the data.

Until Next Time..

Tuesday, August 04, 2009

Starting Point for Semantic Search

In my previous post on Semantic Search, I discussed what Semantic Search is in general. One of the ideas that keeps coming up is how to make search efficient.

On average it takes 3 Google searches for someone to find what they are looking for. This is mainly because the Google search engine scans its index and brings back all the documents that match the keyword, ranked of course by Google's page ranking algorithm, PageRank.

But if we set the technology aside, there are two possible ways we start searching for something.
  • When we know what we are looking for. This is the simple and straightforward case where we are well aware of our needs and often get a result faster.
  • When we don't know what we are looking for. We only know a few attributes or characteristics of the object we are searching for.
A Semantic Search will have to take both into account. The search engine will have to rank the results based on how well they match the criteria.

Until Next Time....!!!

Monday, July 13, 2009

Information and Knowledge

Does Information mean Knowledge, or vice versa? This is often the topic of discussion when I happen to talk about knowledge and its role in the Semantic Web. More often than not we confuse information with knowledge.

So what is information? It is the smallest detail we have (a fact in the system) about an entity. "Today is Monday" is information, but "Monday is the first day of the work week" is not. In western countries the first day of the week is Monday, but that is not true for the Middle East, where the week begins on Sunday.

So how do we define knowledge? Knowledge is the interpretation of information. In our previous example, whether Monday is the first day of the week depends on which country's context we are discussing Monday in. We can define knowledge as information related to a context. If it is not attached to a context, the information does not convey any meaning, which means it is of no use and cannot be classified as knowledge. So for information to be classified as knowledge it must have a context attached to it.
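
The Monday example can be sketched in a few lines of Python. The context names and the lookup table below simply restate the example from this post and are not meant as a complete or authoritative calendar; the function name is hypothetical.

```python
from typing import Optional

# Context -> first day of the work week, as used in the example above.
FIRST_DAY_OF_WORK_WEEK = {
    "western": "Monday",
    "middle-east": "Sunday",
}


def is_first_day_of_work_week(day: str, context: Optional[str]) -> Optional[bool]:
    """Interpret a piece of information within a context.

    Returns None when no known context is attached, because the
    information alone cannot be interpreted.
    """
    if context not in FIRST_DAY_OF_WORK_WEEK:
        return None
    return FIRST_DAY_OF_WORK_WEEK[context] == day


information = "Monday"
print(is_first_day_of_work_week(information, "western"))
print(is_first_day_of_work_week(information, "middle-east"))
print(is_first_day_of_work_week(information, None))
```

The information never changes between the three calls. Only the context does, and with it the answer.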

Until Next Time...!!!

Wednesday, July 01, 2009

What is Semantic Search

If I ask this question in a group of 10 researchers, I will get 11 correct answers (including one of mine) for sure. But the question will remain unanswered: what exactly is "Semantic Search"?

During the Semantic Technology Conference 2009, the big names in search debated what exactly the next direction of search is. One thing that came out of the discussion is that page ranking and keyword search are definitely NOT the way searches are going to work in the future. One of the most probable futures for the search engine is that "It will be more like a conversation with the user".

Currently search engines work in a "tell and show the result" mode. The user enters a keyword and the search engine dumps all possible matches (based on page rank, keyword matches etc.) on the user. Most of the time the results bear no relevance to the search intention at all. A conversational style of search seems more appropriate, where the user refines their search criteria through continuous interaction with the computer. This more closely resembles how we search for information in our day-to-day life. The search session will be more like a brainstorm in which the user feeds in more and more of what they know about what they want. The system will then return a set of the most relevant results. The user will then add more details of what they want, and the steps will be repeated till the user finds it. At any point in time, they may also want to go back and start from scratch.

One positive step in this area is the Semanti search engine. This search engine provides a list of suggested categories with which a particular keyword is associated. This is just the starting point for semantic search. A semantic search engine should let the user feed in more (unlimited) details of what they know about what they are looking for. These details (provided by the user) could be directly related to the result they are looking for, or they might not describe the result at all. Knowledge reasoning is a major influence on the search process. Reasoning is the core of the semantic search process and will determine how accurate the search engine is. While the search engine is designed for accuracy, efficiency will have to be compromised, at least for a while. One scenario these search engines will have to handle is reporting the percentage accuracy of each result against the search criteria. The better-matching results should be displayed at the top of the list.
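
Here is a minimal sketch of that percentage-based ranking in Python, with the criteria refined over two rounds as in a conversation. The scoring rule (share of criteria matched exactly) and the sample catalogue are assumptions of mine; a real semantic search would reason over the criteria rather than compare strings.

```python
from typing import Dict, List, Tuple


def match_percentage(criteria: Dict[str, str], item: Dict[str, str]) -> float:
    """Percentage of the user's criteria that the item satisfies."""
    if not criteria:
        return 0.0
    matched = sum(1 for key, value in criteria.items() if item.get(key) == value)
    return 100.0 * matched / len(criteria)


def rank(criteria: Dict[str, str], items: List[Dict[str, str]]) -> List[Tuple[str, float]]:
    """Score every item and list the better matches first."""
    scored = [(item["name"], match_percentage(criteria, item)) for item in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


items = [
    {"name": "Doberman", "kind": "dog", "coat": "short", "size": "large"},
    {"name": "Poodle", "kind": "dog", "coat": "curly", "size": "medium"},
    {"name": "Siamese", "kind": "cat", "coat": "short", "size": "small"},
]

# Round 1: the user only knows they want a dog.
criteria = {"kind": "dog"}
round1 = rank(criteria, items)
print(round1)

# Round 2: the user adds one more detail and the ranking is refined.
criteria["coat"] = "short"
round2 = rank(criteria, items)
print(round2)
```

Starting from scratch is just a matter of emptying the criteria and beginning the conversation again.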

So what is your idea of Semantic Search?

Until Next Time...!!!

Monday, June 29, 2009

Search Engines

Search is the next big thing on the World Wide Web, and every big player is trying their best to capture a bigger share of the search market today. Recently I had a chance to learn about Microsoft's new search engine Bing and came to know that Bing is a result of M$ buying a company that was working on Semantic Search.

In my limited interaction with Bing I did not find any WOW factor. The accuracy of the results is a bit better than Google's, but it still misses quite a few highly relevant results that Google brings back. It does promise to be a good alternative to Google search.

I also tried out Wolfram Alpha's Computational Knowledge Engine. In my opinion it is not a search engine. Wolfram Alpha is more like a knowledge engine that brings back facts when we submit a query. I tried asking a few questions related to biotech and other science subjects, but the results were quite disappointing: sometimes it showed me the share price of a company and sometimes it did not bring back anything at all.

Search engine technology is still far from mature, and it will be some time before we see a fully mature search engine that can answer most of our queries. We need true semantic search if we want to build a search engine that is helpful to its users. In the next post I will discuss more about how true semantic search can be achieved.

Until Next Time....!!!

Saturday, March 14, 2009

Is AI a possibility

In one of my previous posts I discussed whether the World is Ready for AI. Today I was reading that post after a long time (more than 2 yrs) and asked myself whether we have made enough advances in computing to make AI possible in the near future. We have had faster computers, more RAM and more storage available on our laptops over the last two years, but we are still far from having a computing framework that can support AI.

The basis of computing is a bit that has two states, 0 and 1. What this translates to is that the computer is always in a state of certainty, i.e. it either has something or it does not. Our normal intelligence, on the contrary, works with a few more states. We operate on 3 states.
  1. We know that we know.
  2. We know that we don't know.
  3. We don't know that we don't know.
This third state of ours is what makes us intelligent and gives us the power to reason in a given situation. Human intelligence operates mostly out of this 3rd state of mind. The moment we know about anything that falls in the 3rd state, it moves to either the 1st or the 2nd category.

Our brain operates in a 3-dimensional space, and that is what gives us the flexibility to process similar data differently. Computers, on the contrary, operate in a linear space, and that limits their processing capability. A simple example: for a computer a glass of water is a glass of water no matter how many times we feed this data in, but for humans the first glass of water is a life saver (if we are thirsty) while the same is not true of the 30th glass of water drunk in succession. The 30th glass may become a burden to drink. So the same data is interpreted differently by humans.

What we may need to do is rethink the fundamentals on which our computing is based. The basis of computing is 0 and 1, but we may need to think about a state where the computer can be in a May Be state, i.e. somewhere in transition. Once we have discovered this third state and our machines are based on it, we may be able to feed in consciousness, and that will lead to natural intelligence in computers.

Until Next Time.

Saturday, March 07, 2009

Object Structures and Descriptions

While reasoning about an object we often fall into the trap of thinking only about the attributes and methods of the object (in a typical Object Oriented way). But reasoning about an object goes beyond attributes and methods. While reasoning about an object we need to consider the following:
  1. Objects fall into categories, e.g. my car is a Hatchback, my pet is a Doberman, etc. But we also have instances where an object is part of multiple categories: I am an Employee, a Blogger and a Husband.
  2. Categories can be more general or more specific in nature, e.g. Physicians and Surgeons are types of Doctors, a Father is a Parent, etc.
  3. Generalization is common for categories with simple names, and it is equally natural for those with more complex descriptions. A Contract Employee is an Employee. A family with at least one child is not childless, etc.
  4. Objects have parts, and these parts have a multiplicity of 1 or more. Books have a Title, Humans have 2 arms, Cars have 4 wheels, etc.
  5. The relationship among an object's parts is essential to its being considered a member of the category. A pile of books is not the same as a catalog of books.
These are a few things we need to consider while deriving a framework for knowledge representation. Additional complexity comes in when the same word can be used as a noun or as a pronoun. Whether He is Helium or refers to another person depends on the context in which the word is used.
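
Points 1, 2 and 4 above can be sketched in Python as follows. The `IS_A` table, the `Thing` class and the sample objects are hypothetical, chosen only to mirror the examples in the list; point 5 (relationships among parts) is deliberately left out of this sketch.

```python
from typing import Dict, Iterable, List, Optional, Set

# Category -> its more general categories (point 2).
IS_A: Dict[str, List[str]] = {
    "Surgeon": ["Doctor"],
    "Physician": ["Doctor"],
    "Father": ["Parent"],
    "Contract Employee": ["Employee"],
}


def generalizations(category: str) -> Set[str]:
    """The category itself plus every more general category above it."""
    found = {category}
    for parent in IS_A.get(category, []):
        found |= generalizations(parent)
    return found


class Thing:
    def __init__(self, name: str, categories: Iterable[str],
                 parts: Optional[Dict[str, int]] = None):
        self.name = name
        self.categories = set(categories)  # an object may be in several (point 1)
        self.parts = dict(parts or {})     # part name -> multiplicity (point 4)

    def is_a(self, category: str) -> bool:
        """True if any of the object's categories generalizes to `category`."""
        return any(category in generalizations(c) for c in self.categories)


someone = Thing("someone", ["Contract Employee", "Blogger", "Husband"], {"arm": 2})
car = Thing("my car", ["Hatchback"], {"wheel": 4})

print(someone.is_a("Employee"))
print(someone.is_a("Doctor"))
print(car.parts["wheel"])
```

Note that nothing here is an attribute or a method in the usual Object Oriented sense. The categories and parts are data that a reasoning mechanism can inspect.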

Until Next Time....

Saturday, February 28, 2009

Solving Problems in Semantic Web

It's been quite some time since I first heard about the Semantic Web and what it can do. It has been the original idea of the World Wide Waste (WWW) since day one. But somewhere down the line we seem to have been side-tracked from our original path. The questions everyone is asking are:
  1. In spite of spending so much in terms of manpower and research funding, why is this problem still a PROBLEM?
  2. Why are we drifting away from the actual problem, i.e. the Semantic Web?
  3. How much time will it take before the Semantic Web becomes a reality?
The answer is simple: nobody is trying to solve it. We have vendors whose sole purpose is to push and promote their tools and technologies. We have committees that work on the standards, but somewhere they get side-tracked. Nobody is working towards the big picture where we can see all the pieces of the puzzle fitting together. Instead everybody is working on small pieces and then calling each one a step towards the Semantic Web. That is not the approach that will work in this case. To achieve the Semantic Web we need to:
  1. Identify what is needed to make it work. We are better off moving away from the current format of the Web, as trying to extract meaning out of the Web (today) is like trying to extract water from stones.
  2. Bring together a team of experts who will sit down and get their heads aligned in one common direction.
  3. Build teams that work towards solutions aligned with the top-level goal.
  4. Build tools and technologies that are designed solely to build the Semantic Web.
  5. Last but not least, Start Afresh.
I am sure if we take this approach we will have the Semantic Web as a reality, not just another piece of science fiction.

Until Next Time....

Monday, February 16, 2009

Reasoning Mechanism and Project Halo

After a long silence I am back again to unleash ideas through this blog. A while ago I had a look at the questions of Project Halo, and that got me thinking about how I could represent chemical elements and work out a general mechanism to represent elements, compounds and chemical reactions.

What I found was that, using the classification described in the Tree of Porphyry, it was easy to describe them all, and that also made the whole scenario simple to explain. Basically the scenario has 3 core parts: Chemical Elements (including molecules), Compounds and the Chemical Reactions themselves. For a chemical reaction to proceed we need either elements or compounds or a mixture of both. The result is again a compound, an element etc.

In future posts I will discuss the procedure I am following to tackle elementary level problems in chemistry. I would love to hear from readers who have a problem scenario in mind that they want to discuss, and then we can work out how to solve it using the reasoning mechanism I am working on. The more complex the problem is, the better it will be for me to work out an appropriate way to solve it.
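
As a rough illustration of the three parts, here is a sketch in Python. It is not the reasoning mechanism itself; the `Substance` and `Reaction` classes and the atom-counting check are assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Substance:
    name: str
    atoms: Dict[str, int]  # element symbol -> number of atoms in one unit

    @property
    def kind(self) -> str:
        """One kind of atom means an element (including molecules like H2)."""
        return "element" if len(self.atoms) == 1 else "compound"


def count_atoms(side: List[Tuple[int, Substance]]) -> Counter:
    """Total atoms of each element on one side of a reaction."""
    total = Counter()
    for coefficient, substance in side:
        for symbol, n in substance.atoms.items():
            total[symbol] += coefficient * n
    return total


@dataclass
class Reaction:
    reactants: List[Tuple[int, Substance]]  # (coefficient, substance)
    products: List[Tuple[int, Substance]]

    def is_balanced(self) -> bool:
        """A reaction is balanced when both sides hold the same atoms."""
        return count_atoms(self.reactants) == count_atoms(self.products)


hydrogen = Substance("hydrogen", {"H": 2})
oxygen = Substance("oxygen", {"O": 2})
water = Substance("water", {"H": 2, "O": 1})

# 2 H2 + O2 -> 2 H2O
burning = Reaction([(2, hydrogen), (1, oxygen)], [(2, water)])
print(hydrogen.kind, water.kind)
print(burning.is_balanced())
```

Elements and compounds share one representation here, and the reaction only refers to them, which keeps the three parts separate in the way described above.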

Until Next Time....