Thursday, May 22, 2008

Reasoning and Inferencing

Whenever I read a journal or an article on Artificial Intelligence, it sounds like science fiction to me. Okay okay, I may not be up to date on what is happening in the AI field. But more often than not I find that most of the scenarios described in those fictions (?) are related to inference.

I read somewhere long back that inference is the act of reaching a conclusion based on certain facts already present in the system. What are facts? In my opinion they have to do with the statements presented before us. But do computers understand statements as we do? I guess not. So in relation to a computer system, the facts are the objects and their attributes. We have a few objects and their state information (as attributes), and we need to deduce a conclusion from that. How do we do that?

In order to combine these facts the system needs a certain ability: a set of rules which will let us combine these facts together and infer something. This ability could not be anything but Reasoning, and by Reasoning we mean semantic relationships here. As we discussed in earlier posts, we need proper annotation in order to do Semantic Search and establish semantic relationships among the entities in the system.
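To make this concrete, here is a minimal sketch of that idea: facts stored as (object, attribute, value) triples, and a single reasoning rule applied repeatedly until no new facts can be inferred. The fact names and the transitivity rule are just illustrative assumptions, not anything from a real inference engine.

```python
# Facts as (object, attribute, value) triples -- names are invented examples.
facts = {
    ("socrates", "is_a", "man"),
    ("man", "is_a", "mortal"),
}

def infer(facts):
    """Forward chaining: apply one rule (transitivity of 'is_a')
    until no new facts appear, then return the closed fact set."""
    facts = set(facts)
    while True:
        new = {
            (a, "is_a", c)
            for (a, r1, b) in facts if r1 == "is_a"
            for (b2, r2, c) in facts if r2 == "is_a" and b2 == b
        } - facts
        if not new:
            return facts
        facts |= new

# The classic conclusion is not stored anywhere; it is inferred.
print(("socrates", "is_a", "mortal") in infer(facts))  # True
```

The "reasoning" here is exactly the rule inside `infer`; everything else is just bookkeeping over the facts.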

I guess I am getting more and more philosophical on this topic. I remember in our childhood we used to have a phrase: More study, more confusion; less study, less confusion; no study, NO confusion :).

I would love to hear from readers about their opinion on relationship between Inference and Reasoning.

Until Next Time... :)

Wednesday, May 14, 2008

Can we beat Google for Web Search?

Sounds like the most difficult question we have faced so far? In today's world we rely on Google so much that we cannot imagine a day at work without using Google search, let alone imagine that there could be a better search engine than Google. Google has created so much hype and made us so dependent on it that we benchmark every new search engine against it, including those that existed before Google, like Yahoo, AltaVista, etc.

But while Google becomes more powerful with each new application it releases and each upgrade it makes to its search engine, there are still a few important things missing from the search engine that happens to be the core of Google.

Most of us who are interested in search engines and how they work have read the paper published by Page and Brin on the original Google search engine architecture, along with the initial version of the PageRank algorithm. Over the last few years they are believed to have changed the original PageRank algorithm. Still, there are a few problems with this search engine:
  1. The ranking is based only on words and documents, not on the meaning behind them.
  2. Google search is based on the current web, whereas the web is growing and evolving with every passing minute. The paradigm of the World Wide Web used to be Persistent Publish and Read, which holds good to an extent, but the web we are looking at today is evolving. We are no longer in the era of one publisher and many readers; today we have more content producers than readers on the web.
  3. The page ranking algorithm uses an index table, and the crawler (software) traverses the links available on a page to navigate to the next page, and so on. The philosophy that Google and many other search engines have adopted is to represent pages as a set of nodes (or documents) connected to each other by static links (HREF); they see the web as some sort of tree structure. But the web is not exactly like that. There are pages with no links at all, neither incoming nor outgoing, and such pages are left behind by Google search. An example is my poetry page, which is very much hidden from Google search even though it has been on the web for almost three years now. The Google crawler managed to reach the main page of my homepage but could not get to the poetry page, as there is no link to it from the main page.
  4. We do not maintain any relevance-based registry for web pages outside of the pages themselves. Google uses the keywords found in a page while indexing it, but there is a chance that a page which is relevant does not contain the keyword at all.
  5. Though Google is planning to use Latent Semantic Indexing in its next upgrade to page ranking, the accuracy of the results is still doubtful.
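Point 3 above is easy to see with a toy version of PageRank over the kind of link graph the paper describes: pages as nodes, HREF links as edges. This is a simplified sketch (the page names and graph are invented), but it shows how a page with no incoming or outgoing links, like the poetry page, ends up with only the baseline "teleport" score.

```python
def pagerank(links, pages, damping=0.85, iters=50):
    """Toy power-iteration PageRank. links maps page -> list of outgoing links."""
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # every page starts each round with the random-jump (teleport) share
        new = {p: (1 - damping) / n for p in pages}
        for src, outs in links.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # dangling page: its score is spread evenly over all pages
                for p in pages:
                    new[p] += damping * rank[src] / n
        rank = new
    return rank

pages = ["home", "blog", "poetry"]   # "poetry" has no links in or out
links = {"home": ["blog"], "blog": ["home"], "poetry": []}
r = pagerank(links, pages)
print(r["poetry"] < r["home"])  # True: the unlinked page ranks lowest
```

Worse than a low rank, a real crawler following links from "home" would never even discover "poetry"; the graph model simply has no path to it.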
This was about the problems, but then what is required to beat the Google search engine? As discussed in my previous post on a similar topic, I stressed the need for a Semantic Search for the web. Semantic search is missing from the Google search engine, and unless it is made available, Google search (and for that matter every other search engine) will keep giving us irrelevant results (in abundance) when we query it.

Until Next Time... :)

Friday, May 09, 2008

Why Semantic Search?

In my previous post I discussed search engines in general and how they build the index table which is the core of any search engine. One thing that became very clear from that study is that the search engines available today are very limited in functionality. Keyword search does not leave much room for returning relevant results. In Google, if we enter Paris Hilton as the search keywords we also get the Hilton Hotel in Paris returned as a result, that too on the first page. The search engine is not able to distinguish that we are not looking for a Hilton in Paris but for the celebrity Paris Hilton. On the other hand, if we enter Hilton Paris we also get Paris Hilton in our search results. One way or the other, the results we get are not relevant to what we are looking for.
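The root of the problem is visible in a few lines of bag-of-words matching (the two document texts below are invented stand-ins): both pages contain both keywords, so a pure keyword engine has no way to prefer one over the other.

```python
# Two toy documents -- one about the celebrity, one about the hotel.
docs = {
    "celebrity_page": "paris hilton attends the party",
    "hotel_page": "the hilton hotel in paris offers rooms",
}

def keyword_match(query, docs):
    """Return every document containing all of the query keywords."""
    q = set(query.lower().split())
    return [name for name, text in docs.items() if q <= set(text.split())]

# Both pages match "Paris Hilton" equally -- keywords alone cannot
# express that we mean the celebrity, not a hotel in Paris.
print(keyword_match("Paris Hilton", docs))
```

Word order does not help either: splitting "Hilton Paris" yields the same keyword set, so the same two pages match.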

Last night I was reading about Latent Semantic Indexing (LSI), and that did give some hope. I found a page at SEOBook explaining LSI in a much simpler way. There are other references as well, but other than Wikipedia this is the one page that explains it in layman's terms.

But the million dollar question we are faced with is whether LSI is going to take away the pain of going through irrelevant search results when we query the search engine. In my opinion that is still not very clear, as the LSI algorithm is still based on the keywords found in the document. That pain will not go away unless we use semantic search. But then why semantic search?
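For the curious, here is the core of LSI in miniature: factor a term-document matrix with SVD and keep only the top-k singular values, so documents sharing related (not just identical) keywords land near each other in "concept space". The tiny vocabulary and documents are invented for illustration.

```python
import numpy as np

terms = ["paris", "hilton", "hotel", "celebrity"]
#             d0: gossip page  d1: hotel page  d2: celebrity page
A = np.array([[1, 0, 1],   # paris
              [1, 1, 1],   # hilton
              [0, 1, 0],   # hotel
              [1, 0, 1]])  # celebrity

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                   # keep the 2 strongest "concepts"
docs = (np.diag(s[:k]) @ Vt[:k]).T      # documents projected into concept space

def cos(a, b):
    """Cosine similarity between two concept-space vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# d0 and d2 share the celebrity concept; d1 is the hotel page.
print(cos(docs[0], docs[2]) > cos(docs[0], docs[1]))  # True
```

Note that every number in `A` still comes from counting keywords in documents; the SVD only regroups keyword co-occurrence, which is exactly why LSI alone cannot know what the user *means*.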

Semantic search, as most of us know, is based on the meanings conveyed by objects. The term meaning has more depth than appears from the surface. Semantic search is not new; it has been around for centuries. In ancient times philosophers gave the world the mantra for how to perform semantic search; it is just that only a handful of people (technologists) today take the pains to read through that literature. Where current search engines fail today is in restricting the results to what the user wants: we are allowed to input only a bunch of keywords.

In the case of Semantic Search the driving factor is context, as different terms (or concepts, as John F. Sowa describes them) have different meanings or interpretations depending on where they are used. If we build a search engine around these philosophies then we can definitely achieve semantic search (to a great extent).
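The context idea can be sketched very simply: give each sense of an ambiguous term its own bag of context words, and pick the sense whose context overlaps the query's context the most. All sense names and context words here are invented for illustration; real word-sense disambiguation is far richer than this.

```python
# Each sense of the ambiguous term carries a set of context words.
senses = {
    "paris_hilton_celebrity": {"celebrity", "heiress", "tv", "socialite"},
    "hilton_hotel_paris":     {"hotel", "booking", "rooms", "travel"},
}

def disambiguate(query_context, senses):
    """Pick the sense whose context words overlap the query context the most."""
    return max(senses, key=lambda s: len(senses[s] & query_context))

print(disambiguate({"tv", "celebrity"}, senses))    # the celebrity sense
print(disambiguate({"booking", "travel"}, senses))  # the hotel sense
```

The same keywords resolve to different concepts purely because of the surrounding context, which is precisely what a keyword-only engine throws away.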

Until Next Time... :)