Wolfram Alpha is built on a curated, proprietary database created over several years by 100+ Wolfram employees. I don’t think the details of this database have been released, but it’s clear that it wasn’t created by the web-spidering, parsing, and organizing algorithms that search engines typically use. That’s too bad, because when I first read about Alpha, I was hoping that I’d be able to apply it to my own content. Standard search engines are really bad at desktop and enterprise search, and I wanted to try something radically different.
The genius of Google’s PageRank for searching the web is that:
- It derives relevancy from normal web behaviors
- It’s publicly known, so we have a lightweight way to make our content findable
- There is a high reward for following the rules and penalties for abusing them
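To make the first point concrete, here is a minimal sketch of the published PageRank idea (power iteration over a link graph). It is a textbook simplification, not Google’s production ranking; the function name `pagerank`, the damping value, and the two tiny example graphs are my own illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical web site: pages link to each other in the normal course of writing.
web = {"home": ["docs", "blog"], "docs": ["home"], "blog": ["home", "docs"]}
# Hypothetical desktop folder: files that never link to one another.
desktop = {"budget.xls": [], "notes.txt": [], "resume.doc": []}

web_rank = pagerank(web)
desktop_rank = pagerank(desktop)
```

In the `web` graph the scores separate (`home` ends up highest because the other pages link to it), while in the `desktop` graph every file gets the same score, since there are no links to derive relevancy from.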
But for my desktop content, PageRank doesn’t work so well: there’s hardly any linking, and it’s not clear to me how much desktop search takes advantage of relevancy cues (document names and headings, email subjects, etc.). It may, but it’s opaque to me. There is no shortage of SEO advice for web pages, but creating findable documents isn’t nearly as common a topic. The usual advice is to add metadata, but I want to just behave normally and have the engine derive the metadata.
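As an illustration of what using those relevancy cues could look like, here is a rough sketch that scores a document higher when a query term shows up in its title, heading, or subject than when it only shows up in the body. I don’t know how any real desktop search engine weighs these fields; the field names and the weights in `FIELD_WEIGHTS` are assumptions made up for the example.

```python
import re

# Hypothetical weights: illustrative numbers, not taken from any real engine.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "subject": 3.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(document, query):
    """Sum the weighted number of query-term matches across the document's fields."""
    terms = set(tokenize(query))
    total = 0.0
    for field, text in document.items():
        weight = FIELD_WEIGHTS.get(field, 0.0)  # unknown fields are ignored
        matches = sum(1 for token in tokenize(text) if token in terms)
        total += weight * matches
    return total


# Two made-up documents that both mention the query terms.
docs = [
    {"title": "Meeting notes", "body": "We briefly mentioned the budget for Q3."},
    {"title": "Q3 budget review", "body": "Notes from the meeting with finance."},
]

ranked = sorted(docs, key=lambda doc: score(doc, "Q3 budget"), reverse=True)
```

With these weights, the document titled "Q3 budget review" ranks first for the query "Q3 budget" even though both documents contain both terms. All of the metadata comes from things I would write anyway (a file name, a heading), not from extra tagging.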
The same goes for enterprise content, but at least with my own documents, I have a chance of remembering something that will help me find them. With other people’s documents, it’s hard without a good taxonomy and diligent tagging.
Wolfram Alpha offers another way, potentially, but only if it can build the database automatically, or with help from content filters (not humans). Since it builds up a kind of understanding of the content and my query, it’s not dependent on keyword matching. For example, here are some queries that I think Alpha could handle if it had a model of my enterprise documents:
- “Is Pat on vacation next week”
- “Who was at the last budget meeting”
- “list of recruiters I have contacted this year”
- “average response time to forum questions”
- “how many blog posts did we write last quarter”
And that’s not even close to what I could do with my already structured data (like sales figures or budget data). These kinds of things are just not possible with current search technology and are usually solved by knowing where to find the information, manually collating it, or (mostly) by not bothering.
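To show why a model of the content matters more than keyword matching for the example queries above, here is a small sketch of two of them answered against structured facts. This says nothing about how Alpha works internally; the `Meeting` and `Absence` records, the names, and the dates are all hypothetical, and the hard part (deriving such records automatically from documents, calendars, and email) is exactly what is assumed away.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Meeting:
    topic: str
    day: date
    attendees: tuple


@dataclass(frozen=True)
class Absence:
    person: str
    start: date
    end: date  # inclusive


# Hypothetical facts an engine might have derived from calendars and email.
MEETINGS = [
    Meeting("budget", date(2009, 4, 6), ("Pat", "Lee")),
    Meeting("budget", date(2009, 5, 4), ("Pat", "Sam", "Kim")),
    Meeting("hiring", date(2009, 5, 11), ("Lee", "Kim")),
]
ABSENCES = [Absence("Pat", date(2009, 5, 25), date(2009, 5, 29))]


def last_meeting_attendees(topic):
    """Answer 'Who was at the last <topic> meeting'."""
    matching = [m for m in MEETINGS if m.topic == topic]
    if not matching:
        return ()
    return max(matching, key=lambda m: m.day).attendees


def on_vacation_next_week(person, today):
    """Answer 'Is <person> on vacation next week' (Monday through Sunday)."""
    next_monday = today + timedelta(days=7 - today.weekday())
    next_sunday = next_monday + timedelta(days=6)
    # An absence counts if it overlaps any part of next week.
    return any(
        a.person == person and a.start <= next_sunday and a.end >= next_monday
        for a in ABSENCES
    )


budget_attendees = last_meeting_attendees("budget")
pat_away = on_vacation_next_week("Pat", date(2009, 5, 20))
```

Neither answer could come from keyword search: no document has to contain the words "last budget meeting" or "next week" for these functions to work, because the dates and relationships are computed from the model.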
It looks like Wolfram Alpha is being used in part to drive Mathematica sales: access to data from the platform will be more powerful than just through the Alpha site. I don’t imagine that their back-end is ready for deployment on servers other than theirs, but I’m hoping that they’re considering it and thinking of standard APIs for content management systems to provide data to it.