Check out the videos:
Even though I'm ragging on all the iPhone / Android location-based games so far, I think they're harbingers of good stuff to come, and I'm glad to see them coming out.
Thanks, TechCrunch.
Posted at 04:06 PM in Games, MMO, Social Media, Web/Tech
Great post over on Jeff Jonas' blog. His point is basically that search is increasingly going to be about context, and that the next competitive frontier will be in capturing context to make search smarter and more effective (a gross oversimplification on my part). Check it out here.
Posted at 05:18 PM in Data Mining, KDD, Semantic Web, Web/Tech
Jeremy Liew has a worth-reading post over on Lightspeed's blog, with a rather long-winded reply from me.
Posted at 11:37 AM in Advertising, Data Mining, Marketing, Targeting, Web Marketing, Web/Tech, Weblogs
There's a post on TechCrunch today about Peer39. Worth reading.
Peer39 is a semantic-analysis-enabled ad network. The company's approach to information retrieval, their market focus, and their understanding of the limits of their tech are what make this company the first viable semantic web company. While the company does all the usual "natural language processing" heuristic stuff which has come to be synonymous with Web 3.0 / "The Semantic Web," they also do what appears to be collaborative filtering and machine learning. In other words, they are at least partly making up for the shortcomings of heuristic approaches to information extraction with statistical analysis.
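To make that concrete, here's a minimal sketch of what backstopping heuristics with statistics might look like. This is my speculation for illustration, not Peer39's actual system; the keyword rules, training pages, and categories below are invented, and scikit-learn is assumed.

```python
# A minimal sketch of backstopping a heuristic tagger with statistical
# classification -- speculation for illustration, not Peer39's actual system.
# The keyword rules, training pages, and categories are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

RULES = {"mortgage": "finance", "touchdown": "sports"}  # brittle by design

def heuristic_tag(page):
    # The "natural language processing" heuristic layer: keyword rules that
    # only cover what their authors anticipated.
    for keyword, category in RULES.items():
        if keyword in page.lower():
            return category
    return None

# The statistical backstop: a classifier trained on pages of known category.
train_pages = ["stocks bonds interest rates", "quarterback passes field goals"]
train_labels = ["finance", "sports"]
vectorizer = CountVectorizer()
classifier = MultinomialNB().fit(vectorizer.fit_transform(train_pages), train_labels)

def categorize(page):
    # Use the heuristic when it fires; fall back to statistics when it doesn't.
    return heuristic_tag(page) or classifier.predict(vectorizer.transform([page]))[0]

print(categorize("interest rates rose again"))  # rules miss it -> "finance"
```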
As I've remarked before, it is impossible, given the current (and reasonably foreseeable) state of computer science, for information extraction (IE) to work well enough to bring about the vision of the Semantic Web in the oft-cited travel agent example. You can do NLP query parsing, define microformats, come up with better and better ontologies, and so forth all you like, and you will never solve the problem of incompletely, inconsistently, and poorly tagged source data. Machines are too stupid and people are too lazy for all that data to ever get tagged right. These things will not change in our lifetimes.
What makes Peer39 a sensible company is that they understand this, and their goal is not to create a domain-agnostic, highly accurate, robust information extraction service that enables the Semantic Web. They just want to analyze content somewhat less inaccurately in order to serve ads that will get a somewhat better clickthrough rate. Improving CTRs is highly measurable and gets you paid; online ad serving is one area where having a better mousetrap really will get the world beating a path to your door.
My guess as to why this company is doing it right is that the founders and key technical leaders come out of online advertising and intelligence services. The ad people know where the pain points are and what level of "better" is enough to get market traction; the ex-spies know the limits of semantic tech and information extraction, because intelligence services have been using that tech in production longer than anyone. Those guys know what level of "better" is truly achievable, and how. This team contrasts with most semantic web startups, which are long on "visionaries" and researchers and short on people who have had to use this tech with money (or lives, or national security) on the line.
This will be an interesting company to watch.
Posted at 01:25 PM in Advertising, Data Mining, KDD, Marketing, Semantic Web, Web Marketing, Web/Tech
TechCrunch notes the private beta launch of Evri, a service that sounds like it creates a semantically enhanced version of a web index, and helps users find topically related information. Sounds cool.
Speaking intuitively, topically related information navigation is a great way to find stuff. However, making it work and actually be useful is extremely hard. Making it work means you need good metadata that describes those concepts. Good metadata means either humans have to enter it rigorously, comprehensively, and consistently, or machines have to interpret unstructured text highly reliably. Neither of these things has ever happened in the history of the world, except in small datasets in very narrow knowledge domains. Doing this at web scale has been a holy grail of IR.
Evri's screenshots look great: pretty, with an intuitively obvious navigation scheme. There are companies that already do a pretty good job at guided navigation (e.g. Endeca). For the most part, they wisely concentrate on doing topical browsing over well-structured data (e.g. shopping sites, intelligence datasets). Still, Evri's UI looks like a step forward. However, for Evri to live up to its promise, it must do a whole lot more than put a pretty front end on the current state-of-the-art (i.e. so crappy it isn't worthwhile) semantically enhanced web index. Evri really needs a general solution for information extraction at web scale.
When I read the company's blog and see phrases like "natural language-derived grammatical data" and "it's all about the UI," I start to think that maybe the company is falling down the natural language / semantic web rathole. It's not about the UI. Building the UI is trivial compared to the problem of creating / acquiring / wrangling the metadata. "Natural language" and "grammatical" are codewords for heuristics, and given the current state of computing and human knowledge, heuristics cannot produce generally useful results at web scale, relatively speaking.
That "relatively speaking" qualifier is an important one: relative to statistical analysis and full-text indexing. Properly done, that approach actually does produce "related concepts" search results. Related concepts will tend to cluster when one performs data reduction on the index entries' vectors. If one chooses well (and the choice can reasonably be left to machines in web-scale search engines, particularly learning algorithms), collapsing vector dimensions does produce meaningful clusters of related concepts. When you enhance that data reduction with algorithms like PageRank, you actually get pretty good related-document retrieval. So, a long-winded way of saying it, but Google already does what Evri does, just without making you use a tree-walking UI. Of course, if you really pine for the days of walking a tree in an RDBMS, you can.
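For the curious, here's a minimal sketch of that statistical approach: TF-IDF term vectors, dimensions collapsed with truncated SVD (latent semantic analysis), and cosine similarity standing in for relatedness. The documents are made up, scikit-learn is assumed, and a real web-scale engine would of course add link analysis a la PageRank.

```python
# A minimal sketch of related-concept retrieval via data reduction: full-text
# index entries become TF-IDF term vectors, truncated SVD collapses the vector
# dimensions (latent semantic analysis), and related documents cluster in the
# reduced space. The documents are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "cheap flights to paris and hotel deals",
    "paris travel guide museums and restaurants",
    "machine learning methods for ad targeting",
    "targeting ads with machine learning models",
]

tfidf = TfidfVectorizer().fit_transform(docs)  # the index entries' vectors
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Nearness in the reduced space stands in for "related concepts."
sims = cosine_similarity(reduced)
for i, doc in enumerate(docs):
    nearest = max((j for j in range(len(docs)) if j != i), key=lambda j: sims[i][j])
    print(f"{doc!r} -> most related: {docs[nearest]!r}")
```

Note that nothing in that sketch knows any grammar; the related pairs fall out of the geometry of the reduced vector space.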
Anyway, I signed up for the beta, and hope to have my skepticism proven unfounded.
Update: I've been playing with the beta a little. It's nice. It's too limited to tell what's going on under the hood, but it's still early days so that can be overlooked. Using Evri is a different experience from using search, even search with tree-walking. People will need to get used to it, and the company will have to get very good at anticipating how users will want to navigate, but if they can, they might have a pretty cool service.
Posted at 11:07 AM in KDD, Semantic Web, Web/Tech
Today's New York Times has an article entitled "Guessing the Online Customer's Next Want," the basic point of which is that giving customers good recommendations is hard. The article is a nice general-audience discussion of how Amazon, Netflix, et al. do their recommendations, what some people are doing to improve them, and why it's hard to get real improvement.
The basic technique that everyone uses is collaborative filtering, which was invented in the early '90s by, among others, my friend and business partner Jeremy Bornstein (he holds the original patent). Collaborative filtering essentially takes a pile of data that a person has generated, compares it to piles of data other people have generated, and looks for similarities and differences. When a person has a data pile highly similar to another person's, or better yet, to a cluster of other people who have similar data piles, one can infer that the areas of dissimilarity are potential grounds for becoming more similar. In other words: all these people who seem to share your movie/music/book/pet/whatever preferences have bought this thing, but you don't have it yet, so maybe you want it too. It works pretty well, with "pretty well" being a relative thing. Boosting sales by even a few percentage points is well worth it for most internet retailers.
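As a deliberately tiny sketch of that idea: compare one person's data pile to everyone else's, keep the most similar people, and recommend whatever they have that the target person lacks. The users, purchases, and choice of Jaccard similarity below are all made up for illustration.

```python
# A minimal sketch of user-based collaborative filtering. Users, purchases,
# and the similarity measure are invented; real systems work the same way
# at vastly larger scale.
from collections import Counter

purchases = {
    "alice": {"dune", "neuromancer", "snow crash"},
    "bob":   {"dune", "neuromancer", "foundation"},
    "carol": {"dune", "snow crash", "hyperion"},
    "dave":  {"cookbook", "gardening"},
}

def jaccard(a, b):
    """Similarity of two data piles: overlap relative to combined size."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(user, k=2):
    mine = purchases[user]
    # Rank the other users by how similar their piles are to mine.
    neighbors = sorted(
        (u for u in purchases if u != user),
        key=lambda u: jaccard(mine, purchases[u]),
        reverse=True,
    )[:k]
    # The areas of dissimilarity with my nearest neighbors are the candidates.
    votes = Counter(item for u in neighbors for item in purchases[u] - mine)
    return [item for item, _ in votes.most_common()]

print(recommend("alice"))  # e.g. ['foundation', 'hyperion']
```

Real systems use ratings matrices and fancier similarity measures, but the inference is the same: your neighbors' piles minus your pile equals the recommendations.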
Despite collaborative filtering's being pretty good, there's lots and lots of room for improvement, and there has been since the early '90s. The basic problem is that, while the technique is fundamentally sound, people have been using the same one for 15+ years. Every year a few startups have a better recommendation engine, and the major in-house ones get better and better, but these improvements are only little increments. This is mainly because they come from using different data sets, more and bigger data sets, and tweaking well-known algorithms, rather than from doing anything fundamentally new.
The Times article didn't discuss a few things that are happening now that will make recommendations a whole lot better soon. While collaborative filtering won't go away, it will be used in conjunction with other techniques, and the quality of customer recommendations will get way better.
Posted at 03:39 PM in Data Mining, KDD, Social Media, Web/Tech