Today's New York Times has an article entitled "Guessing the Online Customer’s Next Want," the basic
point of which is that giving customers good recommendations is hard.
The article is a nice general-audience discussion of how Amazon,
Netflix, et al. do their recommendations, what some people are doing to
improve them, and why real improvement is hard to achieve.
The basic technique that everyone uses is collaborative filtering, and it was
invented in the early '90s by my friend and business partner Jeremy Bornstein (he holds the original patent),
among others. Collaborative filtering essentially takes a pile of data that
a person has generated, compares it to piles of data other people have
generated, and looks for similarities and differences. When a person has
a highly similar data pile to another person, or better yet, to a cluster of
other people who have similar data piles, one can infer that the areas of
dissimilarity are potential grounds for becoming more similar - i.e. all these
people who seem to share your movie/music/book/pet/whatever preferences have
bought this thing, but you don't have it yet: maybe you want it too. It works
pretty well, with pretty well being a relative thing. Boosting
sales by even a few percentage points is well worth it for most internet
retailers.
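The comparison of "data piles" described above can be sketched in a few lines. This is a minimal, illustrative toy, not any particular company's engine: the users, items, and cosine-similarity-over-purchase-sets approach are my own assumptions, chosen as the simplest version of user-based collaborative filtering.

```python
from math import sqrt

# Toy purchase data (hypothetical): user -> set of items they've bought.
purchases = {
    "alice": {"movie_a", "movie_b", "book_x"},
    "bob":   {"movie_a", "movie_b", "book_y"},
    "carol": {"movie_c"},
}

def similarity(a, b):
    """Cosine similarity between two users' purchase sets."""
    overlap = len(purchases[a] & purchases[b])
    if overlap == 0:
        return 0.0
    return overlap / sqrt(len(purchases[a]) * len(purchases[b]))

def recommend(user):
    """Score items the user lacks, weighted by how similar each owner is."""
    scores = {}
    for other in purchases:
        if other == user:
            continue
        sim = similarity(user, other)
        if sim == 0.0:
            continue  # strangers with no overlap contribute nothing
        for item in purchases[other] - purchases[user]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest-scoring missing items first
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # → ['book_y']: bob is alice's nearest neighbor
```

Real systems differ mainly in scale and in the weighting tricks layered on top of this core idea, which is exactly the point about incremental rather than fundamental improvement.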
Despite collaborative filtering's being pretty good, there's lots and lots of
room for improvement. And there has been since the early '90s. The
basic problem is that, while the technique is fundamentally sound, people have
been using essentially the same one for 15+ years. Every year there are a few
startups that have a better recommendation engine, and the major in-house ones
get better and better, but these improvements are only little increments.
This is mainly because they come from using different data sets, more and
bigger data sets, and tweaking well-known algorithms, rather than doing
anything fundamentally new.
The Times article didn’t discuss a few things that are happening now that will make
recommendations a whole lot better soon. While collaborative filtering won’t go away, it will be used in
conjunction with other techniques and the quality of customer recommendations
will get way better.
- Social network data portability will soon enable very robust analysis of who someone’s explicit “friends” are, and one would expect that friends are more likely predictors of purchase intent than strangers who are similar in certain dimensions.
- Link analysis techniques being applied in consumer marketing applications will make it a lot easier to know who a person’s real friends are, and to implicitly identify them even if they’re not in the expressed social graph.
- Better metadata from microformat-using product and content pages will provide more, and more robust grist for the mill.
- Consumer interest profiles are getting better, fast: deep-packet-inspection and clickstream-based profiles generated by the likes of Phorm, Adzilla, NebuAd, and Loomia are now adding to the hopper, and next-generation services that do information extraction on this sort of data will create much more meaningful, actionable metadata.