Saturday, December 27. 2008
Search Twitter - but on whose authority?
Fascinating debate raging on Techmeme about whether Twitter should be searchable by number of "follows" (as a proxy for authority). At first glance it's a "what's the issue here?" question. After all, Google pretty much runs its page ranking off links, and Technorati uses links to a blog over the last 6 months to rank its authority.
So why all the fuss, and why are so many coming out against it? Firstly, it's worth looking at why it's being proposed. Loic Le Meur organised a major conference (Le Web), the reaction to which was, well, controversial, and he clearly wanted to search Twitter to find the most-followed individuals and try to neutralise them where the news was bad. Problem was, sifting through 6,000-odd comments is non-trivial (though I must say it can't be that hard to build a routine that would map the identity of commentators in a particular stream by number of followers*).

He gives away the Web 2.0 game a bit when he says:

“We’re not equal on Twitter, as we’re not equal on blogs and on the web”

Ooops - the Kool Aid packet preaches that Web 2.0 media is Democratic, not Feudal. Shame on you for giving the game away, Loic.

The key arguments against this sort of search are fundamentally that:

(i) People who have little knowledge on a topic, but high overall authority, will win out over those with real knowledge (an issue we pointed to in this article about the risks of User Generated Content drowning in its own noise), to the eventual detriment of the overall ecosystem.

Realistically, since Twitter is increasingly being seen as email 2.0, PR/Marketing people will write this sort of routine, so its arrival is inevitable. And judging by the jostling that goes on every time a new status list on Twitter is shown, the number of players will be huge, despite pious declarations to the contrary.

However, this should not be the only way to search people on any system. The endgame will be the ability to search across a wider number of user-volunteered parameters (location, people you follow, as examples), not just followers.

* And indeed, 24 hours later, there was one.....

Friday, November 21. 2008
What comes after Social Media Recommendation?
As well as being a comms system and directory rolled into one, Social Media has had another role - low cost search and selection of relevant content. However, as we and others noticed when comparing Last.fm with Pandora, a social network system hits the point at which it can improve no further sooner than the algorithmic approach does. Well, that was 2006 and that was audio; this is 2008, and the next thing that will need to go through the selection mill is video.
I've always been fascinated with video selection, as it is a far more complex task than text and audio selection, but due to the sheer size of the video industry, solving it is a huge prize. In 2005, state of the art was a 5 movie trick, but the game was upped rapidly. And thus I've been reading (with fascination) this article in the NYT about the efforts Netflix has been making, initially to find selections from its own algorithms, and later trying to crowdsource the approach - there is a $1million prize for building a system that is a 10% better predictor of the movies you will like than the Netflix in-house system, Cinematch.

The state of the art right now is based around singular value decomposition approaches:

Singular value decomposition works by uncovering “factors” that Netflix customers like or don’t like. Say, for example, that “Sleepless in Seattle” has been rated by 200,000 Netflix users. In one sense, this is just a huge list of numbers - user No. 452 gave it two stars; No. 985 gave it five stars; and so on. But you could also think of those ratings as individual reactions to various aspects of the movie. “Sleepless in Seattle” is a “chick flick,” a comedy, a star vehicle for Tom Hanks; each customer is reacting to how much - or how little - he or she likes “chick flicks,” comedies and Tom Hanks. Singular value decomposition takes the mass of Netflix data - 17,770 movies, ratings by 480,189 users - and automatically sorts the films. The programmers do not actively tell the computer what to look for; they just run the algorithm until it groups together movies that share qualities with predictive value.

As you would expect, the hard thing to do is to resolve extreme outliers:

There’s some X factor in human judgment that the current bunch of algorithms isn’t capturing when it comes to movies like “Napoleon Dynamite.” [A movie that people either love or hate, and unpredictably so]

And the problem looms large.
Bertoni is currently at 8.8 percent; he says that a small group of mainly independent movies represents more than half of the remaining errors in the way of winning the prize. Most teams suspect that continuing to tweak existing algorithms won’t be enough to get to 10 percent. They need another breakthrough - some way to digitally replicate the love/hate dynamic that governs hard-to-pigeonhole indie films.

This is the first time I'd heard of this, (and the $1m prize

Thursday, October 2. 2008
Google builds Techmeme - why?
Google Blog Search has changed - it used to be a less-than-OK version of Technorati, and it's now trying to copy Techmeme / Memeorandum. Looking at the Tech section vs Techmeme, three observations (here's a copy of the output):
Google's New Blog Search
TechCrunch points out the nub of it:

at this point it’s too early to tell if Google Blogsearch will be more useful than any of the other memetrackers (or if it's even in the same league). Much of its utility will lie in how often the listings are updated, how many sources it pays attention to, and how it assesses a blog’s credibility - a memetracker is only as good as the stories it presents.

- and as a number of people noted in the comments (and I found vs Technorati), Google seems to pick up the blog posts far later. So far, not so good then.

But the bigger issue for me is why Google is copying all these things (Wikipedia with Knol, Firefox with Chrome, MS / Open Office, etc - and here, initially Technorati and now Techmeme with Google Blogs)? Clearly Google is looking at every service that has high traffic and thinking "we'll have some of that", as the benefit to them is that they sell Ads on their own pages. (There are no Ads on Blog Search yet, but no doubt they will come.) Not only that, it captures extra user data for a spot of mining.

The issue is that this potentially distorts competition as badly as anything Microsoft or IBM ever did in their heydays - the fear that Google will build an X the minute anyone succeeds at X, and subsidise it via its other operations so the original player can't succeed, must be an issue. I wonder which Googlestraw will be the last one before the anti-trust camel's back breaks?

Afterpost - Fred Wilson, I think, articulates a key point, in that aggregation is not good enough; you need filtering and insight:

But it's like a lot of Google's services. All algorithm and no "voice". It may attract a mainstream audience the way Google News has and that's fine. But for me, it's not close to the value that I get from aggregators with an angle. It's like a mainstream newspaper versus a blog. On one you get the news and on the other you get insight.

I also saw this related post, noting that the algorithms are already being spammed by pay-for-post blogging.
Tuesday, September 16. 2008
What if Google gets date data wrong?
Last week we noted that Google's role in the United Airlines debacle was not as blame-free as they tried to imply.
(To remind you - a newspaper story from 2002 was picked up by Google as 2008, because (allegedly) it wasn't properly dated and thus (allegedly) Google redated it. The story was about United going into Ch 11 in 2002. A Bloomberg analyst picked it up, put it on the wires as 2008, and United shares plummeted until trading was stopped.) Anyway, Greywolf SEO has been doggedly tracking down the Googleclaim that it was a one-off, and has found other examples.
Well, just ask United. The reason Google is trying to fight this is that it would otherwise have to check the accuracy of its millions of pages, which would put huge costs onto its operation. But if it's not Google's problem and fault, whose precisely is it?

Thursday, September 11. 2008
Predicting Search Value - Pareto 2.0

The Mayer 90 / 10 Rule (Pareto 2.0)

Speaking of prediction markets, I've become quite intrigued with following Google's Marissa Mayer as one such. Earlier this year she shifted Google off the "Don't be Evil" petard it was hoist on - and shortly thereafter Google started up with a whole slew of controversial, privacy-invasive services. More interestingly, this week she noted that search was "90% - 95% done". When pressed to clarify, she noted that*:
I've graphed the implications of this above - i.e. 90% of search is "done", but it took 10% of the effort. The real sweat is in the last 10% - in other words the Return On Investment is appalling, unless you believe that a lot more than 90% of the value is still there to be captured. Assuming that Ms Mayer can be used as a Prediction Market, the implication is that Google is moving away from Search as its primary business (Content Ownership, Ads and Clouds seem to be the main areas).

* I saw this on TechCrunch, the original is over here

Monday, September 8. 2008
Google is Microsoft 2.0
Sometimes you just have to hate people - this piece by Nick Carr gets to the ideal helicopter altitude and encapsulates the Google strategy very succinctly, and without even having to resort to comics. Because Google owns one of the main "choke points" on the Internet (Search), and has learned to monetise ads on what flows through its choke point aka toll booth, its optimal strategy is to remove all other choke points to maximise flow:
For Google, literally everything that happens on the Internet is a complement to its main business. The more things that people and companies do online, the more ads they see and the more money Google makes. In addition, as Internet activity increases, Google collects more data on consumers’ needs and behavior and can tailor its ads more precisely, strengthening its competitive advantage and further increasing its income. As more and more products and services are delivered digitally over computer networks - entertainment, news, software programs, financial transactions - Google’s range of complements expands into ever more industry sectors. That's why cute little Google has morphed into The Omnigoogle.

And, as Nick points out:

But while Google has an odd business model, it's not an unprecedented one. The company it most resembles is, ironically, its archrival, Microsoft. Just as Google controls the central money-making engine of the Internet economy (the search engine), Microsoft controlled the central money-making engine of the personal computer economy (the PC operating system). In the PC world, Microsoft had nearly as many complements as Google now has in the Internet world, and Microsoft, too, expanded into a vast number of software and other PC-related businesses - not necessarily to make money directly but to expand PC usage. Microsoft didn't take a cut of every dollar spent in the PC economy, but it took a cut of a lot of them. In the same way, Google takes a cut of many of the dollars that flow through the Net economy. The goal, then, is to keep expanding the economy.

In other words, say hello to the new Microsoft.
The interesting thing, though, is that this is all predicated on Google being the No 1 search engine - so when you see things like Marissa Mayer saying search is 90% done, you start to wonder. As Arthur C Clarke once noted in his laws of prediction:

When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

The ancient Greeks, who had seen all this many a time, had a word for it - hubris. It goeth before a fall.

Thursday, September 4. 2008
Chrome EULA - the Comic
We noted yesterday that many (most?) of the Chromebloggers were fairly non analytic and blogged on the shiny shiny bits of Chrome (see our post on the matter here) - but a few (like us
Anyway, it's all over the blogosphere today. To explain this issue strategically I decided to use a comic format, as it's all the rage.

Chrome tarnished?

(As it is a p*ss take, I believe it is allowed under Fair Usage as well.)

(Update - Ina Fried reports the EULA has now been changed. Section 11 now reads simply: "11.1 You retain copyright and any other rights you already hold in Content which you submit, post or display on or through, the Services.". Strategic reasons for a Googlebrowser remain though.)

Wednesday, September 3. 2008
Searching for the future of retail therapy - a review
Chinwag ran a very interesting session last night on the differing (and competing?) roles of search and review sites in retail therapy fulfilment.
Phil Wilkinson has done an amazing job of writing the gist of it up over here (thus absolving yours truly of the need - props Phil). I'd like to comment on a number of the points Phil makes:

1: Recommendation sites need search engines to drive any traffic to them

but:

2: Search doesn’t take into account the stage of the purchasing cycle that someone is currently at.

One of the things we didn't get round to is the CPM per visit though - one would assume that ads on a review site may be more useful, as you have established intent, which is not clear on a search site. Any ideas welcome.

3: If retailers got their acts together, would anyone really need to use recommendation or review sites?

This is a very good question - Amazon does its own reviews, probably because of the sheer range it has. But one can imagine specialist retailers, or other large retailers, taking over this role. It is hard to understand what specific benefits a 3rd party site has, especially if the retailer makes completion very easy. This issue is related to the following:

4: There are other forms of communication that can help in the buying process that miss out both search engines and recommendation sites altogether

I had a similar experience buying a DSLR camera - the issue being that the review site is just one part of the continually iterating workflow of making a purchasing decision, and Twitter, Flickr and Yahoo discussion groups etc all came into it. Another issue is that review sites are good at the "look at this", but comparing X vs Y is harder. Also, with SLR cameras for example, there is a whole secondary discussion around which lens combinations to use - a multi-factor discussion. I wanted the statistical view that you get a glimpse of with Amazon's "people also bought", but all I could get was a few opinions on various sites.

Another point Phil makes is:
My experience of Google product search (in the UK anyway) is that the front page is too often filled up with sites that are clearly just trying to grab the traffic, and have no real content themselves but provide a list of other sites to go to (if you are lucky - I'd love to have a button that instantly blacklists those sites). On the other hand, with review sites:

What would you prefer - 2 recommendations from a friend and colleague or 45 reviews from anonymous people? I haven’t found any data to support either side right now but it’s an interesting question nonetheless that needs some answers.

Actually, I'd probably trust the 45 others; it's big enough to be a random sample - so long as it's not gamed, as can happen, even on Amazon. But people authenticate themselves better on Twitter than on Amazon, say, so is that a more valuable opinion? It is an interesting game theory / behavioural psychology question - how many anonymous users are worth 1 friend? A friend who knows about the stuff vs one who doesn't?

The other thing that wasn't really covered was the economics - the tradeoff between the value of the item and the transaction costs of the buying experience. It's just not worth providing the same levels of service for a shirt purchase as for a car, say. A camera is somewhere in between. Or how about restaurant selection vs holiday selection?

Phil's Conclusions:

This entire space about how people choose and recommend products and services to each other is much more complicated than I ever imagined, and luckily I’m not the only one who thinks that!

That's my impression as well - my instinct is that this whole arena is in very early days, and there are considerable rewards for getting it right - which is of course why it's one of the areas we are collaborating on in background research. I was most interested in one speaker's note (from Bazaarvoice? I didn't catch name or company) that they started off algorithm based, and are now adding social functions to get blended analysis.
I suspect that's right - if I look at Last.fm vs Pandora, Last.fm optimises faster initially, but Pandora keeps on improving long after adding an extra friend to Last.fm no longer helps converge to a solution.

Monday, July 28. 2008
An initial review of Cuil
News from TechCrunch pointed to the long awaited launch of Cuil (nee Cuill), pronounced "kewl" ( or "cool" if you are from the US
- Buttermilk Pancakes, the subject of Dare Obasanjo's Knol test.

Buttermilk Pancakes

The first thing to notice is that Cuil's search page is black (as is that on our real time search engine design - clearly they have good taste).

Cuil Results Page - Buttermilk Pancakes

Two things hit me - firstly, that there were no Knol based entries on the Cuil front page. Secondly, my Google results, although deliberately searched on "The World" and not "The UK" setting, looked nothing like Dare Obasanjo's search; mine had far more BBC and other UK based pages served. There were only 2 common pages between my Google page and Cuil. I had to manually switch over to Google.com to replicate Dare's search - so clearly searching "the world" from the UK is still very UK focussed - why is that? What was more interesting after that was that the differences between the Yahoo, Cuil and Google search pages were fairly negligible, apart from the Knol entry on Google. (In fact, my most delightful discovery on this journey was the Yahoo drop down context menu when you enter a search term.)

Freeconomics

Google did the "Right Thing" here and brought up our blog article on this on the front page.

Comparison of "Freeconomics" Search

(I was reading Cuil down one column at a time to define order......)

Rosicrucians

I was looking it up anyway, so tried all 3. Google and Yahoo's first 2 results were sponsored, and then they got straight down to business - an article on Rosicrucianism on Wikipedia. Cuil couldn't quite bring itself to get the Wikipedia article on "Rosicrucianism" but opted instead for the article on the Ancient Order of the Rosicrucians at No 3 - sans any sponsored pages.

Privacy

Cuil tries to set itself apart from Google here, its policy stated as:
My name

Searched for my name; it didn't feature in Cuil till page 4 (where it brought up my Twitter account). Yours truly is on page 1 on Google and Yahoo via Broadstuff. I note my Indiana artist, Australian and Argentinian namesakes are still there on page 1, so Cuil clearly does not love blog based links very much.

(Update - a few other things....)

- If you mis-search for Coase's Laew (as opposed to Law), Google knows what you're after, Cuil gets totally confused, and Yahoo thinks you are after Case law.
- "Apple" got not one record of the actual fruit on page 1 of Cuil.
- When I searched for the Tuatha de Danaan on Cuil, the graphic was a planet of the apes type monkey - that's not the fair Faerie people, it's the Fir Bolg.

Conclusions

Well, any search engine that can't reference our blog is clearly a non-starter for the cognoscenti! Apart from that glaring omission, I think its search is "mostly harmless" - it doesn't bias for Knols, and it comes up with roughly the same stuff as the others (though not as much or as often as the 120 billion links may imply).

The question for Cuil is - what's the differentiator here? I don't think pretty pictures help that much, and the layout leaves me largely unmoved. Privacy could become a competitive issue, but I think it's still a minority sport - noting how little care the munchkins take with Facebook and Google privacy today, I (sadly) conclude it's one of these things that exercises the digerati but few others.

In fact, the biggest aha for me from this is that Yahoo has come on apace since the last time I looked at it. (For the record - I've used Dogpile for 10 years very happily.)

An aside - David Kelly notes that Cuil say their name in theory means "knowledge" in Irish. Um - that bit o' me that is forever Gaelic tells me that Cuil means corner or recess or rear.
If it's Cuill (the original company name) then that's straight out of Irish legend (the coming of the Tuatha De Danaan), where Mac Cuill was named after a tree, the hazel (or Coll), which in Celtic Mythology gives you wisdom from eating the nut, which the Salmon of Knowledge did. So, a bit roundabout, but in some recess of the name Cuil is the concept of knowledge, though it'd drive you nuts to find it and it all seems a bit fishy.

Update thought - catching up on all the comments on this blog and elsewhere (try typing Cuil on Summize), I am left scratching my head as to why Cuil launched now (when it would seem it was too early) and why it didn't go "beta". After all, you only get 1 shot at the launch mojo. Still, to be fair, people were moaning about Google vs AltaVista et al as late as 2002, if I recall correctly.

Saturday, July 26. 2008
PageRank, BrowseRank and the search for Google's Achilles heel
It's been clear for a while that the Google PageRank system (i) is being heavily gamed and (ii) is no longer giving the best result from its basically link volume/value based analysis. Microsoft has also spotted this, and has come up with BrowseRank:
Essentially, the researchers tested out a system that replaces PageRank's link graph--a mathematical model of the hyperlinked connections of the Internet--with what they call a user browsing graph that ranks Web pages by people's behavior.

Will this propel Microsoft to top spot? Who knows - the interesting thing to take away is that Microsoft (and others) are starting to take Google on with new search ideas - an encouraging trend. And it's worth remembering that all that advertising wealth still depends on search excellence. And there is an increasing probability - as more people search for New Search - that the innovation will come from outside Google, and they may not be able to buy it themselves.

The way Google is structured today, search drives Ad revenue. Lose search and Ad revenue goes with it - that is Google's Achilles heel. All else in the Googleplex is still largely noise in the revenue producing sense. One suspects therefore that, strategically, Google is increasingly trying to separate the (currently close) links between search excellence and Ad revenues, but we suspect that will be some time coming, and very hard for text based Ads.
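To make the link graph vs browsing graph distinction above a bit more concrete, here is a minimal sketch in Python. It is emphatically not Google's or Microsoft's actual algorithm: it is a plain power-iteration ranking run over a weighted graph, and every page name, link and click count in it is made up for illustration. The only assumption is that a "link graph" weights each hyperlink equally, while a "browsing graph" weights each transition by how often users were observed to follow it.

```python
def rank(weights, damping=0.85, iterations=50):
    """Power iteration over a weighted directed graph.

    weights[src][dst] is the weight of the edge src -> dst.
    Returns a dict mapping each page to a score; scores sum to 1.
    """
    pages = set(weights)
    for targets in weights.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            out = weights.get(src, {})
            total = sum(out.values())
            if total == 0:
                # Dangling page: spread its score evenly over all pages.
                for p in pages:
                    new[p] += damping * score[src] / n
            else:
                # Pass score along each edge in proportion to its weight.
                for dst, w in out.items():
                    new[dst] += damping * score[src] * w / total
        score = new
    return score


# Hypothetical link graph: two link-farm pages point at "spam" to game it.
link_graph = {
    "hub": {"useful": 1, "spam": 1},
    "farm1": {"spam": 1},
    "farm2": {"spam": 1},
    "useful": {"hub": 1},
    "spam": {"hub": 1},
}

# Hypothetical browsing graph: observed click counts between the same pages.
# Nobody actually visits the farm pages, so they do not appear at all.
browsing_graph = {
    "hub": {"useful": 90, "spam": 10},
    "useful": {"hub": 50},
    "spam": {"hub": 5},
}

link_scores = rank(link_graph)
click_scores = rank(browsing_graph)

print("By links:   ", sorted(link_scores, key=link_scores.get, reverse=True))
print("By browsing:", sorted(click_scores, key=click_scores.get, reverse=True))
```

In this toy setup the link farm is enough to push "spam" above "useful" when only links are counted, whereas the click-weighted graph puts "useful" ahead, because the farm pages contribute nothing if no one browses through them. Whether that holds at web scale, and how easily browsing data itself could be gamed, is a separate question.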