Showing posts with label panda update. Show all posts

Friday, April 6, 2012

Top 3 Strategies for SEO After Google Panda Update

Google has changed more in the past year than it did in the 12 years prior to that. Most of the changes are good for honest marketers who just want the ranking their content deserves. But taken together, they radically change search engine marketing (SEM) best practices. James Mathewson (author of Audience, Relevance, and Search: Targeting Web Audiences with Relevant Content) won’t go into every change, because they number in the dozens, but this article discusses three changes every search engine marketer should care about.

As some of you may know, I am IBM’s representative to the Google Tech Council. For those who don’t know, the council is a place where representatives from the leading B2B tech companies sit around a table each quarter and discuss our search challenges with Google representatives. B2B Tech companies might have different ways of doing search marketing, but our challenges are common. We all need to rank well in Google for the words and user intents most relevant to our clients and prospects.

Google does its best to programmatically help us solve our challenges. They can’t always help for legal reasons I won’t get into. But where they can, they do. For example, a couple of years ago, several of us complained about the alarming increase in content farms on the search engine results pages. Whether we had organic or paid listings on those pages, content farms caused serious friction for our target audience, and diluted the results we had legitimately earned or paid for.

After that pivotal meeting (though perhaps not because of the meeting), Google began working on ground-breaking changes to its algorithm that would tend to improve the quality of search engine results incrementally over time. The first set of changes was launched in March 2011. Of course, I’m referring to Panda.

Nine months and several Panda updates later, I can confidently say that Google does a much better job with the quality of its search results. I rarely if ever see content farms anymore, and those I do see don’t last long on page 1. Those who think of SEO the way it was primarily conducted prior to these changes—keyword stuffing, buying links on content farms, and participating in commodity link exchange trading—have been left behind.

Panda is perhaps the most profound change to Google’s search engine since PageRank, which was the technology that gave Google its edge. Ironically, it was overdependence on PageRank that led to the series of algorithm changes known collectively as Panda. The practice of spoofing PageRank by swapping or buying links from low-quality sites had grown to such an extent that the results were polluted by them.

Towards an Algorithm that Rewards Quality Content
The Tech Council was not the only place where Google was hearing that it needed to change. Google’s chief competitor—Bing—had taken some of its share, to the point where Google only owned something like 70 percent of the market, down from 80 percent at its peak. The quality of the results had something to do with this.

The problem Google faced is that it had made regular changes to its algorithm over the years to stay one step ahead of the scammers, spammers, and scrapers. It had even introduced continuous A/B testing that gave pages better results if users actually engaged with them. That approach had reached its limits. The A/B tests were simply not getting rid of the pages fast enough. Scammers, spammers, and especially scrapers could publish pages faster than Google could drop them in the rankings. Google needed to undercut these activities once and for all.

How do you change the algorithm again to reward authentic, high quality content and punish low-quality spam-riddled content from scrapers? The answer was a revolutionary way of building an algorithm: with UX/editorial crowd sourcing combined with machine learning. According to Rand Fishkin, founder of SEOMoz, Google hired hundreds of quality raters—primarily editors and UX specialists—to rate a massive number of pages on the web. It then put the ratings into a machine-learning program, which recognized patterns and built the algorithm organically.

Machine learning is a technique borrowed from artificial intelligence used to enhance the analysis of complex systems such as natural language. When a computer system is said to learn in this way, it is taught to recognize complex patterns and make intelligent decisions based on the data. Watson, for example, used machine learning to train for the television game show Jeopardy!. By studying the questions and answers of past games, and practicing in live sessions with past champions, Watson was able to learn the nuances of the game well enough to play above past championship levels.

There are hundreds of patterns or signals that the Panda machine learning program recognizes in how the quality testers rate pages. Notice I used the present tense because this is an ongoing process. Google releases a new version of Panda every two months or so that reranks the entire web based on a new weighting of patterns and signals the machine learning program learns, all stemming from feedback from the quality testers.
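
To make the idea of learning weights from rater feedback concrete, here is a minimal sketch in Python. It is an illustration only and not Google's method: the two signals (`keyword_repetition` and `above_fold_share`), the toy ratings, and the choice of logistic regression are all assumptions made for this example.

```python
import math

# Hypothetical signals per page: (keyword_repetition, above_fold_share), each scaled 0..1.
# Labels are hypothetical rater verdicts: 1 = high quality, 0 = low quality.
RATED_PAGES = [
    ((0.05, 0.90), 1),
    ((0.10, 0.80), 1),
    ((0.08, 0.70), 1),
    ((0.60, 0.30), 0),
    ((0.70, 0.20), 0),
    ((0.55, 0.40), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, signals):
    """Estimated probability that a rater would call the page high quality."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, signals)))

def train(rated_pages, epochs=2000, learning_rate=0.5):
    """Fit one weight per signal with plain stochastic gradient descent."""
    weights = [0.0] * len(rated_pages[0][0])
    bias = 0.0
    for _ in range(epochs):
        for signals, label in rated_pages:
            error = score(weights, bias, signals) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, signals)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train(RATED_PAGES)
```

On this toy data the learned weight for repetition comes out negative and the weight for above-the-fold content comes out positive. Feeding in a fresh batch of ratings and retraining is the analogue of a new weighting arriving with each update.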

Though the algorithm is changing in subtle ways all the time, the general trend is to favor the following three areas of digital excellence, described in the next few sections.

1. Design and UX
It is no longer advisable to build text-heavy experiences, which force users to do a lot of work just to ingest and understand the content. Clear, elegant designs that help users achieve their top tasks will tend to be rewarded by Panda. There are dozens of books and sites on user experience (UX) best practices, so I won’t rehash them here. But these principles of good UX won’t lead you astray:


Gerry McGovern, Jared Spool, and Jakob Nielsen are three of the leading thinkers in web UX.

  • Keep it simple. The whole experience needs to be easy to use. One thing Panda does is put new emphasis on the UX of entire sites, not just one page within the site. Do you help users navigate once they come to the page? Or does your experience drive your users in circles? Can they get back home if they click your links?
  • Don’t make users work. A page needs to have most of the content “above the fold” or on the first screen view. Don’t make users scroll too much. Don’t make them click just to see more of the content you want them to see.
  • Clarify. A page needs to clearly communicate what it’s about at a glance. You have six to eight seconds to give quality testers a clear idea of what the page is about, who it’s for, and what users can do on it.
  • Don’t shout. You already have their attention. The temptation is to make it so blatantly clear that you use huge text and flashy graphics. Don’t insult users’ intelligence. Just elegantly clarify what the page is about.
  • Don’t hide stuff. The temptation for some designers is to be too elegant, forcing users to mouse over items to make them appear. If users don’t know it’s there, chances are they won’t mouse over it.
  • Emphasize interaction. Sites and pages are not passively consumed. Give users ways to interact and participate in the conversations at the core of the content.
  • Answer user questions. Ask yourself what questions users might have when they come to your page. Learn these questions by analyzing the grammar of search queries they used to find your content. More and more all the time, users are phrasing their queries in the form of questions. Answer these questions clearly and concisely.
These and many other design and UX best practices are some of the strongest signals the Panda machine learning algorithm looks for.

2. Content Quality
One of the main complaints I hear about SEO is from my editorial colleagues who say it’s just a way of helping poor-quality content climb the ranking at the expense of good-quality content. Their complaint has some validity. Content quality can’t be boiled down to a simple checklist of where to put keywords. So why are these factors so important in search engine results?

The answer is, they’re not as important anymore. Panda does not primarily reward traditional SEO best practices. Panda primarily rewards clear, concise, compelling, and original content. Only if two sites are of equal quality in Panda’s eyes will it tend to reward the one that displays traditional SEO best practices. But it is easy to overdo SEO practices.

For example, keywords are strong indicators of relevance to user queries. As paragon users, quality testers look for the words they typed in their queries when they land on a test page. If they’re not clearly emphasized on a page, the page will not tend to get a good score. So having well emphasized keywords above the fold is an important positive pattern for Panda.

But having a conspicuous number of the same keyword over and over is the sign of bad quality content. So that’s a negative pattern for Panda. Matt Cutts, Google’s organic quality czar, advises page owners to read the copy aloud. If it sounds natural, it should be fine.
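
A rough way to spot the repetition problem before reading the copy aloud is to measure how much of the text a single keyword takes up. The sketch below is a hypothetical helper, not a Google metric, and it deliberately sets no pass or fail threshold, since none is given here.

```python
import re
from collections import Counter

def keyword_density(text, keyword):
    """Return the share of words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)

# Two invented snippets: one reads naturally, one repeats the keyword.
natural = "Our guide explains how to choose running shoes for trail and road."
stuffed = "Shoes shoes shoes: buy shoes, cheap shoes, best shoes, shoes online."

natural_density = keyword_density(natural, "shoes")
stuffed_density = keyword_density(stuffed, "shoes")
```

Here the stuffed snippet devotes well over half its words to the keyword, while the natural one uses it once. A number like this only flags copy worth rereading; whether it sounds natural is still a human judgment.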

The point is, the rules of good quality content are much more important than any simplistic set of SEO rules for pages. Do traditional SEO practices matter? Yes, because they are patterns Panda cares about. But really good content that does not have keywords in every alt attribute or great backlinks will still tend to rank better than marginal content that has all of the attributes of traditional SEO.

Panda also tends to reward fresh content and punish duplicate content. All things being equal, a piece of content will rank higher if it is more recently published. (So pay attention to the date metatag.) One of the signals it looks for is not at the page level but at the site level. A high quality page that sits in a site full of old duplicate junk will not rank well until you clean the junk out.

In short, Panda rewards good content strategy. Content strategists such as Colleen Jones don’t trust SEO “snake oil.” And rightly so. Good SEO is good content strategy and vice versa.

Finally, one of the dilemmas that I hope gets permanently retired with this article is the false dichotomy between writing for search engines and writing for users. I argued in the book I co-authored for IBM Press that they were essentially the same. The attributes users care about are much the same as the ones search engines care about. And we can use the intelligence we glean from search engines as a proxy for intelligence about our users. Prior to Panda, this was controversial. Post Panda, it is not controversial.

The algorithm is derived from user preferences. The only reason why you need a machine to learn those preferences and do the work is because of the sheer volume of pages and sites on the web. The machine is not quite human, but it is getting closer all the time to human intelligence. And it has something that no individual human has: It has the collective intelligence of the whole crowd of quality testers. Like Watson, it is smarter than any individual human because it combines the intelligence of the collective of people feeding it data.

3. Site Metrics
As I mentioned, Google has long rewarded search engine results with high click-through rates, low bounce rates, and high engagement rates by helping them climb the rankings. But Panda rewards these even further by making search excellence metrics strong signals in each update. It also continues to raise the level of sophistication of these metrics signals, where they tend to align with pages rated highly by the quality testers.

For example, Google’s A/B testing didn’t have different standards for different types of experiences. It used a relative standard based on the bounce rates for the words in question. As a result, certain types of experiences for a given keyword tended to rank better over time (ahem, Wikipedia). Yet it makes perfect sense that portals have different bounce rates than single-offer commerce experiences. The more options users can click, the lower the bounce rate, generally speaking. Because humans understand the nuances of different experiences, Panda tends to contextualize these variable metrics values. And it will tune how it weights them over time as the quality testers provide more data.
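
The difference between a raw bounce rate and one read in context can be sketched as follows. The page names, the numbers, and the use of a simple per-type average as the baseline are assumptions for illustration; the article does not say how Panda actually contextualizes these values.

```python
from statistics import mean

# Hypothetical bounce rates (0..1), grouped by type of experience.
BOUNCE_RATES = {
    "portal": {"portal-a": 0.20, "portal-b": 0.30, "portal-c": 0.40},
    "single-offer": {"offer-a": 0.60, "offer-b": 0.70, "offer-c": 0.80},
}

def relative_bounce(bounce_rates):
    """Express each page's bounce rate relative to the average for its own type."""
    result = {}
    for pages in bounce_rates.values():
        baseline = mean(pages.values())
        for page, rate in pages.items():
            # Negative means fewer bounces than comparable pages.
            result[page] = rate - baseline
    return result

relative = relative_bounce(BOUNCE_RATES)
```

In raw terms `offer-a` (0.60) looks worse than `portal-c` (0.40). Measured against its own kind, `offer-a` bounces less than its peers while `portal-c` bounces more, which is the sort of nuance a human rater would recognize.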

Another example in metrics sophistication is in the levels of engagements. If a user clicks through to a page from the search engine results page, and clicks three more times, it will count for more than if she just clicked once. More generally, if a site has a high number of engagements per user, it will tend to rank better over time than one with one page that converts well and a bunch of dead pages.

Perhaps the best news of all in this is that you can improve your search rankings just by making incremental improvements to the pages on your site based on the metrics you gather.

Unfortunately, all the changes Panda makes only happen every two months or so. So once you are pushed down in the rankings by Panda, it will take a while to get back up in the rankings. Hopefully, Google will be able to make more frequent updates to Panda in the future so that those penalized by an overaggressive ad executive or an inadvertent UX faux pas can get back into Panda’s good graces more quickly.

Unlike past Google algorithm changes, Panda itself is not changing in any drastic way. It is just getting smarter at recognizing high-quality digital experiences. It’s also getting smarter at recognizing poor quality experiences that look good from a simplistic point of view. If you want to rank well for Google, you will need to invest in building high quality, authentic digital experiences. Given the growing confidence in the Google algorithm, it is a business imperative.
Written By: James Mathewson Source:

Friday, March 9, 2012

Search Quality Highlights: 40 changes for February 2012

This month we have many improvements to celebrate. With 40 changes reported, that marks a new record for our monthly series on search quality. Most of the updates rolled out earlier this month, and a handful are actually rolling out today and tomorrow. We continue to improve many of our systems, including related searches, sitelinks, autocomplete, UI elements, indexing, synonyms, SafeSearch and more. Each individual change is subtle and important, and over time they add up to a radically improved search engine.

Here’s the list for February:
  • More coverage for related searches. [launch codename “Fuzhou”] This launch brings in a new data source to help generate the “Searches related to” section, increasing coverage significantly so the feature will appear for more queries. This section contains search queries that can help you refine what you’re searching for.
  • Tweak to categorizer for expanded sitelinks. [launch codename “Snippy”, project codename “Megasitelinks”] This improvement adjusts a signal we use to try and identify duplicate snippets. We were applying a categorizer that wasn’t performing well for our expanded sitelinks, so we’ve stopped applying the categorizer in those cases. The result is more relevant sitelinks.
  • Less duplication in expanded sitelinks. [launch codename “thanksgiving”, project codename “Megasitelinks”] We’ve adjusted signals to reduce duplication in the snippets for expanded sitelinks. Now we generate relevant snippets based more on the page content and less on the query.
  • More consistent thumbnail sizes on results page. We’ve adjusted the thumbnail size for most image content appearing on the results page, providing a more consistent experience across result types, and also across mobile and tablet. The new sizes apply to rich snippet results for recipes and applications, movie posters, shopping results, book results, news results and more.
  • More locally relevant predictions in YouTube. [project codename “Suggest”] We’ve improved the ranking for predictions in YouTube to provide more locally relevant queries. For example, for the query [lady gaga in ] performed on the US version of YouTube, we might predict [lady gaga in times square], but for the same search performed on the Indian version of YouTube, we might predict [lady gaga in India].
  • More accurate detection of official pages. [launch codename “WRE”] We’ve made an adjustment to how we detect official pages to make more accurate identifications. The result is that many pages that were previously misidentified as official will no longer be.
  • Refreshed per-URL country information. [Launch codename “longdew”, project codename “country-id data refresh”] We updated the country associations for URLs to use more recent data.
  • Expand the size of our images index in Universal Search. [launch codename “terra”, project codename “Images Universal”] We launched a change to expand the corpus of results for which we show images in Universal Search. This is especially helpful to give more relevant images on a larger set of searches.
  • Minor tuning of autocomplete policy algorithms. [project codename “Suggest”] We have a narrow set of policies for autocomplete for offensive and inappropriate terms. This improvement continues to refine the algorithms we use to implement these policies.
  • “Site:” query update [launch codename “Semicolon”, project codename “Dice”] This change improves the ranking for queries using the “site:” operator by increasing the diversity of results.
  • Improved detection for SafeSearch in Image Search. [launch codename "Michandro", project codename “SafeSearch”] This change improves our signals for detecting adult content in Image Search, aligning the signals more closely with the signals we use for our other search results.
  • Interval based history tracking for indexing. [project codename “Intervals”] This improvement changes the signals we use in document tracking algorithms. 
  • Improvements to foreign language synonyms. [launch codename “floating context synonyms”, project codename “Synonyms”] This change applies an improvement we previously launched for English to all other languages. The net impact is that you’ll more often find relevant pages that include synonyms for your query terms.
  • Disabling two old fresh query classifiers. [launch codename “Mango”, project codename “Freshness”] As search evolves and new signals and classifiers are applied to rank search results, sometimes old algorithms get outdated. This improvement disables two old classifiers related to query freshness.
  • More organized search results for Google Korea. [launch codename “smoothieking”, project codename “Sokoban4”] This significant improvement to search in Korea better organizes the search results into sections for news, blogs and homepages.
  • Fresher images. [launch codename “tumeric”] We’ve adjusted our signals for surfacing fresh images. Now we can more often surface fresh images when they appear on the web.
  • Update to the Google bar. [project codename “Kennedy”] We continue to iterate in our efforts to deliver a beautifully simple experience across Google products, and as part of that this month we made further adjustments to the Google bar. The biggest change is that we’ve replaced the drop-down Google menu in the November redesign with a consistent and expanded set of links running across the top of the page.
  • Adding three new languages to classifier related to error pages. [launch codename "PNI", project codename "Soft404"] We have signals designed to detect crypto 404 pages (also known as “soft 404s”), pages that return valid text to a browser but the text only contains error messages, such as “Page not found.” It’s rare that a user will be looking for such a page, so it’s important we be able to detect them. This change extends a particular classifier to Portuguese, Dutch and Italian.
  • Improvements to travel-related searches. [launch codename “nesehorn”] We’ve made improvements to triggering for a variety of flight-related search queries. These changes improve the user experience for our Flight Search feature with users getting more accurate flight results.
  • Data refresh for related searches signal. [launch codename “Chicago”, project codename “Related Search”] One of the many signals we look at to generate the “Searches related to” section is the queries users type in succession. If users very often search for [apple] right after [banana], that’s a sign the two might be related. This update refreshes the model we use to generate these refinements, leading to more relevant queries to try.
  • International launch of shopping rich snippets. [project codename “rich snippets”] Shopping rich snippets help you more quickly identify which sites are likely to have the most relevant product for your needs, highlighting product prices, availability, ratings and review counts. This month we expanded shopping rich snippets globally (they were previously only available in the US, Japan and Germany).
  • Improvements to Korean spelling. This launch improves spelling corrections when the user performs a Korean query in the wrong keyboard mode (also known as an "IME", or input method editor). Specifically, this change helps users who mistakenly enter Hangul queries in Latin mode or vice-versa.
  • Improvements to freshness. [launch codename “iotfreshweb”, project codename “Freshness”] We’ve applied new signals which help us surface fresh content in our results even more quickly than before.
  • Web History in 20 new countries. With Web History, you can browse and search over your search history and webpages you've visited. You will also get personalized search results that are more relevant to you, based on what you’ve searched for and which sites you’ve visited in the past. In order to deliver more relevant and personalized search results, we’ve launched Web History in Malaysia, Pakistan, Philippines, Morocco, Belarus, Kazakhstan, Estonia, Kuwait, Iraq, Sri Lanka, Tunisia, Nigeria, Lebanon, Luxembourg, Bosnia and Herzegovina, Azerbaijan, Jamaica, Trinidad and Tobago, Republic of Moldova, and Ghana. Web History is turned on only for people who have a Google Account and previously enabled Web History.
  • Improved snippets for video channels. Some search results are links to channels with many different videos, whether on, Hulu or YouTube. We’ve had a feature for a while now that displays snippets for these results including direct links to the videos in the channel, and this improvement increases quality and expands coverage of these rich “decorated” snippets. We’ve also made some improvements to our backends used to generate the snippets.
  • Improvements to ranking for local search results. [launch codename “Venice”] This change improves the triggering of Local Universal results by relying more on the ranking of our main search results as a signal.
  • Improvements to English spell correction. [launch codename “Kamehameha”] This change improves spelling correction quality in English, especially for rare queries, by making one of our scoring functions more accurate.
  • Improvements to coverage of News Universal. [launch codename “final destination”] We’ve fixed a bug that caused News Universal results not to appear in cases when our testing indicates they’d be very useful.
  • Consolidation of signals for spiking topics. [launch codename “news deserving score”, project codename “Freshness”] We use a number of signals to detect when a new topic is spiking in popularity. This change consolidates some of the signals so we can rely on signals we can compute in realtime, rather than signals that need to be processed offline. This eliminates redundancy in our systems and helps to ensure we can continue to detect spiking topics as quickly as possible.
  • Better triggering for Turkish weather search feature. [launch codename “hava”] We’ve tuned the signals we use to decide when to present Turkish users with the weather search feature. The result is that we’re able to provide our users with the weather forecast right on the results page with more frequency and accuracy.
  • Visual refresh to account settings page. We completed a visual refresh of the account settings page, making the page more consistent with the rest of our constantly evolving design.
  • Panda update. This launch refreshes data in the Panda system, making it more accurate and more sensitive to recent changes on the web.
  • Link evaluation. We often use characteristics of links to help us figure out the topic of a linked page. We have changed the way in which we evaluate links; in particular, we are turning off a method of link analysis that we used for several years. We often rearchitect or turn off parts of our scoring in order to keep our system maintainable, clean and understandable.
  • SafeSearch update. We have updated how we deal with adult content, making it more accurate and robust. Now, irrelevant adult content is less likely to show up for many queries.
  • Spam update. In the process of investigating some potential spam, we found and fixed some weaknesses in our spam protections.
  • Improved local results. We launched a new system to find results from a user’s city more reliably. Now we’re better able to detect when both queries and documents are local to the user.
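
The related searches item in the list above mentions one signal in particular: queries that users type in succession. A minimal sketch of counting that signal might look like the following. The session data and function names are hypothetical, and a production model would be far more involved than a raw count.

```python
from collections import Counter, defaultdict

# Hypothetical query logs: each inner list is one user's queries, in order.
SESSIONS = [
    ["banana", "apple", "pear"],
    ["banana", "apple"],
    ["banana", "smoothie"],
    ["apple", "pie"],
]

def successive_query_counts(sessions):
    """Count how often each query is typed immediately after another."""
    follows = defaultdict(Counter)
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            follows[first][second] += 1
    return follows

def related_searches(follows, query, limit=3):
    """Return the queries that most often follow `query`, most frequent first."""
    return [candidate for candidate, _ in follows[query].most_common(limit)]

follows = successive_query_counts(SESSIONS)
banana_related = related_searches(follows, "banana")
```

With this toy log, [apple] follows [banana] twice and [smoothie] once, so `banana_related` lists apple first. Refreshing the data, as the item describes, would amount to recomputing the counts from newer sessions.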