Google, Content Farms, and Scraping

About a month ago, Google announced that it was planning to change its algorithms, ostensibly to provide higher-quality results for each search query, and this change has very recently gone into effect. For most people the change has no effect whatsoever on their website or its rankings, but websites that are filled with “shallow” and “low quality” content might be feeling the burn. Although Google won’t come right out and say it, the purpose of the new algorithm is to combat content produced by what are commonly called “Content Farms” and “Scrapers”.

Content Farms are commonly described as companies that employ large numbers of freelance writers to produce copious amounts of text specifically designed to satisfy the algorithms that determine page rankings on search engines like Google. Most of this content can be called “good enough”: it gives superficial information that is related to the search query but lacks unique or insightful input. Google believes that these types of articles (and blogs) are crowding out higher-quality content that offers expert or thoughtful perspectives, and its new algorithms will attempt to level the playing field.

Google has created a new classifier that detects repeated spammy words, the sort of phrases you tend to see in junky, automated, self-promoting blog comments. These words and phrasings are great for search engine spiders, but not so good for human readers. When Google encounters blogs or websites that couple low-quality, spammy wording with superficial knowledge of the subject, it is going to rank those sites lower.

Here is an example of what may qualify as a low-quality and shallow article:
Suppose an article on your website is focused on a very specific, popular search term that is just a flash-in-the-pan trend, for example “Miley Cyrus New Tattoo”. If your website is not related to Miley Cyrus, tattoos, or the specific tattoo parlor that inked her, this article may be tagged as shallow. It will almost certainly be tagged as low-quality if there are redirect links and ads all over the page. Think about it: if your site is dedicated to “Organic Kelp Products” (is that a potential niche?!? Need to check…) and an article about Miley Cyrus inexplicably appears on it, that article is obviously pandering to the search engines in the hope of generating a high search ranking for a very specific term (if only for one day!), with the ultimate goal of redirecting traffic to the rest of the Organic Kelp website.

Or here is another example, one that doesn’t involve a hot trend:
Let’s pretend I have a website about Indian wedding saris and I have an article or blog post on my site that goes something like this: “Indian wedding saris are the most beautiful saris. They come in every color but red is the most popular Indian wedding sari. Green saris can also be used for Indian wedding saris. Silk saris, Georgette Saris, and Cotton Saris are popular wedding saris. Brides wear unique wedding saris for their wedding. An Indian Wedding Sari is a sure bet to look beautiful at wedding”.

Did that article actually tell you anything? There are a lot of words there, but will you walk away with any knowledge? By the end of this blog post, will you remember anything I just said about saris? Probably not. The article is redundant, just “Indian Wedding Saris” and minor variations on that phrase, which in the past would have caught the attention of Google’s spiders and quite possibly ranked my article highly. However, to Google’s new classifier, this article would jump out as shallow and low-quality (which it is). Now imagine if all of the articles on my website were of similar quality to the one above… not good!
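
If you want a rough feel for how repetitive a draft is before you publish it, a few lines of Python can count how much of the text is taken up by one phrase. To be clear, this is only a minimal sketch for self-checking your own writing. It is not Google's classifier, whose details are not public, and the function names (`count_phrase`, `phrase_density`) and the shortened sample text are made up for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_phrase(text: str, phrase: str) -> int:
    """Count how many times `phrase` appears as a run of whole words."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


def phrase_density(text: str, phrase: str) -> float:
    """Fraction of all words in `text` that belong to occurrences of `phrase`."""
    words = tokenize(text)
    if not words:
        return 0.0
    return count_phrase(text, phrase) * len(tokenize(phrase)) / len(words)


# Hypothetical, shortened sample in the style of the sari article.
sample = (
    "Indian wedding saris are the most beautiful saris. "
    "Green saris can also be used for Indian wedding saris."
)

print(count_phrase(sample, "wedding saris"))
print(round(phrase_density(sample, "wedding saris"), 2))
```

There is no magic threshold here. The point is simply that when a single phrase accounts for a large share of your words, a human reader will notice it, and you should assume an algorithm can too.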

To make the article better, to show some depth and knowledge, I could go into regional differences of wedding saris, the price ranges one could expect to pay, the history of wedding saris, what the colors and patterns represent, other accessories one could wear with a wedding sari, wedding traditions, etc. I would put the focus of my article into a context that shows more research, thereby providing higher-quality and unique content. Humans will appreciate it more and so will Google’s new algorithm. If you make the effort to create outstanding content, the rankings should reward you.

Here is a real example I found while scouring the internet. It’s an article that’s pretty highly ranked (that’s obviously going to change very soon, you’ll see why) and concerns DIY (Do-It-Yourself) wedding projects. The article title leads me to believe that if I were trying to plan a wedding I would really enjoy reading this article, and that it might even give me a brilliant idea! But once I click on the title, this is the pitiful article I’m left to read:

Are you serious!?! This is so vague and superficial… nothing in this article indicates thoughtful analysis or ideas, or even that more than two minutes were spent typing this little ditty. If I were frazzled from trying to organize a wedding, I’d be royally ticked at this waste of time. If the author had spent more time on the article (used pictures, gave specific instructions, listed resources on where to buy the materials, etc.), the article would be infinitely more impressive and useful. Don’t let your articles be like the one above: put in a bit more effort, be more thorough, and give searchers quality content they will enjoy reading!

This doesn’t mean the era of hiring freelance writers is over, but a bit more awareness is certainly called for. If you are going to hire a writer to generate content, hire someone who has demonstrated that they can write high-quality, deep, thoughtful, unique content that humans would enjoy reading. Yes, it may cost a bit more, but you will get higher-quality, unique articles, which should in turn lead to higher rankings.

Another type of content Google is cracking down on is scraper content. If you own a website that just scrapes content from other websites and has no original content, you might be in trouble. There are some legitimate ways to pull content from other sources, such as using RSS feeds with permission or quoting a very small amount of content under fair use guidelines, but it’s incredibly important that the vast bulk of your website consists of its own unique and useful content to balance out the legitimately scraped content. You must have original content on your site. It’s imperative!
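
One way to sanity-check that balance on your own pages is to estimate how much of a page's wording also appears in the source it borrowed from. The sketch below compares overlapping runs of words (often called “shingles”) between two texts. It is an illustration only, under the assumption that you have both texts as plain strings. The names `shingles` and `copied_share` and the five-word window are arbitrary choices for this example, and nothing here describes how Google actually detects scraped content.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of every run of k consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def copied_share(page: str, source: str, k: int = 5) -> float:
    """Fraction of the page's k-word runs that also occur in the source."""
    page_shingles = shingles(page, k)
    if not page_shingles:
        # Page is shorter than k words, so there is nothing to compare.
        return 0.0
    return len(page_shingles & shingles(source, k)) / len(page_shingles)
```

A result near 1.0 means the page is mostly lifted from the source, while a result near 0.0 means the wording is mostly your own. Run it with `page` set to your article text and `source` set to the text you pulled from.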

Google is very firm on this issue: you can get articles from wherever and however you want, and Google is not policing content farms, but if the content-farm or scraper-generated article you put on your website is shallow and low-quality, you will not enjoy high rankings.
