How To Dupe Content And Get Away With It
Let’s do one more post about content. First, consider the Google Webmaster Blog’s post dispelling common duplicate content myths as a prerequisite read. Do I always trust what Google’s public relations tells me? Absolutely not, but it does confirm my own long-standing protests against people who perpetuated the paranoia about duplicate content. So it makes a good point of reference.
The most common myth that ensues from the paranoia is, “anytime I use content that is published somewhere else, I am sure to fall victim to duplicate content penalties.” This of course is bunk, because for any specific terms related to an article you can show me, I can find you 9 other sites that rank for those terms and aren’t necessarily supplemental and full of penalties. However, there is no doubt that there really is a duplicate content penalty. So we’ll discuss ways around it. One of my favorite parroted phrases is, “It’s not what content you use. It’s how you use it.” So we’ll start with formatting.
Here Is Some Spammy Text
Welcome to spam textville. This is just a bunch of spammy text. Text to fill and spam the streets. This is horrible spam text filled content that will surely get my spam site banned. Spam spam spam. It’s not food it’s text. Spammy text. I copied this spam text all over my site and others are copying it for their spammy text sites. I can’t believe I’m keyword stuffing for the words spammy text.
Alone in paragraph form this text is very easy to detect as spam and being autogenned. So the classic SEO ideology of “well written article style paragraphed text does well” gets thrown out the window with this example. However, since I would love nothing more than to rank for the term “Spammy Text” and this is all the content available to me I have to abandon the idea of keyword stuffing and find some new formats to put this text in that search engines will find acceptable.
How about an Ordered List?
Lists and bulleted points work very well because the text enclosed is meant to be very choppy, short, and contain repetition such as My goals are, The plan is, Do this, etc. If the common ordered list is formatted as such, then we by all rights can do the same.
What about presenting it as user contributed?
Comments (3)
John Doe:
Spam spam spam.
Jane Doe:
I copied this spam text all over my site and others are copying it for their spammy text sites.
John Deer:
Spammy text.
How many of you readers have left complete crap in my comments? I’m not banned or penalized yet.
Faking user contributed material works great because its outcome is unpredictable, so you can do virtually anything with it and get away with it, including but not limited to inner linking.
Mary Jane:
I saw this wicked article about this on Eli’s blog subdomainspam.spammydomain.com/spammysubpage.html check it out!
Break It Up Into Headings
Heading 1
Welcome to spam textville. This is just a bunch of spammy text.
Heading 2
Text to fill and spam the streets. Spam spam spam. It’s not food it’s text.
All the keywords are there; it’s just no longer spammy because it’s been broken up properly into nice little paraphrases. Once again, standardized = acceptable.
Change The Format
What about PDFs? They may not support contextual ads very well, but they most certainly can contain affiliate links. The engines also tend to be quite lenient on them and their redundant text. For more information read my Document Links post.
Let’s Move On
So now that we can format our text to avoid the penalties, what if we attempt to sidestep them altogether? I talked about how to swap titles out using a thesaurus and IMDB data in my Desert Scraping post. So I won’t talk too much about it, but definitely consider doing some research on exploiting LSI in this manner.
How about scraping the right content?
Heavily syndicated content works well for duping and it has the added bonus of being exclusively high quality. For instance I sometimes like to snag shit from the good ol’ AP. It’s not the smartest legal move but seriously, who’s going to deeply investigate and do anything about it? In such an event it’s always an option to just remove the article upon receipt of the C&D letter.
All in all, there’s plenty you can do to dupe content and get away with it. It’s a pretty open game and there’s a lot out there.
Have Fun
Just wanted to say that although this is pretty obvious to me, it has sparked some new ideas that I will be working on. Thanks Eli.
Eli,
While this seems like common sense and might be totally true, I find it hard to swallow without any empirical proof–or at least some anecdotal evidence.
Gene
I have built sites like this and getting google to index them is easy peasy.
Getting links / traffic to them is not easy.
To make money with this stuff you need thousands and thousands of sites..
You need to build faster than you get banned… as a lot of your sites WILL get banned…
For that you need sophisticated automation on a level not discussed on this site.
Coming up with madcap schemes is easy enough… making money from them is not.
You are up against professional programmers / server admin’s / SEO’s with years of experience and a LOT of expertise and the right contacts.
Any noob who thinks they can go on E-lance with $100 and start earning with a script like this is seriously mistaken.
Duplicate is in the eye of the beholder… If you are clever enough you can use as much duplicate content as you want and get away with it, without changing a letter.. Personally I use duplicate content as an excuse for keywords instead of containing them.
Webmaster search
Look at the most popular search page
I’m going to sound stupid now, but please can you un-abbreviate AP? Thanks
AP means Associated Press
Another thought-provoking post, Eli! Makes me realise how much I need to learn some coding - I’m sure it can’t be hard to re-format content with php, can it?
On the money again. It works wonders building feeders sites IMO. Have been able to run a bit of SERP domination in a couple markets with similar techniques.
Assuming I’m correct with this, duplicate content is not duplicate until it’s found inside of a search engine. For instance, if you come across a new post/article (block of content), kind of like how this post is new, and you put it on your site, there is a chance you’ll never hit a duplicate filter.
This is because, hypothetically speaking, the only way the search engines know you’ve stolen the content is if they’ve previously indexed that content, so content is not unique until the search engines index it. Now, I’ve already written this all out on Principle Of Marketing, but I’ll keep going with it here.
We know that search engines are predictable. For instance, if you write a new article once a week for a while, the search engine spiders will learn that behaviour and only come around once or so per week to index the new content (assuming you’re not doing anything else to bring the bots in). This means that if you monitor these types of sites with your own little bots you can potentially steal their content and never get a duplicate penalty.
Here is how it works. Some site makes a post once per week. Google sends out its bots on average once per week to look for new content. You send out your own bots to this site once per day to check for unique content. The second your bot finds a new post, you simply scrape that post, place it on your site, and push as many search engine bots to your new post as quickly as possible.
What I’m getting at with this comment that turned out to be a mini blog post is that if you can get new content indexed in the search engines before anyone else, that content is unique to you even if you have stolen it, simply because the search engines have to have something to compare it to in order to give you a duplicate penalty. If it hasn’t been indexed then it’s still unique.
great post.
I actually had a post about all this. I think it was called Maintaining Rank Through manipulating Freshness Factors or something like that.
Many feel the SE’s are using a shingle algorithm to detect dupe content. Is rearranging sentences or a thesaurus swap enough?
I agree that many sites do rank for dupe content but I also think these sites have other strong factors that negate the dupe content penalty.
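For anyone wondering what a shingle comparison looks like, here is a minimal Python sketch. It is not how any search engine is known to work; the 3-word shingle size, the Jaccard score, and the function names (`shingles`, `resemblance`) are all assumptions picked for illustration. The sample strings reuse the spammy text from the post.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than one window: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets: shared / total distinct."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "Welcome to spam textville. This is just a bunch of spammy text."
# Same two sentences, order swapped (hypothetical "rearranged" copy).
reordered = "This is just a bunch of spammy text. Welcome to spam textville."
# Same sentences with a few words swapped thesaurus-style (hypothetical copy).
swapped = "Welcome to junk textville. This is merely a pile of junky text."

if __name__ == "__main__":
    print("reordered:", round(resemblance(original, reordered), 3))
    print("swapped:  ", round(resemblance(original, swapped), 3))
```

Under these assumptions, reordering whole sentences leaves most shingles intact, because only the windows that straddle a sentence boundary change. A word swap breaks every window that contains the swapped word, so the thesaurus copy scores far lower than the rearranged one.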
as always a very inspiring post. thank you eli.
Here is the definition of irony:
http://credit.abcsouth.com/?p=144
Now that is funny.
I haven’t actually tried to get spammy text into the index and get it to rank well, but this definitely gives me some things to think about.
Do you really think there’s some sort of correlation between dupe content and supplemental results?
Thanks guys
For the most part all this should be fairly common sense. I just want to make sure it’s out of the way before I start talking about mass producing gray hat sites, so I don’t have to deal with a bunch of questions about duplicate content penalties. It’s a great reference post to have.
Quick question that’s kinda related and may actually have been answered somewhere else, but what software are you using to create the pdfs and keep the hyperlinked anchor text hyperlinked?
Ta muchly
Something I’ve tried is sourcing relevant content on non English sites, and translating it. It tends to need editing, otherwise the copy has a wonderful auto gen garbled spam feel to it. And of course it’s time consuming (unless it can be automated somehow?) - but for a beginner like myself it’s interesting to play with.
btw eli, you are ranking #1 with spammy text at google now
great posting
lol ranked number 1 for spammy text
hows about doing a search and replace for a term thats close to the original term… rinse and repeat…
BTW - im not convinced at google downplaying dupe content as articles do well, so do dmoz listings which seem to pop up every where.
and for the record we’re ranked #1 for seo cock LOL!
Does anyone know a good commenting system? It would be for HTML sites. I want to make it real.
1000’s of sites coming off the same block must figure negatively into the equation. How is it possible to control a few hundred sites let alone a thousand that come from unique subnets?
I have a few content scraping sites, one is this: http://relevantnews.net - how could they provide a benefit? I don’t know…
To me, the best way to turn a copy into an original document (in G’s eyes) is to add links. Google maintain a separate index for links only; if document A has a link and document B doesn’t have it, it is reasonable to assume that G will see them as different.
Besides, adding a link has another advantage: it will make your page more informative than the original one. Take the keywords you want to target; find a relevant page related to that keyword; place the link and voila, you just created a better page than the original.
Use your own internal links to boost the new page (I assume you have original quality content with relevant links) and you will outrank the original page.
Some people forget that search engines do not buy products or sign up for services. I had a client (stressing the word “had”) who was adamant about using software to crank out pages and pages of keyword-based copy for his website.
The end result was a slight increase in search engine ranking for certain phrases … but his website now reads as though it were written by a 4-year-old.
I thought the search engines stripped out the html before reading it. If that is true then an Ordered List would look like a paragraph once the tags were removed. That would also apply to H headers.
If not, then you could put them in random divs, spans, tables, hr etc. Anything to break up the typical paragraph after paragraph. Is that true?
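The tag-stripping question above is easy to try out. This is only a sketch using Python’s standard `html.parser`; whether any search engine actually discards markup before comparing text is the commenter’s assumption, not something this shows. The `TextExtractor` class and `strip_tags` helper are names made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a document and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_tags(html):
    """Return the visible text of an HTML fragment, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The same sentences from the post, once as an ordered list, once as a paragraph.
as_list = "<ol><li>Spam spam spam.</li><li>Spammy text.</li></ol>"
as_paragraph = "<p>Spam spam spam. Spammy text.</p>"

if __name__ == "__main__":
    print(strip_tags(as_list))
    print(strip_tags(as_paragraph))
    print(strip_tags(as_list) == strip_tags(as_paragraph))
```

With a naive stripper like this, the list and the paragraph collapse to the same string, which is the commenter’s point: if the formatting trick works, the engine must be keeping some of the structure around rather than comparing bare text.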
The comments idea is great!
I believe my scraper sites have been banned in Google based on my content. If I can just format it differently, then I am good to go!
Thanks!
My 3 web-sites with duped content have been banned away from google search data base, it is probably better to have a web-site with original content…
Or you can have someone rewrite the original content somehow… maybe there could be some script to replace some synonym words…
Great post,
Obviously G penalizes dupe content, but I can’t believe that it’s as strictly as some would have you believe.
I hate spammers…
Says the man with 8 posts in 5 minutes - lol !
If you do write original content all the time, it’s quite an achievement.
I have been using the same technique on one of my sites after reading your article. I used a mixture of replacing some words, re-ordering the paragraphs and also the format as per your article. Seems to do the trick!