Gray Hat SEO. Gray Hat SEO. Gray Hat SEO. I’m not keyword stuffing, I’m thinking. If gray hat is so widely accepted as a popular tactic, then why are there no good articles on what it is and how to do it? Maybe, and this is just one theory, it’s fuckin’ impossible to write about. It’s just too damn much of a gray area in the industry (pun intended). Well, we’ll see about that. :)

So what exactly is Gray Hat SEO? Most define it as a site that uses questionable tactics. I think that’s an excellent analysis, because the number one rule of gray hat is to be questionable. In fact, the more you can get a trained eye scratching their head wondering whether your site is black hat or white, the better. If you can fool the average visitor, then more than likely you will fool an SE bot. In my opinion the best way to do this is to have an innate eye for structure.

In designing a structure for your gray hat site, the best way to go about it is to steal a structure from a site that couldn’t possibly be banned. Let’s take Digg for example. Digg is set up in various primary categories, each containing news related stories. Each news story consists of a title with a link to the original article and a small, 255 character or so description. Each news story is accompanied by some user contributed content. From an indifferent perspective this is a very questionable structure. The content itself is very short and isn’t organized like a standard article would be, and the user contributed content is always very short and scattered. Not to mention there is a serious lack of control over the length and quality of the user contributed “comments.” However, the structure gives us some possibilities. We know that since Digg, Reddit and other social sites like them are authoritative and standard in the industry, the search engine antispam algorithms couldn’t possibly automatically consider that structure to be in bad taste or flag it as a possible black hat site. This gives us a huge opportunity for a possible gray hat site.

Okay, so the next step would logically be to figure out our content sources. Sticking with the example, we know that they get their content from large and small news related stories, mostly technology, but that can be swapped for whatever niche we decide to target. This is a great place to start because Google has already been quoted as saying news related stories can’t be counted as duplicate content because they are so widely syndicated. It only makes sense. So getting news stories is easy. In fact it can be done by scraping lots of popular news RSS feeds. If we’re attempting to duplicate the Digg structure, then we don’t need the entire articles, despite what we’ve come to believe about SEO. We only need the partial story along with a title, and then we pad it with some user contributed content.
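To make the “title plus partial story” idea concrete, here is a minimal Python sketch of turning an RSS feed into Digg-style entries: a title, a link back to the original article, and a description cut to roughly 255 characters. It’s only an illustration under assumptions. The sample feed, the function names `truncate` and `parse_feed`, and the exact 255 limit are all made up for the example, and it parses a feed string you already have rather than fetching anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical limit, taken from the "255 character or so" description above.
MAX_DESCRIPTION_CHARS = 255


def truncate(text, limit=MAX_DESCRIPTION_CHARS):
    """Cut text to at most `limit` characters, preferring a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last space so the excerpt does not end mid-word.
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut


def parse_feed(rss_xml):
    """Return a list of {title, link, description} dicts from RSS 2.0 text."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": truncate(item.findtext("description") or ""),
        })
    return entries


# Made-up sample feed so the sketch runs without any network access.
SAMPLE_FEED = (
    "<rss version='2.0'><channel><title>Example Feed</title>"
    "<item><title>First Story</title><link>http://example.com/1</link>"
    "<description>Short summary.</description></item>"
    "<item><title>Second Story</title><link>http://example.com/2</link>"
    "<description>" + "word " * 100 + "</description></item>"
    "</channel></rss>"
)

if __name__ == "__main__":
    for entry in parse_feed(SAMPLE_FEED):
        print(entry["title"], "|", entry["link"], "|", len(entry["description"]))
```

The word-boundary trim is just a design choice so excerpts don’t stop in the middle of a word; a hard cut at the limit would work too.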

Where are we going to get some user contributed content? In this particular example I can’t think of a better place than the place we’re ripping off the structure from. We might as well pad each article with scraped user comments from Digg itself. So we can take the titles of each news piece and remove all the common words such as: “why”, “but”, “I”, “a”, “about”, “an”, “are”, “as”, “at”, “be”, “by”, “com”, “de”, “en”, “for”, “from”, “how”, “in”, “is”, “it”, “la”, “of”, “on”, “or”, “that”, “the”, “this”, “to”, “was”, “what”, “when”, “where”, “who”, “will”, “with”, “www”, “and”, “if” and any others we find that aren’t commonly associated with the article subject. Then we can do a search on Digg and scrape several comments from the results, change up the usernames and make it all look unique. Hell, if we wanted we could even markov some user submitted content into the middle of the scraped user content. Naturally not every piece of user contributed content will match the topic exactly, but who’s to say it’s not real? Once again, as long as you remain “questionable,” who can possibly deny you rankings? “Wow great article.”, “I pee’d in your pool.” <- users always submit this kind of shit, and search engines are used to it and more than well adapted to handle it. I realize this goes against the long preached world of “poison words” and such, but that old rule directly conflicts with the true nature of a web that keeps evolving toward social networking, and thus it must be compensated for. Remember, it’s not what content you have, it’s how you use it. If it helps, think about it like this. If you took all the comments on YouTube (which are the majority of its actual text based content) and lumped them all together in paragraph tags, how fast would you get banned? In no time, right? However, when clones of places like YouTube organize it under headings of Comments and heavily break it up, somehow it all becomes legit. Take a moment to think about that.
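The title-cleaning step above, stripping the common words out of a title so what’s left works as a search query, is easy to sketch. This is a rough Python illustration, not anybody’s production code. The function name `title_to_query` and the sample title are made up, and the stop list is just the one from this post.

```python
import re

# The common-word list from the post, lowercased and de-duplicated.
STOP_WORDS = {
    "why", "but", "i", "a", "about", "an", "are", "as", "at", "be", "by",
    "com", "de", "en", "for", "from", "how", "in", "is", "it", "la", "of",
    "on", "or", "that", "the", "this", "to", "was", "what", "when", "where",
    "who", "will", "with", "www", "and", "if",
}


def title_to_query(title):
    """Drop stop words from a title and return the remaining words as a query."""
    # Words are runs of letters, digits and apostrophes (straight or curly).
    words = re.findall(r"[A-Za-z0-9'’]+", title)
    kept = [word for word in words if word.lower() not in STOP_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(title_to_query("Why the iPhone Is a Hit with Developers"))
```

The comparison is case-insensitive but the original casing of the kept words is preserved, so the query still reads like the title it came from.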

More on this in a later post…

So now that we’ve got our two elements of a successful gray hat site, we can cook ‘em up together. We can even create a little mock voting and submission system. It doesn’t even have to work properly, just as long as upon inspection it all looks legit. It’s all 100% autogenerated of course, but as long as it’s laid out cleanly and correctly there’s no reason why we can’t generate hundreds of thousands of pages of stolen content while keeping visitors and the SEs none the wiser. There’s no doubt we can do this exact technique for just about any authority site on the web. Let’s jump back to our YouTube example for a moment. YouTube isn’t the first nor the last video site on the web. As far as actual text based content goes, it obviously takes a large piece of the brown cake. So how does it get away with it, with all its pages ranking and doing well while your clean and lengthy white hat articles struggle? Some would argue links are the answer. Well, not all video content sites have tons of links, yet they still survive and don’t get immediately banned for spammy content, but we’ll humor the notion anyway. So let’s get some links. :)

Gray hat sites frequently have an advantage over black hat sites in link building: because they can pass a human check, their links, more often than not, tend to stick. So of course the first thing I would do is attempt trackback pings on all these stories. If my link ends up on a few authoritative news sites, great; on a few blogs, just as well. Since it all looks legit, and we not only pass a human check with our essentially black hat site but also actually link to them, there’s no good reason the links won’t have a high success rate. Which leads me to a bit of custom comment spam. Might as well find blog posts talking about related stuff to each story and leave something like, “I saw an interesting story related to this on” Sure, why not? Between just those two simple and common black hat techniques we somehow ended up with plenty of white hat links. That’s the beauty of gray hat. If you can at least get people to question whether or not your site is legit, then you stand a very good chance of succeeding.

So essentially what we’re trying to do is play around within the margins between the pros of black hat and white hat till we find a happy medium that is acceptable to both other webmasters and the search engine antispam algorithms. But how would an advanced Blue Hatter spin all this? :) Very good question. I personally would take a look at my potential competition. Since I’m taking these articles from other sources, they are the originals and I am the copy linking to them, so naturally they will beat me out in all the SERPs. I can still drive traffic off their coattails, perhaps by utilizing a few techniques to improve my CTR in the organics. However, I’m still never going to reach my full traffic potential with the current gray hat setup. This is mostly due to my article titles being exactly the same as theirs. I might have better luck if I change up the titles and capitalize on the surfers who search for slightly different variations of the topic. Let’s say for instance that one of the titles is, “Hilton’s Chihuahua Caught Snorting Coke In Background of Sex Video.” Alright, so when I import my titles I can run a simple replacement algorithm to swap any instances of “Hilton” without the Paris for “Paris Hilton.” Or “Congressman Paul” for “Congressman Ron Paul.” If I wanted to capitalize on these possible search variations on a mass scale, I could easily incorporate a thesaurus and swap out nouns, for instance. Rock becomes Stone…etc. etc. IMDb also has a huge database of celebrity names I could possibly use for the example above. It’s all pretty endless and can get quite in depth, but I know if I do it right it’ll pay off big time. I may even get lucky and hit pay dirt with a big story coming out where everyone searches for something similar but not quite the same as the original headlines.
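The “simple replacement algorithm” for titles could look something like this minimal Python sketch. Everything in it is an assumption made for illustration: the names `expand_names` and `swap_synonyms`, the two name rules (lifted from the examples above), and the one-word toy thesaurus. A real run would need a much bigger rule list and some care about which words are actually nouns.

```python
import re

# Name rules from the examples above. The negative lookbehind skips
# any "Hilton" that already has "Paris " directly in front of it.
NAME_EXPANSIONS = [
    (re.compile(r"(?<!Paris )\bHilton\b"), "Paris Hilton"),
    (re.compile(r"\bCongressman Paul\b"), "Congressman Ron Paul"),
]

# Toy thesaurus with a single hypothetical entry (Rock becomes Stone).
SYNONYMS = {"rock": "stone"}


def expand_names(title):
    """Apply each name rule in order and return the rewritten title."""
    for pattern, replacement in NAME_EXPANSIONS:
        title = pattern.sub(replacement, title)
    return title


def swap_synonyms(title):
    """Swap whole words found in SYNONYMS, keeping a leading capital."""
    def replace(match):
        word = match.group(0)
        synonym = SYNONYMS.get(word.lower())
        if synonym is None:
            return word
        return synonym.capitalize() if word[0].isupper() else synonym

    return re.sub(r"[A-Za-z]+", replace, title)


if __name__ == "__main__":
    print(expand_names("Hilton Spotted With Congressman Paul"))
    print(swap_synonyms("Rock Falls On Car"))
```

Matching whole words matters here: “Rocket” is left alone because the lookup only fires on the exact word “rock”, and a title that already says “Paris Hilton” isn’t doubled up.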

And that is how you be a gray hat :)

For shits and giggles I want to throw one more possible site structure available out there and get your opinions on it….How about