Guides


Guides | 21 Apr 2009 06:37 pm

Hello again!
I've been restless and wanting to write this post for a very long time, and I'm not going to be happy until it's out. So get out your reading glasses. I have it on good authority that every reader of this blog happens to be the kind of dirty old man that hangs out and harasses high school chicks at gas stations, so don't tell me you don't have a pair. Get 'em out and let's begin….

Fuck, how do I intro-rant this post without getting all industry political? Basically, this post is an answer to a question asked a long time ago at some IM conference to a bunch of gurus: does advanced White Hat SEO exist? If I remember right, and this was a long time ago and I was probably buzzed so forgive me, every guru said something along the lines of "there is no such thing as advanced White Hat SEO." Now I'm sympathetic to the whole self promotion thing to a small degree. If your job is to build buzz around yourself you have to say things that are buzz worthy. You can't give the obvious answer, YOU BET IT DOES AND YOU'RE RETARDED FOR ASKING! You've got to say something controversial that gets people thinking, but not something so controversial that anyone at your popularity level is going to contradict it in a sensible way, making your popularity appear more overrated than a cotton candy vendor at the Special Olympics. In short, yes, advanced white hat exists and there are tons of examples of it; but you already knew that, and I'm going to give you such an example now. That example is called Dynamic SEO. I've briefly mentioned it in several posts in the past, and it is by every definition simple good ol' fashioned on-site keyword/page/traffic optimizing White Hat SEO. It also happens to be very simple to execute but not so simple to understand. So I'll start with the basics and we'll work into building something truly badhatass.

What Is Dynamic SEO?
Dynamic SEO is simply the automated, no-guessing, self-changing way of SEOing your site over time. It is the way to get your site as close to 100% perfectly optimized as needed without ever knowing the final result, AND automatically changing those results as they're required. It's easier done than said.

What Problems Does Dynamic SEO Address?
If you're good enough at it you can address EVERY SEO related problem with it. I am well aware that I defined it above as on-site SEO, but the reality is you can use it for every scenario, even off-site SEO. Hell, SQUIRT is technically dynamic off-site SEO. Log Link Matching is another example of advanced off-site Dynamic SEO. The problems we're tackling in this post specifically include keyword optimization, which covers keyword order, keyword selection, and even keyword pluralization.

See, the problem is you. When it comes to the subpages of your site you can't possibly pick the exact best keywords for all of them and perfectly optimize every page for them. First of all, keyword research tools often get the keyword order mixed up. For instance they may say "Myspace Template" is the high traffic keyword, when really it could be "Templates For Myspace". They just excluded the common word "for" and got the order wrong because "Template Myspace" isn't popular enough. They also removed the plural to "broaden" the count. By that logic "Myspace Templates" may be the real keyword. Naturally, if you have the intuition that this is a problem you can work around it manually. The problem is not only will you never be perfect on every single page, but your intuition as a more advanced Internet user is often way off, especially when it comes to searching for things. Common users tend to search for what they want in a broad sense. Hell, the keyword "Internet" gets MILLIONS of searches. Who the fuck searches for a single common word such as Internet? Your audience, that's who. You, on the other hand, tend to think more linearly with your queries because you have a higher understanding of how Ask Jeeves isn't really a butler that answers questions. You just list all the keywords you think the desired results will have. For instance, "laptop battery hp7100" instead of "batteries for a hp7100 laptop." Dynamic SEO is a plug n play way of solving that problem automatically. Here's how you do it.

Create A Dynamic SEO Module
The next site you hand code is a great opportunity to get this built and in play. You'll want to create a single module file such as dynkeywords.pl or dynkeywords.php that you can use across all your sites and easily plug into all your future pages. If you have a dedicated server you can even set up the module file to be included (or required) on a common path that all the sites on your server can access. With it you'll want to give the script its own SQL database. That single database can hold the data for every page of all your sites. You can always continue to revise the module and add more cool features, but while starting out it's best to start simple. Create a table that has a field structure similar to ID, URL, KEYWORD, COUNT. I put ID in just because I like to always have some sort of primary key to auto increment. I'm a fan of large numbers, what can I say? :)
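
If it helps, here's roughly what that table might look like in MySQL. The names are just the example fields above; rename them to whatever fits your setup:

CREATE TABLE dyn_keywords (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- the auto incrementing primary key
    url     VARCHAR(255) NOT NULL,                 -- the page the keyword belongs to
    keyword VARCHAR(255) NOT NULL,                 -- the keyword phrase itself
    `count` INT UNSIGNED NOT NULL DEFAULT 1,       -- search engine referral count
    PRIMARY KEY (id),
    INDEX url_idx (url)                            -- you'll be selecting by URL on every pageview
);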

Page Structure & Variables To Pass To Your Module
Before we get deep into the nitty gritty functions of the module, we'll first explore what basic data it requires and how the site pages will pass and return that data. In most coded pages, at least on my sites, I usually have the title tag in some sort of variable. This is typically passed to the template for obvious reasons. The important thing is it's there, so we'll start with that. Let's say you have a site on home theater equipment and the subpage you're working on covers LCD televisions. Your title tag may be something like "MyTVDomain.com: LCD Televisions - LCD TVs".

Side Note/
BTW, sorry, I realize it may bother some people how in certain cases I'll put the period outside of the quotes. I realize it's wrong and the punctuation should always go inside the quotes when ending a sentence. I do it that way so I don't imply that I put punctuation inside my keywords or title tags etc. etc.
/Side Note

You know your keywords will be similar to LCD Televisions, but you don't know whether LCD TVs would be a better keyword, i.e. it could either be a higher traffic keyword or a more feasible keyword for that subpage to rank for. You also don't know if the plurals would be better or worse for that particular subpage, so you'll have to keep that in mind while you pass the module the new title variable. So before you declare your title tag, create a quick scalar for it (a hash reference). In this hashref you'll want to put the estimated best keywords for the page:
{
    Keyword1 => 'LCD Television',
    Keyword2 => 'LCD TV',
}
Then put in the plurals of all your keywords. It's important not to try to over automate this because A) you don't want your script to just tack an "s" onto the end of every word, for grammatical reasons (skies, pieces, moose, geese), and B) you don't want your module slowing down all the pages of your site by consulting a dictionary DB on every load.
{
    Keyword1 => 'LCD Television',
    Keyword2 => 'LCD TV',
    Keyword3 => 'LCD Televisions',
    Keyword4 => 'LCD TVs',
}
Now for you "what about this awesome way better than your solution" mutha fuckas that exist in the comment section of every blog, this is where you get your option. You didn't have to use a hashref above; you could have just used a regular array and passed the rest of the data in their own variables, or you could have put them at the beginning of a standard array and assigned the trailing slots to the keywords, OR you could use a multidimensional array. I really don't give a shit how you manage the technical details. You just need to pass some more variables to the module's starting function, and I happen to prefer tagging them onto the hashref I already have.
{
    Keyword1  => 'LCD Television',
    Keyword2  => 'LCD TV',
    Keyword3  => 'LCD Televisions',
    Keyword4  => 'LCD TVs',
    URL       => $url,
    REFERRER  => $referrer,
    Separator => '-',
}
In this case $url will be a string that holds the current URL the user is on. This may vary depending on the structure of the site. For most pages you can just pull the environment variable for the document URL, or if your site has a more dynamic structure you can grab it plus the query_string. It doesn't matter; if you're still reading this long fuckin' post you're probably at the point in your coding abilities where you can easily figure this out. Same deal with the referrer. Both of these variables are very important, and inside the module you should check for empty data. You need to know what page the pageview is being made on, and you'll need to know if the visitor came from a search engine and, if so, what keywords they searched for. The Separator is simply the character you want to separate the keywords with once the title is output. In this example I put a hyphen, so it'll be "Keyword 1 - Keyword 2 - Keyword 3". Once you've got this, all you have to do is include the module in your code before the template output, have the module return the $title variable, and have your template output that variable in the title tag. Easy peasy, beautiful single line of code. :)
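
To make that concrete, here's a rough sketch of the page side in Perl. The include path and the dynkeywords_title() function name are made up; use whatever your own module actually exposes:

# page code, somewhere before the template output
require '/home/shared/dynkeywords.pl';        # the common-path module (hypothetical path)

my $url      = $ENV{'HTTP_HOST'} . $ENV{'REQUEST_URI'};   # the page being viewed
my $referrer = $ENV{'HTTP_REFERER'} || '';                # where the visitor came from

my $title = dynkeywords_title({
    Keyword1  => 'LCD Television',
    Keyword2  => 'LCD TV',
    Keyword3  => 'LCD Televisions',
    Keyword4  => 'LCD TVs',
    URL       => $url,
    REFERRER  => $referrer,
    Separator => '-',
});

# then hand $title to your template and drop it into the <title> tag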

Basic Module Functions
Inside the module you can do a wide assortment of things with the data and the SQL, and we'll get to a few ideas in a bit. For now just grab the data and check the referrer for a search engine using regex. I'll give you a start on this, but trust it less the older this post gets:
Google: ^http:\/\/www\.google\.[^/]+\/search\?.*q=.*$
[?&]q= *([^& ][^&]*[^& +])[ +]*(&.*)?$
Yahoo: ^http:\/\/(\w*\.)*search\.yahoo\.[^/]+\/.*$
[?&]p= *([^& ][^&]*[^& +])[ +]*(&.*)?$
MSN: ^http:\/\/search\.(msn\.[^/]+|live\.com)\/.*$
[?&]q= *([^& ][^&]*[^& +])[ +]*(&.*)?$
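
Wrapped up in Perl, that check might look something like this. It's only a sketch, and the same caveat applies: these patterns rot as the engines change their URL formats:

sub search_keywords {
    my ($referrer) = @_;
    my %engines = (
        google => [ qr{^http://www\.google\.[^/]+/search\?},    qr{[?&]q=([^&]+)} ],
        yahoo  => [ qr{^http://(\w*\.)*search\.yahoo\.[^/]+/},   qr{[?&]p=([^&]+)} ],
        msn    => [ qr{^http://search\.(msn\.[^/]+|live\.com)/}, qr{[?&]q=([^&]+)} ],
    );
    for my $engine (keys %engines) {
        my ($host_re, $query_re) = @{ $engines{$engine} };
        if ($referrer =~ $host_re && $referrer =~ $query_re) {
            my $kw = $1;
            $kw =~ tr/+/ /;                                   # plus signs back to spaces
            $kw =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;       # crude URL decode
            return lc $kw;                                    # the phrase the visitor searched for
        }
    }
    return undef;                                             # not a search engine referral
}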

Once you've isolated the search engines and the keywords used to find the subpage, check whether that keyword already exists in the database for that page. If it doesn't exist, insert a new row with the page, the keyword, and a count of 1. Then select every row where the page is equal to $url, ordered by the highest count. If the count is less than a predefined threshold (e.g. 1 SE referral), output the $title tag with the keywords in their original order (you may want to put a limit on it). For instance, if they all have a count of 1, output from the first result to the last with the Separator in between. Once you get your first visitor from a SE it'll rearrange itself automatically. For instance, if LCD TV has a count of 3 and LCD Televisions has a count of 2 and the rest have a count of 1, you can put a limit of 3 on your results and you'll output a title tag something like "LCD TV - LCD Televisions - LCD Television", LCD Television being simply the next result, not necessarily the best result. If you prefer to put your domain name in your title tag, like "MYTVSITE.COM: LCD TV - LCD Televisions - LCD Television", you can always create an entry in your hashref for that and have your module check for it and, if it's there, put it at the beginning or end or wherever you prefer (another neat customization!).
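
And here's the counting and title-building half, sketched with DBI. dyn_keywords is just the example table from earlier; tweak the threshold and limit logic to taste:

sub build_title {
    my ($dbh, $url, $guesses, $searched, $separator, $limit) = @_;
    $limit ||= 3;

    # tally the keyword that actually brought this visitor in
    if ($searched) {
        my ($id) = $dbh->selectrow_array(
            'SELECT id FROM dyn_keywords WHERE url = ? AND keyword = ?',
            undef, $url, $searched);
        if ($id) {
            $dbh->do('UPDATE dyn_keywords SET `count` = `count` + 1 WHERE id = ?', undef, $id);
        } else {
            $dbh->do('INSERT INTO dyn_keywords (url, keyword, `count`) VALUES (?, ?, 1)',
                     undef, $url, $searched);
        }
    }

    # this page's keywords, ordered by how often they've actually pulled in traffic
    my $proven = $dbh->selectcol_arrayref(
        'SELECT keyword FROM dyn_keywords WHERE url = ? ORDER BY `count` DESC LIMIT ' . int($limit),
        undef, $url);

    # pad with the original guesses so brand new pages stay optimized for everything
    my %seen = map { lc $_ => 1 } @$proven;
    my @best = (@$proven, grep { !$seen{lc $_} } @$guesses);
    @best = @best[0 .. $limit - 1] if @best > $limit;

    return join(" $separator ", @best);
}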

Becoming MR. Fancy Pants
Once you have the basics of the script down you can custom automate and SEO every aspect of your site. You can do the same thing you did with your title tag with your heading tags. As an example, you can even create priority headings *wink*. You can go as far as doing dynamic keyword insertion by putting placeholders into your text such as %keyword%, or even a long nonsense string that'll never show up in the actual text, such as 557365204c534920772f205468697320546563686e6971756520546f20446f6d696e617465. With that you can create perfect keyword density. If you haven't read my super old post on manipulating page freshness factors you definitely should, because this module can automate perfect timing on content updates for each page. Once you have it built you can get as advanced and dialed in as you'd like.
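
The placeholder trick itself is a one-liner; something like this, with $top_keyword being whatever your module decided is currently winning for that page:

# swap every %keyword% placeholder in the page body for the current top keyword
$page_text =~ s/%keyword%/$top_keyword/g;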

How This Works For Your Benefit
Here's the science behind the technique. It's all about creating better odds of each of your subpages hitting the perfect keywords with the optimal traffic that page, with its current link building, can accomplish. In all honesty, done manually, your odds are slim to none, and I'll explain why. A great example of these odds in play is the range in competitiveness and volume by niche. For instance, you build a site around a homes-for-sale database, you do a bit of keyword research, and you figure out that "Homes For Sale In California" is an awesome keyword with tons of traffic and low competition. So you optimize all your pages for "Homes For Sale In $state". Without knowing it, you may have just missed out on a big opportunity, because while "Homes For Sale In California" may be a great keyword for that subpage, "New York Homes" may be a better one for another subpage, or maybe "Homes For Sale In Texas" is too competitive and "Homes In Texas" has less search volume but is a keyword your subpage can actually rank for when it can't rank for the former. You just missed out on all that easy traffic like a chump. Don't feel bad; more than likely your competitors did as well. :)

Another large advantage comes from the assumption that short tail terms tend to have more search volume than long tail terms. Say you have a page with the keywords "Used Car Lots" and "Used Car". As your site gets some age and you get more links to it, that page will more than likely rank for Used Car Lots sooner than Used Car. By that same token, once it's ranked for Used Car Lots for awhile and you get more and more links and authority, since Used Car is part of Used Car Lots you'll become more likely to start ranking for Used Car, and here's the important part. Initially, since Used Car Lots is your first ranking keyword, it will rack up a lot of counts. However, once you start ranking for the even higher volume keyword, even if it is a lower rank (e.g. you rank #2 for Used Car Lot and only #9 for Used Car), the counts will start evening out. Once the better keyword outcounts the not-as-good one, your page will automatically change to be more optimized for the higher traffic keyword while still being optimized for the lesser one. So while you may drop to #5 or so for Used Car Lot, your page will be better optimized to push up to, say, #7 for Used Car. Which results in that subpage getting the absolute most traffic it can possibly get at any single time frame in the site's lifespan. This is a hell of a lot better than making a future guesstimate on how much authority that subpage will have a year down the road and its ability to achieve rankings WHILE you're building the fucking thing; because even if you're right and call it perfectly and that page does indeed start to rank for Used Car, in the meantime you missed out on all the potential traffic Used Car Lot could have gotten you. Also keep in mind that by rankings I don't necessarily always mean the top 10. Sometimes rankings that result in traffic can go as low as the 3rd page, and hell, if that page 3 ranking gives you more traffic than the #1 slot for another keyword, fuck that other keyword! Go for the gold at all times.

What About Prerankings?
See, this is what the threshold is for! If your page hasn't achieved any rankings yet, then it isn't getting any new entry traffic you care about. So the page should be optimized for ALL, or at least 3-6, of your keywords (whatever limit you set). This gives the subpage at least a chance at ranking for any one of the keywords while at the same time giving it the MOST keywords pushing its relevancy up. What I mean by that is, your LCD page hasn't achieved rankings yet, therefore it isn't pushing its content towards either TV or Televisions. Since it has both essentially equaled out on the page, the page is more relevant to both keywords instead of only a single dominant one. So when it links to your Plasma Television subpage it still has the specific keyword Television instead of just TV, thus upping the relevancy of your internal linking. Which brings up the final advanced tip I'll leave you with.

Use the module to create optimal internal linking. You already have the pages and the keywords; it's a very easy and short revision. Pass the page text or the navigation to your module. Have it parse for all links. If it finds a link that matches the domain of the current page (useful variable), have it grab the top keyword count for that other page and replace the anchor text. Boom! You just got perfectly optimized internal linking that will only get better over time. :)
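
Here's a rough sketch of that revision using HTML::TreeBuilder. top_keyword_for() stands in for whatever lookup your module already does against the keyword table:

use HTML::TreeBuilder;

sub optimize_internal_links {
    my ($html, $domain, $dbh) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # only touch links pointing back into our own domain
    for my $a ($tree->look_down(_tag => 'a', href => qr/\Q$domain\E/)) {
        my $top = top_keyword_for($dbh, $a->attr('href'));   # highest-count keyword for that page
        next unless $top;
        $a->delete_content;          # wipe the old anchor text
        $a->push_content($top);      # drop in the winning keyword
    }

    my $out = $tree->as_HTML;
    $tree->delete;                   # TreeBuilder trees need explicit cleanup
    return $out;
}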

There ya go, naysayers. Now you can say you've learned an SEO technique that's both pure white hat and, no matter how simply you explain it, very much advanced.

Guides | 05 Oct 2007 12:04 pm

Alrighty
Let’s discuss the SEO Empire Part 1 post.

The post covered a ton of information very quickly and talked about a lot of different types of example sites for each level. But the million dollar question remains: why the structure? Why not just stick with the proven practice of, if you throw enough mud at the wall, eventually something will stick? Why not just build site after site until something makes you money? After all, that is how most Internet marketers have made it. Well, the answer comes from history.

I made a post about this time last year called Float Like A Porn Site Sting Like A Sales Page, which basically talked about learning from history and from the types of sites that are ahead of us in the game. I've always been on the fringe of the webmaster community, and I'm a firm believer that fringe websites such as Adult, Warez, Music, Poker, and Pharma are years ahead of the mainstream as far as attracting traffic and conversions. So what works for them now will eventually make its way into mainstream marketing.

Back in '97-'00 there was a trend amongst Adult, Warez, and MP3 sites to use Top Sites Lists as a method of sustaining traffic growth. At the time, in order to make any headway into those niches you had to have at least 3k unique visitors/day. That was about the standard for any site within those groups to be considered semi successful. That differs quite a bit today, but back then it made them fairly competitive. The owners of the top sites lists eventually figured out that the best way of propelling their growth was to create a ring of promotion. Remember those things called popups? Haha. So what they would do is, instead of creating one top sites list and heavily promoting it until it became big, they would create three. Each one would target a different but similar topic. They would manually edit the stats so the other two top sites lists would show up top on their list, thus getting the most traffic from incoming votes from the other sites on the lists. Then they would manually go through each top sites list they had and sign it up with dozens of other top sites lists that made the closest match to their topic. After a few manual click-throughs and votes every day, their ring of sites would start pulling in bottom level traffic; that bottom level traffic would more than likely go to another one of their top sites lists via the list until it eventually voted out. Since most top sites lists have a higher OUT traffic ratio than IN due to other traffic sources, their traffic would climb exponentially amongst the small top sites network. Eventually, once each top sites list started becoming super successful and getting lots of other legitimate webmasters signing up, they would slowly phase out the top sites lists they had manually signed up for and end up with nothing but genuine traffic. Traffic that had a tendency to get passed around their mini network, mind you. Once the traffic on the promotional ring started reaching its critical mass, the webmasters started looking at ways to better monetize it. When DoubleClick (MS CPC), BabylonX (adult affiliate), and Casino Gold (gambling) banners at the top wouldn't suffice to keep up with the 10-20k visitors/day worth of then-expensive bandwidth, they started looking into what would be considered the modern affiliate sites. Which of course spawned a boom in TGPs, webrings, and even affiliate landing pages.

This, from my own experience, was the defining proof that network building works. Mostly because the webmasters that built networks back then, in at least a similar fashion, are still around today, whereas many poop flingers, sad to say, never made it past the early 2000 Internet bubble. The idea of a network works; we just have to apply it to our mainstream system of making money.

So what exactly are the direct benefits of the foundation and basement levels of our SEO Empire? Well, there are two primary benefits. The first is the cost efficiency versus the return on the investments. Like most starting Internet Marketers, it's good to start out smart. While there's the need for immediate income for personal reasons, there's also the need to make the investments count for as much as possible. Growth is very important in your first year in the industry. With foundation and basement sites you get the most growth and immediate income possible while making less risky investments. In other words, it's immediate and sustainable money. The money, over time, also isn't dependent on how much immediate work you put into it. The work you put in this month will give recurring income next month and a year down the road. So you're never treading water as far as growth is concerned. Forward momentum is very important no matter what level of business you're at. So at the very least, do it because it will make you money.

The indirect benefits aren't as apparent without experience but are by far the most important. Every foundation and basement site you build increases the leverage you have for the next site. As you build, the total leverage you have for your money sites increases exponentially. So breaking the cardinal rule of "every site must pay its own rent" doesn't do justice to the fact that these sites are worth a fortune down the road. There's no bullshit when I say that, and I'll explain how it works. It begins with the fundamental concepts of Entry Points (future post) and Link Building Leverage.

Link Building Leverage
I talked a little about Link Building Leverage in my SERP Domination post, but SEO Empire takes it one step further. It actually gives you an immediate competitive standing within the niche. There are four basic stages that make up a good link campaign, each of which, if neglected, results in no or dismal rankings.

Indexing - Covers all aspects of getting into the search engines, getting deep indexed, and minimizing supplementals.

Link Volume - The sheer quantity of links. In proportion to your competition, your site must match or exceed the rest of the sites.

Link Quality - Encompasses the relevancy and authority of the inbound links.

Link Saturation - The ratio of the volume of inbound links your site has versus the quantity actually indexed and counting in the indices. This was thoroughly covered in my Real Life SEO Example post and the Log Link Matching technique.

[Chart: Link building as it pertains to difficulty]

With the Y axis representing difficulty and the X axis representing rankings, the graph can be stretched to fit any ratio of keyword competitiveness in regards to link building. Notice that Link Volume and Link Quality have essentially the same worth, and in fact there is a blurred area between the two that allows one site with more link volume and slightly less link quality to outrank or underrank a site with the inverse values. They are essentially equal; they just come into play at different stages of the rankings. The same could nearly be said for indexing and link saturation. When going for a number 1 ranking you can't discount indexing as being of more value than link saturation. Many sites never even consider link saturation, and thus many sites don't achieve top rankings or have a hard time maintaining them. In other words, I kick their hippie rainbow chasing little asses.

So What About Difficulty/Availability?
Much like the traffic curves mentioned in the SEO Empire post, link building curves the same way. Indexing starts off very easy. It's just getting your site listed under the site: command with an accurate title and description. It then works into deep indexing, where, as the saturation levels rise, the difficulty in getting more pages in increases. It then leads into the removal of supplementals and peaks in technical difficulty. This is where most webmasters start their adventures. From this point on you are an SEO Pro. YOU WILL SPEND AS LITTLE TIME ON THE INDEXING PHASE AS POSSIBLE! Amateurs spend three weeks to three months working on this. You need to spend as little as 10 minutes and let the rest happen. I've given you the tools to do it, so there should be no excuses.

Link Volume is the next phase. This phase starts off easy. There are millions of sites and just as many link opportunities. At first it's very fast moving and it's easy to find a good quantity of links. After awhile, though, you run into what I affectionately call the Link Wall, where link volume meets the beginning of Link Quality and easy-to-grab links start to become a little more scarce, thus the difficulty begins to climb a bit.

Link Wall: Quote From SERP Domination
“You got to love the supposedly nonexistent brick wall of relevant links. Dipshits on newbie forums love telling people, “don’t worry about the rankings, just build some relevant links and they’ll come.” So you do just that, after all they have over 4,000 posts on that forum, it can’t all be complete garbage advice. At first it’s totally working, you’re gaining a good 50-200 very relevant links a day. You submit to directories and score a bunch of links from your competitors. After a couple weeks you even manage to score some big authority links within your niche. Suddenly it all starts to slow down. The sites that are willing to link already have, and the rest are holding firm. You’ve just hit the Relevant Link Wall. Don’t bother going back for further advice. They don’t have any, and if they did they are too busy trying to rank for Kitty Litter Paw Prints to help.”

That essentially, in more colorful wording, describes the Link Wall.

So when you're entering the Link Quality phase, at first it's difficult. You're new, so very few well established and ranked sites want to link to you. You have to scrounge around and dig for relevant links. Eventually, though, your site becomes established and more and more webmasters in the niche start to recognize it and link to it. Very much the same thing happens with blogging. Eventually you break through and you're on a downhill coast to rankings.

Suddenly, though, you reach around the top 10 area and things start to get competitive. You can't move up without shoving someone else down. They all have the same essential links as you and quite a few older ones that you don't. You can make up for it in link volume, but you need one last push to drive you to the top and keep you there. That's where the last phase, Link Saturation, enters the field. If you can become more efficient in your link building, you stand a much better chance of defeating them.

This process is essentially what common Internet Marketers have to go through every single time they release a new project. They are in a constant battle to get indexed, acquire link volume, scramble over the link wall, get some link quality, and then make those links count just to rank. This is why new sites take nearly a year to show their true potential; it's a long hard battle with each and every one to get the resources necessary to compete. When you've been in the business as long or longer than I have, you'll be the first to testify that this…shit…gets…old. Throwing mud at a wall hoping something will stick is fine, but when it takes that much time consuming effort on every throw it's easy to see why people drop out of the game so fast. Welcome to McDonalds, may I take your order?

There is an easier way, and that's what I'm trying to get at with the first part of the SEO Empire post. Not only is it easier and better, but every step is also 100% profitable while you're doing it. It may not be the gigantic shortcut every aspiring Internet Marketer dreams of, but it is in all essentials the primary plan of most SEO Pros for a very good reason: the resources are recyclable. Let's talk about resources and how they play into the ranking phases. Here are a few I've given you….

[Chart: The applications of resources]

So in my SEO Empire post you can see what I was talking about when I said all my previous posts lead into that strategy. Throughout this blog I've given several methods of automatically skating through each and every stage of SEO that determines top rankings. So when you create a new site you no longer have to start from the beginning. In a sense you can nearly skip the entire process that takes most webmasters months to years to accomplish. Instead, when you enter a new niche, you start at the top of the Link Wall and only have to do the absolute minimal work to achieve the rankings. So with the basement and foundation levels of my SEO Empire, where do I start each site with respect to my competition? Where does my personal involvement end?

[Chart: Link building process after SEO Empire]

That's quite the workload difference, ain't it? I'd say so. Calling it an easier, shorter way is the understatement of the fucking century. I can build a site, launch it, spend a couple hours getting some relevant links from the competition or parallel competition (I'll explain in a future post), and within the same day as the launch my brand new site instantly has everything it requires to achieve a top ranking. I've literally done six months to a year's worth of work in a single day. It may take a couple months for the actual rankings to happen, but it's all automated; I don't have to worry about it or check up on it. I'm off to my next project. Meanwhile my competition is working every day struggling to keep up with me. To them I end up looking like some kind of sleep deprived SEO juggernaut, when the reality is I spent so little time on the site I didn't even have a chance to memorize the domain. Let's jump to some questions….

Questions

Why Build Up? Why Not Down?
I normally don't check my Technorati very often, but the other day I caught a very interesting one. Someone plans on writing an article similar to my SEO Empire Part 1, except teaching people how to build a money site first, then build the foundation and basement below it. It's essentially the same concept except you can jump straight to money sites. Here's the problem with that. Look at the chart above. That allows me to accomplish the requirements for any new site in a single day. If you build money sites and then build downward, it would take several days to a week or two worth of work to get what you need with every site launch. It kind of defeats the purpose and isn't practical in the long run. The second point is the speed at which you'll be adding new money sites. There's no point in building a foundation or basement site for the sole reason of promoting a single money site, so for every one you build you have to look back through your other money sites and add them in as well to your new foundation and basement site. Management of foundation and basement sites is easy; management of money sites becomes increasingly difficult the larger your empire grows. I'm still looking forward to reading the post though. :)

How Do You Manage Your Foundation and Basement Sites?
I was fortunate enough to preplan my empire through lots of premeditation and previous experience. So I developed a centralized system that allows for easy management. For every page I create with my database sites, hosted blackhat, spammed blackhat, etc. etc., I put in a database call to a single table that holds all the pages of every site I make and a list of links for each one. When I create a new database site I loop through all the pages and give them a number (primary key IDs), and I make a small note of their keywords. Such as in my example, I said a certain page would have the keywords Remax Real Estate In Portland Oregon. The keywords that would be inserted into the link inventory database would be remax, real, estate, portland, oregon. The same goes for my cycle sites: the .htaccess goes to a script that gives the error message -Keywords- Sorry this site has been shut down due to terms of service violation -sincerely DoucheHosting Inc. -ads- (hehe, still gotta make the money). The script then pulls from the cycle site database, which also has its keywords. If it gets assigned a site, it does a 301 permanent redirect to that site instead of displaying the error page. So whenever I create a new money site, all I have to do is enter my dashboard and put in its keywords, URL, anchor text, and how many links I want. It'll come up with a list of all the pages/cycle sites that have a relative match to the keywords, along with a count of how many total links. I select as many or all that I want; then if I want any more I can go through and select irrelevant ones for as much as I need, or specify alternate keywords that'll work as well, such as land, homes, garden, pets or some shit along those lines. Building thousands of links becomes a 5 minute job.
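
The dashboard lookup itself doesn't need to be fancy. Assuming a link_inventory table with a page_url and a keywords column (my table names here are hypothetical), the match is just a handful of LIKEs in a bit of DBI:

my @kw    = qw(real estate portland oregon);                  # the new money site's keywords
my $where = join ' OR ', ('keywords LIKE ?') x @kw;
my $pages = $dbh->selectcol_arrayref(
    "SELECT page_url FROM link_inventory WHERE $where",
    undef, map { "%$_%" } @kw);
printf "%d candidate pages to drop links on\n", scalar @$pages;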

How Long Did It Take You To Build Your SEO Empire?
It took me about a year to get it up to the level I'm happy with. It wasn't a big deal, because during that entire year and process I was making money. The foundation and basements have been very successful for me and have continually brought in bigger and bigger checks every month (rule #2). So it was not some rough time I had to bear through. I'm also always building on my SEO Empire. I treat it, much like it is, as a 9-5 job. If you already have a 9-5, make the SEO Empire your 7pm-9pm job. Hell, 1am-5am job. It doesn't matter. The reality is, you can build a respectable and workable foundation and basement within only one month. Then as you get more time and more money just keep building it outward (rule #3). It's an ongoing process. If you're new to it, try building one of each type of site mentioned. You can have that done by the end of the week. Once you've got the initial ones built, by recycling much of the code you can have 9 more of each type built by the end of next week. It's all very easy and quick once you sit down and actually do it. I'm not writing this shit out of vanity, ya know :) I want you to actually accomplish it.

How Long Should I Spend On My Foundation Sites?
The best advice I can give you: make your first one absolutely beautiful. Be proud of it, get a few opinions from your IM friends (rule #4). As you build more you'll quit caring and you'll get sloppier and sloppier. You'll see the same results, but that template pack you started using will start wearing thin and you'll start going for the easier-to-change templates. You'll still always have that prideful one to show your affiliate managers, and the volume to match. Most importantly, I say this because the more time and thought you invest in your first site, the fewer problems you'll run into on your second.

Help! I Don’t Have Access To SQUIRT! And You Just Mentioned It In That Little Chart Thingy.
I did a post on How To Build Your Own SQUIRT. I didn't post that immediately after the launch to brag or to deter people from signing up. I want people to build it; if you read Blue Hat regularly it shouldn't be some big secret how it works. I laid it out for you in plain English, there it is. A lot of people have the skills but not the financial resources; it's not their fault and we were all there at one point. If that's you, read the post and go for it, buddy. If you run into something you're unsure of, email me and I'll see if I can help you through it (rule #4). Just don't let a little setback deter you from your goals.

Adsense Is A Possible Footprint. Everything Is Ruined Before I Even Made My First Site! My Empire Is Crumbling!
*stares blankly at Kenneth*………*hands him a Problem Solving For Dummies book and a red pill*

You’ll be alright buddy

Guides | 03 Jun 2007 08:45 pm

Gray Hat SEO. Gray Hat SEO. Gray Hat SEO. I'm not keyword stuffing, I'm thinking. If gray hat is so widely accepted as a popular tactic, then why are there no good articles on what it is and how to do it? Maybe, and this is just one theory, it's fuckin' impossible to write about. It's just too damn much of a gray area in the industry (pun intended). Well, we'll see about that. :)

So what exactly is Gray Hat SEO? Most define it as a site that uses questionable tactics. I think that's an excellent analysis, because the number one rule of gray hat is to be questionable. In fact, the more you can get a trained eye to scratch their head wondering if your site is black hat or white, the better. If you can fool the average visitor, then you will more than likely fool an SE bot. In my opinion the best way to do this is to have an innate eye for structure.

In designing a structure for your gray hat site, the best way to go about it is to steal a structure from a site that couldn't possibly be banned. Let's take Digg.com for example. Digg is set up in various primary categories, each of which contains news related stories. Each news story consists of a title with a link to the original article and a small, 255 character or so description. Each news story is accompanied by some user contributed content. From an indifferent perspective this is a very questionable structure. The content itself is very short and isn't organized like a standardized article would be, and the user contributed content is always very short and dispersed. Not to mention there is a glaring lack of control over the length and quality of the user contributed "comments." However, the structure gives us some possibilities. We know that since Digg, Reddit, and other social sites of that likeness are authoritative and standardized in the industry, the search engine antispam algorithms couldn't possibly automatically consider the structure in bad taste or a sign of a black hat site. This gives us a huge opportunity for a possible gray hat site.

Okay, so the next step would logically be to figure out our content sources. Sticking with the Digg.com example, we know that they get their content from large and small news related stories, mostly technology, but that can be swapped for whatever niche we decide to target. This is a great place to start because Google has already been quoted as saying news related stories can't be counted as duplicate content because they are so widely syndicated. It only makes sense. So getting news stories is easy. In fact it can be done by scraping lots of popular news RSS feeds. If we're attempting to duplicate the Digg structure, then we don't need the entire articles, despite what we've come to believe about SEO. We only need the partial story along with a title, and then we pad it with some user contributed content.

Where are we going to get some user contributed content? In this particular example I can't think of a better place than the place we're ripping off the structure from. We might as well pad each article with scraped user comments from Digg itself. So we can take the title of each news piece and remove all the common words such as: "why","but","i","a","about","an","are","as","at","be","by","com","de","en","for","from","how","in","is","it","la","of","on","or","that","the","this","to","was","what","when","where","who","will","with","www","and","if", plus any various others we find that aren't commonly associated with the article subject. Then we can do a search on Digg and scrape several comments from the results, change up the usernames, and make it all look unique. Hell, if we wanted we could even markov some user submitted content in the middle of the scraped user content. Naturally not every piece of user contributed content will match the topic exactly, but who's to say it's not real? Once again, as long as you remain "questionable" who can possibly deny you rankings? "Wow great article.", "I pee'd in your pool." <- users always submit this kind of shit, search engines are used to it and are more than well adapted to handle it. I realize this goes against the long preached world of "poison words" and such, but with the evolving net of social networking that thinking directly conflicts with the true nature of the web and thus must be compensated for. Remember, it's not what content you have, it's how you use it. If it helps, think about it like this. If you took all the comments on Youtube (which are the majority of their actual text based content) and concatenated them all together with paragraph markers, how fast would you get banned? In no time, right? However, when clones of places like Youtube organize it under headings of Comments and heavily break it up, somehow it all becomes legit. Take a moment to think about that.
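
Stripping the common words out of a title before throwing it at Digg's search is about as simple as it sounds; a quick Perl sketch:

my %stop = map { $_ => 1 } qw(
    why but i a about an are as at be by com de en for from how in is it la
    of on or that the this to was what when where who will with www and if
);

sub strip_common_words {
    my ($title) = @_;
    # keep only the words that actually describe the story
    return join ' ', grep { !$stop{lc $_} } split /\s+/, $title;
}

# strip_common_words("Why The New Console Is About To Get Cheaper")
#   -> "New Console Get Cheaper"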

More on this in a later post…

So now that we've got our two elements of a successful gray hat site, we can cook 'em up together. We can even create a little mock voting and submission system. It doesn't even have to work properly, just as long as upon inspection it all looks legit. It's all 100% autogenerated of course, but as long as it's laid out cleanly and correctly there's no reason why we can't generate hundreds of thousands of pages of stolen content while keeping visitors and the SEs none the wiser. There's no doubt we can do this exact technique for just about any authority site on the web. Let's jump back to our Youtube example for a moment. Youtube isn't the first nor the last video site on the web. As far as actual text based content goes it obviously takes a large piece of the brown cake. So how does it get away with it, and how do all its pages rank and do well while your clean and lengthy white hat articles struggle? Some would argue links are the answer. Well, not all video content sites have tons of links, but they can still survive and don't get immediately banned for spammy content; we'll humor the notion anyway. So let's get some links. :)

Gray hat sites frequently have an advantage over black hat sites in link building because, since they can pass a human check, their links, more often than not, will tend to stick. So of course the first place I would go is to attempt trackback pings on all these stories. If my link ends up on a few authoritative news sites, great; on a few blogs, just as well. Since it's all legit, and we not only pass a human check with our essentially black hat site but are actually linking to them, there's no good reason the links won't have a high success rate. Which leads me to a bit of custom comment spam. Might as well find blog posts talking about stuff related to each story and leave something like, "I saw an interesting story related to this on www.blahblah.com/story123." Sure, why not; between just those two simple and common black hat techniques we somehow ended up with plenty of white hat links. That's the beauty of gray hat. If you can at least get people to question whether or not your site is legit, then you stand a very good chance of succeeding.

So essentially what we're trying to do is play around within the margins between the pros of black hat and white hat till we find a happy medium that is acceptable to both other webmasters and the search engine antispam algorithms. But how would an advanced Blue Hatter spin all this? :) Very good question. I personally would take a look at my potential competition. Since I'm taking these articles from other sources, they are the originals and I am the copy linking to them, so naturally they will beat me out in all the SERPs. I can still drive traffic off their coattails, perhaps by utilizing a few techniques to improve my CTR in the organics. However, I'm still never going to reach my full traffic potential with the current gray hat setup. This is mostly due to my article titles being exactly the same. I might have better luck if I change up the titles and monetize the surfers who search for slightly different variations of the topic. Let's say for instance that one of the titles is, "Hilton's Chiwawa Caught Snorting Coke In Background of Sex Video." Alright, so when I import my titles I can do a simple replacement algorithm to swap any instance of "Hilton" without the Paris for "Paris Hilton." Or "Congressman Paul" for "Congressman Ron Paul." If I wanted to capitalize on these possible search variations on a mass scale I could easily incorporate a thesaurus and swap out nouns, for instance. Rock becomes Stone…etc. etc. IMDB also has a huge database of celebrity names I could possibly use for the example above. It's all pretty endless and can get quite in depth, but I know if I do it right it'll pay off big time. I may even get lucky and hit pay dirt with a big story coming out where everyone searches for something similar but not quite the same as the original headlines.
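
The title swapping can be as dumb as a lookup table of short-name to full-name substitutions, which you'd build by hand or from something like the IMDB name list:

my %expand = (
    'Hilton'           => 'Paris Hilton',
    'Congressman Paul' => 'Congressman Ron Paul',
);

sub rewrite_title {
    my ($title) = @_;
    for my $short (keys %expand) {
        my $full = $expand{$short};
        # only expand when the full version isn't already in the title
        $title =~ s/\b\Q$short\E\b/$full/ unless $title =~ /\Q$full\E/;
    }
    return $title;
}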

And that is how you be a gray hat :)

For shits and giggles I want to throw one more possible site structure out there and get your opinions on it…. How about Del.icio.us?

Guides | 24 Nov 2006 01:49 pm

Well, I hope everyone had a great Thanksgiving. I love them turkey birds! I love them stuffed. I love them covered in gravy. I love the little gobbling noises they make. :)

Back to business. By now you should have at least a decent understanding of what scraping is and how to use it. We just need to continue on to the next most obvious step: crawling. A crawler is a script that simply makes a list of all the pages on a site you would like to scrape. Creating a decent and versatile crawler is of the utmost importance. A good crawler will not only be thorough but will weed out a lot of the bullshit big sites tend to have. There are many different methods of crawling a site; it really is limited only by your imagination. The one I'm going to cover in this post isn't the most efficient, but it is very simple to understand and thorough.

Since I don't feel like turning this post into a MySQL tutorial, I whipped up some quick code for a crawler script that will make a list of every page on a domain (supports subdomains) and put it into a newline-delimited text file. Here is an example script that will crawl a website and make an index of all the pages. For you master coders out there: I realize there are more efficient ways to code this (especially the file scanning portion), but I was going for simplicity. So bear with me.

The Script

Crawler.cgi

How To Use

Copy and paste the code into notepad and save it as crawler.cgi. Change the variables at the top. If you would like to exclude all the subdomains on the site, include the www. in front of the domain. If not, just leave it as the bare domain. Be very careful with the crawl dynamic option. With crawl dynamic on, certain sites will cause this script to run for a VERY long time. In any crawler you design or use it is also a very good idea to set a limit on the maximum number of pages you would like to index. Once this is completed, upload crawler.cgi into your hosting's cgi-bin in ASCII mode. Set the chmod permissions to 755. Depending on your current server permissions you may also have to create a text file in the same directory called pages.txt and set its permissions to 666 or 777.

The Methodology
Create a database- Any database will work. I prefer SQL, but a flat file is great too because it can be used later on anything, including Windows apps.

Specify the starting URL you would like to crawl- In this instance the script will start at a domain. It can also index everything under a subpage as long as you don't include the trailing slash.

Pull the starting page- I used the LWP::Simple module. It's easy to use and easy to get started with if you have no prior experience.

Parse for all the links on the page- I use the HTML::LinkExtor module, which is part of the HTML::Parser distribution and usually installed alongside LWP. It will take the content from the LWP call and generate a list of all the links on the page. This includes links made on images.

Remove unwanted links- Be sure to remove any links it grabs that are unwanted. In this example I removed links to images, flash, javascript files, and css files. Also be sure to remove any links that don't exist within the specified domain. Test and retest your results on this. There are many more you will find that need to be removed before you actually start the scraping process. It is very site dependent.

Check your database for duplicates- Scan through your new links and make sure none already exist in your database. If they do, remove them.

Add the remaining links to your database- In this example I appended the links to the bottom of the text file.

Rinse and repeat- Move to the next page in your database and do the same thing. In this instance I used a while loop to cycle through the text file till it reaches the end. When it finally reaches the end of the file, the script is done and it can be assumed every crawlable page on the site has been accounted for. (A bare-bones sketch of the whole loop follows below.)
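
Putting those steps together, a stripped-down crawler might look something like this. This is not the crawler.cgi linked above, just a sketch using LWP::Simple and HTML::LinkExtor with a hard page cap:

use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::LinkExtor;
use URI;

my $start     = 'http://www.example.com/';    # starting domain (example only)
my $max_pages = 500;                          # always cap your crawlers
my $host      = URI->new($start)->host;

my @queue = ($start);
my %seen  = ($start => 1);
my @pages;

while (@queue && @pages < $max_pages) {
    my $url  = shift @queue;
    my $html = get($url) or next;             # pull the page
    push @pages, $url;

    my $extor = HTML::LinkExtor->new(undef, $url);   # base URL absolutizes relative links
    $extor->parse($html);
    $extor->eof;

    for my $link ($extor->links) {
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' && $attr{href};       # skip image, css, and javascript links
        my $abs = URI->new($attr{href})->canonical;
        next unless $abs->scheme && $abs->scheme =~ /^https?$/;
        next unless ($abs->host || '') eq $host;      # stay on the same domain
        next if $abs->path =~ /\.(jpe?g|gif|png|css|js|swf)$/i;
        $abs->fragment(undef);                        # drop #anchors
        next if $seen{$abs}++;                        # no duplicates
        push @queue, "$abs";
    }
}

open my $fh, '>', 'pages.txt' or die $!;
print {$fh} "$_\n" for @pages;                        # one URL per line, like pages.txt above
close $fh;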

This method is called the pyramid crawl. There are many different methods of crawling a website. Here are a few to give you a good idea of your options.

Pyramid Crawl
This method assumes the website flows outward in an expanding fashion, like an upside-down pyramid. It starts with the initial page, which has links to pages 2, 3, 4, etc. Each one of those pages has more pages that it links to. They may also link back up the pyramid, but they also link further down. From the starting point the pyramid crawl moves its way down until every building block in the pyramid contains no unaccounted-for links.

Block Crawl
This type of crawl assumes a website flows in levels and dubs them "stages." It takes the first level (every link on the main page) and creates an index of them. It then takes all the pages on level one and uses their links to create level 2. This continues until it has reached a specified number of levels. This is a much less thorough method of crawling, but it accomplishes a very important task. Let's say you wanted to determine how deep your backlink is buried in a site. You could use this method to say your link is located on level 3 or level 17 or whatever. You could use this information to determine the average link depth of all your site's inbound links.

Linear Crawl
This method assumes a website flows in a set of linear links. You take the first link on the first page and crawl it. Then you take the first link on that page and crawl it. You repeat this until you reach a stopping point. Then you take the second link on the first page and crawl it. In other words, you work your way linearly through the website. This is also not a very thorough process, though it can be with a little work, for instance if on your second cycle you took the second link from the last page instead of the first and worked your way backwards. However, this crawl also has its purpose. Let's say you wanted to determine how prominent your backlink is on a site. The sooner your linear crawl finds your link, the more prominently, it can be assumed, the link is placed on the website.

Sitemap Crawl
This is exactly what it sounds like. You find their sitemap and crawl it. This is probably the quickest crawl method you can do.

Search Engine Crawl
Also very easy. You just crawl all the pages they have listed under the site: command in the search engine. This one has its obvious benefits.

Black Hatters: If you're looking for a sneaky way to get by that pesky little duplicate content filter, consider doing both the Pyramid Crawl and the Search Engine Crawl and then comparing your results. :)

For those of you who are new to crawling, you probably have a ton of questions about this. So feel free to ask them in the comments below, and the other readers and I will be happy to answer them the best we can.

Guides | 17 Nov 2006 06:46 am

In the spirit of releasing part four of my Wikipedia Links series, we're going to spend a couple posts delving into good ol' black hat, starting with, of course, scraping. I've been getting a few questions lately about scraping and how to do it. So I might as well get it all out of the way, explain the whole damn thing, and maybe someone will hear something they can use. Let's start at the beginning.

What exactly is scraping?
Scraping is one of those necessary evils that is used simply because writing 20,000+ pages of quality content is a real bitch. So when you're in need of tons of content really fast, what better way of getting it than copying it from someone else? Teachers in school never imagined you'd be making a living copying other people's work, did they? The basic idea behind scraping is to grab content from other sources and store it in a database for use later. Those uses include, but are not limited to, putting up huge websites very quickly, updating old websites with new information, creating blogs, filling your spam sites with content, and filling multimedia pages with actual text. Text isn't the only thing that can be scraped. Anything can be scraped: documents, images, videos, and anything else you could want for your website. Also, just about any source can be scraped. If you can view it or download it, chances are you can figure out a way to copy it. That, my friend, is what scraping is all about. It's easy, it's fast, and it works very very well. The potential is also limitless. For now let's begin with the basics and work our way into the advanced sector and eventually into actual usable code examples.

The goals behind scraping?
The ultimate goals behind scraping are the same as those behind actually writing content.
1) Cleanliness- Filter out as much garbage and as many useless tags as possible. The must-have goal behind a good scrape is to get the content clean, without any chunks of their templates or ads remaining in it.

2) Unique Content- The biggest money lies in finding and scraping content that doesn't exist in the engines yet. Another alternative lies in finding content produced by small timers who aren't even in the search engines and aren't popular enough for anyone to know the difference.

3) Quantity- The more the better! This also means finding tons of sources for your content instead of just taking content from one single place. The key here is to integrate many different content sources together seamlessly.

4) Authoritative Content- Try to find content that has already proven itself to be not only search engine friendly but also actually useful to the visitors. Forget everything you've ever heard about black hat SEO. It's not about providing a poor user experience; in fact it's exactly the opposite. Good content and a good user experience are what black hat strives for. That's the ultimate goal. The rest is just sloppiness.

Where do I scrape?
There are basically four general categories that all scraping sources fall into.
1) Feeds- Really Simple Syndication (RSS) feeds are one of the easiest forms of content to scrape. In fact, that is what RSS was designed for. Remember, not all scraping is stealing; it has its very legitimate uses. RSS feeds give you a quick and easy way to separate out the real content from the templates and other junk that may stand in your way. They also provide useful information about the content such as the date, direct link, author, and category. This helps in filtering out content you don't want.

2) Page Scrapes- Page scrapes involve grabbing an entire page of a website. Then, through a careful process that I'll go into in further detail later, you filter out the template and all the extra crap, grab just the content, and store it in your database.

3) Gophers- Other portions of the Internet that aren't websites. This includes many places like IRC, newsgroups….ah hell, here's a list -> Hot New List of Places To Scrape

4) Offline- Sources and databases that aren't online. As mentioned in the other post: encyclopedias, dictionary files, and let us not forget user manuals.

How Is Scraping Performed?

Scraping is done through a set methodology.
1) Pulling- First you grab the other site and download all its content and text. In the future I will refer to this as an LWP call, because LWP is the Perl module used to perform the pull action.

2) Parsing- Parsing is nothing short of an art. It involves taking the page's information (as an example) and removing everything that isn't the actual content (the template and ads, for instance).

3) Cleaning- Reformatting the content in preparation for your use. Make the content as clean as possible without any signs of the true source.

4) Storage- Any form of database will work. I prefer MySQL or even flat files (text files).

5) Rewrite- This is the optional step. Sometimes if you're scraping non-original content it helps to perform some small necessary changes to make it appear original. You'll learn soon enough that I don't waste my time scraping content that isn't original (i.e. it's already in the engines), and I focus most of my efforts on grabbing content that isn't used on any pages that already exist in the search engines.
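
To make the pull/parse/clean/store cycle concrete, here's a stripped-down feed scrape in Perl. The feed URL is just a placeholder, and XML::RSS is one of several modules that will handle the parsing for you:

use strict;
use warnings;
use LWP::Simple qw(get);
use XML::RSS;

my $feed_url = 'http://www.example.com/news/rss.xml';     # placeholder feed

# 1) Pulling
my $xml = get($feed_url) or die "couldn't pull $feed_url";

# 2) Parsing
my $rss = XML::RSS->new;
$rss->parse($xml);

open my $out, '>>', 'scraped.txt' or die $!;
for my $item (@{ $rss->{items} }) {
    my $title = $item->{title}       or next;
    my $body  = $item->{description} || '';

    # 3) Cleaning: strip leftover tags and squash whitespace
    $body =~ s/<[^>]+>//g;
    $body =~ s/\s+/ /g;

    # 4) Storage: flat file, one record per line, tab separated
    print {$out} join("\t", $title, $item->{link} || '', $body), "\n";
}
close $out;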

In the next couple posts in this series I'll start delving into each scrape type and source. I'll even see about giving out some code and useful resources to help you along the way. How many posts are going to be in this series? I really have no idea; it's one of those poorly planned out posts that I enjoy doing. So I guess as many as are necessary. Likewise, they'll follow suit with the rest of my series and get increasingly better as the understanding and knowledge of the processes progresses. Expect this series to get very advanced. I may even give out a few secrets I never planned on sharing, should I get a hair up my ass to do so.