Alrighty, I’m moving this post up a bit to answer a few questions. In my Real Life SEO Example post I talked a bit about the technique of Log Link Matching. It’s an awesome technique that deserves some attention of its own. So here we go. :)

Description
The reality of the search engines is that they only have a certain percentage of the documents on the web indexed. This is apparent from the saturation levels of your own sites; often you’re lucky to get 80% of a large site indexed. Unfortunately this means tons upon tons of the natural links out there aren’t being counted and aren’t passing proper credit to their respective targets. It’s a double-edged sword: your competitors actually have quite a few more links than it appears, and more than likely so do you. Naturally you can guess what has to be done. :)

Objective
Saturation usually refers to how many pages you have in the index compared to the total number of actual pages on your site. For instance, if you have a 100 page site and 44 pages are indexed, then you have 44% saturation. Since this is a topic that never really gets talked about, for the sake of making it easy on ourselves I’m going to refer to our goal as “link saturation”: the number of links you have showing in the index compared to your total actual inbound links. So if you have 100 links in the index but 200 actual, identifiable links, then you have 50% link saturation. That aside, our objective is to use methods of early detection to quickly identify inbound links to our sites, get them indexed, and if possible give them a bit of link power so the link to our site will count for more. The end result is a far more efficient link building campaign. It will also more than likely stir up a large percentage of long dormant links on older sites that have never used the Log Link Matching technique. First let’s focus on links we’ve already missed by taking a look at our log files.
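Quick aside before we hit the logs: just to nail the definition down, here’s the link saturation math in code form. This is a trivial sketch; the function name is my own shorthand, not a standard term.

PHP CODE
<?php
// Link saturation = indexed inbound links / actual inbound links.
function link_saturation($indexed, $actual) {
    return $actual > 0 ? round(($indexed / $actual) * 100) : 0;
}
echo link_saturation(100, 200); // prints 50, as in the example above
?>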

Methodology #1 - The Log Files
Our site’s common log files are a great indicator of new and old inbound links that the search engines may have missed. Most log files are located below the root of the public html folder. If you’re on a standard cPanel setup the path to the log file can be easily found by downloading and viewing your Awstats config file, which is usually located in /tmp/awstats/awstats.domain.com.conf. Around line 35 it’ll tell you the path of the log file: LogFile=”/usr/local/apache/domlogs/domain.com”. Typically your site’s Linux user has access to this file and can read it through a script. If not, then contact your hosting provider and ask for read access to the log.
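If you’d rather have a script dig the path up for you, here’s a rough sketch. It assumes the standard cPanel conf location mentioned above and a LogFile= line in the usual format, so adjust for your own domain.

PHP CODE
<?php
// Pull the access log path out of the Awstats config.
// The conf path below is the typical cPanel location; adjust for your domain.
$conf = file_get_contents('/tmp/awstats/awstats.domain.com.conf');
if (preg_match('/^LogFile="?([^"\r\n]+)"?/m', $conf, $match)) {
    echo "Log file is at: " . $match[1] . "\n";
} else {
    echo "Couldn't find a LogFile line - check the conf by hand.\n";
}
?>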

1) Open up the log file in a text editor, identify where all the referrers are, and parse them out so you have a nice list of all the sites that link to you. If you use Textpad you can click Tools - Sort, check Delete Duplicate Lines, and hit OK. That will clean up the huge list and cut it down to a manageable size. (If you’d rather script this step, there’s a sketch just after step 2.)

2) Once you have your list of links there are several routes you can take to get them indexed. These include, but aren’t limited to, creating a third party rolling site map, roll over sites, or even distributing the links through blogrolls within your network. Those of course are the more complicated and work intensive ways of doing it, but they’re by far the most effective simply because they involve direct static links (there’s a bare-bones sketch of a static link page a couple paragraphs down). The simplest of course would be to simply ping Blog Aggregators like the ones listed on Pingomatic or Pingoat. My recommendation: if you’re only getting a couple dozen links/day, or you’re getting a huge volume of links (200+/day), then use the static link methods because they are more efficient and can be monitored more closely. If you’re somewhere in between then there’s no reason you can’t just keep it simple, continuously ping the Blog Aggregators, and hope a high percentage eventually gets indexed. After enough pings they will all eventually get in anyways. It may just take awhile and it’s harder to monitor (one of the biggest hatreds in my life..hehe).
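As promised in step 1, here’s what the parsing step might look like as a script. Treat it as a sketch: it assumes a combined-format Apache log, and the file paths are placeholders you’ll need to swap for your own.

PHP CODE
<?php
// Step 1 as a script: pull every referrer out of a combined-format
// Apache access log and dedupe the list. Paths are placeholders.
$refs = array();
$fh = fopen('/usr/local/apache/domlogs/domain.com', 'r');
while (!feof($fh)) {
    $line = trim(fgets($fh));
    // In combined format the referrer is the second-to-last quoted field.
    if (preg_match('/"([^"]*)" "[^"]*"$/', $line, $match)) {
        $ref = $match[1];
        if ($ref != '' && $ref != '-') {
            $refs[$ref] = 1; // keying the array by URL dedupes for free
        }
    }
}
fclose($fh);
file_put_contents('referrers.txt', implode("\n", array_keys($refs)));
echo count($refs) . " unique referrers found\n";
?>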

There are several Windows applications that can help you mass ping this list of referral URLs. Since I use custom scripts myself instead of a single Windows app I have no strong recommendations for one, but feel free to browse around and find one you like. Another suggestion to clean up your list a bit: strip out any common referrers such as Google, MSN, and Yahoo referrals. That’ll at least save you a ton of wasted CPU time. Once you’ve gotten this taken care of you’ll want to start considering an automated way of doing this for any new links as they come in. I’ve got a few suggestions for this as well.
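Here’s a sketch of that cleanup, plus the kind of bare-bones static link page I mentioned in step 2. The referrers.txt file and the engine pattern are my assumptions; feed the pattern whatever junk referrers your own logs collect.

PHP CODE
<?php
// Strip search engine referrals from the list, then dump whatever's
// left into a plain static link page for the spiders to chew on.
$engines = '/(google|msn|yahoo)\./i'; // extend with your own unclean referrers
$clean = array();
foreach (file('referrers.txt') as $ref) {
    $ref = trim($ref);
    if ($ref != '' && !preg_match($engines, $ref)) {
        $clean[] = $ref;
    }
}
$html = "<html><body>\n";
foreach ($clean as $ref) {
    $html .= '<a href="' . htmlspecialchars($ref) . '">' . htmlspecialchars($ref) . "</a><br>\n";
}
$html .= "</body></html>";
file_put_contents('links.html', $html); // roll this page through your network
echo count($clean) . " links written to links.html\n";
?>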

Methodology #2 - Direct Referrals
Of course you can keep using the method above to monitor for new referrals, as long as you keep the list clean of duplicates. However it doesn’t hurt to consider accomplishing the task upon arrival. I talked a little bit about this last year in my Blog Ping Hack post, and the same principle applies, except instead of pinging the current page we’ll ping the referral if it exists.

1) First check to see if a referral exists when the user displays the page. If it does exist then have it open up the form submit for a place such as Pingomatic, automatically pinging all the services using the user’s browser. Here are a few examples of how to do it in various languages.

CGI CODE
use URI::Escape;
# Only ping when a referrer exists and it's NOT from our own domain.
if(($ENV{'HTTP_REFERER'} ne "") && ($ENV{'HTTP_REFERER'} !~ m/^http:\/\/(www\.)?\Q$mydomain\E/)) {
  my $ref = uri_escape($ENV{'HTTP_REFERER'}); # escape it for the query string
  print qq~<iframe src="http://pingomatic.com/ping/?title=$title&blogurl=$ref&rssurl=$ref&chk_weblogscom=on&chk_blogs=on&chk_technorati=on&chk_feedburner=on&chk_syndic8=on&chk_newsgator=on&chk_feedster=on&chk_myyahoo=on&chk_pubsubcom=on&chk_blogdigger=on&chk_blogrolling=on&chk_blogstreet=on&chk_moreover=on&chk_weblogalot=on&chk_icerocket=on&chk_audioweblogs=on&chk_rubhub=on&chk_geourl=on&chk_a2b=on&chk_blogshares=on" border="0" width="1" height="1"></iframe>~;
}

PHP CODE
// Only ping when a referrer exists and it's NOT from our own domain.
if($_SERVER['HTTP_REFERER'] != "" && !preg_match("/^http:\/\/(www\.)?" . preg_quote($mydomain, "/") . "/i", $_SERVER['HTTP_REFERER'])) {
  $ref = urlencode($_SERVER['HTTP_REFERER']); // escape it for the query string
  echo '<iframe src="http://pingomatic.com/ping/?title=' . urlencode($title) . '&blogurl=' . $ref . '&rssurl=' . $ref . '&chk_weblogscom=on&chk_blogs=on&chk_technorati=on&chk_feedburner=on&chk_syndic8=on&chk_newsgator=on&chk_feedster=on&chk_myyahoo=on&chk_pubsubcom=on&chk_blogdigger=on&chk_blogrolling=on&chk_blogstreet=on&chk_moreover=on&chk_weblogalot=on&chk_icerocket=on&chk_audioweblogs=on&chk_rubhub=on&chk_geourl=on&chk_a2b=on&chk_blogshares=on" border="0" width="1" height="1"></iframe>';
}

JAVASCRIPT CODE
I really don’t know. :) Can someone fill this in for me? It’s entirely possible I just don’t know Javascript regex well enough.
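In the meantime, here’s my rough, untested stab at it based on the logic above, using document.referrer. If you know Javascript better than I do, corrections are more than welcome.

<script type="text/javascript">
// Only ping when a referrer exists and it's NOT from our own domain.
var mydomain = "domain.com"; // swap in your own domain
var ref = document.referrer;
var own = new RegExp("^http:\\/\\/(www\\.)?" + mydomain.replace(/\./g, "\\."), "i");
if (ref != "" && !own.test(ref)) {
  var url = "http://pingomatic.com/ping/?title=" + encodeURIComponent(document.title)
    + "&blogurl=" + encodeURIComponent(ref) + "&rssurl=" + encodeURIComponent(ref)
    + "&chk_weblogscom=on&chk_blogs=on&chk_technorati=on&chk_feedburner=on&chk_syndic8=on&chk_newsgator=on&chk_feedster=on&chk_myyahoo=on&chk_pubsubcom=on&chk_blogdigger=on&chk_blogrolling=on&chk_blogstreet=on&chk_moreover=on&chk_weblogalot=on&chk_icerocket=on&chk_audioweblogs=on&chk_rubhub=on&chk_geourl=on&chk_a2b=on&chk_blogshares=on";
  // Write out the same invisible IFRAME the CGI and PHP versions use.
  document.write('<iframe src="' + url + '" border="0" width="1" height="1"></iframe>');
}
</script>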

This will check to see if a referrer exists. If it does, and it’s not a referrer from within your own domain, then it’ll display an invisible IFRAME that automatically submits the referrer to Pingomatic. If you want to get a bit advanced with it you could also filter out Google, MSN, and Yahoo referrers, or any other unclean referrers you get on a regular basis.
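For instance, in the PHP version you could throw a quick blacklist check in front of the IFRAME. The pattern below is just a starting point; extend it with whatever unclean referrers your logs collect.

PHP CODE
<?php
// Skip search engine referrals (and any other junk) before pinging.
$junk = "/(google|msn|yahoo)\./i";
if ($_SERVER['HTTP_REFERER'] != "" && !preg_match($junk, $_SERVER['HTTP_REFERER'])) {
    // ...drop the Pingomatic IFRAME from above in here...
}
?>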

If you have an older site and you use this technique you’ll probably be shocked as hell at how many actual links you already had. Like I mentioned in the other post, at first you’ll start seeing your links tripling and even quadrupling, but as also mentioned it’s just an illusion. You’ve had those links all along; they just didn’t count since they weren’t indexed in the engines. After that starts to plateau, as long as you keep it up you’ll notice a considerable difference in the efficiency and accuracy of your link saturation campaigns. I really believe this technique should be used on almost every site you target search traffic with. Link saturation is just too damn fundamental to be ignored. Yet, at the same time, it’s very good for those of us who are aware that it is not a common practice. Just the difference between your link saturation percentage and your competitors’ could be the difference between who outranks who.

Any other ideas for methods of early detection you can use to identify new inbound links? Technorati perhaps? How about ways to not only get the inbound links indexed but boost their credibility in an automated and efficient way? I didn’t mention this before, but when you’re pinging or rolling the pages through your indexing sites it doesn’t hurt to use YOUR anchor text. It won’t help much, but it never hurts to push the relevancy factor of your own site onto their pages while you’re at it.