Blue Hat Technique #18 - Link Saturation w/ Log Link Matching
Alrighty, I’m moving this post up a bit to answer a few questions. In my Real Life SEO Example post I talked a bit about the technique of Log Link Matching. It’s an awesome technique that deserves a bit of attention, so here we go.
Description
The reality is that the search engines only have a certain percentage of the documents on the web indexed. You can see this by looking at the saturation levels of your own sites; you’re often lucky to get 80% of a large site indexed. Unfortunately this means that tons upon tons of the natural links out there aren’t being counted or giving proper credit to their targets, including you. This is a double-edged sword: your competitors actually have quite a few more links than it appears, and more than likely so do you. Naturally you can guess what has to be done.
Objective
Saturation usually refers to how many of your pages are in the index compared to the total number of actual pages on your site. For instance, if you have a 100 page site and 44 pages are indexed, then you have 44% saturation. Since this is a topic that never really gets talked about, for the sake of making it easy on ourselves I’m going to refer to our goal as “link saturation.” In other words, the number of links you have showing in the index compared to your total actual inbound links. So if you have 100 links in the index but you really have 200 actual links that are identifiable, then you have 50% link saturation. That aside, our objective is to use methods of early detection to quickly identify inbound links to our sites, get them indexed, and if possible give them a bit of link power so the link to our site will count for more. The end result is a far more efficient link building campaign. It will also more than likely stir up a large percentage of long-dormant links on our older sites that haven’t used the Log Link Matching technique yet. First let’s focus on links we’ve already missed by taking a look at our log files.
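Both percentages above are the same ratio. As a minimal JavaScript sketch (the name saturationPercent is just an illustrative choice, not part of any tool mentioned here), it multiplies before dividing so the worked examples come out as exact whole numbers:

```javascript
// Saturation = indexed count / actual count, expressed as a percentage.
// Works for page saturation (indexed pages vs. real pages) and
// link saturation (links showing in the index vs. identifiable inbound links).
function saturationPercent(indexedCount, actualCount) {
  if (!(actualCount > 0)) {
    throw new RangeError("actualCount must be a positive number");
  }
  // Multiply first so 44/100 and 100/200 give exact whole percentages.
  return (indexedCount * 100) / actualCount;
}

console.log(saturationPercent(44, 100)); // prints 44 (page example from the text)
console.log(saturationPercent(100, 200)); // prints 50 (link example from the text)
```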
Methodology #1 - The Log Files
Our site’s log files are a great indicator of new and old inbound links that the search engines may have missed. Most log files are located below the root of the public html folder. If you’re on a standard CPanel setup, the path to the log file can be easily found by downloading and viewing your Awstats config file, which is usually located in /tmp/awstats/awstats.domain.com.conf. Around line 35 it’ll tell you the path of the log file: LogFile="/usr/local/apache/domlogs/domain.com". Typically your site’s Linux user has read access to this file, so a script can read it. If not, then contact your hosting provider and ask for read access to the log.
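If you’d rather pull that path out of the Awstats config with a script than by eye, a small sketch like this should do it. It assumes the directive looks like the LogFile= line quoted above (quoted or unquoted) and that commented-out lines start with #; logFileFromAwstatsConf is a hypothetical helper name.

```javascript
// Returns the value of the first active LogFile= directive in an Awstats config,
// or null if none is found. Lines starting with "#" are skipped because the
// pattern requires LogFile to be the first non-blank text on the line.
function logFileFromAwstatsConf(confText) {
  const match = /^[ \t]*LogFile[ \t]*=[ \t]*"?([^"\r\n]+)"?/m.exec(confText);
  return match ? match[1].trim() : null;
}

const sampleConf = [
  '# LogFile="/var/log/old.log"',
  'LogFile="/usr/local/apache/domlogs/domain.com"',
  "LogType=W",
].join("\n");
console.log(logFileFromAwstatsConf(sampleConf)); // prints /usr/local/apache/domlogs/domain.com
```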
1) Open up the log file in a text editor, identify the referrer field in each line, then parse the referrers out so you have a nice list of all the sites that link to you. If you use Textpad you can click Tools - Sort - Delete Duplicate Lines - OK. That will clean up the huge list and cut it down to a manageable size.
2) Once you have your list of links, there are several routes you can take to get them indexed. These include, but are not limited to, creating a third party rolling site map, roll over sites, or even distributing the links through blogrolls within your network. Those of course are the more complicated ways of doing it and also the most work intensive, but they’re by far the most effective simply because they involve using direct static links. The simplest, of course, is to ping Blog Aggregators like the ones listed on Pingomatic or Pingoat. My recommendation: if you are only getting a couple dozen links/day or are getting a huge volume of links (200+/day), then use the static link methods because they are more efficient and can be monitored more closely. If you’re somewhere in between, then there’s no reason you can’t just keep it simple, continuously ping Blog Aggregators, and hope a high percentage eventually get indexed. After enough pings they will all eventually get in anyway. It may just take a while and is harder to monitor (one of the biggest hatreds in my life..hehe).
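Step 1 can also be scripted instead of done in a text editor. This minimal JavaScript sketch assumes the log is in Apache’s “combined” format, where the referrer is the first quoted field after the status code and byte count (a referrer of "-" means none was sent); extractReferrers is a hypothetical name. Sorting and de-duplicating mirrors what the Textpad steps do.

```javascript
// One Apache "combined" log line:
// host ident user [time] "request" status bytes "referer" "user-agent"
// Quoted fields may contain backslash escapes (Apache escapes embedded quotes).
const COMBINED_LINE =
  /^\S+ \S+ \S+ \[[^\]]*\] "(?:[^"\\]|\\.)*" \d{3} \S+ "((?:[^"\\]|\\.)*)"/;

// Returns a sorted, de-duplicated list of referrers found in the log text.
function extractReferrers(logText) {
  const referrers = new Set();
  for (const line of logText.split(/\r?\n/)) {
    const match = COMBINED_LINE.exec(line);
    // "-" (or an empty field) means the browser sent no referrer.
    if (match && match[1] !== "-" && match[1] !== "") {
      referrers.add(match[1]);
    }
  }
  return [...referrers].sort();
}
```

In Node you could feed it the log with fs.readFileSync(path, "utf8") and print one referrer per line.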
There are several Windows applications that can help you mass ping this list of referral URLs. Since I use custom scripts instead of a single Windows app myself, I have no strong recommendations for one, but feel free to browse around and find one you like. Another suggestion to help clean up your list is to remove common referrers such as Google, MSN, and Yahoo search referrals. That’ll at least save you a ton of wasted CPU time. Once you’ve taken care of this, you’ll want to start considering an automated way of doing it for new links as they come in. I have a few suggestions for this as well.
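A minimal sketch of that cleanup, assuming the referrers are full URLs like the ones pulled from the log. The label list (google, msn, yahoo) comes from the search engines named above; matching on hostname labels is a simplification of mine, and myDomain is a placeholder for your own domain. It also drops clicks from within your own site.

```javascript
// Hostname labels of the search engines mentioned in the post. A referrer is
// dropped if any dot-separated label of its host matches (www.google.co.uk -> "google").
const SEARCH_ENGINE_LABELS = ["google", "msn", "yahoo"];

// True if the referrer is an absolute URL that is neither a search engine nor your own site.
function isCleanReferrer(referrer, myDomain) {
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  const own = myDomain.toLowerCase();
  if (host === own || host.endsWith("." + own)) return false; // internal click
  return !host.split(".").some((label) => SEARCH_ENGINE_LABELS.includes(label));
}

function cleanReferrers(referrers, myDomain) {
  return referrers.filter((ref) => isCleanReferrer(ref, myDomain));
}
```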
Methodology #2 - Direct Referrals
Of course you can keep repeating the method above to monitor for new referrals as long as you keep the list clean of duplicates. However, it doesn’t hurt to consider handling each referral as it arrives. I talked a little bit about this last year in my Blog Ping Hack post, and the same principle applies, except instead of pinging the current page we’ll ping the referral if it exists.
1) First check to see if a referral exists when the user displays the page. If it does exist, then have the page submit the form for a service such as Pingomatic, which pings all the services from the user’s browser. Here are a few examples of how to do it in various languages.
CGI CODE
use URI::Escape; # uri_escape() for the query-string values
my $ref = $ENV{'HTTP_REFERER'} || '';
if ($ref ne '' && $ref !~ m/^https?:\/\/(www\.)?\Q$mydomain\E\//i) { # skip empty and internal referrers
  my ($etitle, $eref) = (uri_escape($title), uri_escape($ref));
  print qq~<iframe src="http://pingomatic.com/ping/?title=$etitle&blogurl=$eref&rssurl=$eref&chk_weblogscom=on&chk_blogs=on&chk_technorati=on&chk_feedburner=on&chk_syndic8=on&chk_newsgator=on&chk_feedster=on&chk_myyahoo=on&chk_pubsubcom=on&chk_blogdigger=on&chk_blogrolling=on&chk_blogstreet=on&chk_moreover=on&chk_weblogalot=on&chk_icerocket=on&chk_audioweblogs=on&chk_rubhub=on&chk_geourl=on&chk_a2b=on&chk_blogshares=on" border="0" width="1" height="1"></iframe>~;
}
PHP CODE
$ref = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if ($ref != '' && !preg_match('/^https?:\/\/(www\.)?' . preg_quote($mydomain, '/') . '\//i', $ref)) { // skip empty and internal referrers
  $pingurl = 'http://pingomatic.com/ping/?title=' . urlencode($title) . '&blogurl=' . urlencode($ref) . '&rssurl=' . urlencode($ref) . '&chk_weblogscom=on&chk_blogs=on&chk_technorati=on&chk_feedburner=on&chk_syndic8=on&chk_newsgator=on&chk_feedster=on&chk_myyahoo=on&chk_pubsubcom=on&chk_blogdigger=on&chk_blogrolling=on&chk_blogstreet=on&chk_moreover=on&chk_weblogalot=on&chk_icerocket=on&chk_audioweblogs=on&chk_rubhub=on&chk_geourl=on&chk_a2b=on&chk_blogshares=on';
  echo '<iframe src="' . htmlspecialchars($pingurl) . '" border="0" width="1" height="1"></iframe>';
}
JAVASCRIPT CODE
I really don’t know.
Can someone fill this in for me? It’s entirely possible I just don’t know Javascript regex well enough.
This checks whether the referrer exists. If it does and it’s not a referrer from within your domain, then it displays an invisible IFRAME that automatically submits the referrer to PingOMatic. If you wanted to get a bit advanced with it, you could also check for Google, MSN, and Yahoo referrers or any other unclean referrers you get on a regular basis.
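The same decision can be made server side in any language. Here is a JavaScript (Node) sketch that returns the Pingomatic URL to put in the 1x1 iframe, or null when no ping should happen. The endpoint and chk_ parameter names are copied from the snippets above and I haven’t checked that Pingomatic still accepts them; pingUrlFor and pingIframe are hypothetical names. URLSearchParams takes care of encoding a referrer that itself contains & or ?.

```javascript
// Services switched on in the CGI/PHP snippets (sent as chk_<name>=on).
const PING_SERVICES = ["weblogscom", "blogs", "technorati", "feedburner", "syndic8",
  "newsgator", "feedster", "myyahoo", "pubsubcom", "blogdigger", "blogrolling",
  "blogstreet", "moreover", "weblogalot", "icerocket", "audioweblogs", "rubhub",
  "geourl", "a2b", "blogshares"];

// Ping only non-empty referrers that are not from myDomain (or its subdomains).
function pingUrlFor(referrer, myDomain, title) {
  if (!referrer) return null;
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return null;
  }
  const own = myDomain.toLowerCase();
  if (host === own || host.endsWith("." + own)) return null;
  const params = new URLSearchParams({ title, blogurl: referrer, rssurl: referrer });
  for (const service of PING_SERVICES) params.append("chk_" + service, "on");
  return "http://pingomatic.com/ping/?" + params.toString();
}

// 1x1 iframe markup; & must be written as &amp; inside an HTML attribute.
function pingIframe(url) {
  const safe = url.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return '<iframe src="' + safe + '" border="0" width="1" height="1"></iframe>';
}
```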
If you have an older site and you use this technique, you’ll probably be shocked as hell at how many actual links you already had. Like I mentioned in the other post, at first you’ll start seeing your links tripling and even quadrupling, but as also mentioned it’s just an illusion. You’ve had those links all along; they just didn’t count since they weren’t indexed in the engines. After that starts to plateau, as long as you keep it up you’ll notice a considerable difference in the efficiency and accuracy of your link saturation campaigns. I really believe this technique should be done on almost every site you use to target search traffic. Link Saturation is just too damn fundamental to be ignored. Yet, at the same time, it’s very good for those of us who are aware of it that it is not a common practice. The difference between your link saturation percentage and your competitors’ alone could decide who outranks whom.
Any other ideas for methods of early detection you can use to identify new inbound links? Technorati perhaps? How about ideas for ways to not only get the inbound links indexed but boost their credibility in an automated and efficient way? I didn’t mention this, but when you’re pinging or rolling the pages through your indexing sites, it doesn’t hurt to use YOUR anchor text. It won’t help much, but it never hurts to push the relevancy of your own site onto their pages while you’re at it.
Javascript:
if (document.referrer != '' && document.referrer.toLowerCase().indexOf(document.location.host.replace(/^www\./i, '').toLowerCase()) == -1) {
  var ttle = document.title; // this will submit to Pingomatic with YOUR post's title
  var ifr = document.createElement('iframe');
  ifr.src = 'http://pingomatic.com/ping/?title=' + encodeURIComponent(ttle) + '&blogurl=' + encodeURIComponent(document.referrer);
  ifr.frameBorder = '0';
  ifr.width = '1';
  ifr.height = '1';
  document.body.appendChild(ifr); // run this near the end of <body> so document.body exists
}
Sorry, not tested.
Usually, I use the Windows applications for pinging.
Even better than link exchange!
It just keeps coming
Now we just need to know how to get main competitors links out of the index. LOL.
I would prefer CGI or PHP code over JavaScript, because PHP can’t be disabled by the user hehe
Nice tip on the Textpad feature.
Does it matter which IP requests the ping from Pingomatic? If it doesn’t, wouldn’t it just be better to store the URLs in MySQL and ping them later from the server?
Another great post . . . I’m starting to slowly understand . . .
I got so inspired from this post that I had to make my own version of that script and incorporate it with the SEO Website CMS that my team is developing.
Thanks for the great post!
Cheers,
Venetsian
I lost my original post because I can’t add up!
Don’t use your primary network. Using (follow, noindex) on your static links is ineffective because you’re creating reciprocal links.
Do use a quality secondary network with one of bluehat’s many techniques for power indexing.
Caveat: newly discovered IBLs are more likely to belong to an already lousy neighborhood. Two factors will determine whether you help them out of the lousy neighborhood: the quality of your secondary network and the quality of the IBL pages.
Technorati is excellent and has helped us discover a boatload of low quality blog posts that link to us. Google Blog search can help find pages not indexed in Yahoo/MSN.
Would something similar be good for Google supplemental results in an attempt to get pages with our links out of the supplemental results?
And what would the best way of going about getting these pages out of the supplemental results be?
If I’m not mistaken, links on pages that are in the Sup Results don’t count.
Great post! If you’re a Linux user (who cares about Windows and some Textpad) and want to delete duplicate lines from a file, you can do it by typing this in the console:
uniq inputfile.txt > outputfile.txt
Cheers
Assuming the duplicate lines are on successive lines!
You’d better sort the file first:
sort inputfile.txt | uniq > outputfile.txt
Awesome post. I’ll have to work on this one…a lot. I’m sure with enough playing around I can begin to understand what you’re talking about.
In the case of the real time approach where we are pinging with the referral page, are there any issues with repeatedly pinging the same referrer page? Is it worth a short db routine to add the referrer page to the db so that you can keep track of it and only ping once per page?
Very good question. I actually proved this with the blog ping hack post. My point was that as long as the pings are coming from different IPs (the users’ IPs), then it doesn’t matter how often it happens. As a test I actually put the code up on Blue Hat for over a year, and the URL never got banned or stopped being successfully pinged.
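If you do want the once-per-page routine asked about above (or the store-and-ping-later idea from an earlier comment), the core of it is remembering when each referrer was last pinged. Here is an in-memory JavaScript sketch; createPingGate and its cooldown parameter are hypothetical, and a real site would keep this table in its database (MySQL was suggested above) so it survives restarts. Passing Infinity as the cooldown gives strict ping-once behavior.

```javascript
// Returns a function that says whether a referrer should be pinged now.
// A referrer is pinged again only after cooldownMs has passed since its last ping.
function createPingGate(cooldownMs) {
  const lastPinged = new Map(); // referrer URL -> timestamp of last ping (ms)
  return function shouldPing(referrer, now = Date.now()) {
    const previous = lastPinged.get(referrer);
    if (previous !== undefined && now - previous < cooldownMs) return false;
    lastPinged.set(referrer, now);
    return true;
  };
}
```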
As far as I understand this, you want to get links from sites which are indexed???
So you could turn it around and take content from sites which are not indexed.
The problem with method one is the “nice list of all the sites that link to you” is mostly referer spam from pills and casinos. There doesn’t seem to be a way to filter them out so I’ll try method 2 and verify the backlink first.
Can you speak on some fun uses of iframes and other fun things with visitors’ IP addresses?
Hi Folks!
My question is about the ping hack post.
Can it be used for a Blogspot blog?! Is there any way to modify it?!
;)
Niko, visit the link on my name it will generate the code for you in a general way, not just for wordpress.
Thanks mate!
Is it of any importance where to stick the code?!
I am not sure where to put it in Blogspot, but it seems that one per page of posts would be a start if you have an RSS feed.
Note that the ping hack linked above is for visitors to ping your blog. If you want the link saturation version (I have a tiny modification in mine that gets rid of the issues with having the bots visit the pinger, and gives keywords for titles), I will put that up on my site.
I condensed this info down and I loved the simplicity. I wonder if there are any other tactics for increasing link saturation…
1. Look at logs
2. Compare the logs against the links that have already been indexed
3. Use a ping bot/software to send each link that isn’t indexed to the search engines for indexing (higher link saturation = higher PR and more links counting)
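Steps 2 and 3 boil down to a set difference. This is a minimal JavaScript sketch, assuming you already have two lists of URLs: referrers from the logs, and the inbound links you can see in the index (how you collect the second list is up to you). URLs are compared as exact strings, so normalize trailing slashes or www. prefixes first if your sources differ; linkSaturationReport is a hypothetical name.

```javascript
// Compares referrers seen in the logs with the inbound links known to be indexed.
// Returns the links still waiting to be indexed and the current link saturation (%).
function linkSaturationReport(loggedLinks, indexedLinks) {
  const indexed = new Set(indexedLinks);
  const actual = new Set(loggedLinks);
  // Anything already showing in the index is also an actual inbound link.
  for (const url of indexed) actual.add(url);
  const notIndexed = [...actual].filter((url) => !indexed.has(url)).sort();
  const saturation = actual.size === 0 ? 0 : (indexed.size * 100) / actual.size;
  return { notIndexed, saturation };
}
```

The notIndexed list is what you would hand to your ping tool in step 3.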
Let me know of any new updates, I’m intrigued =o) Terribly cool
Social networking is the thing I feel works best for getting your pages indexed. LinkedIn, Digg, Twitter, Squidoo, you name it..
What is $title supposed to be set to?
What exactly is the third-party site map that was mentioned? Isn’t it correct that I can only make a sitemap that is on my domain, referencing only webpages in my domain?
Or did you mean just make a page full of referrer links (not an actual XML sitemap) and hope that page gets indexed?
This is the way that I do it in PHP. I made a page (click on my name) that will generate the code with random keywords as titles. I also filter out non-Mozilla browsers, which seems to get rid of a bunch of the spiders.
You rule. Thanks for that…
I was thinking maybe I should expand the domain filter to multiple domains, in case people have a network that they want to keep off the ping list. What do you think?
like block, domain domain … domaine
It could also be set to not ping when people click through from the popular search engines,
or maybe that’s a stupid idea and I should just add a feature that inserts a guy with a cellphone being eaten by a half-businesswoman-half-housewife. What do you guys think?
Yes, block my own domain, any specific domains of my choice, and maybe search engines.
Ok i have added it so you can set a list of domains to block and also it automatically blocks google, msn.com and yahoo.com
It works ultra snazzy.
So, removing Mozilla user agents keeps spiders from pinging?? Doesn’t that remove a LOT of users too, though? (I use my own script but found it interesting how you remove those agents.)
And for the record, I tested it on 1 site and that site’s visitors went up 20% within 3 days. Coincidence… possibly, but let’s hope not
It removes non-Mozilla user agents.
I have:
if(preg_match(";mozilla;i",$ua) != 1 or $ref == ""){
$good_ref=0;
}
$good_ref=0 means that the referer will not be pinged.
Maybe I should have written it as preg_match(…) == 0 to read easier, but they are logically equivalent.
Great to hear about the 20% increase in visitors. That must be worth an extra buck or two a day.
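Putting the rules from this exchange together (skip empty referrers, skip user agents without “mozilla”, skip a configurable list of domains plus google.com, msn.com and yahoo.com), a JavaScript version of that filter might look like this. It is a sketch, not the commenter’s actual script, and isGoodReferrer is a hypothetical name. Many modern crawlers also put “Mozilla” in their user agent string, so the check only removes the cruder bots.

```javascript
// Domains that are never pinged, per the description above; subdomains match too.
const ALWAYS_BLOCKED = ["google.com", "msn.com", "yahoo.com"];

// Returns true when the referrer should be pinged (the $good_ref idea from the PHP above).
function isGoodReferrer(referrer, userAgent, blockedDomains = []) {
  // Same rule as: preg_match(";mozilla;i", $ua) != 1 or $ref == ""  ->  no ping.
  if (!referrer || !/mozilla/i.test(userAgent || "")) return false;
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return false; // unparseable referrer
  }
  const blocked = ALWAYS_BLOCKED.concat(blockedDomains.map((d) => d.toLowerCase()));
  return !blocked.some((d) => host === d || host.endsWith("." + d));
}
```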
Thanks for the code. I grabbed the ‘Link saturation’ and ‘Visitor ping hack’ code from your site. Just one question: is it ok to use both scripts on a page, or will the SEs perhaps see this as a form of link farming/spamming? A lot of my pages have been de-indexed recently and I want to see if this was a factor (probably not).
For me, since I don’t have a good 404 handler right now, I added the extra twist of importing the log files into a database and searching for 404s. I have a lot of inlinks going to old removed pages, and now I’ll a) create the rewrite and, as Eli suggests, b) ping the pages
Also forgot to say: in my routine I also scrape the page that’s supposed to be linking to me and verify the link is really there, because some directories I am listed in use searches with HTTP POST instead of GET, so the referring page doesn’t have my link on it. Also, a lot of Mozilla users seem to change the referer in their browser.
Another idea I have implemented: you could scrape the page to verify the link to you is really there before pinging, because if the referring page is the result of an HTTP POST, for example, the URL alone might not have the link.
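The 404 idea can be tried without a database first. This JavaScript sketch groups the 404 requests in a combined-format log by the missing path and lists the referrers pointing at each one, which gives you both the rewrites to create and the pages to ping. find404Referrers is a hypothetical name, and the simple pattern assumes there are no escaped quotes inside the request field.

```javascript
// host ident user [time] "METHOD path PROTOCOL" status bytes "referer" ...
const REQUEST_LINE = /^\S+ \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"/;

// Map of missing path -> sorted list of referrers that linked to it.
function find404Referrers(logText) {
  const byPath = new Map();
  for (const line of logText.split(/\r?\n/)) {
    const match = REQUEST_LINE.exec(line);
    if (!match) continue;
    const [, path, status, referrer] = match;
    if (status !== "404" || referrer === "-" || referrer === "") continue;
    if (!byPath.has(path)) byPath.set(path, new Set());
    byPath.get(path).add(referrer);
  }
  return new Map([...byPath].map(([path, refSet]) => [path, [...refSet].sort()]));
}
```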
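A sketch of that verification step in JavaScript. pageLinksTo and referrerStillLinks are hypothetical names; the regex-based href scan is a rough stand-in for a real HTML parser (it misses links inserted by script and can be fooled by unusual markup), and the fetch wrapper assumes Node 18 or later, where fetch is built in.

```javascript
// True if the HTML contains an <a href> whose host is myDomain or one of its subdomains.
// Relative hrefs are resolved against pageUrl, so they count as links to the referring site.
function pageLinksTo(html, pageUrl, myDomain) {
  const own = myDomain.toLowerCase();
  const hrefRe = /<a\b[^>]*\bhref\s*=\s*["']?([^"'\s>]+)/gi;
  let match;
  while ((match = hrefRe.exec(html)) !== null) {
    let host;
    try {
      host = new URL(match[1], pageUrl).hostname.toLowerCase();
    } catch (e) {
      continue; // skip hrefs that are not valid URLs
    }
    if (host === own || host.endsWith("." + own)) return true;
  }
  return false;
}

// Fetches the referring page and checks it before pinging.
async function referrerStillLinks(referrer, myDomain) {
  const res = await fetch(referrer);
  if (!res.ok) return false;
  return pageLinksTo(await res.text(), referrer, myDomain);
}
```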
I did this on my web site using ASP.NET. In case anyone is interested, here’s how I did it.
In a master page near the end of the page, put the following:
In the codebehind:
// Requires: using System.Text.RegularExpressions; (for Regex)
protected void Page_Load(object sender, EventArgs e)
{
    if (Request.UrlReferrer != null)
    {
        // Skip clicks from the site's own domains (and localhost while testing).
        Regex domainRegex = new Regex(@"(?:azavia\.(?:com|net))|localhost");
        if (!domainRegex.IsMatch(Request.UrlReferrer.Host))
        {
            PingReferrerPanel.Visible = true;
            string referrer = Server.UrlEncode(Request.UrlReferrer.ToString());
            string title = Server.UrlEncode(Page.Title);
            PingReferrerIframe.Attributes["src"] = "http://pingomatic.com/ping/?title=" +
                title + "&blogurl=" + referrer + "&rssurl=" + referrer +
                "&chk_weblogscom=on&chk_blogs=on&chk_technorati=on&" +
                "chk_feedburner=on&chk_syndic8=on&chk_newsgator=on&" +
                "chk_feedster=on&chk_myyahoo=on&chk_pubsubcom=on&" +
                "chk_blogdigger=on&chk_blogrolling=on&" +
                "chk_blogstreet=on&chk_moreover=on&" +
                "chk_weblogalot=on&chk_icerocket=on&" +
                "chk_audioweblogs=on&chk_rubhub=on&chk_geourl=on&" +
                "chk_a2b=on&chk_blogshares=on";
        }
    }
}
Hope it helps someone.
Well that first bit of code didn’t post, so let’s try that again:
<asp:Panel ID="PingReferrerPanel" runat="server" Visible="false">
    <iframe id="PingReferrerIframe" runat="server" border="0" width="1" height="1" />
</asp:Panel>
Very nice!
Should give it a try!!
So, quick but basic question, what technique/tool do you use for working out what inbound links *are* in the Google index??
Just a question on the code. Why did you set width and height to 1 and not 0, is there a special reason?
Using a zero-size iframe can get you in trouble in some cases. Zero-size iframes are used by cheaters to load other pages without the surfers knowing it
yet to start this
Nice article. You explained a lot of things I didn’t know. I will try it today and share my results.
Nice article. Just a question, do you think that using the full url on internal links helps to increase the saturation levels?
Hi, I had some problems with the PHP version so I’ve made some alterations.
This one also pings autopinger.com
if($_SERVER['HTTP_REFERER'] != "" || preg_match("@^(?:http://)?([^/]+)@i", $_SERVER['HTTP_REFERER']) > 0) {
echo '';
echo '';
}