Power Indexing
If you’re just now joining us, we’ve been talking about creating huge Madlib Sites and powerful Link Laundering Sites. So we’ve built some massive sites with some serious ranking power. However, now we’re stuck with the problem of getting these puppies indexed quickly and thoroughly. If you’re anything like the average webmaster, you’re probably not used to dealing with getting a 20k+ page site fully indexed. It’s easier than it sounds; you just have to pay attention, do it right, and most importantly follow my disclaimer: these tips are only for sites in the 20k-200k page range. Anything less or more requires totally different strategies. Let’s begin by setting some goals for ourselves.
Crawling Goals
There are two very important aspects of getting your sites indexed right. First is coverage.
Coverage - Getting the spiders to the right areas of your site. I call these areas the “joints.” Joints are pages that many other important landing pages are connected to, or found through, yet which are buried deep within the site. For instance, if you have a directory-style listing with links at the bottom of the results saying “Page 1, 2, 3, 4… etc.”, that would be considered a joint page. It has no real value other than listing more pages within the site, and it holds no real SEO value itself, but it is important because it allows the spiders and visitors to find those landing pages. If you have a very large site it is of the utmost importance to get any joints on your site indexed first, because the other pages will naturally follow.
The second important factor is the sheer volume of spider visits.
Crawl Volume - This is the science of drawing as many spider visits to your site as possible. Volume is what gets the most pages indexed. Accuracy with the coverage is what keeps the spiders on track instead of just hitting your main page hundreds of times a day and never following the rest of the site.
This screenshot is from a new Madlib site of mine that is only about 10 days old with very few inbound links. It’s only showing the crawl stats from Feb. 1st-6th (today). As you can see, that’s over 5,700 hits/day from Google, followed shortly by MSN and Yahoo. It’s also worth noting that the SITE: command is equally impressive for such an infant site and is in the low xx,xxx range. So if you think that taking the time to develop a perfect mixture of Coverage and Crawl Volume isn’t worth the hassle, then the best of luck to you in this industry. For the rest of us, let’s learn how it is done.
Like I said, this is much easier than it sounds. We’re going to start off with a very basic tip, move on to a slightly more advanced method, and eventually end on a very advanced technique I call Rollover Sites. I’m going to leave you to choose your own level of involvement here. Feel free to follow along to the point of getting uncomfortable; there’s no need to fry brain cells trying to follow techniques you are not ready for. Skipping some of these tips by no means equals failure, and some of them will require technical know-how, so at least be ready for that.
Landing Page Inner Linking
This is the most basic of the steps. Let’s refer back to the dating site example in the Madlib Sites post. Each landing page is an individual city, and each landing page suggests dating in the cities nearby. The easiest way to do this is to look for the zipcodes numerically closest to the current match. Another is to grab the entries with the row ids immediately before and after the current row id, as in the sketch below. This causes the crawlers to move outward from each individual landing page until they reach every single landing page. Proper inner linking among landing pages should be common sense for anyone with experience, so I’ll move on. Just be sure to remember to include it, because it plays a very important role in getting your site properly indexed.
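Here is a minimal sketch of the row-id approach, assuming a hypothetical `cities` table with `id` and `city` columns (sqlite3 stands in for whatever database driver your site actually uses):

```python
# Hypothetical sketch of adjacent-row inner linking: every landing page links to the
# handful of landing pages whose row ids sit just before and after its own.
import sqlite3  # stand-in for the real database driver

def nearby_city_links(conn, current_id, span=3):
    """Return HTML links for the landing pages closest to this row id."""
    cur = conn.execute(
        "SELECT id, city FROM cities WHERE id BETWEEN ? AND ? AND id != ? ORDER BY id",
        (current_id - span, current_id + span, current_id),
    )
    # Each neighbor becomes an on-page link, so crawlers step outward from every
    # landing page until they have walked the whole table.
    return [f'<a href="/dating/{page_id}">Dating in {city}</a>' for page_id, city in cur]
```

The zipcode flavor works the same way; you just pick the entries whose zipcodes are numerically closest instead of the neighboring row ids.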
Reversed and Rolling Sitemaps
By now you’ve probably put up a simple sitemap on your site and linked to it on every page within your template. You’ve figured out very quickly that a big-ass site = a big-ass sitemap. It is common belief that the search engines treat a page that is an apparent sitemap differently in the number of links they are willing to follow, but when you’re dealing with a 20,000+ page site that’s no reason to view the sitemap’s indexing power any differently than any other page’s. Assume the bots will naturally follow only so many links on a given page, and optimize your sitemap with that reasoning in mind. If you have a normal 1-5,000 page site it’s perfectly fine to have a small sitemap that starts at the beginning and finishes with a link to the last page in the database. However, when you have a very large site like a Madlib site might produce, that becomes a foolish waste of time. Your main page naturally links to the landing pages with low row ids in the database, so those are the most apt to get crawled first. Why waste a sitemap that’s just going to get those crawled first as well? A good idea is to reverse the sitemap by changing your ORDER BY id into ORDER BY id DESC (descending, meaning the last pages show up first and the first pages show up last). This makes the pages that naturally show up last appear first in the sitemap, so they get prime attention. The crawlers will then index the frontal pages of your site at about the same time they index the deeply linked pages (the last pages). If your inner linking is set up right, they will work their way from the front and back of the site inward simultaneously until they reach the landing pages in the middle, which is much more efficient than working from front to back in a linear fashion.
An even better method is to create a rolling sitemap. For instance, if you have a 30,000 page site, have it list entries 30,000-1 for the first week, then 25,000-1 followed by 30,000-25,001 for the second week. The third week would be pages 20,000-1 followed by 30,000-20,001, and so on and so forth, eventually pushing each 5,000 page chunk to the top of the list while keeping the entire list intact and static. This causes the crawlers to eventually work their way from multiple entry points outward and inward at the same time. You can see why the rolling sitemap is the obvious choice for the pro wanting some serious efficiency from the present Crawl Volume; a sketch of the rotation follows below.
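A rough sketch of that rotation, assuming the ids come straight from the database and the rotation ticks over once per calendar week; the chunk size and function name are placeholders, not the exact implementation:

```python
# Hedged sketch of the rolling sitemap: the full id list stays in the sitemap at all
# times, but each week another 5,000-id chunk is moved to the top of the page.
from datetime import date

def rolling_sitemap(page_ids, chunk=5000):
    """Return every page id, reversed, with this week's chunk rotated to the front."""
    ordered = sorted(page_ids, reverse=True)      # plain reversed sitemap (ORDER BY id DESC)
    chunks = max(1, len(ordered) // chunk)        # how many rotations before the cycle repeats
    cut = (date.today().isocalendar()[1] % chunks) * chunk
    # e.g. one week: 30,000-1; the next: 25,000-1 followed by 30,000-25,001.
    # No id ever leaves the list; only the entry point shifts.
    return ordered[cut:] + ordered[:cut]
```

Whether you render the result as plain HTML links or an XML sitemap file is up to you; the important part is that the list stays static and complete while the top of it keeps shifting.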
Deep Linking
Deep linking from outside sites is probably the biggest factor in producing high Crawl Volume. A close second is using the above steps to show the crawlers the vast amounts of content they are missing. The most efficient way to get your massive sites indexed is to generate as many outside links as possible directly to the sites’ joint pages. Be sure to link to the joint pages from your more established sites as well as to the main page. There are some awesome ways to get a ton of deep links to your sites, but I’m going to save them for another post.
Rollover Sites
This is where it gets really cool. A rollover site is a specially designed site that grabs content from your other sites, gets its own pages indexed, and then rolls over: its pages die off to help the pages of your real site get indexed. Creating a good Rollover Site is easy; it just takes a bit of coding knowledge. First you create a main page that links to 50-100 subpages. Each subpage is populated with data from the databases of the large sites you want to get indexed (Madlib sites, for instance). Then you focus some link energy from your Link Laundering Sites and get the main page indexed and crawled as often as possible. This creates a site that is small and gets indexed very easily and quickly.
Then you create a daily cronjob that pulls the Google, Yahoo, and MSN APIs using the SITE: command. Parse all the results from the engines and compare them with the current list of pages the site has. Whenever a subpage (the pages that use the content of your large sites, excluding the main page) gets indexed in all three engines, have the script remove the page and replace it with a permanent 301 redirect to the target landing page of the large site. Then mark it in the database as “indexed.” This is best accomplished by adding a boolean column to your database called “indexed”; whenever it is set to true, your Rollover Sites ignore that entry and move on to the next one when creating their subpages.
It’s the automation that holds the beauty of this technique. Each subpage gets indexed very quickly, and when it redirects to the big site’s landing page, that page gets indexed instantly. Your Rollover Sites keep their constant count of subpages and just keep rolling them over and creating new ones until all of your large site’s pages are indexed. I can’t even describe the amazing results this produces. You can have absolutely no inbound links to a site and create both huge Crawl Volume and Crawl Coverage from just a few Rollover Sites in your arsenal. Right when you were starting to think Link Laundering Sites and Money Sites were all there was, huh?
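Below is a loose sketch of the daily cronjob’s logic. The `subpages` and `redirects` tables, their column names, and the `check_indexed()` stub are assumptions; plug in whatever SITE:/API index-checking method you actually have access to:

```python
# Hypothetical rollover pass: flag subpages that all three engines have indexed,
# then queue a permanent 301 from the dead subpage to the big site's landing page.
import sqlite3  # stand-in for the real database driver

def check_indexed(url):
    """Placeholder: return True once `url` shows up in Google, Yahoo, and MSN results."""
    raise NotImplementedError("query each engine's SITE:/URL lookup here")

def rollover_pass(conn):
    cur = conn.execute("SELECT id, url, target_url FROM subpages WHERE indexed = 0")
    for page_id, url, target_url in cur.fetchall():
        if check_indexed(url):
            # Mark the row so the subpage generator skips it and builds a fresh one...
            conn.execute("UPDATE subpages SET indexed = 1 WHERE id = ?", (page_id,))
            # ...and record the 301 the front controller will serve in the page's place.
            conn.execute(
                "INSERT INTO redirects (source_url, target_url, status) VALUES (?, ?, 301)",
                (url, target_url),
            )
    conn.commit()
```

Run something like this from a daily cron and the site rolls itself over; the main page and the subpage count never change, only which landing pages the subpages are borrowing content from.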
As you are probably imagining, I could easily write a whole e-book on just this one post. When you’re dealing with huge sites like Madlib sites, there are tons of possibilities for getting them indexed fully and quickly. However, if you focus on just these four tips and give them the attention they deserve, there really is no need to ever worry about how you’re going to get your 100,000+ page site indexed. A couple of people have already asked me if I think growing the sites slowly and naturally is better than producing the entire site at once. At this point I not only don’t care, but I have lost all fear of producing huge sites and presenting them to the engines in their entirety. I have also seen very few downsides to it, and definitely not enough to justify putting in the extra work to grow the sites slowly. If you’re doing your indexing strategies right, you should have no fear of the big sites either.
Comments (203)
These comments were imported from the original blog. New comments are closed.
Thank you for this great information; you write very well, which I like very much. I’m really impressed by your post.
Thanks for posting! I really like what you’ve acquired here; you should keep it up forever! Best of luck
Thank you!
Quit was a great tool
Too bad it’s gone ://
As I understand your rolling sitemap, you keep a full 30,000 links in your sitemap - the same links always - and just switch their order from time to time.
Is it very important to keep the map ’static’, i.e. to have all the links in the map at all times? I was thinking about just putting 2-3k in my sitemap at a time and then replacing them with a new batch now and then. Would that work?
Great question.
I think having a static sitemap is very important. In fact, keep it as static as possible. Like I mentioned in the article, only change it once a week or so. As far as having ALL the links on the sitemap at all times, it is important for sites in this size range. The technique you’re touching on is a chunked sitemap. It’s a sitemap technique for sites that are over the 200k page mark. I won’t get into too much detail on it, but basically what it boils down to is having multiple sitemaps on the same site, each one rolling at different intervals: one being a daily sitemap, one a weekly sitemap, and the third a monthly sitemap. Each one is larger than the one before it. I’ll eventually explain the whole thing in a post. For now don’t worry about creating a dynamic feature on your sitemaps, because it’s pretty useless without creating a chunked sitemap system, but we’ll dive into that some other time.
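A purely illustrative sketch of that chunked layout; the file names, sizes, and intervals below are assumptions, not the setup that will be described in that future post:

```python
# Speculative sketch of a chunked sitemap layout: several sitemaps on one site, each
# larger than the last, each rolling on its own schedule (daily, weekly, monthly).
from datetime import date

def rotate(ids, chunk, step):
    """Move `step` chunks from the front of the list to the back, keeping every id."""
    cut = (step % max(1, len(ids) // chunk)) * chunk
    return ids[cut:] + ids[:cut]

def chunked_sitemaps(page_ids):
    ordered = sorted(page_ids, reverse=True)
    today = date.today()
    return {
        "sitemap-daily.xml": rotate(ordered[:5000], 500, today.toordinal()),         # smallest, rolls daily
        "sitemap-weekly.xml": rotate(ordered[:25000], 2500, today.isocalendar()[1]),  # mid-sized, rolls weekly
        "sitemap-monthly.xml": rotate(ordered, 5000, today.month),                    # full list, rolls monthly
    }
```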
I totally agree with you phil
This is awesome
A sideline question… When you’re rolling out mega sites, do you stagger the growth? I don’t have the experience to know if it’s true, but I hear about the need to avoid mega jumps in website sizes (not to raise any flags)… Thanks
Bloop
Eli, Nice tips.
About the last method, it’s still a lot of link building work to get the rollover sites indexed. What would be the benefit of getting those indexed and 301ing all pages to the “real” site, rather than working on the “real” site’s link building directly? Best Regards
VJ
Great question!
Rollover sites only need to be built once. They can be used over and over to get all your future projects indexed. They are quite a bit of work, but they don’t just serve a one-time purpose where you’re done with them afterwards. If you are the type that is constantly building new projects, they become invaluable and well worth the effort.
Thank you for the answer. However, once you 301 all pages to the new site, won’t the rollover site automatically get de-indexed by the search engines? A 301 basically means that you have moved the document elsewhere, so why would the SE keep them in the index? You also mention that it will get about anything indexed “instantly,” which is surprising since it takes a few weeks for the SE to recognize the 301. What do you have to say about that? Thank you for your time!
VJ
Hi, your ideas are so different and great.
Thanks for sharing them.
Hi, very good posting. My English is not so good, but even I understood everything. Keep going and always write this clearly please. Thanks,
Egon
very good article
many thanks
thanks for the article
tim
keep it up
thanx
It is really interesting. I continuously look for these kinds of blogs, which give a lot of information. I get a lot of information from this and wish to complete this job. I read on the net about sorts, teaching and other blogs.
Helpful article, I do a lot of your tips now!
Thanks, and have a nice time
Great Inspirations for my Blog!
Thanks
Very useful information! It is what I’m looking for! Thanks for the info! Regards,
CoolTips2u
This is truly a great read for me. I have bookmarked it and I am looking forward to reading new articles.
Keep up the good work.
nice man
Great article
thanks