Sometime during your SEO career you will probably have to research a large volume of data. Here are some tips to make big projects easier:

  • Threading - If you want to programmatically check 1 million websites and each check takes 1 second, a single-threaded program will be busy for more than 11 days. Hence the need for multiple threads working in parallel. Instead of checking 1 site a second, your program can run 100 threads that each check 1 site a second and finish in under 3 hours, provided your connection and machine can keep up.
  • IP Limits - Many sites do not like automated programs. One way they deal with this is by limiting the number of requests per IP address before they place a temporary ban on requests from that IP. Be careful, because this temporary ban can become a permanent one. Find out what the limit is for any site you are going to research and stay within it. If you are using off-the-shelf software, make sure it has a feature that lets you control the volume of requests. If you are custom designing software, I don’t know why you are reading this post, since you should know this stuff already.
  • Know what you are getting into - Before starting a project, review it and make sure you understand its true scope so you can assign the right tools for the job. Recently I made this mistake and did not realize how big a research project I had started. Currently I am well past 1 million websites and using an underpowered tool. If I had followed this advice I could have had the results already. Instead I am running an impromptu test to see just how big a project my little program can handle and just how long it will take.
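
For anyone who does want to see the threading and IP-limit tips above in code, here is a minimal Python sketch. It is an illustration under assumptions, not the tool I am actually running: the names `RateLimiter`, `check_sites`, `check_one` and `estimated_hours` are made up for this example, and the single global throttle is a simplification (a real crawler would more likely keep one limit per target site).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hypothetical helper: at most max_per_second calls, shared by all threads."""

    def __init__(self, max_per_second):
        self.min_interval = 1.0 / max_per_second
        self.lock = threading.Lock()
        self.next_allowed = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so that waiting threads do not block each other from reserving slots.
        with self.lock:
            now = time.monotonic()
            slot = max(now, self.next_allowed)
            self.next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def check_sites(urls, check_one, workers=100, max_per_second=50):
    """Run check_one(url) for every URL on a thread pool, throttled.

    check_one is whatever per-site check you need (assumed to be supplied
    by the caller). Returns a dict mapping each URL to its result, or to
    the exception that the check raised.
    """
    limiter = RateLimiter(max_per_second)

    def task(url):
        limiter.wait()
        try:
            return url, check_one(url)
        except Exception as exc:  # one bad site should not stop the whole run
            return url, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, urls))


def estimated_hours(sites, seconds_per_site, workers):
    """Back-of-the-envelope run time, ignoring throttling and failures."""
    return sites * seconds_per_site / workers / 3600


if __name__ == "__main__":
    print(estimated_hours(1_000_000, 1, 1) / 24)  # days with a single thread
    print(estimated_hours(1_000_000, 1, 100))     # hours with 100 threads
```

Running the estimate before you start is also a cheap way to follow the third tip: if the number comes back in days rather than hours, you know you need a bigger tool. Keep in mind that the throttle caps your real speed, so 100 threads limited to 50 requests a second will take about twice as long as the estimate suggests.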
