There are a few reasons:
Reason 1: You attempted to crawl a domain without entering "www".
For example, you entered:
http://shenandoahconnection.com/
but the links on that page all point to URLs that start with http://www.
http://www.shenandoahconnection.com/aboutus.htm
http://www.shenandoahconnection.com/worldnews.htm
http://www.shenandoahconnection.com/testimonials.htm
http://www.shenandoahconnection.com/post-here.htm
http://www.shenandoahconnection.com/shenandoahadvertising.htm
http://www.shenandoahconnection.com/make-payment.htm
Our crawler is very STRICT in that sense and sees these as 2 different domains.
It has a simple rule to ONLY crawl within the domain you set to crawl.
So those links are skipped!
Solution: Crawl with www
http://www.shenandoahconnection.com/
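To illustrate the strict rule described above, here is a minimal sketch of an exact-hostname check. It is not our crawler's actual code; the function name `same_host` is hypothetical and only shows why the "www" and non-"www" versions are treated as different domains.

```python
from urllib.parse import urlparse


def same_host(start_url: str, link: str) -> bool:
    """Return True only when both URLs have exactly the same hostname.

    Hypothetical helper for illustration: a strict comparison like this
    treats "example.com" and "www.example.com" as two different domains.
    """
    return urlparse(start_url).hostname == urlparse(link).hostname


link = "http://www.shenandoahconnection.com/aboutus.htm"

# Crawl started without "www": the link's hostname does not match, so it is skipped.
print(same_host("http://shenandoahconnection.com/", link))      # False

# Crawl started with "www": the hostnames match, so the link is crawled.
print(same_host("http://www.shenandoahconnection.com/", link))  # True
```

Under this assumption, starting the crawl from the "www" address makes every link in the list above pass the check.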
Reason 2:
Our crawler only crawls URLs from the domain you set to crawl. ✅
All other external links (including subdomains) will be ignored ❌
All dynamic JS URLs will be ignored (here is a technical reason & solution 🤓)
Example: https://desyr.co.uk
If you view the page source*, you will see only these standard URLs:
❌ https://medium.com/desyr
*Pro-tip 👉 Install this Chrome add-on to easily see all URLs of a page.
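As a rough sketch of why dynamic JS URLs are missed, the example below reads only the static page source, the same way "view page source" does. The HTML in `page_source` is a made-up stand-in (not the real source of desyr.co.uk), and the `LinkCollector` class and the `/hypothetical-dynamic-page` path are assumptions for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags found in the static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page source: one static link plus a link that only
# exists after the browser runs the JavaScript.
page_source = """
<html><body>
<a href="https://medium.com/desyr">Blog</a>
<script>
  var a = document.createElement("a");
  a.href = "/hypothetical-dynamic-page";
  document.body.appendChild(a);
</script>
</body></html>
"""

collector = LinkCollector()
collector.feed(page_source)

# Only the static link is visible; the JS-created link never appears.
print(collector.links)  # ['https://medium.com/desyr']

# Keep only links on the crawled domain; the external one is dropped.
internal = [u for u in collector.links if urlparse(u).hostname == "desyr.co.uk"]
print(internal)  # []
```

In this sketch the only link found in the source is external, so nothing is left to crawl, which matches the ❌ result shown above.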
Reason 3: You entered a URL with a /directory/ in it.
More here >