Sitemap crawling is limited by three dimensions:
Max Pages: Total number of pages
Max Pages per /Directory/: Total number of pages per URL /directory/
Max Depth: Page-level depth
Depth is interpreted like so:
Example: with Max Depth = `2`, a crawl of Site.com/products/shoes/ will go as deep as /shoes/ and no further.
If you leave it blank, it will crawl all levels (allowed by your plan).
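One way to picture the depth rule is to count the path segments of a URL. The sketch below is only an illustration: the function names `page_depth` and `within_max_depth` are made up for this example, and it assumes that depth equals the number of non-empty path segments, with the homepage at depth 0. The crawler's real implementation may differ.

```python
from typing import Optional
from urllib.parse import urlparse


def page_depth(url: str) -> int:
    """Count non-empty path segments (assumption: the homepage is depth 0)."""
    # Add a scheme when missing so urlparse puts the host in netloc, not in path.
    if "://" not in url:
        url = "https://" + url
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def within_max_depth(url: str, max_depth: Optional[int]) -> bool:
    """None stands for a blank field; plan limits are not modeled here."""
    return max_depth is None or page_depth(url) <= max_depth


print(page_depth("Site.com/products/shoes/"))
print(within_max_depth("Site.com/products/shoes/", 2))
print(within_max_depth("Site.com/products/shoes/sneakers/", 2))
```

Under these assumptions, Site.com/products/shoes/ sits at depth 2, so it is kept when Max Depth = 2, while anything below /shoes/ is skipped.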
When you create a sitemap you can specify any of these limits or leave them blank, in which case they default to your plan's limits.
Take a site like Stripe.com: you can enter Max Pages = 3 to get a sitemap of three pages, and a second crawl may return a different three.
The outcome will always be slightly different because the crawler takes the first set of links it finds on the homepage, crawls them in random order, and finds more links on those child pages as it goes, until it is done.
The result depends on your site's URL structure, so any single result is only one possible outcome. The important thing is that you will always end up with 3 pages and no more.
Limiting results to a specific depth via Max Depth
You can enter “Max depth 1” to ignore pages deeper than level 1.
Leaving Max Pages or Max Depth ... empty
So what happens when you leave any of those fields empty?
Does it mean that there is no limit? Well, no.
In that case, a sitemap is going to use the workspace’s plan limit.
For example, if you are on a Free plan, leaving “Max pages” empty will crawl up to 25 pages, and leaving “Max depth” empty will crawl up to 2 levels, while a Freelancer plan will crawl up to 1500 pages and unlimited levels.
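The fallback to plan limits can be sketched as a simple lookup. The names `Limits`, `PLAN_LIMITS`, and `effective_limits` are invented for this illustration, and only the two plans quoted above are included. `None` stands for an empty field on the input side and for "unlimited" on the plan side. What happens when you enter a value above your plan's limit is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_pages: Optional[int]  # None = unlimited
    max_depth: Optional[int]  # None = unlimited


# Plan values taken from the paragraph above; other plans are omitted.
PLAN_LIMITS = {
    "free": Limits(max_pages=25, max_depth=2),
    "freelancer": Limits(max_pages=1500, max_depth=None),
}


def effective_limits(
    plan: str,
    max_pages: Optional[int] = None,
    max_depth: Optional[int] = None,
) -> Limits:
    """Use the entered value when present, otherwise the plan's limit."""
    plan_limits = PLAN_LIMITS[plan]
    return Limits(
        max_pages if max_pages is not None else plan_limits.max_pages,
        max_depth if max_depth is not None else plan_limits.max_depth,
    )


print(effective_limits("free"))
print(effective_limits("free", max_pages=10))
print(effective_limits("freelancer"))
```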
Reaching limits: Setting them manually vs plan's hard limits
What happens to the sitemap when the crawl reaches a limit?
Well, naturally that depends on crawling settings!
If you explicitly enter a limit, the crawler stops and the sitemap is finished when it hits that limit. If the limit is empty (so the crawl runs under the plan's limits), the crawl pauses when it reaches the plan's limit, so that you have the opportunity to upgrade to a higher plan to finish the crawl.
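As a rough mental model of this rule, here is a sketch for the page limit only. The function `crawl_state` and its state names are hypothetical, and how the page and depth fields interact when only one of them is filled in is not modeled.

```python
from typing import Optional


def crawl_state(
    pages_crawled: int,
    user_max_pages: Optional[int],
    plan_max_pages: int,
) -> str:
    """Return 'crawling', 'finished', or 'paused' for a page-count limit."""
    if user_max_pages is not None:
        # An explicit limit ends the crawl for good once it is reached.
        return "finished" if pages_crawled >= user_max_pages else "crawling"
    # An empty field falls back to the plan limit, which only pauses the crawl.
    return "paused" if pages_crawled >= plan_max_pages else "crawling"


print(crawl_state(10, user_max_pages=10, plan_max_pages=25))
print(crawl_state(25, user_max_pages=None, plan_max_pages=25))
```

The first call models an explicit limit of 10 pages on a Free plan, and the second models an empty field on the same plan.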
Paused vs Finished sitemaps
Why do these differences matter?
Paused sitemaps can be resumed and continue where the crawl left off, but once a limit (yours or the plan's) is reached and the sitemap stops, it’s done.
Finishing the sitemap is a requirement to use features like Editing and Visual Comparison. If you are exploring, it makes sense to work under the plan limits and let us pause the crawl, so that you can decide whether to upgrade and resume, or stop. If you know ahead of time what you want, e.g. if you are scheduling a visual comparison, it’s better to specify the limits.
Note: When resuming paused crawls, you also have the ability to change the Max-Page/Depth limits.
Changing your plan
Upgrading your plan will change the plan's limits instantly. Downgrading keeps the old limits until the end of your 30-day billing cycle.
Once you upgrade and resume a paused crawl, you will have the opportunity to configure the crawl's limits again. Stopped crawls do not have this option.
Examples
Let’s say that, under a Free plan, you crawl a site that has:
300 level-1 pages
25 level-2 pages
5 level-3 pages.
Let’s see what happens depending on the settings.
Max Pages = 10, Max Depth = 1: The sitemap will be finished with 10 pages, all level-1.
Max Pages = empty, Max Depth = 1: The sitemap will be finished with 25 pages, all level-1.
Max Pages = 25, Max Depth = 1: The sitemap will be finished with 25 pages, all level-1.
Max Pages = 25, Max Depth = empty: The sitemap will be finished with 25 pages. Levels depend on the site structure.
Max Pages = empty, Max Depth = empty: The sitemap will pause due to the max page limit of 25 pages, with a mix of level-1 and level-2 pages. The number of pages at each level depends on the site structure.
Now let’s say that, under a Free plan, you crawl a website with 20 level-1 pages, 2 level-2 pages, and 3 level-3 pages.
Max Pages = 22, Max Depth = empty: The sitemap will stop with 22 pages (level-1 and 2).
Max Pages = 25, Max Depth = empty: The sitemap will be paused due to the depth limit, with 22 pages (level-1 and 2) out of the 25 requested. This configuration shows that we pause when the only way to fulfill max-pages expectations is by crawling deeper.
Max Pages = 20, Max Depth = 1: The sitemap will be finished with 20 pages, all level-1.
Max Pages = empty, Max Depth = empty: The sitemap will pause due to the depth limit, with 22 pages (level-1 and 2).
Best practices:
🔥 If you are not sure what to do, just leave Max Depth empty!
This will ensure that the crawl will auto-pause if we find more pages on deeper levels, and allow you to upgrade to fully complete the crawl.