Advanced Settings
Custom Headers: Custom headers let your server identify our crawlers, in case you need to whitelist them so they can get through your security measures.
Crawl an XML URL:
The URL of an XML sitemap. We crawl only the URLs found in the XML sitemap; links on those pages will not be followed.
Crawl a CSV file:
Format: the first column must contain a URL.
We crawl only the URLs found in the file; links on those pages will not be followed. Remember to use a comma (,) as the delimiter and a double quote (") as the escape character.
The CSV may use a header row as its first row.
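To check a file against the format above before uploading it, a minimal Python sketch such as the following may help. The function name read_urls and the rule used to detect a header row (a first cell that does not start with http:// or https://) are assumptions made for illustration, not a description of how the crawler itself parses the file.

```python
import csv
import io

def read_urls(csv_text):
    """Return the first-column values of a comma-delimited CSV, skipping a header row."""
    # Comma as delimiter and double quote as quote/escape character, as described above.
    reader = csv.reader(io.StringIO(csv_text), delimiter=",", quotechar='"')
    urls = []
    for index, row in enumerate(reader):
        if not row or not row[0].strip():
            continue  # skip blank lines and rows with an empty first column
        cell = row[0].strip()
        # Assumption: a first row whose first cell is not an http(s) URL is a header.
        if index == 0 and not cell.lower().startswith(("http://", "https://")):
            continue
        urls.append(cell)
    return urls

sample = (
    'url,notes\n'
    'https://site.com/dog/,"toys, food"\n'
    '"https://site.com/search?q=a,b",second\n'
)
print(read_urls(sample))
```

The third row shows why the double quote matters: a URL that itself contains a comma has to be wrapped in quotes, otherwise it would be split into two columns.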
Advanced Filters
Include & Exclude Directories
Directory matches are now non-strict (aka "fuzzy"): a directory matches wherever it appears in the URL.
e.g. /dog/ will match either of these URLs:
site.com/dog/... OR
site.com/products/dog/...
e.g. /pets/dog/ will match either of these URLs:
site.com/pets/dog/... OR
site.com/products/pets/dog/...
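The fuzzy directory behavior can be pictured as a substring test on the URL path. The sketch below assumes that is all the match does; the function name matches_directory is hypothetical and the crawler's real implementation is not documented here.

```python
from urllib.parse import urlsplit

def matches_directory(url, directory):
    """Return True if the directory appears anywhere in the URL's path."""
    # Add a scheme when it is missing so urlsplit can separate host and path.
    path = urlsplit(url if "://" in url else "https://" + url).path
    # Assumption: "fuzzy" means a plain substring match on the path.
    return directory in path

for url in ("site.com/dog/toys.html",
            "site.com/products/dog/toys.html",
            "site.com/dogs/toys.html"):
    print(url, matches_directory(url, "/dog/"))
```

Under this reading, site.com/dogs/toys.html does not match /dog/ because the trailing slash is part of the directory value.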
Include & Exclude Keywords
Keyword matches are also non-strict ("fuzzy"), and keywords are more flexible than directories because they can match any part of the URL's path.
To include all URLs containing 'dog' (e.g. https://site.com/dog-products.html), enter dog. This will also include URLs such as https://site.com/categories/dogs/food
IMPORTANT!
For URLs that have parameters, like site.com/index.php?page=1:
to exclude all of those pages, use only the parameter as the keyword: page=
Do not use index.php?page=
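The keyword rules above can be sketched in Python as follows. The documentation does not explain why index.php?page= fails as a keyword; this sketch assumes, purely for illustration, that the path and the query string are tested separately, which would produce that behavior. The function name matches_keyword is hypothetical.

```python
from urllib.parse import urlsplit

def matches_keyword(url, keyword):
    """Return True if the keyword appears in the URL's path or in its query string."""
    parts = urlsplit(url if "://" in url else "https://" + url)
    # Assumption: path and query are checked separately, so a keyword that
    # spans the "?" boundary (such as "index.php?page=") matches neither part.
    return keyword in parts.path or keyword in parts.query

print(matches_keyword("https://site.com/dog-products.html", "dog"))
print(matches_keyword("https://site.com/categories/dogs/food", "dog"))
print(matches_keyword("site.com/index.php?page=1", "page="))
print(matches_keyword("site.com/index.php?page=1", "index.php?page="))
```

With that assumption, the first three calls match and the last one does not, which mirrors the advice to enter only page= as the keyword.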
Advanced Combos:
All of the filters can be used in combination with each other:
e.g. /directory/ + keyword = will return only results that strictly match both.
Example: Directories + Keywords together
will not crawl: site.com/products/dog/monkey-pants.html ✘
will not crawl: site.com/dog/monkey-pants.html ✘
will not crawl: site.com/dog/monkeys.html ✘
it will crawl: site.com/dog/foo.html ✅
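The "match both" rule can be sketched as a logical AND between the directory filter and the keyword filter. The filter values below (Exclude directory /dog/ and Exclude keyword monkey) are assumptions chosen because they would be consistent with the URLs listed above; the source does not state the values actually used, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

def url_path(url):
    """Return the path part of a URL, adding a scheme if it is missing."""
    return urlsplit(url if "://" in url else "https://" + url).path

def matches_combo(url, directory, keyword):
    """A URL matches the combination only if BOTH filters match its path."""
    path = url_path(url)
    return directory in path and keyword in path

def is_crawled(url, exclude_directory="/dog/", exclude_keyword="monkey"):
    # Assumed Exclude filters: a URL is skipped only when it matches both.
    return not matches_combo(url, exclude_directory, exclude_keyword)

for url in ("site.com/products/dog/monkey-pants.html",
            "site.com/dog/monkey-pants.html",
            "site.com/dog/monkeys.html",
            "site.com/dog/foo.html"):
    print(url, "crawl" if is_crawled(url) else "skip")
```

Only site.com/dog/foo.html is crawled in this sketch: it sits under /dog/ but does not contain monkey, so it does not match both filters.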
_________________
Multiple values within each filter use "OR" logic.
That is, if you enter 3 keywords (dog, cat, monkey), then the URL will pass the check as long as it has dog OR cat OR monkey in its path.
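The OR logic within a single filter can be sketched with Python's any(). The function name matches_any_keyword is hypothetical, and the substring test is the same assumption about fuzzy matching used for single keywords.

```python
from urllib.parse import urlsplit

def matches_any_keyword(url, keywords):
    """Return True if at least one keyword appears in the URL's path."""
    path = urlsplit(url if "://" in url else "https://" + url).path
    # OR logic: one matching keyword is enough for the URL to pass the check.
    return any(keyword in path for keyword in keywords)

keywords = ["dog", "cat", "monkey"]
print(matches_any_keyword("site.com/tags/cat/", keywords))
print(matches_any_keyword("site.com/category/monkey/", keywords))
print(matches_any_keyword("site.com/birds/", keywords))
```

The first two URLs pass because each contains one of the three keywords; the third contains none of them.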
Example: Multiple Keywords
will not crawl: site.com/dog/foo.html ✘
will not crawl: site.com/products/dog/... ✘
will not crawl: site.com/tags/cat/... ✘
will not crawl: site.com/category/monkey/... ✘
it will crawl: site.com/cats/foo.html ✅
Example: Multiple /Directories/ + Multiple Keywords
will not crawl: site.com/dog/monkey-pants.html ✘
will not crawl: site.com/cat/monkeys.html ✘
will not crawl: site.com/dog/birds/.. ✘
will not crawl: site.com/monkey/.. ✘
will not crawl: site.com/bird-cages/.. ✘
it will crawl: site.com/dog-cages/ ✅
it will crawl: site.com/categories/cat-litters/ ✅
Example: Exclude + Include Combos:
will not crawl: site.com/cake/monkeys.html ✘
will not crawl: site.com/cake/monkey-pants/foo.html ✘
will not crawl: site.com/monkey/cakes.html ✘
it will crawl: site.com/cake/foo.html ✅
*Include and Exclude filters can also cancel each other out, for example if you use:
Exclude Directory "/cat/" + Include Keyword "cat"
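How an Include filter and an Exclude filter interact can be sketched as below. The sketch assumes that an Exclude match always wins over an Include match, which the source does not state. The function name should_crawl is hypothetical, and the Include directory /cake/ and Exclude keyword monkey are assumed values that would be consistent with the example URLs above.

```python
from urllib.parse import urlsplit

def should_crawl(url, include=None, exclude=None):
    """Decide whether a URL is crawled, given one Include and one Exclude value."""
    path = urlsplit(url if "://" in url else "https://" + url).path
    # Assumption: exclusion takes precedence over inclusion.
    if exclude is not None and exclude in path:
        return False
    # With no Include filter, everything that is not excluded is crawled.
    return include is None or include in path

# Assumed filters: Include directory /cake/ + Exclude keyword monkey.
print(should_crawl("site.com/cake/monkeys.html", include="/cake/", exclude="monkey"))
print(should_crawl("site.com/cake/foo.html", include="/cake/", exclude="monkey"))

# Cancelling combination: Exclude directory /cat/ + Include keyword cat.
print(should_crawl("site.com/cat/food.html", include="cat", exclude="/cat/"))
```

In the last call the URL contains the Include keyword cat, but it also sits under the excluded /cat/ directory, so under this assumption it is not crawled.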