A description of the features of Dead Link Checker can be found here.
Click here for a list of error codes and their meanings.
If you have a question that is not answered here, you can contact us at:
info@deadlinkchecker.com
Q: What are dead/broken links?
Dead or broken links are links in a web page which do not take the user to where the author intended. This may be because the link was incorrectly set up by the site's author, or because the destination web page no longer exists, or because there is some other problem with accessing the destination (for example, the server is not responding). When a dead link is clicked, an error code (e.g. 404 Not Found, or 500 Server Error) is normally returned by the server, or a timeout occurs if the server does not exist or returns no information within a given time span.
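As a rough illustration of the idea (a sketch only, not how Dead Link Checker itself is implemented), a link check comes down to requesting each URL and inspecting the response, or noting the absence of one:

# Sketch: request a URL and treat 4xx/5xx status codes, timeouts and
# unreachable servers as dead links. Illustrative only.
import urllib.error
import urllib.request

def check_link(url, timeout=10):
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status    # e.g. 200 OK
    except urllib.error.HTTPError as error:
        return error.code             # e.g. 404 Not Found, 500 Server Error
    except OSError:
        return None                   # server not found, refused, or timed out

status = check_link("http://www.example.com/some-page.html")
print("dead link" if status is None or status >= 400 else "link OK")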
Q: Why do I need to check my website for broken links? How can I check my website for broken links?
Broken links are a source of irritation to users, and will cause your site to have a lower ranking in search engine results. Sites with lower rankings receive fewer visitors. Use deadlinkchecker.com to find broken links on your site so that they can be fixed, restoring the site's rankings and attracting more web traffic.
Q: Is this service free?
The interactive version of this service is free. You can enter the URL of a website and it will be scanned for dead links. If you have signed in with a valid email address, you can request that a report be emailed to you when the scan is complete, and you can specify multiple websites to be scanned in a single operation. If you make a small monthly payment to subscribe to our automatic checker then you can set up a daily, weekly or monthly schedule for a number of sites to be scanned automatically with no further interaction required. Reports will be emailed to you containing details of any dead links found on your web pages.
Q: What is SEO?
SEO stands for Search Engine Optimisation. Search engines such as Google, Bing, etc. order their results using a number of criteria, including the perceived quality of the site being indexed. SEO is designed to ensure that a site is ranked as high as possible in a search engine's results. By removing dead links on your web pages you can signal to search engines that the site is an actively maintained and reliable source of information.
Q: What do the different server status codes mean?
Some common server status codes are:
- 302 - page temporarily moved to a new location (redirect)
- 403 - forbidden
- 404 - page not found
- 500 - internal server error
More information can be found here.
Q: How do I schedule a site scan?
Assuming you have subscribed to the auto-checker service and have logged in to DeadLinkChecker.com using your email address, you can
set up a scheduled scan by clicking the 'auto check' menu option. Clicking the button 'Add new scan' will pop up a small box in which you can
enter the URL of the website to scan, when the next scan should occur, how frequently the site is to be scanned, and the depth of the scan. Clicking
the 'Now' button will schedule the scan to run as soon as possible after you click 'Create'.
When a scan has been scheduled, you will be able to check on its status, or edit the details of the scan. After the scan has run you will be
emailed a report with details of any dead links found.
Q: How do I fix bad links?
To fix a bad link, you need to determine why the link is flagged as bad. If it is referring to a non-existent page (404 error) then it is likely that either the link has been mis-typed in the HTML source (in which case you need to edit it) or the destination page has been temporarily or permanently removed or renamed. If the destination no longer exists then you either need to link to a different page, or remove the link entirely. In any case, if you want to change the link then you will need to be able to edit the HTML of the website, or have access to someone who can do this for you.
Q: Can I check password protected pages?
No, unfortunately it is not possible to use Dead Link Checker to scan password protected pages. To do so would require our server to know and store username and password details for the site being scanned, which would represent a security risk.
Q: Can I restrict the scan to a subdirectory of my site?
Yes - if you start the scan from a page within a subdirectory, then all links found on that page will be checked for existence, but only pages within that subdirectory (or a descendant of it) will be
scanned for further links.
For example, starting at http://www.example.com/news/index.html would verify links to pages such as http://www.example.com/weather.html, but that page would not be scanned for more links since it is not within the news/ subdirectory. However, the page http://www.example.com/news/current/story1.html would be scanned for links, since it is within the news/ subdirectory.
Note that this is a feature of the auto-checker only.
Q: What is a robots.txt file?
At the root of most websites there exists a file called robots.txt - for example, http://www.microsoft.com/robots.txt. This file is used to tell any system that scans the site (such as a link checker, or a search engine's site crawler) which pages or folders should not be scanned or indexed. The robots.txt file can also be used to indicate the location of a sitemap, and to reduce the rate at which automated systems request pages from the site (using the 'crawl-delay' directive). Note that there is no obligation for a scanner to follow the instructions in robots.txt - it is merely a request to behave in a certain way - but well-behaved scanners will attempt to obey the directives. Dead Link Checker follows the 'disallow' and 'crawl-delay' directives where it can. The user-agent used by Dead Link Checker is 'www.deadlinkchecker.com'. For more information on robots.txt, see http://www.robotstxt.org/.
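For illustration, a robots.txt file using these directives might look something like the following (the paths and sitemap location below are placeholders, not recommendations):

User-agent: www.deadlinkchecker.com
Crawl-delay: 1

User-agent: *
Disallow: /private/
Disallow: /shoppingbasket/

Sitemap: http://www.example.com/sitemap.xml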
Q: Can I exclude certain links or subdirectories from the scan?
There are two ways to achieve this. If you are able to edit the robots.txt file on the website's server, you can add a directive which denies the link checker access to specific pages, or all pages within a certain subdirectory. For example, to exclude
the shoppingbasket/ folder (and all its descendants), add the following to robots.txt:
User-agent: www.deadlinkchecker.com
Disallow: /shoppingbasket/
Alternatively, when using the auto-checker there is an 'advanced' section in the scheduled scan editor where you can specify a list of strings that will be used to prevent specific URLs from being checked. URLs containing any of the strings specified in this section will not be checked for errors.
Q: Not all pages on my website have been scanned, why?
Dead Link Checker checks links on the initial page given to it, and if any of those links are on the same website then it also checks the linked pages for dead links, and so on. The depth of this scan is limited to 10 for a full scan, so any links which are only reachable by traversing more than that many steps will not be checked. In addition, pages which have no incoming links will not be discovered. A further consideration is that a change in the subdomain is treated as a different website, so links on pages on those domains will not be checked (even if they point back to the original domain). For example, when scanning the site http://www.example.com, a link to http://test.example.com would be checked, but the contents of that page would not be further scanned for links.
Links may also not be scanned if they are disallowed in the site's robots.txt file. In addition the Auto Checker will ignore child links on pages linked
using a 'rel=nofollow' attribute: the link itself will be verified but not scanned for further links. Pages containing a <meta name="robots" content="nofollow"> tag will similarly not
be checked for further links in the Auto Checker. This treatment of 'nofollow' links or pages can be overridden in the advanced section of the scan settings popup dialog, using the "Override 'nofollow' directives" checkbox.
Pages may not be correctly scanned if they use client-side JavaScript to download or generate content after the page has loaded in a browser, because the
scanner is unable to execute JavaScript. Any links which only occur in the downloaded/generated content will not be checked.
Q: Why does Dead Link Checker report links as bad although I can open them in my browser?
Under some circumstances Dead Link Checker might be unable to access a page which is accessible from your browser.
Our server is located in the USA, so if there is a problem accessing your site from there, or if your site
serves different content depending on (for example) the geo-location of the requester, or its user-agent or IP address, then you may see errors being reported.
Some sites take exception to being accessed by anything other than a standard web browser, and will return error codes such as 404 (Not Found) or 403 (Forbidden) to DeadLinkChecker.com.
Sometimes a scan is interpreted as an attack, which often results in the web server returning error codes such
as 404 (Not Found), 403 (Forbidden) or 503 (Service Unavailable) to DeadLinkChecker.com even though the pages are visible
in a browser. If possible, whitelist our server's IP address (74.208.42.172) so that the scan is always permitted.
Some web browsers will automatically correct URLs which are actually invalid. For example, URLs are
not allowed to contain a backslash '\' character. Chrome and IE seem to silently convert it to a forward slash '/' but
other browsers do not. Dead Link Checker will flag such URLs as errors.
Sometimes pages are temporarily unavailable, perhaps due to server loading issues. Dead Link Checker will retry such
links after a pause, but if it cannot access the page then it will be marked as a bad link even though it may be possible
to reach the page at a later time.
The Auto-Checker scan can be configured to ignore URLs which you know to be valid - click on the 'advanced' link
at the bottom of the Scheduled Scan edit box. You can also adjust the time DeadLinkChecker will wait before
deciding a page is unresponsive.
Q: Will the tool work on an intranet?
Dead Link Checker is an online tool which will not work on an intranet, because the webpages need to be visible to our server which is external to your intranet.
Q: Will the tool work on dynamic (ASP/JSP/PHP/Rails) sites?
Dead Link Checker will work on dynamic pages where the content is generated server-side. However it only checks any given URL once in a
scanning session, so if the active page content changes from one call to the next then only links found on the first encounter
will be processed.
Pages may not be correctly scanned if they use client-side JavaScript to download or generate content after the page has loaded in a browser, because the
scanner is unable to execute JavaScript. Any links which only occur in the downloaded/generated content will not be checked.
Q: What is the difference between the Site Checker, Multi-Site Checker, Auto-Checker and the Auto-Checker Premium and Professional services?
Site Checker is a free tool which allows you to scan a single website for dead links. Multi-Site Checker is also free to use but requires an email address to be used as a login name. You can then scan multiple sites in one sitting, and have a report automatically emailed to you at the end of the scan. Auto-Checker is our entry-level subscription service. For a small monthly fee you can have up to five sites scanned automatically on a regular schedule, with no further interaction required on your part. A report will be emailed to you after each scan, and is also available online. Auto-Checker Premium and Professional allow you to check a larger number of sites, with more links on each site.
Q: How can I reduce the load on my server when it is being scanned?
Dead Link Checker has been optimised to scan websites as quickly as possible, whilst automatically adjusting its scan rate to reduce server errors. However some servers may struggle if pages are requested too quickly, or the requests may be interpreted as a Denial-Of-Service attack. You can slow down the rate at which pages are requested by modifying the robots.txt file on your server, to include a section:
User-agent: www.deadlinkchecker.com
Crawl-delay: 1
This will restrict the page requests to approximately one per second. The scan will be slowed considerably but the server load will be reduced in proportion.
Alternatively when using the auto-check feature, you can access the 'advanced' settings in the scheduled scan editor and enter a value for the 'Interval' to specify a minimum duration between successive page requests on the website being scanned.
Q: How does Dead Link Checker identify itself when scanning a web site?
Dead Link Checker uses a user-agent string starting with 'www.deadlinkchecker.com' when requesting web pages.
The server's IP address is 74.208.42.172.
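For illustration only, the sketch below makes a request that identifies itself with a User-Agent header of this form, which is roughly how a request from the scanner appears to your server (this is not the scanner's own code):

# Illustrative only: send a request whose User-Agent header matches the
# string used by Dead Link Checker, e.g. so you can recognise it in logs.
import urllib.request

request = urllib.request.Request(
    "http://www.example.com/",
    headers={"User-Agent": "www.deadlinkchecker.com"},
)
with urllib.request.urlopen(request, timeout=10) as response:
    print(response.status, response.reason)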
Q: How can I check an on-going auto-checker scan?
When using the auto-checker you can visit the Auto Check page and see a list of scan schedules you have created. Details of any scans in progress are listed below this section - you can see statistics on the scan, the URL of a recently checked link, and buttons which allow you to terminate the scan or request an intermediate error report.
Q: Can I reduce the number of emails I receive?
When using the auto-checker you can edit the 'advanced' scan settings and tick the checkbox marked 'suppress email if no errors' - this will stop Dead Link Checker from sending you emails if a scheduled scan detected no errors.
Q: Can I change the email address that reports are sent to?
For security reasons, Dead Link Checker will only email reports to the registered account holder. You can change the email address for your account, when logged in, by clicking the settings icon at top right. Alternatively it should be possible to configure your email software to forward the report email to a third party, based on the sender and/or subject line.
Q: Will Dead Link Checker's scan affect Google Analytics results?
You can configure Google Analytics to filter out all requests from Dead Link Checker's IP address (see 'How does Dead Link Checker identify itself?'). For instructions on configuring Google Analytics, see here.
Q: Do pages with different query strings count as different pages?
Server-generated web pages can alter their content depending on parameters passed to them, so Dead Link Checker regards pages with differing query strings as distinct pages which are checked separately. For example, http://www.example.com/products?page=1 and http://www.example.com/products?page=2 would be checked as two separate pages.
Q: Can I import scan results into a spreadsheet such as Excel?
The reports generated by the subscription Auto-Check service can also be downloaded as a CSV file which can be saved to your computer and imported into Excel etc. for further analysis. A download link is shown at the foot of the online report.
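If you prefer to process the report programmatically rather than in a spreadsheet, a minimal sketch along these lines would work (the file name is only an example, and no particular column layout is assumed beyond a header row):

# Sketch: read the downloaded CSV report row by row. DictReader uses
# whatever headers appear in the first row of the file.
import csv

with open("deadlinkchecker_report.csv", newline="", encoding="utf-8") as report:
    for row in csv.DictReader(report):
        print(row)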
Q: What is a redirection loop?
When a URL is requested, the server can respond with a redirection status code (301 or 302) indicating a new location that should be requested for the resource. If this new location is the same as the original request, or if it in turn redirects to the original URL, then a redirection loop exists - the requests would cycle endlessly without ever returning the requested information. Dead Link Checker reports this situation as a Redirection Loop. Most browsers will give a similar error message after a number of redirections have been followed.
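As an illustration of the idea (a sketch, not Dead Link Checker's actual implementation), a scanner can detect a redirection loop by following redirects one hop at a time and remembering the URLs it has already passed through:

# Sketch: follow 301/302 redirects manually and report a loop if the same
# URL comes around again. Other error codes (e.g. 404) will raise HTTPError.
import urllib.parse
import urllib.request

class StopRedirects(urllib.request.HTTPRedirectHandler):
    # Return the redirect response itself instead of following it automatically.
    def http_error_302(self, req, fp, code, msg, headers):
        return fp
    http_error_301 = http_error_302

def check_redirects(url, max_hops=20):
    opener = urllib.request.build_opener(StopRedirects)
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            return "redirection loop"
        seen.add(url)
        response = opener.open(url, timeout=10)
        if response.status not in (301, 302):
            return "ok (status %d)" % response.status
        # Resolve the Location header against the current URL for the next hop.
        url = urllib.parse.urljoin(url, response.headers["Location"])
    return "too many redirects"

print(check_redirects("http://www.example.com/"))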
Note: Some websites or servers behave differently if cookies are disabled. Dead Link Checker Site-check and Multi-check do not propagate cookies when scanning a site - if a redirection loop is reported by Dead Link Checker but the link appears to work correctly in a browser, it may be that visiting the link with cookies disabled would trigger an error. Auto-check propagates cookies for external links only. You can find out how to disable or enable cookies at www.whatarecookies.com/enable.asp
Q: How do I optimise my Auto-Checker scan?
There are a few steps you should follow before running your Auto-Checker scan. Following these steps will help to optimise the use of your link quota, minimise stress on your server, and ensure that you see your scan results as quickly as possible.
• Ensure that emails from report_generator@deadlinkchecker.com will not be rejected by your email software. Your scan reports typically contain many links, and sometimes this can trigger spam detectors which then reject the email. Sometimes this will result in them being sent to your spam folder, but if emails are returned to us then this might also prevent subsequent emails from being sent to your account.
• Use the 'ignore' feature of the scan settings to tell the scanner not to check unnecessary URLs within your site. The 'ignore' box allows you to enter a list of text fragments, and the scanner will ignore URLs containing any of these. For example, if there is a link on each of your site's pages to add an item to a shopping cart then you typically do not need to verify these links, so you might filter them out by adding /cart/ or /checkout/ or similar to the ignore list - depending on how your site is constructed. Similarly you might want to ignore URLs containing /login or /search for example - check your website for such URLs before starting the scan.
• Ensure that your server will not interpret the scan as an attack, which can result in the link checker server being denied access to your site. If this happens you will usually see a large number of unexpected errors, typically (but not always) 403 Forbidden, 503 Service Unavailable, or 400 Bad Request. If possible, whitelist our server's IP address (74.208.42.172) so that the scan is always permitted.
• Catalog sites, or sites containing large lists of items, often have options to view the items page by page with various filters to restrict or sort the items shown, for example by color or by manufacturer or both. If these filter options are added to the URL using '?' parameters then the link checker will treat each unique combination of filter terms as a new URL. This can result in a potentially enormous number of different combinations even though there may be a perfectly manageable number of catalog items being viewed. For example, a website listing cars might have options to select by Manufacturer, Style, Color, Age and Price using a URL such as
findmeacar.com/list?manufacturer=ford-bmw-mercedes&style=coupe&color=red-green-black&age=3&price=10000-20000
If there are 10 different manufacturers to select from, and 9 styles, 15 colours, 8 age groups and 5 price ranges, and if any or all of these options can be selected, then it will lead to over 140 trillion different combinations! This will clearly waste all your link quota without achieving anything useful. Even if only one selection (or none) can be made in each category it will lead to over 95,000 combinations. In cases like these, the best option is to add all of the filters to the ignore list:
manufacturer=
style=
color=
age=
price=
Then the scanner will not waste resources scanning different combinations of the same items, but should hopefully still be able to find all the catalog items individually by going through the list page by page.
• In general, it is a good idea to monitor a new scan as it progresses - keeping an eye on the URLs being checked to ensure there are no filter options which have been missed, and that all the URLs seem sensible. If you need to abandon a scan then you can click on the 'end scan' button, reconfigure the 'ignore' settings, and restart the scan to try again.