Web Crawler Policy
This page briefly describes the goal and policy of the Corporate Research & Development Center, Toshiba Corporation, regarding the collection of web pages. If you have any questions, please email us. We appreciate your cooperation and support.
Contents
- 1. Goal
- 2. Policy
- Our crawler accesses each site in a page-by-page manner with some intervals.
- It always reads the robots.txt file and never crawls restricted pages.
- If the respective web page has the robots meta tag included as follows, our crawler never crawls the page.
- We exercise great care regarding the management of web pages.
- 3. Contact
1. Goal
The Corporate Research & Development Center collects web pages mainly to conduct research on natural language processing and to develop new technologies and products for the Toshiba Group.
2. Policy
Our crawler accesses each site in a page-by-page manner with some intervals.
Although we interleave crawling with host-alias detection, an aliased server may occasionally be accessed simultaneously under different host names.
It always reads the robots.txt file and never crawls restricted pages.
You can specify directives for the crawler in the robots.txt file at the top level of your site (e.g., https://www.toshiba.co.jp/robots.txt). For example, the following directive forbids our crawler from retrieving any content from your site.
User-agent: TosCrawler
Disallow: /
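If you want to check how such a rule is interpreted before publishing it, the following minimal sketch uses Python's standard `urllib.robotparser` module. The helper name `is_allowed`, the `OtherBot` agent, and the `example.com` URL are illustrative assumptions; the sketch shows how a generic robots.txt parser reads the rule, not how our crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# The directive from the policy text, written one field per line.
ROBOTS_TXT = """\
User-agent: TosCrawler
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not a real crawl target.
    print(is_allowed(ROBOTS_TXT, "TosCrawler", "https://example.com/index.html"))
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/index.html"))
```

Under this rule the first call should report that TosCrawler is blocked, while an agent that is not named in the file remains unaffected.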
If you want to control the rate of access, specify the Crawl-delay parameter in the robots.txt file. For example, the following directs our crawler to access the site no more than once every 30 seconds.
User-agent: TosCrawler
Crawl-delay: 30.0
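As an illustration of how a Crawl-delay value might be read, here is a minimal sketch of a hand-written parser that accepts fractional values such as 30.0. The function name `crawl_delay` and the grouping logic are assumptions made for this example; Crawl-delay is a non-standard extension, and this is not a description of our crawler's actual implementation.

```python
from typing import Optional

ROBOTS_TXT = """\
User-agent: TosCrawler
Crawl-delay: 30.0
"""


def crawl_delay(robots_txt: str, user_agent: str) -> Optional[float]:
    """Return the Crawl-delay in seconds for user_agent, or None if absent.

    Hypothetical helper: matches agent names case-insensitively and treats
    consecutive User-agent lines as one group.
    """
    matched = False          # does the current group name user_agent?
    prev_was_agent = False   # was the previous field a User-agent line?
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not prev_was_agent:
                matched = False  # a new group starts here
            matched = matched or value.lower() == user_agent.lower()
            prev_was_agent = True
        else:
            prev_was_agent = False
            if field == "crawl-delay" and matched:
                try:
                    return float(value)
                except ValueError:
                    return None  # malformed value: treat as unspecified
    return None


if __name__ == "__main__":
    print(crawl_delay(ROBOTS_TXT, "TosCrawler"))
```

A crawler honoring this value would wait at least that many seconds between successive requests to the same site.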
If the respective web page has the robots meta tag included as follows, our crawler never crawls the page.
You can also protect content on a file-by-file basis with the robots meta tag. If you put the following in the header of your HTML documents, our crawler will not follow the links found in those documents.
<META NAME="robots" CONTENT="nofollow, noindex">
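To confirm that a page actually carries the intended directives, the following minimal sketch extracts them with Python's standard `html.parser` module. The class name `RobotsMetaParser` and the helper `robots_directives` are illustrative assumptions, not part of our crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="robots" CONTENT="nofollow, noindex"></head></html>'
    found = robots_directives(page)
    print("nofollow" in found, "noindex" in found)
```

A page that passes this check contains both the nofollow and noindex directives in a form that a standard HTML parser can read.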
We exercise great care regarding the management of web pages.
We store collected web pages and dictionaries in databases at the Corporate Research & Development Center, Toshiba Corporation. We control access to these databases to prevent unauthorized access.
3. Contact
For any questions, comments, or requests, please email us.
Please specify the host name(s) and IP address(es) of your site in the email.
2012/10/04