First, let's clarify a few terms. "Spiders" are the programs that scan the pages of your website and pass the information to the search engines for analysis. The spiders crawl the internet looking for new information for the search engines, moving from webpage to webpage by following the links on each page.
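To make the idea of "moving from page to page by following links" concrete, here is a minimal Python sketch of a spider's link-following loop. It assumes the site is held in memory as a dictionary mapping addresses to HTML (the sample `site` pages and the `crawl` and `LinkExtractor` names are hypothetical) instead of being fetched over the network, and it uses a simple breadth-first order. Real search engine spiders are far more elaborate.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start):
    """Breadth-first crawl over `pages`, a dict mapping address -> HTML.

    Returns the addresses in the order the spider first visits them.
    Links to addresses that are not in `pages` are skipped.
    """
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(pages[url])
        for link in parser.links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical four-page site; "/orphan" has no incoming links.
site = {
    "/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "/about": '<p>About us</p> <a href="/">Home</a>',
    "/products": '<img src="catalog.png"> <a href="/about">About</a>',
    "/orphan": "<p>Nobody links here</p>",
}

print(crawl(site, "/"))  # ['/', '/about', '/products']
```

In this toy model the `/orphan` page is never visited, because the spider can only reach pages that some other page links to.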
When humans surf the internet, we see a lot of information that the search engine spiders are unaware of. For example, we can see and interpret pictures, which the spiders cannot do. Another example is Flash animations. Humans are impressed by the beauty of Flash, and we can understand the messages embedded in an animation. The spiders, however, have no notion of being impressed by Flash, and (for now) understand its messages only in a distorted way. When the spiders scan a page, they "understand" that it contains a picture or a Flash animation, but they don't know what is in it.
There are many tools on the internet that let us see our webpages through the eyes of the search engine spiders. However, I would like to show you the simplest and "most exact" one: the Google Cache.
The main advantage of the cache is that it tells us whether Google already knows about our page. If our page has been online for a few days or more and Google has not yet read it, this may indicate a problem with our site.
The second advantage of the tool is that it is very simple to use.
Before we see how to use the tool, let's familiarize ourselves with the Google command site:www.domain.com. This command extracts from the search engine the list of pages belonging to a particular website that have been recorded in the search engine's database.
To run the command, go to the search engine and type site: in the search box, followed immediately (with no space) by the domain name, as shown in the following diagram:
Clicking the "Google Search" button displays the list of pages from that domain that are in the Google search engine. Similarly, we can check whether Google recognizes a specific webpage by typing its exact address instead of the domain address.
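If you run this check often, it can help to build the search address programmatically. The sketch below only constructs the URL for a site: query; it assumes Google's results page at https://www.google.com/search takes the query in its `q` parameter, and the function name `site_query_url` is hypothetical. It does not fetch or parse any results.

```python
from urllib.parse import urlencode


def site_query_url(address):
    """Build a Google results URL for the query "site:<address>".

    `address` may be a whole domain (www.domain.com) or the exact
    address of a single page. There is no space after "site:".
    """
    # urlencode percent-encodes the colon (and any slashes) in the query.
    return "https://www.google.com/search?" + urlencode({"q": "site:" + address})


print(site_query_url("www.domain.com"))
# https://www.google.com/search?q=site%3Awww.domain.com
```

Passing a full page address such as www.domain.com/page.html instead of the bare domain gives the single-page check described above.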
Now that we know how to find out which pages of a particular website are recognized by the search engine, we are ready to move on to the tool that will show us how the search engine spider "sees" the different pages on the site.
When we searched Google for a particular webpage, or for all the pages of a particular domain, the results were displayed as a list. In that list, next to the address of each webpage, there is a "Cached" link, as shown in the diagram below:
Clicking the "Cached" link takes us to a page divided into two parts: the lower part is the webpage itself, and the upper part (in a frame) contains information about the page. This upper part also includes a "Cached Text" link, as shown in the diagram below:
Clicking the "Cached Text" link displays the webpage as the Google spider managed to detect it. Note that the pictures are not shown; where an ALT attribute has been used, you see its text in place of the picture. In addition, if heading (H) tags and links have been used, you can see how that text appears to the spiders, as shown below:
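The following Python sketch roughly imitates what the "Cached Text" view shows, so you can preview a page locally: pictures disappear, ALT text takes their place, scripts are ignored, and heading and link text is labelled. It is an approximation built on Python's standard html.parser module, not Google's actual processing, and the `TextOnlyView` class, the `text_only` function and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Rough approximation of a text-only view of a page.

    Images are replaced by their ALT text (or dropped if they have none),
    scripts and styles are skipped, and heading and link text is labelled.
    """

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self.stack = []  # open heading/link tags, innermost last
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "img":
            alt = attrs.get("alt")
            if alt:
                self.lines.append("[image: " + alt + "]")
        elif tag in self.HEADINGS or tag == "a":
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        if "a" in self.stack:
            self.lines.append("[link] " + text)
        elif self.stack:
            self.lines.append("[" + self.stack[-1] + "] " + text)
        else:
            self.lines.append(text)


def text_only(html):
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return parser.lines


page = (
    "<h1>Garden Tools</h1>"
    '<img src="rake.jpg" alt="Steel garden rake">'
    '<img src="spacer.gif">'
    "<p>Sturdy tools for every season.</p>"
    "<script>var x = 1;</script>"
    '<a href="/shop">Visit the shop</a>'
)

for line in text_only(page):
    print(line)
```

The image without an ALT attribute simply vanishes from the output, which mirrors the point above: a spider knows a picture is there but not what it shows.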
If parts of the text that are visible on the website do not appear in the cached text, it is worth checking whether a technique has been used that hides the text from the spiders.
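One way to follow up on this check is to list the phrases you can read in the browser and test which of them are missing from the text in the page's HTML. The sketch below does that with Python's standard html.parser module; the `missing_phrases` helper and the sample page are hypothetical, and it only catches text that never appears as HTML text (for example, text drawn inside an image or written by a script), not every possible hiding technique.

```python
from html.parser import HTMLParser


class VisibleTextCollector(HTMLParser):
    """Gathers the plain text in the HTML, skipping scripts and styles.

    Text drawn inside images or Flash, or generated by JavaScript,
    never shows up here.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def missing_phrases(html, phrases):
    """Return the phrases seen in the browser that are absent from the HTML text."""
    collector = VisibleTextCollector()
    collector.feed(html)
    collector.close()
    # Collapse whitespace and ignore case so markup line breaks don't hide a match.
    text = " ".join(" ".join(collector.chunks).split()).lower()
    return [p for p in phrases if " ".join(p.split()).lower() not in text]


page = (
    "<p>Welcome to our store</p>"
    '<img src="banner.png" alt="">'  # the banner image shows "Summer sale"
    '<script>document.write("Call us today");</script>'
)

print(missing_phrases(page, ["Welcome to our store", "Summer sale", "Call us today"]))
# ['Summer sale', 'Call us today']
```

Any phrase the function reports is a good candidate for the kind of hidden text described above.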
Translated by Debi Zylbermann