29
Jul
detecting cloaked web pages Best answer on the web
Author: mike // Category: xn--g7qx97f.com
Thanks,
webadept-ga
I used to use IP cloaking (before the search engines frowned on it), and the script was free. This one was free too, http://scriptsmatrix.com/Detailed/558.html, but has since been removed. There is also an inexpensive IP cloaking service, http://www.improved-ranking.com. I should point out that I am not recommending these services, simply highlighting their existence and the ease with which unscrupulous webmasters can implement IP cloaking. At the other end of the spectrum, to see these cloaked pages you would need to perform IP spoofing, impersonating one of the search engines by assuming its IP address and user agent, which is legally and ethically questionable in itself. Regards,
lot-ga
I didn't find a way to accurately detect a cloaked page (the major search engines have trouble with this too). For any pages you suspect may be cloaked, though, I can describe some methods of revealing them in a deliberately vague way, as the methods of doing so may be illegal and against policies. So you would get an idea rather than a step-by-step guide of how to actually do it. A bit limiting, I know, but if you would find this useful, let me know. Kind regards,
lot-ga
As Hailstorm and Lot have pointed out, there are different types of cloaking, which this program doesn't look at. However, there are acceptable reasons to cloak a page using these methods. For instance, the IP address is known to come from France, so your page is sent out in the French language, rather than English. Or, the IP address was recorded for a registered user, so his preferences are given rather than the basic page. A page is created for IE, Netscape and "others" and the cloaking program sends the pages in the format best seen by those browsers. These are all reasonable means and methods of cloaking which are used by websites. Heck, I use them.
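To make the "acceptable" cases above concrete, here is a minimal sketch of choosing a page variant from request metadata. The function name `choose_variant`, the `country_code` argument (assumed to have been resolved from the IP address elsewhere) and the substring checks are all hypothetical simplifications, not anything taken from the services discussed in this thread.

```python
def choose_variant(user_agent: str, country_code: str) -> dict:
    """Pick a page language and layout from request metadata.

    Assumption: country_code was already looked up from the client IP
    by some other component; that lookup is not shown here.
    """
    # Visitors whose IP resolves to France get the French page.
    language = "fr" if country_code.upper() == "FR" else "en"

    # Very rough browser detection by substring; real User-Agent
    # strings are messier than this.
    ua = user_agent.lower()
    if "msie" in ua:
        layout = "ie"
    elif "netscape" in ua:
        layout = "netscape"
    else:
        layout = "other"

    return {"language": language, "layout": layout}


print(choose_variant("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", "FR"))
```

Every visitor with the same browser and location gets the same page here, which is what separates this kind of tailoring from feeding the search engine bots something different.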
What Google is miffed about is cloaking to the Googlebot itself, and this is much more difficult to get away with. First off, IP cloaking doesn't work, because, well, they are Google. As Lot pointed out, the IP can be spoofed as well. I can set up my server with a spoofed Google address, run the program through that "spoofed" port, and voilà! I'm the Googlebot. I'm not going to do that with this program, as it's not really required.
Cloaking to the bots is much harder, especially if the bots are checking. It is really easy for them to jump to another IP address, become an IE browser, get the page, and check again as the normal bot. Just about any programmer at my level can do it too. So most of the "hype" on the page Hailstorm has given about the "greatness" of their service is just that.. hype.
So, there is nothing wrong with cloaking a page for user viewing, and the service Hailstorm has given looks really good for that, and like I said, I do it myself. It's very useful for the users and helps keep your site fresh and alive. But cloaking to the bots is not easy to get away with and has huge repercussions. Also, as you say, most companies aren't going to spend the money on IP cloaking for Bots. They may do it for Users and IP blocks, but not for the bots. Anyone that runs a webserver for any length of time knows that the IP addresses change for the bots at random times. So a company could get unlisted simply because they didn't respond to the bot when it showed up, if it wasn't done right.
Thanks,
webadept-ga
tobes
Anyway, as far as cloaking goes, it is a lot of work, very high risk, and difficult to maintain. The simple check I created there (simple meaning rather fast to create, not simple in technology) is really easy to add to and make even more devious. Search engines don't do it a lot because it is processor intensive and they need to keep their servers running as fast as they can. But I am sure that they do run checks periodically on various sites. Spot checks, if you will. There are times when I'm asked "Why did my site suddenly vanish after being on top for the last year?" and I find out that they were using a cloaked page. That answers the question. It also brings up the huge risk.
Once you are out of the index, it takes a very long time to get Google to put you back, and when they do your pagerank is down, far down. They don't like it, and they say so rather bluntly. So if you are a serious company on the Internet with a mind for a future, why risk it? PageRank and Page Relevance are available to anyone willing to put in the time and effort. There are no guarantees, but there is a guarantee that if you do cloak, eventually the bots are going to figure it out and your company is suddenly going to disappear.
Faith in humanity? Maybe, maybe not. Faith in personal survival.. yes.. definitely. By the way, the one you found you can report to Google using this page here.. :
://www.google.com/contact/spamreport.html
Thanks
webadept-ga
1. What is cloaking?
The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site. In other words, the webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they'll find when they click on a search result. To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings.
That is what Google calls cloaking. The way to detect this is to fetch the page twice: once making the server think the Googlebot is looking at it, and a second time telling it that a web browser such as Netscape is looking at it. This can be done with a Perl program.
There doesn't appear to be a publicly available program that does this, so I decided to make one; it would only take an hour or so and would be rather cool to have. You can go to:
http://www.webadept.net/cloaker/index.html
and use the program there to check websites.
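For readers who want to see the idea in code, here is a minimal sketch of the fetch-twice check in Python. It is not the Perl program behind the link above; the names `fetch`, `similarity` and `looks_cloaked`, the browser User-Agent string and the 0.9 threshold are all assumptions made for illustration. It only varies the User-Agent header, so it can at best catch user-agent cloaking.

```python
import difflib
import urllib.request

# Googlebot's published User-Agent token; the browser string is just an example.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"


def fetch(url, user_agent, timeout=10.0):
    """Download a page while announcing the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the two texts are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def looks_cloaked(url, fetcher=fetch, threshold=0.9):
    """Fetch the page as Googlebot and as a browser and compare the results.

    The threshold is an arbitrary assumption: honest pages can still differ
    a little between two requests (ads, timestamps, session ids).
    """
    as_bot = fetcher(url, GOOGLEBOT_UA)
    as_browser = fetcher(url, BROWSER_UA)
    return similarity(as_bot, as_browser) < threshold
```

Calling `looks_cloaked("http://www.example.com/")` would perform the two requests and return True when the two copies differ by more than the threshold allows. The `fetcher` parameter exists only so the comparison can be tried out with canned pages instead of live requests. On large pages `SequenceMatcher` is slow, so a real tool would probably compare extracted text or hashes instead.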
Thanks,
webadept-ga
http://www.searchengineworld.com/misc/cloaking_agents.htm
There are five different types of cloaking:
1) User Agent Cloaking (UA Cloaking)
2) IP Agent Cloaking (IP Cloaking)
3) IP and User Agent Cloaking (IPUA Cloaking)
4) Referral-based cloaking
5) Session-based cloaking
Some of these, especially IP cloaking, are extremely advanced. The Fantomaster site, http://fantomaster.com, sells a database of search engine robot IP addresses that is updated several times a day.
So, I am sorry to say that I don't think webadept's tool can work on advanced cloaking of this nature.
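A small sketch may show why. The function below models, under assumptions of my own, how a cloaking server might decide whether a request gets the "bot" page under the first three types in the list. The name `served_bot_page`, the mode strings, the rule that IPUA requires both signals to match, and the addresses (taken from the ranges reserved for documentation) are all hypothetical.

```python
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical list of known crawler addresses (documentation-only range).
BOT_IPS = {"192.0.2.10"}


def served_bot_page(mode, user_agent, client_ip, bot_ips):
    """Return True if the cloaking server would hand out the bot version."""
    ua_match = "googlebot" in user_agent.lower()
    ip_match = client_ip in bot_ips
    if mode == "ua":
        return ua_match
    if mode == "ip":
        return ip_match
    if mode == "ipua":
        # Assumption: both signals must match for the bot page to be served.
        return ua_match and ip_match
    raise ValueError("unknown cloaking mode: " + mode)


# A checking tool that only spoofs the User-Agent, calling from its own address.
checker_ip = "203.0.113.5"
for mode in ("ua", "ip", "ipua"):
    print(mode, served_bot_page(mode, GOOGLEBOT_UA, checker_ip, BOT_IPS))
```

Only the "ua" case returns True for the checker. Under the two IP-based modes the tool is served the ordinary visitor page on both of its requests, so the two copies match and nothing looks wrong.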