yikes! how do i manage the search engine filing?
rez
I have a new Hostony account for a site. I searched for the site in Yahoo and can see that the search engines have filed (indexed) all kinds of stuff from the previous host: individual picture pages from photo galleries, individual pages from calendar.pl, etc. There are a lot of ridiculous things in Yahoo for this site, like direct links to a page that appears to be just the navigation frame. What a mess! Some of this stuff people shouldn't even be looking at.

With this new site, I would like to manage what is listed in the search engines. It would be nice to have index.htm and maybe a few other pages show up, like the web cam page or the forum.

How do I manage what gets filed? I have read about meta tags and how to make use of good keywords, but that only tells me how to increase my chances of getting stuff into the engines. My concern is the opposite: how to prevent pages and individual files from being filed.

I have heard about a robots.txt. Is that the answer? Or maybe one of the answers? If so, do I just edit the contents? Probably directions somewhere on the web, right?

Any other ideas on how to manage this situation, and on what's causing the problem?

This is horrible. Thanks for any direction.
Webslave
robots.txt is the answer!

http://www.yoursite.com/robots.txt


The address above is where the file lives; it is not a line in the file. Below is what you write in your robots.txt.
These are, for example, folders and pages which you want to keep spiders out of:


User-agent: *
Disallow: /forum/admin/
Disallow: /forum/cache/
Disallow: /forum/docs
Disallow: /forum/lang_german
Disallow: /forum/db/
Disallow: /forum/files/
Disallow: /forum/images/
Disallow: /forum/includes/
Disallow: /forum/language/
Disallow: /forum/templates/
Disallow: /forum/common.php
Disallow: /forum/config.php
Disallow: /forum/groupcp.php
Disallow: /forum/memberlist/
Disallow: /forum/memberlist.php
Disallow: /forum/modcp/
Disallow: /forum/privmsg
Disallow: /forum/profile
Disallow: /forum/search
Disallow: /forum/statistics.php
Disallow: /forum/viewonline.php
Disallow: /forum/login
Disallow: /forum/faq
Disallow: /forum/./search.php
Disallow: /forum/portal.php?
Disallow: /forum/./index.php
Disallow: /forum/./memberlist
Disallow: /forum/index.php?mark=forums
Disallow: /forum/posting
Disallow: /forum/viewtopic
Disallow: /forum/sutra
Disallow: /forum/ptopic
Disallow: /forum/ntopic
Disallow: /forum/faq.php
Disallow: /forum/viewforum1-0
Disallow: /test
Disallow: /forum/viewforum2-0
Disallow: /cgi-bin/
Disallow: /forum/watched_topics.php
Disallow: /forum/ftopic132-0


#
# Despicable and evil robots to keep out

User-agent: scooter
Disallow: /forum/

User-agent: grub-client
Disallow: /

User-agent: grub
Disallow: /

User-agent: looksmart
Disallow: /forum/

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: Mozilla
Disallow: /

User-agent: mozilla
Disallow: /

User-agent: mozilla/3
Disallow: /

User-agent: mozilla/4
Disallow: /

User-agent: mozilla/5
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: WebmasterWorldForumBot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: WebmasterWorld Extractor
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: searchpreview
Disallow: /
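
A rule set like the one above can be sanity-checked before uploading it. The following is only a minimal sketch using Python's standard urllib.robotparser module. It assumes a trimmed-down copy of the rules, the placeholder host www.yoursite.com from this post, and "Googlebot" as an example of a spider that only falls under the `*` group. Real spiders may interpret rules a little differently, and badly behaved ones ignore robots.txt altogether.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a shortened copy of the rules posted above, for illustration only.
RULES = """\
User-agent: *
Disallow: /forum/admin/
Disallow: /forum/memberlist.php
Disallow: /cgi-bin/

User-agent: grub
Disallow: /

User-agent: looksmart
Disallow: /forum/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse the text directly, no network access


def allowed(agent, path):
    """Return True if the spider named `agent` may fetch `path` under RULES."""
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "/index.htm"),
        ("Googlebot", "/forum/admin/index.php"),
        ("grub", "/index.htm"),
        ("looksmart", "/forum/index.php"),
    ]
    for agent, path in checks:
        print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```

A spider that has its own `User-agent` group follows only that group, so in this sketch looksmart is kept out of /forum/ but is not bound by the `*` rules.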
rez
Hey, that was a fast response!

So, you mean just make my own text file with the first line as:

http://www.mysite.com/robots.txt



Then put the other stuff you have listed below it? Where do I put this text file or is it already on my hostony site?
rez
Also, can I do pages in my root or just folders? Like:

Disallow: /test_page.htm


And if I do a folder, will it keep search engines from filing anything in it? Like:


Disallow: /test

would cover everything in that folder?


Thank you! Do you have a link that you use to keep your robots.txt up to date? Like a forum or something?
Webslave
QUOTE
Hey, that was a fast response!
Thanks, but don't mistake me for a Hostony employee. I'm just online waiting for my own FTP answers from Hostony.


QUOTE
So, you mean just make my own text file with the first line as:

http://www.mysite.com/robots.txt

Then put the other stuff you have listed below it? Where do I put this text file or is it already on my hostony site?


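You must place the robots.txt file in your root folder at Hostony. The http:// address is not a line in the file; it is just where the file ends up once it is in the root folder.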
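
Spiders always ask for robots.txt at the root of the host, whatever page they are about to fetch, which is why the file has to sit in the root folder. A small sketch of that lookup, using only Python's standard library (the function name `robots_url` and the example addresses are made up for illustration):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.mysite.com/forum/index.php?mark=forums"))
    print(robots_url("http://photos.mysite.com/gallery/1.htm"))
```

Because the host name is part of the address, a subdomain such as the hypothetical photos.mysite.com needs its own robots.txt.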

QUOTE
Also, can I do pages in my root or just folders? Like:

Disallow: /test_page.htm
You do as you wish, pages, folders, as many as you want. No problem there.



QUOTE
And if I do a folder, will it keep search engines from filing anything in it? Like:

Disallow: /test

would cover everything in that folder?


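User-agent: * # means all spiders

Disallow: /test # spiders will not fetch anything whose path starts with /test

Yes, that covers everything in the folder. It matches by prefix, though, so it also covers a page like /test_page.htm. Write Disallow: /test/ if you mean only the folder.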
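
To see which paths a given Disallow value actually covers, here is a rough sketch with Python's standard urllib.robotparser. The helper name `blocked_paths`, the spider name "ExampleBot" and the host www.example.com are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow, paths, agent="ExampleBot"):
    """Return the entries of `paths` that a single Disallow value blocks."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + disallow])
    return [
        p for p in paths
        if not parser.can_fetch(agent, "http://www.example.com" + p)
    ]


if __name__ == "__main__":
    paths = ["/test", "/test/page.htm", "/test_page.htm", "/index.htm"]
    print("Disallow: /test  blocks", blocked_paths("/test", paths))
    print("Disallow: /test/ blocks", blocked_paths("/test/", paths))
```

With the trailing slash only paths inside the folder match; without it, every path that begins with /test matches.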

QUOTE
Thank you! Do you have a link that you use to keep your robots.txt up to date? Like a forum or something?


You don't update it dynamically, just by hand: when you create a new folder that you don't want spidered, add it to the file. Make one robots.txt file per domain or subdomain. Anyway, I don't use robots.txt at all! I'm a search engine guru and I want them to spider everything! That's how I get several thousand unique users a DAY!
Alexandre
You can also use "HotLink Protection" in your cPanel:
Hotlinking is when another web site owner links directly to one or more of your images or multimedia files and includes them on their own web page.

cPanel can prevent hotlinking by allowing only named sites (such as your own web site) to access files on your site.

To prevent hotlinking:
1. Click on the HotLink Protection button on the home page.
2. Enter any other addresses that you will allow to access your site other than the provided defaults in the central area.
3. Enter the protected extensions in the Extensions to allow field. Make sure you separate each extension with a comma.
4. Enter the address to redirect any hotlinking to in the Url to Redirect to field.
5. Click on the Allow direct requests tick box if you want to allow direct URL access to non-HTML files, such as images.
6. Click on the Activate button.
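
The steps above amount to a check on the Referer header of each request. The sketch below only illustrates that idea in Python; it is not how cPanel implements the feature, and the allowed hosts, the extension list and the function name `is_hotlink` are assumptions chosen for the example.

```python
from urllib.parse import urlparse

# Hypothetical settings mirroring the form fields in the steps above.
ALLOWED_HOSTS = {"www.yoursite.com", "yoursite.com"}  # sites allowed to link
PROTECTED_EXTENSIONS = {"jpg", "jpeg", "gif", "png", "bmp"}
ALLOW_DIRECT_REQUESTS = True  # the "Allow direct requests" tick box


def is_hotlink(path, referer):
    """Return True if a request for `path` should be treated as hotlinking."""
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if extension not in PROTECTED_EXTENSIONS:
        return False  # only the listed file types are protected
    if not referer:
        # No Referer header: the address was typed in or opened directly.
        return not ALLOW_DIRECT_REQUESTS
    host = urlparse(referer).hostname or ""
    return host not in ALLOWED_HOSTS


if __name__ == "__main__":
    print(is_hotlink("/gallery/pic.jpg", "http://www.yoursite.com/gallery.htm"))
    print(is_hotlink("/gallery/pic.jpg", "http://other.example/page.htm"))
```

A request flagged this way would be sent to the redirect address from step 4. Note that this stops other sites from embedding your files; it does not stop search engines from listing your pages.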