# User-agent strings
Inktomi Slurp WebCrawler GoogleBot Googlebot ZyBorg Openbot Scooter ia_archiver UrlDispatcher Ask Jeeves larbin MarkWatch www.walhello.com Netprospector Crawler Ultraseek AlkalineBOT libwww-perl bumblebee@relevare.com Inet32 Ctrl 3D_SEARCH LinkLint Teleport vspider BaiDuSpider TITAN/ GetRight/ WebTrends Link Analyzer MyApp/ SystemSearch-robot contype lwp-trivial/ lwp-request/ LexiBot linkbot cosmos/ webcollage byteserver/ NetAnts/ RaBot NEC-MeshExplorer Crawl_Application Aport nabot_ Verity-URL-Gateway WebSpider viasarchivinginformation.html searchengine autoemailspider Xenu WebStripper Robot Web Downloader WebCopier BizWorks Retriever Pompos/ Gulliver/ ASPSeek/ FirstGov.gov Search psbot/ Sqworm/ MOSES WebWasher CacheBlaster/ FlashGet wwwxref/ VoilaBot/ Gaisbot/ htdig/ grub-client Grubclient- SQ Webscanner SlySearch/ Mercator- gsa-crawler DOF-Verify/ WebCapture EasyDL/ Rotondo/ InternetLinkAgent/ Gigabot/ DigOut4U Cyberdog/ ASPseek/ http://www.almaden.ibm.com/cs/crawler eCatch/ indylabs_marius b2w/ AnswerChase PROve Mass Downloader/ Teradex Mapper Lycos_Spider RPT-HTTPClient/ WebBOT Szukacz/ GaisBot/ targetblaster.com/ Infoseek Sidewinder/ InfoSeek Sidewinder/ Infoseek SideWinder/ DiaGem/ Steeler/ Fluffy the spider searchhippo.com webbandit/ webfetch/ ah-ha.com crawler Lytranslate/ Vagabondo/ moget/ WebZIP/ Robozilla/ Oracle Ultra Search w3m/ EmailLeach OrangeBot iCab/ metacarta TurnitinBot/ DISCoFinder InfoLink/ crawler_for_infomine minibot NetResearchServer/ EmailWolf eidetica.com/spider polybot HomePageSearch spider WebFetch AnswerBot UdmSearch NG/ 3DE_SEARCH2 Mozilla/3.0 (compatible) Mozilla/3.01 (compatible;) DoCoMo/ URLBlaze Ad Muncher Net Vampire/ EyeNetIE www.galaxy.com/info/crawler.html Net Vampire/ DoCoMo/ nabot Pita Knowledge Engine Webclipping.com SSM Agent INGRID/ Mister PiX IAArchiver- DISCo Pump NPBot- Python-urllib/ Zeus Lachesis lachesis HLoader Caddbot/ Zao/ WWW-Mechanize/ Nessus VoilaBot Wget/ WebReaper LARBIN-EXPERIMENTAL 
efp@gmx.net Goldfire Server Mozilla/2.0 (compatible; NEWT ActiveX; Win32) T-H-U-N-D-E-R-S-T-O-N-E thatrobotsite.com NPBot WebSauger Mozilla/3.0 (compatible; Indy Library) WebGather MFHttpScan NutchOrg/ SiteFinder/ search.usgs.gov TestBot Space Bison/ Sleipnir QuepasaCreep MSNBOT/ pavuk/ VSE/ SearchSpider.com/ SpiderKU/ Dattatec.com-Sitios-Top Microsoft Internet Explorer/4.40.426 WebVac YahooSeeker/ infomine.ucr.edu SiteSweeper PageFetcher/ msnbot/ GetHTMLContents Cowbot- freshlinks.exe Cowdog Bot Art-Online.com AnswerBus MultiText/ Dattatec.com NaverBot- Panopy Bot ez-robot Scich/ Dumbot Web Magnet LinkWalker TranSGeniKBot LinkScan/ Links SQL Google/ antibot- Wotbox/ SpiderEngine X-clawler X-crawler TygoBot TulipChain NetResearchServer LinkSweeper/ Z-Add Link Checker web-agent/ NaverBot_dloader/ mini-robot/ www.searchnear.com Botswana nuSearch Spider slurp CrawlConvera geometabot/ kuloko-bot/ Webdup/ JoeDog/ Dillo/ Ocelli/ GPU p2p crawler TheUsefulbot_ Unitek UniEngine/ COAST scan engine/ BlackMask.Net Search Engine Offline Explorer/ EAH/ Spider.TerraNautic.net Scrubby/ Check&Get IECheck GoForIt.com sohu-search NuSearch Spider Search-Channel NutchCVS/ LinkAlarm/ CSE HTML Validator oBot WISEbot/ boitho.com-dc/ EMPAS_ROBOT Jetbot/ iVia Site Checker findlinks/ USS-Cosmix/ mod_accessibility ht://check/ LookBot CheckLinks/ Html Link Validator osis-project.jp Seekbot/ Iltrovatore-Setaccio/ Website Downloader BDFetch CreativeCommons/ Nutch SOFT411 Directory Yandex/ ClariaBot/ MSNPTC/ heritrix/ HiSoftware AccVerify Diamond/ OmniFind Download Ninja Artera contype FAST-WebCrawler/ Yahoo-VerticalCrawler-FormerWebCrawler/ falcon/ Shareaza ZipppBot/ WorQmada/ Web Link Validator CFNetwork/ CyberSpyder FAST Data Search Document Retriever/ ImageBot ABACHOBot Microsoft Office Protocol Discovery Microsoft Data Access Internet Publishing Provider Cache Manager EmailSiphon searchmarks f-bot test pilot Antro.Net www.sygol.com HSFT - Link Scanner w3search Buscaplus Robi/ adminshop.com 
statcrawler Clushbot/ Blinkx/DFS-Fetch Header_Test_Client fast-search-engine WebXM WebPix DiamondBot Express WebPictures www.superbargainspace.com archive.org_bot/ omnifind statbot Combine/ SpiderMan searchmarking/ SiteXpert ContentSmartz gazz/ FindAnISP.com_ISP_Finder CCGCrawl/ TravelBot/ NSDL_Search_Bot pipeLiner/ Webinator-WBI/ KummHttp/ crawler.kpricorn.org versus.integis.ch search.updated.com websearchbench Cafi/ LinkChecker/ Microsoft_Site_Analyst/ webcrawl.net HTTrack Buibui Stumbler athenusbot HiSoftware AccMonitor Server MetaGloss SURF TREX Validator/ HooWWWer/ www.thebananatree.org/ Spinne/ Favorites Sweeper MJ12bot/ maxomobot/ KnowItAll VisWeb WebIndex/ BruinBot ShowLinks/ Acme.Spider Nogate CD-Preload websphinx.test Missigua Locator ApacheBench RSSbot/ Yahoo-Newscrawler/ P3P Client googlebot WebAuto/ SiteSnagger Plucker/ NetShift= websitemirror Eventware/ Akamai-SiteSnapshot/ zagrebin theusefulbot inetbot/ iSiloX/ hcat/ gnome-vfs/ didaxusbot WebTrafficExpress/ Talkro Web-Shot/ TPSystem SuperBot/ SpaceBison/ SiteSucker/ SEB Spider Poodle predictor PhpDig/ PageRank Monitor NASA Search MonkeyCrawl/ Mag-Net Lament/ LMQueueBot/ K.S.Bot Hooblybot-Image/ HTMLParser/ GoodJelly/ DiGi-RSSBot CydralSpider/ Amfibibot/ AONDE-Spider/ moduna.com/ BecomeBot/ YottaCars_Bot/ DOY/ Forest Conservation Spider Patwebbot tankvit@e-mail.ru Eco-Portal Spider EcoEarth Portal sherlock/ Knowledge.com/ smartwit.com W3C-WebCon/ Digger/ USyd-NLP-Spider ichiro/ Link Checker/ accoona EduGovSearch/ WinHTTP Example/ FDM 1.x unchaos_crawler_ TygoProwler grub crawler IRLbot/ GigabotSiteSearch/ FreshDownload/ WinHttp.WinHttpRequest. ClimateArk Spider BoaConstrictor/ CyberNavi_WebGet/ updated/ wish-la Custo InelaBot/ oegp v. 
Water Conserve Spider labourunions411/ SpeedySpider FindWeb OmiExplorer_Bot/ btbot/ Govbot/ tScholarsBot OmniExplorer_Bot/ www.unchaos.com NWSpider www.nameprotect.com cfetch/ AtlocalBot/ abot/ SCrawlTest/ Forests.org Spider aipbot/ MojeekBot/ DataFountains/DMOZ Downloader Speedy Spider McBot/ BigCliqueBot/ eStyleSearch ProjectWF-java-test-crawler Rational SiteCheck wgao@genieknows.com Download Master Kitenga-crawler-bot BigCliqueBOT/ Ocean Conserve Spider snap.com beta crawler IlTrovatore-Setaccio/ Blaiz-Bee/ Metager2 GOFORITBOT genevabot versus crawler Filangy/ Twiceler Eco-Portal http://www.environmentalsustainability.info/ Yahoo-MMAudVid/ nicebot Squid-Prefetch parasite cookieNET PageBitesHyperBot/ Dir_Snatch.exe CipinetBot WebZIP Arachmo ccubee/ IEAutoDiscovery Bilbo/ WebZIP CE-Preload endeca HSFT - LVU Scanner rssImagesBot/ Twisted PageGetter Space Fung/ BrowserEmulator/ Microsoft Scheduled Cache Content Download Service locust PoCoHTTP Web Site Downloader Seeker.lookseek.com Norbert the Spider searchbot grapeFX/ CSHttpClient/ AIBOT/ Miva iVia/ WebIndexer/ Microsoft URL Control AF Knowledge Now Verity Spider MemacBot ExactSearch UofTDB_experiment www.Syntryx.com Charlotte/ NewMedhunt/ Metaspinner/ LeechGet yacy.net MSRBOT/ SygolBot Zyte/ axfeedsbot/ testbot OnetSzukaj/ LinksManager.com_bot Helix/ MyEngines-US-Bot maxamine.com--robot SBIder/ Newsgroupreporter IpselonBot/ byindia/ Xerka WebBot Xerka MetaBot Der große BilderSauger Tecomi Bot fgcrawler TutorGigBot/ yoono/ LocalcomBot/ dtSearchSpider WebMiner/ wbdbot via translate.google.com WebDownloader for X wume_crawler/ Eco Earth Spider Checkbot/ Thumbnail.CZ robot National Park Service Dan Buan KATATUDO-Spider xirq/ Tarantula/ its-learning crawler Cuasarbot Ipselonbot/ Deepindex DataSpearSpiderBot/ libiViaCore/ generate_infomine_category_classifiers RufusBot sohu agent Everest-Vulcan Inc./ Yahoo-Blogs/ focused_crawler SocSciBot SiteArchive WebMiner wacbot ObjectsSearch/ Pockey-GetHTML/ OutfoxBot/ SearchBlox 
PHPCrawl STEGMANN-Bot Plumtree 6.0; AChulkov.NET page walker SearchIt-Bot/ crawler@ LetsCrawl.com/ COAST WebMaster Pro/ voyager/ AVSearch- InspireBot Eco Earth Portal genieBot Theophrastus/ MyFamilyBot/ fr_crawler Myra Wavefire/ Forest Conservation Portal, 1Noonbot ActiveTouristBot DataSpider/ MSRBOT Kyluka krawl Water Conserve Portal Cerberian Drtrs silk/ BOI_crawl_00 Vortex/ ZoomSpider KSbot/ BuyHawaiiBot Arikus_Spider ImagesHereImagesThereImagesEverywhere/ GruBot OpenIntelligenceData/ Evaal/ cvaulev c r a w l 3 r Forschungsportal/ LBot Girafabot combine/ COMBINE/ www.dlese.org virus-detector web crawler Link Checker Scumbot/ SearchSpider.com GulperBot Wysigot AccMonitor Compliance Server DTAAgent Y!J-BSC/ NetSongBot/ full_breadth_crawler DataFountains/ AISIID/ Vermut WebCorp/ Poirot searchmee_v Syntryx ANT Scout Chassis Pheromone; Mozilla/4.0 compatible crawler PerMan Surfer rsssupport@repia.com Climate Change Portal http://www.climateark.org/ zimeno/ webcrawler RAMPyBot YahooSeeker-Testing/ vermut +http://vermut.aol.com pucl/ personal ultimate crawler Snoopy __TBJ_WEB_CRAWLER__ TERAGRAM_CRAWLER www.octora.com exactseek.com VIP/ Exabot-Images/ BuildCMS crawler mercuryboard_user_agent_sql_injection.nasl Linkman wwwster/ AdamM Bot, webbot bzBot/ worldshop/ search.msn.com/msnbot.htm NewsGatherer/ EcoEarth.Info Environment Portal bot/ FurlBot/ ScollSpider; Favcollector/ WIRE/ NLese Feedfetcher-Google; FeedBurner/ Jakarta Commons-HttpClient/ IlTrovatore/ INFOMINE/ Mammoth/ Searchmee! 
Spider Crawl/ BecomeJPBot/ Sokitomi crawl; http://www.sokitomi.com/crawl.html Megite Geomaxenginebot Skywalker VIPr/ WEBCRAWLER@VUNET.ORG KeepNI web site monitor MFcrawler 1on1searchBot/ Yeti SuperPagesBot/ Search Publisher seeqpod-vertical-crawler NewsTroveBot MQbot metaquerier.cs.uiuc.edu Big Brother Froola Bot webGobbler/ zedzo.validate/ zedzo.digest/ FedContractorBot/ EARTHCOM.info/ !Susie Harvest/ BaiduSpider virus_detector Oracle Secure Enterprise Search FlashCapture Alpha Search Agent yoogliFetchAgent yarienavoir.net/ topicblogs/ robotek page_verifier online link validator augurfind Selflinkchecker scSpider/ KakcleBot JetBrains Omea Pro HyperEstraier/ Gigabot 12soso/ pixfinder/ JoyScapeBot/ BySpider DXSeeker/ psycheclone bottybot Exabot-Test/ Exabot-XXX/ Nusearch Spider MaSagool/ WeRelateBot/ Fetch API Request TargetYourNews.com bot robots/ MAINSEEK_BOT Lydia Entity Spider Link Validator BIGLOTRON Bookmark Buddy bookmark checker UniversalSearch/ imds_monitor/ BilgiBetaBot/ VMBot/ crawl@digigetx.com perform_crawl HSlide/ Vital Search'n Urchin AESpider/ NaverBot/ METASpider SumitBot SquidClamAV_Redirector AstroFind/ QFKBot Website Quester DigExt; DTS Agent Weddings.info Bot/ ozelot/ Webinator-search2.fasthealth.com/ PsBot qualidade/ WEPA/ Blog Conversation Project; factbot ODP entries t_st; BilgiBot/ YahooFeedSeeker/ statedept-crawler YahooFeedSeeker Testing/ Exploder/ miniRank/ Spider wastrix/ TridentSpider/ arianna.libero.it WebImages Touche StackRambler/ PicoSearch/ NetMechanic vlad/ JobSpider_BA/ InsumaScout/ Fopper Chameleon/ Climate Ark http://www.climateark.org/ info seeker/ vlsearch (http://vlib.org/admin/robot) MnoGoSearch/ Pagebull http://www.pagebull.com/ http://pressemitteilung.ws/ ConnectSearch SurveyBot/ Factbot iVia Page Fetcher LinkCheck Scanner/ IU_CSCI_B659_class_crawler/ SurfControl kinjabot (http://www.kinja.com) Rondello/ FDSE robot SrevBot holmes/ IlseBot/ gsa-accuracyEval Qweery_robot.txt_CheckBot/ NLESE USEPA BLT/ Earth Science Educator 
robot crawl@globrix.com Nerima-crawl- TerrawizBot/ DataparkSearch/ Gordon-College-Google-Mini TravelLazerBot/ PrivacyFinder/ cs-crawler +http://citeseer.ist.psu.edu del.icio.us-thumbnails/ LT Scotland Checklink/ nextthing.org/ nys-crawler Y!J-PSC/ Blogslive JemmaTheTourist Web-Sniffer/ (www.cotse.net; Anon Proxy) ShopWiki/ pythonic-crawler (suzuki@tkl.iis.u-tokyo.ac.jp) crawler43.ejupiter.com QweeryBot/ Net::Trackback/ BYINDIA/ FU-NBI/FU-NBI- IIITBot (pvvpr@iiit.net) Y!J-SRD/ Hatena Antenna/ RedCarpet/ GurujiBot/ RSS_READER (mctwist@mail.dr-k.info) WebaltBot/ GigaBot/ AASP/ Whirlpool Web Engine OmniWeb http://www.mozilla.org/docs/en/bot.html; master@mozilla.com AboutUsBot/ TRAAZI/ masidani_bot_ LamerExterminator/ Lsearch/sondeur Yoono; http://www.yoono.com/ core-project/ zibber-v (www.zibb.com/crawler/) Pansophica/ iSearch/ ImageVisu/ Yoriwa/ kulturarw3 +http://www.kb.se/kw3/ bladder fusion 1-More Scanner OutfoxMelonBot/ GeoVisu/ BoardReader-Image-Fetcher cataguru/ FunnelBack; http://cyan.funnelback.com/ LM Harvester Metaeuro Web Search Treezy/ Mozilla/2.0 (compatible; MSIE 4.0; Windows 98) DataCha0s/ wish-project (http://wish.slis.tsukuba.ac.jp/) AlexfDownload My_Little_SearchEngine_Project/ Vacobot; (+http://vaco.ws/bot.html) http://www.t6labs.com// SumeetBot my-heritrix-crawler( yellowJacket/ ChemieDE-NodeBot/ Koninklijke Bibliotheek web archive (heritrix +http://www.kb.nl) BlogMyWay.Net/BlogMyWay-0.8.1 (admin@blogmyway.org) MYCOMPANYBOT RoboPal (http://www.goldcave.com/) Heritrix/ LapozzBot/ domaincrawler/ ScientificCommons.org/ OpenISearch/ HarvestMan Yahoo-Test/ Ocean Conserve http://www.oceanconserve.org/ gsa (Enterprise; GIX- Pogodak.co.yu/ Trovator heritrix bot ExaBotTest/ Little Grabber at Skanktale.com USAF AFKN K2SPIDER kbeta1 +http://www.kotoha.co.jp Exabot Test/ SiteOrbiter Interseek/ SnapPreviewBot Kyluka crawl; http://www.kyluka.com/crawl.html; crawl@kyluka.com here will be link to crawler site testing of bot; PiyushBot ROBOT Tailrank; 
http://tailrank.com/robot PythonWikipediaBot/ AntiSantyWorm StarDownloader/ fembot (myd@cs.stanford.edu) VisBot/ penthesila/ ucb-nutch/ FeedChecker/ Btsearch/ libcrawl/ grbot Semager/ imo-google-robot-intelink AlexaWebSearchPlatform; +http://websearch.alexa.com http://www.changedetection.com/bot.html TeezirBot/ Canon-WebRecordPro/ pulseBot (pulse Web Miner) FDM 2.x Faviconizer crawler/ Hyperix/ wenbin/search LeapTag/ WebSpear/ Dit/ wwwrobot iim_405/ VadixBot SkreemRBot +http://skreemr.com Bigado.com/ Search-Engine-Studio OsO; http://oso.octopodus.com/abot.html +http://www.convera.com) TapuzBot/ GalaxyBot/ WikiaBot nestReader/ LolongBot/ mozilla (nlmoccssearchadmin@mail.nlm.nih.gov) Dwaar crawler (dwaarbot@dwaar.com) semanticdiscovery/ maxamine.com-robot LeapTag ( IIITBOT/ Sphider2 MSMOBOT http://www.artiesoft.com/lexxbot.php Francis/ RcStartBot Intelix/ Snappy/ imagefortress +http://www.worldbank.org) woriobot (+http://worio.com) opidig_1.0 (dfuhry@cs.kent.edu) BuscadorClarin/ voyager-partner-deep/ Blackbird/ CazoodleBot/ Obvius external linkcheck/ Sphere Scout&v GoogleReport Search Engine - http://www.googlereport.org Attributor.comBot Dwaarbot (dwaarbot@dwaar.com) NatchCVS/ (Natch; http://lucene.apache.org/natch/bot.html; natch-agent@lucene.apache.org) Obvius external linkcheck/ TsWebBot/ Sphere Scout PIENO robot robot; http://www.xrss.eu/robot; Webscan +http://otc.dyndns.org/webscan/ Bot; http://www.activetourist.com Canon-WebRecord/ HD nutch agent/ Distilled-Reputation-Monitor/ slow-crawler +http://casr.ou.edu WebBot/ Camcrawler (+http://www.camdiscover.com/crawler.html) mowserbot; http://www.mowser.com/bot Anonymous/3G bot optidiscover/ Giant/ (Openmaru bot; robot@openmaru.com) ASAHA Search Engine Turkey BlogPulseLive (support@blogpulse.com) Charlotte DiBot Elblindo the Blind Bot MT-Soft (http://www.mt-soft.com.ar) Mediapartners-Google Mozilla/5.0 (FunnelBack) Jim +http://www.hanzoarchives.com) SearchnowBot_v1; +http://www.searchnow.com) SummizeBot 
+http://www.summize.com) Wazzup1.0. pogodak.ba/ PWeBot/ PageDown/ froGgle/ kinjabot medrabbit/ optidiscover/ semisearch/ stero (http://www.stero.pl; News_Search_App/ Giant/ (Openmaru bot; robot@openmaru.com) Compatible;Viking/ BlogRefsBot/ SummizeFeedReader +http://www.summize.com Wazzup1.0.4800; http://32.fb.354a.static.theplanet.com/Wazzup) dejan/ hul-wax +http://hul.harvard.edu/ois/projects/webarchive/) sslbot +http://www.networking4all.com) rtgibot; http://rtgi.fr/) owsBot/ (owsBot; www.oneworldstreet.com; owsBot) Sapienti/Indexer KeywenBot/ Chilkat/ woriobot (+http://www.worio.com/) REBI-Shoveler/ LiteFinder/ gsa (Enterprise; wastrix/ CRAWLER-ALTSE.VUNET.ORG-Lynx you-dir/ voyager-hc/ DjangoTraineeBot/ BSearchR&D/ webLyzard/ testBOT/ suchclip (Kalooga; http://www.kalooga.com; info@kalooga.com) fetch_ici/ Wikio (http://www.wiko.fr) UCLA%20Google%20Serch%20Appliance%20%232%20%28contact%3A%20 UCLA%20Google%20Serch%20Appliance%20%231%20%28contact%3A%20 sslbot +http://www.networking4all.com) hul-wax +http://hul.harvard.edu/ois/projects/webarchive/) http://32.fb.354a.static.theplanet.com/Wazzup) Najdi.si/ EARTHCOM/ Giant/ (Openmaru bot; robot@openmaru.com) superbot.com; +http://www.super.info) (Exabot-Thumbnails) Zotag Search egothor/ SafariBookmarkChecker/ sportcrew-Bot (Grub.org crawler; http://www.grub.org/; bot@grub.org) search x-bot The Dyslexalizer @ http://spunc.dsturgeon.net taxinomiabot DAUMOA-video; +http://ws.daum.net/aboutkr.html Website Explorer/ iHWebChecker WEP Search Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal) Mozilla/3.0 [en] (AWV2.72f) Proximic crawler; +http://www.proximic.com/en/about-us/contact-us.html) OpiDig DAUMOA-video; +http://ws.daum.net/aboutkr.html) quest.durato/ (Suchmaschine der durato Ltd.; http://quest.durato.de; quest@durato.de) kalooga/ (Kalooga; http://www.kalooga.com; info@kalooga.com) Zotag Search Atomic_Email_Hunter/ besserscheitern-crawl WatzBot Verifactrola/ Toplistbot TopServer PHP Webwasher/ 
TSM Translation-Search-Machine (www.ttn.ch) REBI-shoveler/ (REBI's great worker; http://rebi.co.kr; noreply@rebi.co.kr) DAUMOA-web; +http://ws.daum.net/aboutkr.html) Yahoo-Kids/ mailto:vertical-crawl-support@yahoo-inc.com) link_checker/ Bot Apoena http://www.katatudo.com.br/ajuda/ AmPmPPC.com (http://www.ampmppc.com/) Topicalizer/www.topicalizer.com) FormulaFinderBot/ AmPmPPC.com (+http://www.ampmppc.com/) Runnk RSS aggregator : http://www.runnk.com/ G10-Bot/ autowebdir 1.1 (www.autowebdir.com) D1GArabicEngine/ crawlmaster@d1g.com) travel-search GoSeebot; +http://www.gosee.com/bot.html) Earth Platform Indexer wikiwix-bot- EnaBot/ ninetowns woriobot +http://worio.com) Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 ) Getleft GetLeft ShablastBot Quintura-Crw/ TinEye/ (http://tineye.com/crawler.html) (XML Sitemaps Generator iearthworm/ iearthworm@yahoo.com.cn C:\Documents and Settings\Joe\Desktop\HARVEST EMAILS\SEGMENTS\ GeonaBot/ BloobyBot iearthworm/ YesupBot/ ; +http://www.yesup.net/bot.html) CCBot/ (+http://www.commoncrawl.org/bot.html) HMSEbot crawler-upgrade-config JoBo/ Daumoa/ proximic; +http://www.proximic.com) ScoutJet; +http://www.scoutjet.com/) InfoUSABot/ Attributor/Dejan- (Test crawler; http://www.attributor.com; info at attributor com) exooba/exooba crawler (exooba; exooba) zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com] BitvoUserAgent (+http://www.bitvo.com) Bitvo/ zermelo; +http://www.powerset.com) culsearch/culs/ (crawl@citeulike.org) Hostcrawler Sunrise XP/ AnotherBot wisponbot(http://www.wispon.com,mailto:wispon@theory.snu.ac.kr) woriobot support [at] worio [dot] com +http://worio.com) GrubNG swish-e http://swish-e.org/ search.KumKie.com SeeqBot +http://www.seeqpod.com YebolBot zermelo/ +http://www.powerset.com/about/zermelo) hijbul-heritrix-crawler (+http://mobide.korea.ac.kr/) Knight/ (Zook Knight; http://knight.zook.in/; knight@zook.in) Sphider BpBot/ (- -; http://blitzpost.com; search@blitzpost.com) 
FeedHub MetaDataFetcher/ attributor/ +http://www.attributor.com) Kyluka crawl; http://www.kyluka.com/static/crawl.html; crawl@kyluka.com) TeamSoft WinInet Component DotSpotsBot/ (crawler; support at dotspots.com) msnbot-products FeedFetcher(www.radian6.com/crawler) ScooperBot www.customscoop.com Yahoo Pipes 3GSE bot (Internet Research Institute UK, http://iri-uk.com) Kyluka crawl; crawl@kyluka.com; http://www.kyluka.com/static/crawl.html) 192.comAgent +http://www.evri.com/evrinid) Google Keyword Tool; +https://adwords.google.com/select/KeywordToolExternal) CamelStampede/ eSyndiCat Bot crawl.UserAgent ShadowWebAnalyzer (http://www.safety-lab.com/) Vishal For CLIA/clia-alpha-testing (Crawling for CLIA project ; www.cfilt.iitb.ac.in; vishalv@cse.iitb.ac.in) CollapsarWEB qihoobot@qihoo.net) Yanga WorldSearch Bot Runnk online rss reader : http://www.runnk.com/ : RSS favorites : RSS ranking : RSS aggregator hybridwse@runnk.com BlitzBOT@tricus.net ; ODI3 Navigator) Axonize-bot DotBot/ Yanga WorldSearch Bot Vishal For CLIA/ (Crawling for CLIA project ; www.cfilt.iitb.ac.in; vishalv@cse.iitb.ac.in) OOZBOT/ (--; http://www.setooz.com/oozbot.html; agentname at setooz dot_com) Runnk online rss reader wauuu engine/Wauuu (wauuu engine; http://www.wauuu.com; wauuu@wauuu.com) NESSUS::SOAP healia/healia (the personalized health search engine.; http://www.healia.com) Climate Ark - http://www.climateark.org/) ^Byte (http://CaretByte.com) A1 Sitemap Generator/ (+http://www.micro-sys.dk/products/sitemap-generator/) miggibot JadynAve - http://www.jadynave.com/robot JadynAveBot; +http://www.jadynave.com/robot Drupal (+http://drupal.org/) (vBSEO; http://www.vbseo.com) crawly@commandcom.com FatBot http://www.thefind.com/crawler) CoolCheck iCopyright Conductor Firebat (http://lms.virtual-presence.org) Google Bot 2 Beta ornl_crawler_1 Mozilla crawl/ (compatible; frt/ BlogScope/ +http://www.blogscope.net/; U of Toronto) snookit/Snookit (domains@snookit.com) NetID.com Bot IOI/ (ISC Open Index 
crawler; http://index.isc.org/; bot@index.isc.org) betaBot xqrobot RIIGHTBOT/RIIGHT- (riight.com; http://www.riight.com/riightbot; riightbot@riight.com) Y!J-BRI/ crawler ( http://help.yahoo.co.jp/help/jp/search/indexing/ Grub/ (IOI crawler; http://index.isc.org/; crawl@index.isc.org) Labhoo+(+http://www.labhoo.com/) FollowSite Bot ( http://www.followsite.com/bot.html ) All Acronyms Bot Shelob (shelob@gmx.net) hitcrawler_ Pathtraq/ BuzzBot/ Map robot (http://garminmapsearch.com/) MSE360 - FredBot (See: http://mse360.com/about/bot.php) GreenYogi [ZSEBOT] XML Sitemaps Generator; http://www.xml-sitemaps.com) Gecko XML-Sitemaps/ nu_tch-princeton/Nu_tch (princeton crawler for cass project; http://www.cs.princeton.edu/cass/; zhewang a_t cs ddot princeton dot edu) Smarte Bot Rome Client (http://tinyurl.com/64t5n) isara-search/Isara-1.0 (A non-profit search engine operated by a charity organization.; www.isara.org; webmaster@isara.org) goroam/ (goraom geo crawler; http://goroam.net/; info@goroam.net) DKIMRepBot/ +http://www.dkim-reputation.org) S2Bot/ (http://search2.net; bot[at]search2[dot]net) LynnBot/ RedBot/ (Indian Language Web Search Engine; Rediff.com; pvvpr at iiit dot ac dot in) eChooseBot/ Nambu URL Destination Determinator +bot http://nambu.com Hawler (http://spoofed.org/files/hawler/) Spinn3r (Spinn3r http://spinn3r.com/robot) QEAVis agent I am a friendly (BETA) R.O.B.O.T. I am crawling the web. If I do not respect your robots.txt, e-mail me. Soon I will have a website. 
More info at ggggggg73@hotmail.com SiteGuardBot support@@siteguard.com research robot UFAM-crawler- VerbstarBot/ Butterfly/ ; +http://labs.topsy.com/butterfly.html) PaxleFramework/ ; +http://www.paxle.net/en/bot) hclsreport-crawler (+http://hclsreport.com/ baypup/ (Baypup; http://www.baypup.com/; jason@baypup.com) KaloogaBot; http://www.kalooga.com/info.html?page=crawler) http://domino.research.ibm.com/comm/research_projects.nsf/pages/sai-crawler.callingcard.html MSIndianWebcrawl Fooooo_Web_Video_Crawl http://fooooo.com/bot.html) kulturarw3 +http://www.kb.se/soka/internet/sv-webbsidor/) YandexSomething/ WeBot/ uberbot Twitturls; +http://twitturls.com) Twitturly / Proxem WebSearch mnoGoSearch (http://www.truthsearch.us/) BackStreet Browser EllerdaleBot/ +http://www.ellerdale.com/crawler.html) ^Nail (http://CaretNail.com) Crawly/ +http://92.51.162.40/crawler.html) AutoBaron crawler 50.nu/ ( +http://50.nu/bot.html ) topyx-crawler HiScan LexxeBot/ (lexxebot@lexxe.com) SaladSpoon/ShopSalad (Search Engine crawler for ShopSalad.com; http://shopsalad.com/en/partners.html; crawler AT shopsalad.com) GenieBotRD_SmallCrawl ageorge@genieknows.com Sogou+web+robot+(+http://www.sogou.com/docs/help/webmasters.htm#07) FlickySearchBot/ bitlybot searchmining/ +http://www.searchtechnologies.com) DelvuBot/ TwengaBot/ (+http://www.twenga.com/bot.html) Plukkie/ http://www.botje.com/plukkie.htm) aria eQualizer /; +http://www.associatedresearch.org/search/) Isara/Isara- (A non-profit search engine for the benefit of charity.; http://www.isara.org; search@isara.org) aggregator:Vocus (VocusBot ); http://www.vocus.com/vnhs.html) Gecko/2009060215 AbotEmailSearch Linguee Bot (bot@linguee.com) FAST ESP Document Retriever/CVS HEAD SBL-BOT (http://sbl.net) Advanced Email Extractor Mozilla/3.0 (compatible; MemoWeb Enswer Neuro Bot/ Mail.Ru/ lssbot datascape robot ibuena robot@ibuena.net radian6_linkcheck_(www.radian6.com/crawler) EntityCubeBot InsightsWorksBot/ special_archiver/ 
+http://www.loc.gov/minerva/crawl.html) Yahoo! SearchMonkey 1.0; http://developer.yahoo.com/searchmonkey/useragent) aiHitBot/ +http://www.aihit.com/) XmarksFetch/ +http://www.xmarks.com/about/crawler; info@xmarks.com) to-night-Bot powered by www.to-night.de +http://www.to-night.de/pfadzurbotseite/bot.html) GleameBot VocusBot +http://www.vocus.com/vnhs.html) conpilot crawler (http://www.conpilot.com/privacy.jsp) Linguee Bot (http://www.linguee.com/bot; bot@linguee.com) nutch1/huntsman (huntsman@mailguard.com.au) citeseerxbot Dow Jones Searchbot) Typhoeus - http://github.com/pauldix/typhoeus/tree/master Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; AskTB5.5) cizilla.com/Cizilla- (CIZILLA - Complete Internet Search Engine; http://www.cizilla.com; crawl@cizilla.com) Google Site map Creator; http://www.lionhardt.ca/gmc) Apache-HttpClient/4.0.1 (java 1.5) CatchBot/1.0; +http://www.catchbot.com WebRipper Eurosoft-Bot powered by www.eurosoftware.de +http://www.eurosoftware.de/zeige/bot.html) Mozilla/4.0 (compatible; MSIE 6.0; Windows 98) OpenLink Virtuoso RDF crawler NoteworthyBot/ (bot; http://) FriendFeedBot/ +Http://friendfeed.com/about/bot) drone bnf.fr_bot; +http://www.bnf.fr/fr/outils/a.dl_web_capture_robot.html) my-heritrix-crawler WebmasterCoffee/ ; +http://webmastercoffee.com/about) acquia-crawler Sosospider http://help.soso.com/webspider.htm) Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; +http://www.archive.org) TwengaBot-Discover (http://www.twenga.fr/bot-discover.html) kmbot- +http://knowmore.com/bots) crabbyBot/ YandexBot/ +http://yandex.com/bots) QunarBot/ pipBot hoge (co2h2onacl@gmail.com) plaNETWORK Bot Search DeepTrawl (http://www.DeepTrawl.com) Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts-MyWay; (R1 1.3); .NET CLR 1.1.4322) PSS-Bot (+http://www.dcs.shef.ac.uk/~aca07pdr/pss/) Linkbot Jbot PBBOT U.S. 
Government Printing Office http://bimeon.com/crawler_BootV kmccrew Bot Search Keybot Translation-Search-Machine inagist.com url crawler WMCAI-robot (http://www.topicmaster.jp/wmcai/crawler.html) Domaincrawler bingbot-media/ +http://www.bing.com/bingbot.htm) TextBot +http://www.globalvisioncommunication.com gvcbot.com GVC WEB crawler GVC SEARCH BOT GVC crawler GVC WORLD LINKS GVC BUSINESS crawler GVC Weblink crawler PROXY crawler YioopBot +http://www.yioop.com/bot.php) URL-Checker/ PostPost/ (+http://postpo.st/crawlers) MendeleyBot +http://mendeley.com)) Ayna +http://www.ayna.com) Mozilla/4.0 (compatible; Synapse) AntBot/ Netsparker L.webis/ (http://webalgo.iit.cnr.it/index.php?pg=lwebis) BOIA.ORG-Scan-Agent/ NRCan intranet crawler DeepTrawl Nymesis/ (http://nymesis.com) Supybot thumbshots-de-bot (+http://www.thumbshots.de/) ScrapeBox INTERNET RADIO crawler ( +http://www.radio.gvcbot.com:83 ) SynapticWalker/ (MailWalker http://www.websynaptics.com) KoepaBot http://www.koepa.nl/bot.html) ORISBot YioopBot +http://yioop.com/bot.php) metal crawler gold crawler Bender; http://sites.google.com/site/bendercrawler) Ezooms/ ezooms.bot@gmail.com YahooCacheSystem Covario-IDS/ gsa-eusleg (Enterprise; T3-RSESXG772WWB3; informatika@parlam.euskadi.net) crawler4j (http://code.google.com/p/crawler4j/) TweetedTimes Bot/ ; +http://tweetedtimes.com) DigitalArchivesBot/ Declumbot (+http://www.declum.com) jeba.ride@gmail.com DomainWatcher Bot IPAdd Bot EMail Exractor UnwindFetchor/ PaperLiBot/ NjuiceBot TweetmemeBot archive.org_bot +http://www.archive.org/details/archive.org_bot) nuggetize.com BOT JSpyda/ yp(yp+http://www.yp.com) archive.org_bot +http://pandora.nla.gov.au/crawl.html) A1 Website Download/ (+http://www.microsystools.com/products/website-download/) miggibot pub-crawler; +http://wiki.github.com/bixo/bixo/bixocrawler; bixo-dev@yahoogroups.com) twinuffbot cis455crawler WMSBot webcheck 1 KiwiStatus/ (NZS.com New Zealand Search; http://www.nzs.com/kiwi-status/) Pattern/1.0 
+http://www.clips.ua.ac.be/pages/pattern Apache-HttpClient/ Bender; http://benderthewebrobot.tumblr.com) urlchecker/ RiverGlassScanner NerdByNature.Bot; http://www.nerdbynature.net/bot) ZuiBot (+http://www.kidzui.com) NetSrcherP/ nimbus-1 (Enterprise; q1; +http://www.qleeq.com; info@qleeq.com) (compatible; ICS) RankurBot/Rankur (http://rankur.com; info at rankur dot com) http://web.idrc.ca/challenge/ev-136691-201-1-DO_TOPIC.html bosug; +http://borrowedsugar.com/; development@borrowedsugar.com) Y!J; for robot study; RiseNetBot/ (+http://risenet.iti.upv.es) BOT_dselvik@isd.lacounty.gov UnChaos Real time trends bot support@unchaos.com FDM 3 (compatible; Goodzer/ www.socialayer.com Agent SkimWordsBot/ sqlmap/ (http://www.sqlmap.org) Anemone/ W3C_Unicorn/ SearchBot SEOENGWorldBot/ (+http://www.seoengine.com/seoengbot.htm) SeznamBot/ (+http://fulltext.sblog.cz/) fastbot crawler (+http://www.fastbot.de) SearQuBot/SearQuBot v1.0 Ezooms/ ezooms.bot@gmail.com Y!J-BRW/ crawler (http://help.yahoo.co.jp/help/jp/search/indexing/ Magus Bot SemrushBot/ RSS-Harvester/ nutch/ BRAINTIME_SEARCH BOIA-Scan-Agent/ (www.boia.org) SkimBot/ (www.skimlinks.com ) yBot/ Zing-BottaBot/ TwengaBot (http://www.twenga.com/bot.html) (Chicago) EdisterBot (http://www.edister.com/bot.html) archive.org_bot; Archive-It; +http://archive-it.org/files/site-owners.html) linkdex.com/ yolinkBot Feed Seeker Bot (RSS Feed Seeker http://www.MyNewFavoriteThing.com/fsb.php)  CoolBot  YioopBot; +http://www.yioop.com/bot.php) Falconsbot; +http://ws.nju.edu.cn/falcons/) PagePeeker.com (info: http://pagepeeker.com/robots) EC2LinkFinder rssreader@newstin.com; HTTPClient deepnet crawler IstellaBot/ GSLFbot Blekkobot; ScoutJet; +http://blekko.com/about/blekkobot) LitlrBot (http://litlr.me/bot.html) ContextAd Bot Y!J-BRJ/YATS crawler WeblexBot (http://www.weblex.org/bot.html) Spatineo Serval GetMapBot (http://www.spatineo.com/) KaloogaBot; http://kalooga.com/crawler) Checklinks/ (pywikipedia robot; 
http://toolserver.org/~dispenser/view/Checklinks) Searcharoo.NET; robot) AcoonBot/ +http://www.acoon.de/robot.asp) OpenSearchServer_Bot CareerBot/ +http://www.career-x.de/bot.html) r2iBot/ Waypath development crawler - info at waypath dot com mycrawler AdMedia bot Amerla Search Bot metager2-verification-bot; +http://metager2.de/technology.php) SuperLumin Downloader/ Junut Bot attrakt/ siclab (cboc-test@lab.ntt.co.jp) Grabber usasearch PopScreenBot Genieo/ http://www.genieo.com/webfilter.html) TAMU_CRAWLER/ bintellibot Semantifire1/ http://www.setooz.com/oozbot.html ; agentname at setooz dot_com ) BeetleBot; ZumBot/ (ZUM Search; http://help.zum.com/inquiry) Spatineo Serval Controller (http://www.spatineo.com/) ShowyouBot (http://showyou.com/crawler) Web CEO Online robot) Mail.RU_Bot/ Twikle/ http://twikle.com , contact@twikle.com BrowserMob Flamingo_SearchEngine (+http://www.flamingosearch.com/bot) FreeWebMonitoring SiteChecker/ (+http://www.freewebmonitoring.com) Abonti/ http://www.abonti.com) RiverglassScanner YioopBot; +http://173.13.143.74/bot.php) parsijoo pamsnbot.htm) appid: s~stremor-crawler- news bot / crawl/ Kurzor crawlerbot (http://www.kurzor.hu) DomainScan s~stremor-crawler) AMZNKAssocBot/ smart-crawler coccoc/ (http://help.coccoc.com/)

# Unlikely and unusual user agents associated with large robotic downloads
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0)
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0 )
Mozilla/5.0 (compatible; MSIE 5.0)
Mozilla Firefox 2.0
Mozilla/3.0 (Win95; I; HTTPClient 1.0)
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)
Mozilla/4.0(compatible; MSIE 5.0; Windows 98; DigExt)
User-Agent
Mozilla/5.0 (Windows; U; Windows NT 5.1)

# Host names that I think are always robots or attackers
offense.sses.net
.crawl.yahoo.net
.netvigator.com
.search.msn.com
crawler.bloglines.com
.verity.com
.archive.org
user.connect.gpo.gov
mail.encharter.org
sandbox-d.gsfc.nasa.gov
sandbox-qa1.gsfc.nasa.gov
gcmdstage.gsfc.nasa.gov
sandbox-t.gsfc.nasa.gov
www.mvspy.com
doi-esn-gw.customer.alter.net
.keymachine.de
.ask.com
.maxamine.net
.abhsia.telus.net
.avantgo.com
simple5.dragonara.net
.nigma.ru
static.kpn.net
196.209.178.221
217.148.95.149
119.201.245.60
mail.businessesforsale.ru
keyworks-tower1.digimark.net
64.191.213.29
doe.osti.gov
indexer.cyberalert.com

# Host names that might be robots only temporarily
trancom.naukanet.ru
189.38.225.46
ec2-184-73-88-123.compute-1.amazonaws.com
ec2-54-242-237-205.compute-1.amazonaws.com
static.108.153.46.78.clients.your-server.de
186.70.237.121.broad.nj.js.dynamic.163data.com.cn
221.226.169.43
124.67.22.208
98.126.176.18.static.customer.krypt.com
188.48.142.77
176.227.199.101
bzq-114-71-13.static.bezeqint.net
h176-227-197-107.host.redstation.co.uk
c-24-22-212-39.hsd1.wa.comcast.net
S010600119518b69d.gv.shawcable.net
77.72.209.26
180.253.137.210