home chapter list downloads target addresses answers & updates videos community purchase author contact
NOTE: This is the website for the 2nd (2012) edition of this book. If you have the 1st (2007), Chinese or Italian editions, go here.
   
  What's Inside? (Chapter List)
Webbots, Spiders, and Screen Scrapers is designed to teach you not only how to write webbots and spiders, but also why to write these automated agents. As you will learn, there's more to writing webbots than downloading and parsing web pages.

Chapter Name - Highlights
Introduction - In the introduction, you'll learn how I started writing webbots and spiders in 1996, what to expect from the book, the tools you'll need (all open source), and coding standards.
Part I: Fundamental Concepts and Techniques
#1 What's in It for You? - Describes how webbots can uncover the Internet's true potential.
Read a sample chapter at the No Starch Press website.
#2 Ideas for Webbot Projects - Where do ideas for webbots come from?
Read a sample chapter at the No Starch Press website.
#3 Downloading Web Pages - Explores techniques for downloading web pages with PHP's built-in functions and PHP/CURL.
#4 Parsing Techniques - Teaches how to effectively parse data from web pages.
#5 Advanced Parsing with Regular Expressions (new) - This chapter shows how regular expressions can be used to parse data. It also describes when it is best, and when it is not, to use regular expressions.
#6 Automating Form Submission - Explains how to write webbots that automatically fill out forms and upload data to remote web servers.
#7 Managing Large Amounts of Data - Describes how to organize and store large amounts of data with compression, tag removal, and thumbnailing.
Part II: Projects
#8 Price-Monitoring Webbots - Shows how to write webbots that monitor prices at online stores.
#9 Image-Capturing Webbots - Describes a project that downloads all the images from a web page.
#10 Link-Verification Webbots - Explores a project that verifies all the links on a web page.
#11 Search-Ranking Webbots - Explores a webbot that determines the search engine ranking of a web page.
#12 Aggregation Webbots - Explains how to write webbots that combine information from multiple resources, including RSS feeds.
#13 FTP Webbots - Explains how webbots can use FTP as an online resource.
#14 Webbots That Read Email - Describes methods webbots can use to read email from POP3 mail servers.
#15 Webbots That Send Email - Explores methods webbots can use to send email through SMTP mail servers.
#16 Converting a Website into a Function - Identifies ways to convert an online service into a PHP function your webbots can call.
Part III: Advanced Technical Considerations
#17 Spiders - A study of spider theory, with a simple spider project.
#18 Procurement Webbots and Snipers - Explores how webbots automatically buy things from online stores and how snipers bid on online auctions.
#19 Webbots and Cryptography - Learn how to communicate with websites that use encryption.
#20 Authentication - Discover various authentication methods and how webbots can automatically authenticate to various websites.
#21 Advanced Cookie Management - Master reading and writing cookies with webbots.
#22 Scheduling Webbots and Spiders - Learn how to make webbots and spiders launch and run automatically.
#23 Scraping Difficult Websites with Browser Macros (new) - Learn how to scrape the most difficult websites (those that use JavaScript, AJAX, or Flash) by deploying browser macros (iMacros).
#24 Hacking iMacros (new) - Learn how to programmatically modify iMacros macros with PHP/MySQL for added functionality.
#25 Deployment and Scaling (new) - This chapter describes how to deploy large-scale webbot projects. (Or, how to write a botnet.)
Part IV: Larger Considerations
#26 Designing Stealthy Webbots and Spiders - Learn when and why it's important for your webbots to run without detection. Then learn how to achieve stealth with your webbots.
#27 Proxies (new) - Learn the various types of proxies, how they're used, and what advantages they offer.
#28 Writing Fault-Tolerant Webbots - Discover how to write webbots and parse routines that are "less affected" by changes to the web pages you target.
#29 Designing Webbot-Friendly Websites - Master search engine optimization as well as methods for communicating data with websites, including lightweight interfaces and SOAP.
#30 Killing Spiders - Gain an understanding of techniques web developers use to discourage the use of automated browsing agents.
#31 Keeping Webbots out of Trouble - Uncover the dangers of writing disreputable webbots and spiders.
Appendixes
A PHP/CURL Reference - A handy reference for using PHP/CURL.
B Status Codes - A list of HTTP and NNTP status codes.
C SMS Email Addresses - Addresses and tips for sending text messages through email.

 
Copyright 2024, Michael Schrenk