Chapter Name | Highlights |
Introduction |
In the introduction, you'll learn how I started writing webbots and spiders in 1996, what to expect from the book, the tools you'll need (all open source), and the coding standards used. |
Part I: Fundamental Concepts and Techniques |
#1 |
What's in It for You?
|
Describes how webbots can uncover the Internet's true potential. Read a sample chapter at the No Starch Press website.
|
#2 |
Ideas for Webbot Projects |
Where do ideas for webbots come from? Read a sample chapter at the No Starch Press website.
|
#3 |
Downloading Web Pages
|
Explores techniques for downloading web pages with PHP built-in functions and PHP/CURL |
#4 |
Parsing Techniques |
Teaches how to effectively parse data from web pages. |
#5 |
Advanced Parsing with Regular Expressions (new) |
This chapter shows how regular expressions can be used to parse data. It also describes when to use regular expressions and when not to. |
#6 |
Automating Form Submission |
Explains how to write webbots that automatically fill out forms and upload data to remote web servers |
#7 |
Managing Large Amounts of Data |
Describes how to organize and store large amounts of data with compression, tag removal and thumbnailing |
Part II: Projects |
#8 |
Price-Monitoring Webbots |
Shows how to write webbots that monitor prices at online stores |
#9 |
Image-Capturing Webbots |
Describes a project that downloads all the images from a web page |
#10 |
Link-Verification Webbots |
Explores a project that verifies all the links on a web page |
#11 |
Search-Ranking Webbots |
Explores a webbot that determines the search engine ranking of a web page |
#12 |
Aggregation Webbots |
Explains how to write webbots that combine information from multiple resources, including RSS feeds |
#13 |
FTP Webbots |
Explains how webbots can use FTP as an online resource |
#14 |
Webbots That Read Email |
Describes methods webbots can use to read email from POP3 mail servers |
#15 |
Webbots That Send Email |
Explores methods webbots can use to send email through SMTP mail servers |
#16 |
Converting a Website into a Function |
Identifies ways to convert an online service into a PHP function your webbots can call |
Part III: Advanced Technical Considerations |
#17 |
Spiders |
A study of spider theory, with a simple spider project |
#18 |
Procurement Webbots and Snipers |
Explores how webbots automatically buy things from online stores and how snipers bid on online auctions. |
#19 |
Webbots and Cryptography |
Learn how to communicate with websites that use encryption. |
#20 |
Authentication |
Discover various authentication methods and how webbots can automatically authenticate with websites. |
#21 |
Advanced Cookie Management |
Master reading and writing cookies with webbots. |
#22 |
Scheduling Webbots and Spiders |
Learn how to make webbots and spiders launch and run automatically. |
#23 |
Scraping Difficult Websites with Browser Macros (new) |
Learn how to scrape the most difficult websites (those that use JavaScript, AJAX, or Flash) by deploying browser macros (iMacros). |
#24 |
Hacking iMacros (new) |
Learn how to programmatically modify iMacros macros with PHP/MySQL for added functionality. |
#25 |
Deployment and Scaling (new) |
This chapter describes how to deploy large-scale webbot projects. (Or, how to write a botnet.) |
Part IV: Larger Considerations |
#26 |
Designing Stealthy Webbots and Spiders |
Learn when and why it's important for your webbots to run without detection. Then learn how to achieve stealth with your webbots. |
#27 |
Proxies (new) |
Learn the various types of proxies, how they're used and what advantages they offer. |
#28 |
Writing Fault-Tolerant Webbots |
Discover how to write webbots and parsing routines that are "less affected" by changes to the web pages you target. |
#29 |
Designing Webbot-Friendly Websites |
Master search engine optimization as well as methods for communicating data with websites, including lightweight interfaces and SOAP |
#30 |
Killing Spiders |
Gain an understanding of techniques web developers use to discourage the use of automated browsing agents. |
#31 |
Keeping Webbots out of Trouble |
Uncover the dangers of writing disreputable webbots and spiders |
Appendixes |
A |
PHP/CURL Reference |
A handy reference for using PHP/CURL |
B |
Status Codes |
A list of HTTP and NNTP status codes |
C |
SMS Email Addresses |
Addresses and tips for sending text messages through email |