80legs provides several advanced features that allow you to fully customize and control your web crawl. You can process web pages and collect data on 80legs itself, while you are crawling.
There's no need to download the entire web page. You can filter and extract the data you want during the crawl. This can lead to huge savings in download and processing time when you're crawling millions of web pages.
80legs uses plugins called 80apps to process and extract data from web pages. They can scrape data from a website, process media content, and do lots of other nifty things.
We’ve built several 80apps that you can use off-the-shelf (no coding required!). A few examples:
- Regular Expression Matcher - Match regular expressions on pages and return matches
- Image Resizer - Download and resize images found during a crawl
- Link Tracer - Return a link trace of the crawl from parent URL to each child URL
- Word Count - Return a count of all keywords on a web page
- Return Page Content - Get the page content of any URL
Log in to 80legs and visit the Marketplace to see more 80apps. Many of them are free with the purchase of a Pricing Plan.
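To give a feel for the kind of processing an 80app such as the Regular Expression Matcher or Word Count performs on each page, here is a minimal Python sketch. It is not 80legs code and does not use the 80app framework: the function names `match_patterns` and `count_keywords`, the `EMAIL_PATTERN` constant, and the sample page are all hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern for illustration: a simple (not RFC-complete) email matcher.
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"


def match_patterns(page_text, patterns):
    """Return all matches of each regular expression found in the page text."""
    return {pattern: re.findall(pattern, page_text) for pattern in patterns}


def count_keywords(page_text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword."""
    words = Counter(re.findall(r"[a-z0-9']+", page_text.lower()))
    return {keyword: words[keyword.lower()] for keyword in keywords}


# A stand-in for page content fetched during a crawl.
page = "<p>Contact sales@example.com or support@example.com. Crawl, crawl, crawl!</p>"

emails = match_patterns(page, [EMAIL_PATTERN])
counts = count_keywords(page, ["crawl", "contact"])

print(emails[EMAIL_PATTERN])
print(counts)
```

Running this kind of filter during the crawl means only the small result (the matches and counts) needs to be kept, rather than the full page.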
You can also build your own 80app using our 80app framework, which lets you completely customize the kind of data you extract while crawling. Developers can follow our step-by-step instructions to start building their own 80apps.
We offer a fully featured API for users who want to automate web crawl creation and result downloads. The API is available in Java, Python, and .NET, and is included with the purchase of any of our plans. Detailed documentation is available at http://wiki.80legs.com/80legs-API.
Just the beginning!
That's the end of our tour, but just the beginning of the exciting things you can do with 80legs. Check out the Services and Plans & Pricing sections for more information on making 80legs work for you.