Proxy Pilot is software that manages your proxy lists intelligently. Running it as a “microservice” makes your system more flexible, because it takes on a significant workload that would otherwise live in your main code base.
The current features are:
Specific cooldowns between each proxy attempt
Specific cooldown after a ban is detected
Ban detection, configured per site (the certificate you install on your server allows us to decrypt your traffic as a man-in-the-middle and read the resulting HTML for ban messages)
Optimal proxy usage (round-robin at first; once the various cooldown timers have started, it uses the proxies that are most cooled down)
Automatically downloads your proxy list from a proxy API endpoint
Geo-targeting, if you use multiple countries in your proxy list
Advanced statistics powered by ELK (see below)
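The cooldown and selection features listed above can be illustrated with a minimal sketch. This is not Proxy Pilot's actual code: the class names, default cooldown values, and the `ban_markers` strings are all hypothetical, chosen only to show how round-robin selection can fall out of "pick the most cooled-down proxy".

```python
import time
from dataclasses import dataclass
from typing import List, Optional


class NoProxiesError(Exception):
    """Raised when every proxy in the pool is still cooling down."""


@dataclass
class ProxyState:
    address: str
    # Time at which the proxy may be used again; -inf means "never used yet".
    available_at: float = float("-inf")


class ProxyPool:
    """Hypothetical pool: per-attempt cooldown plus a longer ban cooldown."""

    def __init__(self, addresses: List[str],
                 attempt_cooldown: float = 5.0,
                 ban_cooldown: float = 600.0) -> None:
        self.proxies = [ProxyState(a) for a in addresses]
        self.attempt_cooldown = attempt_cooldown
        self.ban_cooldown = ban_cooldown

    def acquire(self, now: Optional[float] = None) -> ProxyState:
        now = time.monotonic() if now is None else now
        ready = [p for p in self.proxies if p.available_at <= now]
        if not ready:
            raise NoProxiesError("No Proxies")
        # The earliest available_at is the most cooled-down proxy. On a fresh
        # pool all values tie, and min() keeps list order, i.e. round-robin.
        proxy = min(ready, key=lambda p: p.available_at)
        proxy.available_at = now + self.attempt_cooldown
        return proxy

    def report_ban(self, proxy: ProxyState, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        proxy.available_at = now + self.ban_cooldown


def looks_banned(html: str, ban_markers: List[str]) -> bool:
    """Naive per-site check: does the page contain any known ban message?"""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in ban_markers)
```

In this sketch a caller would call `acquire()` before each request, pass the response body to `looks_banned()` with that site's markers, and call `report_ban()` when it matches. Once every proxy is cooling down, `acquire()` raises the "No Proxies" error.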
Powered by ELK
The ELK stack has grown massively in popularity over the past few years, and Proxy Pilot is powered by it! See the statistics you will have at your fingertips through Kibana’s visualizations:
What it is, and is not
A common point of confusion among interested users is what Proxy Pilot does and does not do. Here is an excerpt from our General Troubleshooting article:
"In this article we will discuss some steps you can take to help troubleshoot any unexpected issues when trying to use your proxies via Proxy Pilot. As a reminder, Proxy Pilot is a tool that relies on proper configurations by the end user in order to work properly. If you set bad headers or cookies, use bad proxies, or so forth, then you will get poor results nonetheless.
At the core of web scraping, if you cannot load a request in your browser, using your home/work IP address, then you are unlikely to be able to scrape a page using software + a proxy source.
There are many ways to detect scraping software (see example1 and example2), so the more customization you add to loading a website (your software + proxies), the greater your footprint will be, and the easier it will be to detect you.
If you do not wish to worry about such anti-scraping battles, please consider our API at: https://scrapingrobot.com/api/ Our Scraping Robot API was built to solve this exact issue: allowing you to focus on your core business, instead of fighting with anti-scraping technologies.
If you wish to manage your own proxies, use developer resources, and pay for server compute power, then using Proxy Pilot will help with (but not solve!) some of these common scraping issues for you."
Therefore, it's important to distinguish between how Proxy Pilot can help you and its inability to defeat many common anti-scraping technologies when you use your own software.
What it does:
All listed features above
Allows you to separate proxy code from your main code base and into a separate microservice
100% open-sourced (coming Q4 2021)
What it does not do:
It does not guarantee a 100% success rate if your proxy pools and settings are not appropriate.
Example: if you want to send 1 million requests/hour to domain.com and only put 10 proxies into your proxy pool, you will most likely be banned by the target website. When this happens, all 10 of your proxies go into a “ban cooldown”, and Proxy Pilot returns a ‘No Proxies’ error message.
Proxy Pilot does not “charge per successful scrape”. If you’d like to offload all portions of scraping to us then we recommend you consider our Scraping Robot API. Our Scraping Robot API handles all browser management, proxy management, and ensures 100% success back to your software. Proxy Pilot is only a proxy manager, which is highly dependent on the proxies you provide it. If you provide low quality proxy IP addresses, or configure your software incorrectly, then you will get low quality results.
Proxy Pilot does not provide you with free proxies or access to a specific proxy pool. You must provide it with the proxies you wish to use. Again, if you do not want to purchase or manage proxies at all, then we recommend our Scraping Robot API. Our main proxy products can be found here:
How to set it up
Please read the following documentation on how to integrate Proxy Pilot into your systems:
Proxy Pilot Setup Instructions