TABLE OF CONTENTS
- Technical Setup Explanation - How Does It Work?
- Programming Language Implementations
- How do retries work? Are you doing the scraping on my behalf?
Technical Setup Explanation - How Does It Work?
If you haven’t read What Is Proxy Pilot?, we recommend reading the business overview article first.
This article outlines the technical details on how to implement Proxy Pilot. First, let’s define how it works:
Install a custom certificate in your software. For most software, this takes 1-2 lines of code.
Once installed, the certificate lets Proxy Pilot do what a man-in-the-middle attack does: decrypt your HTTPS traffic so we can read the HTML returned for your requests. Once we can read the full HTML, we can detect bans and perform the appropriate retries.
You connect to a central Proxy Pilot server (self-hosted or managed hosting).
We will provide you with a single proxy IP (ip:port for IP authorization, or ip:port:user:pass for user:pass authorization). You send all requests to this single proxy gateway; from there, the Proxy Pilot system takes over and forwards each request to the appropriate proxy.
Your actual proxy list
As mentioned in What Is Proxy Pilot?, you must provide your own proxies to the system. These proxies are the ones that Proxy Pilot forwards your requests to.
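As a rough illustration of the two gateway formats mentioned above, here is a minimal Python sketch that turns an ip:port or ip:port:user:pass string into a proxy URL that most HTTP clients accept. The function name `gateway_to_proxy_url`, the `http` scheme, and the example address and credentials are assumptions made for illustration; they are not part of Proxy Pilot.

```python
from urllib.parse import quote


def gateway_to_proxy_url(gateway: str, scheme: str = "http") -> str:
    """Convert 'ip:port' or 'ip:port:user:pass' into a proxy URL.

    Hypothetical helper: the 'http' scheme is an assumption, and a password
    that itself contains ':' is not handled by this simple split.
    """
    parts = gateway.split(":")
    if len(parts) == 2:
        # IP authorization: no credentials travel with the request.
        host, port = parts
        return f"{scheme}://{host}:{port}"
    if len(parts) == 4:
        # user:pass authorization: percent-encode the credentials so that
        # characters such as '@' do not break the URL.
        host, port, user, password = parts
        user_enc = quote(user, safe="")
        pass_enc = quote(password, safe="")
        return f"{scheme}://{user_enc}:{pass_enc}@{host}:{port}"
    raise ValueError("expected 'ip:port' or 'ip:port:user:pass'")


# Example values only (203.0.113.10 is a documentation address).
ip_auth_url = gateway_to_proxy_url("203.0.113.10:8080")
user_auth_url = gateway_to_proxy_url("203.0.113.10:8080:myuser:p@ss")
```

The resulting string can then be passed wherever your HTTP client expects a proxy URL.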
Programming Language Implementations
Please see the links below for the programming language you want to use with Proxy Pilot. For most languages, installing the custom certificate takes 1-2 lines of code, and from that point on you use the proxy gateway the same way you use a normal proxy.
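To give a sense of what this can look like, here is a minimal sketch using only the Python standard library. It is not the official Proxy Pilot setup for Python. The function name `build_gateway_opener`, the certificate file name, and the gateway address in the usage comment are all hypothetical placeholders.

```python
import ssl
import urllib.request


def build_gateway_opener(gateway_url, ca_cert_path=None):
    """Build a urllib opener that sends all traffic through one proxy gateway.

    Hypothetical helper: gateway_url and ca_cert_path are placeholders for
    the values you would receive when setting up the gateway.
    """
    # Start from the system's trusted certificates.
    context = ssl.create_default_context()
    if ca_cert_path is not None:
        # Additionally trust the custom certificate, so HTTPS responses that
        # the gateway has decrypted and re-encrypted still pass verification.
        context.load_verify_locations(cafile=ca_cert_path)

    # Route both plain HTTP and HTTPS requests through the same gateway.
    proxy_handler = urllib.request.ProxyHandler(
        {"http": gateway_url, "https": gateway_url}
    )
    https_handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(proxy_handler, https_handler)


# Usage (not executed here; the address and file name are assumptions):
#   opener = build_gateway_opener(
#       "http://myuser:mypass@203.0.113.10:8080",
#       ca_cert_path="proxy-pilot-ca.pem",
#   )
#   html = opener.open("https://domain.com", timeout=120).read()
```

In this sketch the certificate accounts for one line (`load_verify_locations`), and the rest is the same proxy configuration you would write for any ordinary proxy. A generous timeout is used in the usage comment because the gateway may retry before it responds.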
See setup instructions for the following languages:
How do retries work? Are you doing the scraping on my behalf?
This is a common question, given the intricacies of what’s going on in the solution. The simple answer: no, we are not scraping on your behalf.
Consider the following flow:
1. You send a request to scrape domain.com to the Proxy Pilot gateway.
2. Proxy Pilot forwards your request to proxyA.
3. proxyA returns a banned HTML page to Proxy Pilot.
4. Proxy Pilot sees that this is a ban and sends the same request to proxyB.
5. proxyB returns a successful HTML page to Proxy Pilot.
6. Proxy Pilot returns the successful HTML to you (the user).
At step 4, a common question is whether our server resources or your server's compute resources are doing the scraping. The answer is that your server is still doing the scraping.
A useful comparison is what happens when your internet connection drops midway through loading a website: your browser shows a longer-than-usual “Loading” symbol while it retries. Much the same happens with Proxy Pilot: as it makes retries on your behalf, your software keeps the connection tunnel open while it waits for a response from Proxy Pilot.
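The retry flow described above can be sketched as a small simulation. This is only an illustration of the idea, not Proxy Pilot's actual logic: the ban markers, the function names, and the fake responses are all hypothetical, and no network calls are made.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical ban markers; real ban detection is not described in this article.
BAN_MARKERS = ("access denied", "captcha")


def looks_banned(html: str) -> bool:
    """Return True if the page contains one of the assumed ban markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in BAN_MARKERS)


def forward_with_retries(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], str],
) -> Tuple[str, List[str]]:
    """Try each proxy in order until one returns a page that is not banned.

    Returns the successful HTML and the list of proxies that were tried.
    The caller simply waits on this one call, just as your software waits
    on its open connection to the gateway.
    """
    tried: List[str] = []
    for proxy in proxies:
        tried.append(proxy)
        html = fetch(url, proxy)
        if not looks_banned(html):
            return html, tried
    raise RuntimeError(f"all proxies returned banned pages: {tried}")


# Simulated proxies: proxyA is banned, proxyB succeeds.
FAKE_RESPONSES = {
    "proxyA": "<html>Access Denied</html>",
    "proxyB": "<html>Product page</html>",
}


def fake_fetch(url: str, proxy: str) -> str:
    return FAKE_RESPONSES[proxy]


html, tried = forward_with_retries(
    "https://domain.com", ["proxyA", "proxyB"], fake_fetch
)
```

From the caller's point of view there is a single request and a single response; the retry on proxyB happens while that one call is still pending.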
Confusing? We agree! Please sign up for Proxy Pilot and we’ll be happy to give you free proxies to try it out.
If you run into any unexpected issues, please refer to the following article for troubleshooting steps:
General Troubleshooting article