Technical Setup Explanation - How Does It Work?

If you haven’t read What Is Proxy Pilot? yet, we recommend starting with that business overview article.

This article outlines the technical details on how to implement Proxy Pilot. First, let’s define how it works:

Key components:

  1. Install a custom certificate in your software. For most software, this takes 1-2 lines of code.

    1. Once installed, the certificate lets us do what a man-in-the-middle attack does: decrypt your HTTPS traffic so we can read the HTML. Once we can read the full HTML returned for your requests, we can detect bans and perform the appropriate retries.

  2. You connect to a central Proxy Pilot server (self-hosted or managed hosting).

    1. We will provide you with a single proxy gateway address (ip:port with IP authorization, or ip:port:user:pass for user:pass authorization). You send all requests to this single gateway; from there, Proxy Pilot takes over and forwards each request to the appropriate proxy.

  3. Your actual proxy list

    1. As mentioned in What Is Proxy Pilot?, you must provide your own proxies to the system. These proxies are the ones that Proxy Pilot forwards your requests to.
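
As a rough illustration of the components above, here is a minimal Python sketch that uses only the standard library. It assumes the gateway is given in one of the two formats mentioned (ip:port or ip:port:user:pass) and that the custom certificate is available as a PEM file. The function names, the example address and the file name proxypilot-ca.crt are placeholders invented for this sketch, not values supplied by Proxy Pilot; the language-specific setup instructions remain the authoritative source.

```python
import ssl
import urllib.request
from typing import Optional
from urllib.parse import quote


def gateway_to_proxy_url(spec: str) -> str:
    """Turn 'ip:port' or 'ip:port:user:pass' into a proxy URL."""
    # maxsplit=3 keeps any ':' inside the password intact.
    parts = spec.split(":", 3)
    if len(parts) == 2:
        host, port = parts
        return f"http://{host}:{port}"
    if len(parts) == 4:
        host, port, user, password = parts
        # Percent-encode credentials so characters like '@' stay unambiguous.
        return (
            f"http://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}"
        )
    raise ValueError(f"expected ip:port or ip:port:user:pass, got {spec!r}")


def build_opener(
    gateway: str, ca_cert_path: Optional[str] = None
) -> urllib.request.OpenerDirector:
    """Build an opener that routes traffic through the gateway.

    ca_cert_path is the custom certificate (hypothetical file name in the
    usage below). Trusting it is what allows the gateway to decrypt HTTPS.
    """
    context = ssl.create_default_context()
    if ca_cert_path is not None:
        # Trust the custom certificate in addition to the system defaults.
        context.load_verify_locations(cafile=ca_cert_path)
    proxy_url = gateway_to_proxy_url(gateway)
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPSHandler(context=context),
    )


# Usage (placeholder values, not executed here):
# opener = build_opener("203.0.113.10:8080:user:pass", "proxypilot-ca.crt")
# html = opener.open("https://example.com", timeout=120).read()
```

From your software's point of view, the gateway behaves like any other proxy; the only extra step is trusting the custom certificate.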

Programming Language Implementations

Choose the programming language you want to implement Proxy Pilot in. For most languages, installing the custom certificate takes 1-2 lines of code, and from that point on you use the proxy gateway the same way you use a normal proxy.

See setup instructions for the following languages:

How do retries work? Are you doing the scraping on my behalf?

This is a common question, given the intricacies of the solution. The simple answer: no, we are not scraping on your behalf.

Consider the following flow:

  1. You send a scrape request to the Proxy Pilot gateway

  2. Proxy Pilot forwards your request to proxyA

  3. proxyA returns a banned HTML page back to Proxy Pilot

  4. Proxy Pilot sees this is a ban, and then sends the same request to proxyB

  5. proxyB returns a successful HTML page to Proxy Pilot

  6. Proxy Pilot returns the successful HTML back to you (the user)

Step #4 often raises the question of whether our server resources or your server’s compute resources are doing the scraping. The answer is that your server is still doing the scraping.
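
The six-step flow above can be sketched as a small simulation. This is only an illustration under stated assumptions: the ban markers, the status codes treated as bans and all names in the code are hypothetical, since this article does not describe how Proxy Pilot actually detects bans. The point it shows is that the same request, with unchanged headers and body, is handed to the next proxy after a ban.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    headers: Dict[str, str]
    body: bytes = b""


@dataclass(frozen=True)
class Response:
    status: int
    html: str


# In this sketch a "proxy" is any callable that takes a Request
# and returns a Response.
Proxy = Callable[[Request], Response]

# Hypothetical ban heuristics, for illustration only.
BAN_STATUSES = (403, 429)
BAN_MARKERS = ("access denied", "unusual traffic", "captcha")


def looks_banned(response: Response) -> bool:
    """Guess from the status code and the HTML whether this is a ban page."""
    if response.status in BAN_STATUSES:
        return True
    page = response.html.lower()
    return any(marker in page for marker in BAN_MARKERS)


def forward_with_retries(request: Request, proxies: List[Proxy]) -> Response:
    """Try each proxy in order and return the first non-banned response."""
    last: Optional[Response] = None
    for proxy in proxies:
        # The request is passed on unchanged: same headers, same body.
        last = proxy(request)
        if not looks_banned(last):
            return last
    if last is None:
        raise ValueError("no proxies configured")
    # Every proxy was banned: hand back the last response we saw.
    return last


def proxy_a(request: Request) -> Response:
    return Response(403, "<html>Access denied</html>")


def proxy_b(request: Request) -> Response:
    return Response(200, "<html>Product page</html>")


request = Request("GET", "https://example.com/item", {"User-Agent": "demo"})
result = forward_with_retries(request, [proxy_a, proxy_b])
print(result.status, result.html)
```

While the loop runs, the caller simply waits on a single open connection; it never sees the banned page returned by proxy_a.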

A good way to think about it: when your internet connection drops midway through loading a website, your browser shows a longer-than-usual “Loading” symbol while it retries. This is mostly what happens with Proxy Pilot: as it makes retries on your behalf, your software keeps the connection tunnel open while it waits for a response from Proxy Pilot.

The compute that Proxy Pilot itself spends goes into resending the exact same request headers and body. Our extensive testing has shown that resending the exact same headers and body does not affect the results of your scraping (for example, if you are using Puppeteer on Chromium). The best way to verify this yourself: connect through Proxy Pilot to a JavaScript-only website (like Google Maps) with your browser. You will be able to load the page, because JavaScript is still executed by your browser while the tunnel connection stays open.

Confusing? We agree! Please sign up for Proxy Pilot and we’re happy to give you free proxies to try it out.

General Troubleshooting

Please refer to the following article, which explains how to troubleshoot unexpected issues:

General Troubleshooting article