Thursday, April 3, 2008

Describing HTTP requests

I was having a chat on an internet forum. Someone was wondering how HTTP connections work and how they apply to HTTPS and proxy server requests. Here is my take, written in casual forum style.

---------------------------

I think I understand what you are trying to do. But I don't understand why you need the proxy software. I will ignore that and concentrate on your goal.

Your goal is to successfully log into an HTTPS site. First, for the developer on the client side, there really isn't much difference between HTTP and HTTPS. The data is encrypted during transmission, which protects against people trying to eavesdrop between A and B. Basically, you should still see data (HTML string data) going back and forth between client and server. The client does have to make sure that it loads the SSL libraries properly for the HTTPS request to work.

I do this a lot with Java. I have to do this to initialize the SSL aspect of the HTTP request:

Part A:
javax.net.ssl.SSLSocketFactory sf = (javax.net.ssl.SSLSocketFactory) javax.net.ssl.SSLSocketFactory.getDefault();
javax.net.ssl.SSLSocket sock = null;
X509TrustManager tm = new MyX509TrustManager();
HostnameVerifier hm = new MyHostnameVerifier();

...
And then I do this:
Part B:
URL url = getSSLURL(fullURL);
conn = (HttpsURLConnection) url.openConnection();
...
My point: Part A is kind of complicated, but it is required to initialize SSL for an HTTPS request. You would have to do something similar on the .NET side. I bet it is easier there.
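For comparison, here is a minimal sketch of Parts A and B together when the JVM's default trust store is good enough (no custom trust manager or hostname verifier). The class name HttpsOpenSketch and the 10 second timeouts are my own assumptions for illustration, not the code from the forum thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;

// Hypothetical helper for illustration only.
final class HttpsOpenSketch {

    // Prepares (but does not yet send) a GET request to an https:// URL.
    // Passing an http:// URL fails with a ClassCastException.
    static HttpsURLConnection open(String fullUrl) {
        try {
            HttpsURLConnection conn =
                (HttpsURLConnection) URI.create(fullUrl).toURL().openConnection();
            // Part A, simplified: use the JVM's default SSL setup.
            conn.setSSLSocketFactory(SSLContext.getDefault().getSocketFactory());
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(10_000); // milliseconds, assumed value
            conn.setReadTimeout(10_000);    // milliseconds, assumed value
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }
}
```

Nothing goes over the wire until you call something like getResponseCode() or getInputStream() on the returned connection.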

(Example 1, assuming there is no proxy in between the client and the server.)

Stateless request from Client

Client -> Data -> Server
Server -> HTML Response -> Client

1. Client sends request to https://site (GET / HTTP ... blah)
2. Server sends the HTML data back to the client.

Once again, it shouldn't matter that this is HTTPS. If you don't see data, either you didn't load the SSL client libraries properly or you aren't fully reading the data from the server.
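On the "fully reading the data" part, a rough sketch of what I mean: keep reading until the stream reports end-of-stream, instead of doing a single read() call. The class name ReadFullySketch is made up, and I am assuming the body is UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class ReadFullySketch {

    // Reads every byte from the stream until end-of-stream (-1).
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            // A single read() may return only part of the response,
            // so loop until the server has nothing more to send.
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            // Assumption: the response body is UTF-8 text.
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an HttpsURLConnection you would pass conn.getInputStream() to readAll.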

----------------

The proxy.

A proxy is a funny sounding word for middle-man. Talk to the middle-man only to get to the server. With the proxy, it isn't that much different. First, you have to assume that the proxy is working properly. Our office web-proxy server is reliable and works properly. A cheap proxy server may give funky results. Your proxy may not even support HTTPS. I would check that first.

Once again, on the client, there really isn't that much difference. But there are two situations with proxies. One is where the proxy is configurable, and the other is where the proxy is supposed to act transparently, so you make the request just like you would any other.

Scenario 2 is simple. Let's cover that one first. The proxy acts like the firewall proxy that you might find at a big office.

In this scenario, you are making a socket connection to the proxy server (HTTP/HTTPS works on top of a socket connection). But you are putting the FULL URL in your GET request.

Example request.

Client socket to myproxy:9999

Adding the absolute URL to the GET request

GET https://www.google.com/encrypted-area HTTP/1.1
Host: www.google.com

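That is a pseudo example; just make sure that all requests use the full "https://www.google.com/" URL as opposed to the relative path. (Strictly speaking, most proxies carry HTTPS through a CONNECT tunnel, e.g. "CONNECT www.google.com:443 HTTP/1.1", and the full-URL form is what a plain HTTP request through a proxy looks like.)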
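If you do end up building the request text yourself, the request above could be assembled roughly like this. ProxyRequestSketch and buildGet are names I made up for the sketch; a real request would normally carry more headers.

```java
import java.net.URI;

// Hypothetical helper for illustration only.
final class ProxyRequestSketch {

    // Builds a GET request whose request line carries the full URL,
    // which is the form a client uses when talking to a proxy.
    static String buildGet(String absoluteUrl) {
        URI target = URI.create(absoluteUrl);
        if (!target.isAbsolute() || target.getHost() == null) {
            throw new IllegalArgumentException("Need a full URL: " + absoluteUrl);
        }
        // The Host header is the host name only (plus the port when one is given).
        String host = target.getPort() == -1
            ? target.getHost()
            : target.getHost() + ":" + target.getPort();
        return "GET " + target.toASCIIString() + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n"; // blank line ends the headers
    }
}
```

You would write the returned string to the socket that is connected to myproxy:9999.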

Also, in Java, I can use a library to build the HTTP request content for me, including constructing that header. So ideally I don't have to build that request by hand when working with a proxy server. I use the Java settings to enable proxying.

E.g. this sets the proxy, and any client requests are taken care of. .NET may work the same way.

System.getProperties().put("proxySet", "" + getTestClient().isEnableProxy());
System.getProperties().put("proxyHost", getTestClient().getProxyHost());
System.getProperties().put("proxyPort", getTestClient().getProxyPort());
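As far as I know, the documented property names on current JVMs are http.proxyHost / http.proxyPort and https.proxyHost / https.proxyPort, and you can also hand a java.net.Proxy to a single connection. A hedged sketch of both follows; the class and method names are hypothetical and not part of my test client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical helper for illustration only.
final class ProxyConfigSketch {

    // Describes an HTTP proxy at host:port.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Option 1: route just this one connection through the proxy.
    // Nothing is sent until the caller uses the connection.
    static HttpURLConnection openVia(Proxy proxy, String fullUrl) {
        try {
            return (HttpURLConnection) URI.create(fullUrl).toURL().openConnection(proxy);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Option 2: JVM-wide settings read by the built-in HTTP/HTTPS handlers.
    static void useJvmWideProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", Integer.toString(port));
    }
}
```

Option 1 is handy when only some requests should go through the proxy; option 2 is closer to the snippet above.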


Also, you probably still have to make a secure socket connection to the proxy server.

-----------------

Scenario 1, where the proxy is configurable.

This is like Scenario 2, except you may not have to make an HTTPS request to the proxy server; you can make an HTTP request instead. If the proxy server is configurable, that sometimes makes it easier on the developer.

Back to your original question.

How do you know that you connected to your site? It is simple: send POST data to the HTTPS site and you should get a confirmation page back.

I don't want to bring up botlist, but I have an automated test framework where I automatically register a user and automatically log in, and the user clicks on various links.

----
Troubleshooting.
----

You may run into all kinds of issues with your custom client.

1. Redirects. Make sure you have redirects enabled. The HTTPS site may ask the client (that is you) to follow a redirect.
2. Unsupported clients (not Firefox or Internet Explorer). Change the user-agent on your client to resemble IE or Firefox. Some servers check your user-agent to make sure you aren't a bot.
3. Proper HTTP 1.1 headers. If you don't build a correct header, the server may not like what you sent and reject the request.
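In Java, the first two items come down to a couple of setter calls, and the third is mostly handled for you because HttpURLConnection writes the request line and Host header itself. A rough sketch, where ClientTweaksSketch and the Accept value are my own assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration only.
final class ClientTweaksSketch {

    // Prepares a connection with redirects enabled and a chosen User-Agent.
    static HttpURLConnection prepare(String fullUrl, String userAgent) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(fullUrl).toURL().openConnection();
            // 1. Follow redirects. Note that HttpURLConnection does not follow
            //    a redirect that switches protocol (http to https or back).
            conn.setInstanceFollowRedirects(true);
            // 2. Present a browser-like User-Agent string.
            conn.setRequestProperty("User-Agent", userAgent);
            // 3. Extra headers go here; the value below is just an example.
            conn.setRequestProperty("Accept", "text/html");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```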


Also, what is the goal of the proxy server? Modify the client requests? Are you running the proxy server on a remote server somewhere to mask where the client requests are coming from?

"I managed to post a modified string of HTML to a server. I set the requestURI, which I believe is the URL I want to be redirected to after my successful post to the original URL, and I got some piece of javascript back within an HTML tag."

I don't understand what you are saying here. I will paraphrase.

You want to make a POST request to a server and send data to that server. I don't understand why you want to send HTML data to a server. Typically, you would send just "data" to the server. For example, I bet CoT is using a POST request and the data may contain "Full Name", "Email", "Home Page" and the "Message".

A POST request to CoT might look like this (this is pseudo code; I am going on memory).

POST http://www.crazyontap.com/post.php HTTP/1.1
Host: www.crazyontap.com
User-Agent: Firefox
Cookie: remember-me


---- The data
full_name=Bot Berlin&Email=suckit@gmail.com&home_page=http://www.daddy.com

...
...
And from that post page, the server will send back the HTML confirmation page data for the client to read.

"Thanks, data saved"

It could also send a redirect. The client has to handle either one.
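One detail the pseudo code glosses over: the form data has to be URL-encoded before it goes into the POST body (spaces, "@", ":" and "/" all get escaped). A small sketch of that step, with FormBodySketch as a made-up name:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
final class FormBodySketch {

    // Turns field names and values into an
    // application/x-www-form-urlencoded request body.
    static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }
}
```

Pass a LinkedHashMap if the field order matters, send the result with a "Content-Type: application/x-www-form-urlencoded" header, and then check the response code to see whether you got the confirmation page or a redirect.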

On the proxy server stuff, think like this.

Your client is making a "socket" connection to the proxy server and building an HTTP request that will be used against the target web server. That is all you need to worry about (assuming the proxy server isn't crap).


Client: 1. Hello proxy web server at host: my-office-proxy, port 9999.
Client: 2. I want to send this HTTP POST data to log in to my Gmail account. Can you help me out? By the way, it is an HTTPS request. Here is the top header:

POST https://www.gmail.com/login HTTP/1.1

------
Proxy-Server: 1. I just got a client request, what does he want? Oh, he wants me to send a request to gmail.
Proxy-Server connects to gmail with the request from the client.

Proxy-Server to Gmail: Hey gmail, I have this request, do you accept?

Gmail to Proxy-Server: Sir, here is the confirmation page for the client.

Proxy-Server to Client: Gmail just sent me the confirmation page data.

Have a nice day.
