Web page ripping, also known as screen scraping, is the art of downloading a document from the Web, parsing it for a particular piece of content, and then incorporating that information into your own page. This is all done server-side, so the end user never makes a request to the remote page and may not even realise that one has been made. Screen scraping can be used for a wide variety of applications, such as connecting to third-party gateways, consuming RSS feeds and performing content syndication. In this tutorial we take a quick look at the different ways in which server-side HTTP connections can be made using ASP, ASP.NET and PHP, and how they can be adapted to a range of applications.
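To make the download-parse-incorporate cycle concrete, here is a minimal Python sketch, assuming the "particular piece of content" is simply the remote page's `<title>`. The names `fetch`, `TitleExtractor` and `rip_title` are illustrative only; they are not part of ASPTear or any library discussed in this tutorial.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a page server-side and return it as text."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TITLE> matches too.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def rip_title(html_text):
    """Parse downloaded HTML and return the page title, or None if absent."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip() if parser.title is not None else None
```

In use, `rip_title(fetch(url))` runs entirely on your server; the visitor only ever sees the page you build around the extracted text.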
Using ASP to perform a remote login
In some ways ASP is the most awkward environment in which to make server-side HTTP requests, because it doesn't natively support socket connections. Instead, you need an ActiveX component to do the work for you. Fortunately, a handful of free components are available, which saves you the trouble of writing your own. In this example we use ASPTear, but others include the Coalesys HTTP Client Library, HTTPConnect, HTTP Transfer and INX HTTP Poster components, all of which are available with this tutorial.
To start with, you need to register the DLL file with the server, which means copying it to the System32 directory and running regsvr32 from the command line as follows:
regsvr32 asptear.dll
For this example we will demonstrate how to perform a login on a remote site and use the returned HTML to determine whether it was successful or not. Our login page has just two input fields for the username (uid) and password (pwd).
<form action="LoginAction.asp" method="post">
<table>
<tr>
<td>AccountId</td>
<td><input type="text" name="uid"></td>
</tr>
<tr>
<td>Password</td>
<td><input type="password" name="pwd"></td>
</tr>
<tr>
<td> </td>
<td><input type="submit" value="Login"></td>
</tr>
</table>
</form>
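When a browser submits this form, it sends the two fields as an `application/x-www-form-urlencoded` body, which is also the format a server-side request has to reproduce. The short Python sketch below shows that encoding and its reverse with the standard `urllib.parse` module; `encode_login_form` and `decode_login_form` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlencode


def encode_login_form(uid, pwd):
    """Build the url-encoded body a browser sends for the uid/pwd form."""
    # urlencode escapes reserved characters: spaces become "+", "&" becomes "%26".
    return urlencode({"uid": uid, "pwd": pwd})


def decode_login_form(body):
    """Recover the field values the way a server-side form parser would."""
    fields = parse_qs(body)
    return fields.get("uid", [""])[0], fields.get("pwd", [""])[0]
```

Note that a password such as `my Pass&Word` only survives the round trip because the `&` is escaped; sent raw, it would be read as the start of a new field.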
The LoginAction.asp page simply checks the supplied credentials and writes out a response code: a positive number indicates an error, and zero means the login was successful. To keep the code basic, we have replaced any database lookup with a simple hard-coded check, as follows.
<%
uid = request("uid")
pwd = request("pwd")
responseCode = 1
if uid = "myUserName" and pwd = "myPassWord" then
responseCode = 0
end if
response.write responseCode
%>
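If you want to experiment without an ASP server, a rough Python stand-in for LoginAction.asp can be written as a WSGI application. This is a sketch under stated assumptions: it reads only the url-encoded POST body (ASP's `request()` also searches the query string and other collections), and `login_response_code` and `login_action_app` are illustrative names.

```python
from urllib.parse import parse_qs


def login_response_code(form_body):
    """Return "0" for the hard-coded valid login and "1" otherwise."""
    fields = parse_qs(form_body)
    uid = fields.get("uid", [""])[0]
    pwd = fields.get("pwd", [""])[0]
    # Hard-coded credentials in place of a database lookup, as in the ASP page.
    if uid == "myUserName" and pwd == "myPassWord":
        return "0"
    return "1"


def login_action_app(environ, start_response):
    """WSGI app that writes the response code as the entire page body."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # Only the url-encoded POST body is read here, not the query string.
    body = environ["wsgi.input"].read(length).decode("utf-8", errors="replace")
    code = login_response_code(body)
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [code.encode("ascii")]
```

For local testing it can be served with `wsgiref.simple_server.make_server("localhost", 8000, login_action_app).serve_forever()`.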
If you submit the HTML form, the page displays either 0 or 1, depending on whether you supplied the correct credentials. Suppose, however, that this form is hosted on a remote site and you want to design your own page that pre-processes the login before submitting it to the remote page. Here's how to do it using ASPTear.
<%
Response.ContentType = "text/html"
' Read the credentials posted to this page
uid = Request("uid")
pwd = Request("pwd")
responseCode = 1
call preProcessLogin(uid, pwd)
' Trim any stray whitespace the remote page may add around its code
responseCode = Trim(remoteLogin(uid, pwd))
sub preProcessLogin(myUid, myPwd)
' do database logging and any other processing here
end sub
function remoteLogin(myUid, myPwd)
Const Request_POST = 1
Const Request_GET = 2
Set xobj = CreateObject("SOFTWING.ASPtear")
' Use the function's own parameters and URL-encode them so that
' characters such as & or = cannot corrupt the payload
strParams = "uid=" & Server.URLEncode(myUid) & "&pwd=" & Server.URLEncode(myPwd)
' URL, action, payload, username, password
remoteLogin = xobj.Retrieve("http://RemoteSite.com/Rippers/LoginAction.asp", _
Request_POST, strParams, "", "")
Set xobj = Nothing
end function
if responseCode = "0" then
strMsg = "Your login was successful"
else
strMsg = "Your login was unsuccessful. The error code was: " & responseCode
end if
Response.Write strMsg
%>
Here you can see that we have created the ASPTear object and passed five parameters to its Retrieve method. The first is the remote URL to connect to, followed by an integer that specifies whether to use POST or GET as the method; we have used POST to simulate a form submission. The third parameter is the list of key/value pairs to be sent as form inputs, written in the same name=value&name=value syntax as a query string (with GET, it would become the query string). Values should be URL-encoded so that characters such as & or = cannot break the payload. The last two parameters supply a user name and password for the remote site. We have left them blank, but they would be needed if the remote site required a conventional login (i.e. authentication handled by the Web server, not an HTML form).
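For comparison, the same server-side POST can be sketched in Python with the standard `urllib.request` module. The URL is the tutorial's placeholder, `build_login_request` and `remote_login` are hypothetical names, and mapping ASPTear's last two arguments to HTTP Basic authentication is an assumption: a Web server may use another scheme.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint taken from the ASP example.
LOGIN_URL = "http://RemoteSite.com/Rippers/LoginAction.asp"


def build_login_request(uid, pwd, url=LOGIN_URL, http_user="", http_password=""):
    """Build a POST request that mimics submitting the uid/pwd form."""
    # The payload is url-encoded, just like a form submission.
    payload = urlencode({"uid": uid, "pwd": pwd}).encode("ascii")
    req = Request(url, data=payload, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    if http_user:
        # Assumption: Web-server credentials sent as HTTP Basic auth.
        raw = f"{http_user}:{http_password}".encode("utf-8")
        req.add_header("Authorization", "Basic " + base64.b64encode(raw).decode("ascii"))
    return req


def remote_login(uid, pwd, timeout=10):
    """Send the login and return the remote page's response code as text.

    urlopen raises HTTPError or URLError if the request fails.
    """
    with urlopen(build_login_request(uid, pwd), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Separating request construction from sending keeps the encoding logic testable without touching the network.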
Performing a remote login in this way is a common method of handling payment gateways in eCommerce applications. For instance, when a customer submits an order, you will probably want to process the order and calculate the total before connecting to the payment gateway. The gateway login uses predefined credentials, but the transaction total is determined by your pre-processing routines. If the gateway returns an error, you still control what is displayed to the user, and a full record of the transaction can be stored in a local database. The HTTP connection happens entirely behind the scenes and is wrapped up in your application's business logic.
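The gateway flow described above can be sketched in Python as follows. This is an illustration, not a real gateway integration: `send_to_gateway` is a hypothetical callable standing in for the HTTP call (which would carry your predefined gateway credentials), it is assumed to return "0" on success as LoginAction.asp does, and an in-memory SQLite table plays the role of the local database.

```python
import sqlite3
from decimal import Decimal


def open_ledger():
    """Create an in-memory table holding a local record of each transaction."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transactions ("
        "id INTEGER PRIMARY KEY, total TEXT NOT NULL, response_code TEXT NOT NULL)"
    )
    return db


def process_order(db, items, send_to_gateway):
    """Pre-process an order, charge it through the gateway and log the outcome.

    items is a list of (description, unit_price, quantity) tuples.
    """
    # Business logic runs first: the total comes from our own routines.
    total = sum((Decimal(price) * qty for _, price, qty in items), Decimal("0"))
    code = send_to_gateway(total)
    # Record the transaction locally whatever the gateway replied.
    db.execute(
        "INSERT INTO transactions (total, response_code) VALUES (?, ?)",
        (str(total), code),
    )
    db.commit()
    # We, not the gateway, decide what the customer sees.
    if code == "0":
        return "Your payment of %s was accepted" % total
    return "Your payment could not be processed. The error code was: " + code
```

Passing the gateway in as a function keeps the business logic independent of the transport, so the same routine works with a real HTTP call or a test stub.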