Using ASP.NET to emulate a browser
Things are a little easier when it comes to making server-side HTTP requests in ASP.NET, as the .NET Framework has built-in support for making socket connections. Consequently, there is no need to compile or install a separate object to make it work. You also have a choice of several classes to use, depending on your needs. The simplest method is to use the System.Net.WebClient class, which makes basic Web requests and returns the page content (here as a byte array that is decoded into a string). The resulting string can then be manipulated using conventional string functions as required. The code here demonstrates how to make a connection and display the results in a Label Web control.
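' Create a WebClient and download the remote page as raw bytes
Dim myWebClient As New System.Net.WebClient
Const myURL As String = "http://www.google.com/"
Dim remoteHTML() As Byte
remoteHTML = myWebClient.DownloadData(myURL)
' Decode the bytes into a string (this assumes the page is UTF-8 encoded)
Dim objUTF8 As New System.Text.UTF8Encoding
Dim strHTML As String
strHTML = objUTF8.GetString(remoteHTML)
' Display the retrieved HTML in the Label control
lblHTML.Text = strHTML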
This is a very straightforward way of retrieving HTML from a remote site, which can then be parsed for the data required. If you need more control over the request, such as specifying a timeout value or a proxy server, you will need to use the HttpWebRequest class instead. This is a little more complicated, in that the request and response need to be handled as streams, but the end result is the same.
Using HttpWebRequest involves a couple more steps. First, we set up a 5-second timeout and create a WebProxy object for the HTTP request to use. Then we configure the request to simulate a POST action and use a StreamWriter object to write the parameters to the request stream. Next, a WebResponse object stores the result of the request object's GetResponse() method, and a StreamReader reads it into a string variable. The result is displayed in a Label control, which is also used to output any error messages if necessary.
The HttpWebRequest approach may seem a little more code-intensive than the ASP version, but it is worth noting that we didn't include any error-checking code around the ASPtear component, which would have resulted in a similar amount of code in the end. There is a third way of making a server-side HTTP connection with ASP.NET, and that is to use the System.Xml.XmlTextReader class. This is particularly good for consuming RSS feeds, as the XML data can be read into a DataSet and bound to a DataGrid, making the whole process possible in about a dozen lines of code. Here is an example of a very rudimentary ASP.NET page for displaying RSS feeds.
<script language="VB" runat="server">
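Sub Page_Load(sender as Object, e as EventArgs)
    ' Open the remote RSS feed as an XML stream
    Dim rssReader as System.Xml.XmlTextReader = New System.Xml.XmlTextReader("http://slashdot.org/index.rss")
    ' Load the feed into a DataSet; ReadXml infers a set of tables from the XML structure
    Dim rssds as System.Data.DataSet = New System.Data.DataSet()
    rssds.ReadXml(rssReader)
    ' Bind the table holding the feed items (its index depends on the feed's structure)
    datagrid.DataSource = rssds.Tables(6)
    datagrid.DataBind()
End Sub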
</script>
<asp:DataGrid runat="server" id="datagrid" />
Making socket connections with PHP
Like ASP.NET, PHP offers a couple of alternative methods for making server-side HTTP requests. On the one hand, simple screen scraping can be performed in just two lines of code, as shown below.
$strURL = "http://www.RemoteSite.com";
$strHTML = file_get_contents($strURL); // requires allow_url_fopen to be enabled
On the other hand, a more sophisticated socket connection can be made using the fsockopen() function instead. This requires you to build the HTTP request headers manually beforehand, as well as to strip the returned headers from the result.
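A minimal sketch of the fsockopen() approach follows, assuming a plain HTTP server on port 80 and a simple GET request. The function names fetch_with_socket() and strip_http_headers() are hypothetical, as is the host in the usage comment. The request is sent as HTTP/1.0 so the server closes the connection once it has responded, which lets the loop simply read until end-of-file; the returned headers end at the first blank line, so everything after it is the page body.

```php
<?php
// Hypothetical helper: return everything after the header block of a raw
// HTTP response. Headers end at the first blank line (CRLF CRLF).
function strip_http_headers($response)
{
    $parts = explode("\r\n\r\n", $response, 2);
    return isset($parts[1]) ? $parts[1] : '';
}

// Hypothetical helper: send a hand-built GET request over a raw socket
// and return the response body, or false if the connection fails.
function fetch_with_socket($host, $path = '/', $timeout = 30)
{
    $fp = fsockopen($host, 80, $errno, $errstr, $timeout);
    if (!$fp) {
        return false; // $errno and $errstr describe the failure
    }

    // Build the HTTP request headers by hand; the empty line ends them.
    $request  = "GET $path HTTP/1.0\r\n";
    $request .= "Host: $host\r\n";
    $request .= "Connection: close\r\n";
    $request .= "\r\n";
    fwrite($fp, $request);

    // Read until the server closes the connection.
    $response = '';
    while (!feof($fp)) {
        $response .= fgets($fp, 1024);
    }
    fclose($fp);

    return strip_http_headers($response);
}

// Usage (makes a live network request to a placeholder host):
// echo fetch_with_socket('www.remotesite.com', '/');
```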
There is another way to do this that is less complicated, although it requires the cURL extension to be compiled and installed as part of PHP. cURL also supports HTTPS, so it can be used to connect to payment gateways and other sites that require a secure, encrypted connection. Here's an example of how it's done.
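<?php
$myURL = "www.remoteSite.com/login.php";
$chttp = curl_init();
// Target the login page over HTTPS
curl_setopt($chttp, CURLOPT_URL, "https://$myURL");
// Send a POST request with the form fields encoded as a query string
curl_setopt($chttp, CURLOPT_POST, 1);
curl_setopt($chttp, CURLOPT_POSTFIELDS, "key1=value1&key2=value2");
// Without CURLOPT_RETURNTRANSFER, curl_exec() outputs the response directly
curl_exec($chttp);
curl_close($chttp);
?>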
In this code the cURL extensions are used to simulate an HTTP form POST action, with the form values being passed using regular URL query string syntax.
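Writing the POST string by hand means any special characters in the values have to be URL-encoded manually. A sketch of an alternative builds the same key=value&key=value string from an array with PHP's built-in http_build_query(); the post_form() function name and the field values are hypothetical. It also sets CURLOPT_RETURNTRANSFER so that curl_exec() returns the response as a string for further parsing, rather than sending it straight to the browser.

```php
<?php
// Build the POST body from an array; http_build_query() URL-encodes
// each key and value and joins the pairs with '&'.
$fields = array('key1' => 'value1', 'key2' => 'value 2&3');
$postData = http_build_query($fields);
// $postData is now "key1=value1&key2=value+2%263"

// Hypothetical helper: POST the fields and return the response body
// as a string (or false on failure) instead of printing it.
function post_form($url, array $fields)
{
    $chttp = curl_init();
    curl_setopt($chttp, CURLOPT_URL, $url);
    curl_setopt($chttp, CURLOPT_POST, 1);
    curl_setopt($chttp, CURLOPT_POSTFIELDS, http_build_query($fields));
    curl_setopt($chttp, CURLOPT_RETURNTRANSFER, 1);
    $body = curl_exec($chttp);
    curl_close($chttp);
    return $body;
}

// Usage (placeholder URL, makes a live request):
// $html = post_form('https://www.remoteSite.com/login.php', $fields);
```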
Where to go from here
Depending on what you are ripping and why, you may or may not have a lot more work ahead of you! Plain text is the simplest data to work with and can easily be parsed line by line. A common way of doing this is to convert the string to an array, using line breaks as delimiters, and then loop through the array items. Alternatively, if you are ripping XML data such as RSS feeds, you may want to load the data into an XML object and manipulate it by "walking the tree" or by using other XML functions. Last but not least, if you are ripping HTML, you will have a fair bit of tag stripping and searching to do before you can extract the information you are looking for. For more examples of screen scraping and other server-side socket applications, check the online resources listed below.
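A rough sketch of the three kinds of post-processing described above follows. The helper names are hypothetical; the XML example assumes an RSS 2.0 feed (items nested inside a channel element) and the SimpleXML extension, which is enabled by default in most PHP builds; and the HTML example uses a simple regular expression, which is adequate for pulling out a single known tag but is no substitute for a real HTML parser on messy markup.

```php
<?php
// 1. Plain text: split on line breaks (CRLF, CR or LF) so the result
//    can be processed one line at a time with foreach.
function text_to_lines($text)
{
    return preg_split('/\r\n|\r|\n/', $text);
}

// 2. XML (assumed RSS 2.0): load the feed and walk the tree,
//    collecting the title of each item.
function rss_item_titles($xml)
{
    $titles = array();
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return $titles; // not well-formed XML
    }
    foreach ($feed->channel->item as $item) {
        $titles[] = (string) $item->title;
    }
    return $titles;
}

// 3. HTML: search for the fragment you need, then strip any tags inside it.
function html_title($html)
{
    if (preg_match('/<title>(.*?)<\/title>/is', $html, $m)) {
        return trim(strip_tags($m[1]));
    }
    return '';
}

// Usage, e.g. with $strHTML fetched by one of the methods above:
// foreach (text_to_lines($strHTML) as $line) { /* search each line */ }
```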
Kevin - 19/10/05
Awwww. Don't forget the insanely simple ColdFusion method...
Then use the returned 'cfhttp.filecontent' variable, along with a large number of other returned variables, to do as you please with the page.
Quite literally, getting a page and displaying it from my server would be as easy as:
#cfhttp.filecontent#