<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Spyder Web Tech's SEO Journey</title>
	<atom:link href="http://spyderwebtech.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://spyderwebtech.wordpress.com</link>
	<description>Creating My SEO Empire</description>
	<lastBuildDate>Wed, 19 Oct 2011 15:26:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='spyderwebtech.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Spyder Web Tech's SEO Journey</title>
		<link>http://spyderwebtech.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://spyderwebtech.wordpress.com/osd.xml" title="Spyder Web Tech&#039;s SEO Journey" />
	<atom:link rel='hub' href='http://spyderwebtech.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Announcing the Future Launch of the &#8220;SpyderSchool&#8221;</title>
		<link>http://spyderwebtech.wordpress.com/2009/01/30/announcing-the-future-launch-of-the-spyderschool/</link>
		<comments>http://spyderwebtech.wordpress.com/2009/01/30/announcing-the-future-launch-of-the-spyderschool/#comments</comments>
		<pubDate>Fri, 30 Jan 2009 17:10:36 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[cURL Tutorials]]></category>
		<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[PHP Tutorials]]></category>
		<category><![CDATA[SEO Tools]]></category>
		<category><![CDATA[Web Page Scraping]]></category>
		<category><![CDATA[Web Spiders]]></category>
		<category><![CDATA[build w]]></category>
		<category><![CDATA[Building Database Sites]]></category>
		<category><![CDATA[curl crawler]]></category>
		<category><![CDATA[curl tu]]></category>
		<category><![CDATA[curl tut]]></category>
		<category><![CDATA[curl tutorial]]></category>
		<category><![CDATA[curl web]]></category>
		<category><![CDATA[curl web automation]]></category>
		<category><![CDATA[curl web page scraping]]></category>
		<category><![CDATA[php curl spider]]></category>
		<category><![CDATA[php curl tutorial]]></category>
		<category><![CDATA[php curl web mining]]></category>
		<category><![CDATA[php spider]]></category>
		<category><![CDATA[php we]]></category>
		<category><![CDATA[php web scraping]]></category>
		<category><![CDATA[php web spid]]></category>
		<category><![CDATA[php web spider]]></category>
		<category><![CDATA[php web spider tutorial]]></category>
		<category><![CDATA[scraping curl]]></category>
		<category><![CDATA[spyderwebech]]></category>
		<category><![CDATA[web crawler]]></category>
		<category><![CDATA[web page scr]]></category>
		<category><![CDATA[web scraping curl]]></category>
		<category><![CDATA[web spider php]]></category>
		<category><![CDATA[web spider tutorial]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/?p=88</guid>
		<description><![CDATA[I want to thank everyone who left a comment, emailed me, and even took the time to track me down and give me a phone call about starting this project.  It seems as if there is a real need for this subject and to my knowledge there is yet to be a definitive source of [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=spyderwebtech.wordpress.com&amp;blog=2116186&amp;post=88&amp;subd=spyderwebtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignnone" style="width: 460px"><img title="Coming Soon - SpyderSchool Launch" src="http://www.cre8tivelabs.com/spyderschool/spyderschool.jpg" alt="Coming Soon - SpyderSchool Launch" width="450" height="203" /><p class="wp-caption-text">Coming Soon - SpyderSchool Launch</p></div>
<p>I want to thank everyone who left a comment, emailed me, or even took the time to track me down and call me about starting this project.  There seems to be a real need for this subject, and to my knowledge there is not yet a definitive source of web automation information on the net.</p>
<p>I am very happy to announce that I am going forward with the <strong>SpyderSchool,</strong> an online school on how to <span style="text-decoration:underline;">Automate the Internet</span>.  The scheduled release is tentatively planned for June 1st, 2009.</p>
<p>This school will teach its members how to spider, scrape, mine data, and automate nearly any site or process on the net.  I will teach you how to do this in step-by-step videos, tutorials, and live seminars.</p>
<p>I am <span style="text-decoration:underline;">limiting the number of students</span> that I will accept into the <strong>SpyderSchool</strong> for the first year to 250.  So if you are interested, please fill out the form linked below as soon as possible.</p>
<p style="text-align:center;"><a href="http://spreadsheets.google.com/viewform?key=pEQvNJS2ee3GZ00BMzioqSw&amp;hl=en" target="_blank">Put me on the Waiting List for the SpyderSchool</a></p>
<p>The first 250 people who sign up will be contacted first, and if there are still openings I will continue down the list until all spots are filled.  Only then will I open up enrollment, so don&#8217;t wait!  Even if you are only remotely interested, you had better sign up.</p>
<p>In the <strong>SpyderSchool</strong> you will learn by doing, with hands-on examples, data-mining challenges, and competitions to test your skills and hone your newfound techniques.  As you complete each challenge, you will build a library of web scraping code that you can use in your future career in web automation, as well as a portfolio to impress your future clients.</p>
<p>Don&#8217;t have any experience programming for the web?  No worries, the <strong>SpyderSchool</strong> is being built for the beginner.</p>
<p>In the <strong>SpyderSchool</strong>, you will learn:</p>
<ol>
<li>How the Internet Works</li>
<li>What Types of Scraping Technologies Exist</li>
<li>How to Analyze a Web Site for Scraping</li>
<li>Basic/Advanced Web Spider Programming</li>
<li>What Tools to Use</li>
<li>How to Automate Your Web Hosting Accounts (Linux Servers)</li>
<li>How to Automate Forms</li>
<li>How to Automate AJAX Dynamic Content Sites</li>
<li>How to Beat Captcha</li>
<li>and Much, Much, More.</li>
</ol>
<p>This course will be taught using only open-source technologies (sorry, Microsoft), so you won&#8217;t have to pay for a single thing besides your tuition.  There will be no up-sells, down-sells, cross-sells, or any other marketing pressure&#8230; guaranteed.  I hate that crap; nothing but learning here.</p>
<p>Look for more posts on the progress of the SpyderSchool, and be sure to sign up for more information at the following location:</p>
<p style="text-align:center;"><a href="http://spreadsheets.google.com/viewform?key=pEQvNJS2ee3GZ00BMzioqSw&amp;hl=en" target="_blank">Put me on the Waiting List for the SpyderSchool</a></p>
<p style="text-align:left;">You can also follow the progress of this school at my Twitter:</p>
<p style="text-align:center;"><a class="aligncenter" href="http://www.twitter.com/spyderwebtech" target="_blank">http://www.twitter.com/spyderwebtech</a></p>
<p style="text-align:left;">Thanks again for the huge amount of interest, and for those who took the time to encourage me to start this project.</p>
<p style="text-align:left;">&#8211;Spyderwebtech</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2009/01/30/announcing-the-future-launch-of-the-spyderschool/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>

		<media:content url="http://www.cre8tivelabs.com/spyderschool/spyderschool.jpg" medium="image">
			<media:title type="html">Coming Soon - SpyderSchool Launch</media:title>
		</media:content>
	</item>
		<item>
		<title>Scraping Web Pages with cURL Tutorial &#8211; Part 2</title>
		<link>http://spyderwebtech.wordpress.com/2008/08/13/scraping-web-pages-with-curl-tutorial-part-2/</link>
		<comments>http://spyderwebtech.wordpress.com/2008/08/13/scraping-web-pages-with-curl-tutorial-part-2/#comments</comments>
		<pubDate>Wed, 13 Aug 2008 13:29:24 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[SEO Concepts]]></category>
		<category><![CDATA[SEO Tools]]></category>
		<category><![CDATA[Web Page Scraping]]></category>
		<category><![CDATA[Web Spiders]]></category>
		<category><![CDATA[build web spider]]></category>
		<category><![CDATA[CURL]]></category>
		<category><![CDATA[curl tutorial]]></category>
		<category><![CDATA[cURL Tutorials]]></category>
		<category><![CDATA[curl web page scraping]]></category>
		<category><![CDATA[curl_setopt]]></category>
		<category><![CDATA[file_get_contnets]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[page scrape]]></category>
		<category><![CDATA[parse HTML]]></category>
		<category><![CDATA[parse links]]></category>
		<category><![CDATA[parse tags]]></category>
		<category><![CDATA[PHP Tutorials]]></category>
		<category><![CDATA[php web spider]]></category>
		<category><![CDATA[php web spider tutorial]]></category>
		<category><![CDATA[SEO robot]]></category>
		<category><![CDATA[web crawler]]></category>
		<category><![CDATA[web site scraping]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/?p=60</guid>
		<description><![CDATA[In Scraping Web Pages with cURL Tutorial &#8211; Part 1, I demonstrated how to create a web spider class that uses the cURL library to transfer any type of data from the web direct to your server. In this tutorial we are going to talk about how to parse that data into some sort of [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=spyderwebtech.wordpress.com&amp;blog=2116186&amp;post=60&amp;subd=spyderwebtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://spyderwebtech.wordpress.com/2008/08/08/scraping-web-pages-with-curl-tutorial-part-1/">Scraping Web Pages with cURL Tutorial &#8211; Part 1</a>, I demonstrated how to create a web spider class that uses the cURL library to transfer any type of data from the web directly to your server.</p>
<p>In this tutorial we are going to talk about how to parse that data into some sort of usable form by extending our wSpider class functionality.</p>
<p>The key to scraping web pages is to first understand how a web page is laid out and its resulting HTML structure.  It is this HTML structure that allows our spider to identify and scrape the part of the web page that you are interested in.  So let&#8217;s take a look at some example HTML and review what kind of HTML tags we may encounter with our spider.</p>
<p>Below is an example of the HTML that you might see on a typical web page:<br />
<code><br />
&lt;html&gt;<br />
&lt;head&gt;<br />
&lt;title&gt;My Web Page Title&lt;/title&gt;<br />
&lt;meta name="keywords" content="key1,key2,key3" &gt;<br />
&lt;/head&gt;<br />
&lt;body&gt;<br />
&lt;h1&gt;Header 1&lt;/h1&gt;<br />
&lt;div id="mypics"&gt;<br />
&lt;div class="picclass"&gt;<br />
&lt;img src="mypic.jpg" width="100" height="100"&gt;<br />
&lt;/div&gt;<br />
&lt;div class="picclass"&gt;<br />
&lt;img src="mypic2.jpg" width="100" height="100"&gt;<br />
&lt;/div&gt;<br />
&lt;/div&gt;<br />
&lt;a href="nextpage.php"&gt;Goto Next Page&lt;/a&gt;<br />
&lt;/body&gt;<br />
&lt;/html&gt;<br />
</code></p>
<p>As you can see, nearly every part of the web page is enclosed between an opening tag ( &lt;..&gt; ) and a closing tag ( &lt;/..&gt; ).  Every web page has these two primary sections:</p>
<ol>
<li>&lt;head&gt; section &#8211; The content between the opening and closing tags provides information about the website/web page, including the title, keywords, etc.</li>
<li>&lt;body&gt; section &#8211; This section contains all of the visible items of a web page including text, images, links, tables, and a container type item called a DIV.</li>
</ol>
<p>Most of our attention will be focused on the &lt;body&gt; section, as this is where most of the &#8220;juicy&#8221; content that we might want to scrape lives.  In a later tutorial, I will show you how to create a competition analysis spider that works mostly in the &lt;head&gt; section.</p>
<p>Within the &lt;body&gt; tags of our example page we find the &lt;h1&gt;, &lt;div&gt;, &lt;img&gt;, and &lt;a&gt; tags, which we will pass on to our spider to scrape either the content between these tags or an attribute of these tags.  For example, the link tag &lt;a&gt; has text between its opening and closing tags as well as an href attribute, which tells us the URL the browser will navigate to when the link is clicked.</p>
<p>So let&#8217;s extend our wSpider class that we began to build in <a href="http://spyderwebtech.wordpress.com/2008/08/08/scraping-web-pages-with-curl-tutorial-part-1/">part 1</a> of this tutorial, by creating a function that will take the HTML stored in our $this-&gt;html string and strip out these different tags.  I am going to call this function parse_array because I want to take all the occurrences of the supplied tag and then store it into an array.</p>
<p>Below is the code for our parse_array() function:</p>
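<p><code><br />
function parse_array($beg_tag, $close_tag)<br />
{<br />
/// the ( ) are the pattern delimiters; s = . also matches newlines, i = ignore case, U = match as little as possible<br />
preg_match_all("($beg_tag.*$close_tag)siU", $this-&gt;html, $matching_data);<br />
/// element 0 holds the full text of every match<br />
return $matching_data[0];<br />
}<br />
</code></p>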
<p>The above function takes two parameters, the beginning tag ($beg_tag) and the ending tag ($close_tag), and captures each beginning tag, its ending tag, and everything in between.  This happens every time the pattern is found in the page.</p>
<p>The real workhorse of this function is preg_match_all(), which is part of PHP.  It takes a regular expression, searches a string for all of the occurrences, and extracts them into an array for us (in this case an array called $matching_data).</p>
<p>The regular expression that we use is the &#8220;($beg_tag.*$close_tag)siU&#8221; portion of the function above; the outer parentheses simply mark where the pattern starts and ends.  The pattern starts by looking for the beginning tag supplied by the user.  The next part is the .*, where the . matches any character and the * repeats it as many times as possible, which is why .* is usually called <em><strong>&#8220;greedy&#8221;</strong></em>.  The last part of the pattern is the closing tag, once again supplied by the user.  After the pattern come three flags: s lets the . match line breaks as well, i ignores case, and U reverses the greediness so that each match stops at the first closing tag instead of the last one.  We will talk more about these in a later tutorial on regular expressions.</p>
<p>Once all this work is done the function will return the array back to where we called it from.</p>
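<p>To make the effect of the U flag concrete, here is a minimal sketch that runs the same kind of pattern with and without it.  The two-link $html string is a made-up sample, not the example page above, and the tags are hard-coded instead of being passed in as parameters.</p>
<p><code>$html = '&lt;a href="one.php"&gt;One&lt;/a&gt; &lt;a href="two.php"&gt;Two&lt;/a&gt;';<br />
<br />
/// without U the .* is greedy: one match running from the first &lt;a to the last &lt;/a&gt;<br />
preg_match_all("(&lt;a.*&lt;/a&gt;)si", $html, $greedy);<br />
<br />
/// with U the .* stops at the first &lt;/a&gt; it finds: one match per link<br />
preg_match_all("(&lt;a.*&lt;/a&gt;)siU", $html, $lazy);<br />
<br />
echo count($greedy[0]);   /// prints 1<br />
echo count($lazy[0]);     /// prints 2<br />
</code></p>
<p>This is why parse_array() keeps the U flag: without it, a page with several links would come back as one long string instead of one array element per link.</p>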
<p>Let&#8217;s take a quick look at how we can use our new parse_array() function to scrape all of the links from our example HTML above.</p>
<p>The first thing we need to know is how a link is structured in HTML, so we can send our function the beginning tag and the end tag.  A link&#8217;s structure looks like this from our example:</p>
<p><code> &lt;a href="nextpage.php"&gt;Goto Next Page&lt;/a&gt;</code></p>
<p>I can see that the link begins with a &#8220;&lt;a&#8221; and ends with a &#8220;&lt;/a&gt;&#8221;.  If I use these as my start and end tags in my function, then the function should return an array with one element, and inside that element will be the entire HTML for the link above.</p>
<p>Putting it all together, below is the code that I would use to download all the links from my example web page.  I am also going to put in a foreach loop that will be used to do whatever I want with each link.  This loop will be important to us when we create a spider that can crawl multiple web pages and websites without further user interaction.</p>
<p><code>$myspider = new wSpider();<br />
$myspider-&gt;fetchPage("http://www.examplesite.com/example.html");<br />
$linkarray = $myspider-&gt;parse_array("&lt;a", "&lt;/a&gt;");<br />
<br />
foreach ($linkarray as $result) {<br />
/// store each $result in database or create a new spider to spider next page<br />
}</code></p>
<p>As you can see from the code above, we create a wSpider instance by using the new keyword.  Next we get the HTML by calling the fetchPage() function, passing the target URL as its argument (see <a href="http://spyderwebtech.wordpress.com/2008/08/08/scraping-web-pages-with-curl-tutorial-part-1/">part 1 of this tutorial</a> if you don&#8217;t understand).  fetchPage() stores the resultant HTML in the $this-&gt;html variable within the wSpider class instance.</p>
<p>The next line of code assigns the result of our parse_array() function to another variable called $linkarray, which we use in a foreach loop at the bottom of this code.  Remember, what has been returned into the $linkarray array is the entire HTML of each link including the &lt;a&gt; tags, so depending on which type of information you want, you are going to have to strip out that information.</p>
<p>PHP has a lot of great functions that can help us grab the information we want.  For instance, if we wanted to grab the anchor text of these links, we could use the strip_tags() function in PHP to remove the &lt;a&gt; and &lt;/a&gt; tags and keep the text between them (this works for anything that has an opening and closing tag).</p>
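<p>As a rough sketch of that step, the snippet below pulls both the anchor text and the href value out of one link.  The $link string stands in for a single $result from the foreach loop, and the small href pattern is an illustrative assumption that only works when the attribute value is wrapped in double quotes.</p>
<p><code>$link = '&lt;a href="nextpage.php"&gt;Goto Next Page&lt;/a&gt;';<br />
<br />
/// strip_tags() removes the tags and leaves the text between them<br />
$anchor_text = strip_tags($link);   /// "Goto Next Page"<br />
<br />
/// capture whatever sits between href=" and the next double quote<br />
$href = "";<br />
if (preg_match('~href="([^"]*)"~i', $link, $match)) {<br />
$href = $match[1];   /// "nextpage.php"<br />
}<br />
</code></p>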
<p>Once we have the information we want, we can do a number of things with it, such as save it to a database, spawn a new spider to scrape the next page, or record the anchor text of each link (for SEO purposes) as explained above.</p>
<p>In the next tutorial, we will continue to develop our spider class, adding a number of functions that make repetitive tasks very simple, and use these new functions to create a simple spider that can crawl an entire website.</p>
<p><span style="color:#ff0000;"><em><strong>Other Web Spider Tutorials:</strong></em></span></p>
<p><a href="http://spyderwebtech.wordpress.com/2007/12/01/building-a-web-spider-part-1/">Build a Web Spider &#8211; Part 1</a></p>
<p><a href="http://spyderwebtech.wordpress.com/2007/12/05/building-a-web-spider-part-2/">Building a Web Spider &#8211; Part 2</a></p>
<p>**************************************************************************</p>
<p>* Looking for a comprehensive course on Web Page Scraping?</p>
<p>* Let me know your interest by commenting on the <a href="http://spyderwebtech.wordpress.com/2008/07/22/the-webspyder-school/">SpyderSchool Post</a></p>
<p>* ************************************************************************</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2008/08/13/scraping-web-pages-with-curl-tutorial-part-2/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>

		<media:content url="http://cdn.stumble-upon.com/images/120x20_su_white.gif" medium="image" />
	</item>
		<item>
		<title>Scraping Web Pages with cURL Tutorial- Part 1</title>
		<link>http://spyderwebtech.wordpress.com/2008/08/08/scraping-web-pages-with-curl-tutorial-part-1/</link>
		<comments>http://spyderwebtech.wordpress.com/2008/08/08/scraping-web-pages-with-curl-tutorial-part-1/#comments</comments>
		<pubDate>Fri, 08 Aug 2008 14:23:31 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[cURL Tutorials]]></category>
		<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[PHP Tutorials]]></category>
		<category><![CDATA[SEO Tools]]></category>
		<category><![CDATA[Web Page Scraping]]></category>
		<category><![CDATA[Web Spiders]]></category>
		<category><![CDATA[build web spider]]></category>
		<category><![CDATA[CURL]]></category>
		<category><![CDATA[cURL library]]></category>
		<category><![CDATA[curl tutorial]]></category>
		<category><![CDATA[curl web page scraping]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[object oriented programming]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[PHP classes]]></category>
		<category><![CDATA[php object]]></category>
		<category><![CDATA[php web spider]]></category>
		<category><![CDATA[php web spider tutorial]]></category>
		<category><![CDATA[web site download]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/?p=42</guid>
		<description><![CDATA[In my last post, Scraping Web Pages with cURL, I talked about what the cURL library can bring to the table and how we can use this library to create our own web spider class in PHP. What I want to do in this tutorial is to show you how to use the cURL library [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=spyderwebtech.wordpress.com&amp;blog=2116186&amp;post=42&amp;subd=spyderwebtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In my last post, <a href="http://spyderwebtech.wordpress.com/2008/08/07/scraping-websites-with-curl/">Scraping Web Pages with cURL</a>, I talked about what the cURL library can bring to the table and how we can use this library to create our own web spider class in PHP.</p>
<p>What I want to do in this tutorial is to show you how to use the cURL library to download nearly anything off of the web.  In upcoming tutorials I will show you how to manipulate what you downloaded, extract whatever information you want, and either store that data in a database or save it on your server.</p>
<p><span style="color:#008000;"><strong><span>Creating a PHP Class -</span></strong></span></p>
<p>Before we start talking about the cURL library, I first want to show you how to create a class in PHP.  Classes are very useful in that they can hold a number of properties and functions and allow us to easily reuse the code for better productivity.  Classes are PHP&#8217;s way of creating what are called &#8220;Objects&#8221;.</p>
<p>An Object in PHP is very similar to an Object in the physical world.  Everything that we can see and touch is an object.  For instance, a Book is an Object.  The Book also has a number of properties that describe that book such as an author, title, number of pages, line spacing, publisher, etc.  We can use this same concept in our PHP code to represent a book by using the following code:</p>
<p style="padding-left:30px;"><code>class Book<br />
{<br />
public $author = "";<br />
public $title = "";<br />
public $nopages = 0;<br />
public $publisher = "";<br />
<br />
/// constructor: runs automatically when a new Book is created<br />
function __construct($author, $title, $pages, $publisher)<br />
{<br />
$this-&gt;author = $author;<br />
$this-&gt;title = $title;<br />
$this-&gt;nopages = $pages;<br />
$this-&gt;publisher = $publisher;<br />
}<br />
}</code></p>
<p>The above code is an example of a class that I built to describe a book.</p>
<p>After declaring the class, I have listed a number of properties that I want the class to have including author, title, number of pages, and publisher.</p>
<p>Next comes a function called the &#8220;constructor&#8221;.  In PHP 5 and later it is named __construct(); in PHP 4 it had to have the same name as the class, a style that PHP 8 no longer recognizes.  All this function does is set the properties from the values passed in when the object is created (I will show you how to do this later).</p>
<p>The $this keyword is used to reference the current instance of the object.  So in this case, we are talking about the Book that we are describing, and not all the rest of the books on planet earth.  $this is a very useful keyword that makes object oriented programming possible.</p>
<p>Now comes the good stuff: creating instances of the class we just wrote.  Suppose I am building an online library for people to look up their favorite romance novel and want to create an Object from the Book class.  All we need to do is write the following code:</p>
<p style="padding-left:30px;"><code><br />
$firstbook = new Book("John Doe", "In Love", 200, "ACME Publishing");</code></p>
<p style="padding-left:30px;"><code> $secondbook = new Book("John Doe", "In Love 2", 230, "ACME Publishing");</code></p>
<p>We have now created an instance of the Book in PHP with an author of &#8220;John Doe&#8221; and a title of &#8220;In Love&#8221;.  John Doe&#8217;s second book is created in exactly the same way by declaring a &#8220;new Book&#8221; and passing different values into the class constructor.  As you can see, we can reuse this code as many times as we want very easily.</p>
<p>Having reusable code structured this way, we can create hundreds of web spiders very quickly with very little effort.  So now let&#8217;s create our web spider class&#8230;</p>
<p><span style="color:#008000;"><strong>Creating a Web Spider Class in PHP -</strong></span></p>
<p>Now let&#8217;s use the same thinking to create a web page scraping spider class that we can use to download virtually anything off of the web.  Let&#8217;s start our class by giving it a name of &#8220;wSpider&#8221; and let&#8217;s create the constructor.</p>
<p style="padding-left:30px;"><code>class wSpider<br />
{<br />
public $ch;   /// used to hold our cURL instance<br />
public $html;  /// used to hold resultant html data<br />
public $binary;  /// used for binary transfers<br />
public $url;  /// used to hold the url to be downloaded<br />
<br />
/// constructor (it would be named wSpider() in PHP 4)<br />
function __construct()<br />
{<br />
$this-&gt;html = "";<br />
$this-&gt;binary = 0;<br />
$this-&gt;url = "";<br />
}<br />
}</code></p>
<p>In the above code, we create some properties which we are going to need for our class and then in the constructor we initialize all the properties.  Right now our class does a whole lot of nothing, so to add functionality we are going to have to add functions.</p>
<p>A function (usually called a method when it belongs to a class) is a list of instructions that operate on that object.  In the book example, I could have a method called Read(), which could cause someone to begin reading that book.</p>
<p style="padding-left:30px;">So for our wSpider class, let&#8217;s create a function underneath the constructor called fetchPage():<br />
<code><br />
function fetchPage($url)<br />
{<br />
$this-&gt;url = $url;<br />
if (!empty($this-&gt;url)) {   /// only continue if a URL was actually passed in<br />
$this-&gt;ch = curl_init();   /// open a cURL instance<br />
curl_setopt($this-&gt;ch, CURLOPT_RETURNTRANSFER, 1);   /// tell cURL to return the data<br />
curl_setopt($this-&gt;ch, CURLOPT_URL, $this-&gt;url);   /// set the URL to download<br />
curl_setopt($this-&gt;ch, CURLOPT_FOLLOWLOCATION, true);   /// follow any redirects<br />
curl_setopt($this-&gt;ch, CURLOPT_BINARYTRANSFER, $this-&gt;binary);   /// tells cURL if the data is binary data or not<br />
$this-&gt;html = curl_exec($this-&gt;ch);   /// pulls the web page from the Internet<br />
curl_close($this-&gt;ch);   /// closes the connection<br />
}<br />
}</code></p>
<p>The above function does the following:</p>
<ol>
<li>Checks to see if the url was passed through the function</li>
<li>Sets the options for the web page pull (see code for what each option does)</li>
<li>Pulls the web page from the Internet</li>
<li>Closes the cURL connection</li>
</ol>
<p>The resultant html from the web spider is held in the $this-&gt;html property.  Below is the finished code used to download an HTML web page and print it out to the screen.</p>
<p style="padding-left:30px;"><code><br />
$mySpider = new wSpider();   /// creates a new instance of the wSpider<br />
$mySpider-&gt;fetchPage("http://www.msn.com");  /// fetches the home page of msn.com<br />
echo $mySpider-&gt;html;  /// prints out the html to the screen</code></p>
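<p>One thing fetchPage() does not do is tell you when a download fails: curl_exec() returns false in that case, and echo prints nothing.  As a rough sketch only, the same four steps could be written as a standalone function with basic error checking.  The function name fetchUrl, the 30 second timeout, and the choice to throw exceptions are my own assumptions for illustration; they are not part of the wSpider class.</p>
<pre style="padding-left:30px;"><code>&lt;?php
// Hypothetical helper (not part of wSpider): fetch a URL and return the body.
function fetchUrl($url)
{
    // Step 1: make sure a URL was actually passed in
    if (!is_string($url) || trim($url) === '') {
        throw new InvalidArgumentException('No URL given');
    }

    // Step 2: set the options for the page pull
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the data instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow any redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);           // assumed timeout, in seconds

    // Step 3: pull the page; curl_exec() returns false on failure
    $body = curl_exec($ch);
    $error = ($body === false) ? curl_error($ch) : null;

    // Step 4: release the handle (curl_close() does nothing on PHP 8+, so we just drop it)
    unset($ch);

    if ($error !== null) {
        throw new RuntimeException('cURL error: ' . $error);
    }
    return $body;
}
</code></pre>
<p>Calling fetchUrl("http://www.msn.com") should then either return the page as a string or throw an exception, so a failed download is not mistaken for an empty page.</p>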
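<p>If you wanted to download a picture instead, you would set the binary property ($this-&gt;binary) to true (1) before calling fetchPage().  Pictures, images, and videos are binary data rather than text.  (On current PHP versions CURLOPT_BINARYTRANSFER no longer has any effect, because CURLOPT_RETURNTRANSFER always returns the raw data, so the flag is harmless but optional.)  The following code will download a picture and store the data in the $this-&gt;html property.</p>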
<p style="padding-left:30px;"><code><br />
$mySpider = new wSpider();   //// creates a new instance of the wSpider<br />
$mySpider-&gt;binary = 1;  /// turns on the binary transfer mode<br />
$mySpider-&gt;fetchPage("http://www.msn.com/pic123.jpg");  /// fetches a picture off of the msn home page<br />
</code></p>
<p>You can then use some PHP code to save it to a database or to save a picture file to your hard drive.  The sky is the limit as to what you can do with this data.  Actually, this is the same technique that we can use to bypass Captcha, so make sure that you know how to use it.</p>
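<p>As one example of what that PHP code might look like, here is a minimal sketch of saving the downloaded bytes to your hard drive.  The function name saveDownload and the file name are hypothetical; file_put_contents() is PHP&#8217;s built-in function for writing a string to a file.</p>
<pre style="padding-left:30px;"><code>&lt;?php
// Hypothetical helper: write downloaded bytes to disk and return the byte count.
function saveDownload($data, $path)
{
    // curl_exec() returns false on failure, so refuse anything that is not a non-empty string
    if (!is_string($data) || $data === '') {
        throw new InvalidArgumentException('Nothing to save');
    }

    // file_put_contents() is binary-safe and returns the number of bytes written, or false
    $bytes = file_put_contents($path, $data);
    if ($bytes === false) {
        throw new RuntimeException('Could not write to ' . $path);
    }
    return $bytes;
}
</code></pre>
<p>With the spider from this tutorial you would call something like saveDownload($mySpider-&gt;html, "pic123.jpg") right after fetchPage().</p>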
<p>So we are off to a good start.  What you have learned here is crucial to understand if you are planning on doing some large-scale scraping projects.  Creating reusable code can sometimes take longer in the beginning, but if you do it right&#8230; then you can create spiders very quickly and reuse the code on all your future projects.</p>
<p>In the <a href="http://spyderwebtech.wordpress.com/2008/08/13/scraping-web-pages-with-curl-tutorial-part-2/">next tutorial</a> we will talk about how to manipulate the data that you have just pulled as well as how to extend our class to be more functional.  Happy Scraping!! <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><span style="color:#ff0000;"><em><strong>Other Web Spider Tutorials:</strong></em></span></p>
<p><a href="http://spyderwebtech.wordpress.com/2008/08/13/scraping-web-pages-with-curl-tutorial-part-2/">Scraping Web Pages with cURL Tutorial- Part 2</a></p>
<p><a href="http://spyderwebtech.wordpress.com/2007/12/01/building-a-web-spider-part-1/">Build a Web Spider &#8211; Part 1</a></p>
<p><a href="http://spyderwebtech.wordpress.com/2007/12/05/building-a-web-spider-part-2/">Building a Web Spider &#8211; Part 2</a></p>
<p>**************************************************************************</p>
<p>* Looking for a comprehensive course on Web Page Scraping?</p>
<p>* Let me know your interest by commenting on the <a href="http://spyderwebtech.wordpress.com/2008/07/22/the-webspyder-school/">SpyderSchool Post</a></p>
<p>* ************************************************************************</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2008/08/08/scraping-web-pages-with-curl-tutorial-part-1/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>

		<media:content url="http://cdn.stumble-upon.com/images/120x20_su_white.gif" medium="image" />
	</item>
		<item>
		<title>Scraping Websites With cURL</title>
		<link>http://spyderwebtech.wordpress.com/2008/08/07/scraping-websites-with-curl/</link>
		<comments>http://spyderwebtech.wordpress.com/2008/08/07/scraping-websites-with-curl/#comments</comments>
		<pubDate>Thu, 07 Aug 2008 13:50:00 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[Building Database Sites]]></category>
		<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[Web Page Scraping]]></category>
		<category><![CDATA[CURL]]></category>
		<category><![CDATA[curl tutorial]]></category>
		<category><![CDATA[curl web page scraping]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[page scrape]]></category>
		<category><![CDATA[php web spider]]></category>
		<category><![CDATA[php web spider tutorial]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[webpage scraping]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/?p=26</guid>
		<description><![CDATA[Web Page Scraping is a hot topic of discussion around the Internet as more and more people are looking to create applications that pull data in from many different data sources and websites. In my other tutorials, I talked about using PHP&#8217;s file_get_contents function to pull a web page and download the information into a [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=spyderwebtech.wordpress.com&amp;blog=2116186&amp;post=26&amp;subd=spyderwebtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Web Page Scraping is a hot topic of discussion around the Internet as more and more people are looking to create applications that pull data in from many different data sources and websites.</p>
<p>In my other tutorials, I talked about using PHP&#8217;s file_get_contents function to pull a web page and download the information into a variable string for later manipulation.  This method of pulling data off of the web is very good when you are dealing with only text and html.</p>
<p>But what if you wanted to download pictures, graphics, or video from a number of websites and store them on your server?  This is where PHP&#8217;s file_get_contents starts to fall short.</p>
<p><span style="color:#0000ff;"><strong>Introducing cURL!</strong></span></p>
<p>cURL is a command-line tool for transferring files with URL syntax, which means that we can transfer almost any type of file with it.  PHP uses the same underlying library (libcurl) through its cURL module.  Most, but not all, web servers have the cURL library module installed already, so you usually won&#8217;t have to do anything to begin using this powerful library.</p>
<p>cURL has the ability to transfer files using an extensive list of protocols, including:</p>
<ul>
<li><strong>FTP</strong></li>
<li>FTPS</li>
<li><strong>HTTP</strong></li>
<li>HTTPS</li>
<li>TFTP</li>
<li>SCP</li>
<li>SFTP</li>
<li>Telnet</li>
<li>DICT</li>
<li>FILE</li>
<li>LDAP</li>
</ul>
<p>As you can see, cURL can use not only the HTTP protocol (which is what PHP&#8217;s file_get_contents function uses), but also the FTP protocol, which can prove very useful if you want to <em>create a web spider</em> to upload files to a server automatically or FTP videos to video sharing sites.</p>
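<p>To give a feel for the FTP side, here is a rough sketch of an upload.  The function name ftpUpload, the server address, and the credentials format shown in the comments are placeholders I made up for illustration; the CURLOPT_* options themselves are standard cURL options in PHP.</p>
<pre style="padding-left:30px;"><code>&lt;?php
// Hypothetical helper: upload a local file to an FTP URL with cURL.
function ftpUpload($localPath, $ftpUrl, $userAndPassword)
{
    if (!is_file($localPath)) {
        throw new InvalidArgumentException('Local file not found: ' . $localPath);
    }

    $fh = fopen($localPath, 'rb');
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $ftpUrl);               // e.g. ftp://ftp.example.com/videos/clip.flv (placeholder)
    curl_setopt($ch, CURLOPT_USERPWD, $userAndPassword);  // in the form "username:password"
    curl_setopt($ch, CURLOPT_UPLOAD, true);               // we are sending, not receiving
    curl_setopt($ch, CURLOPT_INFILE, $fh);                // file handle to read from
    curl_setopt($ch, CURLOPT_INFILESIZE, filesize($localPath));

    $ok = curl_exec($ch);                                 // true on success, false on failure
    $error = $ok ? null : curl_error($ch);
    unset($ch);                                           // release the cURL handle
    fclose($fh);

    if ($error !== null) {
        throw new RuntimeException('FTP upload failed: ' . $error);
    }
    return true;
}
</code></pre>
<p>The function checks the local file before opening any connection, so a typo in the path fails fast instead of producing an empty upload.</p>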
<p>The good news is that cURL is so powerful that it can do almost everything you will ever need to do when it comes to web page scraping.  The downside is that cURL can be very tricky to deal with because there are a tremendous number of options to set and pitfalls to sidestep.</p>
<p>What I hope to do in this series of tutorials is show you how to work with cURL and how to create your own web scraping class in PHP so you can reuse the code time and time again.  So let&#8217;s begin&#8230;</p>
<p><strong>cURL and Your Web Server</strong></p>
<p>As I mentioned, cURL is usually already set up on your web server if you are using a hosted plan.  (On some of the &#8220;cheaper&#8221; plans cURL is disabled, so contact your administrator to see if they will enable it for you.)</p>
<p><strong><span style="color:#0000ff;">I personally do most of my web page scraping using my local web server.  That&#8217;s right, you don&#8217;t even need to pay for a hosted server to scrape web pages.  All you need is a computer and a web server like Xampp!</span></strong></p>
<p>If you are using Xampp, like I recommended in my tutorial <a href="http://spyderwebtech.wordpress.com/2007/11/14/creating-a-seo-development-enviroment/">Creating a Local Development Environment</a>, you will need to enable the cURL module in PHP.</p>
<p>To do this, go to the php.ini file in your <em><strong>Xampp/php</strong></em> folder <strong>and</strong> the <em><strong>Xampp/apache/bin</strong></em> folder and uncomment the &#8220;php_curl.dll&#8221; line by removing the semicolon.</p>
<p style="padding-left:30px;"><span style="color:#ff0000;"><code>; Windows Extensions<br />
; Note that ODBC support is built in, so no dll is needed for it.<br />
; Note that many DLL files are located in the extensions/ (PHP 4) ext/ (PHP 5)<br />
; extension folders as well as the separate PECL DLL download (PHP 5).<br />
; Be sure to appropriately set the extension_dir directive.</code></span></p>
<p style="padding-left:30px;"><span style="color:#ff0000;">;extension=php_apc.dll<br />
;extension=php_apd.dll<br />
;extension=php_bcompiler.dll<br />
;extension=php_bitset.dll<br />
;extension=php_blenc.dll<br />
;extension=php_bz2.dll<br />
;extension=php_bz2_filter.dll<br />
;extension=php_classkit.dll<br />
;extension=php_cpdf.dll<br />
;extension=php_crack.dll<br />
<span style="color:#0000ff;"><strong> extension=php_curl.dll</strong></span><br />
;extension=php_cvsclient.dll<br />
;extension=php_db.dll<br />
;extension=php_dba.dll<br />
;extension=php_dbase.dll<br />
;extension=php_dbx.dll</span></p>
<p>Save the changes and restart your web server.</p>
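<p>To confirm that the change took effect, you could run a quick check like the sketch below.  The function name curlIsAvailable is just an illustrative name; extension_loaded(), function_exists(), and curl_version() are standard PHP functions.</p>
<pre style="padding-left:30px;"><code>&lt;?php
// Hypothetical helper: report whether the cURL extension is loaded.
function curlIsAvailable()
{
    return extension_loaded('curl') &amp;&amp; function_exists('curl_init');
}

if (curlIsAvailable()) {
    $info = curl_version();   // array describing the installed libcurl
    echo "cURL is enabled, libcurl version " . $info['version'] . "\n";
} else {
    echo "cURL is NOT enabled; check php.ini and restart the web server\n";
}
</code></pre>
<p>Save it as a .php file in your web root and open it in the browser; if it reports that cURL is not enabled, double-check that you edited the php.ini file your web server is actually loading.</p>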
<p>You are now ready to start scraping the web.  In the next tutorial, I will show you how you can create your own web scraping class in PHP using cURL.</p>
<p>Next tutorial:</p>
<p><a href="http://spyderwebtech.wordpress.com/2008/08/08/scraping-web-pages-with-curl-tutorial-part-1/">Scraping Web Pages Using cURL Tutorial &#8211; Part 1</a></p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2008/08/07/scraping-websites-with-curl/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>
	</item>
		<item>
		<title>The WebSpyder School&#8230;.</title>
		<link>http://spyderwebtech.wordpress.com/2008/07/22/the-webspyder-school/</link>
		<comments>http://spyderwebtech.wordpress.com/2008/07/22/the-webspyder-school/#comments</comments>
		<pubDate>Tue, 22 Jul 2008 16:24:44 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[Blog and Pinging]]></category>
		<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[SEO Tools]]></category>
		<category><![CDATA[Web Page Scraping]]></category>
		<category><![CDATA[build web spider]]></category>
		<category><![CDATA[scraping web pages]]></category>
		<category><![CDATA[web spider]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/?p=17</guid>
		<description><![CDATA[A Course on How to Scrape the Internet&#8230;Who is Interested?? This blog has really changed since the time I started it earlier this year. I am still working on my SEO empire and it is starting to pay off, but the majority of the traffic coming to this blog are people who are interested in [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=spyderwebtech.wordpress.com&amp;blog=2116186&amp;post=17&amp;subd=spyderwebtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><strong>A Course on How to Scrape the Internet&#8230;Who is Interested??</strong></p>
<p>This blog has really changed since the time I started it earlier this year.  I am still working on my SEO empire and it is starting to pay off, but the majority of the traffic coming to this blog is from people who are interested in learning the mechanics of <em><strong><span style="color:#0000ff;">&#8220;how to scrape data off the web&#8221;</span>.</strong></em></p>
<p>In addition to this, I get emails every single week from people who need help scraping a particular site or aren&#8217;t sure how to store data in a database and retrieve it.  Some questions even go into more advanced topics like how to automate desktop applications and post the data to the web.</p>
<p>I am a person who believes that it is more important to &#8220;teach a person to fish&#8221; than to give solutions on a one-on-one basis.  That got me thinking about starting up a web-based school for people who want to become &#8220;Expert Web Automationists&#8221; (yes, I did just make that up).</p>
<p>Actually, you can make a lot of money on freelance sites on projects that require data to be scraped off the web.  The LEAST amount of money that I have made on one project was $80 when I was first starting out.  That doesn&#8217;t sound like a lot of money for a project, but consider that the computer was doing all the work and it only took me half an hour to set up the spider.  That is $160/hr!!!  And like I said, that was the LEAST amount of money that I got paid to scrape data off the internet.</p>
<p>The school that I am envisioning will be for beginners all the way to those people who already have a lot of knowledge about programming for the web.  So there will be something for everyone whether you have never written a line of code before in your life, or you are a seasoned web designer.  The intent of the school will be to give you enough knowledge so you will be an &#8220;Expert&#8221; in the field of web automation, and will feel confident taking on any job for cash.</p>
<p>Like I said, I am only thinking about the concept of this school, but I have already put together a rough &#8220;curriculum&#8221;:</p>
<ol>
<li>Web Automation Concepts</li>
<li>Web Scraping Tools &#8211; To make life easier</li>
<li>Client-Side Web Automation
<ul>
<li>JavaScript Basics</li>
<li>iMacros Basics</li>
<li>Others</li>
</ul>
</li>
<li>Server-Side Web Automation</li>
<li>Regular Expressions</li>
<li>Desktop Automation</li>
<li>Remote Objects/Yahoo PIPES/Other</li>
<li>Advanced Topics (Beating Captcha)</li>
</ol>
<p>Each one of these topics would have many hours of videos, tutorials, practice scraping exercises, and practice exams.  I will also be giving you MY CODE that I have already compiled and use on my own projects so you don&#8217;t have to create them from scratch.  It is my intent to make you BETTER than I am at the art of scraping information off the internet.</p>
<p>If I do create the course, I will limit the number of people who can enroll to 250 per year.  The reason is that I don&#8217;t want to train thousands of people who will all flock to guru.com and drive the price of scraping down.  I also don&#8217;t want thousands of copies of my code floating around the internet&#8230; after all, it took me hundreds of hours to create the code, so I only want people who are serious to have it.</p>
<p>So what is the price???  It sure as hell won&#8217;t be $30 or $100.  I haven&#8217;t narrowed it down yet, obviously, because I haven&#8217;t created the course yet.  But if you take this course, you should be able to recoup the cost with a few freelance web scraping jobs.</p>
<p><strong>So here is where I need your involvement!!</strong></p>
<p>If you are interested in this course&#8230;. <strong><span style="color:#0000ff;"><em>LEAVE A COMMENT!!!!!</em><a href="mailto:spyderwebtech.wp@gmail.com"><em></em></a></span> </strong>and let me know that you are interested.  If I don&#8217;t get enough interest then I will NOT create the course.  I don&#8217;t want to waste my time.  Tell your friends about this if you think they are interested, tell your mother, post on forums, etc., etc.</p>
<p>Also email me your thoughts on pricing, what you want to see in the course, and anything else you can think of to make the course better.  If you are proficient in web scraping already and want to be involved in creating this course, then let me know.  If you are interested in Beta (or Alpha) testing the site, let me know as well.</p>
<p><span style="color:#ff0000;"><strong>Notice!!!  If I don&#8217;t get 100 interested people to comment to me by the 1st week of October 2008 then I will scratch the idea.</strong></span></p>
<p>[Edit]  I just noticed that some spammer is stealing my traffic, so I am extending this offer until I can get my traffic back to normal [/Edit]</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2008/07/22/the-webspyder-school/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>
	</item>
		<item>
		<title>Why use WordPress MU??</title>
		<link>http://spyderwebtech.wordpress.com/2008/01/07/why-use-wordpress-mu/</link>
		<comments>http://spyderwebtech.wordpress.com/2008/01/07/why-use-wordpress-mu/#comments</comments>
		<pubDate>Mon, 07 Jan 2008 21:38:22 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[Building Database Sites]]></category>
		<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[Progress Reports]]></category>
		<category><![CDATA[SEO Concepts]]></category>
		<category><![CDATA[database sites]]></category>
		<category><![CDATA[ELI]]></category>
		<category><![CDATA[Plugins]]></category>
		<category><![CDATA[regular Wordpress install]]></category>
		<category><![CDATA[themes]]></category>
		<category><![CDATA[Wordpress MU]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/2008/01/07/why-use-wordpress-mu/</guid>
		<description><![CDATA[Hey everyone, Thanks for the great questions and for starting some great conversations. I am finding that the reply to my comments are getting longer and longer, so I figure why not just make the responses a post themselves. In this post, I would like to make a comment on using WordPress MU as a [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=spyderwebtech.wordpress.com&amp;blog=2116186&amp;post=16&amp;subd=spyderwebtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Hey everyone,</p>
<p>Thanks for the great questions and for starting some great conversations.  I am finding that my replies to comments are getting longer and longer, so I figure why not just make the responses a post themselves.</p>
<p>In this post, I would like to make a comment on using WordPress MU as a &#8220;template system&#8221; and what I have observed and maybe what you can take away from the lessons that I have learned.</p>
<p>First off, I would like to say to everyone else who is on or thinking about starting their own SEO Journey&#8230; I personally believe that there is no right or wrong way to do ANYTHING on the internet.  There are just different ways of doing things.  My main objective is to make &#8220;systems&#8221; so I can produce websites very fast and with as little effort as possible.</p>
<p>To accomplish this goal, I wanted to use WordPress MU as a type of template system so I could just use pre-existing templates and plug in my content with as few modifications as possible.<br />
<span id="more-16"></span><br />
One question arose as to why not simply use a regular install of WordPress rather than WordPress MU.  The answer to this question is simply this&#8230; Context!  The reason why I decided to use WordPress MU is because I was creating a number of product review websites in which I would have many different categories (toys, fashion, electronics, music, etc) under one domain name.</p>
<p>Where WordPress MU comes in is that MU will allow you to create multiple blogs using only one install of WordPress.  So for instance, if my main domain were <a href="http://www.reviewsite.com/">www.reviewsite.com</a>, I could have a:</p>
<ul>
<li>
<div>music.reviewsite.com &#8211; for music</div>
</li>
<li>
<div>fashion.reviewsite.com &#8211; for fashion</div>
</li>
<li>
<div>cars.reviewsite.com &#8211; for cars</div>
</li>
<li>
<div>and on and on&#8230;</div>
</li>
</ul>
<p>And like I said this can be done with only one install of WordPress MU.</p>
<p>If you have read Eli&#8217;s site you will have learned that subdomains are treated like individual websites as long as you don&#8217;t use a lot of inner linking from the main domain or do cross-linking.</p>
<p>Yes, I could have used just a regular version of WordPress and created a whole lot of subdomains from my cPanel&#8230;. but that would mean that I would have to also install WordPress for each subdomain and then set up all the plugins and themes.  With WordPress MU you can configure it to have plugins standard with each new blog that you create.</p>
<p>So can you see why I went the MU route?  One install&#8230; hundreds of &#8220;individual sites&#8221;&#8230; one domain name&#8230; preconfigured plugins&#8230; preconfigured themes&#8230;. all that is needed is the content.  I can actually create a new site (actually section of my review site) in literally less than a minute.  Not bad huh?</p>
<p>One last comment on WordPress MU&#8230; Plugins.  A comment was posted that there are more Plugins for the regular WordPress installation than for WordPress MU&#8230; That is absolutely correct!  BUT&#8230; Plugins for WordPress MU are supposed to be designed to increase the functionality of administration of users&#8217; blogs rather than displaying content on individual blogs.</p>
<p>In the WordPress MU installation you will find two WP-Content/Plugins directories (where you put your Plugins)&#8230; one is for MU and the other is for the individual blogs.  You can use ANY plugin for the individual blogs that you can with a regular WordPress installation (just make sure to put it in the right folder).  So really there are no limitations to Plugins with MU compared to the regular installation.</p>
<p>I hope this answered some questions as to why I used MU rather than the regular installation of WordPress.  Sorry that this ran on a bit&#8230; In my next post I will share some of the results that I am seeing&#8230;</p>
<p>Remember from Eli&#8217;s blog&#8230;. that a new site will take around 8 months to show its true potential&#8230; so until then keep on building and never look back.</p>
<p>Eli you rock&#8230;. and thanks guys for the great questions&#8230; keep them coming!</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2008/01/07/why-use-wordpress-mu/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>
	</item>
		<item>
		<title>SEO Art of WAR &#8211; 3 Ways to Spy on Your Competitors</title>
		<link>http://spyderwebtech.wordpress.com/2007/12/07/seo-art-of-war-3-ways-to-spy-on-your-competitors/</link>
		<comments>http://spyderwebtech.wordpress.com/2007/12/07/seo-art-of-war-3-ways-to-spy-on-your-competitors/#comments</comments>
		<pubDate>Fri, 07 Dec 2007 16:20:47 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[Link Building]]></category>
		<category><![CDATA[SEO Concepts]]></category>
		<category><![CDATA[content is king]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[making content]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[seo]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/2007/12/07/seo-art-of-war-3-ways-to-spy-on-your-competitors/</guid>
		<description><![CDATA[Let&#8217;s face it, to succeed in the world of online marketing you need to know exactly what your competitors are up to otherwise you could find yourself falling down Google&#8217;s search results faster than Senator Larry Craig&#8217;s pants in an airport&#8217;s bathroom. SUN TZU wrote the ART OF WAR during the 6th century BC as [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s face it: to succeed in the world of online marketing you need to know exactly what your competitors are up to, otherwise you could find yourself falling down Google&#8217;s search results faster than Senator Larry Craig&#8217;s pants in an airport bathroom.</p>
<p>SUN TZU wrote the ART OF WAR during the 6th century BC as a strategic guide to outwit, out-think, and overcome the enemy.  One of his tactics was to use the enemy&#8217;s strength against them.  These tactics are just as true online today as they were on the battlefield some 2,500 years ago.</p>
<p>Here are three ways that you can use SUN TZU&#8217;s tactics to spy on your competitors to see what they are up to and possibly increase your revenue in the process.</p>
<ol>
<li><strong>Content Momentum</strong> &#8211; In the world of SEO there is a saying that <em>Content is King.  </em>If your competitors are constantly adding content to their websites, they are encouraging search engine spiders to crawl their sites more frequently.  When this happens, more pages get indexed, and the search engines may come to view their sites as &#8220;more authoritative&#8221; than yours.  This is what I refer to as content momentum.</li>
<p align="left"><em>Art of War Tactic:</em>  Keep track of how many new pages are being indexed by the search engines per time period (day, week, or month, depending on your niche).  If you see your competitors adding 10 pages of high-quality content every month, you had better make plans to match their efforts or you may find yourself playing second fiddle.  Use the search term &#8220;site:www.yourcompetitorswebsite.com&#8221; in Google to find out how many web pages are indexed.</p>
<p><span id="more-14"></span></p>
<li><strong>Link Momentum</strong> &#8211; Link momentum is just as important as content momentum.  The number and quality of inbound links is a huge factor in placing high in the search engines.  So why not find out how many links your competitors are adding?</li>
<p><em>Art of War Tactic:</em>  Use the search term &#8220;link:www.competitorwebsite.com&#8221; to determine how many links are being added to your competitors&#8217; sites per time period. (Yahoo and MSN have similar ways of finding out how many links each site has.)  If you see that your competitors are constantly adding incoming links to their sites, then you can bet they are after your position in the search engines.  A way to use all of your &#8220;enemy&#8217;s&#8221; hard work against her is to find out who links to her site and get them to link to yours.</p>
<li><strong>Use Your Enemy&#8217;s Keywords</strong> &#8211;  Keywords are extremely important in your meta tags, headlines, the body of your content, and the anchor text of links pointing to your site. Is your site optimized for all the keywords in your niche?  Why not see what keywords your competitors are targeting and get a bigger piece of the pie?</li>
<p><em>Art of War Tactic:</em>  View the HTML code of your competitors&#8217; websites to see what keywords they are using in their keyword meta tags, heading tags (&lt;h1&gt;, &lt;h2&gt;, and so on), links, and meta description.  Using this information, strengthen your site by adding new content or by modifying your current content to incorporate some of the keyword phrases.</p>
</ol>
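<p>The keyword check in the third tactic can be roughed out in PHP.  This is only a minimal sketch: the function name extract_keyword_signals() and the sample HTML are made up for illustration, and it assumes PHP&#8217;s DOM extension is available.  It reads the keywords meta tag, the description meta tag, and the first three heading levels from a page&#8217;s HTML.</p>

```php
<?php
// Sketch: pull the keyword-bearing elements out of a page's HTML.
// extract_keyword_signals() is a hypothetical helper, not part of PHP.
function extract_keyword_signals(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $signals = ['keywords' => [], 'description' => '', 'headings' => []];

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        $content = trim($meta->getAttribute('content'));
        if ($name === 'keywords') {
            // Split the comma-separated list and drop empty entries.
            $parts = array_map('trim', explode(',', $content));
            $signals['keywords'] = array_values(array_filter($parts, 'strlen'));
        } elseif ($name === 'description') {
            $signals['description'] = $content;
        }
    }

    // Headings are collected level by level: all h1 first, then h2, then h3.
    foreach (['h1', 'h2', 'h3'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $heading) {
            $signals['headings'][] = trim($heading->textContent);
        }
    }

    return $signals;
}

// Made-up sample standing in for a competitor's page.
$html = '<html><head><meta name="keywords" content="blue widgets, cheap widgets">'
      . '<meta name="description" content="The best blue widgets."></head>'
      . '<body><h1>Blue Widgets</h1><h2>Why cheap widgets win</h2></body></html>';

print_r(extract_keyword_signals($html));
```

<p>Run it against each competitor&#8217;s page and compare the keyword lists with your own; phrases that keep showing up in their headings are good candidates for your next piece of content.</p>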
<p>Use these tactics to track what your competitors are doing, and use that knowledge against them by doing it bigger and better.  Does this sound unfair?  You bet it is, but after all this is WAR!</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2007/12/07/seo-art-of-war-3-ways-to-spy-on-your-competitors/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>
	</item>
		<item>
		<title>Building A Web Spider &#8211; Part 2</title>
		<link>http://spyderwebtech.wordpress.com/2007/12/05/building-a-web-spider-part-2/</link>
		<comments>http://spyderwebtech.wordpress.com/2007/12/05/building-a-web-spider-part-2/#comments</comments>
		<pubDate>Wed, 05 Dec 2007 12:08:49 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[Automation]]></category>
		<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[SEO Tools]]></category>
		<category><![CDATA[Web Page Scraping]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[preg match]]></category>
		<category><![CDATA[regular expressions]]></category>
		<category><![CDATA[web crawler]]></category>
		<category><![CDATA[web page download]]></category>
		<category><![CDATA[web site scraping]]></category>
		<category><![CDATA[web spider]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/2007/12/05/building-a-web-spider-part-2/</guid>
		<description><![CDATA[In my last post, Building A Web Spider &#8211; Part 1, I covered how to analyze a website so that you can build a very simple web spider to collect the data that you want, and then I showed you how to use PHP to download that page to your server. In this post I [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, <a href="http://spyderwebtech.wordpress.com/2007/12/01/building-a-web-spider-part-1/">Building A Web Spider &#8211; Part 1</a>, I covered how to analyze  a website so that you can build  a very simple web spider to collect the data that you want, and then I showed you how to use PHP to download that page to your server.  In this post I am going to show you how to extract the information that you need and leave the rest behind.</p>
<p>When you download a web page using PHP&#8217;s file_get_contents() function, the resulting web page is downloaded and put into a variable as a string.  What this means is that all the HTML from that web page is now located in a variable which you can access and manipulate (for instance a variable called &#8220;$page&#8221;).  The variable will contain not only all of the text that you can see on the web page, but all of the HTML tags as well (&lt;p&gt;, &lt;br&gt;, &lt;img&gt;, &lt;a&gt;, etc.).  We will use these tags to help us find the data that we need.</p>
<p>Remember in <a href="http://spyderwebtech.wordpress.com/2007/12/01/building-a-web-spider-part-1/">Part 1</a>, the purpose of creating a web spider was to scrape the current stock price of Microsoft (MSFT) from finance.yahoo.com.  If we look at the <a href="http://finance.yahoo.com/q?s=msft">results page for Microsoft</a> we will notice some unique characteristics of the current stock price compared to all the other prices listed on the page.  The current price of the stock is both <strong>bolded</strong> and <strong><span>bigger</span></strong> than all the other prices on the page.  When we look at the HTML for the stock price, we see these HTML tags surrounding it.</p>
<p>&lt;big&gt;&lt;b&gt;&lt;span id=&quot;yfs_l10_msft&quot;&gt;32.77&lt;/span&gt;&lt;/b&gt;&lt;/big&gt;</p>
<p>We also notice a &lt;span&gt; tag in the HTML which we must deal with.  Remember that all we want is the stock price of MSFT and nothing else, which here is the 32.77 (it will obviously change when you look at the page).  Once again there are a lot of ways to get this value, but the way I am going to show you is by using <strong>regular expressions</strong>.</p>
<p>If you have heard about regular expressions you are probably petrified right now, but really they are pretty straightforward and simple to use if you understand what you are trying to scrape from the web page.  What a regular expression does is provide a pattern for a computer program to check for; every piece of text that fits the pattern is recorded as a match.  In our example we want to check for a price pattern.<span id="more-13"></span></p>
<p>In this example we want to scrape a number off of the page.  The number has two digits before a &#8220;.&#8221; and two digits after.  So we want to write an expression to match any number in this form.  To do that we will use the expression &#8220;\d&#8221;.</p>
<p>Expression : \d  &#8212; will match any <em><strong>single numeric digit</strong></em></p>
<p>So to match two digits we could use \d\d, and three digits \d\d\d, and so on and so on.  But there is an easier way.  We can match any number of digits by using &#8220;\d+&#8221; which will match one or more digits.</p>
<p>So to match our stock price on the page we could write a regular expression pattern of:</p>
<p>\d+.\d+</p>
<p>This would match one or more digits before a period and then one or more after the period.  The pattern will work, except that in a regular expression the &#8220;.&#8221; has a special meaning.</p>
<p>Expression:   .    &#8212; will match any digit, letter, symbol, space&#8230; <strong>well, anything</strong> (except a line break, by default)!</p>
<p>If we actually wanted to match a period(&#8220;.&#8221;) we would escape the period with a backslash like this</p>
<p>\d+<strong>\.</strong>\d+     &#8212; This expression will match any stock price on the page!!!</p>
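<p>To see the difference the backslash makes, here is a small sketch that runs both patterns over the same text with preg_match_all().  The sample string is made up for illustration; it contains one real price and one look-alike with a letter where the period should be.</p>

```php
<?php
// Made-up sample text: one real price and one look-alike without a period.
$text = 'Last trade 32.77 on volume 12x345';

// Unescaped "." matches any character, so "12x345" also qualifies.
preg_match_all('/\d+.\d+/', $text, $loose);

// Escaped "\." matches only a literal period.
preg_match_all('/\d+\.\d+/', $text, $strict);

print_r($loose[0]);   // both "32.77" and "12x345"
print_r($strict[0]);  // only "32.77"
```

<p>The unescaped pattern picks up the look-alike as well, while the escaped one returns only the real price.</p>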
<p>But remember we don&#8217;t want just any stock price, we want the current stock price.  That is where the HTML tags come in.  Let&#8217;s simply add the other tags into our expression.</p>
<p>Our original HTML:</p>
<p>&lt;big&gt;&lt;b&gt;&lt;span id=&quot;yfs_l10_msft&quot;&gt;32.77&lt;/span&gt;&lt;/b&gt;&lt;/big&gt;</p>
<p>First we need to deal with the &lt;span&gt; tag.  I notice that the id contains the MSFT stock symbol, which I am sure will change with each stock that I look up.  So I will write an expression that looks like:</p>
<p>&lt;span id=&quot;.+&quot;&gt;\d+\.\d+&lt;/span&gt;</p>
<p>As you can see, I simply added the span tag to my expression, but I use the &#8220;.+&#8221; expression to match anything between the quotes because I know that it will change.  Now all I have to add is the &lt;b&gt; and &lt;big&gt; tags to my expression and I am ready to search for my pattern in the page that I downloaded. My final expression is:</p>
<p>&lt;big&gt;&lt;b&gt;&lt;span id=&quot;.+&quot;&gt;\d+\.\d+&lt;/span&gt;&lt;/b&gt;&lt;/big&gt;</p>
<p>Now to use this expression in PHP I will use a function called preg_match_all().  preg_match_all() finds every match of the regular expression in my string and writes the matches to an array.  To use my expression in PHP, however, I have to add a &#8220;/&#8221; to the beginning and end of my regular expression, and I need to escape all the forward slashes inside it (for instance in the &lt;/span&gt; tag) with a backslash (&#8220;\&#8221;).</p>
<p>We also want to extract the stock price.  To do that we put &#8220;( )&#8221; around the portion of the expression that we want to extract.  Notice the changes in the code below.</p>
<p>$page = file_get_contents($url);   // this is the code to download the page from the last post</p>
<p>$pattern = '/&lt;big&gt;&lt;b&gt;&lt;span id=&quot;.+&quot;&gt;(\d+\.\d+)&lt;\/span&gt;&lt;\/b&gt;&lt;\/big&gt;/';</p>
<p>preg_match_all($pattern, $page, $results, PREG_PATTERN_ORDER);</p>
<p>// preg_match_all() checks the web page against the pattern and puts the results in the $results array</p>
<p>With PREG_PATTERN_ORDER, $results[0] holds the full matches and $results[1] holds whatever the first set of parentheses captured.  So the stock price is the first entry of $results[1].  In PHP it looks like this.</p>
<p>$stock_price = $results[1][0];</p>
<p>The stock price is now in a variable for you to print on the page, store in a database, or do whatever you want with.</p>
<p>Now that you have your spider, you can feed it 1000s of stock symbols and it will download all of the current prices for you.  This spider may be very simple, but you can use the same thought process to gather links to your page, scrape data, or anything else you can think of.</p>
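<p>Here is the whole extraction step as one small, self-contained sketch.  A few things in it are my own choices rather than part of the walkthrough: the helper name extract_price() is made up, it uses preg_match() because we only need the first hit, it uses &#8220;#&#8221; as the delimiter so the slashes in the closing tags need no escaping, and it swaps &#8220;.+&#8221; for the stricter [^&quot;]+ so the id match cannot run past its closing quote.  The sample HTML stands in for the downloaded page.</p>

```php
<?php
// Return the first price wrapped in the <big><b><span id="..."> markup, or null.
// extract_price() is a hypothetical helper name used for this sketch.
function extract_price(string $page): ?string
{
    // "#" is the delimiter here, so "/" in the closing tags needs no backslash.
    // [^"]+ matches the id value without running past the closing quote.
    $pattern = '#<big><b><span id="[^"]+">(\d+\.\d+)</span></b></big>#';

    if (preg_match($pattern, $page, $match) === 1) {
        // $match[0] is the whole tag run, $match[1] is the captured price.
        return $match[1];
    }
    return null;
}

// Made-up sample standing in for the downloaded page.
$page = '<td>Prev close 32.50</td>'
      . '<big><b><span id="yfs_l10_msft">32.77</span></b></big>';

var_dump(extract_price($page));
```

<p>The other price on the sample page is ignored because it is not wrapped in the &lt;big&gt;&lt;b&gt;&lt;span&gt; tags, which is exactly why we put the tags in the pattern.</p>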
<p>Happy Scraping <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><span style="text-decoration:underline;">Other Tutorials:</span></p>
<p><a href="http://spyderwebtech.wordpress.com/2008/08/08/scraping-web-pages-with-curl-tutorial-part-1/">Scraping Web Pages Using cURL Tutorial &#8211; Part 1</a></p>
<p>**************************************************************************</p>
<p>* Looking for a comprehensive course on Web Page Scraping?</p>
<p>* Let me know your interest by commenting on the <a href="http://spyderwebtech.wordpress.com/2008/07/22/the-webspyder-school/">SpyderSchool Post</a></p>
<p>* ************************************************************************</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2007/12/05/building-a-web-spider-part-2/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>

		<media:content url="http://cdn.stumble-upon.com/images/120x20_su_white.gif" medium="image" />
	</item>
		<item>
		<title>Monday&#8217;s Report &#8211; My Craigslist Weekend</title>
		<link>http://spyderwebtech.wordpress.com/2007/12/04/mondays-report-my-craigslist-weekend/</link>
		<comments>http://spyderwebtech.wordpress.com/2007/12/04/mondays-report-my-craigslist-weekend/#comments</comments>
		<pubDate>Tue, 04 Dec 2007 01:04:50 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[Activity Log]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[Progress Reports]]></category>
		<category><![CDATA[craigslist posting script]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[SERP]]></category>
		<category><![CDATA[SMARTY]]></category>
		<category><![CDATA[template]]></category>
		<category><![CDATA[web crawler]]></category>
		<category><![CDATA[Wordpress MU]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/2007/12/04/mondays-report-my-craigslist-weekend/</guid>
		<description><![CDATA[A new week&#8230; making money and going Mad! Success!  I scored my first click through adsense this week. Pulling down a decent day&#8217;s earnings of $.09 makes this all worth the while and I envision what will happen when I have 100s of websites all pulling in cash.. all the wonders of scale-ability! This week [...]]]></description>
			<content:encoded><![CDATA[<p>A new week&#8230; making money and going Mad!</p>
<p>Success!  I scored my first click through AdSense this week. Pulling down a decent day&#8217;s earnings of $.09 makes this all worth the while, and I envision what will happen when I have 100s of websites all pulling in cash&#8230; all the wonders of scalability!</p>
<p>This week has been pretty challenging with writing more web spider code to automate my SEO journey. My target this week is automating craigslist posts (as much as possible anyway).  I must say that I am impressed with the craigslist staff and how much effort they put into making it a complete bitch to automate craigslist postings.  Well, I am happy to say that I have fought the battle and won&#8230; The downside is that it took me much of Saturday to iron out all the bugs in my craigslist posting script.</p>
<p>My results are starting to come through now in the SERPs.  My oldest database website has 32 pages indexed with no more than a few posts to some forums and the url submissions to the major search engines.  Yahoo&#8217;s spider visits my site nearly every day and crawls about 60 pages, while Google&#8217;s bot visits every other day and crawls only about 6 pages.  The odd thing is that in Google I have 32 pages indexed while in Yahoo I have only 2&#8230; very strange.  I have been looking around the forums but I haven&#8217;t found a reason.</p>
<p>So what&#8217;s next for this week?  I am going to be putting up more sites using WordPress MU as my template engine.  Is MU better than SMARTY?&#8230;. Now it is.  I had to do a lot of research to get everything humming and strumming along and figure out how the whole engine worked.  For beginners who aren&#8217;t ready for a lot of programming, I would highly recommend SMARTY.  Just make a very generic template and bust out a whole lot of sites.</p>
<p>Well that&#8217;s it for this post. I am going to try to bust out the second post on creating your own web spider.  I can tell from my stats on this blog that this is a popular subject&#8230; so until then.  Happy SEOing</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2007/12/04/mondays-report-my-craigslist-weekend/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>
	</item>
		<item>
		<title>Building A Web Spider &#8211; Part 1</title>
		<link>http://spyderwebtech.wordpress.com/2007/12/01/building-a-web-spider-part-1/</link>
		<comments>http://spyderwebtech.wordpress.com/2007/12/01/building-a-web-spider-part-1/#comments</comments>
		<pubDate>Sat, 01 Dec 2007 01:30:48 +0000</pubDate>
		<dc:creator>spyderwebtech</dc:creator>
				<category><![CDATA[General Discussion]]></category>
		<category><![CDATA[build web spider]]></category>
		<category><![CDATA[database sites]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[scraping web pages]]></category>
		<category><![CDATA[SEO SMARTY]]></category>
		<category><![CDATA[web spider]]></category>

		<guid isPermaLink="false">http://spyderwebtech.wordpress.com/2007/12/01/building-a-web-spider-part-1/</guid>
		<description><![CDATA[Whether you want to gather data for a database web site, or determine which links are broken in your site a web spider is a great tool for getting repetitive tasks done. Scraping web pages from the internet is a great way to gather content and ideas for web sites that you want to build. [...]]]></description>
			<content:encoded><![CDATA[<p>Whether you want to gather data for a database web site or determine which links are broken in your site, a web spider is a great tool for getting repetitive tasks done.  Scraping web pages from the internet is a great way to gather content and ideas for web sites that you want to build.  You can build a web spider to check the current prices of your favorite stocks, or even to get the balance of your bank account.</p>
<p>I use web spiders all the time to build databases for my database sites&#8230; I have even created them to get my email for me and respond if certain conditions are met.  The secret to understanding how to build a great web spider is to first understand the environment that it must work in.</p>
<p>What do I mean by this?  Well, first you must understand that a web spider is simply a computer program.  And computer programs are based upon rules (wait, didn&#8217;t I hear this in <strong>THE MATRIX</strong>?), and these programs will blindly follow these rules until we tell them to stop.  So for our internet spider to complete the task that we ask of it, we first must create the rules that it will follow.  And of course for us to create the rules we must first understand the environment.</p>
<p>To illustrate this point, let&#8217;s say that we want to create a web spider to download our favorite stock quote from Yahoo (substitute MSN, or Etrade if you like).  Let&#8217;s write some pseudo-code for the way we would do this manually:<span id="more-11"></span></p>
<ol>
<li>Go to the main page of Yahoo Finance</li>
<li>Fill in the form on the front page with our stock symbol &#8211; MSFT</li>
<li>Hit Enter</li>
<li>Look for the stock price, which is in a large font and red</li>
</ol>
<p>Now let&#8217;s see what kinds of questions may arise when we start creating a web spider from this simple process.</p>
<ol>
<li>Can I bypass the form by entering a dynamic url (say finance.yahoo.com/stockquote.asp?stocksym=MSFT)?</li>
<li>If I have to use the form, what are the form variables that I need to send and how (GET vs POST)?</li>
<li>Do I need to store cookies or session variables?</li>
<li>How do I get the stock price from the web page after downloading?</li>
<li>The list goes on and on.</li>
</ol>
<p>You notice that this simple example of getting a stock quote has some questions that need to be answered before creating the rules of your web spider. And the best way to get these answers is by going directly to the source&#8230; the web page you are trying to scrape.  So let&#8217;s get started!</p>
<p>To really figure out how a web site works we are going to need to look at the HTML code as well as the urls that appear in your navigation bar of your browser.  So that is where we are going to start.</p>
<p>Navigate to Yahoo finance by going <a href="http://finance.yahoo.com/" target="_blank">HERE </a></p>
<p>You will notice the base url is finance.yahoo.com; you can find this in the address bar at the top of the browser.  Next fill in the form with the stock that you want to look up (MSFT) and hit the Enter key.  You will notice now that the url has changed to:</p>
<p>http://finance.yahoo.com/q?s=msft</p>
<p>Apparently the portion of the url that selects the stock to be viewed is the &#8220;q?s=&#8221; portion.  Therefore, I can look up any stock that I choose by putting a different symbol after &#8220;q?s=&#8221;.  In this case it is very simple to create a spider that looks up a lot of stocks by feeding different stock symbols into this url and then downloading the page.</p>
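<p>Feeding symbols into the url can be sketched like this.  The helper name quote_url() and the list of symbols are made up for illustration, and the url format is simply the one observed in the address bar above, so treat it as an assumption that may not hold if the site changes.</p>

```php
<?php
// Build the quote url for a ticker symbol, following the q?s= pattern seen above.
// quote_url() is a hypothetical helper name used for this sketch.
function quote_url(string $symbol): string
{
    // http_build_query() url-encodes the value, so symbols such as "^DJI" stay valid.
    return 'http://finance.yahoo.com/q?' . http_build_query(['s' => strtolower($symbol)]);
}

// Made-up watch list; each url printed here is what the spider would download.
foreach (['MSFT', 'GOOG', 'YHOO'] as $symbol) {
    echo quote_url($symbol), "\n";
}
```

<p>Keeping the url building in one small function means only one place has to change if the query format ever does.</p>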
<p>So how do you download a page from the internet?  Well PHP has a lot of really cool ways of doing this.  I am only going to show you one quick way now.  That is to use PHP&#8217;s file_get_contents() function.</p>
<p>Let&#8217;s say we want to get Microsoft&#8217;s stock page.  Use the following code:</p>
<p>&lt;?php</p>
<p>$url = &quot;http://finance.yahoo.com/q?s=MSFT&quot;;   // This is the url for Microsoft</p>
<p>// The next line will download the HTML and put it into a variable called $page</p>
<p>$page = file_get_contents($url);</p>
<p>echo $page;   // will print the html onto your page</p>
<p>?&gt;</p>
<p>The file_get_contents() function can only fetch urls when PHP&#8217;s allow_url_fopen setting is turned on, and some servers disable it.  HostGator.com is where I have my hosting accounts and I know that it works there.  I used to use Godaddy.com and I know that their cheaper shared hosting won&#8217;t allow file_get_contents() to fetch urls.</p>
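<p>If you are not sure what your host allows, a download helper can check first and fall back to PHP&#8217;s cURL extension.  This is a minimal sketch, not the only way to do it: the function name fetch_page() is made up, and it simply returns null when neither route works.</p>

```php
<?php
// Download a url, preferring file_get_contents() and falling back to cURL.
// fetch_page() is a hypothetical helper; it returns null if nothing works.
function fetch_page(string $url): ?string
{
    // allow_url_fopen must be on for file_get_contents() to open urls.
    if (ini_get('allow_url_fopen')) {
        $page = @file_get_contents($url); // "@" hides the warning on failure
        return $page === false ? null : $page;
    }

    // Otherwise try the cURL extension, if it is installed.
    if (function_exists('curl_init')) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        $page = curl_exec($handle);
        return $page === false ? null : $page;
    }

    return null;
}

$page = fetch_page('http://finance.yahoo.com/q?s=MSFT');
echo $page === null ? "Download failed\n" : $page;
```

<p>Checking for null before using the result keeps the spider from trying to scrape a page that never arrived.</p>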
<p>In my next post, I am going to show you how to extract the values that you want from the page that you just downloaded. So until then play around with the above code by changing the value of the stock symbol.</p>
]]></content:encoded>
			<wfw:commentRss>http://spyderwebtech.wordpress.com/2007/12/01/building-a-web-spider-part-1/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/03e0b7ee648041024c711ade3fc2d1a9?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">spyderwebtech</media:title>
		</media:content>

		<media:content url="http://cdn.stumble-upon.com/images/120x20_su_white.gif" medium="image" />
	</item>
	</channel>
</rss>
