<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Creating Quick Scrapers Using Xpather, Snoopy &amp; PHP</title>
	<atom:link href="http://flintston.es/nerd-stuff/creating-quick-scrapers-using-xpather-snoopy-php/feed/" rel="self" type="application/rss+xml" />
	<link>http://flintston.es/nerd-stuff/creating-quick-scrapers-using-xpather-snoopy-php/</link>
	<description>Affiliate Multivitamins</description>
	<lastBuildDate>Thu, 17 Nov 2011 08:03:11 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Gary B</title>
		<link>http://flintston.es/nerd-stuff/creating-quick-scrapers-using-xpather-snoopy-php/comment-page-1/#comment-524</link>
		<dc:creator>Gary B</dc:creator>
		<pubDate>Fri, 30 Jul 2010 23:38:43 +0000</pubDate>
		<guid isPermaLink="false">http://flintston.es/?p=275#comment-524</guid>
		<description>I&#039;ve been using methods like this for a long time. They&#039;re a great time saver. One suggestion: some websites out there have excruciatingly bad HTML that can make the DOM parser throw up its hands. To prevent that, I use the tidy functions to repair the HTML before I pass it to the DOM.

I don&#039;t know how this will display, but here goes:

  $tidy_config = array(
    &#039;clean&#039; =&gt; true,
    &#039;output-xhtml&#039; =&gt; true,
    &#039;show-body-only&#039; =&gt; true,
    &#039;wrap&#039; =&gt; 0,
    );
  $tidy = new tidy();
  $file = $tidy-&gt;repairString($pagehtml, $tidy_config, &#039;latin1&#039;);

I then usually run a preg_replace on the result to strip out non-ASCII characters, depending on the purpose.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been using methods like this for a long time. They&#8217;re a great time saver. One suggestion: some websites out there have excruciatingly bad HTML that can make the DOM parser throw up its hands. To prevent that, I use the tidy functions to repair the HTML before I pass it to the DOM.</p>
<p>I don&#8217;t know how this will display, but here goes:</p>
<p>  $tidy_config = array(<br />
    &#039;clean&#039; =&gt; true,<br />
    &#039;output-xhtml&#039; =&gt; true,<br />
    &#039;show-body-only&#039; =&gt; true,<br />
    &#039;wrap&#039; =&gt; 0,<br />
    );<br />
  $tidy = new tidy();<br />
  $file = $tidy-&gt;repairString($pagehtml, $tidy_config, &#039;latin1&#039;);</p>
<p>I then usually run a preg_replace on the result to strip out non-ASCII characters, depending on the purpose.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bamm</title>
		<link>http://flintston.es/nerd-stuff/creating-quick-scrapers-using-xpather-snoopy-php/comment-page-1/#comment-450</link>
		<dc:creator>Bamm</dc:creator>
		<pubDate>Wed, 26 May 2010 03:28:47 +0000</pubDate>
		<guid isPermaLink="false">http://flintston.es/?p=275#comment-450</guid>
		<description>no problem klaas</description>
		<content:encoded><![CDATA[<p>no problem klaas</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: klaas</title>
		<link>http://flintston.es/nerd-stuff/creating-quick-scrapers-using-xpather-snoopy-php/comment-page-1/#comment-449</link>
		<dc:creator>klaas</dc:creator>
		<pubDate>Wed, 26 May 2010 03:24:41 +0000</pubDate>
		<guid isPermaLink="false">http://flintston.es/?p=275#comment-449</guid>
		<description>Thanks for the tbody tip!</description>
		<content:encoded><![CDATA[<p>Thanks for the tbody tip!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

