WeberDev.com PHP and MySQL Code

Title : YellowPages Content Grabber (PHP5 +)
Categories : PHP, PHP Classes, Regexps, Databases, MySQL
Joseph Crawford
Date : Dec 07th 2005
Grade : 4 of 5 (graded 6 times)
Viewed : 7673
File : 4273.zip

OK, so last time I showed you how to reach out to 411.com and get the address, etc. for a list of phone numbers. I have since come up with a more automated way to accomplish this.

PLEASE NOTE: This code will not work out of the box. You need to alter it to use your own database abstraction layer, and to drop the URL helper class, which is not included. Everything except the database calls should work fine, however.
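The scripts in this example only use a small part of the database layer: $db->Execute() returns a result object with NumRows(), FirstRow(), EOF(), NextRow() and a $fields property holding the current row. As a rough sketch (PHP 7.4 or later), here is a hypothetical array-backed result class with those names. The names are inferred from how the scripts call them, not taken from the author's Database.php; you could wrap your own query results in something like it.

```php
<?php
// Hypothetical result-set class. Method names and the $fields property are
// inferred from how fetchcategories.php / fetchrecords.php use $rs and $res.
class ArrayRecordSet
{
    /** @var array<int, array<string, mixed>> all rows of the result */
    private array $rows;
    private int $pos = 0;

    /** The current row, read by the scripts as $rs->fields. */
    public array $fields = [];

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows);
        $this->FirstRow();
    }

    public function NumRows(): int
    {
        return count($this->rows);
    }

    // Rewind to the first row.
    public function FirstRow(): void
    {
        $this->pos = 0;
        $this->fields = $this->rows[0] ?? [];
    }

    // True once the cursor has moved past the last row.
    public function EOF(): bool
    {
        return $this->pos >= count($this->rows);
    }

    // Advance the cursor and load the next row into $fields.
    public function NextRow(): void
    {
        $this->pos++;
        $this->fields = $this->rows[$this->pos] ?? [];
    }
}
```

Iterating it looks exactly like the while (!$rs->EOF()) loop in fetchrecords.php.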

First I needed to grab all the category IDs from this page:
http://yellowpages.superpages.com/topcats.jsp?LTTR=A
There is a page for every letter of the alphabet and even one for #. I did not worry about the # page, though I could easily have included it.
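A minimal sketch of the per-letter URLs, assuming LTTR is the only parameter that changes between pages; categoryPageUrls() is a hypothetical name. If you do want the # page, the character must be URL-encoded as %23, because an unencoded # starts the URL fragment and is never sent to the server. Whether the site answers LTTR=%23 is an assumption.

```php
<?php
// Hypothetical helper: one topcats.jsp URL per letter A-Z, plus "#" on request.
function categoryPageUrls(bool $includeHash = false): array
{
    $base = 'http://yellowpages.superpages.com/topcats.jsp?LTTR=';
    $letters = range('A', 'Z');
    if ($includeHash) {
        $letters[] = '#';
    }
    $urls = [];
    foreach ($letters as $letter) {
        // rawurlencode() leaves A-Z untouched and turns "#" into "%23".
        $urls[$letter] = $base . rawurlencode($letter);
    }
    return $urls;
}
```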

You will need to create a database table for the categories:

CREATE TABLE `yp_categories` (
  `id` int(11) NOT NULL auto_increment,
  `catid` varchar(15) NOT NULL default '',
  `letter` char(1) NOT NULL default '',
  `fetched` int(1) NOT NULL default '0',
  `timestamp` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;


I created a class called YellowPages and will show it in a bit. For now, here's the page I call to fetch the categories (fetchcategories.php).

<?php
include_once('YellowPages.php');
include_once('Database.php'); // expected to set up $db, your database abstraction layer

ini_set('max_execution_time', 180);

$yp = new YellowPages();
$count = 0;

// Loop over the letters A-Z; each letter has its own category page.
for ($x = ord('A'); $x <= ord('Z'); $x++) {
    $letter = chr($x);
    $cats = $yp->FetchCategories($letter);

    foreach ($cats as $cat) {
        // Category IDs contain digits only (CATEGORY_REGEX captures [0-9]).
        $sql = "SELECT id FROM yp_categories WHERE catid='" . $cat . "'";
        $res = $db->Execute($sql);

        // Insert the category only if it is not stored yet.
        if ($res->NumRows() == 0) {
            $count++;
            $sql = "INSERT INTO yp_categories (catid, letter)
                    VALUES('" . (string)$cat . "', '" . $letter . "')";
            $db->Execute($sql);
        }
    }
}

echo 'Added ' . $count . ' New Categories.';
?>


Basically, this script loops from A to Z and calls the FetchCategories method with the current letter. Now let's see the code for FetchCategories.
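fetchcategories.php runs one SELECT per category ID to avoid duplicates. One possible alternative, sketched below on the assumption that you first load all existing catid values with a single query, is to do the duplicate check in memory. newCategoryRows() is a hypothetical helper, not part of the YellowPages class.

```php
<?php
// Hypothetical helper: $catsByLetter maps a letter to the IDs FetchCategories()
// returned for it; $knownIds are the catids already in yp_categories.
// Returns the (catid, letter) rows that still need inserting.
function newCategoryRows(array $catsByLetter, array $knownIds): array
{
    // Using IDs as array keys gives constant-time "already seen?" checks.
    $seen = array_fill_keys($knownIds, true);
    $rows = [];
    foreach ($catsByLetter as $letter => $ids) {
        foreach ($ids as $id) {
            $id = (string) $id;
            if (isset($seen[$id])) {
                continue; // stored earlier, or already queued in this run
            }
            $seen[$id] = true;
            $rows[] = ['catid' => $id, 'letter' => $letter];
        }
    }
    return $rows;
}
```

Each returned row maps directly onto the INSERT INTO yp_categories (catid, letter) statement in the script.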

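<?php
// Method of the YellowPages class.
public function FetchCategories($letter) {
    // URL is the author's own helper class and is not included in the download;
    // passing the plain string to $this->Execute() works just as well.
    $url = new URL('http://yellowpages.superpages.com/topcats.jsp?LTTR=' . strtoupper($letter));
    $this->Execute($url->__toString());

    // Collect every category ID on the page; $cats[1] holds the captured IDs.
    preg_match_all(self::CATEGORY_REGEX, $this->source, $cats);
    return $cats[1];
}
?>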


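Don't worry about the $url = new URL() line. URL is a helper class of mine that I do not show in this example (it is not in the zip either), so you can pass the URL string straight to Execute() instead.
The method passes the URL to the Execute method, then runs CATEGORY_REGEX over the page source stored in the $this->source property and returns the captured category IDs.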

<?php
// Class constant inside YellowPages: captures the numeric ID between "&CID=" and the next "&".
const CATEGORY_REGEX = '/&CID=([0-9]+?)&/is';
?>


That's the regex used in this method. Now let's take a look at the Execute method.
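To see what the pattern captures, here it is applied to a made-up snippet of category links (the sample HTML in the usage note is invented, not copied from the site). The second, hypothetical pattern also tolerates links whose ampersands are written as the HTML entity &amp;, which the original pattern skips; whether the live page uses that form is not known.

```php
<?php
// The pattern used by FetchCategories().
const CATEGORY_REGEX = '/&CID=([0-9]+?)&/is';

// Hypothetical variant: also matches "&amp;CID=...&amp;" in entity-escaped links.
const CATEGORY_REGEX_ENTITIES = '/&(?:amp;)?CID=([0-9]+)&/i';

// Returns the captured category IDs (group 1 of every match).
function extractCategoryIds(string $html, string $pattern = CATEGORY_REGEX): array
{
    preg_match_all($pattern, $html, $m);
    return $m[1];
}
```

For a link such as listings.jsp?STYPE=&CB=1&CID=00000469515&cbdt=Antiques, both patterns return the single ID 00000469515; only the variant also picks up a link written with &amp;.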

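<?php
// Method of the YellowPages class: fetches $url (or $this->url) and stores the raw response.
public function Execute($url = '') {
    if ($url == '') {
        $url = $this->url;
    }

    // Split "page?query": the page is requested and the query string is sent as the POST body.
    $parts = explode('?', $url, 2);
    $query = isset($parts[1]) ? $parts[1] : '';

    $ch = curl_init($parts[0]);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $query);
    curl_setopt($ch, CURLOPT_HEADER, 1); // response headers are kept in front of the HTML
    curl_setopt($ch, CURLOPT_ENCODING, "gzip,deflate");
    // CURLOPT_USERAGENT takes the bare user agent string.
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
    curl_setopt($ch, CURLOPT_REFERER, "http://yellowpages.superpages.com/");

    // Change this to a writable path on your server.
    $jar = 'e:\htdocs\tmp\cookies\superpages.cookiejar.txt';
    // CURLOPT_COOKIEFILE turns on curl's cookie engine and reads the jar;
    // CURLOPT_COOKIEJAR writes the cookies back when the handle is closed.
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);

    $this->source = curl_exec($ch);
    curl_close($ch);
}
?>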


This method uses cURL to request the URL and store the page source in $this->source, which the FetchCategories method then reads.
The preg_match_all call in FetchCategories returns an array of the category IDs, and fetchcategories.php stores them in the database.
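Because CURLOPT_HEADER is on, $this->source starts with the raw HTTP response headers, and with CURLOPT_FOLLOWLOCATION there is one header block per response in the redirect chain. The category regex does not mind, but if you want only the HTML, a hypothetical helper like stripHttpHeaders() can peel the header blocks off. It assumes each block ends with a blank CRLF line, as HTTP/1.x requires.

```php
<?php
// Hypothetical helper: remove every leading "HTTP/..." header block from a raw
// curl response and return just the body. A body that itself began with
// "HTTP/" would also be stripped, which is not expected for HTML pages.
function stripHttpHeaders(string $raw): string
{
    while (strncmp($raw, 'HTTP/', 5) === 0) {
        $end = strpos($raw, "\r\n\r\n");
        if ($end === false) {
            return ''; // headers only, no body
        }
        $raw = substr($raw, $end + 4);
    }
    return $raw;
}
```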

Now that you have the categories, the next step is a page, fetchrecords.php, that visits each category and fetches the records from its page(s). If you look at this page
http://yellowpages.superpages.com/listings.jsp?STYPE=&CB=1&C=&CID=00000469515&cbdt=Antiques&E=&L=+VT+&A=&X=&P=&AXP=&PS=15&search=Find+It
you will notice that it shows 1-20 of 194 records, so there are more pages. The class loops over every page of a category while fetching records; it doesn't stop at the first page.
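The page count can be worked out from the "1-20 of 194" summary on the first page: 20 listings per page and 194 in total gives ceil(194 / 20) = 10 pages. A sketch, assuming the summary text looks like the one quoted above; resultPageCount() is hypothetical and should be given the first page's HTML, since the last page shows a shorter range.

```php
<?php
// Hypothetical helper: parse "FIRST-LAST of TOTAL" and return the number of pages.
function resultPageCount(string $html): int
{
    if (!preg_match('/(\d+)\s*-\s*(\d+)\s+of\s+(\d+)/i', $html, $m)) {
        return 1; // no summary found: treat the category as a single page
    }
    $perPage = (int) $m[2] - (int) $m[1] + 1; // e.g. 20 - 1 + 1 = 20
    $total = (int) $m[3];
    return $perPage > 0 ? (int) ceil($total / $perPage) : 1;
}
```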

So here's the code for the fetchrecords.php page

<?php
include_once('YellowPages.php');
include_once('Database.php'); // expected to set up $db, your database abstraction layer

// NOTICE: THE TIME LIMIT IS SET TO 9600 SECONDS (160 MINUTES, ABOUT 2.67 HOURS).
// THIS IS NECESSARY IF YOU ARE FETCHING A LOT OF RECORDS.
// YOU MIGHT ALSO HAVE TO INCREASE IT IF YOUR STATE HAS MORE RECORDS THAN MINE.
ini_set('max_execution_time', 9600);
$timestamp = time();

if (isset($_POST['submit'])) {
    $count = 0;
    $skipped = 0;
    $yp = new YellowPages();

    // The letter goes straight into SQL, so accept a single A-Z character only.
    $letter = isset($_POST['letter']) ? strtoupper(trim($_POST['letter'])) : '';
    if (!preg_match('/^[A-Z]$/', $letter)) {
        exit('Please enter a single letter from A to Z.');
    }

    // Load every category for this letter that has not been fetched yet.
    $rs = $db->Execute("SELECT catid FROM yp_categories WHERE letter='" . $letter . "' AND fetched=0");

    $records = array();
    $rs->FirstRow();
    while (!$rs->EOF()) {
        $r = $rs->fields;
        // FetchRecords() follows the NEXT links, so this covers every page of the category.
        foreach ($yp->FetchRecords($r['catid']) as $ret) {
            $records[] = $ret;
        }
        // Mark the category as done; catid is a varchar column, so quote it.
        $sql = "UPDATE yp_categories SET fetched=1, timestamp=" . $timestamp
             . " WHERE catid='" . $r['catid'] . "'";
        $db->Execute($sql);
        $rs->NextRow();
    }

    // Every value is escaped before insertion. mysql_real_escape_string() needs an
    // open mysql_* connection (ext/mysql, removed in PHP 7); otherwise use your
    // database layer's escaping function.
    $columns = array('title', 'address', 'city', 'state', 'zip', 'phone', 'fax', 'toll', 'email', 'url');

    foreach ($records as $record) {
        // Skip listings whose phone number is already in the table.
        $sql = "SELECT id FROM yp_records WHERE phone='" . mysql_real_escape_string($record['phone']) . "'";
        $res = $db->Execute($sql);
        if ($res->NumRows() != 0) {
            $skipped++;
            continue;
        }

        $values = array();
        foreach ($columns as $column) {
            $values[] = "'" . mysql_real_escape_string($record[$column]) . "'";
        }
        $sql = "INSERT INTO yp_records (" . implode(', ', $columns) . ") VALUES(" . implode(', ', $values) . ")";
        $db->Execute($sql);
        $count++;
    }
    echo 'Inserted ' . $count . ' records. Skipped ' . $skipped . ' records.';
}

echo "
    <html><head><title>Yellow Pages :: Fetch Records</title></head><body>
    <form action='" . htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES) . "' method='POST'>
    <input type='text' name='letter' size='1' maxlength='1'> <input type='submit' name='submit' value='Fetch Records'>
    </form>
    </body>
    </html>
    ";
?>


As you can see, you need a database table for the records as well. Here's the structure I have used:

CREATE TABLE `yp_records` (
  `id` int(11) NOT NULL auto_increment,
  `title` varchar(150) NOT NULL default '',
  `address` varchar(255) NOT NULL default '',
  `city` varchar(60) NOT NULL default '',
  `state` varchar(20) NOT NULL default '',
  `zip` varchar(10) NOT NULL default '',
  `phone` varchar(10) NOT NULL default '0',
  `fax` varchar(10) NOT NULL default '0',
  `toll` varchar(10) NOT NULL default '0',
  `email` varchar(255) NOT NULL default '',
  `url` varchar(255) NOT NULL default '',
  `notes` varchar(255) NOT NULL default '',
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
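The phone, fax and toll columns are varchar(10), which only fits a bare ten-digit number, and fetchrecords.php uses the phone number to detect duplicates. If the listings show formatted numbers such as (802) 555-0123 (an assumption; the format on the site is not shown here), a longer value would be truncated, or rejected in strict SQL mode. A hypothetical normalizePhone() helper, applied before both the duplicate check and the INSERT, keeps the stored values consistent.

```php
<?php
// Hypothetical helper: keep digits only and drop a leading North American
// country code "1", so the result fits varchar(10).
function normalizePhone(string $raw): string
{
    $digits = preg_replace('/\D+/', '', $raw); // strip spaces, dashes, brackets
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }
    return $digits;
}
```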


Basically, this page checks whether the form was submitted; if not, it displays the form. Enter a letter into the form and click submit, and the script loads all the categories (which you grabbed previously) for that letter, for example every A category that has not been fetched yet. It then reaches out and grabs the results for each category and each of its pages.

Let's take a look at the FetchRecords method

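<?php
// Method of the YellowPages class.
public function FetchRecords($catid) {
    // URL is the author's own helper class (not included); "+VT+" selects the state.
    $url = new URL('http://yellowpages.superpages.com/listings.jsp?STYPE=&CB=1&C=&CID=' . $catid . '&E=&L=+VT+&A=&X=&P=&AXP=&PS=15&search=Find+It');
    // ParsePage() collects the records and follows every NEXT page.
    $this->ParsePage($url->__toString());
    return $this->records;
}
?>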


Again, ignore the new URL() call, but you will need to change one thing in the URL you use: the &L=+VT+& parameter. Change ONLY the VT to the abbreviation of the state you wish to target.
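Rather than editing the string by hand, the listings URL can be built from its parameters. The sketch below is hypothetical (listingsUrl() is not part of the class) and assumes the site expects exactly the parameters in the original URL. http_build_query() encodes spaces as +, so a state code wrapped in spaces comes out as +VT+, matching the hand-written URL.

```php
<?php
// Hypothetical helper: listings.jsp URL for one category in one state.
function listingsUrl(string $catid, string $state): string
{
    $state = strtoupper($state);
    if (!preg_match('/^[A-Z]{2}$/', $state)) {
        throw new InvalidArgumentException("Expected a two-letter state code, got '$state'");
    }
    $params = [
        'STYPE' => '', 'CB' => 1, 'C' => '', 'CID' => $catid,
        'E' => '', 'L' => ' ' . $state . ' ', 'A' => '', 'X' => '',
        'P' => '', 'AXP' => '', 'PS' => 15, 'search' => 'Find It',
    ];
    // Explicit "&" separator, independent of the arg_separator.output setting.
    return 'http://yellowpages.superpages.com/listings.jsp?'
        . http_build_query($params, '', '&');
}
```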

FetchRecords then calls the ParsePage method, passing in the URL.

The ParsePage method loops over every record on the page and grabs the necessary information. It also checks whether the category has a NEXT page; if so, it calls ParsePage again with the next URL automatically. This continues until there are no more NEXT pages, and then fetchrecords.php moves on to the next category.
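The same paging can be written as a loop instead of a recursive call, which keeps the call stack flat for categories with many pages. This is a hypothetical sketch, not the ParsePage() code from the zip: $fetch returns a page's HTML, $parse returns the records on that page plus the next-page URL (or null), and a visited list and page cap stop it if a NEXT link ever points back to a page already seen.

```php
<?php
// Hypothetical iterative pager. $fetch: fn(string $url): string (page HTML).
// $parse: fn(string $html): array{0: array, 1: ?string} (records, next URL or null).
function collectAllPages(string $startUrl, callable $fetch, callable $parse, int $maxPages = 100): array
{
    $records = [];
    $visited = [];
    $url = $startUrl;
    while ($url !== null && count($visited) < $maxPages && !isset($visited[$url])) {
        $visited[$url] = true;
        [$pageRecords, $next] = $parse($fetch($url));
        foreach ($pageRecords as $record) {
            $records[] = $record;
        }
        $url = $next; // null when there is no NEXT page
    }
    return $records;
}
```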

I am not showing all the code here, but you can find it in the attached zip file.

This is a great tool if you need to create a (snail mail) mailing list quickly.

Enjoy :)



This script is a contact form between users of a website (kinda like the PM function on the forums)
Categories : PHP, Databases, MySQL, Regexps
MySQL Handler
Categories : PHP, Databases, MySQL, Classes and Objects, PHP Classes
Powerful php/mysql Pagination for up to 6 URL Params
Categories : PHP, PHP Classes, Databases, MySQL, Navigation
PostGreSQL and MySQL 2 in 1 db Manager
Categories : PHP, PHP Classes, Databases, PostgreSQL, MySQL
MySQL Class to ease Database connectivity
Categories : MySQL, PHP Classes, Databases, PHP
usercounter class
Categories : PHP, PHP Classes, Databases, MySQL, Environment Variables
Simple Mini Poll class library (SimPoll)
Categories : PHP, PHP Classes, Databases, MySQL, Complete Programs
Online Automatic Class Generator for MySQL Tables
Categories : PHP, PHP Classes, Classes and Objects, Databases, MySQL
Specify your connection settings and create a link to a MySQL database.
Categories : PHP, PHP Classes, Databases, MySQL, Beginner Guides
Simple database class
Categories : PHP, PHP Classes, MySQL, Databases
Simple usersOnline class - keep track of how many users are online on your site
Categories : PHP, PHP Classes, Databases, MySQL
Setting up InnoDB on MySQL and using Transactions Begin, Commit, Rollback in PHP.
Categories : PHP Classes, Databases, PHP, MySQL, InnoDB
Ajax PHP Tree (Left and Right) with MySQL
Categories : PHP, Databases, MySQL, AJAX, PHP Classes
Convert SQL from oracle,mysql,mssql,sqlite and odbc to SQL compatible
Categories : PHP, PHP Classes, Databases, MySQL, MS SQL Server
MySQL Connection/Query Class
Categories : Databases, MySQL, PHP, PHP Classes
 Jay Zwagerman wrote : 1734
I can't get this code to work. I keep getting a fatal error 
Fatal error: Class 'URL' not found in /home/jayzwag/public_html/YellowPages.php on line 37

Any ideas?
 
 Boaz Yahav wrote : 1735
Did you download and include the files that are attached to this code example?
 
 Jay Zwagerman wrote : 1736
Yes, I downloaded the zip file that contained

fetchcategories.php
fetchrecords.php
YellowPages.php

I don't see a class URL anywhere on the pages. Is there something missing?
 
 Boaz Yahav wrote : 1737
Did you check for any php errors? 
Are you sure the files are included?
 
 Jay Zwagerman wrote : 1738
I uploaded all the files to my webserver the only thing I 
added was the database connection. Do I need to change the 
permissions on the files?
 
 Boaz Yahav wrote : 1739
Don't think so, but I would look for any PHP errors. Hopefully the author will respond soon.
 
 Jay Zwagerman wrote : 1740
Does this work for you? Maybe I am doing something wrong.
 
 Jay Zwagerman wrote : 1741
Is there any way to get hold of the author? I have had 
several other people try this and they receive the same 
error I got.
 
 Joseph Crawford wrote : 1742
Sorry,

I have been really busy lately. The only thing stopping 
this from working is the regular expression for the 
listings. The categories seem to work just fine; however, 
they changed the layout of their listing page. I will 
see if I can get to this sometime this week but cannot 
make any guarantees.
 
 Justin Giesbrecht wrote : 1743
I for some reason cannot get this script to work. 
I created the same database structure... I keep getting the error. 

Fatal error: Class 'URL' not found...

The URL class doesn't work. I hard-coded the URL to test it, and it still didn't work. What am I missing? 
 
 Jay Zwagerman wrote : 1744
Justin,

You aren't missing anything. If you see I received the 
same error. Joseph is aware of the problem and when he 
gets time he will fix this script.
 
 Joseph Crawford wrote : 1745
Guys, the only issue with the code is the regular 
expression matching. If you see the text above, I 
clearly stated that it would not work out of the box 
because I did not write about all of the classes used. I 
had a URL class that I used to manage URLs a bit better. 
If you alter the code to not use the URL class, then alter 
the regex to work with the new source code from Yellow Pages, 
you should be fine.

This code was not meant to be a canned script rather 
something for you to learn by.
 
 Jay Zwagerman wrote : 1746
Care to give us a hint? I have been trying to get this to work for a couple of weeks with no luck.
 
 Joseph Crawford wrote : 1747
You will have to remove all references to the URL class, 
make sure the values are treated as a string.  Also you 
will have to integrate your own database methods.  You 
will then have to modify the regular expression to work 
with their new layout.