Last time I showed you how to query 411.com and get the address and other details for a list of phone numbers. I have since come up with a more automated way to accomplish this.
PLEASE NOTE - This code will not work out of the box. You need to alter the code to use your own database abstraction layer, and the URL helper class it references is not shown. Everything except the database calls should work fine, however.
First I needed to grab all the category IDs from this page:
http://yellowpages.superpages.com/topcats.jsp?LTTR=A
There is a page for every letter of the alphabet and even one for #. I did not bother with the # page, though I could easily have included it.
You will need to create a database table for the categories:
| CREATE TABLE `yp_categories` (
`id` int(11) NOT NULL auto_increment,
`catid` varchar(15) NOT NULL default '',
`letter` char(1) NOT NULL default '',
`fetched` int(1) NOT NULL default '0',
`timestamp` int(11) NOT NULL default '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ; | |
I created a class called YellowPages, which I will show in a bit. For now, here is the page I call to fetch the categories (fetchcategories.php).
| <?php
include_once('YellowPages.php');
// $db is presumably created in Database.php; swap in your own database layer
include_once('Database.php');
ini_set('max_execution_time', 180);
$count = 0;
// Walk the category index pages for A through Z
foreach(range('A', 'Z') as $letter) {
    $yp = new YellowPages();
    $cats = $yp->FetchCategories($letter);
    foreach($cats as $cat) {
        // Only insert category IDs that are not stored yet
        $sql = "SELECT id FROM yp_categories WHERE catid='".$cat."'";
        $res = $db->Execute($sql);
        if($res->NumRows() == 0) {
            $count++;
            $sql = "INSERT INTO yp_categories (catid, letter)
                VALUES('".(string)$cat."', '".$letter."')";
            $db->Execute($sql);
        }
    }
}
echo 'Added '.$count.' New Categories.';
?> | |
This script simply loops from A to Z and calls the FetchCategories method with the current letter, inserting every category ID that is not already in the table. Now let's see the code for FetchCategories.
| <?php
public function FetchCategories($letter) {
$url = new URL('http://yellowpages.superpages.com/topcats.jsp?LTTR='.strtoupper($letter));
$this->Execute($url->__toString());
preg_match_all(self::CATEGORY_REGEX, $this->source, $cats);
return $cats[1];
}
?> | |
Don't worry about the $url = new URL(); line. I have a URL object that I use but will not show in this example.
FetchCategories passes the URL to the Execute method, then uses preg_match_all to pull the category IDs out of the $this->source property and returns them.
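Because the URL class is not shown, the code above fails with "Class 'URL' not found" unless you supply one or remove the references. Here is a minimal stand-in, assuming the only things the YellowPages class needs from it are a constructor that takes a string and a __toString() method. The author's real class presumably does more; this is just a sketch of the interface as it is used above.

```php
<?php
// Hypothetical stand-in for the author's URL class, which is not shown.
// Assumption: callers only use new URL($string) and $url->__toString().
class URL
{
    private $url;

    public function __construct($url)
    {
        $this->url = (string) $url;
    }

    public function __toString()
    {
        return $this->url;
    }
}

// Used the same way as in FetchCategories
$url = new URL('http://yellowpages.superpages.com/topcats.jsp?LTTR=' . strtoupper('a'));
echo $url->__toString(), "\n";
```

The other option is to drop the class entirely and pass the plain string to Execute.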
| <?php
const CATEGORY_REGEX = '/&CID=([0-9]+?)&/is';
?> | |
That's the regex used in this method; it captures the digits between &CID= and the next &. Now let's take a look at the Execute method.
| <?php
public function Execute($url = '') {
    if($url == '') $url = $this->url;
    // Split into script address and query string; the query string is sent as the POST body
    $url = explode('?', $url, 2);
    $fields = isset($url[1]) ? $url[1] : '';
    $cookies = 'e:\htdocs\tmp\cookies\superpages.cookiejar.txt';
    $ch = curl_init($url[0]);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_ENCODING, "gzip,deflate");
    // CURLOPT_USERAGENT takes only the value, without a "User-Agent=" prefix
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
    curl_setopt($ch, CURLOPT_REFERER, "http://yellowpages.superpages.com/");
    // CURLOPT_COOKIE expects a cookie string, not a flag; read and write the cookie file instead
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
    $this->source = curl_exec($ch);
    curl_close($ch);
}
?> | |
This method uses curl to request the URL and fetch the page source.
It then sets $this->source, which is used in the FetchCategories method.
The preg_match_all call in FetchCategories returns an array of the category IDs. These are then stored in the database by fetchcategories.php.
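To see what preg_match_all hands back, here is a small standalone sketch that runs the same pattern over a made-up page fragment. The HTML and the second category ID are invented for illustration, and the array_unique step is my addition: it assumes the same CID can appear in more than one link on a page, which would otherwise cost an extra SELECT per repeat.

```php
<?php
// Same pattern as the CATEGORY_REGEX class constant
const CATEGORY_REGEX = '/&CID=([0-9]+?)&/is';

// Made-up page fragment; the real markup may differ
$source = '<a href="listings.jsp?CB=1&CID=00000469515&cbdt=Antiques">Antiques</a>'
        . '<a href="listings.jsp?CB=1&CID=00000123456&cbdt=Attorneys">Attorneys</a>'
        . '<a href="listings.jsp?CB=1&CID=00000469515&cbdt=Antiques">more</a>';

preg_match_all(CATEGORY_REGEX, $source, $cats);

// $cats[0] holds the full matches, $cats[1] the captured IDs.
// array_unique drops repeats; array_values reindexes from 0.
$ids = array_values(array_unique($cats[1]));

print_r($ids);
```

Note that the IDs stay strings, which preserves their leading zeros when they are written to the varchar catid column.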
Now that you have the categories, you can create a page, fetchrecords.php, that visits each category and fetches the records from its page(s). If you look at this page
http://yellowpages.superpages.com/listings.jsp?STYPE=&CB=1&C=&CID=00000469515&cbdt=Antiques&E=&L=+VT+&A=&X=&P=&AXP=&PS=15&search=Find+It
you will notice that you are looking at 1-20 of 194 records, so there are more pages. The class is smart enough to loop over every page of a category while fetching records; it doesn't just fetch the records from the first page.
So here's the code for the fetchrecords.php page:
| <?php
include_once('YellowPages.php');
include_once('Database.php');
// NOTICE I SET THE TIME OUT LIMIT TO 160 Minutes (2.66 Hours)
// THIS IS NECESSARY IF YOU ARE FETCHING A LOT OF RECORDS
// YOU MIGHT ALSO HAVE TO INCREASE THIS IF YOUR STATE HAS MORE RECORDS
// THAN MINE.
ini_set('max_execution_time', 9600);
$timestamp = time();
if(isset($_POST['submit'])) {
$count = 0;
$skipped = 0;
$yp = new YellowPages();
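$letter = strtoupper($_POST['letter']);
// The letter goes straight into the SQL below, so accept a single A-Z character only
if(!preg_match('/^[A-Z]\z/', $letter)) die('Please enter a single letter from A to Z.');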
$rs = $db->Execute("SELECT catid FROM yp_categories WHERE letter='".$letter."' AND fetched=0");
$records = array();
$rs->FirstRow();
while(!$rs->EOF()) {
$r = $rs->fields;
$rets = $yp->FetchRecords($r['catid']);
foreach($rets as $ret) array_push($records, $ret);
$sql = "UPDATE yp_categories SET fetched=1, timestamp=".$timestamp." WHERE catid='".$r['catid']."'";
$db->Execute($sql);
$rs->NextRow();
}
foreach($records as $record) {
$sql = "SELECT id FROM yp_records WHERE phone='".mysql_real_escape_string($record['phone'])."'";
$res = $db->Execute($sql);
if($res->NumRows() != 0) {
$skipped++;
continue;
} else {
// Escape every field, including phone, fax and toll, before building the INSERT
$vals = array();
foreach(array('title', 'address', 'city', 'state', 'zip', 'phone', 'fax', 'toll', 'email', 'url') as $f) $vals[] = "'".mysql_real_escape_string($record[$f])."'";
$sql = "INSERT INTO yp_records (title, address, city, state, zip, phone, fax, toll, email, url) VALUES(".implode(', ', $vals).")";
$db->Execute($sql);
$count++;
}
}
echo 'inserted '.$count.' records. skipped '.$skipped.' records.';
}
echo "
<html><head><title>Yellow Pages :: Fetch Records</title></head><body>
<form action='".htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES)."' method='POST'>
<input type='text' name='letter' size='1' maxlength='1'> <input type='submit' name='submit' value='Fetch Records'>
</form>
</body>
</html>
";
?> | |
As you can see, you need a database table for the records as well. Here's the structure I used.
| CREATE TABLE `yp_records` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(150) NOT NULL default '',
`address` varchar(255) NOT NULL default '',
`city` varchar(60) NOT NULL default '',
`state` varchar(20) NOT NULL default '',
`zip` varchar(10) NOT NULL default '',
`phone` varchar(10) NOT NULL default '0',
`fax` varchar(10) NOT NULL default '0',
`toll` varchar(10) NOT NULL default '0',
`email` varchar(255) NOT NULL default '',
`url` varchar(255) NOT NULL default '',
`notes` varchar(255) NOT NULL default '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ; | |
This page displays a form and checks whether it has been submitted. Enter a letter into the form and click submit, and the script loads all the not-yet-fetched categories (which you grabbed previously) for that letter. It then reaches out and grabs the results for each category and page, skipping any record whose phone number is already in the table.
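Since duplicates are detected by phone number, it can help to normalize the numbers and drop repeats within a batch before touching the database. The sketch below is not part of the original class: normalizePhone() and dedupeByPhone() are hypothetical helpers, the sample records are made up, and it assumes each record is an array with a 'phone' key, as in the page above.

```php
<?php
// Keep only digits, so "(802) 555-0100" and "802-555-0100" compare equal
function normalizePhone($phone)
{
    return preg_replace('/\D+/', '', (string) $phone);
}

// Keep the first record for each phone number.
// Records without a phone number are kept as they are.
function dedupeByPhone(array $records)
{
    $seen = array();
    $unique = array();
    foreach ($records as $record) {
        $phone = normalizePhone(isset($record['phone']) ? $record['phone'] : '');
        if ($phone !== '') {
            if (isset($seen[$phone])) {
                continue; // same number already in this batch
            }
            $seen[$phone] = true;
        }
        $record['phone'] = $phone;
        $unique[] = $record;
    }
    return $unique;
}

// Made-up sample data
$records = array(
    array('title' => 'Example Antiques', 'phone' => '(802) 555-0100'),
    array('title' => 'Example Antiques Inc', 'phone' => '802-555-0100'),
    array('title' => 'Sample Books', 'phone' => '802 555 0101'),
);
$unique = dedupeByPhone($records);
echo count($unique), " unique records\n";
```

The per-record SELECT against yp_records is still needed to catch numbers stored by earlier runs; this only trims the batch.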
Let's take a look at the FetchRecords method
| <?php
public function FetchRecords($catid) {
$url = new URL('http://yellowpages.superpages.com/listings.jsp?STYPE=&CB=1&C=&CID='.$catid.'&E=&L=+VT+&A=&X=&P=&AXP=&PS=15&search=Find+It');
$this->ParsePage($url->__toString());
return $this->records;
}
?> | |
Again, ignore the = new URL(), but you will need to change one thing in the URL you use. Notice the &L=+VT+& part: change ONLY the VT to the abbreviation of whichever state you wish to target.
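If you would rather not edit the URL by hand, the query string can be assembled from an array. This is a sketch only: buildListingsUrl() is a hypothetical helper, and the parameter names and values are copied from the URL in FetchRecords without knowing what each one means to the site.

```php
<?php
// Build the listings URL for a category ID and a two-letter state code.
// buildListingsUrl() is a hypothetical helper, not part of the YellowPages class.
function buildListingsUrl($catid, $state = 'VT')
{
    $params = array(
        'STYPE' => '', 'CB' => '1', 'C' => '', 'CID' => $catid, 'E' => '',
        // The spaces around the state become the "+" signs in &L=+VT+&
        'L' => ' ' . strtoupper($state) . ' ',
        'A' => '', 'X' => '', 'P' => '', 'AXP' => '', 'PS' => '15',
        'search' => 'Find It',
    );
    // Pass '&' explicitly so the arg_separator.output ini setting cannot change the result
    return 'http://yellowpages.superpages.com/listings.jsp?' . http_build_query($params, '', '&');
}

echo buildListingsUrl('00000469515', 'nh'), "\n";
```

With the default state, the helper reproduces the URL used in FetchRecords character for character.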
FetchRecords then calls the ParsePage method, passing in the URL.
ParsePage loops over every record on the page and grabs the necessary information. It also checks whether the category has a NEXT page; if so, it calls itself again with the next URL. This continues until there are no more NEXT pages, and the script then moves on to the next category.
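ParsePage itself is only in the zip, so here is a rough sketch of the paging idea on its own. It uses a loop rather than the recursive call described above, and everything site-specific is a placeholder: fetchAllPages() is a hypothetical helper, $fetch is any callable that returns page source for a URL, and the NEXT-link regex matches a fake three-page site, not the real markup.

```php
<?php
// Follow "next page" links until there are none left.
// $nextRegex must capture the next page's URL in group 1.
function fetchAllPages($fetch, $startUrl, $nextRegex, $maxPages = 50)
{
    $pages = array();
    $seen = array();
    $url = $startUrl;
    while ($url !== null && !isset($seen[$url]) && count($pages) < $maxPages) {
        $seen[$url] = true; // guards against pages that link back to each other
        $source = call_user_func($fetch, $url);
        $pages[] = $source;
        $url = preg_match($nextRegex, $source, $m) ? $m[1] : null;
    }
    return $pages;
}

// Fake three-page site held in an array, for illustration only
$site = array(
    'p1' => 'records 1-20 <a class="next" href="p2">NEXT</a>',
    'p2' => 'records 21-40 <a class="next" href="p3">NEXT</a>',
    'p3' => 'records 41-45',
);
$fetch = function ($url) use ($site) {
    return isset($site[$url]) ? $site[$url] : '';
};
$pages = fetchAllPages($fetch, 'p1', '/<a class="next" href="([^"]+)">/');
echo count($pages), " pages fetched\n";
```

The $maxPages cap and the $seen check are safety nets for a long-running scraper; in the real class the record regex would run on each page's source inside the loop.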
I am not showing all the code, but you can find it in the attached zip file.
This is a great tool if you need to create a (snail mail) mailing list quickly.
Enjoy :)
Jay Zwagerman wrote : 1734
I can't get this code to work. I keep getting a fatal error:
Fatal error: Class 'URL' not found in /home/jayzwag/public_html/YellowPages.php on line 37
Any ideas?
Boaz Yahav wrote : 1735
Did you download and include the files that are attached to this code example?
Jay Zwagerman wrote : 1736
Yes, I downloaded the zip file, which contained:
fetchcategories.php
fetchrecords.php
YellowPages.php
I don't see a class URL anywhere on the pages. Is there something missing?
Boaz Yahav wrote : 1737
Did you check for any php errors?
Are you sure the files are included?
Jay Zwagerman wrote : 1738
I uploaded all the files to my webserver. The only thing I added was the database connection. Do I need to change the permissions on the files?
Boaz Yahav wrote : 1739
Don't think so, but I would look for any PHP errors. Hopefully the author will respond soon.
Jay Zwagerman wrote : 1740
Does this work for you? Maybe I am doing something wrong.
Jay Zwagerman wrote : 1741
Is there any way to get hold of the author? I have had several other people try this and they receive the same error I got.
Joseph Crawford wrote : 1742
Sorry,
I have been really busy lately. The only thing stopping this from working is the regular expression for the listings. The categories seem to work just fine; however, they changed the layout of their listing page. I will see if I can get to this sometime this week but cannot make any guarantees.
Justin Giesbrecht wrote : 1743
For some reason I cannot get this script to work.
I created the same database structure, but I keep getting the error:
Fatal error: Class 'URL' not found...
The URL class doesn't work. I hard-coded the URL to see, and it still didn't work. What am I missing?
Jay Zwagerman wrote : 1744
Justin,
You aren't missing anything. As you can see, I received the same error. Joseph is aware of the problem, and when he gets time he will fix this script.
Joseph Crawford wrote : 1745
Guys, the only issue with the code is the regular expression matching. As you can see in the text above, I clearly stated that it would not work out of the box, because I did not write about all of the classes used. I had a URL class that I used to manage URLs a bit better. If you alter the code to not use the URL class, then alter the regex to work with the new source code from yellowpages, you should be fine.
This code was not meant to be a canned script, but rather something for you to learn from.
Jay Zwagerman wrote : 1746
Care to give us a hint? I have been trying to get this to work for a couple of weeks with no luck.
Joseph Crawford wrote : 1747
You will have to remove all references to the URL class and make sure the values are treated as strings. You will also have to integrate your own database methods. You will then have to modify the regular expression to work with their new layout.