Using the PHP QueryPath library


QueryPath is a "PHP library for working with XML and HTML." Work it, it does! Basically, QueryPath brings the CSS selector functionality of jQuery to PHP. Sure, we already have DomDocument and SimpleXML. However, sometimes these don't go far enough. For instance, what if you want to select an XML or HTML element by its class? This can be difficult to do in PHP, as there is no getByClassName method. Rather than resort to someone else's hacked solution or some custom regex, just use QueryPath.
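To make the pain point concrete, here is a minimal sketch of a class lookup using only PHP's built-in DOM extension, with no QueryPath involved. The helper name find_by_class() and the sample markup are assumptions made up for this illustration.

```php
<?php
// Plain-PHP class lookup with DOMXPath; QueryPath is not used here.
// find_by_class() is a hypothetical helper name for this sketch.
function find_by_class(DOMDocument $doc, string $class): array {
  $xpath = new DOMXPath($doc);
  // Pad the class attribute with spaces so "date" does not match "update".
  // Note: $class is assumed to be a simple name without quote characters.
  $query = sprintf(
    '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
    $class
  );
  $matches = [];
  foreach ($xpath->query($query) as $node) {
    $matches[] = $node->textContent;
  }
  return $matches;
}

$doc = new DOMDocument();
$doc->loadHTML('<div><p class="date">Saturday</p><p class="update">Skip me</p></div>');

print_r(find_by_class($doc, 'date'));
```

With QueryPath, the same lookup shrinks to the CSS selector .date, which is the whole appeal.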

You can either download the QueryPath library along with its documentation or, if you are using Drupal, install the QueryPath module.

Just to get you started, here is an example of how I am using QueryPath in Drupal. Note that this requires the QueryPath module to be enabled. If you are not using Drupal, you need to add a require('pathto/QueryPath'); at the top, with pathto being the path to wherever you downloaded QueryPath. (There isn't a QueryPath module for Drupal 5.x yet, so I suggest just using the library itself if you are on 5.) Example code:

Update: You will need to either clean up your input first or tell QueryPath to ignore parser warnings. Thanks to Matt over at TechnoSophos for the proper syntax. To clean up your input, you can use HTML Tidy, which is also available as a PHP PEAR package, or, as I recommend, get the htmLawed PHP library (note that this is different from the Drupal module of the same name, which affects input filters) and run your input through the htmLawed function. To tell QueryPath to ignore parser warnings instead, use the alternate syntax I provide as a comment after the qp() call in the code below, which adds an ignore_parser_warnings flag.

<?php

// Note that the QueryPath module needs to be installed and enabled.

class CalendarCalendarAction {

  public static function get_calendar($url) {
    $calendar_html = '';
    $_calendar_read = drupal_http_request($url);

    if (!empty($_calendar_read->data)) {
      // The leading dot selects by class, just as in CSS.
      $query_path_date = qp($_calendar_read->data)->find('.date');
      // If you have dirty HTML, use this syntax instead:
      // $query_path_date = @qp($_calendar_read->data, '.date', array('ignore_parser_warnings' => TRUE));

      foreach ($query_path_date as $qp_date) {
        // html() returns the element's markup as a string.
        $calendar_html .= $qp_date->html();

        // Walk the siblings that follow this date.
        $qp_next = $qp_date->nextAll();
        foreach ($qp_next as $qp_siblings) {
          // Stop at the next date; its events are handled on the next pass.
          if ($qp_siblings->hasClass('date')) {
            break;
          }
          if ($qp_siblings->hasClass('event')) {
            $calendar_html .= $qp_siblings->children()->html();
          }
        }
      }
    }

    // Always return a string, even when the request returned no data.
    return $calendar_html;
  }

}

And of course the HTML file we would pass as the $url would contain: 

<html>
<body>
<div id="main">
<p class="date">Saturday, December 12, 2009</p>
<div class="event"> <p><a href="5">Concert</a></p> </div>
<div class="event"> <p><a href="4">Theatre</a></p> </div>
<p class="date">Sunday, December 13, 2009</p>
<div class="event"> <p><a href="3">Lecture</a></p> </div>
<div class="event"> <p><a href="2">Concert</a></p> </div>
<p class="date">Monday, December 14, 2009</p>
<div class="event"> <p><a href="1">Talk</a></p> </div>
<div class="event"> <p><a href="0">Recital</a></p> </div>
</div>
</body>
</html>

So now you can call the method, either through an object or directly on the class since it is static, as in the following code:

$url = 'myhtmlfilethatIshowedyouabove.html';
$calendar_html = CalendarCalendarAction::get_calendar($url);
print $calendar_html;

And you will get back the HTML just as you wanted it, made up of the elements that had class="date" and class="event". The foreach loops and ifs are the way they are because our HTML isn't nested: the events are siblings of the dates rather than children. So for each date-classed element we use the nextAll() method to get the siblings that follow it, then loop through them collecting the event-classed elements until we hit another date-classed element. At that point we break, and the outer loop moves on to the next date and its events. Alright, hope that explains that, and good luck in your universal travels!
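If you want to try the sibling walk without Drupal or QueryPath installed, here is a rough sketch of the same logic using only PHP's built-in DOM extension. It is not the QueryPath code above, just the same idea: the helper names has_class() and group_events_by_date() are made up for this illustration, and it returns an array of event names per date rather than an HTML string.

```php
<?php
// Sibling walk with PHP's built-in DOM; has_class() and
// group_events_by_date() are hypothetical helper names for this sketch.
function has_class(DOMElement $el, string $class): bool {
  $classes = preg_split('/\s+/', trim($el->getAttribute('class')));
  return in_array($class, $classes, true);
}

function group_events_by_date(string $html): array {
  $doc = new DOMDocument();
  $doc->loadHTML($html);
  $xpath = new DOMXPath($doc);
  $grouped = [];

  foreach ($xpath->query('//p[@class="date"]') as $date) {
    $events = [];
    // Walk forward through the siblings that follow this date.
    for ($node = $date->nextSibling; $node !== null; $node = $node->nextSibling) {
      if (!$node instanceof DOMElement) {
        continue; // skip text nodes between elements
      }
      if (has_class($node, 'date')) {
        break; // the next date starts a new group
      }
      if (has_class($node, 'event')) {
        $events[] = trim($node->textContent);
      }
    }
    $grouped[trim($date->textContent)] = $events;
  }
  return $grouped;
}

// A shortened copy of the calendar markup shown earlier.
$html = '<div id="main">'
  . '<p class="date">Saturday, December 12, 2009</p>'
  . '<div class="event"><p><a href="5">Concert</a></p></div>'
  . '<div class="event"><p><a href="4">Theatre</a></p></div>'
  . '<p class="date">Sunday, December 13, 2009</p>'
  . '<div class="event"><p><a href="3">Lecture</a></p></div>'
  . '</div>';

print_r(group_events_by_date($html));
```

The break on the next date is what keeps Sunday's events out of Saturday's group, exactly as in the QueryPath version.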
