Crawling Web Pages and Creating Sitemaps

Creating a Sitemap Based on all the Links within a Website

I built this web crawler because I wanted a way to create a sitemap of a website I was building. I know there are a few websites out there that will do this for you, but I didn’t want to rely on someone else, and I wanted to change a few things. To do this I used PHP and cURL.
I started out by creating a class for the crawler. When I create a new Crawler object, I pass in the URL of the page I want to start with, and the constructor uses cURL to fetch that page’s content and headers. The class also has methods to get all the links on a page, the page title, the entire content, just the body content, and the header information. You could easily add more, for example a method to grab all the images on a page.

The Crawler Class

  

class Crawler {
  protected $markup = '';
  protected $httpinfo = array();

  public function __construct($uri, $justheaders = 0){
    $output = $this->getMarkup($uri, $justheaders);
    $this->markup = $output['output'];
    $this->httpinfo = $output['code'];
  }

  public function getMarkup($uri, $justheaders) {
    $ch = curl_init($uri);
    // Include the response headers in the returned string.
    curl_setopt($ch, CURLOPT_HEADER, 1);
    if($justheaders){
      // Headers only: NOBODY sends a HEAD request; HTTP errors count as failures.
      curl_setopt($ch, CURLOPT_NOBODY, 1);
      curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    // Follow redirects; curl_getinfo() then reports the final URL.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);

    $output = array();
    $output['output'] = curl_exec($ch);
    $output['code'] = curl_getinfo($ch);
    curl_close($ch);
    return $output;
  }

  // get('links') calls _get_links(), get('pagetitle') calls _get_pagetitle(), etc.
  public function get($type){
    $method = "_get_{$type}";
    if (method_exists($this, $method)){
      return $this->$method();
    }
    return FALSE;
  }

  protected function _get_info(){
    return $this->httpinfo;
  }

  protected function _get_links(){
    if(!empty($this->markup)){
      // Capture the quoted href value of every anchor tag.
      preg_match_all('/<a\s[^>]*?href=(["\'].*?["\'])(.*?)>(.*?)<\/a>/is',
                     $this->markup, $links);
      // Flipping twice turns the links into keys and back, dropping duplicates.
      return !empty($links[1]) ? array_flip(array_flip($links[1])) : FALSE;
    }
  }

  protected function _get_body(){
    if(!empty($this->markup)){
      // Allow attributes on <body> and match across lines.
      if(preg_match('/<body[^>]*>(.*)<\/body>/is', $this->markup, $body)){
        return $body[1];
      }
      return '';
    }
  }

  protected function _get_content(){
    if(!empty($this->markup)){
      return $this->markup;
    }
  }

  protected function _get_pagetitle() {
    if (!empty($this->markup)){
      preg_match_all('/<title>(.*?)<\/title>/si', $this->markup, $pagetitles);
      return !empty($pagetitles[1]) ? $pagetitles[1] : FALSE;
    }
  }
}
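If the regular expression in _get_links() gives you trouble (tags split across lines, unusual attribute order), PHP’s DOM extension can pull out the href values instead. This is a swapped-in technique, not what the class above does, and extractLinks() is a hypothetical helper name; it’s a minimal sketch assuming the DOM extension is installed.

```php
<?php
// Hypothetical helper: unique href values of all <a> tags, via DOMDocument.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];                        // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {               // skip anchors without an href
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}
```

Unlike the regex, the parser doesn’t care about quote style or line breaks inside the tag, at the cost of building a full document tree for every page.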


After this I create a recursive function that follows each of the links. Each time I call this function I create a new instance of the Crawler class. If the page doesn’t come back with a valid status code (200, 301, or 302), I just return. Because the CURLOPT_FOLLOWLOCATION option is turned on, cURL follows redirects by itself, so I take the actual URL of the page from the transfer information returned by curl_getinfo() (its url key) rather than using the URL I requested. After this I call the get links method, which returns all the unique links on a page. (Calling array_flip twice makes them unique: the first call turns the links into keys, which drops duplicates, and the second turns them back into values.)

I then get the title tag for each page; it is used when creating the sitemap. I also take the body, remove all the script tags, and then strip all the remaining HTML tags to get the page’s text. Next I get a base URL (the site’s host name). Since I’m creating a sitemap of just one website, I don’t need any links to external pages, so I won’t follow any link that doesn’t contain this base URL.

Now I begin looping through all the links on the page. If it’s an external link, I skip it. If it is an absolute link with a path (a “/” after the host), I replace the scheme and host with a single slash. If it is an absolute link with no path, the new value is “”. I do this because I prepend the base URL that we got earlier.

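I then explode on the “/”. If the first element is empty (a root-relative link such as /about.php), I put the base URL there. Otherwise it’s a relative link, so I prepend the directory of the current page (the page URL with its last segment removed). This gives the correct link whether it is a relative link, a root-relative link, or an absolute one.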
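The same resolution can be sketched with parse_url(), which splits the page URL into scheme, host, and path for you. resolveLink() is a hypothetical helper, not part of the crawler, and this sketch assumes http(s) URLs on the default port; it doesn’t collapse ../ segments, and a query-only link like ?page=2 is resolved against the directory rather than the current file.

```php
<?php
// Hypothetical helper: turn an href found on $pageUrl into an absolute URL.
// Returns null for empty and "#..." links, non-http schemes, and other hosts.
function resolveLink(string $pageUrl, string $href): ?string
{
    $page = parse_url($pageUrl);
    $href = trim($href);
    if ($href === '' || $href[0] === '#') {
        return null;
    }
    if (strpos($href, '//') === 0) {                  // protocol-relative: //host/path
        $href = $page['scheme'] . ':' . $href;
    }
    if (preg_match('/^[a-z][a-z0-9+.\-]*:/i', $href) && !preg_match('/^https?:/i', $href)) {
        return null;                                  // mailto:, javascript:, ftp:, ...
    }
    if (preg_match('/^https?:\/\//i', $href)) {       // absolute: same host only
        $host = (string) parse_url($href, PHP_URL_HOST);
        return strcasecmp($host, $page['host']) === 0 ? $href : null;
    }
    $base = $page['scheme'] . '://' . $page['host'];
    if ($href[0] === '/') {                           // root-relative
        return $base . $href;
    }
    // Relative: keep the current page's directory, drop its file name.
    $path = $page['path'] ?? '/';
    return $base . substr($path, 0, strrpos($path, '/') + 1) . $href;
}
```

For example, other.php found on http://example.com/blog/post.php resolves to http://example.com/blog/other.php, while /about.php resolves to http://example.com/about.php.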
The complete link is then checked to see if it already exists in $globallinkarr. If it doesn’t, I add it and work out its levels. Basically, every “/” in the URL starts a new level; these levels are used when I create the HTML sitemap. To combine the arrays of levels I needed a recursive merge, but PHP’s array_merge_recursive didn’t quite work. It appends values that have numeric keys instead of merging them, so if a URL has numbers as one of its levels, for example blog/2009/12/post, a second page under 2009 ended up under a new numeric key instead of inside the existing 2009 level. I needed the keys kept the same, so I used another function from php.net.
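Here is a small sketch of the level idea and of the numeric-key problem. pathToLevels() and mergeLevels() are hypothetical helpers: mergeLevels() keeps the keys of nested levels (so 2009 stays 2009) and appends leaf page names, which is the shape the HTML sitemap needs. The last lines show PHP’s own array_merge_recursive() keeping the two 2009 levels apart.

```php
<?php
// Hypothetical helper: "example.com/blog/2009/12/post" becomes
// ['example.com' => ['blog' => [2009 => [12 => ['post']]]]].
function pathToLevels(string $link): array
{
    $parts = explode('/', rtrim($link, '/'));
    $tree = [array_pop($parts)];          // the page itself is the leaf
    while ($parts) {
        $tree = [array_pop($parts) => $tree];
    }
    return $tree;
}

// Merge two level trees: nested levels keep their keys (even numeric ones),
// leaf page names are appended once instead of overwriting each other.
function mergeLevels(array $a, array $b): array
{
    foreach ($b as $key => $value) {
        if (is_array($value)) {
            $a[$key] = (isset($a[$key]) && is_array($a[$key]))
                ? mergeLevels($a[$key], $value)
                : $value;
        } elseif (!in_array($value, $a, true)) {
            $a[] = $value;
        }
    }
    return $a;
}

// Usage:
$levels = mergeLevels(
    pathToLevels('example.com/blog/2009/12/post-a'),
    pathToLevels('example.com/blog/2009/12/post-b')
);
// $levels is ['example.com' => ['blog' => [2009 => [12 => ['post-a', 'post-b']]]]]

// The built-in merge appends the second 2009 level under a new key,
// so $builtin['blog'] ends up with two separate entries.
$builtin = array_merge_recursive(['blog' => [2009 => ['a']]], ['blog' => [2009 => ['b']]]);
```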

The Function to Get all the Links


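$dontfollow = array('pdf', 'jpg', 'png', 'jpeg', 'zip', 'gz', 'tar', 'txt');
// The crawler stores its results in these globals, so start them as empty arrays.
$globallinkarr = array();
$depthlinks = array();
$pagetitles = array();
$contentarr = array();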

function findAllLinks($url){
  global $globallinkarr;
  global $depthlinks;
  global $dontfollow;
  global $pagetitles;
  global $contentarr;

  $crawl = new Crawler($url);
  $info = $crawl->get('info');

  $validcodes = array(200,301,302);
  if(!in_array($info['http_code'], $validcodes))
    return;
  $url = $info['url'];
  $links = $crawl->get('links');
  $title = $crawl->get('pagetitle');
  $title = $title ? $title[0] : '';
  $body = $crawl->get('body');

  // Remove <script> blocks, then strip every remaining HTML tag.
  $content = strip_tags(preg_replace('/<script\b.*?<\/script>/is', '', (string)$body));

  if(!array_key_exists($url, $contentarr))
    $contentarr[$url] = array('title'=>"$title", 'pagecontent'=>"$content");

  if(!is_array($links) || !count($links)) return;
  else{
    if(preg_match('/http(?:s)?:\/\/(.*?)\/(.*)/', $url, $pattern)){
      $baseurl = $pattern[1];
    }else{
      $baseurl = $url;
    }

    foreach($links as $val){
      if(preg_match('/.*?javascript:void\(0\)/', $val) || strpos($val, '#') !== false){
        continue;
      }
      if(!preg_match('/[0-9a-zA-Z]/', $val)) continue;
      $val = trim($val, '"\'');

      /**
       * CHECK IF LINK IS GOING TO ANOTHER DOMAIN.  IF SO DONT FOLLOW IT.
      */

      if(preg_match('/^http(s)?:\/\//', $val) &&
         strpos($val, preg_replace('/http(s)?:\/\//', '', $baseurl)) === false){
        continue;
      }

      // Absolute link: keep just the path, or nothing if there is no path.
      if(preg_match('/^http(s)?:\/\/.*?\//', $val)){
        $val = preg_replace('/^http(s)?:\/\/.*?\//', '/', $val);
      }else if(preg_match('/^http(s)?:\/\//', $val)){
        $val = '';
      }

      $sl = explode('/', $val);

      if(!preg_match('/[0-9a-zA-Z]/', $sl[0])){
        $sl[0] = preg_replace('/^http(s)?:\/\//', '', $baseurl);
        $complink = implode('/', $sl);
        $sl = explode('/', $complink);

      }else{
        $prepend = explode('/', preg_replace('/^http(s)?:\/\//', '', $url));
        if(count($prepend)>1){
          array_pop($prepend);
          $prep = implode('/', $prepend);
        }else $prep = $prepend[0];
        $sl[0] = $prep.'/'.$sl[0];
        $complink = implode('/', $sl);

        $sl = explode('/', $complink);

      }
      if(!end($sl)) array_pop($sl);

      if(!in_array($complink, $globallinkarr)){
        $globallinkarr[] = $complink;
        $pagetitles[$complink] = $title;

        $depth = count($sl);
        $templinks = array();
        $newlinks = array();
        if($depth > 1){
           if(!$sl[$depth-1]) $sl[$depth-1] = 'index';
           $templinks[$sl[$depth-2]][] = $sl[$depth-1];

           if($depth > 2){
           for($i=$depth-2; $i>0; $i--){
               $hold = $templinks;
               $templinks = array();
               $templinks[$sl[$i-1]] = $hold;

             }
          }

          $temp = $templinks[$sl[0]];
          $newlinks[$sl[0]] = $temp;

        }
        $depthlinks = array_merge_recursive2($newlinks,$depthlinks);
        $parts = explode('.', $complink);
        $end = strtolower(end($parts));

        if(!preg_match('/^http(s)?:\/\//', $complink))
          $complink = 'http://'.$complink;

        // Skip files by extension, and any URL containing "sitemap".
        if(!in_array($end, $dontfollow) && strpos($complink, 'sitemap') === false){
          findAllLinks($complink);
        }
      }
    }
    return;
  }
}

The Array Merge Recursive Function I Used From php.net

function array_merge_recursive2($array1, $array2){
  $arrays = func_get_args();
  $narrays = count($arrays);

  // check arguments
  // comment out if more performance is necessary
  //   (in this case the foreach loop will trigger a warning if the argument is not an array)
  for ($i = 0; $i < $narrays; $i ++) {
    if (!is_array($arrays[$i])) {
      // also array_merge_recursive returns nothing in this case
      trigger_error('Argument #' . ($i+1) . ' is not an array - trying to merge array with scalar! Returning null!', E_USER_WARNING);
      return;
    }
  }

  // the first array is in the output set in every case
  $ret = $arrays[0];

  // merge $ret with the remaining arrays
  for ($i = 1; $i < $narrays; $i ++) {
    foreach ($arrays[$i] as $key => $value) {
      // The php.net version appended every integer key here. That check stays
      // removed for sub-arrays, so numeric levels like 2009 keep their keys.
      if (is_array($value) && isset($ret[$key])) {
        // if $ret[$key] is not an array you try to merge a scalar
        // value with an array - the result is not defined (incompatible arrays)
        // in this case the call will trigger an E_USER_WARNING and the $ret[$key] will be null.
        $ret[$key] = array_merge_recursive2($ret[$key], $value);
      }
      else if (is_int($key) && !is_array($value)) {
        // Leaf pages are stored under 0, 1, 2, ...; append them instead of
        // overwriting, so sibling pages at the same level are not lost.
        if (!in_array($value, $ret, true)) $ret[] = $value;
      }
      else {
        $ret[$key] = $value;
      }
    }
  }
  return $ret;
}

After I build the arrays with all the links, I create the sitemaps. The first one is an XML sitemap for search engines, which you can point to from your robots.txt file.
It’s really simple and uses the globallinkarr array.

XML Sitemap

function createXMLSiteMap($globallinkarr){
  $xml = '<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
  if(count($globallinkarr)){
    foreach($globallinkarr as $val){
      if(!preg_match('/^http(s)?:\/\//', $val)){
        $val = 'http://'.$val;
      }
      // Escape &, <, >, ' and " as the sitemap protocol requires.
      $xml .= '
      <url>
        <loc>'.htmlspecialchars($val, ENT_QUOTES, 'UTF-8').'</loc>
      </url>';
    }
  }
  $xml .= '
    </urlset>';
  return $xml;
}
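Another option is to let PHP’s XMLWriter build the document, so the escaping is done by the library instead of by hand. createXMLSiteMapWriter() is a hypothetical name, and this sketch assumes the XMLWriter extension (enabled by default in most PHP builds) and a list of links shaped like $globallinkarr, with or without http://.

```php
<?php
// Hypothetical alternative to createXMLSiteMap(): XMLWriter escapes the
// text it writes, so "&" in a URL comes out as "&amp;" automatically.
function createXMLSiteMapWriter(array $links): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('urlset');
    $w->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach (array_unique($links) as $link) {
        // Same rule as above: links stored without a scheme get http://.
        if (!preg_match('/^https?:\/\//', $link)) {
            $link = 'http://' . $link;
        }
        $w->startElement('url');
        $w->writeElement('loc', $link);
        $w->endElement(); // </url>
    }

    $w->endElement(); // </urlset>
    $w->endDocument();
    return $w->outputMemory();
}
```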

The HTML sitemap is created by a recursive function that goes through $depthlinks. For any link that isn’t in $globallinkarr, it also checks that the link is valid; the links in $globallinkarr were already checked during the crawl. The check only needs the headers, so to speed things up I pass 1 as the second argument to the Crawler constructor, which sets the cURL option CURLOPT_NOBODY. The page was timing out on me a lot, but setting this option and skipping links already in $globallinkarr stopped most of the timeouts. If you have a whole lot of links, though, there is still a good chance this will cause your page to time out.

The HTML Sitemap

function createSiteMap($depthlinks, $before = ''){
  global $globallinkarr;
  global $pagetitles;
  $validcodes = array(200,301,302);
  if(!is_array($depthlinks) || !count($depthlinks)) return '';

  $sitetree = '<ul style="padding:5px; margin:5px;">';
  foreach($depthlinks as $key=>$val){
    if(is_array($val)){
      // A level with more pages under it. Build its path fresh each time.
      $newbefore = ($before != '') ? $before.'/'.$key : (string)$key;
      $newkey = preg_replace('/^http(s)?:\/\//', '', $newbefore);
      $title = !empty($pagetitles[$newkey]) ? $pagetitles[$newkey] : $newbefore;
      if(!preg_match('/^http(s)?:\/\//', $newbefore))
        $newbefore = 'http://'.$newbefore;

      // $globallinkarr holds links without http://; anything not found
      // there gets a headers-only request.
      $exist = 0;
      if(in_array($newkey, $globallinkarr) || in_array($newbefore, $globallinkarr)){
        $exist = 1;
      }else{
        $test = new Crawler($newbefore, 1);
        $info = $test->get('info');
        if(in_array($info['http_code'], $validcodes))
          $exist = 1;
      }

      $title = htmlspecialchars($title);
      if($exist){
        $sitetree .= '
          <li><a style="display:block;" href="'.htmlspecialchars($newbefore).'"
          target="_blank" title="'.$title.'">'.$title.'</a>';
      }else{
        $sitetree .= '
          <li>'.$title;
      }
      // Nest the next level inside this <li>, then close it.
      $sitetree .= createSiteMap($val, $newbefore).'</li>';
    }else{
      // A single page.
      $newval = ($before != '') ? $before.'/'.$val : $val;
      $newkey = preg_replace('/^http(s)?:\/\//', '', $newval);
      $title = !empty($pagetitles[$newkey]) ? $pagetitles[$newkey] : $newval;
      if(!preg_match('/^http(s)?:\/\//', $newval))
        $newval = 'http://'.$newval;

      $exist = 0;
      if(in_array($newkey, $globallinkarr) || in_array($newval, $globallinkarr)){
        $exist = 1;
      }else{
        $test = new Crawler($newval, 1);
        $info = $test->get('info');
        if(in_array($info['http_code'], $validcodes))
          $exist = 1;
      }

      if($exist){
        $title = htmlspecialchars($title);
        $sitetree .= '
          <li><a href="'.htmlspecialchars($newval).'" title="'.$title.'"
              target="_blank">'.$title.'</a></li>';
      }
    }
  }
  $sitetree .= '</ul>';
  return $sitetree;
}
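Without the HTTP checks, the nesting itself is easier to see. renderSiteTree() is a hypothetical helper that assumes the same tree shape as $depthlinks (levels as keys, pages as values) and links every page without checking it; note that each nested <ul> goes inside its parent’s <li>.

```php
<?php
// Hypothetical helper: render a $depthlinks-style tree as nested lists.
function renderSiteTree(array $tree, string $prefix = ''): string
{
    $html = '<ul>';
    foreach ($tree as $key => $value) {
        if (is_array($value)) {
            // A level: its children are rendered inside the same <li>.
            $html .= '<li>' . htmlspecialchars((string) $key)
                   . renderSiteTree($value, $prefix . $key . '/') . '</li>';
        } else {
            // A page: link to it.
            $url = 'http://' . $prefix . $value;
            $html .= '<li><a href="' . htmlspecialchars($url) . '">'
                   . htmlspecialchars($value) . '</a></li>';
        }
    }
    return $html . '</ul>';
}
```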

Execute Code After Browser Finishes Loading

How to execute a script after the browser stops loading

Recently I needed to run a bit of code that took a long time: I was using PHP’s exec function to tar up some files, and the browser kept showing a timeout page. To make the browser think the page was done loading and then run the script, I put all the output in a buffer with ob_start(). When I’m ready to let the browser go, I get the size of the buffer, set the Content-Length header to that size, and flush the buffer. The browser stops showing that it’s loading, since it has received all the content it was told to expect, and the rest of the script keeps running on the server.

Here is the code


<?php
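// Discard an output buffer if one is already open.
if (ob_get_level() > 0) {
  ob_end_clean();
}
header("Connection: close");
ignore_user_abort(true);  // optional: keep running if the user closes the page
ob_start();

echo "some content";

// Tell the browser exactly how much content it will get, then send it.
$size = ob_get_length();
header("Content-Length: $size");
ob_end_flush();
flush();
//Browser done loading

//put your script here

?>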
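The same steps can be wrapped in a function. This is a sketch: finishResponse() is a hypothetical name, and the fastcgi_finish_request() branch is a swapped-in technique that only exists when PHP runs under PHP-FPM. The Content-Length approach assumes output compression (such as zlib.output_compression) is off, since compression changes the number of bytes the browser receives.

```php
<?php
// Hypothetical helper: send $content, tell the browser the response is
// complete, and return so the caller can keep working.
function finishResponse(string $content): void
{
    ignore_user_abort(true);       // keep running if the user leaves the page
    while (ob_get_level() > 0) {   // throw away buffers opened earlier
        if (!ob_end_clean()) {
            break;                 // some buffers can't be removed
        }
    }

    if (function_exists('fastcgi_finish_request')) {
        // Under PHP-FPM this ends the request without any header tricks.
        echo $content;
        fastcgi_finish_request();
        return;
    }

    // Otherwise buffer the content so its exact length is known.
    ob_start();
    echo $content;
    header('Connection: close');
    header('Content-Length: ' . ob_get_length());
    ob_end_flush();
    flush();
}
```

Usage: call finishResponse('Your archive is being created.'); and then run the slow work, such as the exec() call that tars the files.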

Round Robin Algorithm

Generating a Schedule Automatically

I needed a way to schedule games so that every team plays every other team once over a series of rounds. I started out by making a chart of who each team would play in each round. Once I had a chart that worked, I built an algorithm to generate that chart.

The Game Schedule

Each row lists the opponent of each team in that round.

The Teams:  Team 1  Team 2  Team 3  Team 4  Team 5  Team 6
Round 1     Team 6  Team 5  Team 4  Team 3  Team 2  Team 1
Round 2     Team 4  Team 6  Team 5  Team 1  Team 3  Team 2
Round 3     Team 2  Team 1  Team 6  Team 5  Team 4  Team 3
Round 4     Team 5  Team 3  Team 2  Team 6  Team 1  Team 4
Round 5     Team 3  Team 4  Team 1  Team 2  Team 6  Team 5

Now you just need to figure out the pattern and we can code it up. I was told there is a name for this pattern but I didn’t know it, so if you know it let me know.

The Variables

$gamearr and $tempgamearr are both arrays of all the teams. The array would look like this: $gamearr[0] = “firstteam”, $gamearr[1] = “secondteam”, and so on.

$numteamspadded is the number of teams. If the number is odd, I add another team called “byweek” to the end of $gamearr and $tempgamearr, so the count is always even.

$gamenumperteam is the number of games you want each team to play (one game per round).

The Logic

As you can see, at the beginning I set the variable $firstteam. You’ll see why if you figured out the pattern above: the array of teams plays the same array in reverse order, and after each round I rotate the reversed array one place, moving the last team to the front. So we would have 654321, then 165432, then 216543. This would cause some teams to play themselves, though, which is where $firstteam comes in: every time a team would play itself, we switch it out with the first team. For example, 123456 against 165432 has team 1 and team 4 playing themselves, so instead team 1 plays team 4.

I start out by looping through the number of games, and then through just half of the teams, since each pass sets both the home and the away team. If I looped through all the teams I would get 1 vs 6 and then 6 vs 1, but I only want the unique games.

I then loop through all the teams and set $gamearr to the values it would have if I were just rotating the array one place each round. This means $gamearr looks like this after each round: 234561, 345612, 456123, etc. I use $tempgamearr because I need to know the original order of the teams, since we have to switch the first team in whenever a team would be playing itself.
After this I remove the first team from the array. I then loop through the games to find the team that would be playing itself, and save its value and position. This is one of the reasons I remove the first team from the array: if I left it in, team 1 would also be found playing itself and would overwrite the position of the other team.
I then switch the team that is playing itself with the $firstteam.

After this you have an array of the games for each round.
I then create an array of all the games, skipping any game that involves “byweek”, because we don’t want a bye taking up a game slot.

The Round Robin Algorithm


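// Setup (described above): $gamearr holds the team names.
// Pad an odd number of teams with a "byweek" placeholder team.
if(count($gamearr) % 2) $gamearr[] = 'byweek';
$tempgamearr = $gamearr;
$numteamspadded = count($gamearr);

$bottom = $numteamspadded-1;
$half = $numteamspadded/2;

$rrarr = array();
$allgames = array();
$firstteam = $gamearr[0];
for($i=0; $i<$gamenumperteam; $i++){
    // Pair the first half of the array with the second half, reversed.
    for($j=0; $j<$half; $j++){
        if($numteamspadded==4)
            $rrarr[$i][$j]['home'] = $tempgamearr[$j];
        else $rrarr[$i][$j]['home'] = $gamearr[$j];
        $rrarr[$i][$j]['away'] = $gamearr[$bottom-$j];
    }

    // Rotate: rebuild $gamearr from the original order, shifted by $i+1.
    for($j=1; $j<=$bottom+1; $j++){
        $start = ($i+$j)%$numteamspadded;
        $gamearr[$j-1] = $tempgamearr[$start];
    }

    // Remove the first team, then find the team that would play itself.
    array_splice($gamearr, array_search($firstteam, $gamearr), 1);
    foreach($gamearr as $key=>$val){
        if($tempgamearr[$bottom-$key] == $val){
            $TempGameValue = $val;
            $switch = $key;
        }
    }

    // Swap that team with the first team.
    $gamearr[$bottom] = $TempGameValue;
    $gamearr[$switch] = $firstteam;
}

// Collect every game except the ones involving the "byweek" team.
foreach($rrarr as $key=>$val){
  foreach($val as $gkey=>$games){
    if($games['home'] != 'byweek' && $games['away'] != 'byweek'){
      $allgames[] = $games;
    }
  }
}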
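For comparison, here is a sketch of the usual textbook approach, often called the circle method: keep the first team fixed, pair the line of teams front to back, and rotate everyone else one place each round. roundRobinSchedule() is a hypothetical helper and not a line-by-line port of the code above; it produces the same kind of schedule and skips byweek games the same way.

```php
<?php
// Hypothetical helper: round-robin schedule using the circle method.
// Returns one array of ['home' => ..., 'away' => ...] games per round.
function roundRobinSchedule(array $teams): array
{
    if (count($teams) % 2 !== 0) {
        $teams[] = 'byweek';             // pad an odd number of teams
    }
    $n = count($teams);
    $fixed = $teams[0];                  // the first team never moves
    $rotating = array_slice($teams, 1);  // everyone else rotates each round

    $rounds = [];
    for ($r = 0; $r < $n - 1; $r++) {
        $line = array_merge([$fixed], $rotating);
        $games = [];
        // Pair the front of the line with the back: 1st vs last, and so on.
        for ($i = 0; $i < $n / 2; $i++) {
            $home = $line[$i];
            $away = $line[$n - 1 - $i];
            if ($home !== 'byweek' && $away !== 'byweek') {
                $games[] = ['home' => $home, 'away' => $away];
            }
        }
        $rounds[] = $games;

        // Rotate: move the last team to the front of the rotating group.
        $last = array_pop($rotating);
        array_unshift($rotating, $last);
    }
    return $rounds;
}
```

With six teams this gives five rounds of three games, and every pair of teams meets exactly once.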

UPDATE: I came across another good implementation of the round robin here at http://phpbuilder.com/board/showthread.php?p=10935200

Sorting with Umlauts

Recently I needed a drop-down box for countries. In English I simply order by name, but in the German translation, ordering by name puts all the countries that start with an umlaut at the bottom. For example, the German name for Austria is Österreich, and when I sort the translated names I want it to appear after Oman rather than at the end of the list. The HTML entity for Ö is &Ouml;, and the character right after the “&” is the plain letter O, so I use that letter to sort the countries.
Here is the code using PHP.

$sortkeys = array();
$names = array();
foreach($country as $val){
  $id = $val['country_id'];
  $names[$id] = $val['country_name'];
  // htmlentities() turns a leading umlaut into an entity such as &Ouml;
  $charset = htmlentities($val['country_name'], ENT_QUOTES, 'UTF-8');
  if(substr($charset, 0, 1) == '&'){
    // Keep only the entity's base letter (&Ouml; -> O) and decode the rest.
    $sortkeys[$id] = html_entity_decode(
      preg_replace('/^&([a-zA-Z])[a-zA-Z]+;/', '$1', $charset), ENT_QUOTES, 'UTF-8');
  }else{
    $sortkeys[$id] = $val['country_name'];
  }
}

// Sort by the plain-letter keys, then output the original names in that order.
asort($sortkeys);
$newcountryarr = array();
foreach($sortkeys as $id=>$sortkey){
  $newcountryarr[$id] = $names[$id];
}
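The trick above only looks at the first letter. If you also want accented letters later in a name to sort by their base letter, the same htmlentities() idea can be applied to the whole string. umlautSortKey() and sortCountryNames() are hypothetical helpers, the list of entity suffixes (uml, acute, grave, and so on) is an assumption covering the common accented letters, and the source file must be saved as UTF-8. If the intl extension is installed, its Collator class is another option.

```php
<?php
// Hypothetical helper: replace every accented letter with its base letter,
// using the entity trick (&auml; -> a, &Ouml; -> O).
function umlautSortKey(string $name): string
{
    $entities = htmlentities($name, ENT_QUOTES, 'UTF-8');
    $plain = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|ring|cedil);/', '$1', $entities);
    return html_entity_decode($plain, ENT_QUOTES, 'UTF-8');
}

// Sort an id => name array by the plain-letter keys, keeping the ids.
function sortCountryNames(array $names): array
{
    uasort($names, function ($a, $b) {
        return strcmp(umlautSortKey($a), umlautSortKey($b));
    });
    return $names;
}
```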

Create Images From HTML

The Problem:

I want a simple way to create an image from dynamically generated html.

Recently I’ve been working a lot inside TinyMCE, a JavaScript WYSIWYG HTML editor. I’ve written a couple of plugins to manage and upload files and images using my Flash uploader. TinyMCE has a couple of plugins you can buy, but if you’re a developer that seems like a waste of money. Maybe next time I’ll write a basic tutorial on how to upload pictures and insert them into TinyMCE. On top of that, I wanted TinyMCE to work like Word in that it would have paging. Once I got paging working, for the most part, I wanted thumbnails of each page, and this is where my problem came in. Each page in TinyMCE is wrapped in a div with a unique id, so I have access to the HTML of each page, but how can I create an image from just a string of HTML? I looked all over the web, and there were tools to do it, but all of them cost money and seemed more complicated than they should be. I knew I needed some way to render the HTML on the server. There are open source rendering engines such as Gecko and WebKit, but using them seemed like far more work than creating an image should take.
So I started thinking of other ways. I knew there were PHP libraries for creating PDFs, and I ended up finding dompdf, an HTML-to-PDF converter written in PHP. Its rendering seemed really accurate, and it supports both inline styles and CSS. After creating the PDF, I found that ImageMagick can convert PDFs to images using Ghostscript, an interpreter for the PostScript language and the Portable Document Format.
Once those are installed, it’s really simple. In my case I update the thumbnails via AJAX to keep them up to date, and I use jQuery to make the AJAX call. I pass the id of the page and the content of the page, and the server returns the id of the page and the name of the image, separated by a colon. In case you were wondering, the id of the page is the same as the id of its thumbnail. That works because TinyMCE is in an iframe, so technically each id appears only once per document. Side note: “Apparently a lot of people don’t realize that ids are supposed to be unique per page. If you want to style more than one element, use classes, not ids. People tell me, ‘But I’ve never had a problem with using the same id.’ It’ll render just fine, but if you ever use JavaScript and call document.getElementById(‘someid’), you’ll have a problem. Anyway, I digress; back to creating images.” Here is the code to make the AJAX call.


$.post('ajax/test.php', {id:pageid, content:str}, function(data){
  // The response is "pageid:imagename".
  var d = data.split(':');
  $('#'+d[0]).html('<img src="'+d[1]+'" alt="thumbnail" />');
});

Now here is the AJAX page that actually does the conversion. I first create the PDF from the content passed to it; I have to add the <html> and <body> tags for this to work correctly. There are a lot of other things you can do when creating the PDF, so look at the dompdf documentation to learn more. I capture the PDF output and put it into a file called test.pdf. I then use the time() function to create a unique image name, so the browser won’t display a cached version of the image. Finally I call the system function, which executes an external command; in this case the command is ImageMagick’s convert. Side note: on Windows, running “convert” without the path in front may fail, because Windows has its own convert program. There are a couple of ways around this. One is to move the ImageMagick directory closer to the front of the PATH system variable, so it comes before the Windows convert program. Some people also say that renaming ImageMagick’s convert.exe to imconvert.exe and calling that instead solves the issue.

   
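require_once("../dompdf-0.5.1/dompdf_config.inc.php");

$id = $_POST['id'];
// dompdf needs a complete document, so wrap the page content.
$html = '<html><body>'.html_entity_decode($_POST['content']).'</body></html>';

$paper = 'letter';
$orientation = 'portrait';

$old_limit = ini_set("memory_limit", "32M");

// Render the HTML to a PDF (dompdf 0.5.x API).
$dompdf = new DOMPDF();
$dompdf->load_html($html);
$dompdf->set_paper($paper, $orientation);
$dompdf->render();
$pdf = $dompdf->output();

// Write the PDF and the image to the same directory (replace the placeholder path).
$dir = 'C:\\your\\directory\\here\\';
$source = $dir.'test.pdf';
file_put_contents($source, $pdf);

// time() gives each image a new name, so the browser can't show a cached copy.
$imagesample = 'sample'.time().'.jpg';
$dest = $dir.$imagesample;

// "[0]" asks ImageMagick for the first page only; quoting protects the paths.
system('C:\\ImageMagick-6.5.4-Q16\\convert.exe '
       .escapeshellarg($source.'[0]').' '.escapeshellarg($dest), $ret);

echo $id.':'.$imagesample;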

And that is it! Cheers!

PHP Menu with jQuery Drop Down

I wanted to create a menu where each menu item could have submenus, and where I could keep adding more submenus easily. Also, when you are on a page, its menu item should be selected and all of its parent menus should stay selected too. I decided to write a function that recursively goes through an array to create the menu.

I started out with an array() like this.

$tabs = array("page.php"=>"Page Name", "page2.php"=>"Page Two",
"page3.php"=>"Page Three", "page4"=>"Page Four");

This is very simple to go through, but let’s say Page Two has submenus and those submenus have submenus. Then I would set it up like this.

$subpage['Subpage 2'] = array("subpg3.php"=>"Sub Sub Page");
$page2['Page Two'] = array("subpage.php"=>"Subpage Name", "subpage2.php"=>$subpage);

$tabs = array("page.php"=>"Page Name", "page2.php"=>$page2,
"page3.php"=>"Page Three", "page4"=>"Page Four");

So now when you’re looping through $tabs, you’ll need to loop through the $page2 array as well, and then the $subpage array, and use the key of each submenu array as the name of its parent menu. Also, if I am on subpg3.php, I still want “Subpage 2” and “Page Two” to stay selected.

Sometimes we don’t want to show a page in the menu, but we still want its parent menus selected when we’re on it. All we do is set the value to "HIDDEN", like so:
$subpage['Page Two'] = array("profile.php"=>"Profile", "profile_edit.php"=>"HIDDEN");
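To make the selection rule concrete, here is a sketch that, for a given file name, returns the names of the menu items that should be selected: the page itself plus every parent. selectedTrail() is a hypothetical helper, not part of recursive_tabs(); it uses the same $tabs format and treats HIDDEN pages as described above, so a hidden page selects its parents without appearing itself. It returns null when the page isn’t in the menu.

```php
<?php
// Hypothetical helper: names of the menu items to mark as selected for
// $filename (the page and all its parents), or null if the page isn't found.
// Format: "file" => "Name" or "file" => ["Name" => subtabs].
function selectedTrail(array $tabs, string $filename): ?array
{
    foreach ($tabs as $file => $tab) {
        if (is_array($tab)) {
            $name = array_key_first($tab);   // the parent's display name (PHP 7.3+)
            // Either this parent is the current page, or look in its submenus.
            $trail = ((string) $file === $filename) ? [] : selectedTrail($tab[$name], $filename);
            if ($trail !== null) {
                if ($name !== 'HIDDEN') {
                    array_unshift($trail, $name);
                }
                return $trail;
            }
        } elseif ((string) $file === $filename) {
            // A HIDDEN page selects its parents but isn't listed itself.
            return $tab === 'HIDDEN' ? [] : [$tab];
        }
    }
    return null;
}
```

With the $tabs array above, selectedTrail($tabs, 'subpg3.php') gives Page Two, Subpage 2, and Sub Sub Page, in that order.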

So here is the function to get all the tabs.

First I declare some variables.
The global $filename could simply be $_SERVER['PHP_SELF'], but I prefer to use preg_match to get only the file name, since $_SERVER['PHP_SELF'] also includes any subfolder the file is in. So I do:

preg_match("/.*\/(.*)/", $_SERVER['PHP_SELF'], $name);
$filename = $name[1];
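PHP’s built-in basename() returns the last component of a path, so it should give the same file name as the preg_match above. currentFilename() is just a hypothetical wrapper to show it.

```php
<?php
// basename() drops everything up to the last "/", leaving the file name.
function currentFilename(string $phpSelf): string
{
    return basename($phpSelf);
}

// Usage: $filename = currentFilename($_SERVER['PHP_SELF']);
```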

The global $dropdown is used to store all the submenus. I used it with jQuery to create dropdown menus. If you don’t want dropdowns, you can ignore anything commented as being for dropdowns.

The $found gets set to 1 as soon as the $filename matches one of the tabs.
The $menu is the menu that gets returned. The menu gets returned in an array format like this:
$menu[0] = array("Home"=>"<a href='home.php' class='selected'>Home</a>", "About"=>"<a href='about.php' >About</a>");
$menu[2] = array("test"=>"<a href='test.php' class='selected'>Test</a>");

You need to run ksort() on $menu so the rows come out as $menu[0], $menu[2], $menu[4] instead of in the order they were added, such as $menu[0], $menu[4], $menu[2].


function recursive_tabs($tabs, $row=1){
    global $filename;
    global $dropdown;
    static $found=0;

    $row = $row-1;
    $menu= array();
    foreach($tabs as $key=>$tab){

I check whether $tab is an array. If it is, and $found == 0, I go into this if statement, which calls the function again.

     
        if(is_array($tab) && !$found){
            $row = $row+2;
            $submenu = recursive_tabs($tab, $row);

            $tabkey = array_keys($tabs[$key]);

The next block is used for the dropdown array. We need all the keys of $submenu to match up the $dropdown array with its parent’s row, and the keys of $tabs to match up the $dropdown with the name of its parent.
For example: if the $menu was
$menu[0] = array("Home"=>"<a href='home.php' class='selected'>Home</a>", "About"=>"<a href='about.php' >About</a>");

and $menu[0]['Home'] had submenus, then the dropdown would be $dropdown[0]['Home'] = array(0=>"<a href='drop.php'>Drop</a>"); so that they match up.


            $subkeys = array_keys($submenu);
            $sizeofkeys = array_keys($subkeys);

            $tabkeys = array_keys($tabs);

            if($subkeys[0]+1 == $row){
                foreach($submenu[$subkeys[0]] as $skey=>$sval){
                    $dropdown[$subkeys[0]-2][$tabkeys[0]][] = $sval;
                }
            }

If $row is odd (not 0, 2, 4, etc.), I return $submenu right away.


            if($row%2 != 0  && $row !=0){
                return $submenu;
            }

Next I set the row back to the previous number. Then, if the current page is this tab or one of its submenus ($found was set), I copy the submenu rows into the menu and add this tab as selected. I also check whether $tabkey[0] == 'HIDDEN'. $tabkey[0] is the display name of the parent menu (the key of its sub-array, such as “Page Two”). If it equals HIDDEN, I don’t store it in the menu array, but its parents are still selected.
If the page is not found, I simply add the tab to the menu as a normal link.


            $row = $row -2;
            if($filename==$key) $found = 1;
            if($found){
                foreach($submenu as $subkey=>$sub){
                   $menu[$subkey] = $sub;
                }
                if($tabkey[0] != "HIDDEN"){
                   $menu[$row][$tabkey[0]] = '<a class="selected"
                      href="'.$key.'">'.$tabkey[0].'</a>';
                }
            }else{
                if($tabkey[0] != "HIDDEN")
                   $menu[$row][$tabkey[0]] = "<a
                        href='$key'>".$tabkey[0]."</a>";
            }

If $tab is an array but $found == 1 I go into this one.


        }else if(is_array($tab)){

            $tabkey = array_keys($tabs[$key]);
            if($tabkey[0] != "HIDDEN")
                $menu[$row][$tabkey[0]] = "<a href='$key'>
                      {$tabkey[0]}</a>";

This next block builds the dropdown, the same way as above.


            $row = $row+2;
            $submenu = recursive_tabs($tab, $row);

            $subkeys = array_keys($submenu);

            $tabkeys = array_keys($tabs);

            if($subkeys[0]+1 == $row){
                foreach($submenu[$subkeys[0]] as $skey=>$sval){
                    $dropdown[$subkeys[0]-2][$tabkeys[0]][] = $sval;
                }
            }
             $row = $row -2;

The last block handles the case where $tab is not an array. If $filename == $key, I set $found = 1 and also change the class to “selected”.


        }else{
            if($filename==$key){
              if($tab != "HIDDEN"){
                $menu[$row][$tab] = '<a class="selected"
                                       href="'.$key.'" >'.$tab.'</a>';
              }
              $found = 1;
            }
            else {
                if($tab != "HIDDEN")
                $menu[$row][$tab] = "<a href='$key'  >$tab</a>";
            }
        }
    }
     return $menu;
}

That’s it. Here is all the code together. If you want to see how to use this menu to create a drop down menu keep reading.


function recursive_tabs($tabs, $row=1){
    global $filename;
    global $dropdown;
    static $found=0;

    $row = $row-1;
    $menu= array();
    foreach($tabs as $key=>$tab){
        if(is_array($tab) && !$found){
            $row = $row+2;
            $submenu = recursive_tabs($tab, $row);

            $tabkey = array_keys($tabs[$key]);
            $subkeys = array_keys($submenu);

            $tabkeys = array_keys($tabs);

            if($subkeys[0]+1 == $row){
                foreach($submenu[$subkeys[0]] as $skey=>$sval){
                    $dropdown[$subkeys[0]-2][$tabkeys[0]][] = $sval;
                }
            }
            if($row%2 != 0  && $row !=0){
                return $submenu;
            }

            $row = $row -2;
            if($filename==$key) $found = 1;
            if($found){
                foreach($submenu as $subkey=>$sub){
                   $menu[$subkey] = $sub;
                }
                if($tabkey[0] != "HIDDEN"){
                $menu[$row][$tabkey[0]] = '<a class="selected"
                                      href="'.$key.'" >'.$tabkey[0].'</a>';
                }
            }else{
                if($tabkey[0] != "HIDDEN")
                    $menu[$row][$tabkey[0]] = "<a
                                     href='$key'>".$tabkey[0]."</a>";
            }
        }else if(is_array($tab)){
            $tabkey = array_keys($tabs[$key]);
            if($tabkey[0] != "HIDDEN")
                $menu[$row][$tabkey[0]] = "<a href='$key' >
                                                          {$tabkey[0]}</a>";

            $row = $row+2;
            $submenu = recursive_tabs($tab, $row);
            $subkeys = array_keys($submenu);
            $tabkeys = array_keys($tabs);

            if($subkeys[0]+1 == $row && $row>2){
                foreach($submenu[$subkeys[0]] as $skey=>$sval){
                    $dropdown[$subkeys[0]-2][$tabkeys[0]][] = $sval;
                }
            }
             $row = $row -2;
        }else{
            if($filename==$key){
              if($tab != "HIDDEN"){
                 $menu[$row][$tab] = '<a class="selected"
                                             href="'.$key.'" >'.$tab.'</a>';
              }
              $found = 1;
            }
            else {
                if($tab != "HIDDEN")
                  $menu[$row][$tab] = "<a href='$key' >$tab</a>";
            }
        }
    }
     return $menu;
}

Once this function returns, we’ll need to loop through all the menus.

The first thing I do is create an array of divs to wrap the menus in. This is used for styling.


$subdiv[0] = "<div id='toplinks_css'>";
$subdiv[1] = "<div id='sublinks_css'>";
$subdiv[2] = "<div id='sublinks2_css'>";

As I loop through each of the menu items, I add a wrapper div around each one with a unique id so it can be used with the jQuery dropdown. The dropdown is also inside this div. I also wrap a div around just the link, so jQuery can change the style of just that link if you want.
Next, if there is a dropdown menu, I add a wrapper around it with position:relative. This lets each dropdown be positioned under its parent menu. I then wrap the dropdown in another div with a unique id used by jQuery and a class called dropdown. The main things this class needs are position:absolute, display:none, and a z-index such as 999 (a number high enough to make sure the dropdown appears on top).


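ksort($menu); // put the rows in order: $menu[0], $menu[2], $menu[4]
$i = 0;       // index into $subdiv
$count = 0;   // unique number for each menu item's ids
$str = '';
$head .= "<script type='text/javascript'>
$(document).ready(function(){";
foreach($menu as $key=>$tabs){
  $str .= $subdiv[$i];
  foreach($tabs as $tkey=>$tab){
    // Wrapper div with a unique id, plus a div around just the link.
    $str .= "<div id='header$count' style='float:left; margin-left:20px;'>
         <div id='head$count'>$tab</div>";
    if(!empty($dropdown[$key][$tkey])){
      // position:relative lets the dropdown sit right under its parent.
      $str .= "<div style='position:relative;'>
                 <div id='dropdowncontainer$count' class='dropdown'>
                   <div class='inner'>";
      foreach($dropdown[$key][$tkey] as $dkey=>$dval){
        $str .= $dval;
      }
      $str .= "</div>
                 </div>
               </div><div style='clear:both;'></div>";
    }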

The next bit is the jQuery. I use a jQuery plugin called hoverIntent, which lets you set how long the mouse must be over the link before the handler runs. That way, if you’re just moving the mouse across the page, the dropdown won’t show up. It takes two functions: the first runs when the mouse hovers over the item, and the second runs when it leaves.


    // hoverIntent(over, out): highlight and show on hover, undo on mouse out.
    $head .= "$('#header$count').hoverIntent(function(){
        $('#head$count a').css('background-color','#0167B1');
    ";
    if(!empty($dropdown[$key][$tkey]))
      $head .= "$('#dropdowncontainer$count').show();";
    $head .= "    },

      function(){
        $('#head$count a').css('background-color','');";
    if(!empty($dropdown[$key][$tkey]))
      $head .= "        $('#dropdowncontainer$count').hide();";

    $head .= "  });  ";

    $str .= "</div>";
    $count++;
  }
  $str .= "<div style='clear:both;'></div></div>";
  $i++;
}
$head .= "});
</script>";

And that will give you a dropdown box which you can now style however you want.
