Drupal Development made easier with Qt Assistant - Part 3a

Welcome back! It's been a while since I posted parts one and two of this series (yeah, I should blog more often...). Remember I was making Drupal documentation available in the Qt Assistant, a useful documentation viewer? Nice. I have some good news: I have again spent some time getting the documentation into decent order. So, without further ado, let's continue!

Plan for today

In part one we prepared our system and got the necessary tools ready to begin our work. In part two we actually got our Drupal documentation working in the Qt Assistant, although there were several problems in the layout, as you may have noticed.

After catching up with the changes in the Drupal documentation since last time, today we will take a deeper look at some of the (Drupal-specific) problems in the generated documentation. This includes getting back some text that is lost in the Doxygen process, by preprocessing the code that goes into Doxygen. This is subpart A; in part 3b we'll get the in-depth topics (e.g. the Forms API reference) working as well.

Changes since last time

Since last time, there have been a few changes in the resources we used. I am now using a different Linux distribution, and the Doxygen templates seem to have had a bit of a makeover along the way: your documentation will look even better with the latest version. Everything looks a bit more modern now, which is good. If you haven't already done so, you may want to update your software (Doxygen and Qt) to the latest versions.

A more important change, however, is that the Drupal Developer Documentation on CVS no longer includes examples. Instead, the examples have now moved to their own project. You can grab a snapshot of the appropriate version and extract it to your developer subdirectory, which contains the Drupal Developer Documentation. Another useful option is to also grab this from CVS (see "Checking out from the contributions repository" for detailed instructions):

cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib checkout -d developer/examples contributions/modules/examples

Don't forget to update your developer documentation to the latest version as well. Especially if you're using Drupal 6, you may want to switch to the appropriate branch as follows (from the Drupal base directory, replace -r DRUPAL-6--1 with -A for the latest HEAD version):

cvs -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib update -d -P -r DRUPAL-6--1 developer

Just for completeness' sake, if you want to do a clean checkout of all this developer documentation, enter the following two commands:

cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib checkout -r DRUPAL-6--1 -d developer contributions/docs/developer
cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib checkout -r DRUPAL-6--1 -d developer/examples contributions/modules/examples

All done? Good, we're ready to move on again...

Noticed some problems?

If you used the documentation we created last time, you may have noticed some odd problems. We'll skip the broken links on the main page for now. First we deal with missing text. Look at the documentation of the t() function (in includes/common.inc). You can see what's wrong in the screenshot as well:

Broken Documentation

Notice how the @variable is missing in the second bullet, and the % character in front of variable in the third bullet? A closer look at the Doxygen error output tells us that it doesn't know the @variable command. Doxygen commands are prefixed with a '@' or a '\', but the Drupal API module (which is used to run api.drupal.org) happily ignores any commands it doesn't know and prints them as they are. Doxygen doesn't, which causes the issue here. With the '%' character, we tell Doxygen that we do not want the word 'variable' to be auto-linked. But again, the Drupal API module doesn't seem to use that and simply prints the character.

We will take a few steps to solve these problems. Part one: time for some PHP Regular Expressions magic!

A basic preprocessor

Create a new plain text file with a nice name such as preprocess-drupal-doxygen.php. This tells us exactly what it does: preprocess Drupal files for Doxygen; and it's a PHP script. Let's set up a quick base class first by pasting the following code in the file:

#!/usr/bin/php
<?php
abstract class Preprocessor {
  private $_filename = '';

  public function __construct ($filename) {
    $this->_filename = $filename;
  }

  public function process() {
    $contents = file_get_contents($this->_filename);
    // Convert Mac/Win line breaks to Unix format.
    $contents = str_replace("\r\n", "\n", $contents);
    $contents = str_replace("\r", "\n", $contents);

    return $this->doProcess($contents);
  }

  protected abstract function doProcess($contents);
}

Note that I have omitted the comments in this (and further) code snippets; you can download the full script at the bottom of this post, with all comments.

Save the file and make sure you have execute permissions. On Windows, you may want to set up a file association such that *.phpx is executed by php (or php_cli), and name the file accordingly. Another option is creating a .cmd file which runs the PHP script (passing on all command line arguments).

The blob sets up a few things. The first line says the script is to be executed by PHP. Furthermore there is a Preprocessor class, which reads a file and changes all types of newlines into Unix format, for convenience (code borrowed from the Drupal API module). It then calls the (abstract) doProcess() function on the content. We'll work on this function in two child classes (the second one in part 3B). These will do the actual work.
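To see why the order of the two replacements matters, here is a minimal sketch of the newline conversion on its own. The function name normalizeNewlines() is only made up for this illustration; in the script the two calls live inside process().

```php
<?php
// Hypothetical helper mirroring the two str_replace() calls in process().
function normalizeNewlines($contents) {
  // Windows line breaks first, so "\r\n" does not end up as two newlines.
  $contents = str_replace("\r\n", "\n", $contents);
  // Then any remaining lone carriage returns (old Mac format).
  return str_replace("\r", "\n", $contents);
}

$sample = "line one\r\nline two\rline three\n";
$normalized = normalizeNewlines($sample);
print $normalized;
```

Swapping the two calls would turn every Windows line break into two Unix ones, which is why "\r\n" is handled first.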

Now we'll add the CodePreprocessor class which will handle our code files (.php, .module, .inc etc.). First things first: let's go and detect the comment blocks.

class CodePreprocessor extends Preprocessor {

  public function __construct ($filename) {
    parent::__construct($filename);
  }

  protected function doProcess($contents) {
    // Beyond Drupal's API module: we also work on blocks started with "/*!"
    $contents = preg_replace_callback('@/\*[\*!](.*?)\*/@s',
                                      array($this, 'processCommentBlock'),
                                      $contents);
    // And those with at least two lines of /// or //!
    $contents = preg_replace_callback('@(//[/!])[^\\n]*\\n(\\1[^\\n]*\\n)+@s',
                                      array($this, 'processCommentBlock'),
                                      $contents);
    // Return processed file contents
    return $contents;
  }

  private function processCommentBlock($matches) {
    $contents = $matches[0];

    // ADD FUNCTION CALLS HERE LATER

    return $contents;
  }
}

The doProcess implementation here finds comment structures indicating Doxygen documentation. Unlike Drupal's API module, this includes comment blocks starting with '/*!', as well as blocks of at least two lines starting with '///' or '//!'. These are all blocks Doxygen will process, so we do the same. On each comment block, we call processCommentBlock. This function gets the matches array from the regular expression ($matches[0] being the complete match), and expects a string in return. The match will be replaced with that string. All in all a very useful place to do our processing.
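If you want to check for yourself which comments are picked up, the two patterns can be tried outside the class. The snippet below is just a sketch: the sample source and the $collect callback are made up for the occasion, and the callback returns every match unchanged so that only the detection is shown.

```php
<?php
// Made-up sample source with documented and undocumented comments.
$source = "/**\n * Documented block.\n */\n"
        . "function foo() {}\n"
        . "/* Plain comment, left alone. */\n"
        . "/*! Also documented. */\n"
        . "/// First line.\n"
        . "/// Second line.\n"
        . "\$x = 1;\n"
        . "/// Single line, not matched.\n";

// Collect every matched block and hand it back untouched.
$blocks = array();
$collect = function ($matches) use (&$blocks) {
  $blocks[] = $matches[0];
  return $matches[0];
};

// The same two patterns as in doProcess().
$result = preg_replace_callback('@/\*[\*!](.*?)\*/@s', $collect, $source);
$result = preg_replace_callback('@(//[/!])[^\\n]*\\n(\\1[^\\n]*\\n)+@s', $collect, $result);

foreach ($blocks as $block) {
  print "--- block ---\n" . $block . "\n";
}
```

The plain /* ... */ comment and the single /// line are skipped; the other three blocks reach the callback.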

Escaping unknown commands

Now for some actual preprocessing code. To get Doxygen to keep the '@' and '%' characters in the output (to solve our first problem), we have to escape them, i.e. put a backslash in front of them. But we don't want actual Doxygen commands to be ignored. We might as well solve another problem at the same time: drupal_match_path() has documentation containing "\r" and "\n". Drupal's API module doesn't replace the "\n" command, while Doxygen will throw in a newline. Doxygen will also complain about not knowing "\r" and discard it. Hence we'll also escape unknown backslash commands, as well as (valid!) single-letter backslash commands (\a, \n, etc.), as long as they're not escaped already. So: \r will become \\r (and \r again in the HTML output), but something that is already escaped, such as \\e, will remain the same.

So, it's now time for some more magic with regular expressions. Add a function in the CodePreprocessor class, which starts off listing all the known Doxygen commands:

  private function escapeUnknownCommands($contents) {
    static $commandsArray = array(
      'a', 'addindex', 'addtogroup', 'anchor', 'arg', 'attention', 'author', 'b', 'brief', 'bug',
      'c', 'callgraph', 'callergraph', 'category', 'class', 'code', 'cond',
      'copybrief', 'copydetails', 'copydoc', 'date', 'def', 'defgroup', 'deprecated', 'details', 'dir',
      'dontinclude', 'dot', 'dotfile', 'e', 'else', 'elseif', 'em', 'endcode', 'endcond', 'enddot',
      'endhtmlonly', 'endif', 'endlatexonly', 'endlink', 'endmanonly', 'endmsc', 'endverbatim', 'endxmlonly',
      'enum', 'example', 'exception', 'extends', 'file', 'fn', 'headerfile', 'hideinitializer', 'htmlinclude',
      'htmlonly', 'if', 'ifnot', 'image', 'implements', 'include', 'includelineno', 'ingroup', 'internal',
      'invariant', 'interface', 'latexonly', 'li', 'line', 'link', 'mainpage', 'manonly', 'memberof', 'msc',
      'n', 'name', 'namespace', 'nosubgrouping', 'note', 'overload', 'p', 'package', 'page', 'paragraph',
      'param', 'post', 'pre', 'private', 'privatesection', 'property', 'protected', 'protectedsection',
      'public', 'publicsection', 'protocol', 'ref', 'relates', 'relatesalso', 'remarks', 'return', 'retval',
      'sa', 'section', 'see', 'showinitializer', 'since', 'skip', 'skipline', 'struct', 'subpage',
      'subsection', 'subsubsection', 'test', 'throw', 'todo', 'tparam', 'typedef', 'union', 'until', 'var',
      'verbatim', 'verbinclude', 'version', 'warning', 'weakgroup', 'xmlonly', 'xrefitem',
      'annotatedclasslist', 'classhierarchy', 'define', 'functionindex', 'header', 'headerfilelist',
      'inherit', 'l', 'postheader',
    );
    static $backslashCommandsArray = array(
      'addindex', 'addtogroup', 'anchor', 'arg', 'attention', 'author', 'brief', 'bug',
      'callgraph', 'callergraph', 'category', 'class', 'code', 'cond',
      'copybrief', 'copydetails', 'copydoc', 'date', 'def', 'defgroup', 'deprecated', 'details', 'dir',
      'dontinclude', 'dot', 'dotfile', 'else', 'elseif', 'em', 'endcode', 'endcond', 'enddot',
      'endhtmlonly', 'endif', 'endlatexonly', 'endlink', 'endmanonly', 'endmsc', 'endverbatim', 'endxmlonly',
      'enum', 'example', 'exception', 'extends', 'file', 'fn', 'headerfile', 'hideinitializer', 'htmlinclude',
      'htmlonly', 'if', 'ifnot', 'image', 'implements', 'include', 'includelineno', 'ingroup', 'internal',
      'invariant', 'interface', 'latexonly', 'li', 'line', 'link', 'mainpage', 'manonly', 'memberof', 'msc',
      'name', 'namespace', 'nosubgrouping', 'note', 'overload', 'package', 'page', 'paragraph',
      'param', 'post', 'pre', 'private', 'privatesection', 'property', 'protected', 'protectedsection',
      'public', 'publicsection', 'protocol', 'ref', 'relates', 'relatesalso', 'remarks', 'return', 'retval',
      'sa', 'section', 'see', 'showinitializer', 'since', 'skip', 'skipline', 'struct', 'subpage',
      'subsection', 'subsubsection', 'test', 'throw', 'todo', 'tparam', 'typedef', 'union', 'until', 'var',
      'verbatim', 'verbinclude', 'version', 'warning', 'weakgroup', 'xmlonly', 'xrefitem',
      'annotatedclasslist', 'classhierarchy', 'define', 'functionindex', 'header', 'headerfilelist',
      'inherit', 'postheader',
    );
    static $noWordCommands = array(
      'f[\$\[\]\{\}]', '[\$@\\\\&~<>#%"]',
    );
  }

Now we'll use these arrays to construct the two major regular expressions we need to escape unknown commands: one for backslash commands and one for '@' commands. Oh, and let's not forget to escape all those unescaped percentage characters. Expand the function you just made with:

    static $backslashCommandRegex = NULL;
    static $commandsRegex = NULL;

    // Build both expressions only once; they are reused for every comment block
    if (!isset($backslashCommandRegex)) {
      $commandsRegex = '/(^|(?<=\W))(?<!\\\\|@)@(?!' . implode('\W|', $commandsArray) . '\W|'
                     . implode('|', $noWordCommands) . ')/';
      $backslashCommandRegex = '/(^|(?<=\W))(?<!\\\\|@)\\\\(?!'
                             . implode('\W|', $backslashCommandsArray) . '\W|'
                             . implode('|', $noWordCommands) . ')/';
    }

    // First replace all unknown backslash commands that occur before commands
    $contents = preg_replace($backslashCommandRegex, '\\\\\\\\', $contents);
    // And unknown '@' prefixed commands
    $contents = preg_replace($commandsRegex, '\@', $contents);
    // Escape all unescaped percentage characters
    $contents = preg_replace('/(?<!\\\\|@)%/', '\\\\%', $contents);
    return $contents;

The regular expressions look complicated (especially because we have to do a lot of escaping of backslashes ourselves), but basically work as described above. The 'noWordCommands' array is separate, as such commands followed by a letter will be kept intact. If a regular command is followed by extra letters, we should still escape it (e.g. @var should remain as such, but @variable should become \@variable).
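To get a feeling for what these expressions do, here is a cut-down sketch you can run on its own. It assumes a much shorter command list than the real one (just enough for the sample), and the escapeSample() helper is a made-up name for this illustration; the patterns themselves are built the same way as above.

```php
<?php
// Heavily shortened, hypothetical command lists; the real function uses the full arrays.
$commands = array('a', 'n', 'param', 'return', 'var', 'link', 'endlink');
$backslashCommands = array('param', 'return', 'var', 'link', 'endlink');
$noWordCommands = array('f[\$\[\]\{\}]', '[\$@\\\\&~<>#%"]');

$commandsRegex = '/(^|(?<=\W))(?<!\\\\|@)@(?!' . implode('\W|', $commands) . '\W|'
               . implode('|', $noWordCommands) . ')/';
$backslashRegex = '/(^|(?<=\W))(?<!\\\\|@)\\\\(?!' . implode('\W|', $backslashCommands) . '\W|'
                . implode('|', $noWordCommands) . ')/';

function escapeSample($text, $backslashRegex, $commandsRegex) {
  // Unknown (and single-letter) backslash commands get a second backslash.
  $text = preg_replace($backslashRegex, '\\\\\\\\', $text);
  // Unknown '@' commands get a backslash in front.
  $text = preg_replace($commandsRegex, '\@', $text);
  // Unescaped percentage characters as well.
  return preg_replace('/(?<!\\\\|@)%/', '\\\\%', $text);
}

$input = '@param $args Use @variable or %variable; matches "\r" and "\n".';
$output = escapeSample($input, $backslashRegex, $commandsRegex);
print $output . "\n";
```

The known @param command is left alone, while @variable, %variable, \r and \n each gain a backslash.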

Now we still need to call the new function: place the following line in the main processCommentBlock() function as described earlier, near the clearly marked comment:

$contents = $this->escapeUnknownCommands($contents);

Try it out

Before we can actually use our code, we'll have to add some code to control the preprocessor, at the bottom of the script (outside any classes):

// Default to processing stdin if no arguments are given
if ($argc == 1) {
  $argc = 2;
  $argv[1] = '-';
}

// Process all files in argument list
for ($i = 1; $i < $argc; ++$i) {
  $filename = $argv[$i];

  if ($filename == "-") {
    $filename = "php://stdin";
  }

  // Find out type of file (based on filename)
  $processor = NULL;
  $info = pathinfo($filename);
  if (!empty($info['extension']) && in_array($info['extension'], array('html', 'htm', 'xhtml'))) {
    // HTML Processing is for later...
  }
  else {
    $processor = new CodePreprocessor($filename);
  }

  // Process file
  if ($processor !== NULL) {
    print $processor->process();
  }
  else {
    // No processor for this type yet: pass the file through unchanged,
    // instead of calling process() on NULL
    print file_get_contents($filename);
  }
}

This code will create the CodePreprocessor (we'll add a separate one for HTML files later). For each of the files in the script's arguments, we'll process the code and print out the results. It will fall back to standard input, allowing us to reuse this script for something outside Doxygen, if we ever find the need. (It was mainly useful for debugging the script above, allowing me to run it on selections or other files straight from my IDE.)
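The file type check relies on pathinfo(), which only looks at the name and never opens the file. A small sketch of that check on its own, with isHtmlFile() being a made-up name for this illustration:

```php
<?php
// Hypothetical helper with the same extension test as the control code.
function isHtmlFile($filename) {
  $info = pathinfo($filename);
  return !empty($info['extension'])
      && in_array($info['extension'], array('html', 'htm', 'xhtml'));
}

$code  = isHtmlFile('includes/common.inc');
$html  = isHtmlFile('developer/topics/forms_api_reference.html');
$stdin = isHtmlFile('php://stdin');
var_dump($code, $html, $stdin);
```

Standard input has no extension, so it is always treated as code.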

So let's try this script out now. Save it (I'll be using /home/cheetah/public_html/drupal.api/preprocess-drupal-doxygen.php here) and open up the Doxygen wizard once again (as in part two). Change the INPUT_FILTER in the Input topic to point to your script. If you keep the script in your Drupal directory, you may also want to add it to the excludes list (add preprocess-drupal-doxygen.php to the EXCLUDE setting in the Input topic). This will prevent it from being included in your Drupal documentation.
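If you prefer editing the Doxyfile by hand over using the wizard, the two settings would look roughly like the fragment below. The path is the one I use here; adjust it to wherever you saved the script. The += is only there on the assumption that your EXCLUDE setting already lists other entries that you want to keep.

```
# Run every input file through the preprocessor before Doxygen parses it.
INPUT_FILTER = /home/cheetah/public_html/drupal.api/preprocess-drupal-doxygen.php

# Keep the script itself out of the generated documentation.
EXCLUDE     += preprocess-drupal-doxygen.php
```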

Doxywizard - INPUT_FILTER setting
Doxywizard - EXCLUDE setting

Now if you run Doxygen and check out the documentation, you'll see if everything went well. If so, the documentation of t() should now look as it does online. Good.

Repaired Documentation
Broken and Repaired Documentation

Broken links

Something else you'll notice from the start is that the main page of your documentation will have some broken links. The group links (Module system, Database abstraction layer, etcetera) work just fine, as do those to the example modules, if you have included them in your documentation. However, the links to the constants, global variables and the in-depth discussions are all broken. Let's get those fixed as well, by expanding the preprocessor.

First off, we want to fix the link to all the constants, or enums in Doxygen. We know there's a page, but it's not at /api/constants. We want to use globals_enum.html instead; this page has all the define() elements found in the code. So, we make it a regular anchor link (or <a> tag):

  private function makeAnchorLinks($contents, $links) {
    foreach($links as $original => $new) {
      $re = '/@link ' . str_replace('/', '\/', $original) . '(\/\S*)?\s(.*\S)\s*@endlink/';
      $contents = preg_replace($re, '<a class="el" href="' . $new . '">\\2</a>', $contents);
    }
    return $contents;
  }

The following function will change our /api/globals link to refer to the globals.php documentation. This file, in the extra developer documentation, contains documented versions of all the Drupal (core) globals. Another option is to link to globals_vars.html using the function above, which will include all global variables (and not just Drupal core ones). If you opt for that choice, the following function will not be necessary unless you find more broken links you want to fix.

  private function replaceLinks($contents, $links) {
    foreach($links as $original => $new) {
      $re = '/@link ' . str_replace('/', '\/', $original) . '(\/\S*)?\s(.*\S)\s*@endlink/';
      $contents = preg_replace($re, '@link ' . $new . ' \\2 @endlink', $contents);
    }
    return $contents;
  }

Place both of these in your CodePreprocessor class. Now call them as follows, after the call to escapeUnknownCommands:

    $contents = $this->makeAnchorLinks($contents, array(
      '/api/constants' => 'globals_enum.html',
      // '/api/globals' => 'globals_vars.html',
    ));
    $contents = $this->replaceLinks($contents, array(
      '/api/globals' => 'globals.php',
    ));

Note the commented line? That's the alternative for the globals; remove or comment out the other line mentioning /api/globals if you prefer the globals_vars.html page (you can still access that through Files, then Globals in the Doxygen navigation). Note that the code is flexible in that you can easily add more links. It will also automatically change longer URLs, if the extra part starts with a forward slash (e.g. /api/constants/7 will also be caught as a whole, but /api/constants_7 is considered to be a different link).
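The behaviour for longer URLs is easy to try out with a standalone copy of the function. In this sketch the name makeAnchorLinksDemo() and the three sample comment lines are made up; the pattern is the same one as in makeAnchorLinks().

```php
<?php
// Standalone copy of makeAnchorLinks(), for experimenting outside the class.
function makeAnchorLinksDemo($contents, $links) {
  foreach ($links as $original => $new) {
    $re = '/@link ' . str_replace('/', '\/', $original) . '(\/\S*)?\s(.*\S)\s*@endlink/';
    $contents = preg_replace($re, '<a class="el" href="' . $new . '">\\2</a>', $contents);
  }
  return $contents;
}

$links = array('/api/constants' => 'globals_enum.html');
$plain    = makeAnchorLinksDemo('@link /api/constants Constants @endlink', $links);
$longer   = makeAnchorLinksDemo('@link /api/constants/7 Constants @endlink', $links);
$distinct = makeAnchorLinksDemo('@link /api/constants_7 Constants @endlink', $links);

print $plain . "\n" . $longer . "\n" . $distinct . "\n";
```

The first two samples end up as the same anchor tag, while the third is left untouched because /api/constants_7 is a different link.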

Now to deal with any external links (like the "Drupal Programming from an Object-Oriented Perspective" link on the main page). The following function will remove the used @link command, and replace it with an HTML tag again, which will not be touched by Doxygen.

  private function replaceExternalLinks($contents) {
    return preg_replace('/@link (([a-zA-Z]+:\/\/|mailto\:)\S*)\s+(.*\S)\s*@endlink/',
                 '<a class="el" href="\\1">\\3</a>', $contents);
  }

And the call:

$contents = $this->replaceExternalLinks($contents);

Note that any protocol (http://, ftp://, etc.) will be caught as external. The mailto: protocol is dealt with somewhat separately, as it doesn't have the two slashes after it, but other than that everything's the same. The whole URL, until the first space, is considered to be the actual URL to link to. The rest is the text that is linked. If you want to go a little bit further, you could even add code to show the URL as linked text, if nothing else is present (Doxygen will work that one out if you remove the @link and @endlink commands). I'll leave that as an exercise to the reader.
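Again, a standalone copy makes it easy to see which links count as external. The function name replaceExternalLinksDemo() and the example.com addresses are placeholders for this sketch only.

```php
<?php
// Standalone copy of replaceExternalLinks(), for experimenting outside the class.
function replaceExternalLinksDemo($contents) {
  return preg_replace('/@link (([a-zA-Z]+:\/\/|mailto\:)\S*)\s+(.*\S)\s*@endlink/',
                      '<a class="el" href="\\1">\\3</a>', $contents);
}

$web      = replaceExternalLinksDemo('@link http://example.com/docs Example docs @endlink');
$mail     = replaceExternalLinksDemo('@link mailto:someone@example.com Mail us @endlink');
$internal = replaceExternalLinksDemo('@link /api/globals Globals @endlink');

print $web . "\n" . $mail . "\n" . $internal . "\n";
```

The http:// and mailto: links become anchor tags; the internal /api/globals link has no protocol and is left for the other functions to handle.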

Summary

So... wow. This post has turned out longer than expected. I've attached a renewed Doxyfile (with only the extra INPUT_FILTER and expanded EXCLUDE settings), as well as the code so far, to this post. The major pains in the Doxygen code are now solved, making the documentation of Drupal more suitable for Doxygen itself. All of this work is useful whether or not you use the Qt Assistant; as soon as you use Doxygen instead of Drupal's API module, you'll probably want the issues described here to be fixed.

Next is not part 4, but 3b. We'll continue fixing the documentation by having Doxygen properly include and find the HTML pages containing documentation, such as the Forms API reference. It'll be quicker now that we have a base to work from!

Other posts in this series:

Attachment                            Size
Doxyfile.6                            2.95 KB
preprocess-drupal-doxygen.php_.txt    9.55 KB