Contribution Story: Adding an Option to Doxygen

I use Doxygen to generate documentation for a library I maintain, libSDL2pp. One day I was looking through the generated documentation and my eye was caught by what was supposed to be my email address, obfuscated beyond recognition. Email obfuscation is absolutely not what I wanted, and Doxygen had no way to disable this behavior, so I decided to implement one.

Here’s how it looked in the generated HTML:

Code (formatted for readability):

<a href="#" onclick="location.href='mai'+'lto:'+'amd'+'mi'+'3@a'+'md'+'mi3'+'.r'+'u'; return false;">
	amdmi<span style="display: none;">.nosp@m.</span>3@am<span style="display: none;">.nosp@m.</span>dmi3.<span style="display: none;">.nosp@m.</span>ru
</a>

NB: I didn’t notice right away that these <span>s were supposed to be invisible. They weren’t, because the Content-Security-Policy HTTP header I use on my servers forbids inline styles. That’s another Doxygen problem to fix.

That’s some badass obfuscation! But I don’t need it.

  • I want my email to be readable as text, and the mailto: link to be valid, even for those who have JavaScript disabled.
  • I see the spam problem as mostly fictional nowadays: my email is published verbatim on hundreds of sites and I’m not using any spam filters, yet I’m not getting an intolerable amount of spam.
  • Because my email is published verbatim on so many sites, obfuscating it on a single one doesn’t change a thing anyway.

So, let’s teach Doxygen to not mangle my email.

First, I need to find the place in the Doxygen code that does the obfuscation. That’s straightforward: I just search for a piece of the inserted garbage:

% git clone https://github.com/doxygen/doxygen/ .
% grep -R nosp@m . 
./src/htmldocvisitor.cpp:      if (*p) m_t << "<span style=\"display: none;\">.nosp@m.</span>";

That gets us into a function which, as the name suggests, handles URLs somehow:

void HtmlDocVisitor::visit(DocURL *u)
{
  if (m_hide) return;
  if (u->isEmail()) // mail address
  {
    QCString url = u->url();
    // obfuscate the mail address link
    writeObfuscatedMailAddress(url);
    const char *p = url.data();
    // also obfuscate the address as shown on the web page
    uint size=5;
    while (*p)
    {
      for (uint j=0;j<size && *p;j++)
      {
        p = writeUTF8Char(m_t,p);
      }
      if (*p) m_t << "<span style=\"display: none;\">.nosp@m.</span>";
      if (size==5) size=4; else size=5;
    }
    m_t << "</a>";
  }
  else // web address
  {
    m_t << "<a href=\"";
    m_t << u->url() << "\">";
    filter(u->url());
    m_t << "</a>";
  }
}

You don’t really have to grasp the obfuscation algorithm; it’s enough to realize that this function takes a URL (an email address in our case), stores it in the url variable, inserts <span>s with garbage into it, and writes the resulting HTML to the m_t stream. Note that the URL is also passed to writeObfuscatedMailAddress(), which outputs the opening hyperlink tag (<a href="mailto:...">) and obfuscates the mail address there as well.
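For the curious, the loop that mangles the visible text can be reproduced outside of Doxygen. The following is only a standalone sketch of that loop, not Doxygen code: the name obfuscateVisibleText is made up for illustration, and it works on bytes rather than UTF-8 characters (the original goes through writeUTF8Char), so it is only faithful for ASCII addresses.

```cpp
#include <cstddef>
#include <string>

// Marker placed between chunks of the visible address (copied from the
// generated HTML shown earlier).
const std::string kHiddenSpan =
    "<span style=\"display: none;\">.nosp@m.</span>";

// Hypothetical helper: split the address into chunks of 5, 4, 5, 4, ...
// bytes and put the hidden marker between consecutive chunks.
// Simplification: byte-based, so only correct for ASCII input.
std::string obfuscateVisibleText(const std::string &addr)
{
  std::string out;
  std::size_t pos = 0;
  std::size_t size = 5;
  while (pos < addr.size())
  {
    out += addr.substr(pos, size);              // substr clamps at the end
    pos += size;
    if (pos < addr.size()) out += kHiddenSpan;  // no marker after the last chunk
    size = (size == 5) ? 4 : 5;                 // alternate the chunk length
  }
  return out;
}
```

Feeding it amdmi3@amdmi3.ru yields the chunks amdmi, 3@am, dmi3. and ru with a marker between each pair, which is the text seen in the generated page.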

So all I need now is to wrap these two instances of obfuscation code in ifs that check a configuration option, and write verbatim URLs in the else branches.
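The shape of that change can be sketched as a self-contained function. This is a model of the idea, not the actual patch: emailLink, alternatingChunks and the obfuscateEmails flag are hypothetical names, the 3/2 splitting of the link target is inferred from the generated HTML above rather than taken from Doxygen sources, and no HTML escaping is done.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split s into chunks whose sizes alternate a, b, a, b, ...
// Both a and b must be greater than zero.
std::vector<std::string> alternatingChunks(const std::string &s,
                                           std::size_t a, std::size_t b)
{
  std::vector<std::string> chunks;
  std::size_t pos = 0;
  std::size_t size = a;
  while (pos < s.size())
  {
    chunks.push_back(s.substr(pos, size));
    pos += size;
    size = (size == a) ? b : a;
  }
  return chunks;
}

// obfuscateEmails stands in for the value of the new boolean option.
std::string emailLink(const std::string &addr, bool obfuscateEmails)
{
  std::ostringstream out;
  if (obfuscateEmails)
  {
    // Link target assembled by JavaScript from small pieces.
    out << "<a href=\"#\" onclick=\"location.href='mai'+'lto:'";
    for (const std::string &chunk : alternatingChunks(addr, 3, 2))
      out << "+'" << chunk << "'";
    out << "; return false;\">";
    // Visible text interleaved with hidden garbage.
    const std::vector<std::string> parts = alternatingChunks(addr, 5, 4);
    for (std::size_t i = 0; i < parts.size(); ++i)
    {
      if (i > 0) out << "<span style=\"display: none;\">.nosp@m.</span>";
      out << parts[i];
    }
  }
  else
  {
    // Verbatim link and text (unescaped here, which is a simplification).
    out << "<a href=\"mailto:" << addr << "\">" << addr;
  }
  out << "</a>";
  return out.str();
}
```

With the flag set to false, the function returns a plain mailto: link with the address as its text; with true, it reproduces the mangled markup shown at the top of the post.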

But before I do that, I need to define a new option and find out how to access it from the code. For the latter, I searched for [Cc]onfig in the same source file, which returned a useful example right away: Config_getBool(DOT_CLEANUP). Then I grepped the Doxygen sources for DOT_CLEANUP to find where it is defined. I expected a #define or an enum value in some header file, but in this case it was an XML file instead:

% grep -R DOT_CLEANUP .
...
./src/config.xml:    <option type='bool' id='DOT_CLEANUP' defval='1'>
...

So Doxygen generates the code for configuration options, and probably their documentation as well, from these XML definitions. Neat. All I have to do now is copy the definition of any boolean option and adapt it to my needs.
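To illustrate why a declarative definition makes adding an option this cheap, here is a toy model of table-driven boolean options. It is not what Doxygen generates: ToyConfig, BoolOptionDef and the option name MY_NEW_OPTION are all hypothetical, and only the id/default pairing mirrors the id and defval attributes seen in config.xml.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// One declarative entry per option: an id and a default value.
struct BoolOptionDef
{
  const char *id;
  bool defval;
};

// Adding an option means adding one line to this table.
// MY_NEW_OPTION is a placeholder name, not a real Doxygen option.
const BoolOptionDef kBoolOptions[] = {
  {"DOT_CLEANUP", true},
  {"MY_NEW_OPTION", true},
};

class ToyConfig
{
public:
  // Populate every known option with its default.
  ToyConfig()
  {
    for (const BoolOptionDef &def : kBoolOptions) m_values[def.id] = def.defval;
  }
  // Override a value, as a user's configuration file would.
  // Throws std::out_of_range for unknown ids.
  void set(const std::string &id, bool value) { m_values.at(id) = value; }
  // Look up an option; throws std::out_of_range for unknown ids.
  bool getBool(const std::string &id) const { return m_values.at(id); }
private:
  std::map<std::string, bool> m_values;
};
```

The calling code only ever asks for a value by id, so it does not change when new entries are added to the table.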

Finally, I built a fresh Doxygen (cmake . && cmake --build ., which produces bin/doxygen) and tested it with my new option. As expected, the obfuscation was disabled:

<a href="mailto:amdmi3@amdmi3.ru">amdmi3@amdmi3.ru</a>

With that, I was ready to submit a pull request (simplified diff), which was accepted the next day. These changes should be available in the next Doxygen release, 1.9.3.

Note to self: before submitting a PR, check the code style of the project you’re contributing to and make sure your changes conform to it.