Understanding The AutoIndex Workflow

AutoIndex loads the script file (usually index.idx) and processes it one line at a time, producing one or more index terms per non-comment line.

Reading all the lines builds a list of terms to index. Some of these may be terms that you defined directly in the script file; others may be terms found by scanning the C++ header and source files specified by the !scan-path directive.
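The line-by-line processing described above can be sketched roughly as follows. This is only an illustration and not AutoIndex's actual parser: the script format is simplified to a single-word term optionally followed by a regular expression, the `#` comment marker and the whole-word default pattern are assumptions, and the names `IndexTerm` and `parse_script` are hypothetical. Directives such as !scan-path are merely collected here, not executed.

```python
import re
from dataclasses import dataclass


@dataclass
class IndexTerm:
    term: str            # text shown in the index
    pattern: re.Pattern  # compiled regex used to find the term


def parse_script(text):
    """Return (terms, directives) from a simplified, hypothetical script format."""
    terms, directives = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment marker
            continue
        if line.startswith("!"):
            # Directive line (for example !scan-path): record it for later handling.
            directives.append(line.split())
            continue
        fields = line.split(None, 1)
        term = fields[0]
        # Assumption: with no explicit regex, match the term as a whole word.
        regex = fields[1] if len(fields) > 1 else r"\b" + re.escape(term) + r"\b"
        terms.append(IndexTerm(term, re.compile(regex)))
    return terms, directives
```

In this sketch a real implementation would go on to scan the files named by each collected directive and append any terms it finds to the same list.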

Once the list of terms to index is complete, AutoIndex loads the Docbook XML file. (If the documentation comes from Quickbook, Doxygen, Boostbook or Docbook sources, this is the complete documentation after conversion to Docbook format.)

AutoIndex builds an internal Document Object Model (DOM) of the Docbook XML. This internal representation is then scanned for occurrences of the terms to index. The scanning works at the XML paragraph level (or an equivalent sibling such as a table or code block), so all the XML markup within a paragraph is flattened to plain text.
This flattening means that the regular expressions used to search for terms can find anything that is completely contained within a single paragraph (or code block, etc.), even when the text is split across inline markup.
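To illustrate why flattening matters, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than AutoIndex's own DOM. The helper names `flatten` and `blocks_containing` are hypothetical, and treating only `para` and `programlisting` as paragraph-level blocks is a simplifying assumption.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: only these Docbook elements count as paragraph-level blocks here.
BLOCK_TAGS = {"para", "programlisting"}


def flatten(element):
    """Concatenate all text inside an element, discarding the inline markup."""
    return "".join(element.itertext())


def blocks_containing(root, pattern):
    """Return the block-level elements whose flattened text matches pattern."""
    return [el for el in root.iter()
            if el.tag in BLOCK_TAGS and pattern.search(flatten(el))]


root = ET.fromstring(
    "<section>"
    "<para>Use <code>boost::math::</code><emphasis>tgamma</emphasis> here.</para>"
    "<para>Nothing relevant.</para>"
    "</section>"
)
hits = blocks_containing(root, re.compile(r"boost::math::tgamma"))
```

The search text spans two inline elements in the first paragraph, so it can only be found after the paragraph has been flattened. A match that would have to cross from one paragraph into the next is never found.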

For each term found, an indexterm Docbook element is inserted into the DOM (provided internal index generation is off).

AutoIndex's internal index representation is also updated.
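The two updates above might look like the following sketch, again using xml.etree.ElementTree as a stand-in for AutoIndex's DOM. Everything here is an assumption made for illustration: the function name `add_index_entry`, the `emit_indexterm` flag standing in for "internal index generation is off", the placement of the indexterm at the start of the matching block, and the dictionary used as the internal index. Only the `indexterm` and `primary` element names come from Docbook itself.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def add_index_entry(block, term, section_id, index, emit_indexterm=True):
    """Record a hit for term in block, optionally tagging the block itself."""
    if emit_indexterm:
        entry = ET.Element("indexterm")
        ET.SubElement(entry, "primary").text = term
        # Keep the block's leading text after the new element, so the
        # visible text order is unchanged.
        entry.tail = block.text
        block.text = None
        block.insert(0, entry)
    # Hypothetical internal index: term -> set of section ids where it occurs.
    index[term].add(section_id)


para = ET.fromstring("<para>The gamma function.</para>")
index = defaultdict(set)
add_index_entry(para, "gamma", "math.gamma", index)
```

When the flag is off the paragraph is left untouched and only the internal index is updated, which is the information needed later if the tool is asked to generate the index itself.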

Once the whole XML document has been indexed, and if AutoIndex has been instructed to generate the index itself, it creates the necessary index XML and inserts it into the DOM.

Finally, the whole DOM is written out as a new Docbook XML file, and normal processing continues via the XSL stylesheets (with xsltproc) to build the final human-readable documentation.