The following elements can occur in a script:
Blank lines and lines consisting only of whitespace are ignored, as are lines that start with a #.
Note: You can't append # comments onto the end of a line!
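The filtering rule above can be sketched in Python. This only illustrates the rule as described here, not AutoIndex's own parser: the function name `significant_lines` is hypothetical, and it assumes the # must be the very first character of the line.

```python
def significant_lines(script_text):
    """Yield script lines that are neither blank nor whole-line # comments."""
    for line in script_text.splitlines():
        if not line.strip():
            continue  # blank or whitespace-only line
        if line.startswith("#"):
            continue  # whole-line comment (assumption: '#' is the first character)
        yield line


script = "# index terms\n\nfoobar\n   \nreflex \\<reflex\\w*\\>\n"
for line in significant_lines(script):
    print(line)
```

Because a trailing # is not treated as a comment, anything after it on a term line would stay part of that line.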
term [regular-expression1 [regular-expression2 [category]]]
Term to index.
The index term will form a primary entry in the Index with the section title(s) containing the term as secondary entries, and also will be used as a secondary entry beneath each of the section titles that the index term occurs in.
Index term Searcher.
An optional regular expression: each occurrence of the regular expression in the text of the document will result in one index term being emitted.
If the regular expression is omitted (the default) or is "", then the index term itself will be used as the search text, and only occurrences of whole words matching the index term will be indexed.
For example:
foobar
will index occurrences of "foobar" in any section, but
foobar \<\w*(foo|bar)\w*\>
will index any whole word containing either "foo" or "bar" within it. This is useful when you want to index a lot of similar or related words under one entry.
reflex
will only index occurrences of "reflex" as a whole word, but:
reflex \<reflex\w*\>
will index occurrences of "reflex", "reflexes", "reflexing" and "reflexed" ... all under the same entry reflex.
You will very often need to use this to deal with plurals and other variants.
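The difference between the default whole-word search and an explicit expression can be tried out in Python. This is a rough sketch, not the tool's own matching code: Python's `re` module has no `\<` and `\>`, so `\b` stands in for both, and the helper `whole_word` and the sample text are made up for illustration.

```python
import re


def whole_word(term):
    # Assumed stand-in for the default search: the term as a whole word.
    return re.compile(r"\b" + re.escape(term) + r"\b")


text = "A reflex is fast; reflexes are faster. She reflexed, reflexing again."

# Default behaviour: only the exact whole word is found.
default_hits = whole_word("reflex").findall(text)

# Counterpart of \<reflex\w*\>: the word plus any trailing word characters.
variant_hits = re.findall(r"\breflex\w*\b", text)

print(default_hits)  # ['reflex']
print(variant_hits)  # ['reflex', 'reflexes', 'reflexed', 'reflexing']
```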
Section(s) Selector.
A constraint that specifies which sections are indexed for term: only if the ID of the section matches regular-expression2 exactly will that section be indexed for occurrences of term.
For example, to limit indexing to just one specific section (but not sub-sections below):
myclass "" "mylib\.examples"
For example, to limit indexing to specific sections, and sub-sections below:
myclass "" "mylib\.examples.*"
will index occurrences of "myclass" as a whole word, but only in sections whose section ID begins "mylib.examples", while
myclass "\<myclass\w*\>" "mylib\.examples.*"
will also index variants such as "myclasses", "myclasss" ...
and:
myclass "" "(?!mylib\.introduction).*"
will index occurrences of "myclass" in any section, except those whose section IDs begin "mylib.introduction".
Finally, two (or more) sections can be excluded by OR'ing them together:
myclass "" "(?!mylib\.introduction|mylib\.reference).*"
which excludes searching for this term in sections whose IDs start with either "mylib.introduction" or "mylib.reference".
If this third section selection field is omitted (the default) or is "", then all sections are indexed for this term.
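To check a section selector before running the tool, the exact-match rule can be approximated in Python with `re.fullmatch`. This is a sketch under assumptions: `section_selected` is a hypothetical helper, the section IDs are invented, and Python's regex dialect is only close to, not identical with, the one the tool uses.

```python
import re


def section_selected(selector, section_id):
    # An empty selector means "all sections"; otherwise the whole ID must match.
    return selector == "" or re.fullmatch(selector, section_id) is not None


ids = ["mylib.examples", "mylib.examples.basic",
       "mylib.introduction.overview", "mylib.reference.api"]

for selector in [r"mylib\.examples",
                 r"mylib\.examples.*",
                 r"(?!mylib\.introduction).*",
                 r"(?!mylib\.introduction|mylib\.reference).*"]:
    print(selector, [i for i in ids if section_selected(selector, i)])
```

The negative lookahead works because an exact match is always attempted from the first character of the ID, so a rejected prefix rules out the whole section.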
Index Category Constraint.
Optionally, a category in which to place occurrences of the index term. If you have multiple indexes, then this is the name assigned to the index's "type" attribute.
For example:
myclass "" "" class_name
Will index occurrences of myclass and place them in the class-index if there is one.
You can have an index term appear more than once in the script file:
Thus:
myterm search_expression1 constraint_expression2 foo
myterm search_expression1 constraint_expression2 bar
Will be treated as different terms each with their own entries, while:
myterm search_expression1 constraint_expression2 mycategory
myterm search_expression1 constraint_expression2 mycategory
Will be combined into a single term equivalent to:
myterm (?:search_expression1|search_expression1) (?:constraint_expression2|constraint_expression2) mycategory
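The merging rule can be sketched in Python as follows. This is an illustration of the behaviour described above, not the tool's implementation: `combine_terms` and the placeholder expressions `search_a`, `search_b`, `ids_a` and `ids_b` are hypothetical, and entries are assumed to be keyed on the pair (term, category).

```python
def combine_terms(entries):
    """Merge (term, search_regex, section_regex, category) entries that
    share both term and category into one alternation per field."""
    grouped = {}
    for term, search, section, category in entries:
        grouped.setdefault((term, category), []).append((search, section))

    combined = []
    for (term, category), parts in grouped.items():
        if len(parts) == 1:
            search, section = parts[0]
        else:
            # Non-capturing groups keep the alternatives from leaking into
            # whatever surrounds the expression.
            search = "(?:" + "|".join(s for s, _ in parts) + ")"
            section = "(?:" + "|".join(c for _, c in parts) + ")"
        combined.append((term, search, section, category))
    return combined


print(combine_terms([("myterm", "search_a", "ids_a", "foo"),
                     ("myterm", "search_b", "ids_b", "bar")]))
print(combine_terms([("myterm", "search_a", "ids_a", "mycategory"),
                     ("myterm", "search_b", "ids_b", "mycategory")]))
```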
!scan source-file-name
Scans the C/C++ source file source-file-name for definitions of functions, classes, macros or typedefs and makes each of these a term to be indexed. Terms found are assigned to the index category "function_name", "class_name", "macro_name" or "typedef_name" depending on how they were seen in the source file. These may then be included in a specialised index whose "type" attribute has the same category name.
Important: When actually indexing a document, the scanner will not index just any old occurrence of the terms found in the source files. Instead it searches for class definitions or function or typedef declarations. This reduces the number of spurious matches placed in the index, but may also miss some legitimate terms: refer to the define-scanner command for information on how to change this.
!scan-path directory-name file-name-regex [recurse]
directory-name
The directory to scan: this should be a path relative to the script file (or to the path specified with the prefix=path option on the command line) and should use all forward slashes in its file name.
file-name-regex
A regular expression: any file in the directory whose name matches the regular expression will be scanned for terms to index.
recurse
An optional boolean value, either "true" or "false", that indicates whether to recurse into subdirectories. This defaults to "false".
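To preview which files a given regular expression and recurse setting would pick up, the selection can be approximated in Python. This is a sketch under assumptions, not the tool's own logic: `files_to_scan` is a hypothetical helper, and it assumes the expression must match the whole file name (without its directory part).

```python
import os
import re


def files_to_scan(directory, file_name_regex, recurse=False):
    """List files under `directory` whose name matches `file_name_regex`."""
    pattern = re.compile(file_name_regex)
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if pattern.fullmatch(name):  # assumption: whole-name match
                # Report paths with forward slashes, as the script expects.
                found.append(os.path.join(root, name).replace(os.sep, "/"))
        if not recurse:
            break  # os.walk yields the top directory first; stop there
    return found


if __name__ == "__main__":
    # Hypothetical layout: headers under ./include, matched by extension.
    print(files_to_scan("include", r".*\.hpp", recurse=True))
```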
!exclude term-list
Excludes all the terms in the whitespace-separated term-list from being indexed. This should be placed after any !scan or !scan-path rules which may result in the terms becoming included. In other words, this removes terms from the scanner's internal list of things to index.
!rewrite-id regular-expression new-name
regular-expression
A regular expression: all section IDs that match the expression exactly will have index entries under new-name instead of their title(s).
new-name
The name that the section will appear under in the index.
!rewrite-name regular-expression format-text
regular-expression
A regular expression: all sections whose titles match the regular expression exactly will have index entries composed of the regular expression match combined with the regex format string format-text.
format-text
The Perl-style format string used to reformat the title.
For example:
!rewrite-name "(?:A|An|The)\s+(.*)" "\1"
Will remove any leading "A", "An" or "The" from all index entries - thus preventing lots of entries under "The" etc!
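The effect of this rule on a few titles can be previewed in Python. This is a sketch only: `rewrite_name` is a hypothetical helper, the titles are invented, and Python's `Match.expand` stands in for the Perl-style format string, which it resembles for simple `\1` references.

```python
import re


def rewrite_name(pattern, format_text, title):
    # The rule applies only when the whole title matches the expression.
    m = re.fullmatch(pattern, title)
    return m.expand(format_text) if m else title


rule = (r"(?:A|An|The)\s+(.*)", r"\1")

for title in ["The Quick Start Guide", "An Overview", "Another Topic", "Reference"]:
    print(title, "->", rewrite_name(rule[0], rule[1], title))
```

Note that "Another Topic" is left alone: the expression requires whitespace after the article, so words that merely begin with "A" or "An" are not truncated.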
!define-scanner type file-search-expression xml-regex-formatter term-formatter id-filter filename-filter
When a source file is scanned using the !scan or !scan-path rules, the file is searched using a series of regular expressions to look for classes, functions, macros or typedefs that should be indexed. A set of default regular expressions is provided for this (see below), but sometimes you may want to replace the defaults, or add new scanners. The arguments to this rule are:
type
The type to which items found using this rule will be assigned. Index terms created from the source file and then found in the XML will have the type attribute set to this value, and may then appear in a specialized index with the same type attribute.
file-search-expression
A regular expression that is used to scan the source file for index terms. The result of a match against this expression will be transformed by the next two arguments.
xml-regex-formatter
A regular expression format string that extracts the salient information from whatever matched the file-search-expression in the source file, and creates a new regular expression that will be used to search the document being indexed for occurrences of this index term.
term-formatter
A regular expression format string that extracts the salient information from whatever matched the file-search-expression in the source file, and creates the index term that will appear in the index.
id-filter
Optional. A regular expression that restricts the section IDs that are searched in the document being indexed: only sections whose ID attribute matches this expression exactly will be considered for indexing terms found by this scanner.
filename-filter
Optional. A regular expression that restricts which files are scanned by this scanner: only files whose file name matches this expression exactly will be scanned for index terms to use. Note that the filename matched against this may well be an absolute path, and contain either forward or backward slash path separators.
If, when the first file is scanned, there are no scanners whose type is "class_name", "typedef_name", "macro_name" or "function_name", then the defaults are installed. These are equivalent to:
!define-scanner class_name "^[[:space:]]*(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([[:blank:]]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>)?[[:space:]]*(\{|:[^;\{()]*\{)" "(?:class|struct)[^;{]+\<\5\>[^;{]+\{" \5
!define-scanner typedef_name "typedef[^;{}#]+?(\w+)\s*;" "typedef[^;]+\<\1\>\s*;" "\1"
!define-scanner "macro_name" "^\s*#\s*define\s+(\w+)" "\<\1\>" "\1"
!define-scanner "function_name" "\w++(?:\s*+<[^>]++>)?[\s&*]+?(\w+)\s*(?:BOOST_[[:upper:]_]+\s*)?\([^;{}]*\)\s*[;{]" "\\<\\w+\\>(?:\\s+<[^>]*>)*[\\s&*]+\\<\1\\>\\s*\\([^;{]*\\)" "\1"
Note that these defaults are not installed if you have provided your own versions with these type names. In this case if you want the default scanners to be in effect as well as your own, you should include the above in your script file. It is also perfectly allowable to have multiple scanners with the same type, but with the other fields differing.
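How the three main scanner fields relate can be seen by replaying the default macro_name scanner in Python. This is a sketch, not the tool's code: `scan_macros` and the sample source text are hypothetical, `\b` replaces `\<` and `\>` because Python's `re` lacks them, and the macro name is escaped before being placed in the document search expression as a precaution.

```python
import re

# file-search-expression of the default macro_name scanner.
FILE_SEARCH = re.compile(r"^\s*#\s*define\s+(\w+)", re.MULTILINE)


def scan_macros(source_text):
    """Return (index_term, document_search_regex) pairs, one per #define."""
    results = []
    for m in FILE_SEARCH.finditer(source_text):
        term = m.expand(r"\1")                       # term-formatter "\1"
        doc_regex = r"\b" + re.escape(term) + r"\b"  # stands in for "\<\1\>"
        results.append((term, doc_regex))
    return results


source = "#define MYLIB_VERSION 2\n  # define MYLIB_ASSERT(x) check(x)\nint f();\n"
for term, doc_regex in scan_macros(source):
    print(term, doc_regex)
```

Each match in the source file thus yields two things: the text shown in the index, and a new expression that is later run against the document.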
Finally, note that the default scanners are quite strict in what they will find. For example, the class scanner will only create index entries for classes that have class definitions of the form:
class my_class : public base_classes { // etc
in the documentation, so simple mentions of the class name will not get indexed, only the class synopsis if there is one. If this isn't how you want things, then include the class_name scanner definition above in your script file, and change the xml-regex-formatter field to something more permissive, for example:
!define-scanner class_name "^[[:space:]]*(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([[:blank:]]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>)?[[:space:]]*(\{|:[^;\{()]*\{)" "\<\5\>" \5
This will index any occurrence in the documentation of whatever class names the scanner finds in the source files.
If you see a term in the index, and you don't understand why it's there, add a debug directive:
!debug regular-expression
Now, whenever regular-expression matches either the found index term, or the section title it appears in, or the type field of a scanner, then some diagnostic information will be printed that will look something like:
Debug term found, in block with ID: spirit.qi.reference.parser_concepts.parser
Current section title is: Notation
The main index entry will be : Notation
The indexed term is: parser
The search regex is: [P|p]arser
The section constraint is: .qi.reference.parser_concepts.
The index type for this entry is: qi_index
This can produce a lot of output in your log file, but until you are satisfied with your file selection and scanning process, it is worth switching it on.