# $Id$

Introduction
============

Text_Highlighter is a class for syntax highlighting. The main idea is to
simplify the creation of subclasses that implement syntax highlighting for a
particular language. Subclasses do not implement any new functionality; they
only provide syntax highlighting rules. The rule sources are in XML format.
To create a highlighter for a language, there is no need to code a new class
manually. Simply describe the rules in an XML file and use
Text_Highlighter_Generator to create a new class.

This document does not contain a formal description of the API. The API is
very simple, and I believe that providing some code examples is sufficient.


Highlighter XML source
======================

Basics
------

Creating a new syntax highlighter begins with describing the highlighting
rules. There are two basic elements: block and region. A block is just a
portion of text that matches a regular expression and is highlighted with a
single color. A keyword is an example of a block. A region is defined by two
regular expressions: one for the start of the region and another for the end.
The main difference from a block is that a region can contain blocks and
regions (including same-named regions). An example of a region is a group of
statements enclosed in curly brackets (this is used in many languages, for
example PHP and C). Also, the characters matching the start and end of a
region may be highlighted with their own color, and the region contents with
another.

Blocks and regions may be declared as contained. Contained blocks and regions
can only appear inside regions. If a region or a block is not declared as
contained, it can appear both at the top level and inside regions. A block or
region declared as not-contained can only appear at the top level.

For any region, a list of blocks and regions that can appear inside this
region can be specified.

In this document, the term "color group" is used. Chunks of text assigned to
the same color group will be highlighted with the same color. Note that in
versions prior to 0.5.0 color groups were referred to as CSS classes, but
since 0.5.0 HTML is no longer the only supported output format, so "color
group" is the more appropriate term.

Elements
--------

The top-level element is <highlight>. The lang attribute is required and
denotes the name of the language. Its value is used as part of the generated
class name, and must only contain letters, digits and underscores. The
optional case attribute, when given the value yes, makes the language case
sensitive (the default is case insensitive). Allowed subelements are:

* <authors>: Information about the authors of the file.
    <author>: Information about a single author of the file. (May be used
    multiple times, one per author.)
        - name="...": Author's name. Required.
        - email="...": Author's email address. Optional.

* <default>: Default color group.
    - innerGroup="...": Color group name. Required.

* <region>: Region definition.
    - name="...": Region name. Required.
    - innerGroup="...": Default color group of region contents. Required.
    - delimGroup="...": Color group of the start and end of the region.
      Optional, defaults to the value of the innerGroup attribute.
    - start="...", end="...": Regular expressions matching the start and end
      of the region. Required. Regular expression delimiters are optional,
      but if you need to specify a delimiter, use /. Delimiters are only
      needed when specifying regular expression modifiers, such as m or U.
      Examples: \/\* or /$/m.
    - contained="yes": Marks the region as contained.
    - never-contained="yes": Marks the region as not-contained.
    - <contains>: Elements allowed inside this region.
        - all="yes": The region can contain any other region or block
          (except not-contained ones). May be used multiple times.
            - <but>: Do not allow certain regions or blocks.
                - region="...": Name of a region not allowed within the
                  current region.
                - block="...": Name of a block not allowed within the
                  current region.
        - region="...": Name of a region allowed within the current region.
        - block="...": Name of a block allowed within the current region.
    - <onlyin>: Only allow this region within certain regions. May be used
      multiple times.
        - region="...": Name of the parent region.

* <block>: Block definition.
    - name="...": Block name. Required.
    - innerGroup="...": Color group of block contents. Optional. If not
      specified, the color group of the parent region or the default color
      group will be used. One would only want to omit this attribute if
      there are keyword groups (see below) inherited from this block, and no
      special highlighting should apply when the block does not match a
      keyword.
    - match="...": Regular expression matching the block. Required.
      Regular expression delimiters are optional, but if you need to specify
      a delimiter, use /. Delimiters are only needed when specifying regular
      expression modifiers, such as m or U. Examples: #|\/\/ or /$/m.
    - contained="yes": Marks the block as contained.
    - never-contained="yes": Marks the block as not-contained.
    - <onlyin>: Only allow this block within certain regions. May be used
      multiple times.
        - region="...": Name of the parent region.
    - multiline="yes": Marks the block as multi-line. By default, whole
      blocks are assumed to reside on a single line, which makes matching
      faster. If you need to declare a multi-line block, use this attribute.
    - <partGroup>: Assigns another color group to a part of the block that
      matched a subpattern.
        - index="n": Subpattern index. Required.
        - innerGroup="...": Color group name. Required.

      This is an example from the CSS highlighter: the measure is matched as
      a whole, but the measurement units are highlighted with a different
      color.

        <block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
            innerGroup="number" contained="yes">
            <onlyin region="property"/>
            <partGroup index="1" innerGroup="string" />
        </block>

* <keywords>: Keyword group definition. Keyword groups are useful when you
  want to highlight some words that match a condition for a block with a
  different color. Keywords are defined with a literal match, not regular
  expressions. For example, you have a block named identifier matching a
  general identifier, and want to highlight reserved words (which match
  this block as well) with a different color. You inherit a keyword group
  "reserved" from the "identifier" block.
    - name="...": Keyword group name. Required.
    - ifdef="...", ifndef="...": Conditional declaration. See "Conditions"
      below.
    - inherits="...": Inherited block name. Required.
    - innerGroup="...": Color group of the keyword group. Required.
    - case="yes|no": Overrides the case sensitivity of the language.
      Optional, defaults to the global value.
    - <keyword>: Single keyword definition.
        - match="...": The keyword. Note: this is not a regular expression,
          but a literal match (possibly case insensitive).

Note that for backward compatibility (BC) reasons the element partClass is an
alias for partGroup, and the attributes innerClass and delimClass are aliases
of innerGroup and delimGroup, respectively.

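As a minimal sketch of how the elements listed above fit together, the
following PHP snippet holds a small highlighter definition in a nowdoc string
and checks that it is well-formed XML. The language name "mini", the rule
names and the color group names are made up for illustration; only the
element and attribute names come from the list above. PHP's SimpleXML
extension is used here merely to inspect the definition; it is not part of
Text_Highlighter.

```php
<?php
// Hypothetical definition for a made-up language called "mini".
// Element and attribute names follow the list above; all values
// (names, patterns, color groups) are illustrative assumptions.
$xml = <<<'XML'
<highlight lang="mini" case="yes">
    <authors>
        <author name="Jane Doe" email="jane@example.com"/>
    </authors>
    <default innerGroup="code"/>
    <region name="braces" innerGroup="code" delimGroup="brackets"
            start="\{" end="\}">
        <contains all="yes"/>
    </region>
    <block name="identifier" match="[a-z_]\w*" innerGroup="identifier"/>
    <block name="number" match="\d+" innerGroup="number"/>
    <keywords name="reserved" inherits="identifier" innerGroup="reserved">
        <keyword match="if"/>
        <keyword match="else"/>
    </keywords>
</highlight>
XML;

// SimpleXML is used only to verify that the definition is well-formed.
$doc = simplexml_load_string($xml);
if ($doc === false) {
    die("The definition is not well-formed XML\n");
}

// Collect the names of the declared rules, grouped by element type.
$rules = array();
foreach ($doc->children() as $element) {
    if (isset($element['name'])) {
        $rules[$element->getName()][] = (string) $element['name'];
    }
}
print_r($rules);
```

Saving the string to a file such as mini.xml would give an input for the
class generator described in "Class generation". Whether this particular
definition highlights a real language sensibly has not been checked; it only
shows the shape of a rules file.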
Conditions
----------

Conditional declarations allow enabling or disabling certain highlighting
rules at runtime. For example, the Java highlighter has a very long list of
keywords matching the Java standard classes. Finding a match in this list can
take a long time. For that reason, the corresponding keyword group is
declared with the "ifdef" attribute:

    <keywords name="builtin" inherits="identifier" innerClass="builtin"
              case="yes" ifdef="java.builtins">
        <keyword match="AbstractAction" />
        <keyword match="AbstractBorder" />
        <keyword match="AbstractButton" />
        ...
        ...
        <keyword match="_Remote_Stub" />
        <keyword match="_ServantActivatorStub" />
        <keyword match="_ServantLocatorStub" />
    </keywords>

This keyword group will only be enabled when "java.builtins" is passed as an
element of the "defines" option:

    $options = array(
        'defines' => array(
            'java.builtins',
        ),
        'numbers' => HL_NUMBERS_TABLE,
    );
    $highlighter = Text_Highlighter::factory('java', $options);

The "ifndef" attribute has the reverse meaning: the keyword group is enabled
only when the given name is not among the "defines".

Currently, the "ifdef" and "ifndef" attributes are only supported for the
<keywords> tag.

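The effect of "ifdef" and "ifndef" can be pictured with a small stand-alone
sketch. The function below is not taken from the package:
keywordGroupEnabled() and its parameters are hypothetical names, and it only
models the rule described in this section. A group with "ifdef" is active
when the name is listed in "defines"; a group with "ifndef" is active when it
is not.

```php
<?php
/**
 * Illustrative model only, not Text_Highlighter code.
 *
 * @param array       $defines names passed in the 'defines' option
 * @param string|null $ifdef   value of the ifdef attribute, if any
 * @param string|null $ifndef  value of the ifndef attribute, if any
 * @return bool whether the keyword group would take part in highlighting
 */
function keywordGroupEnabled(array $defines, $ifdef = null, $ifndef = null)
{
    if ($ifdef !== null && !in_array($ifdef, $defines, true)) {
        return false; // the required define is missing
    }
    if ($ifndef !== null && in_array($ifndef, $defines, true)) {
        return false; // the define that switches the group off is present
    }
    return true; // unconditional, or all conditions are satisfied
}

$options = array('defines' => array('java.builtins'));

// ifdef="java.builtins" with the define present
var_dump(keywordGroupEnabled($options['defines'], 'java.builtins'));
// ifdef="java.builtins" with no defines at all
var_dump(keywordGroupEnabled(array(), 'java.builtins'));
// ifndef="java.builtins" with the define present
var_dump(keywordGroupEnabled($options['defines'], null, 'java.builtins'));
```

The three calls print bool(true), bool(false) and bool(false), in that order.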
Class generation
================

Creating the XML description of the highlighting rules is the most
complicated part of the process. To generate the class, you need just a few
lines of code:

    <?php
    require_once 'Text/Highlighter/Generator.php';
    // Load the XML rules for the language
    $generator = new Text_Highlighter_Generator('php.xml');
    // Build the PHP source of the highlighter class
    $generator->generate();
    // Write the generated class to a file
    $generator->saveCode('PHP.php');
    ?>


Command-line class generation tool
==================================

The example from the previous section looks pretty simple, but it does not
handle any errors that may occur while parsing the XML source. The package
provides a command-line script that makes generating classes even simpler
and takes care of possible errors. It is called generate (on Unix/Linux) or
generate.bat (on Windows). This script can process multiple files in one run,
and can also read XML from standard input and write the generated code to
standard output.

Usage:
    generate options

Options:
    -x filename, --xml=filename
        Source XML file. Multiple input files can be specified, in which
        case each -x option must be followed by -p unless -d is specified.
        Defaults to stdin.
    -p filename, --php=filename
        Destination PHP file. Defaults to stdout. If specified multiple
        times, each -p must follow -x.
    -d dirname, --dir=dirname
        Default destination directory. File names will be taken from the XML
        input ("lang" attribute of the <highlight> tag).
    -h, --help
        This help.

Examples

    Read from php.xml, write to PHP.php

        generate -x php.xml -p PHP.php

    Read from php.xml, write to standard output

        generate -x php.xml

    Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php

        generate -x php.xml -p PHP.php -x xml.xml -p XML.php

    Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write
    to /some/dir/XML.php (assuming that xml.xml contains
    <highlight lang="xml">, and php.xml contains <highlight lang="php">)

        generate -x php.xml -x xml.xml -d /some/dir/


Renderers
=========

Introduction
------------

Text_Highlighter supports renderers. Using renderers, you can get output in
different formats. Two renderers are included in the package:

    - HTML renderer. Generates HTML output. A style sheet should be linked to
      the document to display colored text.

    - Console renderer. Can be used to output highlighted text to
      color-capable terminals, either directly or through less -r.

Renderers API
-------------

Renderers are subclasses of Text_Highlighter_Renderer. A renderer should
override at least two methods: acceptToken and getOutput. Overriding the
other methods is optional, depending on the nature of the renderer's output
and the details of its implementation.

    string reset()
        Resets the renderer state. This method is called every time before a
        new source file is highlighted.

    string preprocess(string $code)
        Preprocesses code. Can be used, for example, to normalize whitespace
        before highlighting. Returns the preprocessed string.

    void acceptToken(string $group, string $content)
        The core method of the renderer. The highlighter passes chunks of
        text to this method in $content, and the color group in $group.

    void finalize()
        Signals the renderer that no more tokens are available.

    mixed getOutput()
        Returns the generated output.

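To make the API above more concrete, here is a minimal sketch of a custom
renderer that wraps every token in a [group]...[/group] marker. The class
name BracketRenderer and the output format are invented for this example. So
that the snippet runs on its own, it declares a bare stand-in for
Text_Highlighter_Renderer when the package is not loaded. That stand-in is an
assumption for demonstration purposes; in real use you would load the base
class from the package instead, and the real base class may differ in detail.

```php
<?php
// Stand-in base class so the sketch runs without the package installed.
// This is an assumption for demonstration, not the real implementation.
if (!class_exists('Text_Highlighter_Renderer')) {
    class Text_Highlighter_Renderer
    {
        function reset() {}
        function preprocess($code) { return $code; }
        function acceptToken($group, $content) {}
        function finalize() {}
        function getOutput() {}
    }
}

// Hypothetical renderer: marks each chunk with its color group name.
class BracketRenderer extends Text_Highlighter_Renderer
{
    protected $marked = '';

    // Called before each new source is highlighted.
    function reset()
    {
        $this->marked = '';
    }

    // Receives one chunk of text together with its color group.
    function acceptToken($group, $content)
    {
        $this->marked .= '[' . $group . ']' . $content . '[/' . $group . ']';
    }

    // Returns everything collected since the last reset().
    function getOutput()
    {
        return $this->marked;
    }
}

// Simulate the calls a highlighter would make for the source "echo 1;".
$renderer = new BracketRenderer();
$renderer->reset();
$renderer->acceptToken('reserved', 'echo');
$renderer->acceptToken('code', ' ');
$renderer->acceptToken('number', '1');
$renderer->acceptToken('code', ';');
$renderer->finalize();
echo $renderer->getOutput(), "\n";
```

Only acceptToken and getOutput are strictly needed here; reset is overridden
as well so that the same renderer object can be reused for several sources.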
Setting renderer options
------------------------

Renderers accept an optional argument to their constructor: an options array.
The elements of this array are renderer-specific.

HTML renderer
-------------

The HTML renderer produces HTML output with optional line numbering. The
renderer itself does not provide information about the actual colors of the
highlighted text. Instead, <span class="hl-XXX"> is used, where XXX is
replaced with the color group name (hl-var, hl-string, etc.). It is up to you
to create a CSS stylesheet. If the 'use_language' option is passed with a
value evaluating to true, class names will be formatted as "LANG-hl-XXX",
where LANG is the language name as defined in the highlighter XML source
("lang" attribute of the <highlight> tag) in lower case.

There are 3 special CSS classes:

    hl-main   - applies to the whole output or to the right table column,
                depending on the 'numbers' option
    hl-gutter - applies to the left column in the table
    hl-table  - applies to the whole table

The HTML renderer accepts the following options (all of them optional):

    * numbers - line numbering style.
        0 - no numbering (default)
        HL_NUMBERS_LI - use <ol></ol> for line numbering
        HL_NUMBERS_TABLE - create a 2-column table, with line numbers in the
                           left column and highlighted text in the right
                           column

    * tabsize - tabulation size. Defaults to 4

Example:

    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);

Console renderer
----------------

The console renderer produces output for display on a color-capable terminal,
either directly or through less -r, using ANSI escape sequences. By default,
this renderer only highlights the most common color groups. Additional colors
can be specified using the 'colors' option. This renderer also accepts the
'numbers' option (a boolean value) and the 'tabsize' option.

Example:

    require_once 'Text/Highlighter/Renderer/Console.php';
    $colors = array(
        'prepro' => "\033[35m",
        'types'  => "\033[32m",
    );
    $options = array(
        'numbers' => true,
        'tabsize' => 8,
        'colors'  => $colors,
    );
    $renderer = new Text_Highlighter_Renderer_Console($options);

ANSI color escape sequences have the following format:

    ESC[#;#;....;#m

where ESC is the character with ASCII code 27 (033 octal, 0x1B hexadecimal),
and each # is one of the following:

     0 for normal display
     1 for bold on
     4 underline (mono only)
     5 blink on
     7 reverse video on
     8 nondisplayed (invisible)
    30 black foreground
    31 red foreground
    32 green foreground
    33 yellow foreground
    34 blue foreground
    35 magenta foreground
    36 cyan foreground
    37 white foreground
    40 black background
    41 red background
    42 green background
    43 yellow background
    44 blue background
    45 magenta background
    46 cyan background
    47 white background

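As a small illustration of the escape sequence format described in this
section, the hypothetical helper below (ansiSequence() is not part of the
package) joins attribute codes into one sequence. The resulting strings have
the same form as the values of the 'colors' option in the console renderer
example.

```php
<?php
/**
 * Builds an ANSI escape sequence such as "\033[1;31m" from a list of codes.
 * Hypothetical helper for illustration; not provided by Text_Highlighter.
 */
function ansiSequence(array $codes)
{
    return "\033[" . implode(';', $codes) . 'm';
}

$colors = array(
    'prepro' => ansiSequence(array(35)),    // magenta foreground
    'types'  => ansiSequence(array(1, 32)), // bold, green foreground
);

// Print a sample in bold red on a color-capable terminal, then switch
// back to normal display with code 0.
echo ansiSequence(array(1, 31)), 'warning', ansiSequence(array(0)), "\n";
```

Combining several codes in one sequence, as in array(1, 32), is what the
#;#;...;# part of the format allows.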
How to use Text_Highlighter class
=================================

Creating a highlighter object
-----------------------------

To create a highlighter for a certain language, use the
Text_Highlighter::factory() static method:

    require_once 'Text/Highlighter.php';
    $hl = Text_Highlighter::factory('php');

Setting a renderer
------------------

The actual output is produced by a renderer.

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);

Note that for backward compatibility reasons, it is possible to use the
highlighter without setting a renderer. If no renderer is set, the HTML
renderer will be used by default. In this case, you should pass the options
as the second parameter to the factory method. The following example works
exactly like the previous one:

    require_once 'Text/Highlighter.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $hl = Text_Highlighter::factory('php', $options);

Getting output
--------------

And finally, do the highlighting and get the output:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);
    $html = $hl->highlight(file_get_contents('example.php'));

# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */