CodeColor - JavaScript library for language syntax highlighting

This is just a preview of the library, which I have numbered version 0.01. The code is not finalized and is subject to change. At this point only five languages are implemented: C#, VB.Net, JavaScript, XML and HTML.
I plan to add support for other languages and the ability to switch from one language to another.
You may find that the colors below look strange. I chose them only to make the different types of constructs stand out. All colors are customizable through CSS, and future versions will provide more consistent colors.

How did it start?

I wanted to use FlexWiki for project documentation. One feature I was missing was color highlighting for code snippets. Since I could not find an out-of-the-box solution, and given my interest in parsing technologies, I decided to create the library myself. I hope that one day it will be integrated into the FlexWiki distribution.

Requirements

  • The implementation should be in JavaScript and work in IE and FF.
  • New language definitions should be easy to add.
  • The color scheme definition should be close to that of Visual Studio 2005.
  • Switching between languages should be supported.
  • The code should be small, fast and easy to read.

    How does it work?

    A CodeColor language definition contains a number of states. Each state may have a number of rules, with patterns written as regular expressions. CodeColor creates one combined RegExp for each state:
    /(rule1_pattern)|(rule2_pattern)|...|(rulen_pattern)/m
    
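    The parsing algorithm relies on the fact that a RegExp match returns the full match together with all partial (group) matches. Since only one of the partial matches will be non-empty in this case, I use its position to find the matching rule and apply its style. This approach requires every group inside a rule pattern to be written in the non-capturing form "(?:..)"; a capturing group would shift the numbering of the partial matches, and the engine would not work.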
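
    The idea can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the actual CodeColor source: the rule set, the names buildStateRegExp, highlight and escapeHtml, and the span/class output format are all hypothetical, and it handles a single state only. The "g" flag is added next to "m" so that exec can be called repeatedly.

```javascript
// Hypothetical rules for one state; names and patterns are illustrative only.
// Groups inside a pattern are non-capturing, so rule i always maps to group i + 1.
var rules = [
  { style: "comment", pattern: /\/\/.*$/ },
  { style: "string",  pattern: /"(?:\\.|[^"\\\n])*"/ },
  { style: "keyword", pattern: /\b(?:if|else|var|return)\b/ },
  { style: "number",  pattern: /\b\d+(?:\.\d+)?\b/ }
];

// Build /(rule1)|(rule2)|...|(rulen)/gm from the rule patterns.
function buildStateRegExp(ruleList) {
  var parts = [];
  for (var i = 0; i < ruleList.length; i++) {
    parts.push("(" + ruleList[i].pattern.source + ")");
  }
  return new RegExp(parts.join("|"), "gm");
}

function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap every token in a span whose class is the style of the rule that matched.
function highlight(text) {
  var combined = buildStateRegExp(rules);
  var out = "";
  var last = 0;
  var match;
  while ((match = combined.exec(text)) !== null) {
    if (match[0].length === 0) { // guard against zero-length matches
      combined.lastIndex++;
      continue;
    }
    out += escapeHtml(text.slice(last, match.index));
    for (var i = 0; i < rules.length; i++) {
      // Exactly one group is non-empty: its index identifies the rule.
      if (match[i + 1]) {
        out += '<span class="' + rules[i].style + '">' + escapeHtml(match[0]) + "</span>";
        break;
      }
    }
    last = combined.lastIndex;
  }
  return out + escapeHtml(text.slice(last));
}

console.log(highlight('var x = 10; // note'));
```

    If a pattern such as the keyword rule used a capturing group "(if|else|var|return)" instead, the number rule would no longer sit at group 4, and the loop above would apply the wrong style.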

    References

    The project was inspired by an article by Jonathan de Halleux. I played with Espresso and RegExp Tester to develop the algorithm for identifying matching rules. I also looked at the implementations of color highlighting rules in #develop and SyntaxBox. I used a number of language specification documents: C# 1.2, VB.Net 8.0, JavaScript, XML, HTML. For hyperlink processing I used information from Regex for URLs.

    Samples

    C# Sample

    /* block comment */
    # define test
    #define test2
    class Test<T>
    {
    }
    /// <summary>This is doc comment http://notebar.com
    /// This is doc comment</summary>
    /// <param name="paramName">This is a param</param>
    if (x == "test" && y < 2)
    {
    	s = "string with \" escapes \\ more \n another \r tab \t";
    	s = "not complete should finish at the end of line
    	"this must not affect previous line";
    
    	s = @"special C# string
    	   can be multiline http://notebar.com
    	   use "" to include quot";
    
    	// this is linecomment
    	b = 3;
    	char c = 'c';
    }
    else
    {
    	/* this is block
    	comment */
    	/* block comment may have * or / */
    	b = 4;
    	@int = 1; //Identifier
    	int[] hex = new int[] {0x12F, 0X23ADUL, 0xFFUL};
    	double[] d = new double[] {.12, 12.33, 1e-2, .1e+22, 1F};
    }
    

    VB.Net Sample

    #Const TestMode = "Test"
    #If TestMode = "Test" Then
    'Some statements
    #End If
    ''' <summary>
    ''' Test Doc comments <seealso cref="test">sss</seealso>
    ''' </summary>
    ''' <remarks></remarks>
    Public Class Test
    	Sub TestSub()
    		Dim Text, Dim$, _test, [if]
    		Text = "aaa" & "aaa" & "string "" string "
    		Dim n As Integer, d As DateTime
    		n = 1
    		n = &HCCAAFF
    		n = &O1234
    		n = .234
    		n = 34E-12
    		n += 2
    		d = #12/12/1970#
    		d = # 12:30 #
    		d = # 12/12/1970 12:30 AM #
    
    		'comment
    		REM comment
    
    		If Test Then
    
    		End If
    	End Sub
    End Class
    

    JavaScript Sample

    /* Block comment */
    var str1 = "test's test";
    var str2 = 'test for "test"';
    function Text(s)
    {
    	return s.replace(/(\-|\+|\*|\?|\(|\)|\[|\]|\\|\$|\^|\!)/g, "\\$1");
    }
    
    function ProcessDocument()
    {
    	var preList = document.getElementsByTagName("pre");
    	for (var i = 0, len = preList.length; i < len; i++)
    	{
    		var pre = preList[i];
    		var lang = languages[(pre.lang) ? pre.lang.toLowerCase() : ""];
    		if (lang)
    		{
    			var coloredText = lang.ProcessText(pre.innerHTML + "");
    			if (pre.outerHTML) //HACK: IE does not preserve end of lines.
    			{
    				pre.outerHTML = "<pre lang=" + pre.lang + ">" + coloredText + "</pre>";
    			}
    			else
    			{
    				pre.innerHTML = coloredText;
    			}
    		}
    	}
    }
    

    XML Sample

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <!DOCTYPE document [
    <!ELEMENT document (node*)>
        <!ATTLIST document WMSNameSpaceVersion CDATA "2.0">
    
    <!ELEMENT node (node*)>
        <!ATTLIST node name CDATA #REQUIRED>
        <!ATTLIST node opcode ( create | remove | setval | clearval | rename | movebefore ) #REQUIRED>
        <!ATTLIST node secure ( true | false ) #IMPLIED>
        <!ATTLIST node type ( string | boolean | int32 | binary | int64 ) #IMPLIED>
        <!ATTLIST node value CDATA #IMPLIED>
    ]>
    
    Text
    <![CDATA[ cdata text ]]>
    <!-- Comment
    Comment -->
    <?mso-application progid="Word.Document"?>
    <test attr="value" attr="ssss">
      <inner/>
    </test>
    Text &nbsp; text &#160; text &#x12FA; &#x12FGA;
    

    HTML Sample

    Text
    <!-- Comment
    Comment -->
    Text
    <table cellpadding=0 enabled cellspacing="0">
    <tr>
    	<td> Sample text  <br/><br>another line </td>
    </tr>
    </table>
    


    License for the CodeColor.
    (c) 2006 Vladimir Morozov

    Last updated: 04/24/2006