Regular Expressions and Generating XML

OK, I’ve been away for a while. Time to return, although the real reason I am here is to explore what is new.

The latest task I am working on at JoVE is parsing citations. So I am forced to recall what I know about regular expressions and how to apply them to parsing in VBScript.

So…  first, what resources are there?

WTF?  Why is it impossible to find good resources?


OK, this one covers some ground, but, for the life of me, I can’t figure out how to use submatches.  You know, the ones with parentheses.

<figuring out>

Here is the basic code:

    Dim re, res, r, s
    Set re = New RegExp

    With re
        .Global = True
        .IgnoreCase = True
        .Pattern = my_pattern
    End With

    Set res = re.Execute(my_citation_string)

    Dim author_str, title, source, pages, year

    For Each r In res
        ' Show the whole match, then every captured group
        Result.value = Result.value & r & "  "
        For Each s In r.SubMatches
            SubRes.value = SubRes.value & "Matched: " & s & vbCrLf
        Next
        ' SubMatches is zero-based: one entry per pair of parentheses in the pattern
        author_str = r.SubMatches(0)
        title = r.SubMatches(1)
        source = r.SubMatches(2)
        pages = r.SubMatches(3)
        year = r.SubMatches(4)
    Next

So this seems to work fairly well.
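
To make the submatch business concrete, here is a minimal sketch with an actual pattern filled in. The citation format, the pattern, and the ParseCitation name are all assumptions made up for illustration; this is not the pattern I am using on real citations, and real ones are much messier. Each pair of parentheses in the pattern becomes one entry in SubMatches, counted from zero in the order the opening parentheses appear.

```vbscript
' Hypothetical pattern for citations shaped like:
'   Authors. Title. Source. 12-18 (2009).
' Five pairs of capturing parentheses, so SubMatches(0) through SubMatches(4).
Const CITATION_PATTERN = "^(.+?)\.\s+(.+?)\.\s+(.+?)\.\s+(\d+-\d+)\s+\((\d{4})\)\.?$"

' Returns the five captured fields as an array (authors, title, source,
' pages, year), or an empty array when the text does not fit the pattern.
Function ParseCitation(text)
    Dim re, matches
    Set re = New RegExp
    re.Global = False       ' one citation per string, so the first match is enough
    re.IgnoreCase = True
    re.Pattern = CITATION_PATTERN

    Set matches = re.Execute(text)
    If matches.Count = 0 Then
        ParseCitation = Array()
        Exit Function
    End If

    Dim fields(4), i
    For i = 0 To matches(0).SubMatches.Count - 1
        fields(i) = matches(0).SubMatches(i)
    Next
    ParseCitation = fields
End Function

' Usage with a made-up citation
Dim parts
parts = ParseCitation("Smith J, Doe A. Parsing citations with regular expressions. J Example Res. 12-18 (2009).")
' parts(0) holds the author string, parts(3) the page range, parts(4) the year
```

The lazy quantifiers (.+?) matter here: a greedy .+ would swallow everything up to the last period and leave the title and source groups with the wrong text.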

Now it’s a question of parsing sub-sections, which works the same way as above, and depositing the results into XML.  Usually in such circumstances I would just output strings but, frankly, I think that is a bad idea.  So it is time to switch to the XML DOM.  Again, looking around, I had trouble finding documentation, but finally came across this:


So, to add a child, it looks like the code would be something like this:

    Dim xmlDoc
    Set xmlDoc = CreateObject("Microsoft.XMLDOM")
    xmlDoc.loadXML "<root></root>"

    Dim root
    Set root = xmlDoc.documentElement

    ' Append an empty <test> element under <root>, then show the serialized result
    root.appendChild xmlDoc.createElement("test")
    Target.Message root.xml
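
Putting the two halves together, a rough sketch of depositing the parsed fields into XML might look like the following. The element names (citation, authors, author and so on) and the helper names AppendTextElement and CitationToXml are hypothetical, and splitting authors on commas is an assumption about the citation format. The reason the DOM beats gluing strings together is that assigning to .text leaves the escaping of characters like & and < to the DOM when the document is serialized.

```vbscript
' Appends <tagName>textValue</tagName> under parentNode.
' Assigning to .text (instead of concatenating markup) means special
' characters are escaped by the DOM on serialization.
Sub AppendTextElement(xmlDoc, parentNode, tagName, textValue)
    Dim node
    Set node = xmlDoc.createElement(tagName)
    node.text = textValue
    parentNode.appendChild node
End Sub

' Builds a <citation> element from already-parsed fields and returns its XML.
' All element names here are made up for illustration.
Function CitationToXml(authorStr, title, source, pages, pubYear)
    Dim xmlDoc, root, authors, re, m, authorName
    Set xmlDoc = CreateObject("Microsoft.XMLDOM")
    xmlDoc.loadXML "<citation></citation>"
    Set root = xmlDoc.documentElement

    ' Sub-section parsing: one <author> element per comma-separated name
    Set authors = xmlDoc.createElement("authors")
    root.appendChild authors
    Set re = New RegExp
    re.Global = True            ' collect every name, not just the first
    re.Pattern = "[^,]+"
    For Each m In re.Execute(authorStr)
        authorName = Trim(m.Value)
        If Len(authorName) > 0 Then
            AppendTextElement xmlDoc, authors, "author", authorName
        End If
    Next

    AppendTextElement xmlDoc, root, "title", title
    AppendTextElement xmlDoc, root, "source", source
    AppendTextElement xmlDoc, root, "pages", pages
    AppendTextElement xmlDoc, root, "year", pubYear

    CitationToXml = root.xml
End Function

' Usage with made-up values; note the ampersand in the title
Dim citationXml
citationXml = CitationToXml("Smith J, Doe A", "Cells & Tissues", "J Example Res", "12-18", "2009")
```

In the resulting string the title should come out as Cells &amp;amp; Tissues inside its element, which is exactly the kind of detail that is easy to forget when building XML by hand.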

