
Regular Expressions and Generating XML

OK, I’ve been away for a while. Time to return, although the reason I am here is more to explore what is new around here.

The latest task I am working on at JoVE is parsing citations. So I am forced to recall what I know about regular expressions and how to apply them to parsing in VBScript.

So…  first, what resources are there?

WTF?  Why is it impossible to find good resources?

http://msdn.microsoft.com/en-us/library/yab2dx62(VS.85).aspx

OK, this one covers some ground, but, for the life of me, I can’t figure out how to use submatches.  You know, the ones with parentheses.

<figuring out>

Here is the basic code:

    Dim re, res, r, s
    Set re = New RegExp

    With re
        .Global = True        ' find every match, not just the first
        .IgnoreCase = True
        .Pattern = my_pattern
    End With

    Set res = re.Execute( my_citation_string )

    Dim author_str, title, source, pages, year

    For Each r In res
        Result.value = Result.value & r & "  "
        ' SubMatches holds the text captured by each parenthesized group
        For Each s In r.SubMatches
            SubRes.value = SubRes.value & "Matched: " & s & VbCrLf
            'r.SubMatches.Item(0)
        Next
        ' SubMatches is zero-based: group 1 in the pattern is index 0
        author_str = r.SubMatches(0)
        title = r.SubMatches(1)
        source = r.SubMatches(2)
        pages = r.SubMatches(3)
        year = r.SubMatches(4)
    Next

So this seems to work fairly well.
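
For what it is worth, here is a minimal sketch of what my_pattern could look like. The citation layout (authors, title, source, pages, then the year in parentheses, separated by periods) and the sample string are made up for illustration and are not the real data. WScript.Echo assumes the script runs under cscript rather than inside a page.

    Dim re, matches, m, sample
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True

    ' Hypothetical layout: Authors. Title. Source. Pages (Year)
    ' Each parenthesized group becomes one entry in SubMatches.
    re.Pattern = "(.+?)\.\s+(.+?)\.\s+(.+?)\.\s+(\d+-\d+)\s+\((\d{4})\)"

    ' Made-up sample citation, not real data.
    sample = "Smith J, Doe A. A study of things. J Vis Exp. 12-34 (2008)"

    Set matches = re.Execute(sample)
    For Each m In matches
        WScript.Echo "authors: " & m.SubMatches(0)
        WScript.Echo "title:   " & m.SubMatches(1)
        WScript.Echo "source:  " & m.SubMatches(2)
        WScript.Echo "pages:   " & m.SubMatches(3)
        WScript.Echo "year:    " & m.SubMatches(4)
    Next

The lazy quantifiers (.+?) keep each group from swallowing the periods that separate the fields. A real citation list will almost certainly need a looser pattern than this one.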

Now it’s a question of parsing the sub-sections, which works the same way as above, and depositing the results into XML.  Usually in such circumstances I would just output strings but, frankly, I think this is a bad idea.  So it is time to switch to the XML DOM.  Again, looking around, I had trouble finding documentation, but finally came across this:

http://msdn.microsoft.com/en-us/library/ms764730(VS.85).aspx

So, to add a child, it looks like the code would be something like this:

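    Dim xmlDoc
    Set xmlDoc = CreateObject( "Microsoft.XMLDOM" )
    xmlDoc.loadXML( "<root></root>" )
    Dim root
    Set root = xmlDoc.documentElement
    ' Create an empty <test> element and attach it under <root>
    root.appendChild xmlDoc.createElement( "test" )
    ' root.xml is the serialized markup of the element
    Target.Message root.xml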
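
Taking this one step further, here is a rough sketch of how the parsed citation fields could be deposited into the DOM. The element names (citations, citation, authors and so on), the AddTextChild helper, and the sample values are all my own placeholders, and WScript.Echo again assumes the script runs under cscript.

    Dim xmlDoc, root, citation
    Set xmlDoc = CreateObject("Microsoft.XMLDOM")
    xmlDoc.loadXML "<citations></citations>"
    Set root = xmlDoc.documentElement

    ' Hypothetical helper: appends <name>text</name> under parent.
    Sub AddTextChild(doc, parent, name, text)
        Dim node
        Set node = doc.createElement(name)
        ' Setting .text lets the DOM escape &, < and > on output
        node.text = text
        parent.appendChild node
    End Sub

    ' appendChild returns the node it just attached
    Set citation = root.appendChild(xmlDoc.createElement("citation"))

    ' Made-up values standing in for author_str, title, source, pages, year
    AddTextChild xmlDoc, citation, "authors", "Smith J, Doe A"
    AddTextChild xmlDoc, citation, "title", "A study of this & that"
    AddTextChild xmlDoc, citation, "source", "J Vis Exp"
    AddTextChild xmlDoc, citation, "pages", "12-34"
    AddTextChild xmlDoc, citation, "year", "2008"

    WScript.Echo xmlDoc.xml

This is the main reason to prefer the DOM over gluing strings together: special characters such as the ampersand in the title get escaped by the serializer instead of by hand.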

