OK, I’ve been away for a while. Time to return, although the reason I am here is more to explore what is new around here.
The latest task I am working on at JoVE is parsing citations. So I am forced to recall what I know about regular expressions and how to apply them to parsing by way of VBScript.
So… first, what resources are there?
WTF? Why is it impossible to find good resources?
OK, this one covers some ground, but, for the life of me, I can’t figure out how to use submatches. You know, the ones with parentheses.
Here is the basic code:
Dim re, res, r, s
Dim author_str, title, source, pages, year
Set re = New RegExp
With re
    .Global = True
    .IgnoreCase = True
    .Pattern = my_pattern
End With
Set res = re.Execute(my_citation_string)
For Each r In res
    ' The whole match
    Result.value = Result.value & r.Value & " "
    ' Each parenthesized group in the pattern becomes one submatch
    For Each s In r.SubMatches
        SubRes.value = SubRes.value & "Matched: " & s & vbCrLf
    Next
    ' SubMatches is zero-based; this assumes my_pattern has five groups
    author_str = r.SubMatches(0)
    title = r.SubMatches(1)
    source = r.SubMatches(2)
    pages = r.SubMatches(3)
    year = r.SubMatches(4)
Next
So this seems to work fairly well.
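For the record, here is a minimal sketch of what a pattern with parentheses might look like. The citation layout ("Authors. Title. Source. Pages (Year).") and the pattern itself are made up for illustration and are not the real JoVE format. It also assumes the script runs under cscript, so that WScript.Echo is available.

```vbscript
' Hypothetical citation in an assumed "Authors. Title. Source. Pages (Year)." layout
Dim sample, citeRe, citeMatches, m
sample = "Smith J, Doe A. A study of things. J Vis Exp. 12-18 (2009)."

Set citeRe = New RegExp
' Five parenthesized groups, so five submatches (indexed 0 to 4)
citeRe.Pattern = "^(.+?)\.\s+(.+?)\.\s+(.+?)\.\s+(\d+-\d+)\s+\((\d{4})\)"

Set citeMatches = citeRe.Execute(sample)
If citeMatches.Count > 0 Then
    Set m = citeMatches(0)
    WScript.Echo "Authors: " & m.SubMatches(0)   ' Smith J, Doe A
    WScript.Echo "Title:   " & m.SubMatches(1)   ' A study of things
    WScript.Echo "Source:  " & m.SubMatches(2)   ' J Vis Exp
    WScript.Echo "Pages:   " & m.SubMatches(3)   ' 12-18
    WScript.Echo "Year:    " & m.SubMatches(4)   ' 2009
Else
    WScript.Echo "No match"
End If
```

The lazy `.+?` groups stop at the first period followed by whitespace, so a source with internal periods (say "J. Vis. Exp.") would probably need a stricter pattern than this one.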
Now it’s a question of parsing the sub-sections, which works the same way as above, and depositing the results into XML. Usually in such circumstances I would just output strings but, frankly, I think that is a bad idea here. So it is time to switch to the XML DOM. Again, looking around, I had trouble finding documentation, but finally came across this:
So, to add a child, it looks like the code would be something like this:
Set xmlDoc = CreateObject("Microsoft.XMLDOM")
xmlDoc.loadXML "<root></root>"
Set root = xmlDoc.documentElement
root.appendChild xmlDoc.createElement("test")
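Putting the two halves together, depositing the parsed fields into the DOM might go something like the sketch below. The element names (citations, citation, authors and so on), the AddField helper and the sample values are all my own placeholders, not an established schema. Again this assumes cscript for WScript.Echo.

```vbscript
Dim doc, rootNode, citeNode
Set doc = CreateObject("Microsoft.XMLDOM")
doc.loadXML "<citations></citations>"
Set rootNode = doc.documentElement

' Hypothetical helper: append <name>value</name> under parent
Sub AddField(parent, name, value)
    Dim node
    Set node = doc.createElement(name)
    node.text = value          ' the DOM escapes &, < and > on output
    parent.appendChild node
End Sub

' One <citation> element per parsed citation
Set citeNode = doc.createElement("citation")
rootNode.appendChild citeNode

' Placeholder values; in practice these would come from SubMatches
AddField citeNode, "authors", "Smith J, Doe A"
AddField citeNode, "title", "Cats & dogs"
AddField citeNode, "source", "J Vis Exp"
AddField citeNode, "pages", "12-18"
AddField citeNode, "year", "2009"

' Serialize the whole document
WScript.Echo doc.xml
```

The ampersand in the title is the reason to bother with the DOM at all: set through the text property it gets escaped when the document is serialized, whereas with plain string output I would have to remember to do that myself.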