Today we had to find an incorrectly nested tag somewhere in a couple of dozen thousand lines of generated XML. My first thought was “this sounds like a job for XPath!”, so I hacked together a quick Ruby script to look for instances of specified pairs of tags. As an exercise I removed all the hardcoded paths and added command line parameters instead, in case such a need should ever arise again.

So behold: findtags.rb

You can get it from my GitHub at https://github.com/dmison/Assorted-Scripts/. It’s pretty simple and is easy to break by passing in dodgy parameters.

It’s a Ruby script that searches a specified set of files for one specified XML element that contains a second specified XML element. If no second tag is specified, the first tag is used for both, so the script looks for that element nested inside itself.
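To give a feel for the XPath idea described above, here is a minimal sketch rather than the actual contents of findtags.rb. It assumes REXML (bundled with standard Ruby installs) and assumes "contains" means "anywhere inside", not only direct children. The helper name `find_nested` and the sample XML are made up for illustration.

```ruby
require "rexml/document"

# Hypothetical helper, not taken from findtags.rb: returns every <inner>
# element found anywhere inside an <outer> element. If no inner tag is
# given, the outer tag is used for both.
def find_nested(xml, outer, inner = outer)
  doc = REXML::Document.new(xml)
  # "//outer//inner" selects <inner> elements that are descendants of <outer>.
  REXML::XPath.match(doc, "//#{outer}//#{inner}")
end

sample = <<~XML
  <book>
    <para>Outer <para>inner</para></para>
    <para>Fine on its own</para>
  </book>
XML

# Print the location of each offending element.
find_nested(sample, "para").each do |element|
  puts element.xpath
end
```

Interpolating the tag names straight into the XPath string is also why a script like this is easy to break with dodgy parameters.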

findtags.rb [options]
    -p, --pattern "PATTERN"          File pattern to match.  Defaults to *
    -f, --firsttag TAG               first tag to search for
    -s, --secondtag TAG              second tag to search for
    -h, --help                       Display help
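For the curious, options like these could be declared with Ruby's standard `optparse` library. This is only a sketch under that assumption: the real script may parse its arguments differently, and the method name `parse_options` and the hash keys are made up for illustration.

```ruby
require "optparse"

# Hypothetical parser mirroring the options listed above;
# the real findtags.rb may be structured differently.
def parse_options(argv)
  options = { pattern: "*" } # the file pattern defaults to *

  parser = OptionParser.new do |opts|
    opts.banner = "findtags.rb [options]"
    opts.on("-p", "--pattern PATTERN", "File pattern to match.  Defaults to *") { |v| options[:pattern] = v }
    opts.on("-f", "--firsttag TAG", "first tag to search for") { |v| options[:first] = v }
    opts.on("-s", "--secondtag TAG", "second tag to search for") { |v| options[:second] = v }
    opts.on("-h", "--help", "Display help") do
      puts opts
      exit
    end
  end
  parser.parse!(argv)

  # With no second tag, fall back to the first tag for both.
  options[:second] ||= options[:first]
  options
end

p parse_options(%w[-f entry -s command])
```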

Usage examples:

  1. find instances of <para> inside of other <para>s in XML files

     [localhost]$ findtags.rb -p "/home/dmison/testbook/en-US*.xml" -f para
    
  2. find instances of <command> in <entry> in any file in current working directory

     [localhost]$ findtags.rb -f "entry" -s "command"