Regular Expressions in Notepad++ or SciTE
I like Notepad++ and SciTE (on Windows and Linux, respectively) and use them as my editors of choice for many projects. Both are built on the Scintilla editing component. Today, I was going through some HTML forms, the kind with a multitude of option elements for state, country, etc. Imagine you had to get all of the state and country names out of those option lists. There are plenty of ways to do it, but a regex find and replace in Notepad++ makes it pretty easy. (Granted, you could do this in just about any editor worth its salt, but here I'd like to show you how to do it in Notepad++.)
OK, so you have a list of HTML options and you want to get the values without the HTML:
<option value="AFGHANISTAN">AFGHANISTAN</option><option value="ALBANIA">ALBANIA</option><option value="ALGERIA">ALGERIA</option>
You could go lift a list off of someone else's website, or you could easily get the values with Notepad++. Just paste the HTML into Notepad++, then hit Ctrl+H or go to Search -> Replace.
- Under Search Mode in the lower left, select Regular expression.
- Type this regex in the Find box: <option value="[0-9a-zA-Z_&. -]+"> (the hyphen goes last inside the brackets so it matches a literal "-" instead of forming a range)
- Make sure the Replace box is empty. (You want to replace the opening tags with nothing.)
- Select Replace All.
- Switch the Search Mode to Extended (\n, \r, \t, \0, \x...).
- Type </option> in the Find box.
- Type \r\n in the Replace box.
- Select Replace All.
Hooray, now you have a list of countries (or whatever you had options for), with each one on its own line.
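If you want to sanity-check the two replacements outside the editor, here is a minimal Python sketch that applies the same steps to the sample options above using the standard re module. It is only an illustration of the technique, not how Notepad++ works internally; it uses "\n" rather than the "\r\n" Windows line ending from the steps, and a plain string replace stands in for the Extended-mode step.

```python
import re

# Sample input from the article: three options on a single line.
html = ('<option value="AFGHANISTAN">AFGHANISTAN</option>'
        '<option value="ALBANIA">ALBANIA</option>'
        '<option value="ALGERIA">ALGERIA</option>')

# Step 1 (Regular expression mode): delete every opening <option ...> tag.
# The hyphen is last in the character class, so it is a literal "-".
step1 = re.sub(r'<option value="[0-9a-zA-Z_&. -]+">', '', html)

# Step 2 (Extended mode): turn every closing tag into a line break.
# This is a literal, non-regex replacement, like Extended search.
step2 = step1.replace('</option>', '\n')

print(step2)
```

Running it prints AFGHANISTAN, ALBANIA and ALGERIA, one per line.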
Regex with find and replace
Now to get a little more advanced, let's say we want to put part of the match in a capture group so we can reuse it in the replacement. Fortunately, this is quite easy to do. Let's take an example: say we have a list of e-mail addresses such as this:
and the list goes on. Now, let's say we want to surround each e-mail address with quotes and place a comma after it, basically putting the list in a CSV-style format. If you had hundreds of e-mails this would be a lot of typing, but with a regex replace it is simple. Go to Find & Replace (Ctrl+H) and make sure the regular expression option is selected. For the Find box you can use:
You should recognize the middle part in the brackets as a list of characters to match, as we did above. What's new here is that we now have a capture group, marked by parentheses. In SciTE you have to escape the parentheses with backslashes, as noted above. Notepad++ 6.0 and later use Perl-compatible regular expressions, where plain parentheses form the group, so if the above doesn't work in Notepad++, try removing the \ characters. Now, in the Replace box you will put:
The \1 refers to the first capture (and our only capture in this example) from the find query. Our find query matched the e-mail addresses, so each address will now be replaced with itself wrapped in quotes and followed by a comma, and our list now looks like:
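The capture-and-backreference idea can also be tried outside the editor. The Python sketch below is only an illustration: the addresses are made-up placeholders, and the character class in the pattern is an assumption, not necessarily the exact Find expression used in this post. Python's re module groups with plain, unescaped parentheses and accepts \1 in the replacement string.

```python
import re

# Hypothetical placeholder addresses, one per line.
emails = """alice@example.com
bob.smith@example.org
carol-j@example.net"""

# Assumed pattern: one capture group around a run of address characters.
# Newlines are not in the class, so each line is a separate match.
pattern = r'([0-9a-zA-Z_.@-]+)'

# \1 in the replacement inserts the text captured by group 1.
csv_style = re.sub(pattern, r'"\1",', emails)

print(csv_style)
```

Because the newline is outside the character class, every address gets its own quotes and trailing comma while the line breaks are left untouched.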