I had a long list (57 pages!) of Latin species names, sorted into alphabetical order. I’d separated the words so that there was only one word on each line. My next task was to go through and remove all the duplicates (i.e. a word immediately followed by the same word) so I could add the final list to my custom dictionary for species in Microsoft Word. I started doing it manually—it’s easy enough to find duplicates when the words are familiar, but for Latin words, my brain just wasn’t coping well and I was missing subtle differences like a single or double ‘i’ at the end of a word. There had to be a better way…
And there is! Good old Dr Google came to the rescue, and with a bit of fiddling to suit my circumstances (one word on each line), I got a wildcard find and replace routine to find the duplicates.
NOTE: DO NOT do a ‘replace all’ with this, in case Word makes unwanted changes. In my case it didn’t treat the second word as a whole word for matching purposes (e.g. it thought banksi, banksia, and banksii were duplicates). Even though I had to skip some of these, it was still worth it to automate much of the process. Another caveat—if you have several lines of the same word, each pair will be found, but you’ll have to run the find several times to get them all. Much better to move your cursor into Word and delete the excess multiple duplicates when you find them. You may still have to do a couple of passes over the document, but the heavy lifting will have been done for you.
Here’s what I did to get it to work:
- Press Ctrl+H to open the Find and Replace window.
- Click More, then select the Use Wildcards checkbox.
- In the Find What field, type (<*>)^013\1 (there are no spaces in this string).
- In the Replace With field, type \1 (there are no spaces in this string either).
- Click Find Next.
- When a pair of matching whole words is found, click Replace. NOTE: If the second word is only a partial match for the first word, click Find Next.
- Repeat the previous two steps (Find Next, then Replace or skip) until you’re satisfied you’ve found them all.
How this works:
- (<*>) is the first element (later represented by \1) of the find. The angle brackets specify the start and end of a word, and the ‘word’ is anything (represented by the *). In other words, you’re looking for a whole ‘word’ of any length and made up of any characters (including numbers).
- ^013 is the paragraph marker at the end of the line. In my situation, each word was on its own line with a paragraph mark at the end of the line. If your repeated words are on the same line instead, replace ^013 with a space (two repeated words in the same line are separated by a space). NOTE: Normally you can find a paragraph mark in a Find with ^p, but not with a wildcard Find; there you have to use ^013.
- \1 is the first element. In the Find, it means the duplicate of whatever was found by (<*>); in the Replace, it means replace the duplicated word with the first word found.
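If you're more at home with regular expressions than with Word's wildcards, the same idea can be sketched in Python. This is only a rough analogue, not an exact translation of Word's matching rules: `\b(\w+)` stands in for (<*>), `\n` for ^013, and `\1` is the backreference. The names `WORD_LIKE`, `STRICT`, and `find_pairs` are made up for illustration. It also reproduces the two caveats above: without a boundary after the backreference, banksi followed by banksia counts as a "duplicate", and a run of three identical lines needs more than one pass.

```python
import re

# Rough Python analogue of the Word pattern (<*>)^013\1:
# a whole word, a line break, then the same text again.
WORD_LIKE = re.compile(r"\b(\w+)\n\1")

# Same idea, with a word boundary after the backreference so that
# the second word must end where the first one did.
STRICT = re.compile(r"\b(\w+)\n\1\b")


def find_pairs(pattern, text):
    """Return the first word of every adjacent pair the pattern matches."""
    return [m.group(1) for m in pattern.finditer(text)]


sample = "banksi\nbanksia\nbanksii\nbanksii\nalba\n"

# The loose pattern also flags banksi/banksia, a partial match.
loose_hits = find_pairs(WORD_LIKE, sample)

# The strict pattern only flags the genuine banksii/banksii pair.
strict_hits = find_pairs(STRICT, sample)

# One replace pass over three identical lines still leaves a pair behind,
# because the search resumes after the text it has just matched.
one_pass = STRICT.sub(r"\1", "alba\nalba\nalba\n")
```

Here `loose_hits` contains both banksi and banksii, while `strict_hits` contains only banksii, and `one_pass` still holds two copies of alba.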
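If the list can be saved out of Word as plain text, another option is to compare whole lines instead of using a pattern at all. That avoids both the partial-match problem and the repeated passes. The sketch below is a minimal illustration under that assumption, not something I used on the original list; `drop_adjacent_duplicates` is a hypothetical helper name.

```python
def drop_adjacent_duplicates(lines):
    """Keep the first line of each run of identical consecutive lines."""
    kept = []
    for line in lines:
        word = line.strip()
        if not word:
            continue  # ignore blank lines
        # Compare whole words only, so banksi and banksia stay distinct.
        if not kept or kept[-1] != word:
            kept.append(word)
    return kept


# Assumed input: one word per line, already sorted alphabetically.
words = "banksi\nbanksia\nbanksii\nbanksii\nbanksii\nalba\n".splitlines()
unique_words = drop_adjacent_duplicates(words)
```

Because the comparison is on the whole line, a run of any length collapses to a single entry in one pass, and the result can be pasted back into Word or straight into the custom dictionary.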