Volume 5, No. 1 
January 2001

  Tibor Környei





Translation Journal
The Translator & the Computer


Using MS Word’s Advanced Find and Replace Function

by Tibor Környei

Probably few people are familiar with, and even fewer use, the advanced feature of Microsoft Word's Find and Replace function. However, this feature may often prove to be extremely helpful in the translator's work. It can be accessed from the Find and Replace dialog box and it is called, depending on the version of Word, Use pattern matching or Use wildcards. The advanced feature only works after you have checked this option. If it is not presented to you in the dialog box, click the More button.

This feature allows you to set complex search conditions by using special character combinations. The symbols to be used are listed in detail under Word's Help menu, so we shall not describe them here. The use of this feature will be shown below using a few examples. You may want to test these examples in Word using a test file.

By highlighting portions of the text, the search can be limited to that portion of the document. Word will perform the search in this portion and then will ask you whether you wish to continue to search in the rest of the text. Click the No button. Replacement can also be performed in the interactive mode by first pressing the Find button and, upon reaching the desired string, deciding whether replacement is required. If so, you press the Replace button; if not, press Find Next.

1. Eliminating extra spaces

In our work we often accidentally type two or more spaces between words. You can easily replace multiple spaces with a single space using the advanced Find and Replace feature.

After selecting the Use wildcards option, type the following in the appropriate boxes:

Find what: •{2,}
Replace with: •

The • symbol here stands for a regular space.

The {n,} notation indicates that the character or string preceding it occurs at least n times; in this case we are looking for a run of at least two spaces. The general form of the expression is {n,m}, meaning that the preceding character occurs between n and m times. In some non-English versions of the software, a semicolon is used instead of the comma. By clicking Special, you can check which separator character appears in the expression in braces that indicates the number of occurrences.
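For readers who also clean up text outside Word, the same replacement can be sketched in Python. This is only an illustration: Python's re module is a different engine from Word's wildcard search, although the {2,} quantifier happens to be written the same way. The function name collapse_spaces is our own.

```python
import re

def collapse_spaces(text: str) -> str:
    """Replace every run of two or more regular spaces with one space."""
    # " {2,}" means: a space, repeated at least twice
    return re.sub(r" {2,}", " ", text)

print(collapse_spaces("too   many  spaces here"))
```

Single spaces are left alone, because the pattern only matches runs of two or more.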

2. Changing the separator character in numerals

Numbers written in English in the format 12,345,678.12 are written in some languages in the format 12 345 678,12 (the thousands separator here is not a simple space, but a non-breaking space typed with the key combination Ctrl+Shift+space). In a document with lots of numbers, replacing the separator character manually may be a time-consuming exercise. Use the following:

Find what: ([0-9]),([0-9])
Replace with: \1^s\2

The expression in square brackets [0-9] stands for an arbitrary numerical character (digit). By placing an expression between parentheses, it can be referred to as a unit in the Replace with box. The units are numbered from left to right starting with 1. There are two such units in our example.

The expression in the Find what box means: look for any string of characters where there is one comma between any two digits.

The expression in the Replace with box means: retype the string found by inserting a non-breaking space between the digits while leaving the digits and their order unchanged. This is indicated by \1 and \2. The caret (^) with the letter s following it is the symbol of a non-breaking space. It can also be inserted by clicking Special and then Nonbreaking space. The same non-breaking space can also be inserted by typing a caret followed by the character's ANSI code, in this case 0160. This method allows any character to be inserted as long as its ANSI code is known. In this case the replace expression would look as follows:

Find what: ([0-9]),([0-9])
Replace with: \1^0160\2
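The relationship between such a code and the character it produces can be checked with a few lines of Python. The sketch assumes that "ANSI" means Windows code page 1252 (the Western European code page); the helper name ansi_char is hypothetical.

```python
def ansi_char(code: int) -> str:
    """Return the character for a Windows 'ANSI' code, assuming code page 1252."""
    return bytes([code]).decode("cp1252")

NBSP = ansi_char(160)      # the character that ^0160 stands for
SECTION = ansi_char(167)   # the section sign
EN_DASH = ansi_char(150)   # the en dash

print(repr(NBSP), SECTION, EN_DASH)
```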

In the next step, we shall replace the decimal point with a decimal comma.

Find what: ([0-9]).([0-9])
Replace with: \1,\2

The procedure is similar to the one described above; no explanation is needed.
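The two steps can be reproduced outside Word as a rough Python sketch. It is an analogue, not Word's own syntax: in Python regular expressions the period is a special character and must be escaped, whereas in the Word formula above it is typed as is. The function name english_to_continental is our own.

```python
import re

NBSP = "\u00a0"  # non-breaking space (ANSI code 0160)

def english_to_continental(text: str) -> str:
    # Step 1: a comma between two digits becomes a non-breaking space.
    text = re.sub(r"([0-9]),([0-9])", r"\1" + NBSP + r"\2", text)
    # Step 2: a period between two digits becomes a decimal comma.
    text = re.sub(r"([0-9])\.([0-9])", r"\1,\2", text)
    return text

print(english_to_continental("Total: 12,345,678.12"))
```

The order of the steps matters: if the decimal point were converted first, the new decimal comma would be mistaken for a thousands separator in the other step.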

The reverse procedure when translating into English is somewhat different:

First we change the decimal comma into a decimal point:

Find what: ([0-9]),([0-9])
Replace with: \1.\2

Then we change the non-breaking space into a comma:

Find what: ([0-9])^s([0-9])
Replace with: \1,\2

If we wish to process not only non-breaking spaces but also regular spaces functioning as thousands separators (after all, we cannot assume that the author of the original text follows proper word processing practices), we must use the expression [•^s] (open square bracket - space - caret - letter s - close square bracket) as follows:

Find what: ([0-9])[•^s]([0-9])
Replace with: \1,\2
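The reverse conversion can be sketched in Python in the same way, again only as an analogue of the Word formulas. The character class containing a regular space and a non-breaking space plays the role of [•^s]. The function name continental_to_english is hypothetical.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def continental_to_english(text: str) -> str:
    # Step 1: a comma between two digits becomes a decimal point.
    text = re.sub(r"([0-9]),([0-9])", r"\1.\2", text)
    # Step 2: a regular or non-breaking space between two digits becomes a comma.
    text = re.sub("([0-9])[ " + NBSP + "]([0-9])", r"\1,\2", text)
    return text

print(continental_to_english("12" + NBSP + "345 678,12"))
```

As in Word, any regular space that happens to stand between two digits is converted, so the result should be checked where separate numbers follow each other.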

3. Reversing ordinal plus noun

Hungarian expressions like "2. fejezet", "2. fejezetben" or "2. Fejezet" are all translated into English as "Chapter 2". The find-and-replace operation is the following:

Find what: ([0-9]{1,}).?([Ff]ejeze[a-z]{1,})
Replace with: Chapter^s\1

The first {1,} refers to any digit preceding it; the second one refers to an alphabetic character, indicating that we are looking for one or several such characters. In this way, we can search for numbers consisting of several digits and for inflected forms of words. (Leave out the last letter of the uninflected word stem.) The ? always stands for an arbitrary character; in this way it does not matter whether there is a regular space or a non-breaking space in a given position. Of course, the proper way of looking for both types of space is the use of the [•^s] expression (open square bracket - space - caret - letter s - close square bracket).

The Find what box means: look for any string of characters where an Arabic number of any length is followed by a period, then by an arbitrary character (which could also be a non-breaking space), and then by an inflected or uninflected form of the word "fejezet" or "Fejezet". The method allows for searching for both upper-case and lower-case forms.

In the Replace with expression, the number of the chapter will appear instead of \1.

The method can also be used, with a slight modification, for chapters identified by Roman numerals. For example, "II. fejezet" is to be translated as "Chapter II". The solution:

Find what: ([A-Z]{1,}).?([Ff]ejeze[a-z]{1,})
Replace with: Chapter^s\1

All we had to do in order to search for Roman numerals is replace the expression ([0-9]{1,}) with ([A-Z]{1,}).
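A Python sketch of both variants follows. It rests on two assumptions of our own: Python's [a-z] covers unaccented letters only, so the Hungarian accented letters are listed explicitly, and the names LETTER, ARABIC, ROMAN and reverse_chapter are hypothetical. In Python the period must be escaped and the arbitrary character is written as a dot.

```python
import re

NBSP = "\u00a0"                    # non-breaking space
LETTER = "[a-záéíóöőúüű]"          # lower-case letters incl. Hungarian accents
ARABIC = "[0-9]+"
ROMAN = "[A-Z]+"                   # any run of capitals; not a strict Roman-numeral check

def reverse_chapter(text: str, numeral: str = ARABIC) -> str:
    # number, period, any one character, then an inflected form of "fejezet"
    pattern = "(" + numeral + r")\..([Ff]ejeze" + LETTER + "+)"
    return re.sub(pattern, "Chapter" + NBSP + r"\1", text)

print(reverse_chapter("2. fejezetben"))
print(reverse_chapter("II. fejezet", ROMAN))
```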

In legal texts, an expression of the type "45. §" must often be replaced by an expression of the type "Section 45". This can be easily accomplished on the basis of the above explanations:

Find what: ([0-9]{1,}).[•^s]§
Replace with: Section^s\1

The expression [•^s] provides for the possibility of two types of space. The § character could also be written using its ANSI code:

Find what: ([0-9]{1,}).[•^s]^0167
Replace with: Section^s\1

Replacement in the reverse direction is also easy. In order to replace an expression of the type "Section 45" with one of the type "45. §", we can proceed as follows:

Find what: Section[•^s]([0-9]{1,})
Replace with: \1.^s§
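Both directions can be sketched in Python as well. This is an illustrative analogue rather than Word syntax, and the function names are our own.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def section_to_english(text: str) -> str:
    # "45. §" -> "Section 45" (space or non-breaking space before the § sign)
    return re.sub(r"([0-9]+)\.[ " + NBSP + "]§", "Section" + NBSP + r"\1", text)

def section_to_hungarian(text: str) -> str:
    # "Section 45" -> "45. §"
    return re.sub("Section[ " + NBSP + "]([0-9]+)", r"\1." + NBSP + "§", text)

print(section_to_english("45. §"))
print(section_to_hungarian("Section 45"))
```

Applying one function to the output of the other returns the original wording, except that the space is always a non-breaking one.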

4. Reversing currency symbols

Numbers occurring in the format $50,12 must be converted to the format 50,12 $ in the translation. Fortunately, this mechanical task can also be automated. The numbers following the dollar sign can be modified so that the $ sign will immediately follow the number after a non-breaking space.

Find what: $([0-9.,]{1,})
Replace with: \1^s$

The expression can be easily modified so that the word "dollar" (or its plural in the respective language) will appear after the replacement instead of the $ sign.

Find what: $([0-9.,]{1,})
Replace with: \1^sdollar

The Find what box means: look for any expression where a string consisting of digits, periods and commas immediately follows the $ sign. The disadvantage of this method is that any phrase or sentence ending in "$," or "$." will also be converted. We must check before using the replace function whether there is such an expression in the text to be searched. If so, the "$," and "$." expressions to be left unchanged must first be replaced by any unique expression (e.g., $comma and $dot) using the regular find-and-replace function, and they must be changed back after the replace operation. This trick can be used in general whenever the harmful side effect of an otherwise useful replace operation is to be avoided.
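The protect, replace and restore sequence can be sketched in Python as follows. It is only an analogue of the Word procedure: in Python the $ sign is a special character in the search pattern and must be escaped, and the function name move_dollar_after is hypothetical. Unlike the manual procedure, the sketch protects every "$," and "$." without checking them one by one.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def move_dollar_after(text: str) -> str:
    # Protect "$," and "$." with unique placeholders.
    text = text.replace("$,", "$comma").replace("$.", "$dot")
    # Move the $ sign behind the run of digits, periods and commas.
    text = re.sub(r"\$([0-9.,]+)", r"\1" + NBSP + "$", text)
    # Change the placeholders back.
    return text.replace("$comma", "$,").replace("$dot", "$.")

print(move_dollar_after("Prices are in $. The fee is $50,12 per page."))
```

Note that a comma or period directly after an amount is also picked up by the pattern and ends up before the $ sign, so such cases still need a manual check.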

The reverse case, when the currency symbol is to be moved from a position after the number to a position before it, is not as simple and can only be accomplished in several steps.

In a first step, the thousands separator spaces are replaced with non-breaking spaces (otherwise the $ sign would always appear before the last thousands group):

Find what: ([0-9]) ([0-9])
Replace with: \1^s\2

In a second step, the space after the number, together with the $ sign that follows it, is replaced with, for example, $$, which is thus "attached" to the number:

Find what: ([0-9])[•^s]$
Replace with: \1$$

Now the currency symbol can be moved to precede the number:

Find what: ([0-9.,^s]{1,})$$
Replace with: $\1
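The three steps can be chained in a short Python sketch. Again this is an analogue and not Word syntax; the $ sign is escaped in the search patterns, and the function name move_dollar_before is our own.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def move_dollar_before(text: str) -> str:
    # Step 1: regular spaces used as thousands separators become non-breaking.
    text = re.sub("([0-9]) ([0-9])", r"\1" + NBSP + r"\2", text)
    # Step 2: the space (of either kind) plus "$" after a digit becomes "$$".
    text = re.sub("([0-9])[ " + NBSP + r"]\$", r"\1$$", text)
    # Step 3: move the marker in front of the whole number as a single "$".
    return re.sub("([0-9.," + NBSP + r"]+)\$\$", r"$\1", text)

print(move_dollar_before("It costs 1 234,50 $ in total."))
```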

5. Handling complex expressions

When replacing complex expressions, it is convenient to first break down the expression into its components, test the replacement of the components, and then work out a replace formula for the entire expression.

Let us take the example of translating the Hungarian expression "10. § (1) bekezdésének d)-f) pontja" into English, which may be rendered as "Paragraphs d)-f) of Subsection (1) of Section 10". In this case, the order of the individual expressions is also modified.

"10. §" would be no problem on the basis of the explanations under point 3. So let us see the replacement of "(1) bekezdésének". We must do the following:

Find what: (\([0-9]{1,}\)) bekezdé[a-z]{1,}
Replace with: Subsection \1

The novelty here is the use of the "\(" and "\)" combinations. The parenthesis with a backslash before it distinguishes it as an ordinary character from the same character functioning as an operator. Since the second opening and first closing parentheses are parts of the expression we are searching for (ordinary characters, in contrast to the first opening and second closing parentheses, which function as operators), we must type them as "\(" and "\)" in the Find what box.

Let us examine the part "d)-f) pontja". The solution is the following:

Find what: ([a-z]{1,}\)?[a-z]{1,}\))?pon[a-z]{1,}
Replace with: Paragraphs \1

We used the familiar operators here. By replacing the hyphen and the space with the ? sign, we can make the replace operation properly handle a hyphen or an en dash in the first position and a regular or non-breaking space in the second.

Finally, by assembling the components of the entire expression, we obtain:

Find what: ([0-9]{1,}).?§ (\([0-9]{1,}\)) bekezdé[a-z]{1,} ([a-z]{1,}\)?[a-z]{1,}\))?pon[a-z]{1,}
Replace with: Paragraphs \3 of Subsection \2 of Section \1

It is worth noting how easily the reversal of the order in which the individual segments will appear is handled using \3, \2, etc.
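The assembled formula can be tried out in Python as a sketch. Two assumptions are ours: Python's [a-z] does not cover accented letters, so the Hungarian ones are listed explicitly in the hypothetical LETTER class, and the function name reorder_reference is invented for the example. Escaped parentheses work as in the Word formula, while the arbitrary character is written as a dot.

```python
import re

LETTER = "[a-záéíóöőúüű]"  # lower-case letters incl. Hungarian accents

PATTERN = (
    r"([0-9]+)\..§ "                      # group 1: section number
    r"(\([0-9]+\)) bekezdé" + LETTER + "+ "  # group 2: subsection in parentheses
    r"([a-z]+\).[a-z]+\)).pon" + LETTER + "+"  # group 3: paragraph range
)

def reorder_reference(text: str) -> str:
    # The groups are written back in reverse order.
    return re.sub(PATTERN, r"Paragraphs \3 of Subsection \2 of Section \1", text)

print(reorder_reference("10. § (1) bekezdésének d)-f) pontja"))
```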

Additional options

The find-and-replace function can also be applied in many other cases, for example, in replacing date formats. It is worth learning how to record and write macros in Word, because even complex tasks can be performed by combining find-and-replace and macros.

Often the solution of a problem requires some ingenuity. The specific replace formula must always be tested on a test file before using it in an actual translation. The test file can be produced by copying and pasting a portion of the actual text into a new document via the clipboard.

Attention must be paid to typing the expressions accurately, since a single extra character may make the replace formula unusable. For this same reason, it is recommended that tested and proven replace formulas be saved for future use. They can also be recorded as macros, in which case they can be reused at any time, rather than reinvented over and over again.

Much time can be saved in translating legal, financial, and technical texts by using properly written find-and-replace formulas. The time and effort spent familiarizing yourself with the advanced find-and-replace feature of Word may yield rich dividends in increased productivity.