Here's a quote from the movie X-Men:
quote = r"Dr. Grey, how do you feel about the Senator's Statement? Is there a mutant plot to overthrow the government?"
Given a dynamically generated list of words like the following:
searchwords = ['the', 'Dr.', 'to']
highlight each word by wrapping it in a `<span></span>` tag with `class="highlight"`, as shown below.
expected = r"""<span class="highlight">Dr.</span> Grey, how do you feel about <span class="highlight">the</span> Senator's Statement? Is there a mutant plot <span class="highlight">to</span> overthrow <span class="highlight">the</span> government?"""
Here, `the` should not match in `there`, `to` should not match in `Senator`, and note the non-word `.` character after `Dr` in one of the search words. Since the search words are not known at design time, you need to build the pattern dynamically, accounting for the fact that these words must be matched as whole words.
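One possible approach, offered as a minimal sketch rather than the page's official solution: escape each search word with `re.escape`, join the words with `|`, and guard the alternation with the lookarounds `(?<!\w)` and `(?!\w)`. Lookarounds are used instead of `\b` because `\b` cannot match between the `.` of `Dr.` and the space that follows it. The function name `highlight` is hypothetical, and the quote is assumed to use conventional punctuation spacing.

```python
import re

quote = r"Dr. Grey, how do you feel about the Senator's Statement? Is there a mutant plot to overthrow the government?"
searchwords = ['the', 'Dr.', 'to']

def highlight(text, words):
    """Wrap every whole-word occurrence of each word in a highlight span (hypothetical helper)."""
    # Escape each word so characters such as "." are matched literally.
    # Longer words go first so they win over any shorter word that is their prefix.
    escaped = [re.escape(w) for w in sorted(words, key=len, reverse=True)]
    # (?<!\w) and (?!\w) require that no word character touches the match on either side.
    pattern = r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)"
    # A function replacement avoids any backslash interpretation in the replacement text.
    return re.sub(pattern, lambda m: f'<span class="highlight">{m.group(0)}</span>', text)

result = highlight(quote, searchwords)
print(result)
```

The matching stays case-sensitive here; passing `flags=re.IGNORECASE` to `re.sub` would be one way to relax that if the search words should match regardless of case.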
Function                                         Description                                                Return Value
re.findall(pattern, string, flags=0)             Find all non-overlapping occurrences of pattern in string  list of strings, or list of tuples if > 1 capture group
re.finditer(pattern, string, flags=0)            Find all non-overlapping occurrences of pattern in string  iterator yielding match objects
re.search(pattern, string, flags=0)              Find first occurrence of pattern in string                 match object or None
re.split(pattern, string, maxsplit=0, flags=0)   Split string by occurrences of pattern                     list of strings
re.sub(pattern, repl, string, count=0, flags=0)  Replace pattern with repl                                  new string with the replacement(s)
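As a quick illustration of the return values listed above, the following sketch runs each function on one small, made-up string. The sample string and variable names are only for demonstration.

```python
import re

s = "cat bat rat"

# findall: a list of strings, or a list of tuples when the pattern has several groups
words = re.findall(r"[cbr]at", s)                    # ['cat', 'bat', 'rat']
pairs = re.findall(r"([cbr])(at)", s)                # [('c', 'at'), ('b', 'at'), ('r', 'at')]

# finditer: yields match objects, which carry positions as well as the matched text
starts = [m.start() for m in re.finditer(r"at", s)]  # [1, 5, 9]

# search: the first match only, or None when nothing matches
first = re.search(r"[br]at", s)                      # match object for 'bat'
missing = re.search(r"dog", s)                       # None

# split: maxsplit limits the number of splits performed
parts = re.split(r"\s", s, maxsplit=1)               # ['cat', 'bat rat']

# sub: count limits the number of replacements made
replaced = re.sub(r"at", "ow", s, count=2)           # 'cow bow rat'
```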
Pattern     Description
[abc]       a or b or c
[^abc]      not (a or b or c)
[a-z]       a or b ... or y or z
[1-9]       1 or 2 ... or 8 or 9
\d          digits [0-9]
\D          non-digits [^0-9]
\s          whitespace [ \t\n\r\f\v]
\S          non-whitespace [^ \t\n\r\f\v]
\w          alphanumeric [a-zA-Z0-9_]
\W          non-alphanumeric [^a-zA-Z0-9_]
.           any character except a newline
x*          zero or more repetitions of x
x+          one or more repetitions of x
x?          zero or one repetitions of x
{m}         exactly m repetitions
{m,n}       m to n repetitions
\\, \., \*  backslash, period, asterisk
\b          word boundary
^hello      starts with hello
bye$        ends with bye
(...)       capture group
(po|go)     po or go
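A few of these patterns can be tried out as follows. This is only an illustrative sketch on made-up strings. The last line shows why `\b` needs care when a search word ends in a non-word character such as `.`.

```python
import re

# Character classes and quantifiers
digits = re.findall(r"\d{2,4}", "7 42 12345")          # ['42', '1234']
not_abc = re.findall(r"[^abc]", "abcde")               # ['d', 'e']

# An unescaped . matches any character; \. matches only a literal period
any_char = re.findall(r"a.c", "abc a.c")               # ['abc', 'a.c']
literal = re.findall(r"a\.c", "abc a.c")               # ['a.c']

# Anchors and alternation
starts = bool(re.search(r"^hello", "hello world"))     # True
ends = bool(re.search(r"bye$", "goodbye"))             # True
either = re.findall(r"(po|go)", "pogo stick")          # ['po', 'go']

# \b matches only between a word character and a non-word character (or a string edge)
whole = re.findall(r"\bthe\b", "the there other the")  # ['the', 'the']
# "." and " " are both non-word characters, so there is no boundary between them
after_dot = re.search(r"Dr\.\b", "Dr. Grey")           # None
```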