pythonregex
Ben Gorman

Here's a quote from Groundhog Day.

quote = """Once a year, the eyes of the nation turn here, to this tiny
hamlet in Pennsylvania, to watch a master at work. The master?
Punxsutawney Phil, the world's most famous weatherman, the
groundhog, who, as legend has it, can predict the coming of an
early spring. And here's the big moment we've all been waiting for. Let's just
see what Mr. Groundhog has to say. Hey! Over here, you little
weasel! Well, that's it. Sorry you couldn't be here in person to
share the electric moment. This is one event where television
really fails to capture the excitement of thousands of people
gathered to watch a large squirrel predict the weather, and
I for one am deeply grateful to have been a part of it.
Reporting for Channel 9, this is Phil Connors.
"""

Find all substrings that, ignoring case,

  • begin with one of these words: ['the', 'this', 'to', 'in']
  • end with one of these words: ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']
  • have 30 or fewer characters between the beginning word and the ending word (counting spaces and newline characters).

Keep the earliest identified, non-overlapping, non-nested substrings when scanning from left to right.

starters = ['the', 'this', 'to', 'in']
enders = ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']
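
The three rules above can be checked for any single candidate substring. The sketch below is one way to do that, assuming each starter and ender must appear as a whole word; the helper names `build_rule_pattern` and `satisfies_rules` are hypothetical and not part of the original problem.

```python
import re

starters = ['the', 'this', 'to', 'in']
enders = ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']

def build_rule_pattern(starters, enders, max_gap=30):
    """Compile: starter word, at most max_gap characters, ender word."""
    start = "|".join(re.escape(word) for word in starters)
    end = "|".join(re.escape(word) for word in enders)
    # IGNORECASE handles 'The' vs 'the'; DOTALL lets '.' match newlines.
    return re.compile(
        rf"\b(?:{start})\b.{{0,{max_gap}}}\b(?:{end})\b",
        flags=re.IGNORECASE | re.DOTALL,
    )

def satisfies_rules(candidate, pattern):
    """True if the whole candidate string fits the three rules."""
    return pattern.fullmatch(candidate) is not None

rule_pattern = build_rule_pattern(starters, enders)

print(satisfies_rules("the\ngroundhog", rule_pattern))  # True
# The gap here is longer than 30 characters, so the candidate is rejected.
print(satisfies_rules(
    "the eyes of the nation turn here, to this tiny\nhamlet in Pennsylvania",
    rule_pattern,
))  # False
```

This only validates one candidate at a time; it does not decide which candidates to keep when several overlap.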

Expected result

expected = [
    'to this tiny\nhamlet in Pennsylvania', 
    'to watch a master', 
    'The master', 
    "the world's most famous weatherman", 
    'the\ngroundhog', 
    'this is Phil'
]

In the quote, these matches are marked with {==...==}:

Once a year, the eyes of the nation turn here, {==to this tiny
hamlet in Pennsylvania==}, {==to watch a master==} at work. {==The master==}?
Punxsutawney Phil, {==the world's most famous weatherman==}, {==the
groundhog==}, who, as legend has it, can predict the coming of an
early spring. And here's the big moment we've all been waiting for. Let's just
see what Mr. Groundhog has to say. Hey! Over here, you little
weasel! Well, that's it. Sorry you couldn't be here in person to
share the electric moment. This is one event where television
really fails to capture the excitement of thousands of people
gathered to watch a large squirrel predict the weather, and
I for one am deeply grateful to have been a part of it.
Reporting for Channel 9, {==this is Phil==} Connors.

Note that the result includes

to watch a master

and

The master

but not

to watch a master at work. The master
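
The longer string also satisfies the three rules (there are 29 characters between "to" and the second "master"), so it is excluded only because the shorter match ending at the first "master" is found earlier. A small sketch of that difference, using a deliberately simplified pattern with two starters and one ender, compares a lazy quantifier with a greedy one:

```python
import re

sentence = "to watch a master at work. The master?"

# Simplified for illustration: starters 'the' and 'to', ender 'master'.
# '.{0,30}?' is lazy: it stops at the first ender it can reach.
lazy = re.findall(r"\b(?:the|to)\b.{0,30}?\bmaster\b", sentence, flags=re.IGNORECASE)

# '.{0,30}' is greedy: it runs to the last ender within 30 characters.
greedy = re.findall(r"\b(?:the|to)\b.{0,30}\bmaster\b", sentence, flags=re.IGNORECASE)

print(lazy)    # ['to watch a master', 'The master']
print(greedy)  # ['to watch a master at work. The master']
```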

Regex Functions

Function                                         Description                                                Return Value
re.findall(pattern, string, flags=0)             Find all non-overlapping occurrences of pattern in string  list of strings, or list of tuples if more than one capture group
re.finditer(pattern, string, flags=0)            Find all non-overlapping occurrences of pattern in string  iterator yielding match objects
re.search(pattern, string, flags=0)              Find first occurrence of pattern in string                 match object or None
re.split(pattern, string, maxsplit=0, flags=0)   Split string by occurrences of pattern                     list of strings
re.sub(pattern, repl, string, count=0, flags=0)  Replace pattern with repl                                  new string with the replacement(s)
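
As a quick illustration of the return values listed above, here is a short sketch that runs each function on a made-up sample string (the string and variable names are for illustration only):

```python
import re

text = "rain 10, snow 25, sun 3"  # made-up sample string

# findall with no capture group returns a list of strings
all_numbers = re.findall(r"\d+", text)
print(all_numbers)  # ['10', '25', '3']

# findall with more than one capture group returns a list of tuples
pairs = re.findall(r"(\w+) (\d+)", text)
print(pairs)  # [('rain', '10'), ('snow', '25'), ('sun', '3')]

# finditer yields match objects, which also carry positions
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(positions)  # [('10', 5), ('25', 14), ('3', 22)]

# search returns the first match object, or None if there is no match
first = re.search(r"snow", text)
missing = re.search(r"hail", text)
print(first.span(), missing)  # (9, 13) None

# split cuts the string wherever the pattern occurs
parts = re.split(r",\s*", text)
print(parts)  # ['rain 10', 'snow 25', 'sun 3']

# sub returns a new string with every match replaced
masked = re.sub(r"\d+", "#", text)
print(masked)  # rain #, snow #, sun #
```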

Regex Patterns

Pattern     Description
[abc]       a or b or c
[^abc]      not (a or b or c)
[a-z]       a or b ... or y or z
[1-9]       1 or 2 ... or 8 or 9
\d          digits [0-9]
\D          non-digits [^0-9]
\s          whitespace [ \t\n\r\f\v]
\S          non-whitespace [^ \t\n\r\f\v]
\w          alphanumeric or underscore [a-zA-Z0-9_]
\W          not alphanumeric or underscore [^a-zA-Z0-9_]
.           any character except newline (any character at all with re.DOTALL)
x*          zero or more repetitions of x
x+          one or more repetitions of x
x?          zero or one repetition of x
{m}         m repetitions
{m,n}       m to n repetitions
\\, \., \*  backslash, period, asterisk
\b          word boundary
^hello      starts with hello
bye$        ends with bye
(...)       capture group
(po|go)     po or go
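
A few of these patterns behave in ways that matter for text containing newlines and mixed case. The sketch below uses short made-up strings to show the default behavior of `.`, the effect of `\b`, a bounded repetition, an escaped period, and a case-insensitive anchor:

```python
import re

# '.' skips newlines unless re.DOTALL is set
no_dotall = re.search(r"the.groundhog", "the\ngroundhog")
with_dotall = re.search(r"the.groundhog", "the\ngroundhog", flags=re.DOTALL)
print(no_dotall is None, with_dotall is not None)  # True True

# \b restricts a match to whole words
phrase = "in person, waiting in line"
loose = re.findall(r"in", phrase)       # also hits 'waiting' and 'line'
strict = re.findall(r"\bin\b", phrase)  # only the standalone word
print(len(loose), len(strict))  # 4 2

# {m,n} bounds the number of repetitions
runs = re.findall(r"\d{2,3}", "7 42 123 4567")
print(runs)  # ['42', '123', '456']

# a backslash makes the period literal
dots = re.findall(r"\.", "Mr. Groundhog.")
print(dots)  # ['.', '.']

# ^ anchors at the start; re.IGNORECASE ignores letter case
anchored = re.search(r"^the", "The master", flags=re.IGNORECASE)
print(anchored.group())  # The
```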

Solution
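
One possible approach, offered as a sketch rather than as the author's solution, is a single pattern with a lazy bounded gap. It assumes starters and enders must be whole words (hence `\b`) and relies on `re.findall` scanning left to right and returning non-overlapping matches. The variable names `start`, `end`, `pattern`, and `result` are illustrative.

```python
import re

quote = """Once a year, the eyes of the nation turn here, to this tiny
hamlet in Pennsylvania, to watch a master at work. The master?
Punxsutawney Phil, the world's most famous weatherman, the
groundhog, who, as legend has it, can predict the coming of an
early spring. And here's the big moment we've all been waiting for. Let's just
see what Mr. Groundhog has to say. Hey! Over here, you little
weasel! Well, that's it. Sorry you couldn't be here in person to
share the electric moment. This is one event where television
really fails to capture the excitement of thousands of people
gathered to watch a large squirrel predict the weather, and
I for one am deeply grateful to have been a part of it.
Reporting for Channel 9, this is Phil Connors.
"""

starters = ['the', 'this', 'to', 'in']
enders = ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']

# Build the alternations from the word lists, escaping each word.
start = "|".join(map(re.escape, starters))
end = "|".join(map(re.escape, enders))

# Non-capturing groups keep findall returning whole matches.
# '.{0,30}?' is lazy, so each match ends at the nearest ender.
# DOTALL lets '.' cross line breaks; IGNORECASE matches 'The', 'Phil', etc.
pattern = re.compile(
    rf"\b(?:{start})\b.{{0,30}}?\b(?:{end})\b",
    flags=re.IGNORECASE | re.DOTALL,
)

result = pattern.findall(quote)
for match in result:
    print(repr(match))
```

On this quote, `result` equals the `expected` list shown earlier. Because the regex engine takes the leftmost match and then resumes after it, later starters inside an earlier match (such as "this" and "in" inside the first match) do not produce nested results.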
