You own an ecommerce store that sells costumes for dogs 🐶. You'd like to analyze your web traffic by measuring the count of unique visitors amongst your web sessions data. Additionally, you want the ability to filter by date, so you can answer questions like
How many visitors visited my site?
and
How many visitors visited my site on February 20th, 2022?
You come up with the following count_visitors()
function that you place inside measures.py
.
def count_visitors(sessions, date=None):
"""
Count the number of unique visitors
:param sessions: list of sessions were each session has a visitor_id and date
:param date: optional filter by date
:return: number of unique visitors
"""
if date is None:
visitors = {x.visitor_id for x in sessions}
else:
visitors = {x.visitor_id for x in sessions if x.date == date}
return len(visitors)
To test it, you create test_measures.py
as follows.
from datetime import date
from types import SimpleNamespace
import time
from measures import count_visitors
def load_sessions():
"""Super expensive function that loads the sessions data"""
print("loading sessions data.. hold on a sec")
time.sleep(5)
sessions = [
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-02')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-02')),
SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-04')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-11')),
SimpleNamespace(visitor_id=128, date=date.fromisoformat('2022-02-15')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-17')),
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-17'))
]
return sessions
def test_count_visitors():
"""Confirm that count_visitors works as expected"""
sessions = load_sessions()
assert count_visitors(sessions) == 4
def test_count_visitors_on_date():
"""Confirm that count_visitors works as expected when a date filter is used"""
sessions = load_sessions()
assert count_visitors(sessions, date=date.fromisoformat('2022-02-17')) == 2
You have a problem.. Both of your test functions call the load_sessions()
function in order to load the sessions data, but this data is really expensive (slow) to fetch. (In theory, load_sessions()
might connect to a database and run an expensive query.) See if you can cut your tests runtime in half 😉.
Run with messages
Solution¶
from datetime import date
from types import SimpleNamespace
import time
import pytest
from measures import count_visitors
@pytest.fixture(scope="module")
def load_sessions():
"""Super expensive function that loads the sessions data"""
print("loading sessions data.. hold on a sec")
time.sleep(5)
sessions = [
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-02')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-02')),
SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-04')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-11')),
SimpleNamespace(visitor_id=128, date=date.fromisoformat('2022-02-15')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-17')),
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-17'))
]
return sessions
def test_count_visitors(load_sessions):
"""Confirm that count_visitors works as expected"""
assert count_visitors(load_sessions) == 4
def test_count_visitors_on_date(load_sessions):
"""Confirm that count_visitors works as expected when a date filter is used"""
assert count_visitors(load_sessions, date=date.fromisoformat('2022-02-17')) == 2
Now when we run pytest -s
, we get the following output:
Notice the message "loading sessions data.. hold on a sec" appears once - not twice as it did originally. This indicates that the expensive load_sessions()
function only ran once.
Explanation¶
The trick to this solution is to make use of pytest's fixture decorator which allows us to cache the output value of a function. In this case, we cache the output of load_sessions()
so we can use it in test_count_visitors()
and then reuse it in test_count_visitors_on_date()
. To make this work, note a few things:
-
We have to
import pytest
-
We have to decorate
load_sessions()
with@pytest.fixture(scope="module")
.@pytest.fixture(scope="module") def load_sessions(): """Super expensive function that loads the sessions data""" ...
scope="module"
tells pytest it can reuse the data within the same module (python file). However, if we were to reference theload_sessions()
function from a different module, the function would be re-executed. -
We have to pass
load_sessions
into its dependent test functions as a parameter, and tweak the internals accordingly.def test_count_visitors(load_sessions): """Confirm that count_visitors works as expected""" assert count_visitors(load_sessions) == 4 def test_count_visitors_on_date(load_sessions): """Confirm that count_visitors works as expected when a date filter is used""" assert count_visitors(load_sessions, date=date.fromisoformat('2022-02-17')) == 2