You run an ecommerce site called shoesfordogs.com :material-paw:. You want to analyze your visitors, so you compile a DataFrame called hits that represents each time a visitor hit some page on your site.
You suspect that the undocumented third-party tracking system on your website is buggy and sometimes splits one session into two or more session_ids. You want to correct this behavior by creating a field called session_group_id that stitches broken session_ids together.
Two sessions, A and B, should belong to the same session group if:
1. They have the same visitor_id, and
2. Their hits overlap in time, or
3. The latest hit from A is within five minutes of the earliest hit from B, or vice versa
Transitivity applies. So, if A is grouped with B, and B is grouped with C, then A should be grouped with C as well.
Create a column in hits called session_group_id that identifies which hits belong to the same session group.
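
One possible approach, sketched below, treats each session as a time interval and merges intervals per visitor. It is not necessarily the intended solution. It assumes hits has a datetime column named time (a hypothetical name, since the column is not named above), and it reads "within five minutes" as inclusive. The function name add_session_group_id and the gap parameter are also illustrative.

```python
import pandas as pd


def add_session_group_id(
    hits: pd.DataFrame, gap: pd.Timedelta = pd.Timedelta(minutes=5)
) -> pd.DataFrame:
    """Return a copy of hits with a session_group_id column.

    Assumes columns visitor_id, session_id and time (time is an assumed name).
    """
    # One row per (visitor, session) with its earliest and latest hit
    sessions = (
        hits.groupby(["visitor_id", "session_id"], as_index=False)
        .agg(start=("time", "min"), end=("time", "max"))
        .sort_values(["visitor_id", "start"])
        .reset_index(drop=True)
    )

    # Latest end time seen so far within each visitor, then shifted by one row
    # so each session is compared against all *earlier* sessions only
    running_end = sessions.groupby("visitor_id")["end"].cummax()
    prev_end = running_end.groupby(sessions["visitor_id"]).shift()

    # A session opens a new group if it is the visitor's first session, or if
    # it starts more than `gap` after every earlier session has ended
    is_new = prev_end.isna() | (sessions["start"] > prev_end + gap)
    sessions["session_group_id"] = is_new.cumsum() - 1

    # Attach the group id to every hit, keeping the original row order and index
    out = hits.merge(
        sessions[["visitor_id", "session_id", "session_group_id"]],
        on=["visitor_id", "session_id"],
        how="left",
        validate="many_to_one",
    )
    out.index = hits.index
    return out
```

Calling hits = add_session_group_id(hits) would then add the column. Comparing each session against the running maximum end time, rather than only the previous session's end, is what handles the transitive rule: a long session that contains a shorter one still links to a later session that starts within five minutes of the long session's last hit.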