You operate an online business called tindercoach.com where you give people advice on their Tinder profiles ❤️🔥. You have a dictionary of `visits` indicating how many times each `visitor_id` visited each `page` on your site.
Convert `visits` into a Compressed Sparse Column (CSC) matrix where element (i,j) stores the number of times visitor i visited page j.
Then print the sub-matrix showing how many times visitors 1443, 6584, and 7040 visited pages tindercoach.com/chl, tindercoach.com/nky, and tindercoach.com/zmr.
Solution
Explanation
Our strategy is to build three lists:
- `data`: contains the non-zero elements of the matrix
- `row_idxs`: contains the row index of each non-zero element
- `col_idxs`: contains the column index of each non-zero element
Then we can instantiate a CSC matrix with `sparse.csc_matrix((data, (row_idxs, col_idxs)))`.
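As a small illustration of this constructor, the sketch below builds a 3x3 CSC matrix from three hand-written lists. The values are made up for the example and are not taken from the visits data; the explicit `shape` argument is optional and is included here only so the dimensions do not depend on the largest index present.

```python
from scipy import sparse

# Made-up example: three non-zero entries of a 3x3 matrix
data = [4, 1, 7]        # the non-zero values
row_idxs = [0, 2, 1]    # row index of each value
col_idxs = [1, 1, 2]    # column index of each value

# data[k] is placed at position (row_idxs[k], col_idxs[k])
mat = sparse.csc_matrix((data, (row_idxs, col_idxs)), shape=(3, 3))

dense = mat.toarray()
print(dense)
```

If `shape` is omitted, scipy infers it from the largest row and column index, so a visitor or page with no recorded visits at the end of the index range would be dropped.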
- Build mappings for `visitor_id:row_index` and `page:col_index`. These mappings allow us to get the row / column index for any visitor_id / page.
- Determine the row and column index for each data point.
- Build the `csc_matrix`.
- Print the sub matrix showing visitors 1443, 6584, and 7040 and pages tindercoach.com/chl, tindercoach.com/nky, and tindercoach.com/zmr.
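The first three steps can be sketched as follows. The exact layout of `visits` is not shown in this section, so the sketch assumes a dictionary keyed by `(visitor_id, page)` tuples with visit counts as values. The sample entries and the names `row_index_of` and `col_index_of` are made up for illustration.

```python
from scipy import sparse

# Assumed layout: {(visitor_id, page): visit_count}; entries are made up
visits = {
    (1443, "tindercoach.com/chl"): 2,
    (6584, "tindercoach.com/nky"): 1,
    (7040, "tindercoach.com/zmr"): 5,
    (1443, "tindercoach.com/zmr"): 3,
}

# Build the visitor_id:row_index and page:col_index mappings
visitor_ids = sorted({visitor for visitor, _ in visits})
pages = sorted({page for _, page in visits})
row_index_of = {visitor: i for i, visitor in enumerate(visitor_ids)}
col_index_of = {page: j for j, page in enumerate(pages)}

# Determine the row and column index of each data point
data, row_idxs, col_idxs = [], [], []
for (visitor, page), count in visits.items():
    data.append(count)
    row_idxs.append(row_index_of[visitor])
    col_idxs.append(col_index_of[page])

# Build the CSC matrix
mat = sparse.csc_matrix(
    (data, (row_idxs, col_idxs)),
    shape=(len(visitor_ids), len(pages)),
)
print(mat.toarray())
```

Sorting the ids and pages is only one way to get a reproducible ordering; any consistent assignment of indices would work, as long as the same mappings are used later to look rows and columns up.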
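We can fetch the appropriate row and column indices by looking up each visitor and page in the mappings we built earlier; call the resulting lists `print_row_idxs` and `print_col_idxs`.
To fetch the sub matrix indexed by these row and column indices, we can index the matrix as `mat[np.ix_(print_row_idxs, print_col_idxs)]`.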
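A minimal sketch of both operations on a tiny stand-in matrix. The mappings `row_index_of` and `col_index_of`, the extra visitor and page, and the matrix values are all hypothetical; only the use of `np.ix_` follows the text.

```python
import numpy as np
from scipy import sparse

# Hypothetical mappings for a 4-visitor, 4-page site
row_index_of = {1443: 2, 2050: 1, 6584: 3, 7040: 0}
col_index_of = {
    "tindercoach.com/abc": 0,
    "tindercoach.com/chl": 3,
    "tindercoach.com/nky": 1,
    "tindercoach.com/zmr": 2,
}

# Hypothetical visit counts: rows are visitors, columns are pages
mat = sparse.csc_matrix(np.array([
    [0, 0, 5, 1],
    [9, 0, 0, 0],
    [0, 4, 0, 2],
    [0, 1, 0, 0],
]))

# Look up the row / column index of each requested visitor / page
print_row_idxs = [row_index_of[v] for v in (1443, 6584, 7040)]
print_col_idxs = [col_index_of[p] for p in (
    "tindercoach.com/chl", "tindercoach.com/nky", "tindercoach.com/zmr")]

# np.ix_ selects every (row, col) combination, giving a 3x3 sub-matrix
sub = mat[np.ix_(print_row_idxs, print_col_idxs)]
print(sub.toarray())
```

The rows and columns of the result follow the order of the requested visitors and pages, not the order in which they are stored in the matrix.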
If we simply did `mat[print_row_idxs, print_col_idxs]`, scipy would fetch three elements from the matrix: the elements at positions (2, 38), (8, 17), and (0, 86). This is consistent with NumPy array indexing behavior, but it's not the behavior we desire.
Rather, we want to fetch all combinations of (row, col) indices from our lists `print_row_idxs` and `print_col_idxs` (9 combinations in total, forming a 3x3 sub matrix).
We use `np.ix_()` to accomplish this. `np.ix_(print_row_idxs, print_col_idxs)` constructs an open mesh from the input lists / arrays. When these arrays are used to index `mat`, they are essentially broadcast into the 3x3 i and j index arrays.
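The difference between the two indexing styles is easiest to see on a plain NumPy array. This is a small sketch with a made-up array and made-up index lists, not the matrix from the problem.

```python
import numpy as np

# Made-up 4x4 array where element (i, j) equals 10*i + j
arr = np.arange(4)[:, None] * 10 + np.arange(4)
rows = [2, 3, 0]
cols = [3, 1, 2]

# Paired indexing: picks the three elements (2, 3), (3, 1), (0, 2)
paired = arr[rows, cols]

# np.ix_ returns an open mesh: a (3, 1) row array and a (1, 3) column array
row_mesh, col_mesh = np.ix_(rows, cols)

# Broadcasting the mesh yields all 9 (row, col) combinations
block = arr[row_mesh, col_mesh]

print(paired)
print(row_mesh.shape, col_mesh.shape)
print(block)
```

Because each element encodes its own position, the printed values show directly which (row, col) pairs each indexing style selected.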