python numpy pandas
Ben Gorman


You own a gym 💪🏾 for pregnant women 🤰🏾 called “OB-GYM” and you recently opened a second location. You'd like to analyze its performance, but your reporting software has given you the sales data in an awkward format.

import numpy as np
import pandas as pd
 
generator = np.random.default_rng(314)
 
sales = pd.DataFrame({
    'date':pd.date_range(start = '2020-01-01', periods=5).repeat(2),
    'store_id':np.tile([1,2], 5),
    'sales1':np.round(generator.normal(loc=750, scale=20, size=10), 2),
    'sales2':np.round(generator.normal(loc=650, scale=40, size=10), 2),
    'members':generator.integers(low=20, high=25, size=10)
})
sales.loc[sales.store_id == 2, 'sales1'] = np.nan
sales.loc[sales.store_id == 1, 'sales2'] = np.nan
 
print(sales)
#         date  store_id  sales1  sales2  members
# 0 2020-01-01         1  737.54     NaN       22
# 1 2020-01-01         2     NaN  629.00       20
# 2 2020-01-02         1  750.75     NaN       23
# 3 2020-01-02         2     NaN  699.01       22
# 4 2020-01-03         1  750.60     NaN       20
# 5 2020-01-03         2     NaN  640.20       24
# 6 2020-01-04         1  752.65     NaN       21
# 7 2020-01-04         2     NaN  695.64       22
# 8 2020-01-05         1  747.02     NaN       20
# 9 2020-01-05         2     NaN  632.40       22

Reshape it into a DataFrame with one row per date and one column per store for each of sales and members, like this:

#               sales_1  sales_2  members_1  members_2
# date
# 2020-01-01    737.54    629.00         22         20
# 2020-01-02    750.75    699.01         23         22
# 2020-01-03    750.60    640.20         20         24
# 2020-01-04    752.65    695.64         21         22
# 2020-01-05    747.02    632.40         20         22
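
One possible approach, offered as a minimal sketch rather than the author's solution: collapse `sales1` and `sales2` into a single column, pivot on `store_id`, then flatten the resulting column MultiIndex. To keep the example reproducible without the random generator, the frame below is rebuilt from the printed values above. The names `sales` (as a new column) and `wide` are illustrative choices, not taken from the original.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the printed values so the sketch
# does not depend on the random number generator.
nan = np.nan
sales = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', periods=5).repeat(2),
    'store_id': np.tile([1, 2], 5),
    'sales1': [737.54, nan, 750.75, nan, 750.60, nan, 752.65, nan, 747.02, nan],
    'sales2': [nan, 629.00, nan, 699.01, nan, 640.20, nan, 695.64, nan, 632.40],
    'members': [22, 20, 23, 22, 20, 24, 21, 22, 20, 22],
})

# Each row has exactly one non-missing sales value, so fill the gaps in
# sales1 with sales2 to get a single 'sales' column (hypothetical name).
sales['sales'] = sales['sales1'].fillna(sales['sales2'])

# Pivot to one row per date. The columns become a two-level MultiIndex
# of (value name, store_id).
wide = sales.pivot(index='date', columns='store_id', values=['sales', 'members'])

# Flatten the MultiIndex into names like 'sales_1' and 'members_2'.
wide.columns = [f'{name}_{store}' for name, store in wide.columns]

# Pivoting several value columns at once can upcast integers to float,
# so cast the member counts back explicitly.
wide = wide.astype({'members_1': 'int64', 'members_2': 'int64'})

print(wide)
```

Filling `sales1` from `sales2` relies on the two columns never both being populated in the same row, which holds for this data because each store reports in only one of them.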
