Rossmann Drug Retailer: Six-Week Sales Prediction

Rossmann operates over 3,000 drug stores in 7 European countries. Currently, Rossmann store managers are tasked with predicting their daily sales up to six weeks in advance. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. With thousands of individual managers predicting sales based on their unique circumstances, the accuracy of the results can vary considerably.

Steps in solving the problem

Feature Description and Filling In Missing Values

The first step is to describe the features and deal with missing values, starting from the percentage of missing values per feature in the dataset.
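A minimal pandas sketch of this step, assuming the column names of the public Kaggle Rossmann Store Sales data (store table already merged into the sales table, `Date` parsed as datetime). The fill rules are illustrative assumptions, not necessarily the ones used in this project: a missing competitor distance is treated as "no nearby competitor", and an unknown competitor opening date falls back to the date of the sales record.

```python
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).sort_values(ascending=False)


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill competition/promotion gaps with illustrative rules (assumptions)."""
    out = df.copy()
    # Assumption: a missing CompetitionDistance means there is no nearby
    # competitor, so use a distance far larger than any observed value.
    out["CompetitionDistance"] = out["CompetitionDistance"].fillna(200_000.0)
    # Assumption: an unknown competitor opening date is replaced by the
    # month/year of the sales record itself.
    out["CompetitionOpenSinceMonth"] = out["CompetitionOpenSinceMonth"].fillna(out["Date"].dt.month)
    out["CompetitionOpenSinceYear"] = out["CompetitionOpenSinceYear"].fillna(out["Date"].dt.year)
    # PromoInterval is empty when the store does not take part in Promo2.
    out["PromoInterval"] = out["PromoInterval"].fillna("none")
    return out
```

Inspecting `missing_percentage` before and after `fill_missing` is a quick way to confirm that no gaps remain.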

Feature filtering and engineering
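One possible shape for this step, sketched under assumptions: calendar features are derived from `Date`, rows for closed stores or zero-sales days are dropped, and `Customers` is removed because it is not known six weeks in advance. The feature names (`year`, `month`, `day`, `week_of_year`) are hypothetical choices for illustration.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive calendar features from the Date column."""
    out = df.copy()
    out["year"] = out["Date"].dt.year
    out["month"] = out["Date"].dt.month
    out["day"] = out["Date"].dt.day
    # ISO week number; isocalendar() returns a nullable UInt32 column.
    out["week_of_year"] = out["Date"].dt.isocalendar().week.astype(int)
    return out


def filter_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows and columns that cannot help a six-week-ahead forecast."""
    # Closed stores (Open == 0) or days with zero sales carry no signal.
    out = df[(df["Open"] != 0) & (df["Sales"] > 0)]
    # Customers is not known in advance, and Open is constant after filtering.
    return out.drop(columns=["Customers", "Open"])
```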

Exploratory Data Analysis

Sales by type of store

Sales by type of holiday

Sales correlate negatively with competition distance

Sales are lower with extended promotions than with the regular promotion alone
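The EDA findings above can be reproduced with a few aggregations. This is a sketch assuming Kaggle Rossmann column names, and assuming that "extended promotion" corresponds to the dataset's `Promo2` flag; the correlation is computed as Spearman (Pearson on ranks) because the sales-distance relationship need not be linear.

```python
import pandas as pd


def sales_summary(df: pd.DataFrame):
    """Aggregate views behind the EDA findings (hypothetical helper)."""
    # Total sales per store type (a, b, c, d in the Rossmann data).
    by_store_type = df.groupby("StoreType")["Sales"].sum()
    # Average daily sales per state-holiday type ("0" means no holiday).
    by_holiday = df.groupby("StateHoliday")["Sales"].mean()
    # Spearman correlation computed as Pearson correlation of the ranks,
    # which avoids a SciPy dependency.
    rho_distance = df["Sales"].rank().corr(df["CompetitionDistance"].rank())
    # On regular-promotion days, compare stores with and without Promo2
    # (assumed here to be the "extended" promotion).
    promo_days = df[df["Promo"] == 1]
    by_promo2 = promo_days.groupby("Promo2")["Sales"].mean()
    return by_store_type, by_holiday, rho_distance, by_promo2
```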

Encode categorical features and rescale numerical features according to the EDA findings.
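A sketch of typical encoding and rescaling choices, assuming the feature names from the earlier steps. The mapping of scalers to columns is an assumption: `RobustScaler` for the outlier-heavy `CompetitionDistance`, `MinMaxScaler` for `year`, sine/cosine for the cyclical `month`, and a log transform of the skewed target. For brevity the scalers are fit on the whole frame here; in practice they should be fit on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, RobustScaler


def encode_and_rescale(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # One-hot encoding: holiday types have no natural order.
    out = pd.get_dummies(out, columns=["StateHoliday"], prefix="state_holiday")
    # Ordinal encoding: assortment levels a = basic, b = extra, c = extended.
    out["Assortment"] = out["Assortment"].map({"a": 1, "b": 2, "c": 3})
    # RobustScaler (median/IQR) for a heavy-tailed feature with outliers.
    out["CompetitionDistance"] = RobustScaler().fit_transform(out[["CompetitionDistance"]]).ravel()
    # MinMaxScaler for a bounded feature without strong outliers.
    out["year"] = MinMaxScaler().fit_transform(out[["year"]]).ravel()
    # Cyclical encoding so that December (12) sits next to January (1).
    out["month_sin"] = np.sin(2 * np.pi * out["month"] / 12)
    out["month_cos"] = np.cos(2 * np.pi * out["month"] / 12)
    # Log-transform the target to reduce right skew; invert with np.expm1.
    out["Sales"] = np.log1p(out["Sales"])
    return out
```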

Apply the Boruta feature selection algorithm to select the most relevant features.
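Boruta's core idea is to compare each real feature against shuffled "shadow" copies and keep the features that consistently beat the best shadow. In Python, Boruta is commonly used through `BorutaPy` from the `boruta` package. The sketch below is a simplified stand-in written only with scikit-learn: it counts hits over a few random-forest fits and keeps features whose hit rate exceeds a plain threshold, whereas real Boruta uses a statistical test on the hit counts and iteratively removes rejected features. The function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def shadow_feature_hits(X: np.ndarray, y: np.ndarray, n_iter: int = 10, seed: int = 0) -> np.ndarray:
    """Count how often each real feature beats the best shuffled shadow copy."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow features: every column permuted independently, which keeps
        # its distribution but destroys any relationship with y.
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        model = RandomForestRegressor(n_estimators=50, random_state=seed + i)
        model.fit(np.hstack([X, shadows]), y)
        importances = model.feature_importances_
        # A hit is recorded when a real feature is more important than
        # the most important shadow feature.
        hits += importances[:n_features] > importances[n_features:].max()
    return hits


def select_features(X: np.ndarray, y: np.ndarray, n_iter: int = 10, threshold: float = 0.5) -> np.ndarray:
    """Indices of features whose hit rate exceeds the threshold."""
    hits = shadow_feature_hits(X, y, n_iter)
    return np.flatnonzero(hits / n_iter > threshold)
```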

Split the data into training and test sets and compute the error metrics for the baseline models (mean and linear regression).
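Because the goal is a six-week forecast, a time-based split that holds out the last six weeks is a natural choice. The sketch below assumes that split, a per-store average as the "mean" baseline, and MAE, MAPE and RMSE as metrics; the source does not specify which metrics were used, and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def time_split(df: pd.DataFrame, weeks: int = 6):
    """Hold out the last `weeks` weeks as the test set."""
    cutoff = df["Date"].max() - pd.Timedelta(weeks=weeks)
    return df[df["Date"] <= cutoff], df[df["Date"] > cutoff]


def metrics(y_true, y_pred) -> dict:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / y_true))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }


def baselines(train: pd.DataFrame, test: pd.DataFrame, features: list) -> dict:
    # Mean baseline (assumption): each store is predicted by its training average.
    store_mean = train.groupby("Store")["Sales"].mean()
    mean_pred = test["Store"].map(store_mean)
    # Linear regression baseline on the selected features.
    lr = LinearRegression().fit(train[features], train["Sales"])
    lr_pred = lr.predict(test[features])
    return {
        "Average": metrics(test["Sales"], mean_pred),
        "Linear Regression": metrics(test["Sales"], lr_pred),
    }
```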

Apply cross-validation to the Random Forest and Gradient Boosting regressors.
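Ordinary k-fold cross-validation would let the model train on dates after the validation period, so a time-ordered scheme is safer here. This sketch assumes an expanding-window scheme where each fold validates on a later six-week block and reports MAE per fold. It uses scikit-learn's `GradientBoostingRegressor`; the project may have used a different gradient boosting implementation (for example XGBoost), and the hyperparameters are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor


def time_series_cv(df: pd.DataFrame, model, features: list, k: int = 3, weeks: int = 6) -> list:
    """Expanding-window CV: each fold validates on a later six-week block."""
    maes = []
    last_date = df["Date"].max()
    for fold in range(k, 0, -1):
        # Validation window for this fold, counted back from the last date.
        val_end = last_date - pd.Timedelta(weeks=(fold - 1) * weeks)
        val_start = val_end - pd.Timedelta(weeks=weeks)
        train = df[df["Date"] <= val_start]
        val = df[(df["Date"] > val_start) & (df["Date"] <= val_end)]
        # clone() gives an unfitted copy so folds do not share state.
        fitted = clone(model).fit(train[features], train["Sales"])
        pred = fitted.predict(val[features])
        maes.append(float(np.mean(np.abs(val["Sales"].to_numpy() - pred))))
    return maes


# Placeholder hyperparameters, not the tuned values from the project.
MODELS = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}
```

Comparing the mean and spread of the fold MAEs across models gives a more reliable ranking than a single split.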

Estimate each store's projected value for the best-case and worst-case scenarios.
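One way to turn predictions into per-store scenarios, sketched under assumptions: sum each store's daily predictions over the horizon, then widen the total by that store's mean absolute error times the number of forecast days. This error band is a simple heuristic chosen for illustration, not a confidence interval, and the column `prediction` and helper name are hypothetical.

```python
import pandas as pd


def store_scenarios(df: pd.DataFrame) -> pd.DataFrame:
    """Per-store totals with worst/best cases from the store's own MAE."""
    df = df.assign(abs_err=(df["Sales"] - df["prediction"]).abs())
    df = df.assign(ape=df["abs_err"] / df["Sales"])
    g = df.groupby("Store")
    out = pd.DataFrame({
        "predicted_total": g["prediction"].sum(),
        "mae": g["abs_err"].mean(),
        "mape": g["ape"].mean(),
        "n_days": g.size(),
    })
    # Heuristic band: shift the total by the daily MAE over all forecast days.
    out["worst_scenario"] = out["predicted_total"] - out["mae"] * out["n_days"]
    out["best_scenario"] = out["predicted_total"] + out["mae"] * out["n_days"]
    return out
```

Stores with a high `mape` are the ones whose scenarios should be treated with the most caution.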

Deploy the ETL pipeline on the Heroku platform so that it can be accessed by any client, such as a bot.
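The deployed service essentially runs extract, transform and predict on incoming JSON. The class below is a hypothetical sketch of that core, assuming a model trained on `log1p(Sales)` with an illustrative four-feature set. On Heroku it would typically sit behind an HTTP endpoint (for example a Flask route) that a bot calls; that web wrapper is omitted here.

```python
import json

import numpy as np
import pandas as pd


class SalesPipeline:
    """Hypothetical ETL + prediction wrapper; the feature set is illustrative."""

    FEATURES = ["Store", "Promo", "month", "day_of_week"]

    def __init__(self, model):
        # Assumption: the model was trained on log1p(Sales).
        self.model = model

    def extract(self, payload: str) -> pd.DataFrame:
        # Expects a JSON list of records, one per store and day.
        return pd.DataFrame(json.loads(payload))

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()
        df["Date"] = pd.to_datetime(df["Date"])
        # Closed stores are not scored.
        df = df[df["Open"] == 1].copy()
        df["month"] = df["Date"].dt.month
        df["day_of_week"] = df["Date"].dt.dayofweek
        return df

    def predict(self, payload: str) -> str:
        df = self.transform(self.extract(payload))
        # Undo the log1p target transform to return sales in currency units.
        df["prediction"] = np.expm1(self.model.predict(df[self.FEATURES]))
        return df[["Store", "Date", "prediction"]].to_json(orient="records", date_format="iso")
```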