Financial inclusion remains one of the main obstacles to economic and human development in Africa. For example, in Kenya, Rwanda, Tanzania and Uganda, only 9.1 million adults (14% of adults) have access to or use a business bank account.
In 2008, the level of financial inclusion in Sub-Saharan Africa was just over 23%. By 2018, that figure had almost doubled. In Togo, Mazamesso Assih, the finance minister, coordinates a financial inclusion strategy with partner banks. Assih states that ensuring access to basic financial services for the population is critical to boosting the African economy.
In 2022, as highlighted in the latest report by the Central Bank of West Africa, one of the highest rates of financial inclusion in the region was reached, close to 82%. A significant portion of this increase came from the rise of digital financial services.
Assih also states that there are three main reasons why African nations should focus on financial inclusion:
We assume that for each user there is a unique id, so users cannot be repeated in the dataset.
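The uniqueness assumption above can be checked before any modelling. The snippet below is a minimal sketch using pandas; the column name `uniqueid` and the toy rows are assumptions made for illustration and are not taken from the challenge data.

```python
import pandas as pd

# Toy rows for illustration only; `uniqueid` is an assumed column name.
df = pd.DataFrame({
    "uniqueid": ["uniqueid_1", "uniqueid_2", "uniqueid_3"],
    "bank_account": ["Yes", "No", "No"],
})


def assert_unique_users(frame, id_col="uniqueid"):
    """Return True if every id appears once, otherwise raise ValueError."""
    # keep=False marks every occurrence of a repeated id, not only the later ones.
    dupes = frame.loc[frame[id_col].duplicated(keep=False), id_col].unique()
    if len(dupes) > 0:
        raise ValueError(f"Repeated ids found: {list(dupes)}")
    return True


assert_unique_users(df)
```

If the check raises, the repeated ids are listed in the error message so they can be inspected before deciding how to deduplicate.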
Furthermore, the output must be in the following format:
Our strategy to solve this challenge was:
Data Description
Feature Engineering
Feature Selection
EDA
Data preparation
Machine Learning modelling
Model Evaluation (cross-validation and metrics)
Hyperparameter Fine-Tuning
Final CSV
Step 01. Data Description:
Step 02. Feature Engineering:
H1. Developed countries have 10% more digital bank accounts.
H2. Urban areas have 50% more digital bank accounts than rural areas.
H3. 80% of people with a cellphone and internet access have a digital bank account.
H4. People with an undergraduate degree hold more digital bank accounts than people at any other education level.
H5. People between 18 and 40 years old make up 85% of the digital bank account base.
H6. Women hold the majority of digital bank accounts.
H7. People older than 60 with a cellphone generally do not have a digital bank account.
H8. 95% of unemployed people do not have a digital bank account.
H9. 100% of heads of household have a digital bank account.
Step 03. Data Filtering:
Step 04. Exploratory Data Analysis:
We proceed with the bivariate EDA for hypothesis testing. The main insights are listed in the section below.
For the multivariate analysis, we split the data into numerical and categorical attributes and plotted a correlation matrix for each (Pearson correlation for the numerical attributes, Cramér's V for the categorical ones).
Numeric attributes (year, household_size and age_of_respondent):
No significant correlation is observed among these three features.
Categorical attributes (country, location_type, cellphone_access, gender_of_respondent, relationship_with_head, marital_status, education_level, job_type, bank_account):
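For reference, Cramér's V between two categorical columns can be computed from the chi-square statistic of their contingency table. The function below is a minimal pure-Python sketch of the plain (uncorrected) statistic; it is not necessarily the exact variant used in this project, and the toy values are made up for illustration.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Plain Cramér's V between two equal-length sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed count for each (row, column) pair
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-square statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by n * (min(rows, columns) - 1) so the result lies in [0, 1].
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy values for illustration only.
location_type = ["Urban", "Urban", "Rural", "Rural"]
bank_account = ["Yes", "Yes", "No", "No"]
v = cramers_v(location_type, bank_account)
```

Applying the function to every pair of categorical columns gives the matrix that is then plotted as a heatmap. Values near 0 indicate no association and values near 1 indicate a strong one.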
Step 05. Data Preparation:
Step 06. Feature Selection:
Step 07. Machine Learning Modelling:
Four models were tested for classification: Random Forest, Logistic Regression, K-Nearest Neighbours and Support Vector Machine.
The metric used to evaluate the models was the F1-Score.
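As a reminder of what this metric measures, the F1-Score is the harmonic mean of precision and recall for the positive class. The sketch below computes it by hand; treating "Yes" as the positive label of `bank_account` is an assumption made for the example, and the labels are toy values.

```python
def f1_score_binary(y_true, y_pred, positive="Yes"):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only.
y_true = ["Yes", "Yes", "No", "No"]
y_pred = ["Yes", "No", "Yes", "No"]
score = f1_score_binary(y_true, y_pred)
```

Unlike accuracy, F1 ignores true negatives, so a model that predicts "No" for everyone is not rewarded when "No" is the more frequent label.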
The Support Vector Machine was the most time-consuming model to train. Since it also underperformed relative to the other models, we decided not to pursue it further.
Next, we performed cross-validation on the models and compared their F1-Scores.
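The cross-validation step can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the project's actual code: the data is synthetic (`make_classification` stands in for the prepared features), and the fold count and model hyperparameters are arbitrary choices. The Support Vector Machine is left out, in line with the decision above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the prepared feature matrix and target.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Hyperparameters here are placeholders, not tuned values.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_f1 = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1")
    for name, model in models.items()
}

for name, scores in cv_f1.items():
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean together with the standard deviation across folds makes it easier to tell whether a difference between two models is larger than the fold-to-fold noise.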
Step 08. Hyperparameter Fine-Tuning:
Step 09. Convert Model Performance to Business Values:
Step 10. Deploy Model to Production:
Hypothesis 01: Urban areas have 50% more digital bank accounts than rural areas
False.
Hypothesis 02: People with a higher level of education hold the majority of bank accounts
True.
Hypothesis 03: People between 18 and 40 years old make up 85% of the digital bank account base.
False.