Joachim Kuleafenu
11 min read · Jun 8, 2021


How to improve your chances of getting a Mortgage Loan?
Discovering the main factors affecting applicants’ mortgage loan approval rate using Python

Photo by Taylor Grote on Unsplash

Have you set your eyes on a stunning property? Oh yes, we all do at times.

Sometimes, though, we want to go a step further and purchase the property. Suddenly the word ‘mortgage’ comes up, and right then we begin to wonder whether our mortgage application will be approved.

However, you can improve your chances by understanding the criteria most lenders apply before giving out loans. That is exactly what we will explore by digging into real-world data published under the Home Mortgage Disclosure Act (HMDA).

Project Goal: In this project, we will perform data exploration in Python, asking the right questions to discover the main factors affecting applicants’ mortgage loan approval rate.

1.0 Table of contents

  1. Understanding the concept of Mortgage
  2. Setting up the Environment
  3. Knowing your Data, features and labels
  4. Exploratory Data Analysis
  5. Conclusion

Let’s begin …

2.0 … with the basics of Mortgage

  • A mortgage is a type of loan used to purchase or refinance a property.
  • Getting a mortgage means your lender gives you a set amount of money to buy a home or other property. You agree to pay back the loan, with interest, over several years. You don’t fully own the home until the mortgage is paid off.
  • A mortgage becomes a necessity if you can’t pay the full price of a home in cash upfront.
  • Note: A mortgage is a type of loan, but not all loans are mortgages. You can read more about it here.
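To make “pay back the loan, with interest, over several years” concrete, here is a minimal sketch of the standard fixed-rate amortization formula. The helper name `monthly_payment` is hypothetical, the loan amount, rate and term are made-up illustration values (not figures from the HMDA data), and the sketch ignores taxes, insurance and fees.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment on a fully amortizing, fixed-rate loan.

    Standard annuity formula: M = P * r * (1 + r)**n / ((1 + r)**n - 1),
    where r is the monthly interest rate and n is the number of monthly payments.
    """
    n = years * 12          # total number of monthly payments
    r = annual_rate / 12    # monthly interest rate
    if r == 0:
        return principal / n  # interest-free: split the principal evenly
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


# Illustrative numbers only: $200,000 borrowed at 6% a year over 30 years
payment = monthly_payment(200_000, 0.06, 30)
total_interest = payment * 30 * 12 - 200_000
print(f"Monthly payment: {payment:,.2f}")  # Monthly payment: 1,199.10
print(f"Total interest paid: {total_interest:,.2f}")
```

Over a long term, the total interest paid can exceed the amount originally borrowed, which is one reason lenders look so closely at an applicant’s ability to keep up with payments.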

With this background knowledge, let us set up our environment.

3.0) Setting up our Environment


  • Python: The general-purpose programming language used for the whole project.
  • Pandas: Used for data analysis and wrangling.
  • NumPy: For scientific computation.
  • Matplotlib and Seaborn: For visualization.

3.1 Import all required libraries

NB: The project code is available on my GitHub repository. All the code is explained in comments within the code itself.

4.0) Reading the Data

4.1 Overview of Data

This is a publicly available HMDA dataset that contains information about mortgage loans. Here is the source of the data.
The Home Mortgage Disclosure Act (HMDA) requires many financial institutions to maintain, report, and publicly disclose information about mortgages.

This public data is important because:

  • They help show whether lenders are serving the housing needs of their communities.
  • They help authorities identify and root out predatory lending practices.
  • They give public officials information that helps them make decisions and policies.
  • They shed light on lending patterns that could be discriminatory,
    e.g. a reported increase in mortgage borrowing by Black and Hispanic applicants as of 1993.

About our dataset:

  • It has 466,566 unique data entries and 47 attributes/features/columns.
  • Of these, 34 features are of object datatype, 9 are floating point and 4 are integer.
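The row, column and datatype counts above can be reproduced with a short pandas helper. This is a sketch, not the author’s notebook code: the helper name `describe_shape` is hypothetical, the sample frame is a tiny made-up stand-in, `loan_amount_000s` is an assumed HMDA column name, and the commented-out file name is hypothetical. Newer pandas versions may infer a dedicated string dtype for text columns instead of `object`.

```python
import pandas as pd


def describe_shape(df: pd.DataFrame) -> dict:
    """Summarise row/column counts and how many columns have each dtype."""
    dtype_counts = df.dtypes.astype(str).value_counts().to_dict()
    return {"rows": len(df), "columns": df.shape[1], "dtypes": dtype_counts}


# Tiny made-up stand-in; on the real data you would first load the HMDA file,
# e.g. df = pd.read_csv("hmda_data.csv")  # hypothetical file name
sample = pd.DataFrame({
    "loan_amount_000s": [250, 180, 320],                       # integer column
    "hud_median_family_income": [90300.0, 75200.0, 90300.0],   # float column
    "loan_type_name": ["Conventional", "FHA-insured", "Conventional"],  # text column
})
print(describe_shape(sample))
```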

5.0) Feature Description

It is always advisable to have some knowledge of the dataset you are dealing with. Knowing your dataset helps you in both the EDA process and feature engineering. Now let's check out the descriptions of some features.

Figure 1. Data Description

Loan amount: This is the amount of money the applicant applied for.

Tract or Census Tract: An area roughly equivalent to a neighborhood, with a population of between 2,800 and 8,000 people.

Metropolitan statistical areas (MSA): Areas delineated by the U.S. Office of Management and Budget (OMB) as having at least one urbanized area with a population of at least 50,000.

Rate spread: The difference between the loan’s annual percentage rate (APR) and the average prime offer rate for a comparable loan, i.e. how much more the lender charges than the going rate for prime borrowers.

hud_median_family_income is the median family income in dollars for the MSA/MD in which the tract is located.

as_of_year is the year the HMDA data was reported to the federal agency.

loan_type_name indicates whether the loan is conventional or insured/guaranteed by a government program. The government programs are offered by:

  • The Federal Housing Administration (FHA).
  • The Department of Veterans Affairs (VA).
  • The Department of Agriculture’s Rural Housing Service (RHS) or Farm Service Agency (FSA).

5.1) Let us understand our labels (target variable)

In our case, we will remove all loans that were sold to secondary-market institutions, since we want to deal directly with primary lenders.

We will also drop applications whose outcome is ‘Application withdrawn by applicant’.

  • Along the way, we will convert this multi-class label into a binary label, with 1 representing an approved loan and 0 a denied loan.

Loan originated: In our case, this means the loan has been approved for disbursement.

Secondary market: The market in which institutions buy loans from the primary lenders.
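Here is a minimal sketch of the filtering and binarisation described above. It assumes the outcome lives in a column named `action_taken_name` with the label strings used in the public HMDA files, and it keeps only originated and denied applications. The helper name `make_binary_label` is hypothetical, and the author’s exact filtering rules may differ (for example, in how ‘Application approved but not accepted’ is treated).

```python
import pandas as pd

# Assumed label strings, as spelled in the public HMDA files; adjust if your copy differs
APPROVED = "Loan originated"
DENIED = "Application denied by financial institution"


def make_binary_label(df: pd.DataFrame, col: str = "action_taken_name") -> pd.DataFrame:
    """Keep only originated/denied applications and add a 0/1 loan_approved column.

    Rows with any other outcome (e.g. purchased loans, withdrawn applications)
    are dropped, so the remaining label is strictly binary.
    """
    kept = df[df[col].isin([APPROVED, DENIED])].copy()
    kept["loan_approved"] = (kept[col] == APPROVED).astype(int)
    return kept


sample = pd.DataFrame({"action_taken_name": [
    "Loan originated",
    "Application denied by financial institution",
    "Loan purchased by the institution",
    "Application withdrawn by applicant",
]})
print(make_binary_label(sample)["loan_approved"].tolist())  # [1, 0]
```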

6.0 Exploratory Data Analysis

This is the most crucial section of our project: it is where we derive our insights and answer our questions. To keep things simple, we will combine univariate, bivariate and multivariate analysis as we go.

6.1 What are the main reasons why loan applications are denied?

  • The top reasons for a loan denial are debt-to-income ratio and bad credit history, representing 23% and 22% respectively, followed by the rest.
  • The debt-to-income ratio is the percentage of your gross monthly income that goes to paying your monthly debt payments.
  • Credit history is the record of a consumer’s ability to repay debts and demonstrated responsibility in repaying them.
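The percentages above come from counting denial reasons among denied applications, and the debt-to-income ratio is a simple division. Below is a sketch of both, assuming the primary denial reason is stored in a column named `denial_reason_name_1`. HMDA records up to three denial reasons, but this sketch only counts the first. The helper names are hypothetical, and the sample rows and dollar figures are invented for illustration.

```python
import pandas as pd


def denial_reason_shares(denied: pd.DataFrame, col: str = "denial_reason_name_1") -> pd.Series:
    """Share of denied applications citing each (primary) denial reason."""
    return denied[col].value_counts(normalize=True)


def debt_to_income(monthly_debt_payments: float, gross_monthly_income: float) -> float:
    """Debt-to-income ratio as a percentage of gross monthly income."""
    return 100 * monthly_debt_payments / gross_monthly_income


denied = pd.DataFrame({"denial_reason_name_1": [
    "Debt-to-income ratio", "Credit history", "Debt-to-income ratio", "Collateral",
]})
print(denial_reason_shares(denied))
print(debt_to_income(2_000, 5_000))  # 40.0
```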

6.2 Does gross income play a role in whether an applicant gets a loan?

  • We will create a new binary column called loan_approved, where 1 means the loan was approved and 0 means it was denied.
  • Then we will divide the dataset into two dataframes, loan_approved and loan_denied.
  • Finally, we display the mean and median income of both groups in a graph.
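The steps above can be sketched with a single `groupby`, which gives the same numbers as splitting the data into separate loan_approved and loan_denied frames. The income column name `applicant_income_000s` (HMDA reports applicant income in thousands of dollars) is an assumption, the helper name `income_by_outcome` is hypothetical, the sample data is made up, and the plotting step is left out.

```python
import pandas as pd


def income_by_outcome(df: pd.DataFrame, income_col: str = "applicant_income_000s") -> pd.DataFrame:
    """Mean and median applicant income for denied (0) vs approved (1) loans."""
    return df.groupby("loan_approved")[income_col].agg(["mean", "median"])


sample = pd.DataFrame({
    "loan_approved": [1, 1, 1, 0, 0],
    "applicant_income_000s": [95, 120, 80, 70, 90],  # made-up incomes, in $1,000s
})
print(income_by_outcome(sample))
```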

It turned out that it does!

  • The mean and median gross income of applicants whose loans were approved are slightly higher than those of applicants whose loans were denied.

6.3 Can the income level of the neighborhood where the property is located affect the applicant’s chances of getting a loan?

  • You might expect that for a loan to be approved, the applicant’s income must be equal to or above the neighborhood median family income.
  • Let’s see how applicants whose average income is greater or lesser than the mean neighborhood median income of where the property is located are affected.

Reading the plot

  • On the x-axis, 0 and 1 mean loan_denied and loan_approved respectively.
  • Each row is a unique neighborhood.
  • On each row, read from left to right.
  • Lesser status means the applicant’s median income is lower than the average median income of the neighborhood where the property is located.
  • Greater status means the applicant’s median income is above the average median income of the neighborhood where the property is located.


  • As you view the graphs on each row, you can see either a slight decrease in the number of denied loans or an increase in the number of approved loans on the left compared to the right, in almost all the plots.
  • This suggests that applicants whose median income is higher than the average median income of the neighborhood where the property is located have had more of their loans approved than applicants with a lower median income.
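A simplified sketch of the Greater/Lesser labelling, assuming columns named `applicant_income_000s` (in thousands of dollars) and `hud_median_family_income` (in dollars). Unlike the plots, which compare neighborhood-level averages, this sketch compares each applicant with their own tract’s median family income and then computes the approval rate per status. The helper name `income_status` is hypothetical and the sample values are invented.

```python
import numpy as np
import pandas as pd


def income_status(df: pd.DataFrame) -> pd.Series:
    """Label each applicant 'Greater' or 'Lesser' than the tract's median family income.

    Assumes applicant_income_000s is in thousands of dollars and
    hud_median_family_income is in dollars. Ties count as 'Lesser'.
    """
    applicant_income = df["applicant_income_000s"] * 1_000
    neighborhood_income = df["hud_median_family_income"]
    status = pd.Series(
        np.where(applicant_income > neighborhood_income, "Greater", "Lesser"),
        index=df.index,
        name="income_status",
    )
    # Comparisons with NaN evaluate to False, so blank out rows with missing incomes
    missing = applicant_income.isna() | neighborhood_income.isna()
    return status.mask(missing)


sample = pd.DataFrame({
    "applicant_income_000s": [120, 60, 95, 40],
    "hud_median_family_income": [90_000, 90_000, 80_000, 80_000],
    "loan_approved": [1, 0, 1, 1],
})
sample["income_status"] = income_status(sample)
# Approval rate within each income status (rows with missing status are skipped)
print(sample.groupby("income_status")["loan_approved"].mean())
```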

6.4 What kind of property has the highest loan approval rate?

Manufactured homes: Housing that is essentially ready for occupancy upon leaving the factory and being transported to a building site.

Multifamily dwelling: Any housing unit where two (2) or more dwellings are separated by a common wall, floor, or ceiling, including but not limited to apartments, condominiums, and townhouses.


  • Applicants for multifamily dwellings have the highest loan approval rate.

Such properties can produce decent rental income early on to cover repayments, and that is what most lenders want.

Most applicants for multifamily dwellings are investors, who tend to have a good credit history and can provide a decent down payment.

It’s tougher to get a loan for manufactured housing. This is because manufactured housing tends to depreciate, while traditional home values tend to increase over time.
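Approval rates per property type can be computed with a small helper like the one below. The `approval_rate_by` name is hypothetical, the property-type labels mirror the public HMDA wording but are assumptions here, and the sample rows are made up.

```python
import pandas as pd


def approval_rate_by(df: pd.DataFrame, col: str) -> pd.Series:
    """Percentage of approved applications for each category of `col`, highest first."""
    return (df.groupby(col)["loan_approved"].mean() * 100).sort_values(ascending=False)


sample = pd.DataFrame({
    "property_type_name": [
        "Multifamily dwelling", "Multifamily dwelling",
        "One-to-four family dwelling (other than manufactured housing)",
        "One-to-four family dwelling (other than manufactured housing)",
        "Manufactured housing", "Manufactured housing",
    ],
    "loan_approved": [1, 1, 1, 0, 0, 1],
})
print(approval_rate_by(sample, "property_type_name"))
```

The same helper can be pointed at `loan_type_name`, `owner_occupancy_name` or an assumed `loan_purpose_name` column to produce comparisons like those in sections 6.5 to 6.7.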

6.5 What type of loan has a better chance of being approved?

Some loans are insured or guaranteed by government programs offered by:

1) Federal Housing Administration (FHA)

2) Department of Veterans Affairs (VA)

3) Department of Agriculture’s Rural Housing Service (RHS) or Farm Service Agency (FSA).

All other loans are classified as conventional.

  • It turned out that FSA/RHS loans have the highest approval rate, while FHA loans have the lowest.
  • FSA/RHS are agencies set up to help low-income rural residents and farmers get loans. Applicants guaranteed under these agencies face more lenient credit history and income requirements, but they must still be able to make payments on their loan, taxes, and insurance.
  • FSA covers up to 95% of the lender’s loss.

6.6 Can the purpose of the loan help you?

While Home Purchase and Home Improvement are self-explanatory, Refinancing means getting a new mortgage to replace the original.

Refinancing is done to allow a borrower to obtain a better interest rate and terms.

  • Loans to purchase a home have a noticeably higher approval rate than the other two purposes.

6.7 How do owner-occupancy status and HOEPA status affect loan approval?

HOEPA: The Home Ownership and Equity Protection Act (HOEPA) was enacted in 1994 as an amendment to the Truth in Lending Act (TILA) to address abusive practices in refinances and closed-end home equity loans with high-interest rates or high fees.

So HOEPA_STATUS_NAME shows whether or not a loan was subject to HOEPA regulations.

Owner_occupancy_name: This shows the owner-occupancy status of the property. Second homes, vacation homes, and rental properties are classified as “not owner-occupied as a principal dwelling”.

For multifamily dwellings (housing five or more families), any dwellings located outside MSA/MDs, or dwellings in MSA/MDs where an institution does not have home or branch offices, an institution may enter ‘not applicable’.

Most of these properties are for investment purposes.

  • All loans subject to HOEPA have a 100% approval rate in our data.
  • Multifamily dwellings (housing five or more families) recorded as not_applicable also have a comparatively high approval rate.
  • Owner-occupied homes have a slightly better chance than not-owner-occupied homes.

6.8 Can an applicant’s loan be denied because of his/her race?

‘Not applicable’ is the race value recorded when the applicant or co-applicant is an institution rather than an individual.

The heatmap shows the loan approval rate for the various combinations of applicant and co-applicant race.

A more summarized version is shown in the tables.

For a particular race of the main applicant (or co-applicant), we take its approval rates with each race of the other party and find the average.

Table 1: Each column value is the mean approval rate for a main-applicant race (the column name) paired with each co-applicant race.


Assume the main applicant’s race is White. As shown on the heatmap, we compare it with the values for all co-applicant races, then find the average.

After doing this calculation, Black or African American applicants have the lowest average loan approval rate (65%), compared with 75% for White applicants (the majority).

Table 2: Each column value is the mean approval rate for a co-applicant race (the column name) paired with each main-applicant race.


Assume the co-applicant’s race is White. As shown on the heatmap, we compare it with the values for all main-applicant races, then find the average.

Again, the results show that Black or African American has the lowest average loan approval rate (66%), compared with 78% for White (the majority).

  • Asian Americans, a minority group, have the highest loan approval rate: 76% and 79% in Tables 1 and 2 respectively.
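The heatmap and the two tables can be sketched with a pivot table and two averages. The column names `applicant_race_name_1` and `co_applicant_race_name_1` are assumptions based on the public HMDA field names, the helper name `race_approval_tables` is hypothetical, and the sample rows are invented. As described above, each table value is a plain (unweighted) mean over the heatmap cells, so rare race combinations count as much as common ones.

```python
import pandas as pd


def race_approval_tables(df: pd.DataFrame,
                         applicant_col: str = "applicant_race_name_1",
                         co_applicant_col: str = "co_applicant_race_name_1"):
    """Approval-rate heatmap data plus its row and column averages.

    heatmap.loc[a, c] is the approval rate when the applicant's race is `a`
    and the co-applicant's race is `c`. Empty combinations are NaN and are
    skipped by mean().
    """
    heatmap = df.pivot_table(index=applicant_col, columns=co_applicant_col,
                             values="loan_approved", aggfunc="mean")
    table1 = heatmap.mean(axis=1)  # per applicant race, averaged over co-applicant races
    table2 = heatmap.mean(axis=0)  # per co-applicant race, averaged over applicant races
    return heatmap, table1, table2


sample = pd.DataFrame({
    "applicant_race_name_1": ["White", "White", "White", "White",
                              "Black or African American", "Black or African American",
                              "Black or African American"],
    "co_applicant_race_name_1": ["White", "White", "Black or African American",
                                 "Black or African American", "White",
                                 "Black or African American", "Black or African American"],
    "loan_approved": [1, 1, 1, 0, 1, 0, 0],
})
heatmap, table1, table2 = race_approval_tables(sample)
print(table1)
print(table2)
```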

A research article published by Pew Research Center shows that Blacks and Hispanics face more challenges in getting home loans than other racial groups in America.

The reasons lenders cite for turning down mortgage applications show different patterns depending on racial or ethnic group. For instance, among Whites, Hispanics and Asians who were rejected for conventional home loans, the most frequently cited reason was that their debt-to-income ratio was too high (25%, 26%, and 29%, respectively). Among Blacks, the most often cited reason was a poor credit history (31%).

According to the research, Blacks and Hispanics generally put less money down on houses relative to total value than other groups. This can lead lenders to deny them loans, since a smaller down payment reflects the applicant’s income status and ability to make repayments.

Read more on this.

7.0 Conclusion

After our extensive exploration, we can conclude with the following recommendations:

  • Credit history and debt-to-income ratio emerged as the principal factors in securing a loan, so applicants should maintain a good debt-to-income ratio and build a good credit history.
  • Unfortunately, it turned out that high-income earners have an edge over the lower-income group. So you may consider taking a higher-income job, though it’s not a major requirement.
  • When choosing a property, make sure your income is higher than the median family income of the neighborhood where the property is located.
  • Choose to buy a multifamily dwelling. Don’t apply for a loan to buy manufactured housing, since it has a very low mortgage approval rate.
  • Apply for FSA/RHS loans. They have a relatively high approval rate compared with the other loan types.
  • Consider getting a home loan in King County.
  • Apply for a loan to purchase a house; home purchase loans have a noticeably higher approval rate.
  • Every HOEPA loan in our data was approved, but bear in mind that HOEPA covers loans with high interest rates or high fees.
  • Secure your loan through a first lien.
  • Black or African American applicants have a good chance of approval when applying with a White, Asian or Native Hawaiian co-applicant.

You can download the full-code notebook here.






Software Engineer. I build smart features with Machine Learning techniques.