Linear Regression Health Costs Calculator

Note

Damn, I hate this problem! I feel so stupid tweaking the numbers and model layers over and over until I get the expected result, and even then it doesn't always work. This is why I don't go into Data Science, although I love Math. In Math, we can prove that something is correct. In Data Science, we (or is it just me?) keep repeating the same method and hoping for a better result. When it works, God knows why (I don't believe in God, btw).

Total time for this: 2h.

Problem description

Copied and modified from this Google Colab link

In this challenge, you will predict healthcare costs using a regression algorithm.

You are given a dataset that contains information about different people including their healthcare costs. Use the data to predict healthcare costs based on new data.

The first two cells of this notebook import libraries and the data.

Make sure to convert categorical data to numbers. Use 80% of the data as the train_dataset and 20% of the data as the test_dataset.

Pop off the "expenses" column from these datasets to create new datasets called train_labels and test_labels. Use these labels when training your model.

Create a model and train it with the train_dataset. Run the final cell in this notebook to check your model. The final cell will use the unseen test_dataset to check how well the model generalizes.

To pass the challenge, model.evaluate must return a Mean Absolute Error of under 3500. This means that, on average, the model's predicted healthcare costs are within $3500 of the actual costs.
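For reference, the Mean Absolute Error is the average of the absolute differences between the actual and predicted values. A minimal plain-Python sketch; the function name `mean_absolute_error` and the numbers are made up for illustration:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical expenses (in dollars) and model predictions, for illustration only.
actual = [1000.0, 5000.0, 12000.0]
predicted = [3000.0, 4000.0, 9000.0]

mae = mean_absolute_error(actual, predicted)
print(mae)  # (2000 + 1000 + 3000) / 3 = 2000.0
print("under 3500" if mae < 3500 else "3500 or more")
```

A single very bad prediction can be offset by many good ones, so MAE measures typical error, not worst-case error.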

The final cell will also predict expenses using the test_dataset and graph the results.

Solution

Download data

Install tensorflow docs

Import libraries

Prepare datasets

Since the data has some text (categorical) columns, we need to convert their values to numbers.
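One common way to do this, sketched below with pandas, is to map the two-valued columns to 0/1 and one-hot encode the multi-valued one. This is an assumption, not necessarily what the original notebook does; the column names follow the challenge's insurance dataset (`sex`, `smoker`, `region`, `expenses`), while the rows and the helper `encode_categoricals` are hypothetical.

```python
import pandas as pd


def encode_categoricals(df):
    """Return a copy of df whose text columns are converted to numbers."""
    df = df.copy()
    # Two-valued columns: map each label to 0 or 1.
    df["sex"] = df["sex"].map({"female": 0, "male": 1})
    df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})
    # Multi-valued column: one 0/1 indicator column per region.
    return pd.get_dummies(df, columns=["region"], prefix="region", dtype=float)


# Hypothetical rows with the dataset's column names; the values are made up.
dataset = pd.DataFrame({
    "age": [19, 33, 45],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 22.7, 30.1],
    "children": [0, 1, 2],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northwest", "southeast"],
    "expenses": [1000.0, 2000.0, 3000.0],
})

encoded = encode_categoricals(dataset)
print(encoded)
```

One-hot encoding `region` avoids implying an order between regions, which a single integer code per region would do.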

First, let's randomly pick 20% of the records to make our test_dataset.

Then we use the remaining 80% to make the train_dataset.
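These two steps can be sketched with `DataFrame.sample` and `DataFrame.drop`, assuming the encoded data is in a pandas DataFrame named `dataset`. The toy DataFrame and `random_state=0` are illustrative choices:

```python
import pandas as pd

# Hypothetical, already-numeric dataset with 10 rows (values are made up).
dataset = pd.DataFrame({
    "age": list(range(20, 30)),
    "bmi": [25.0 + i for i in range(10)],
    "expenses": [1000.0 * (i + 1) for i in range(10)],
})

# Randomly pick 20% of the rows for testing; random_state makes it reproducible.
test_dataset = dataset.sample(frac=0.2, random_state=0)

# Every row not picked for testing (the remaining 80%) becomes training data.
train_dataset = dataset.drop(test_dataset.index)

print(len(train_dataset), len(test_dataset))  # 8 2
```

Dropping by index relies on the index being unique, which holds for a DataFrame freshly loaded with `read_csv`.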

Prepare the labels
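`DataFrame.pop` removes a column in place and returns it as a Series, so one call per split both creates the labels and strips `expenses` from the features. A sketch with hypothetical toy rows:

```python
import pandas as pd

# Hypothetical train/test splits that still contain the "expenses" column.
train_dataset = pd.DataFrame({
    "age": [20, 30, 40],
    "bmi": [22.0, 27.5, 31.0],
    "expenses": [1500.0, 4200.0, 9800.0],
})
test_dataset = pd.DataFrame({"age": [50], "bmi": [29.0], "expenses": [12000.0]})

# pop() returns the column and deletes it from the DataFrame in one step.
train_labels = train_dataset.pop("expenses")
test_labels = test_dataset.pop("expenses")

print(list(train_dataset.columns))  # ['age', 'bmi']
```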

Prepare the model

Compile it
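A minimal Keras sketch of these two steps, assuming TensorFlow 2.x: a `Normalization` layer adapted to the training features, two hidden `Dense` layers, and one output unit for the predicted expenses, compiled with MAE as the loss because that is what the challenge grades. The helper name `build_model`, the layer sizes, and the learning rate are guesses to tune, not the author's settings; the random features only stand in for the encoded `train_dataset`.

```python
import numpy as np
import tensorflow as tf


def build_model(train_features, learning_rate=0.01):
    """Build and compile a small regression network (hypothetical sizes)."""
    x = np.asarray(train_features, dtype="float32")

    # Learn each feature's mean and variance from the training data only.
    normalizer = tf.keras.layers.Normalization(axis=-1)
    normalizer.adapt(x)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(x.shape[1],)),
        normalizer,
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # one output: the predicted expenses
    ])

    # Optimize MAE directly, since that is the metric the challenge checks.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mae",
        metrics=["mae", "mse"],
    )
    return model


# Random numbers standing in for the encoded train_dataset (20 rows, 3 features).
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 1.0, size=(20, 3)).astype("float32")
model = build_model(features)
model.summary()
```

Normalizing is worth the extra layer here because features such as `age` and the 0/1 encoded columns live on very different scales.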

Feed it

Test
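Training and evaluation can be sketched as below. To keep the example self-contained it uses synthetic data with a roughly linear target instead of the insurance dataset, and a small network; the epoch count and `validation_split` are arbitrary choices. On the real data you would pass `train_dataset`/`train_labels` to `fit` and `test_dataset`/`test_labels` to `evaluate`, then compare the returned MAE with 3500.

```python
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(0)  # make this toy run reproducible

# Synthetic stand-in data (NOT the insurance dataset): 3 features, near-linear target.
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 1.0, size=(200, 3)).astype("float32")
labels = (3 * features[:, 0] - 2 * features[:, 1] + features[:, 2] + 5
          + rng.normal(0.0, 0.1, size=200)).astype("float32")
train_x, test_x = features[:160], features[160:]
train_y, test_y = labels[:160], labels[160:]

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(train_x)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    normalizer,
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="mae", metrics=["mae", "mse"])

before = model.evaluate(test_x, test_y, verbose=0, return_dict=True)

# Feed it: train on the training split only, holding out 20% of it for validation.
history = model.fit(train_x, train_y, epochs=100, validation_split=0.2, verbose=0)

# Test: evaluate on rows the model never saw during training.
after = model.evaluate(test_x, test_y, verbose=0, return_dict=True)
print(f"Test MAE before training: {before['mae']:.3f}, after: {after['mae']:.3f}")
```

`return_dict=True` makes the results addressable by metric name, which is less fragile than unpacking a list in a fixed order.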