


All source code in this blog is also available in Gregory Choi's GitHub source code repository.


  

Thursday, April 7, 2016

Introduction to Machine Learning / Data Mining

[Machine Learning? Data Mining?]

Well, some people draw a slight distinction between machine learning and data mining, although in practice I don't see much difference between them.

See the debate on the difference between machine learning and data mining.

In the end, both are about training the machine to recognize patterns in the data and then predicting the future (or unknown variables) from that training. I'll use both terms interchangeably. Please feel free to challenge me if I am wrong.

[How does it work?]

Well, seeing is believing. I have been searching for a better explanation, but Professor Keating at the University of Notre Dame has a really great one. First, you'll see just two pictures, each labeled with the painter's name. Next, I am going to show you pictures without labels and ask, "Who is the painter?" I swear you can answer every question correctly.

<Claude Monet>













<Van Gogh>














Now, who painted these pictures?
<1>


<2>


<3>


<4>


<5>


<Answer>
1 - Gogh
2 - Monet
3 - Monet
4 - Gogh
5 - Gogh

<How did your brain work?>
As soon as you saw those pictures, your mind built a formula that lets you tell the two painters apart. (By the way, these two painters are well known for painting styles that stand in stark contrast to each other.)

Features             Gogh              Monet
Color                Use 4-5 colors    Use more than 10 colors
Style                Masculine         Feminine
Stroke               Rough             Smooth
Viewer's Perception  Powerful          Detailed

Although some pictures don't fall neatly into either of those two categories, we can get a broad sense of which picture was painted by whom.
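To make the "formula in your mind" a little more concrete, here is a minimal Python sketch that guesses the painter by counting how many features of a painting match each column of the comparison table above. The feature names (`many_colors`, `stroke`, `impression`) and the sample painting are my own hypothetical encoding, and I left out the "Style" row because it is hard to write down objectively. Treat it as an illustration, not as how a real system works.

```python
# Feature profiles loosely copied from the comparison table in this post.
# The keys and values are a hypothetical encoding chosen for this sketch.
PROFILES = {
    "Gogh": {"many_colors": False, "stroke": "rough", "impression": "powerful"},
    "Monet": {"many_colors": True, "stroke": "smooth", "impression": "detailed"},
}


def guess_painter(painting):
    """Return the painter whose profile matches the most features."""
    def matches(profile):
        return sum(painting.get(key) == value for key, value in profile.items())

    # On a tie, max() keeps the first painter listed in PROFILES.
    return max(PROFILES, key=lambda name: matches(PROFILES[name]))


# A hypothetical painting described by hand (not one of the pictures above).
unknown = {"many_colors": False, "stroke": "rough", "impression": "detailed"}
print(guess_painter(unknown))
```

Here the painting matches two of the three Gogh features and only one Monet feature, so the guess is Gogh even though the picture does not fit either category perfectly.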

Machine learning does the same thing. It learns from the data given by the user. We call this data the "training set." Then it applies the formula it built while analyzing the training set to the data we want to predict. We call that data the "test set." The prediction can be wrong, but in general, the more good-quality training data we give the machine, the better the predictions we get.
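The training set / test set idea can be sketched in a few lines of Python. The example below is only a toy: the numbers are made up, the two features (number of colors and a 0-to-1 "stroke roughness" score) are my own assumptions, and the learning method is a simple nearest-centroid rule that I picked for brevity, not the method any particular tool uses.

```python
import math


def train(training_set):
    """Learn one average feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in training_set:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, features):
    """Pick the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Made-up toy data: (number of colors, stroke roughness from 0 to 1) -> painter.
training_set = [
    ((4, 0.9), "Gogh"), ((5, 0.8), "Gogh"), ((5, 0.7), "Gogh"),
    ((12, 0.2), "Monet"), ((11, 0.3), "Monet"), ((14, 0.1), "Monet"),
]
# The test set is kept apart from training so it can check the predictions.
test_set = [((6, 0.8), "Gogh"), ((13, 0.2), "Monet"), ((9, 0.4), "Monet")]

model = train(training_set)
correct = sum(predict(model, x) == y for x, y in test_set)
print(f"accuracy on the test set: {correct}/{len(test_set)}")
```

The machine never sees the test set while it builds the formula, which is why the accuracy on it gives an honest hint of how well the formula works on new pictures. In a real project the features would also need to be put on comparable scales; here the color count dominates the distance.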

[Where can we apply it?]

<Sales>
You are a salesperson at an insurance company. You have just received a list of potential customers with information about their income, age, location, and job. If you are a good salesperson, you would have a gut feeling for which customers are willing to sign up for the new insurance plan. With machine learning, however, you don't need to rely on gut feeling. If you have past transaction records, it can tell you which customers are most likely to sign up for the new insurance plan.
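As a rough sketch of the sales example, the Python snippet below estimates a sign-up rate for each customer group from past records and then ranks new prospects by that rate. Everything in it is hypothetical: the records, the age and income bands, and the prospect names are invented for illustration, and a real model would use more features and a proper learning algorithm.

```python
from collections import defaultdict

# Made-up past transaction records: (age band, income band, signed_up).
past_records = [
    ("30s", "high", True), ("30s", "high", True), ("30s", "high", False),
    ("30s", "low", False), ("30s", "low", True),
    ("50s", "high", True), ("50s", "high", False), ("50s", "high", False),
    ("50s", "low", False), ("50s", "low", False),
]


def learn_signup_rates(records):
    """Estimate the sign-up rate for each (age band, income band) group."""
    signed, total = defaultdict(int), defaultdict(int)
    for age, income, signed_up in records:
        total[(age, income)] += 1
        signed[(age, income)] += int(signed_up)
    return {group: signed[group] / total[group] for group in total}


def rank_prospects(prospects, rates, default_rate=0.0):
    """Sort prospects from most to least likely to sign up."""
    return sorted(
        prospects,
        key=lambda p: rates.get((p["age"], p["income"]), default_rate),
        reverse=True,
    )


rates = learn_signup_rates(past_records)
prospects = [
    {"name": "A", "age": "50s", "income": "low"},
    {"name": "B", "age": "30s", "income": "high"},
    {"name": "C", "age": "50s", "income": "high"},
]
for p in rank_prospects(prospects, rates):
    print(p["name"], round(rates[(p["age"], p["income"])], 2))
```

The salesperson would call the prospects at the top of the list first. Groups that never appear in the past records fall back to `default_rate`, which is one simple choice among many.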

<Card company>
Suppose that you are in charge of issuing credit cards. You don't want to issue cards to people who are likely to miss their card payments. In this case, you can estimate who is likely to default based on age, income, job, and savings. In fact, credit card companies adopted this technique a long time ago. If you have ever received a "your card application was rejected" message, you probably did not pass this test.

I want to lead this conversation into real applications of data mining.








