AI/Andrew Ng's Machine Learning

Machine Learning Lecture Note 1

LiDARian 2021. 10. 12. 01:47


Definition of Machine Learning

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Kinds of Machine Learning

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Recommender Systems

 


Supervised Learning

Supervised Learning = right answer is given

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

 

Two Kinds of Problem in Supervised Learning

Regression: predict a continuous-valued output.

Classification: predict a discrete-valued output (0 or 1, or 0/1/2/3, etc.).
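A minimal sketch contrasting the two problem types on toy data (all numbers and the threshold below are made up for illustration): a closed-form least-squares fit for regression, and a simple threshold rule for classification.

```python
# Regression: predict a continuous value (e.g., house price from size).
sizes = [50.0, 70.0, 90.0, 110.0]      # input feature x (illustrative)
prices = [150.0, 200.0, 250.0, 300.0]  # continuous target y (illustrative)

# Closed-form least-squares fit for y = a*x + b (single feature).
n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) \
    / sum((x - mean_x) ** 2 for x in sizes)
b = mean_y - a * mean_x
print(f"regression prediction for x=80: {a * 80 + b:.1f}")

# Classification: predict a discrete label (e.g., malignant or not).
tumor_size = 3.2
label = 1 if tumor_size > 2.5 else 0   # hand-picked threshold, output is 0 or 1
print(f"classification output: {label}")
```

The regression branch outputs a real number; the classification branch outputs one of a small set of discrete labels.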

 

For the 1-variable case, the data is plotted as in the left figure.

For the 2-variable case on the right, drawing a separating boundary like this is the machine learning task T.

 

For the case of infinitely many features → use a Support Vector Machine.
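A minimal sketch of the idea that lets an SVM handle infinitely many features (the numbers are illustrative): the Gaussian/RBF kernel corresponds to an inner product in an infinite-dimensional feature space, yet it is computed directly from the original inputs, so the infinite feature vector is never built.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2): inner product in an
    infinite-dimensional feature space, computed in the input space."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # identical points -> 1.0
print(rbf_kernel([1.0, 2.0], [4.0, 6.0]))  # distant points -> near 0
```

An SVM trains using only these pairwise kernel values, which is why the dimensionality of the implicit feature space does not matter.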

 


Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results.
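The clustering idea above can be sketched with a tiny k-means run in one dimension (the data points, starting centroids, and k=2 are all made up for illustration); note that no labels and no feedback are used, yet the grouping emerges from the data alone.

```python
# Toy 1-D data: two loose groups, but no labels are given.
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centroids = [0.0, 10.0]  # arbitrary starting guesses

for _ in range(10):  # a few alternating assignment/update rounds
    clusters = [[], []]
    for x in data:
        # assign each point to its nearest centroid
        nearest = min(range(2), key=lambda j: abs(x - centroids[j]))
        clusters[nearest].append(x)
    # move each centroid to the mean of its assigned points
    centroids = [sum(c) / len(c) if c else centroids[j]
                 for j, c in enumerate(clusters)]

print(centroids)  # roughly [1.0, 8.07]: structure found without labels
```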

 



Model representation

To establish notation for future use, we’ll use $x^{(i)}$ to denote the “input” variables (living area in this example), also called input features, and $y^{(i)}$ to denote the “output” or target variable that we are trying to predict (price).

A pair $(x^{(i)}, y^{(i)})$ is called a training example, and the dataset that we’ll be using to learn—a list of m training examples $\{(x^{(i)}, y^{(i)});\ i = 1, \dots, m\}$—is called a training set.

We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, $X = Y = \mathbb{R}$.


Objective...

learn a function $h : X → Y$ so that $h(x)$ is a “good” predictor for the corresponding value of $y$. For historical reasons, this function $h$ is called a hypothesis.
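For a single input feature, the hypothesis can be sketched as a simple linear function (the parameter values below are assumed for illustration, not learned):

```python
# h_theta(x) = theta_0 + theta_1 * x
theta_0, theta_1 = 50.0, 2.0  # intercept and slope (assumed values)

def h(x):
    """Hypothesis: maps an input x in X to a predicted y in Y."""
    return theta_0 + theta_1 * x

print(h(100.0))  # predicted price for a living area of 100
```

Learning then amounts to choosing $\theta_0, \theta_1$ so that $h$ is a good predictor on the training set.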

When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem. 


Cost Function




We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from x's and the actual output y's:

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

This function is otherwise called the "squared error function" or "mean squared error". The mean is halved $\left(\frac{1}{2}\right)$ as a convenience for the computation of the gradient descent, as the derivative term of the square function will cancel out the $\left(\frac{1}{2}\right)$ term. (That is, differentiating the square multiplies by 2, which cancels the 1/2, so the 1/2 is multiplied in advance.) The following image summarizes what the cost function does:
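The cost computation can be sketched directly (the four training examples below are made up for illustration): $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2$.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2x, so theta = (0, 2) costs nothing

def cost(theta_0, theta_1):
    """Squared-error cost J(theta_0, theta_1), halved for gradient descent."""
    m = len(xs)
    residuals = (theta_0 + theta_1 * x - y for x, y in zip(xs, ys))
    return sum(r ** 2 for r in residuals) / (2 * m)

print(cost(0.0, 2.0))  # perfect fit -> 0.0
print(cost(0.0, 1.0))  # underestimates every y -> positive cost
```

Gradient descent will later search for the $(\theta_0, \theta_1)$ that minimizes this value.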
