Data science always involves modeling your data. But what exactly is a model?

Suppose we have some data about an object with a certain structure. For example, it might be data about an apartment: the vector $$[2, 84.5, 6]$$, where $$2$$ is the number of rooms, $$84.5$$ is the total area of the apartment, and $$6$$ is the floor number. In practice, we usually have data about $$N$$ different apartments. We then represent them as a collection of vectors $$X = [x_1, ..., x_N]$$, where each $$x_i$$ is a vector of $$m$$ features of an object: $$x_i = [x_{i1}, ..., x_{im}]$$. $$X$$ can also be considered as a matrix:

$$X = \begin{pmatrix} x_{11} & ... & x_{1m} \\ ... & ... & ... \\ x_{N1} & ... & x_{Nm} \\ \end{pmatrix}$$

where $$x_{ij}$$ is the $$j$$-th feature of the $$i$$-th object, and $$\mathbf{x_i}$$ is the vector of numeric features of the $$i$$-th object.

Besides, for each of the $$N$$ objects we have some numeric characteristic, $$y = [y_1, ..., y_N]$$, that we want to be able to calculate (or to model) from that object's data. $$\mathbf{y_i}$$ is the result, the target, or the "ground truth" for the $$i$$-th object.
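
As a minimal sketch, the feature matrix $$X$$ and the target vector $$y$$ can be held in plain Python lists (in practice a library such as NumPy is more typical). Only the first apartment comes from the text above; the other rows and all target values are made up for illustration.

```python
# Each row is one object (apartment): [rooms, total area, floor].
# The first row is the example from the text; the others are hypothetical.
X = [
    [2, 84.5, 6],
    [1, 40.0, 3],
    [3, 110.0, 12],
]

# Hypothetical targets, one per object (for example, prices in some unit).
y = [150.0, 80.0, 210.0]

N = len(X)     # number of objects
m = len(X[0])  # number of features per object

x_1 = X[0]      # feature vector of the first object (x_1 in the text)
x_12 = X[0][1]  # second feature of the first object (x_{12} in the text)
```

Note that Python indexes from zero, so $$x_{ij}$$ in the text corresponds to `X[i - 1][j - 1]` here.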

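The targets $$y$$ come from actual examples: for an apartment, $$y_i$$ could be the actual price, which we consider the true price. Given such data, we make an assumption: the target can be approximated by a weighted sum of the object's features,

$$y_i \approx w_0 + w_1x_{i1} + ... + w_mx_{im} \quad (1)$$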

We call this assumption a model. In other words, a model is a function that, given an object's features, outputs a prediction of the target for that object:

$$a(x): [x_1, ..., x_m] \rightarrow \hat{y}$$

The form of the function $$a(x)$$ can vary. The model in $$(1)$$ is called a linear model. It has $$m+1$$ parameters, which we call weights. In the final project, you will use the same type of model. We apply this model to all the objects with the same weights $$w = [w_0, w_1, ..., w_m]$$:

$$\hat{y}_1=a(x_1)=w_0 +w_1x_{11} + ... + w_mx_{1m}$$

...

$$\hat{y}_N=a(x_N)=w_0 +w_1x_{N1} + ... + w_mx_{Nm}$$
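
The formulas above can be sketched in a few lines of Python. The function names `predict` and `predict_all`, and the weight values, are assumptions chosen only for this illustration.

```python
def predict(x, w):
    """Linear model a(x) = w[0] + w[1]*x[0] + ... + w[m]*x[m-1]."""
    # w[0] is the intercept w_0; w[1:] are paired with the m features.
    return w[0] + sum(w_j * x_j for w_j, x_j in zip(w[1:], x))


def predict_all(X, w):
    """Apply the same weights w to every object (row) in X."""
    return [predict(x, w) for x in X]


# Two objects with m = 3 features each; the second row is hypothetical.
X = [[2, 84.5, 6], [1, 40.0, 3]]

# Hypothetical weights [w_0, w_1, w_2, w_3], so m + 1 = 4 parameters.
w = [10.0, 5.0, 1.0, 2.0]

y_hat = predict_all(X, w)
print(y_hat)  # [116.5, 61.0]
```

For the first object this evaluates $$10 + 5 \cdot 2 + 1 \cdot 84.5 + 2 \cdot 6 = 116.5$$.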

We want to make our model better by finding the right set of weights. To do that, we use a loss function. The loss measures the deviation, or error, of our predictions $$\hat{y}$$ from the actual target $$\mathbf{y}$$. Here we use the mean squared error:

$$\text{Loss}(w)=\frac{1}{N}\sum_{i=1}^N (y_i-\hat{y}_i)^2$$
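
As a small sketch, this loss can be computed directly from the targets and the predictions. The function name `mse_loss` and the numbers below are assumptions made up for the example.

```python
def mse_loss(y, y_hat):
    """Mean of squared differences between targets y and predictions y_hat."""
    n = len(y)
    return sum((y_i - y_hat_i) ** 2 for y_i, y_hat_i in zip(y, y_hat)) / n


# Hypothetical targets and predictions for N = 3 objects.
y = [3.0, 5.0, 7.0]
y_hat = [2.0, 5.0, 9.0]

# Squared errors are 1, 0 and 4, so the loss is (1 + 0 + 4) / 3.
loss = mse_loss(y, y_hat)
print(loss)
```

The loss is zero only when every prediction equals its target, and larger errors are penalized more heavily because they are squared.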

To predict a target that is close to the actual one, we need to minimize the loss by finding a better set of weights. The loss is a multivariate function of the weights. As you know from the lectures, we can minimize it with gradient descent, which requires computing the gradient of this function with respect to the weights.
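
For the linear model and the squared-error loss above, differentiating with respect to each weight gives $$\frac{\partial Loss}{\partial w_0} = -\frac{2}{N}\sum_{i=1}^N (y_i-\hat{y}_i)$$ and $$\frac{\partial Loss}{\partial w_j} = -\frac{2}{N}\sum_{i=1}^N (y_i-\hat{y}_i)x_{ij}$$ for $$j = 1, ..., m$$. The sketch below uses these formulas in a plain gradient descent loop. The function names, the learning rate, the number of steps, and the toy data are all assumptions for illustration, not settings prescribed by the course.

```python
def predict(x, w):
    """Linear model: w[0] + w[1]*x[0] + ... + w[m]*x[m-1]."""
    return w[0] + sum(w_j * x_j for w_j, x_j in zip(w[1:], x))


def mse_loss(X, y, w):
    """Mean squared error of the linear model with weights w."""
    return sum((y_i - predict(x_i, w)) ** 2 for x_i, y_i in zip(X, y)) / len(y)


def gradient(X, y, w):
    """Gradient of the mean squared error with respect to each weight."""
    n = len(y)
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        residual = y_i - predict(x_i, w)
        grad[0] += -2.0 * residual / n              # derivative w.r.t. w_0
        for j, x_ij in enumerate(x_i, start=1):
            grad[j] += -2.0 * residual * x_ij / n   # derivative w.r.t. w_j
    return grad


def gradient_descent(X, y, learning_rate=0.05, n_steps=2000):
    """Start from zero weights and repeatedly step against the gradient."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_steps):
        grad = gradient(X, y, w)
        w = [w_j - learning_rate * g_j for w_j, g_j in zip(w, grad)]
    return w


# Toy data with one feature, generated from y = 1 + 2x (hypothetical).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

w = gradient_descent(X, y)
print(w)                  # close to [1.0, 2.0]
print(mse_loss(X, y, w))  # close to 0
```

The learning rate has to be chosen with care: if it is too large, the steps overshoot and the loss grows instead of shrinking, and if it is too small, convergence takes many more steps. Features on very different scales (such as area and number of rooms) usually make this choice harder, so they are often rescaled first.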

Good luck with your final project!