Welcome to Optimization 101. We need to understand what we're looking for and how to characterize it. The first step is understanding what is called an extremum point. Note that "extremum" is the singular form; its plural is "extrema", and we're going to work with both. A point A is called a local extremum of a function f if f(A) is the greatest or the lowest value in some neighborhood of A. Read this definition carefully: it only asks whether there exists some neighborhood that satisfies our demand for the greatest or the lowest value. You don't need the condition to hold for every neighborhood; any single neighborhood will suffice. Let's look at the most common example. I've taken a cubic polynomial here to represent the basic cases. We have three points, A, B, and C. First, point A is a local extremum because f(A) is the greatest value in this neighborhood. Point B is a local extremum because f(B) is the lowest value in this neighborhood. Point A is called a local maximum, point B is called a local minimum, and point C is neither a maximum nor a minimum. Whatever neighborhood we take around C, on the left side of C we get greater values than f(C), and on the right side we get lower values than f(C). So it's not an extremum at all. I'm going to spend some time on this graph, because you need to see one key difference here. I'm not speaking about global extrema, the greatest and lowest values overall. The point is that "local maximum" and "local minimum" do not imply that the value at a point called a maximum is greater than the value at a point called a minimum.
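The neighborhood-based definition above can be sketched numerically. The cubic f(x) = x³ − 3x below is my own illustrative choice (not necessarily the one on the lecture's graph); it has a local maximum at x = −1, a local minimum at x = +1, and a point at x = 0 where, like point C, values on one side are greater and on the other side lower.

```python
# Hypothetical cubic chosen for illustration: f(x) = x**3 - 3*x.
# Local maximum at x = -1, local minimum at x = +1.
def f(x):
    return x**3 - 3*x

def is_local_max(g, a, radius=0.1, samples=1000):
    """Check numerically that g(a) is the greatest value
    on the neighborhood [a - radius, a + radius]."""
    xs = [a - radius + 2*radius*i/samples for i in range(samples + 1)]
    return all(g(a) >= g(x) for x in xs)

def is_local_min(g, a, radius=0.1, samples=1000):
    """Check numerically that g(a) is the lowest value
    on the neighborhood [a - radius, a + radius]."""
    xs = [a - radius + 2*radius*i/samples for i in range(samples + 1)]
    return all(g(a) <= g(x) for x in xs)

print(is_local_max(f, -1.0))  # True: one small neighborhood suffices
print(is_local_min(f, 1.0))   # True
# x = 0 behaves like point C: neither a maximum nor a minimum.
print(is_local_max(f, 0.0) or is_local_min(f, 0.0))  # False
```

Note the check only inspects one neighborhood, matching the "any neighborhood will suffice" part of the definition.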
Since all these points are local, the relation between them is simply not defined. Imagine a graph shaped like an extremely long staircase-like curve, and let us mark some maxima and minima on it. Let me take a red color: that's a maximum, that's a minimum, that's a maximum, that's a minimum. And here's the trouble: we have a maximum and a minimum, and the minimum is actually greater than the maximum. That's exactly the point. All these notions are local, so we cannot draw any conclusions just from the words "maximum" and "minimum" here. That's a bit sad, but it's just the case. So what we're going to do next is understand how extrema are connected with derivatives. For that, we recall the mean value theorem. As you remember, the mean value theorem tells us that the change of a function is proportional to the derivative and the change of the argument: f(b) − f(a) = f′(c)(b − a) for some point c between a and b. So if, for example, the derivative is positive on the whole segment, or negative on the whole segment, the function changes monotonically on that segment: if the derivative is positive, the function monotonically grows, and if the derivative is negative, the function monotonically falls. In other words, the sign of the derivative determines the monotonicity of the function, and this monotonicity is crucial for us.
Now assume, for example, that to the left of point A the function we are considering is growing, and to the right of A it is falling. Then the derivative at point A can be neither positive nor negative; it must necessarily equal zero. This is the necessary condition for extrema. The rule looks as follows: if a function has an extremum at a given point and is differentiable at that point, then the derivative at that point equals zero. Okay, that's understandable. But how is this connected with convexity and the second derivative? To understand this, we need to move further and see how the second derivative is connected with convexity.
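The necessary condition above can be sketched with a finite-difference derivative. Using the same illustrative cubic f(x) = x³ − 3x (an assumption of mine, not the lecture's exact graph), the derivative at both of its local extrema, x = −1 and x = +1, comes out numerically zero.

```python
# Necessary condition for an extremum, checked numerically:
# at a differentiable local extremum, the derivative is zero.
def f(x):
    return x**3 - 3*x

def numerical_derivative(g, x, h=1e-6):
    """Central difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2*h)

for extremum in (-1.0, 1.0):  # local max and local min of f
    print(abs(numerical_derivative(f, extremum)) < 1e-6)  # True
```

Note the condition is only necessary, not sufficient: a point like C on the lecture's graph can also have a zero derivative without being an extremum, which is why we'll need the second derivative and convexity next.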