"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Testing multiple optimization method\n",
"\n",
"We start by adding some libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# uncomment and run once to install\n",
" using Pkg\n",
"#Pkg.add(\"Plots\")\n",
"#Pkg.add(\"LinearAlgebra\")\n",
"#Pkg.add(\"Optim\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Pkg.add(\"PlotlyJS\")\n",
"Pkg.add(\"ORCA\")# if you want to try plotlyjs()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# run to make the library available\n",
"using Plots, LinearAlgebra, Optim \n",
"plotlyjs() #Plots has multiple plotting backend. plotlyjs() has more interactivity in notebook but sometimes buggy\n",
"#pyplot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Two dimensional non-convex problem\n",
"\n",
"We define the Rosenbrock (a.k.a banana) function $f(x,y) = (1-x)^2 + 5(y-x^2)^2$. \n",
"\n",
"It is a function well known to be difficult to optimize."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run to define the objective function, its gradient hessian, and plot its level set\n",
"# might take some time to run\n",
"f = x -> (1-x[1])^2 + 5*(x[2] - x[1]^2)^2\n",
"df = x -> [2*(10*x[1]^3-10*x[1]*x[2]+x[1]-1), 10*(x[2]-x[1]^2)]\n",
"hf = x -> [60x[1]^2-20x[2]+2 -20x[1]; -20x[1] 10]\n",
"\n",
"#\n",
"x0 = Float64[-0.5, 2] #Initial point\n",
"ε = 1e-6 #tolerance for the stopping test, here the gradient\n",
"IT_MAX = 5000 #max number of iterations, always a good idea to have this as fail-safe\n",
"\n",
"xdomain = -2:0.01:2\n",
"ydomain = -2:0.01:2\n",
"\n",
"lev_line = contour(xdomain,ydomain,(x,y)->f([x,y]), levels = 200);\n",
"lev_line = plot!(lev_line,[1],[1],seriestype = :scatter, color = :red,label=\"solution\");\n",
"lev_line"
]
},
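{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before using the oracles, it can be worth checking the hand-coded derivatives against finite differences. The cell below is a minimal sketch of such a check; the test point `xt` and the step `h` are arbitrary editorial choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Finite-difference sanity check of df and hf (sketch; xt and h are arbitrary)\n",
"xt = randn(2)\n",
"h = 1e-6\n",
"e = [[1.0, 0.0], [0.0, 1.0]] # canonical basis of R^2\n",
"fd_grad = [(f(xt + h*e[i]) - f(xt - h*e[i]))/(2h) for i in 1:2]\n",
"fd_hess = hcat([(df(xt + h*e[j]) - df(xt - h*e[j]))/(2h) for j in 1:2]...)\n",
"println(\"gradient error: \", norm(fd_grad - df(xt)))\n",
"println(\"Hessian error:  \", norm(fd_hess - hf(xt)))"
]
},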
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following function plot the trajectory of an algorithm on this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The last argument is used to plot on top of an already computed plot. \n",
"# By default it is the contour level set\n",
"function plot_trajectory(pts;color=\"blue\",label=`AUTO`,shape=:cross,plt=lev_line)\n",
" plt = deepcopy(plt)\n",
" plot!(plt,[p[1] for p in pts],[p[2] for p in pts],color=color,label=label,shape=shape)\n",
" return plt\n",
"end\n"
]
},
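{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, with a hand-made dummy trajectory (illustration only):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot a dummy three-point trajectory on top of the level sets\n",
"plot_trajectory([x0, [0.0, 1.0], [1.0, 1.0]], color=\"green\", label=\"dummy\")"
]
},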
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following function plots graph showing the convergence of the algorithm"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"function plot_convergence(vals,grads,steps)\n",
" p1 = plot(vals, title = \"Objective value\")\n",
" p2 = plot(log10.(vals .- vals[end]),title = \"log gap value\")\n",
" p3 = plot(log10.(norm.(grads)), title = \"log gradient norm\") \n",
" p4 = plot(steps, title = \"Step-size\")\n",
" plot(p1, p2, p3,p4, layout = (2, 2), legend = false)\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gradient algorithm\n",
"\n",
"We start with the most classical descent direction algorithm.\n",
"\n",
"#### Fixed step size gradient algorithm"
]
},
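{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of a fixed step-size gradient method. The signature and the returned quantities (`pts`, `vals`, `grads`, `steps`, meant to be fed to `plot_trajectory` and `plot_convergence`) are editorial assumptions, not a reference implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of gradient descent with fixed step-size t0 (interface is an assumption)\n",
"function gradient_fixed_step(x0, f, df; t0=0.01, ε=1e-6, IT_MAX=5000)\n",
"    x = copy(x0)\n",
"    pts, vals, grads, steps = [copy(x)], [f(x)], [df(x)], Float64[]\n",
"    for it in 1:IT_MAX\n",
"        g = grads[end]\n",
"        norm(g) <= ε && break # stopping test on the gradient norm\n",
"        x = x - t0*g          # fixed step along the negative gradient\n",
"        push!(pts, copy(x)); push!(vals, f(x)); push!(grads, df(x)); push!(steps, t0)\n",
"    end\n",
"    # one f call and one df call per point, counting the initial one\n",
"    println(\"f = $(vals[end]) after $(length(pts)-1) iterations ($(2*length(pts)) oracle calls)\")\n",
"    return pts, vals, grads, steps\n",
"end"
]
},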
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"
Question 1) \n",
" \n",
"Numerically check that the algorithm does not converge for large step-size. Plot the corresponding trajectory.\n",
"\n",
"Find the best fixed step-size. What do you observe ?\n",
"
Question 2) \n",
" \n",
"Complete the receeding_step code.\n",
"\n",
"Play with the parameters of the gradient with receeding step to get the smallest number of oracle calls.\n",
"
Question 3) \n",
"\n",
"Implement a conjugate gradient method with receeding line search. You can test Fletcher-Reeves and Polak-Ribière version.\n",
"What is the smallest oracle calls you can get ?\n",
"
Question 4) \n",
"\n",
"Implement the Newton method with reducing step. \n",
"Test-it and plot the trajectory. What do you observe ? \n",
"\n",
"Identify the problem and suggest an easy fix.\n",
"
\n",
"\n",
"\n",
"NB : I get 20 oracles calls.\n",
"\n",
"\n",
"NBB : Funnily enough, on this particular example, doing things poorly yield the solution quicker, in 7 calls. Do not generalize this result..."
]
},
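{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible completion of the receding-step idea, written as a backtracking (Armijo) line search: the step is shrunk by a factor $\\beta$ until a sufficient-decrease condition holds. The constants `α` and `β` and the interface are editorial assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of gradient descent with backtracking (\"receeding\") line search\n",
"# Armijo condition: f(x - t*g) <= f(x) - α*t*||g||^2, with t shrunk by β until it holds\n",
"function gradient_receeding_step(x0, f, df; t0=1.0, α=0.1, β=0.5, ε=1e-6, IT_MAX=5000)\n",
"    x = copy(x0)\n",
"    pts, vals, grads, steps = [copy(x)], [f(x)], [df(x)], Float64[]\n",
"    for it in 1:IT_MAX\n",
"        g = grads[end]\n",
"        norm(g) <= ε && break\n",
"        t = t0\n",
"        while f(x - t*g) > vals[end] - α*t*norm(g)^2 # backtrack until sufficient decrease\n",
"            t *= β\n",
"        end\n",
"        x = x - t*g\n",
"        push!(pts, copy(x)); push!(vals, f(x)); push!(grads, df(x)); push!(steps, t)\n",
"    end\n",
"    return pts, vals, grads, steps\n",
"end"
]
},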
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n = 500 # dimension\n",
"κ = 100 # condition number\n",
"\n",
"# generating a random symetric matrix of condition κ\n",
"X = rand(n,n) \n",
"Q = X'X\n",
"U,S,V = svd(Q) \n",
"γ = (κ-1)/(S[1]-S[end] ) # scaling factor\n",
"S = [(S[i]-S[end])*γ + 1 for i in 1:n]\n",
"Q = U*diagm(S)*V'\n",
"\n",
"# defining the oracle\n",
"f_Q(x) = 0.5*x'*Q*x\n",
"df_Q(x) = Q*x\n",
"hf_Q(x) = Q\n",
"\n",
"# Initial point\n",
"x0 = ones(n); # the optimal solution is obviously 0, so the initial point should be something else\n"
]
},
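{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (an editorial addition), the spectrum of `Q` should span exactly $[1, \\kappa]$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# smallest and largest eigenvalues of Q, expected to be (1.0, κ)\n",
"extrema(eigvals(Symmetric(Q)))"
]
},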
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t0 = 0.5\n",
"gradient_fixed_step(x0,f_Q,df_Q;t0=t0);\n",
"gradient_receeding_step(x0,f_Q,df_Q;t0=t0);\n",
"conjugate_gradient_receeding_step(x0,f_Q,df_Q;t0=t0);\n",
"Newton(x0,f_Q,df_Q,hf_Q);\n",
"BFGS(x0,f_Q,df_Q);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Question 7) [Optional] \n",
"In class we said that the conjugate gradient algorithm should converge in at most $n$ iterations. Do we observe this here ? Why ? \n",
"\n",
"Write a conjugate gradient method that would satisfy this convergence.\n",
"
Question 8) [Optional] \n",
"\n",
"Write a conjugate gradient method for solving a random linear system $Ax=b$. \n",
"\n",
"Test it with random matrices.\n",
"
"
]
},
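{
"cell_type": "markdown",
"metadata": {},
"source": [
"For Question 8, a textbook conjugate gradient solver for a symmetric positive definite system $Ax = b$ could look as follows. This is a sketch: the function name, the tolerance and the test at the end are editorial choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Conjugate gradient for Ax = b with A symmetric positive definite (textbook recursion)\n",
"function conjugate_gradient_linear(A, b; ε=1e-10, it_max=size(A,1))\n",
"    x = zeros(length(b))\n",
"    r = b - A*x # residual, i.e. minus the gradient of 0.5x'Ax - b'x\n",
"    p = copy(r) # first search direction\n",
"    for it in 1:it_max\n",
"        norm(r) <= ε && break\n",
"        Ap = A*p\n",
"        a = dot(r,r)/dot(p,Ap) # exact line search along p\n",
"        x += a*p\n",
"        rnew = r - a*Ap\n",
"        p = rnew + (dot(rnew,rnew)/dot(r,r))*p # next direction, conjugate to the previous ones\n",
"        r = rnew\n",
"    end\n",
"    return x\n",
"end\n",
"\n",
"# test on a random symmetric positive definite system\n",
"A = let X = rand(50,50); X'X + 50I end\n",
"b = rand(50)\n",
"norm(conjugate_gradient_linear(A, b) - A\\b)"
]
},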
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Non-convex function in higher dimension\n",
"\n",
"We will now test our algorithms on the Rosenbrock function in higher dimension.\n",
"\n",
"$$ f_\\kappa(x) = \\sum_{i=1}^{n-1} \\kappa (x_{i+1} - x_i^2)^2 + (x_i -1)^2 $$\n",
"\n",
"for $\\kappa \\geq 0$."
]
},
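{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible implementation of $f_\\kappa$ and its gradient is sketched below; the names `f_rosen` and `df_rosen` are editorial choices, and the cell can be used to cross-check a hand-derived gradient."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# n-dimensional Rosenbrock function and its gradient (sketch)\n",
"f_rosen(x; κ=100) = sum(κ*(x[i+1] - x[i]^2)^2 + (x[i] - 1)^2 for i in 1:length(x)-1)\n",
"\n",
"function df_rosen(x; κ=100)\n",
"    n = length(x)\n",
"    g = zeros(n)\n",
"    for i in 1:n-1\n",
"        g[i]   += -4κ*x[i]*(x[i+1] - x[i]^2) + 2*(x[i] - 1) # derivative of the i-th summand w.r.t. x_i\n",
"        g[i+1] += 2κ*(x[i+1] - x[i]^2)                      # derivative of the i-th summand w.r.t. x_{i+1}\n",
"    end\n",
"    return g\n",
"end\n",
"\n",
"# the gradient should vanish at the point (1,...,1), where f_κ is 0\n",
"df_rosen(ones(5))"
]
},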
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Question 9) [Optional] \n",
"\n",
"Find, for any $\\kappa \\geq 0$ the global minimum of $f_\\kappa$. Is it unique ?\n",
"\n",
"Compute the gradient and the hessian of $f_{\\kappa}$.\n",
"
Question 10) [Optional] \n",
"\n",
"Test the algorithms for various dimensions, starting points and $\\kappa$.\n",
"Comment your numerical observations.\n",
"\n",
"NB : The standard Rosenbrock function use $\\kappa = 100$ and is known to be difficult to solve. \n",
"