{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Testing multiple optimization method\n", "\n", "We start by adding some libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# uncomment and run once to install\n", " using Pkg\n", "#Pkg.add(\"Plots\")\n", "#Pkg.add(\"LinearAlgebra\")\n", "#Pkg.add(\"Optim\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Pkg.add(\"PlotlyJS\")# if you want to try plotlyjs() -- nicer but more unstable" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# run to make the library available\n", "using Plots, LinearAlgebra, Optim \n", "plotlyjs() #Plots has multiple plotting backend. plotlyjs() has more interactivity in notebook but sometimes buggy\n", "#pyplot() # In case of trouble with plotting, comment previous line and uncomment this one" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Two dimensional non-convex problem\n", "\n", "We define the Rosenbrock (a.k.a banana) function $f(x,y) = (1-x)^2 + 5(y-x^2)^2$. \n", "\n", "It is a function well known to be difficult to optimize." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run to define the objective function, its gradiant and hessian function\n", "# plot its level set -- might take some time to run\n", "f = x -> (1-x[1])^2 + 5*(x[2] - x[1]^2)^2\n", "df = x -> [2*(10*x[1]^3-10*x[1]*x[2]+x[1]-1), 10*(x[2]-x[1]^2)]\n", "hf = x -> [60x[1]^2-20x[2]+2 -20x[1]; -20x[1] 10]\n", "\n", "#\n", "x0 = Float64[-0.5, 2] #Initial point\n", "ε = 1e-6 #tolerance for the stopping test, here the gradient norm\n", "IT_MAX = 5000 #max number of iterations, always a good idea to have this as fail-safe\n", "\n", "xdomain = -2:0.01:2\n", "ydomain = -2:0.01:2\n", "\n", "lev_line = contour(xdomain,ydomain,(x,y)->f([x,y]), levels = 200);\n", "lev_line = plot!(lev_line,[1],[1],seriestype = :scatter, color = :red,label=\"solution\");\n", "lev_line" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function plot the trajectory of an algorithm on this example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The last argument is used to plot on top of an already computed plot. \n", "# By default it is the contour level set\n", "function plot_trajectory(pts;color=\"blue\",label=`AUTO`,shape=:cross,plt=lev_line)\n", " plt = deepcopy(plt)\n", " plot!(plt,[p[1] for p in pts],[p[2] for p in pts],color=color,label=label,shape=shape)\n", " return plt\n", "end\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function plots graph showing the convergence of the algorithm" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function plot_convergence(vals,grads,steps)\n", " p1 = plot(vals, title = \"Objective value\")\n", " p2 = plot(log10.(vals .- vals[end]),title = \"log gap value\")\n", " p3 = plot(log10.(norm.(grads)), title = \"log gradient norm\") \n", " p4 = plot(steps, title = \"Step-size\")\n", " plot(p1, p2, p3,p4, layout = (2, 2), legend = false)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gradient algorithm\n", "\n", "We start with the most classical descent direction algorithm.\n", "\n", "#### Fixed step size gradient algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
Question 1)\n", "\n", "Numerically check that the algorithm does not converge for a large step-size. Plot the corresponding trajectory.\n", "\n", "Find the best fixed step-size. What do you observe?\n", "
\n", "\n", "\n", "NB: I got 845 iterations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 1)\n", "Answer\n", "
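A minimal numerical check (a sketch, not the official solution; it assumes the `gradient_fixed_step` cell below has already been run, and the step values are my own guesses):\n", "\n", "```julia\n", "# A large fixed step makes the iterates oscillate or blow up: the loop only stops at IT_MAX.\n", "pts_big, vals_big, _, _ = gradient_fixed_step(x0, f, df, t0=0.25)\n", "plot_trajectory(pts_big, label=\"t0 = 0.25\")\n", "\n", "# Crude scan for the best fixed step (illustrative values only).\n", "for t in [0.005, 0.01, 0.02, 0.05, 0.1]\n", "    gradient_fixed_step(x0, f, df, t0=t)   # prints the iteration count for each t\n", "end\n", "```\n", "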
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t0 = 0.02 #default step size\n", "\n", "\n", "\n", "function gradient_fixed_step(x0,f,df;t0=t0)\n", " x = x0\n", " \n", " pts = [x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps = [t0]\n", "\n", " stop_test = false\n", " it = 0 #iteration number\n", "\n", " while !stop_test\n", " # update\n", " g = df(x)\n", " d = -g # descent direction\n", " t = t0 # step size\n", " x += t*d # update\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t0)\n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"gradient algorithm with fixed step t=\",t0, \" ended in \", it, \" iterations and \", it,\" oracle call\")\n", " return pts, vals, grads, steps\n", " \n", "end\n", "\n", "pts_gf, vals, grads, steps = gradient_fixed_step(x0,f,df,t0=0.02);\n", "plt = plot_trajectory(pts_gf,label=\"gradient\") # plotting the trajectory" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ " plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Receeding step size\n", "\n", "We are now going to use a receeding step size.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 2)\n", "\n", "Complete the receeding_step code.\n", "\n", "Play with the parameters of the gradient method with receding step size to get the smallest number of oracle calls.\n", "
\n", "NB: I got 267 calls." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 2)\n", "Answer\n", "
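A classical way to fill in the two `TO COMPLETE` lines of `receeding_step` is an Armijo (sufficient decrease) backtracking test; this is a sketch of one possible completion, not necessarily the intended one:\n", "\n", "```julia\n", "# Inside receeding_step: shrink t until the sufficient-decrease test holds.\n", "while (f(x + t*d) > f0 + slope_coefficient*t*dot(g, d)) && (it <= it_max)\n", "    t = reducing_factor*t          # geometric reduction of the step\n", "    push!(t_traj, t)\n", "    it += 1\n", "end\n", "```\n", "\n", "With this in place, `t0`, `reducing_factor` and `slope_coefficient` in `gradient_receeding_step` can then be tuned to reduce the total number of oracle calls.\n", "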
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function receeding_step(f,g,x,d;reducing_factor=.5,slope_coefficient=0.5,it_max = 10, verbose = true, t0=1)\n", " f0 = f(x)\n", " t = t0\n", " it = 1\n", " t_traj = [t]\n", " \n", " ### TO COMPLETE \n", " while ( ... ) && (it <= it_max)\n", " t = ### TO COMPLETE\n", " push!(t_traj,t)\n", " it += 1\n", " end\n", " \n", " if verbose\n", " f(x+t*d) >= f0 && println(\"non reducing step \", \"it = \", it)\n", " it==it_max + 1 && println(\"maximum number of receeding iterations attained\") \n", " end\n", " return t_traj\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function gradient_receeding_step(x0,f,df;\n", " t0=t0, reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " x = x0\n", " \n", " pts = [x]\n", " pts_full = [x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps = []\n", "\n", " stop_test = false\n", " it = 0 #iteration number\n", " calls = 0\n", "\n", " while !stop_test\n", " \n", " g = df(x)\n", " d = - g # descent direction\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=t0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " \n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"gradient algorithm with receeding step size ended in \", it, \" iterations and \", calls,\" oracle calls\")\n", "\n", " return pts, pts_full, vals, grads, steps\n", "\n", "end\n", "\n", "pts_gr, pts_gr_full, vals, grads, steps = gradient_receeding_step(x0,f,df;verbose = false);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt = plot_trajectory(pts_gr_full,label=\"gradient receeding\",shape = :diamond,color = :yellow)\n", "plt = plot_trajectory(pts_gr,label=\"gradient receeding\",plt=plt)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conjugate gradient\n", "\n", "We implement a conjugate gradient method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 3)\n", "\n", "Implement a conjugate gradient method with a receding line search. You can test the Fletcher-Reeves and Polak-Ribière versions.\n", "What is the smallest number of oracle calls you can get?\n", "
\n", "NB: I got 178 calls.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 3)\n", "Answer\n", "
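The missing block of the main loop can update the direction with a Fletcher-Reeves or Polak-Ribière coefficient (a sketch; it reuses the `g` and `d` kept from the previous iteration, as set up before the loop):\n", "\n", "```julia\n", "g_old = g                 # gradient of the previous iterate\n", "g = df(x)                 # new gradient\n", "β_FR = dot(g, g) / dot(g_old, g_old)            # Fletcher-Reeves\n", "β_PR = dot(g, g - g_old) / dot(g_old, g_old)    # Polak-Ribière\n", "β = max(β_PR, 0.0)        # PR+ safeguard (a common choice)\n", "d = -g + β*d              # conjugate descent direction\n", "```\n", "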
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function conjugate_gradient_receeding_step(x0,f,df;t0=0.02, reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " x = x0\n", " \n", " pts = [x]\n", " pts_full=[x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps = []\n", "\n", " stop_test = false\n", " it = 0 #iteration number\n", " calls = 0\n", " \n", " #first iteration\n", " g = df(x)\n", " d = - g # descent direction\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=t0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " \n", " while !stop_test\n", " ## \n", " ## TO COMPLETE\n", " ## d = # descent direction\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=t0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"conjugate gradient algorithm with receeding step size ended in \", it, \" iterations and \", calls,\" oracle calls\")\n", " return pts,pts_full, vals, grads, steps\n", " \n", "end\n", "\n", "pts_cg,pts_full_cg, vals, grads,steps = conjugate_gradient_receeding_step(x0,f,df;t0=0.02, reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false));\n", "plt = plot_trajectory(pts_full_cg,label=\"conjugate gradient\",shape = :diamond,color = :yellow)\n", "plt = plot_trajectory(pts_cg,label=\"conjugate gradient\",plt=plt)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Newton's method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 4)\n", "\n", "Implement Newton's method with a reducing step size.\n", "Test it and plot the trajectory. What do you observe?\n", "\n", "Identify the problem and suggest an easy fix.\n", "
\n", "\n", "\n", "NB: I get 20 oracle calls.\n", "\n", "\n", "NB2: Funnily enough, on this particular example, doing things poorly yields the solution quicker, in 7 calls. Do not generalize this result..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 4)\n", "Answer\n", "
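The Newton direction solves $H d = -g$. A sketch of the missing block, together with the usual observation: far from the solution the Hessian may not be positive definite, so the Newton direction can point uphill; the safeguard below mirrors the one used in the BFGS skeleton further down:\n", "\n", "```julia\n", "g = df(x)                 # gradient\n", "H = hf(x)                 # Hessian\n", "d = -(H \\ g)              # Newton direction\n", "\n", "# Easy fix when d is not a descent direction: fall back to the gradient step\n", "# (regularising H is another option).\n", "if dot(g, d) >= 0\n", "    d = -g\n", "end\n", "```\n", "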
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x0 = Float64[-0.5, 2]\n", "\n", "function Newton(x0,f,df, hf;\n", " reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " x = x0\n", " t0 = 1.0\n", " \n", " pts = [x]\n", " pts_full = [x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps=[]\n", " \n", " \n", " stop_test = false\n", " it = 0 #iteration number\n", " calls = 0\n", " \n", " while !stop_test\n", " ### TO COMPLETE\n", " g = df(x) # gradient\n", " H = hf(x) # hessian\n", " d = # descent direction \n", " \n", " \n", " ### END COMPLETION\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=t0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"Newton algorithm ended in \", it, \" iterations and \", calls, \" calls\")\n", " return pts, pts_full, vals, grads, steps\n", " \n", "end\n", "\n", "pts_n,pts_n_full, vals,grads,steps = Newton(x0,f,df, hf; reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false)\n", "\n", "\n", "plt = plot_trajectory(pts_n_full,label=\"Newton\",shape = :diamond,color = :yellow)\n", "plt = plot_trajectory(pts_n,label=\"Newton\",plt=plt)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quasi-Newton Method \n", "\n", "
Question 5)\n", "Now implement the BFGS method, and adjust its parameters to get the smallest number of oracle calls.\n", "
\n", "\n", "NB: I got 24 oracle calls." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 5)\n", "Answer\n", "
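The `## Computing W` block can apply the standard BFGS update of the inverse Hessian approximation (a sketch; it reuses the `x_old`, `g_old` and `W` kept by the skeleton, and skips the update when the curvature condition fails, e.g. at the very first iteration):\n", "\n", "```julia\n", "s = x - x_old             # displacement\n", "y = g - g_old             # change of gradient\n", "if dot(y, s) > 1e-12      # curvature condition\n", "    ρ = 1.0 / dot(y, s)\n", "    W = (I - ρ*s*y') * W * (I - ρ*y*s') + ρ*s*s'\n", "end\n", "x_old = x\n", "g_old = g\n", "```\n", "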
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function BFGS(x0,f,df;\n", " reducing_factor=.7,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " x = x0\n", " \n", " pts = [x]\n", " pts_full = [x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps = []\n", " \n", " stop_test = false\n", " it = 0 #iteration number\n", " calls = 0\n", " \n", " g_old = df(x)\n", " x_old = x0\n", " W = I\n", " \n", " while !stop_test\n", " \n", " g = df(x) # gradient\n", " \n", " ## Computing W\n", " \n", " \n", " d = - W*g # descent direction \n", " \n", " \n", " if g'*d >= 0 \n", " println(\"non-decreasing direction at it \", it,\", using gradient step instead\")\n", " d = - g\n", " end\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=1.0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"BFGS algorithm ended in \", it, \" iterations and \", calls, \" calls\")\n", " return pts,pts_full, vals, grads, steps\n", " \n", "end\n", "\n", "pts_bfgs,pts_bfgs_full, vals,grads, steps = BFGS(x0,f,df; \n", " reducing_factor=.7,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " \n", "plt = plot_trajectory(pts_bfgs_full,label=\"BFGS\",shape = :diamond,color = :yellow)\n", "plt = plot_trajectory(pts_bfgs,label=\"BFGS\",plt=plt)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting multiple trajectories\n", "\n", "To compare the algorithm we can plot multiple trajectories on the same graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt=plot_trajectory(pts_gf,label=\"gradient fixed\",color=\"yellow\")\n", "plt=plot_trajectory(pts_gr,label=\"gradient adaptative\",color=\"orange\",plt=plt)\n", "plt=plot_trajectory(pts_cg,label=\"conjugate gradient\",color=\"blue\",plt=plt)\n", "plt=plot_trajectory(pts_n,label=\"Newton\",color=\"red\",plt=plt)\n", "plt=plot_trajectory(pts_bfgs,label=\"BFGS\",color=\"green\",plt=plt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quadratic function and conditionning\n", "\n", "We are now going to test the effect of the condition number on the algorithm\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 6)\n", "\n", "Test the algorithms on a quadratic function.\n", "Comment on the dependence on the dimension and the conditioning.\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 6)\n", "Answer\n", "
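A possible experiment (a sketch: the helper `random_quadratic` below is hypothetical and simply repackages the matrix construction of the next cell so that several dimensions and condition numbers can be tried; the grid of values is my own guess, and it assumes the methods above have been completed):\n", "\n", "```julia\n", "# Same recipe as the setup cell below, wrapped in a function.\n", "function random_quadratic(n, κ)\n", "    X = rand(n, n)\n", "    U, S, V = svd(X'X)\n", "    S = [(S[i] - S[end])*(κ - 1)/(S[1] - S[end]) + 1 for i in 1:n]\n", "    Q = U*diagm(S)*V'\n", "    return x -> 0.5*x'*Q*x, x -> Q*x, x -> Q\n", "end\n", "\n", "for (n_test, κ_test) in [(50, 10), (50, 1000), (500, 100)]\n", "    fq, dfq, hfq = random_quadratic(n_test, κ_test)\n", "    println(\"n = \", n_test, \", κ = \", κ_test)\n", "    gradient_receeding_step(ones(n_test), fq, dfq; t0=1.0)\n", "    conjugate_gradient_receeding_step(ones(n_test), fq, dfq; t0=1.0)\n", "    BFGS(ones(n_test), fq, dfq)\n", "end\n", "```\n", "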
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = 500 # dimension\n", "κ = 100 # condition number\n", "\n", "# generating a random symetric matrix of condition κ\n", "X = rand(n,n) \n", "Q = X'X\n", "U,S,V = svd(Q) \n", "γ = (κ-1)/(S[1]-S[end] ) # scaling factor\n", "S = [(S[i]-S[end])*γ + 1 for i in 1:n]\n", "Q = U*diagm(S)*V'\n", "\n", "# defining the oracle\n", "f_Q(x) = 0.5*x'*Q*x\n", "df_Q(x) = Q*x\n", "hf_Q(x) = Q\n", "\n", "# Initial point\n", "x0 = ones(n); # the optimal solution is obviously 0, so the initial point should be something else\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t0 = 0.5\n", "gradient_fixed_step(x0,f_Q,df_Q;t0=t0);\n", "gradient_receeding_step(x0,f_Q,df_Q;t0=t0);\n", "conjugate_gradient_receeding_step(x0,f_Q,df_Q;t0=t0);\n", "Newton(x0,f_Q,df_Q,hf_Q);\n", "BFGS(x0,f_Q,df_Q);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 7) [Optional]\n", "In class we said that the conjugate gradient algorithm should converge in at most $n$ iterations. Do we observe this here? Why?\n", "\n", "Write a conjugate gradient method that satisfies this convergence guarantee.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 7)\n", "Answer\n", "
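Not necessarily: finite termination in at most $n$ iterations relies on exact line searches (and exact arithmetic), whereas the backtracking search only returns an approximate step. On the quadratic $\\tfrac{1}{2} x^T Q x$ the exact step along $d$ is available in closed form, $t = g^T g / (d^T Q d)$. A sketch of a linear conjugate gradient using it (the helper name is my own; it assumes `Q` and `n` from the setup cell above):\n", "\n", "```julia\n", "# Linear CG with exact step size on f_Q(x) = 0.5*x'*Q*x: terminates in at most\n", "# n iterations (up to rounding errors), unlike the backtracking version.\n", "function cg_quadratic(Q, x0; tol=1e-6)\n", "    x = copy(x0)\n", "    g = Q*x                        # gradient of 0.5*x'*Q*x\n", "    d = -g\n", "    for it in 1:length(x0)\n", "        norm(g) < tol && return x, it - 1\n", "        Qd = Q*d\n", "        t = dot(g, g) / dot(d, Qd) # exact minimiser along d\n", "        x += t*d\n", "        g_new = g + t*Qd\n", "        β = dot(g_new, g_new) / dot(g, g)\n", "        d = -g_new + β*d\n", "        g = g_new\n", "    end\n", "    return x, length(x0)\n", "end\n", "\n", "x_cg, n_it = cg_quadratic(Q, ones(n))\n", "println(\"CG with exact line search: \", n_it, \" iterations\")\n", "```\n", "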
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 8) [Optional]\n", "\n", "Write a conjugate gradient method for solving a linear system $Ax=b$.\n", "\n", "Test it with random matrices.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 8)\n", "Answer\n", "
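A sketch for a symmetric positive definite system (the residual $r = b - Ax$ plays the role of minus the gradient of $\\tfrac{1}{2} x^T A x - b^T x$; the function and variable names below are my own):\n", "\n", "```julia\n", "using LinearAlgebra\n", "\n", "# Conjugate gradient for Ax = b, with A symmetric positive definite.\n", "function cg_solve(A, b; tol=1e-8, it_max=length(b))\n", "    x = zeros(length(b))\n", "    r = b - A*x               # residual\n", "    d = copy(r)\n", "    for it in 1:it_max\n", "        norm(r) < tol && return x, it - 1\n", "        Ad = A*d\n", "        α = dot(r, r) / dot(d, Ad)\n", "        x += α*d\n", "        r_new = r - α*Ad\n", "        β = dot(r_new, r_new) / dot(r, r)\n", "        d = r_new + β*d\n", "        r = r_new\n", "    end\n", "    return x, it_max\n", "end\n", "\n", "# Test on a random SPD system.\n", "m = 200\n", "M = rand(m, m); A = M'M + I   # random symmetric positive definite matrix\n", "b = rand(m)\n", "x_sol, n_it = cg_solve(A, b)\n", "println(\"iterations = \", n_it, \", residual norm = \", norm(A*x_sol - b))\n", "```\n", "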
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-convex function in higher dimension\n", "\n", "We will now test our algorithms on the Rosenbrock function in higher dimension.\n", "\n", "$$ f_\\kappa(x) = \\sum_{i=1}^{n-1} \\kappa (x_{i+1} - x_i^2)^2 + (x_i -1)^2 $$\n", "\n", "for $\\kappa \\geq 0$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 9) [Optional]\n", "\n", "Find, for any $\\kappa \\geq 0$, the global minimum of $f_\\kappa$. Is it unique?\n", "\n", "Compute the gradient and the Hessian of $f_{\\kappa}$.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 9)\n", "Answer\n", "
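For $\\kappa > 0$ the minimum value is $0$, attained only at $x^* = (1,\\dots,1)$: every square must vanish, so $x_i = 1$ for $i \\leq n-1$, and the chain $x_{i+1} = x_i^2$ then forces $x_n = 1$. For $\\kappa = 0$ the last coordinate is free, so the minimiser is no longer unique. A sketch of the oracle in Julia (the Hessian is omitted; it can be derived the same way or obtained by automatic differentiation):\n", "\n", "```julia\n", "# n-dimensional Rosenbrock f_κ and its gradient (one possible implementation).\n", "function make_rosenbrock(κ)\n", "    f = x -> sum(κ*(x[i+1] - x[i]^2)^2 + (x[i] - 1)^2 for i in 1:length(x)-1)\n", "    function df(x)\n", "        n = length(x)\n", "        g = zeros(n)\n", "        for i in 1:n-1\n", "            g[i]   += -4κ*x[i]*(x[i+1] - x[i]^2) + 2*(x[i] - 1)\n", "            g[i+1] += 2κ*(x[i+1] - x[i]^2)\n", "        end\n", "        return g\n", "    end\n", "    return f, df\n", "end\n", "```\n", "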
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 10) [Optional]\n", "\n", "Test the algorithms for various dimensions, starting points and $\\kappa$.\n", "Comment on your numerical observations.\n", "\n", "NB: The standard Rosenbrock function uses $\\kappa = 100$ and is known to be difficult to solve.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 10)\n", "Answer\n", "
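A possible experiment, reusing the `make_rosenbrock` sketch from Question 9 (the dimensions, values of $\\kappa$ and starting point are illustrative guesses):\n", "\n", "```julia\n", "for (n_test, κ_test) in [(10, 1), (10, 100), (100, 100)]\n", "    fr, dfr = make_rosenbrock(κ_test)\n", "    x_init = zeros(n_test)         # any point away from (1,…,1)\n", "    println(\"n = \", n_test, \", κ = \", κ_test)\n", "    gradient_receeding_step(x_init, fr, dfr; t0=1.0)\n", "    BFGS(x_init, fr, dfr)\n", "end\n", "```\n", "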
" ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.10.3", "language": "julia", "name": "julia-1.10" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.10.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }