{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Testing multiple optimization method\n", "\n", "We start by adding some libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# uncomment and run once to install\n", " using Pkg\n", "#Pkg.add(\"Plots\")\n", "#Pkg.add(\"LinearAlgebra\")\n", "#Pkg.add(\"Optim\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Pkg.add(\"PlotlyJS\")# if you want to try plotlyjs() -- nicer but more unstable" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# run to make the library available\n", "using Plots, LinearAlgebra, Optim \n", "plotlyjs() #Plots has multiple plotting backend. plotlyjs() has more interactivity in notebook but sometimes buggy\n", "#pyplot() # In case of trouble with plotting, comment previous line and uncomment this one" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Two dimensional non-convex problem\n", "\n", "We define the Rosenbrock (a.k.a banana) function $f(x,y) = (1-x)^2 + 5(y-x^2)^2$. \n", "\n", "It is a function well known to be difficult to optimize." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run to define the objective function, its gradiant and hessian function\n", "# plot its level set -- might take some time to run\n", "f = x -> (1-x[1])^2 + 5*(x[2] - x[1]^2)^2\n", "df = x -> [2*(10*x[1]^3-10*x[1]*x[2]+x[1]-1), 10*(x[2]-x[1]^2)]\n", "hf = x -> [60x[1]^2-20x[2]+2 -20x[1]; -20x[1] 10]\n", "\n", "#\n", "x0 = Float64[-0.5, 2] #Initial point\n", "ε = 1e-6 #tolerance for the stopping test, here the gradient norm\n", "IT_MAX = 5000 #max number of iterations, always a good idea to have this as fail-safe\n", "\n", "xdomain = -2:0.01:2\n", "ydomain = -2:0.01:2\n", "\n", "lev_line = contour(xdomain,ydomain,(x,y)->f([x,y]), levels = 200);\n", "lev_line = plot!(lev_line,[1],[1],seriestype = :scatter, color = :red,label=\"solution\");\n", "lev_line" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function plot the trajectory of an algorithm on this example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The last argument is used to plot on top of an already computed plot. \n", "# By default it is the contour level set\n", "function plot_trajectory(pts;color=\"blue\",label=`AUTO`,shape=:cross,plt=lev_line)\n", " plt = deepcopy(plt)\n", " plot!(plt,[p[1] for p in pts],[p[2] for p in pts],color=color,label=label,shape=shape)\n", " return plt\n", "end\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function plots graph showing the convergence of the algorithm" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function plot_convergence(vals,grads,steps)\n", " p1 = plot(vals, title = \"Objective value\")\n", " p2 = plot(log10.(vals .- vals[end]),title = \"log gap value\")\n", " p3 = plot(log10.(norm.(grads)), title = \"log gradient norm\") \n", " p4 = plot(steps, title = \"Step-size\")\n", " plot(p1, p2, p3,p4, layout = (2, 2), legend = false)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gradient algorithm\n", "\n", "We start with the most classical descent direction algorithm.\n", "\n", "#### Fixed step size gradient algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
Question 1)\n", "\n", "Numerically check that the algorithm does not converge for a large step-size. Plot the corresponding trajectory.\n", "\n", "Find the best fixed step-size. What do you observe?\n", "
\n", "\n", "\n", "NB: I got 845 iterations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 1)\n", "Answer\n", "
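A minimal numerical check (a sketch, not the official solution; it assumes the `gradient_fixed_step` cell below has already been run, and the step values are my own guesses):\n", "\n", "```julia\n", "# A large fixed step makes the iterates oscillate or blow up: the loop only stops at IT_MAX.\n", "pts_big, vals_big, _, _ = gradient_fixed_step(x0, f, df, t0=0.25)\n", "plot_trajectory(pts_big, label=\"t0 = 0.25\")\n", "\n", "# Crude scan for the best fixed step (illustrative values only).\n", "for t in [0.005, 0.01, 0.02, 0.05, 0.1]\n", "    gradient_fixed_step(x0, f, df, t0=t)   # prints the iteration count for each t\n", "end\n", "```\n", "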
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t0 = 0.02 #default step size\n", "\n", "\n", "\n", "function gradient_fixed_step(x0,f,df;t0=t0)\n", " x = x0\n", " \n", " pts = [x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps = [t0]\n", "\n", " stop_test = false\n", " it = 0 #iteration number\n", "\n", " while !stop_test\n", " # update\n", " g = df(x)\n", " d = -g # descent direction\n", " t = t0 # step size\n", " x += t*d # update\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t0)\n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"gradient algorithm with fixed step t=\",t0, \" ended in \", it, \" iterations and \", it,\" oracle call\")\n", " return pts, vals, grads, steps\n", " \n", "end\n", "\n", "pts_gf, vals, grads, steps = gradient_fixed_step(x0,f,df,t0=0.02);\n", "plt = plot_trajectory(pts_gf,label=\"gradient\") # plotting the trajectory" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ " plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Receeding step size\n", "\n", "We are now going to use a receeding step size.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 2)\n", "\n", "Complete the receeding_step code.\n", "\n", "Play with the parameters of the gradient method with receding step size to get the smallest number of oracle calls.\n", "
\n", "NB: I got 267 calls." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 2)\n", "Answer\n", "
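A classical way to fill in the two `TO COMPLETE` lines of `receeding_step` is an Armijo (sufficient decrease) backtracking test; this is a sketch of one possible completion, not necessarily the intended one:\n", "\n", "```julia\n", "# Inside receeding_step: shrink t until the sufficient-decrease test holds.\n", "while (f(x + t*d) > f0 + slope_coefficient*t*dot(g, d)) && (it <= it_max)\n", "    t = reducing_factor*t          # geometric reduction of the step\n", "    push!(t_traj, t)\n", "    it += 1\n", "end\n", "```\n", "\n", "With this in place, `t0`, `reducing_factor` and `slope_coefficient` in `gradient_receeding_step` can then be tuned to reduce the total number of oracle calls.\n", "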
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function receeding_step(f,g,x,d;reducing_factor=.5,slope_coefficient=0.5,it_max = 10, verbose = true, t0=1)\n", " f0 = f(x)\n", " t = t0\n", " it = 1\n", " t_traj = [t]\n", " \n", " ### TO COMPLETE \n", " while ( ... ) && (it <= it_max)\n", " t = ### TO COMPLETE\n", " push!(t_traj,t)\n", " it += 1\n", " end\n", " \n", " if verbose\n", " f(x+t*d) >= f0 && println(\"non reducing step \", \"it = \", it)\n", " it==it_max + 1 && println(\"maximum number of receeding iterations attained\") \n", " end\n", " return t_traj\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function gradient_receeding_step(x0,f,df;\n", " t0=t0, reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " x = x0\n", " \n", " pts = [x]\n", " pts_full = [x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps = []\n", "\n", " stop_test = false\n", " it = 0 #iteration number\n", " calls = 0\n", "\n", " while !stop_test\n", " \n", " g = df(x)\n", " d = - g # descent direction\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=t0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " \n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"gradient algorithm with receeding step size ended in \", it, \" iterations and \", calls,\" oracle calls\")\n", "\n", " return pts, pts_full, vals, grads, steps\n", "\n", "end\n", "\n", "pts_gr, pts_gr_full, vals, grads, steps = gradient_receeding_step(x0,f,df;verbose = false);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt = plot_trajectory(pts_gr_full,label=\"gradient receeding\",shape = :diamond,color = :yellow)\n", "plt = plot_trajectory(pts_gr,label=\"gradient receeding\",plt=plt)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conjugate gradient\n", "\n", "We implement a conjugate gradient method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 3)\n", "\n", "Implement a conjugate gradient method with a receding line search. You can test the Fletcher-Reeves and Polak-Ribière versions.\n", "What is the smallest number of oracle calls you can get?\n", "
\n", "NB: I got 178 calls.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 3)\n", "Answer\n", "
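The missing block of the main loop can update the direction with a Fletcher-Reeves or Polak-Ribière coefficient (a sketch; it reuses the `g` and `d` kept from the previous iteration, as set up before the loop):\n", "\n", "```julia\n", "g_old = g                 # gradient of the previous iterate\n", "g = df(x)                 # new gradient\n", "β_FR = dot(g, g) / dot(g_old, g_old)            # Fletcher-Reeves\n", "β_PR = dot(g, g - g_old) / dot(g_old, g_old)    # Polak-Ribière\n", "β = max(β_PR, 0.0)        # PR+ safeguard (a common choice)\n", "d = -g + β*d              # conjugate descent direction\n", "```\n", "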
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function conjugate_gradient_receeding_step(x0,f,df;t0=0.02, reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " x = x0\n", " \n", " pts = [x]\n", " pts_full=[x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps = []\n", "\n", " stop_test = false\n", " it = 0 #iteration number\n", " calls = 0\n", " \n", " #first iteration\n", " g = df(x)\n", " d = - g # descent direction\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=t0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " \n", " while !stop_test\n", " ## \n", " ## TO COMPLETE\n", " ## d = # descent direction\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=t0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"conjugate gradient algorithm with receeding step size ended in \", it, \" iterations and \", calls,\" oracle calls\")\n", " return pts,pts_full, vals, grads, steps\n", " \n", "end\n", "\n", "pts_cg,pts_full_cg, vals, grads,steps = conjugate_gradient_receeding_step(x0,f,df;t0=0.02, reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false));\n", "plt = plot_trajectory(pts_full_cg,label=\"conjugate gradient\",shape = :diamond,color = :yellow)\n", "plt = plot_trajectory(pts_cg,label=\"conjugate gradient\",plt=plt)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Newton's method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 4)\n", "\n", "Implement Newton's method with a reducing step size.\n", "Test it and plot the trajectory. What do you observe?\n", "\n", "Identify the problem and suggest an easy fix.\n", "
\n", "\n", "\n", "NB: I get 20 oracle calls.\n", "\n", "\n", "NB2: Funnily enough, on this particular example, doing things poorly yields the solution quicker, in 7 calls. Do not generalize this result..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 4)\n", "Answer\n", "
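The Newton direction solves $H d = -g$. A sketch of the missing block, together with the usual observation: far from the solution the Hessian may not be positive definite, so the Newton direction can point uphill; the safeguard below mirrors the one used in the BFGS skeleton further down:\n", "\n", "```julia\n", "g = df(x)                 # gradient\n", "H = hf(x)                 # Hessian\n", "d = -(H \\ g)              # Newton direction\n", "\n", "# Easy fix when d is not a descent direction: fall back to the gradient step\n", "# (regularising H is another option).\n", "if dot(g, d) >= 0\n", "    d = -g\n", "end\n", "```\n", "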
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x0 = Float64[-0.5, 2]\n", "\n", "function Newton(x0,f,df, hf;\n", " reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " x = x0\n", " t0 = 1.0\n", " \n", " pts = [x]\n", " pts_full = [x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps=[]\n", " \n", " \n", " stop_test = false\n", " it = 0 #iteration number\n", " calls = 0\n", " \n", " while !stop_test\n", " ### TO COMPLETE\n", " g = df(x) # gradient\n", " H = hf(x) # hessian\n", " d = # descent direction \n", " \n", " \n", " ### END COMPLETION\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=t0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"Newton algorithm ended in \", it, \" iterations and \", calls, \" calls\")\n", " return pts, pts_full, vals, grads, steps\n", " \n", "end\n", "\n", "pts_n,pts_n_full, vals,grads,steps = Newton(x0,f,df, hf; reducing_factor=.9,slope_coefficient=0.5,it_max = 10, verbose = false)\n", "\n", "\n", "plt = plot_trajectory(pts_n_full,label=\"Newton\",shape = :diamond,color = :yellow)\n", "plt = plot_trajectory(pts_n,label=\"Newton\",plt=plt)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quasi-Newton Method \n", "\n", "
Question 5)\n", "Now implement the BFGS method, and adjust its parameters to get the smallest number of oracle calls.\n", "
\n", "\n", "NB: I got 24 oracle calls." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 5)\n", "Answer\n", "
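The `## Computing W` block can apply the standard BFGS update of the inverse Hessian approximation (a sketch; it reuses the `x_old`, `g_old` and `W` kept by the skeleton, and skips the update when the curvature condition fails, e.g. at the very first iteration):\n", "\n", "```julia\n", "s = x - x_old             # displacement\n", "y = g - g_old             # change of gradient\n", "if dot(y, s) > 1e-12      # curvature condition\n", "    ρ = 1.0 / dot(y, s)\n", "    W = (I - ρ*s*y') * W * (I - ρ*y*s') + ρ*s*s'\n", "end\n", "x_old = x\n", "g_old = g\n", "```\n", "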
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function BFGS(x0,f,df;\n", " reducing_factor=.7,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " x = x0\n", " \n", " pts = [x]\n", " pts_full = [x]\n", " vals = [f(x)]\n", " grads = [df(x)]\n", " steps = []\n", " \n", " stop_test = false\n", " it = 0 #iteration number\n", " calls = 0\n", " \n", " g_old = df(x)\n", " x_old = x0\n", " W = I\n", " \n", " while !stop_test\n", " \n", " g = df(x) # gradient\n", " \n", " ## Computing W\n", " \n", " \n", " d = - W*g # descent direction \n", " \n", " \n", " if g'*d >= 0 \n", " println(\"non-decreasing direction at it \", it,\", using gradient step instead\")\n", " d = - g\n", " end\n", " t_traj = receeding_step(f,g,x,d;\n", " reducing_factor=reducing_factor,slope_coefficient=slope_coefficient,\n", " it_max = it_max, verbose = verbose, t0=1.0)# step size\n", " calls += length(t_traj)\n", " t = t_traj[end]\n", " for t in t_traj\n", " push!(pts_full, x+t*d)\n", " end\n", " x += t*d # step\n", "\n", " # keeping track of results\n", " push!(pts,x)\n", " push!(vals,f(x))\n", " push!(grads,g)\n", " push!(steps,t)\n", " \n", " # stopping test\n", " it += 1\n", " stop_test = (it > IT_MAX - 1) || (norm(g)<ε)\n", " end\n", " \n", " println(\"BFGS algorithm ended in \", it, \" iterations and \", calls, \" calls\")\n", " return pts,pts_full, vals, grads, steps\n", " \n", "end\n", "\n", "pts_bfgs,pts_bfgs_full, vals,grads, steps = BFGS(x0,f,df; \n", " reducing_factor=.7,slope_coefficient=0.5,it_max = 10, verbose = false)\n", " \n", "plt = plot_trajectory(pts_bfgs_full,label=\"BFGS\",shape = :diamond,color = :yellow)\n", "plt = plot_trajectory(pts_bfgs,label=\"BFGS\",plt=plt)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_convergence(vals,grads,steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting multiple trajectories\n", "\n", "To compare the algorithm we can plot multiple trajectories on the same graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt=plot_trajectory(pts_gf,label=\"gradient fixed\",color=\"yellow\")\n", "plt=plot_trajectory(pts_gr,label=\"gradient adaptative\",color=\"orange\",plt=plt)\n", "plt=plot_trajectory(pts_cg,label=\"conjugate gradient\",color=\"blue\",plt=plt)\n", "plt=plot_trajectory(pts_n,label=\"Newton\",color=\"red\",plt=plt)\n", "plt=plot_trajectory(pts_bfgs,label=\"BFGS\",color=\"green\",plt=plt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quadratic function and conditionning\n", "\n", "We are now going to test the effect of the condition number on the algorithm\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 6)\n", "\n", "Test the algorithms on a quadratic function.\n", "Comment on the dependence on the dimension and the conditioning.\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 6)\n", "Answer\n", "
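A possible experiment (a sketch: the helper `random_quadratic` below is hypothetical and simply repackages the matrix construction of the next cell so that several dimensions and condition numbers can be tried; the grid of values is my own guess, and it assumes the methods above have been completed):\n", "\n", "```julia\n", "# Same recipe as the setup cell below, wrapped in a function.\n", "function random_quadratic(n, κ)\n", "    X = rand(n, n)\n", "    U, S, V = svd(X'X)\n", "    S = [(S[i] - S[end])*(κ - 1)/(S[1] - S[end]) + 1 for i in 1:n]\n", "    Q = U*diagm(S)*V'\n", "    return x -> 0.5*x'*Q*x, x -> Q*x, x -> Q\n", "end\n", "\n", "for (n_test, κ_test) in [(50, 10), (50, 1000), (500, 100)]\n", "    fq, dfq, hfq = random_quadratic(n_test, κ_test)\n", "    println(\"n = \", n_test, \", κ = \", κ_test)\n", "    gradient_receeding_step(ones(n_test), fq, dfq; t0=1.0)\n", "    conjugate_gradient_receeding_step(ones(n_test), fq, dfq; t0=1.0)\n", "    BFGS(ones(n_test), fq, dfq)\n", "end\n", "```\n", "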
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = 500 # dimension\n", "κ = 100 # condition number\n", "\n", "# generating a random symetric matrix of condition κ\n", "X = rand(n,n) \n", "Q = X'X\n", "U,S,V = svd(Q) \n", "γ = (κ-1)/(S[1]-S[end] ) # scaling factor\n", "S = [(S[i]-S[end])*γ + 1 for i in 1:n]\n", "Q = U*diagm(S)*V'\n", "\n", "# defining the oracle\n", "f_Q(x) = 0.5*x'*Q*x\n", "df_Q(x) = Q*x\n", "hf_Q(x) = Q\n", "\n", "# Initial point\n", "x0 = ones(n); # the optimal solution is obviously 0, so the initial point should be something else\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t0 = 0.5\n", "gradient_fixed_step(x0,f_Q,df_Q;t0=t0);\n", "gradient_receeding_step(x0,f_Q,df_Q;t0=t0);\n", "conjugate_gradient_receeding_step(x0,f_Q,df_Q;t0=t0);\n", "Newton(x0,f_Q,df_Q,hf_Q);\n", "BFGS(x0,f_Q,df_Q);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 7) [Optional]\n", "In class we said that the conjugate gradient algorithm should converge in at most $n$ iterations. Do we observe this here? Why?\n", "\n", "Write a conjugate gradient method that satisfies this convergence guarantee.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 7)\n", "Answer\n", "
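Not necessarily: finite termination in at most $n$ iterations relies on exact line searches (and exact arithmetic), whereas the backtracking search only returns an approximate step. On the quadratic $\\tfrac{1}{2} x^T Q x$ the exact step along $d$ is available in closed form, $t = g^T g / (d^T Q d)$. A sketch of a linear conjugate gradient using it (the helper name is my own; it assumes `Q` and `n` from the setup cell above):\n", "\n", "```julia\n", "# Linear CG with exact step size on f_Q(x) = 0.5*x'*Q*x: terminates in at most\n", "# n iterations (up to rounding errors), unlike the backtracking version.\n", "function cg_quadratic(Q, x0; tol=1e-6)\n", "    x = copy(x0)\n", "    g = Q*x                        # gradient of 0.5*x'*Q*x\n", "    d = -g\n", "    for it in 1:length(x0)\n", "        norm(g) < tol && return x, it - 1\n", "        Qd = Q*d\n", "        t = dot(g, g) / dot(d, Qd) # exact minimiser along d\n", "        x += t*d\n", "        g_new = g + t*Qd\n", "        β = dot(g_new, g_new) / dot(g, g)\n", "        d = -g_new + β*d\n", "        g = g_new\n", "    end\n", "    return x, length(x0)\n", "end\n", "\n", "x_cg, n_it = cg_quadratic(Q, ones(n))\n", "println(\"CG with exact line search: \", n_it, \" iterations\")\n", "```\n", "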
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 8) [Optional]\n", "\n", "Write a conjugate gradient method for solving a linear system $Ax=b$.\n", "\n", "Test it with random matrices.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 8)\n", "Answer\n", "
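A sketch for a symmetric positive definite system (the residual $r = b - Ax$ plays the role of minus the gradient of $\\tfrac{1}{2} x^T A x - b^T x$; the function and variable names below are my own):\n", "\n", "```julia\n", "using LinearAlgebra\n", "\n", "# Conjugate gradient for Ax = b, with A symmetric positive definite.\n", "function cg_solve(A, b; tol=1e-8, it_max=length(b))\n", "    x = zeros(length(b))\n", "    r = b - A*x               # residual\n", "    d = copy(r)\n", "    for it in 1:it_max\n", "        norm(r) < tol && return x, it - 1\n", "        Ad = A*d\n", "        α = dot(r, r) / dot(d, Ad)\n", "        x += α*d\n", "        r_new = r - α*Ad\n", "        β = dot(r_new, r_new) / dot(r, r)\n", "        d = r_new + β*d\n", "        r = r_new\n", "    end\n", "    return x, it_max\n", "end\n", "\n", "# Test on a random SPD system.\n", "m = 200\n", "M = rand(m, m); A = M'M + I   # random symmetric positive definite matrix\n", "b = rand(m)\n", "x_sol, n_it = cg_solve(A, b)\n", "println(\"iterations = \", n_it, \", residual norm = \", norm(A*x_sol - b))\n", "```\n", "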
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-convex function in higher dimension\n", "\n", "We will now test our algorithms on the Rosenbrock function in higher dimension.\n", "\n", "$$ f_\\kappa(x) = \\sum_{i=1}^{n-1} \\kappa (x_{i+1} - x_i^2)^2 + (x_i -1)^2 $$\n", "\n", "for $\\kappa \\geq 0$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 9) [Optional]\n", "\n", "Find, for any $\\kappa \\geq 0$, the global minimum of $f_\\kappa$. Is it unique?\n", "\n", "Compute the gradient and the Hessian of $f_{\\kappa}$.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 9)\n", "Answer\n", "
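For $\\kappa > 0$ the minimum value is $0$, attained only at $x^* = (1,\\dots,1)$: every square must vanish, so $x_i = 1$ for $i \\leq n-1$, and the chain $x_{i+1} = x_i^2$ then forces $x_n = 1$. For $\\kappa = 0$ the last coordinate is free, so the minimiser is no longer unique. A sketch of the oracle in Julia (the Hessian is omitted; it can be derived the same way or obtained by automatic differentiation):\n", "\n", "```julia\n", "# n-dimensional Rosenbrock f_κ and its gradient (one possible implementation).\n", "function make_rosenbrock(κ)\n", "    f = x -> sum(κ*(x[i+1] - x[i]^2)^2 + (x[i] - 1)^2 for i in 1:length(x)-1)\n", "    function df(x)\n", "        n = length(x)\n", "        g = zeros(n)\n", "        for i in 1:n-1\n", "            g[i]   += -4κ*x[i]*(x[i+1] - x[i]^2) + 2*(x[i] - 1)\n", "            g[i+1] += 2κ*(x[i+1] - x[i]^2)\n", "        end\n", "        return g\n", "    end\n", "    return f, df\n", "end\n", "```\n", "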
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 10) [Optional]\n", "\n", "Test the algorithms for various dimensions, starting points and $\\kappa$.\n", "Comment on your numerical observations.\n", "\n", "NB: The standard Rosenbrock function uses $\\kappa = 100$ and is known to be difficult to solve.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Question 10)\n", "Answer\n", "
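A possible experiment, reusing the `make_rosenbrock` sketch from Question 9 (the dimensions, values of $\\kappa$ and starting point are illustrative guesses):\n", "\n", "```julia\n", "for (n_test, κ_test) in [(10, 1), (10, 100), (100, 100)]\n", "    fr, dfr = make_rosenbrock(κ_test)\n", "    x_init = zeros(n_test)         # any point away from (1,…,1)\n", "    println(\"n = \", n_test, \", κ = \", κ_test)\n", "    gradient_receeding_step(x_init, fr, dfr; t0=1.0)\n", "    BFGS(x_init, fr, dfr)\n", "end\n", "```\n", "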
" ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.10.3", "language": "julia", "name": "julia-1.10" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.10.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }