Trajectory optimization is cool. The idea is to take a dynamical problem as a big ole optimization problem, finding the best actions to take to achieve your goals or maximize a reward.

There are a couple of flavors of trajectory optimization (shooting methods, collocation methods) http://www.matthewpeterkelly.com/tutorials/trajectoryOptimization/

PyTorch gives a pretty low overhead extension to Numpy that also gives autodifferentiation. It is mainly intended as a neural network library, for which it has a number of facilities.

Gradient Descent is not the preferred method for these problems (According to Boyd’s Convex optimization course). Gradient Descent has shit convergence compared to newton iteration, but is very flexible and easy to implement.

In addition, using a normal ODE solver from Scipy would be much more stable, but it would require cleverness to have the same code work for both scipy and the torch parts. So screw it.

One nicety of this approach is that we don’t even have to have our derivatives solved for. They could be all tied up in a

I thought that maybe I could just weight the dynamics cost enough to have it require the dynamics be satisfied, but that did not seem to work. Maybe with more fiddling? On further review my code had massive bugs in it. I’m not sure that the dynamics cost version wouldn’t work, but the Lagrange multiplier method seems to work well and makes sense too.

In this formulation, we can also train some kind of parametrized controller function by sampling some random initial starting conditions (or even dynamical parameters like mass and length etc, or noise forces). This is quite nice.

Additional bits that may be nice: Backtracking line search, logarithmic potential for inequalities, I wonder if a symplectic style interleaving of position and momentum might be nice even for this global case. Should definitely just tie up all the vars into a single x. Can we use a lagrangian or hamiltonian and then have pytorch differentiate that? It may in fact be nice to use some combinator to be able to hand the same function to ODEInt for a couple reasons (getting good initilizations of the path for example).

For a simple system, I’m using , , where you get to control f at every time point and x is starting at 0 and wants to get to 1. I’m using a simple scheme of finite difference in time for the time derivative. x and v are defined at and are defined at the half time steps . You need at least two time steps to get a derivative. I’m adding a square cost to the force, otherwise it would just get a huge force. lx and lv are Lagrange multipliers enforcing the equations of motion at each time step

Here was an initial pass (just here for for historical reasons, look at the updated one below. This one does not work as is)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 |
import torch batch = 1 N = 10 T = 5 dt = T/N x = torch.zeros(batch,N, requires_grad=True) #print(x) v = torch.zeros(batch, N, requires_grad=True) f = torch.zeros(batch, N-1, requires_grad=True) def calc_loss(x,v,f): delx = (x[:,1:] - x[:, :-1]) / dt delv = (v[:,1:] - v[:, :-1]) / dt xbar = (x[:,1:] + x[:, :-1]) / 2 vbar = (v[:,1:] + v[:, :-1]) / 2 dxdt = vbar dvdt = f xres = xbar - dxdt vres = vbar - dvdt dyn_err = torch.sum(torch.abs(xres) + torch.abs(vres), dim=1) #torch.sum(xres**2 + vres**2, dim=1) # + Abs of same thing? reward = torch.sum((xbar-1)**2 + f**2, dim=1) total_cost = 100 * dyn_err + reward return total_cost #print(x.grad) #print(v.grad) #print(f.grad) import torch.optim as optim ''' # create your optimizer optimizer = optim.SGD(net.parameters(), lr=0.01) # in your training loop: optimizer.zero_grad() # zero the gradient buffers output = net(input) loss = criterion(output, target) loss.backward() optimizer.step() # Does the update ''' #Could interleave an ODE solve step - stinks that I have to write dyanmics twice #Or interleave a sepearate dynamic solving # Or could use an adaptive line search. Backtracking # Goal is to get dyn_err quite small learning_rate = 0.001 for i in range(40): total_cost=calc_loss(x,v,f) #total_cost.zero_grad() total_cost.backward() while dyn_loss > 0.01: dyn_loss.backward() with torch.no_grad(): learning_rate = dyn_loss / (torch.norm(x.grad[:,1:]) + (torch.norm(v.grad[:,1:]) x[:,1:] -= learning_rate * x.grad[:,1:] # Do not change Starting conditions v[:,1:] -= learning_rate * v.grad[:,1:] reward.backward() with torch.no_grad(): f -= learning_rate * f.grad print(x) print(v) print(f) |

goofed up a couple things (inlcuding my xres making no sense. You need to explicility zero gradients. Pretty annoying). Lagrange multiplier method makes total sense.

Could we use a Hamiltonian and use autograd to derive equations of motion? Seems plausible and convenient.

Can I make a custom pytorch layer for sparse Hessians? The data oriented viewpoint would have you pump the gradient and hessian backward. Or could you automatically build an H-matrix structure for the hessian of convnets?

Put a neural controller in there. Each batch could have randomized parameters, noise, and initial conditions.

Is rebuilding total_cost every time bad?

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 |
import torch batch = 1 N = 20 T = 5 dt = T/N x = torch.zeros(batch,N, requires_grad=True) #print(x) v = torch.zeros(batch, N, requires_grad=True) f = torch.zeros(batch, N-1, requires_grad=True) lx = torch.zeros(batch, N-1, requires_grad=True) lv = torch.zeros(batch, N-1, requires_grad=True) ''' class Vars(): def __init__(self, N=10): self.data = torch.zeros(batch, N, 2) self.data1 = torch.zeros(batch, N-1, 3) self.lx = self.data1[:,:,0] self.lv = self.data1[:,:,1] self.f = self.data1[:,:,2] self.x = self.data[:,:,0] self.v = self.data[:,:,1] ''' def calc_loss(x,v,f, lx, lv): #finite difference derivative delx = (x[:,1:] - x[:, :-1]) / dt delv = (v[:,1:] - v[:, :-1]) / dt #average at half time step xbar = (x[:,1:] + x[:, :-1]) / 2 vbar = (v[:,1:] + v[:, :-1]) / 2 dxdt = vbar dvdt = f - xbar #residual of dynamics (want these numbers to be zeor) xres = delx - dxdt vres = delv - dvdt #dyn_err = torch.sum(torch.abs(xres) + torch.abs(vres), dim=1) #torch.sum(xres**2 + vres**2, dim=1) # + Abs of same thing? lagrange_mult = torch.sum(lx * xres + lv * vres, dim=1) cost = torch.sum( torch.abs(x-1), dim=1) + torch.sum( f**2, dim=1) #+ #cost = torch.sum((x-1)**2, dim=1) total_cost = cost + lagrange_mult #100 * dyn_err + reward return total_cost #print(x.grad) #print(v.grad) #print(f.grad) import torch.optim as optim ''' # create your optimizer optimizer = optim.SGD(net.parameters(), lr=0.01) # in your training loop: optimizer.zero_grad() # zero the gradient buffers output = net(input) loss = criterion(output, target) loss.backward() optimizer.step() # Does the update ''' #Could interleave an ODE solve step - stinks that I have to write dyanmics twice #Or interleave a sepearate dynamic solving # Or could use an adaptive line search. Backtracking # Goal is to get dyn_err quite small ''' learning_rate = 0.001 for i in range(40): total_cost=calc_loss(x,v,f) #total_cost.zero_grad() total_cost.backward() while dyn_loss > 0.01: dyn_loss.backward() with torch.no_grad(): learning_rate = dyn_loss / (torch.norm(x.grad[:,1:]) + (torch.norm(v.grad[:,1:]) x[:,1:] -= learning_rate * x.grad[:,1:] # Do not change Starting conditions v[:,1:] -= learning_rate * v.grad[:,1:] reward.backward() with torch.no_grad(): f -= learning_rate * f.grad ''' learning_rate = 0.001 for i in range(10000): total_cost = calc_loss(x,v,f, lx, lv) print(total_cost) #print(x) #total_cost.zero_grad() total_cost.backward() with torch.no_grad(): #print(f.grad) #print(lx.grad) #print(x.grad) #print(v.grad) f -= learning_rate * f.grad lx += learning_rate * lx.grad lv += learning_rate * lv.grad #print(x.grad[:,1:]) x[:,1:] -= learning_rate * x.grad[:,1:] # Do not change Starting conditions v[:,1:] -= learning_rate * v.grad[:,1:] x.grad.data.zero_() v.grad.data.zero_() f.grad.data.zero_() lx.grad.data.zero_() lv.grad.data.zero_() print(x) print(v) print(f) |