## Noise and The Fluctuation Dissipation Theorem

I was looking at some slides the other day and they quoted noise power in units of $\frac{W}{\sqrt{Hz}}$. Being the ignoramus I am, I was wondering why it was scaled by the square root of frequency.

First off, when a Watt is quoted in an electrical measurement, usually you’re measuring voltage with an instrument with a known input impedance Z. That’s how you convert your fluctuating voltage measurement to Watts.

Second, the sqrt frequency thing? Nowadays, your measurement apparatus is probably a digital sampler that performs an FFT and gives you a spectrum. The width of your FFT is roughly the sampling frequency. Does it make sense that when you increase the width of your spectrum, the height of the noise signal changes too? It does, but only because, implicitly, most sampling circuits average the signal over the same period as the sampling time. These two times are not intrinsically linked. One could have a system that takes a very fast snapshot but can only save data or send it over a link at a much slower speed. The noise power is set by this snapshot time, not the data saving time. The data saving time would set the bandwidth in the FFT.

These two are engineered to be the same to avoid distortion of the frequency signal via aliasing.

But there is an even simpler way to see this. Suppose you have two measurements V1 and V2 that are each averages over a time T, with standard deviation $\sigma$. Then the average of these two, V3, is an average over a time 2T. By the standard kind of manipulations (for independent variables the variance of a sum is the sum of the variances, $\sigma^2_{\sum x_i}=\sum \sigma^2_{x_i}$), the standard deviation of the new signal is $\sigma/\sqrt{2}$, which means it scales as the inverse square root of the time window. Hence multiplying your actual measured standard deviations by the square root of your time window gives you a time window invariant quantity.
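A quick numerical check of that scaling, as a minimal sketch: it assumes the raw samples are independent Gaussian noise, and the window length `T` and sample count are arbitrary choices of mine, not anything from the slides.

```python
import random
import statistics

def window_averages(samples, window):
    """Average consecutive non-overlapping windows of the given length."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]

random.seed(0)  # fixed seed so the run is repeatable
raw = [random.gauss(0.0, 1.0) for _ in range(200_000)]  # independent unit-variance noise

T = 50  # hypothetical window length, measured in samples
sigma_T = statistics.pstdev(window_averages(raw, T))
sigma_2T = statistics.pstdev(window_averages(raw, 2 * T))

ratio = sigma_T / sigma_2T                # should come out near sqrt(2)
invariant_T = sigma_T * T ** 0.5          # spread times sqrt(window)
invariant_2T = sigma_2T * (2 * T) ** 0.5  # should roughly match invariant_T
print(ratio, invariant_T, invariant_2T)
```

Doubling the window shrinks the spread by about $\sqrt{2}$, while the spread times the square root of the window stays roughly put.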

While I was thinking about that in the car I realized that the fluctuation dissipation theorem is a mean field theory kind of thing. The fluctuation dissipation theorem feels weird and spooky, but I guess it is ultimately simple (or not).

Mean field theory tries to summarize all the complicated interactions with neighbors with a simple summary. For interacting spins, it tries to summarize as an effective B field from the surrounding spins. Then you have a 1-particle model which you can solve and try to find a self-consistent value of B. Here is a sketch in equations.

$H= \sum S\cdot S - B_{ext}\cdot S \rightarrow \sum - B_{eff}\cdot S$

$Z=\sum_s e^{-\beta H}$

$M = \partial_{\beta B} \ln(Z)$

$B = \alpha M$

You can do something similar to find an effective permeability (strictly speaking, the susceptibility) due to your surrounding neighbors: $\partial_B M = \chi$
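To make the self-consistency loop above concrete, here is a minimal sketch under assumptions the text does not fix: spins are Ising-like ($S=\pm 1$), so the one-particle partition function is $Z = 2\cosh(\beta B_{eff})$ and $M = \tanh(\beta B_{eff})$, and the effective field is taken as $B_{eff} = B_{ext} + \alpha M$. The function name and parameter values are illustrative.

```python
import math

def mean_field_magnetization(beta, b_ext, alpha, m0=0.5, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for m = tanh(beta * (b_ext + alpha * m))."""
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(beta * (b_ext + alpha * m))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m  # not converged within max_iter; return the last iterate

# With no external field: beta * alpha < 1 relaxes to m = 0,
# beta * alpha > 1 settles on a nonzero self-consistent magnetization.
hot = mean_field_magnetization(beta=0.5, b_ext=0.0, alpha=1.0)
cold = mean_field_magnetization(beta=2.0, b_ext=0.0, alpha=1.0)
print(hot, cold)
```

Plain fixed-point iteration is just the simplest solver to write down; a root finder on $m - \tanh(\beta(B_{ext} + \alpha m))$ would do the same job.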

The fluctuating force due to your neighbors is like B, a constant forcing term.

The damping is like the permeability. One may want to consider a system that starts with an intrinsic damping. That is one difference between the magnetic case and the fluctuation case: free space has a natural permeability but not a natural damping (I suppose there is always some damping, due to radiation and what not, but we have a tendency to totally neglect such things). One could imagine ball bearings being shaken in a cup of molasses or something. You might want to consider the fluctuation due to being hit by other ball bearings, but take the damping from the molasses to be the dominating damping term (and the thermal fluctuations from the molasses to be ignorable).

Another difference is that I think you really are going to need to work explicitly with time. Just the thermal average isn’t going to cut it, I think (at least not conceptually. There might be some dirty tricks you can play, but a typical Hamiltonian can’t have damping terms. As I write this I am doubting its truth).

$\ddot{x} = -\nu \dot{x}+ f$

calculate some averages … Then use the self-consistency

$B = \alpha M \rightarrow f = f(\hat{x})$

The dissipation will be related to your correlation with your neighbors. When you are moving faster, they have to tend to move in such a way to make you slow down on average.
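As a sanity check on the damped, forced equation $\ddot{x} = -\nu \dot{x} + f$, here is a minimal sketch that skips the self-consistency step entirely and simply assumes $f$ is white noise with $\langle f(t) f(t')\rangle = 2D\,\delta(t-t')$. That assumption, the Euler-Maruyama time stepping, and the parameter values are mine. Under it the velocity $v=\dot{x}$ should settle to $\langle v^2\rangle = D/\nu$, which is the simplest place where the fluctuation strength and the damping get tied together.

```python
import random

def stationary_velocity_variance(nu, D, dt=0.01, steps=200_000, seed=0):
    """Euler-Maruyama for dv = -nu*v*dt + sqrt(2*D*dt)*xi; returns the time average of v^2."""
    rng = random.Random(seed)
    kick = (2.0 * D * dt) ** 0.5  # size of one random kick per time step
    v = 0.0
    total = 0.0
    burn_in = steps // 10  # discard the initial transient
    for step in range(steps):
        v += -nu * v * dt + kick * rng.gauss(0.0, 1.0)
        if step >= burn_in:
            total += v * v
    return total / (steps - burn_in)

var_v = stationary_velocity_variance(nu=1.0, D=1.0)
print(var_v, "compare with D / nu =", 1.0)
```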

To Be Continued

## Reverse Mode Auto Differentiation is Kind of Like a Lens

Edit: More cogent version here http://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/

Warning: I’m using sketchy uncompiled Haskell pseudocode.

Auto-differentiation is writing a function that also computes the derivative alongside calculating its value. Function composition is done alongside applying the chain rule to the derivative part.

One way to do this is to use a “dual number”. Functions now take a tuple of values and derivatives.
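A minimal scalar sketch of the dual number idea, written in Python rather than Haskell so it can be run as is. The class `Dual`, the helper `dsin`, and the test function `f` are illustrative names of mine.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: float  # value of the expression
    der: float  # derivative with respect to the chosen input

    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        # product rule on the derivative slot
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def dsin(x):
    # chain rule: d sin(u) = cos(u) du
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

def f(x):
    return dsin(x * x) + x  # f(x) = sin(x^2) + x

x = Dual(1.5, 1.0)  # seeding der = 1 means "differentiate with respect to x"
y = f(x)
print(y.val, y.der)  # y.der should equal 2*x*cos(x^2) + 1 at x = 1.5
```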

The Jacobian of a function from $R^n \rightarrow R^m$ is an m by n matrix. The chain rule basically says that you need to compose the matrices via multiplication when you compose the value functions. This is the composition of the linear maps.

Conceptually, you initialize the process with an NxN identity matrix corresponding to the fact that $\partial x_i/\partial x_j=\delta_{ij}$.

Vectorized versions of scalar functions (maps) will often use diag, since their Jacobian is diagonal.
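The pieces above (carry a Jacobian alongside the value, multiply on composition, seed with the identity, use diag for elementwise maps) can be sketched as follows. This is Python with plain lists standing in for hmatrix vectors and matrices; `lift`, `sin_all`, and `total` are hypothetical helper names.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def lift(value_fn, jacobian_fn):
    """Turn f and its Jacobian into a map on (value, accumulated Jacobian) pairs."""
    def step(pair):
        x, jac = pair
        return value_fn(x), matmul(jacobian_fn(x), jac)  # chain rule
    return step

# Elementwise sin: the Jacobian is diagonal.
sin_all = lift(lambda x: [math.sin(xi) for xi in x],
               lambda x: diag([math.cos(xi) for xi in x]))
# Sum of entries, R^n -> R^1: the Jacobian is a single row of ones.
total = lift(lambda x: [sum(x)], lambda x: [[1.0] * len(x)])

x0 = [0.0, 1.0, 2.0]
value, jac = total(sin_all((x0, identity(len(x0)))))  # seed with the identity
print(value, jac)
```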

A couple points:

1. Since the Jacobian j is always going to be multiplied in composition, it makes sense to factor this out into a Monad structure (Applicative maybe? Not sure we need full Monad power).
2. There is an alternative to using explicit Matrix data types for linear maps. We could instead represent the Jacobians using (Vector Double) -> Vector Double. The downside of this is that you can’t inspect elements. You need explicit matrices as far as I know to do Gaussian elimination and QR decomposition. You can sample the function to reconstitute the matrix if need be, but this is somewhat roundabout. On the other hand, if your only objective is to multiply matrices, one can use very efficient versions. Instead of an explicit dense NxN identity matrix, you can use the function id :: a -> a, which only does some minimal pointer manipulation or is optimized away. I think that since we are largely multiplying Jacobians, this is fine.

What we’ve shown so far is Forward Mode.

When you multiply matrices you are free to associate them in any direction you like. (D(C(BA))) is the association we’re using right now. But you are free to left associate them: (((DC)B)A). You can write this in right associated form using the transpose $(((DC)B)A)^T = (A^T(B^T(C^TD^T)))$

This form is reverse mode auto differentiation. Its advantage is the number of computations you have to do and the intermediate values you have to hold. If one is going from many variables to a small result, this is preferable.
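To put rough numbers on that, here is a small multiplication count, assuming a chain $DCBA$ where $A$, $B$, $C$ are $n \times n$ Jacobians and $D$ is $1 \times n$ (many inputs, one scalar output). The shapes and the naive cost model (rows × inner × cols scalar multiplications per product) are illustrative assumptions.

```python
def matmul_cost(shapes):
    """Scalar multiplications to multiply matrices left to right, given (rows, cols) shapes."""
    rows, cols = shapes[0]
    cost = 0
    for next_rows, next_cols in shapes[1:]:
        assert cols == next_rows, "inner dimensions must agree"
        cost += rows * cols * next_cols
        cols = next_cols  # the running product is now rows x next_cols
    return cost

n = 1000
chain = [(1, n), (n, n), (n, n), (n, n)]  # shapes of D, C, B, A

# Reverse mode, ((DC)B)A: the running product stays a 1 x n row vector.
reverse_cost = matmul_cost(chain)

# Forward mode, D(C(BA)): same count as going left to right over the
# transposed chain A^T, B^T, C^T, D^T.
forward_cost = matmul_cost([(c, r) for r, c in reversed(chain)])

print(reverse_cost, forward_cost)
```

With these shapes reverse mode only ever multiplies a row vector into a matrix, while forward mode pays for full matrix-matrix products until the very last step.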

It is actually exactly the same in implementation except you reverse the order of composition of the derivatives. We forward compose value functions and reverse compose derivative functions (matrices).

We have CPSed our derivative matrices.
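A minimal sketch of that reversed composition, in Python: each function returns its value together with a pullback (a function applying the transposed Jacobian), values compose forward, and pullbacks compose backward. This is the common closure-based formulation and not necessarily the exact encoding of the original Haskell; `compose`, `sin_all`, and `total` are illustrative names.

```python
import math

def compose(g, f):
    """g after f: values compose forward, pullbacks compose in reverse."""
    def h(x):
        y, pull_f = f(x)
        z, pull_g = g(y)
        return z, lambda dz: pull_f(pull_g(dz))
    return h

def sin_all(x):
    y = [math.sin(xi) for xi in x]
    # transposed (diagonal) Jacobian applied to an output cotangent
    return y, lambda dy: [math.cos(xi) * dyi for xi, dyi in zip(x, dy)]

def total(x):
    # the Jacobian is a row of ones, so its transpose broadcasts dy to every input
    return [sum(x)], lambda dy: [dy[0]] * len(x)

loss = compose(total, sin_all)
value, pullback = loss([0.0, 1.0, 2.0])
grad = pullback([1.0])  # seed with 1 at the scalar output
print(value, grad)
```

No matrix is ever built here: the gradient comes out of one backward pass through the closures.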

Really a better typed version would not unify all the objects into a. While we’ve chosen to use Vector Double as our type, if we could tell the difference between R^n and R^m at the type level the following would make more sense.

However, this will no longer be a monad. Instead you’ll have to specify a Category instance. The way I got down to this stuff is via reading Conal Elliott’s new Automatic Differentiation paper which heavily uses the category interface.  I was trying to remove the need to use constrained categories (it is possible, but I was bogged down in type errors) and make it mesh nice with hmatrix. Let me also mention that using the Arrow style operators *** and dup and &&& and fst, and clever currying that he mentions also seems quite nice here. The Tuple structure is nice for expressing direct sum spaces in matrices. (Vector a, Vector b) is the direct sum of those vector spaces.

Anyway, the arrows for RD are

This is a form I’ve seen before though. It is a lens. Lens’ have a getter (a -> b) that extracts b from a and a setter (a -> b -> a) that given an a and a new b returns the replaced a.
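To see the resemblance concretely, here is a sketch with a lens packaged as a single function `a -> (b, b -> a)` (getter result plus a setter closed over the original `a`). The same `compose` then works both for ordinary tuple lenses and for a value-plus-pullback function. All names are illustrative, and Python stands in for the Haskell types.

```python
def compose(l1, l2):
    """Run l1 then l2 on the way in, and undo l2 then l1 on the way out."""
    def h(a):
        b, put_b = l1(a)
        c, put_c = l2(b)
        return c, lambda new_c: put_b(put_c(new_c))
    return h

def first(pair):
    """Lens onto the first slot of a 2-tuple."""
    return pair[0], lambda new: (new, pair[1])

def second(pair):
    """Lens onto the second slot of a 2-tuple."""
    return pair[1], lambda new: (pair[0], new)

def square(x):
    """Reverse-mode 'lens': the value, plus a pullback for the derivative."""
    return x * x, lambda dy: 2.0 * x * dy

# An ordinary lens: focus on the first element of the second element.
got, put = compose(second, first)((1, (2, 3)))
updated = put(9)

# The same composition applied to derivatives: x -> (x^2)^2 = x^4.
value, pull = compose(square, square)(3.0)
slope = pull(1.0)
print(got, updated, value, slope)
```

The difference is in what travels backward: a lens setter takes a replacement value, while the pullback takes a cotangent, which is one reason the lens laws do not carry over.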

Is an automatic derivative function in some sense extracting an implicit calculable value from the original vector and returning in a sense how to change the original function? It is unclear whether one should take the lens analogy far or not.

The type of Lens’ (forall f. Functor f => (b -> f b) -> a -> f a) means that it is isomorphic to a type like DFun’. The type itself does not imply the lens laws of setters and getters, and these functions are definitely not proper lawful lenses. It is just curious that conceptually they are not that far off.

The lens trick of replacing this function with a quantified rank 1 type (forall f. ) or quantified rank-2 (forall p.) profunctor trick seems applicable here. We can then compose reverse mode functions using the ordinary (.) operator and abuse convenience functions from the lens library.

Neat if true.

## CartPole WORKIN’ BOYEEE

We have been fighting a problem for weeks. The serial port was just not reliable; it failed sporadically. The cause ended up being a surprising thing: we were using threading to receive the messages and check for limit switches. It is not entirely clear why, but this was totally screwing up the serial port update in an unpredictable manner. Yikes. What a disaster.

After that though smoooooooth sailing.

With a slight adaptation of the previous OpenAI Gym LQR cartpole code and a little fiddling with parameters we have a VERY stable balancer. We removed the back reaction of the pole dynamics on the cart itself for simplicity. This should be accurate when the pole is vastly lighter than the cart.

We did find that the motor is exactly velocity control in steady state with a linear response. There is a zero point offset (you need to ask for 100 out of 2046 before you get any movement at all).
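One way to fold that zero point offset into the controller output is a simple deadband compensation, sketched below. This is not the code we ran: `GAIN` is a made-up placeholder for the measured slope, and the sketch assumes the offset is the same in both directions.

```python
DEADBAND = 100      # from the measurement above: smaller commands give no motion
MAX_COMMAND = 2046  # full-scale command
GAIN = 1.0          # hypothetical: command counts per unit of desired velocity

def velocity_to_command(velocity):
    """Map a signed desired velocity to a signed motor command, skipping the dead zone."""
    if velocity == 0:
        return 0
    magnitude = min(DEADBAND + GAIN * abs(velocity), MAX_COMMAND)  # clip to full scale
    return int(magnitude) if velocity > 0 else -int(magnitude)

print(velocity_to_command(50), velocity_to_command(-50), velocity_to_command(1e6))
```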

We’ll see where we can get with the Lyapunov control next time.