For my birthday, we made a new movie and built some robots and went to see Star Wars. Quite a day!

I am pleased to present: The Wassenroid

Skip to content
# Category: Uncategorized

## Our Latest Film Ouvre: The Wassenroid

## Dump of Nonlinear Algebra / Algebraic geometry Notes. Good Links Though

## Polynomial Factorization

## Fool’s Rules Regatta 2019

## Giving the Mostly Printed CNC a try (MPCNC)

## Chile: Nice place

## Thoughts on Faking Some of GADTs in Rust

## Cvxpy and NetworkX Flow Problems

## Nand2Tetris in Verilog and FPGA and Coq

## More Reinforcement Learning with cvxpy

## Q-Learning with Linear Programming (cvxpy, OpenAI Gym Pendulum)

For my birthday, we made a new movie and built some robots and went to see Star Wars. Quite a day!

I am pleased to present: The Wassenroid

Old notes on nonlinear algebra. I dunno about the content but some very good links and book suggestions in here, so I’m gonna dump this one out there. Maybe it’ll help someone.

—

Systems of multivariable polynomial equations are more solvable than people realize. There are algebraic and numeric methods. Look at Macaulay, Singular, Sympy for algebraic methods. phcpack and bertini and homotopycontinuation.jl for numerical .

Algebraic methods are fixated on Groebner bases, which are a special equvialent form your set of equations can be manipulated to. You can disentangle the variables using repeated polynomial division (buchberger’s algorithm) turning your set of equations into an equivalent set that has one more variable per equation. This is like gaussian elimination which is actually the extremely simple version of buchberger for linear equations.

The numerical methods use perturbation theory to take a system of equations you know how to solve and smoothly perturb them to a new system. Each small perturbation only moves the roots a little bit, which you can track with a differential equation solver. Then you can fix it up with some Newton steps. People who really care about this stuff make sure that there are no pathological cases and worry about roots merging or going off to infinity and other things.

You need to know how many roots to build and track in your solvable system. For that two theorems are important

bezout thereom – for dense systems, number of solutions is bound by product of total degree of equations.

bernstein bound – newton polytope gives bound of number of solutions of polynomial system. useful for sparse

One could make an argument for the homotopy continuation methods being the analog of iterative solutions for linear equations if grobner basis are gaussian elimibnation. Take equation we know how to solve (~preconditioner) and perform some iterative thing on it.

add enough random linear equations to make system full (points).

Then you have a membership algorithm due to sweeping of planes. Once you have points on actual varieites, pairwise compare them.

Cox OShea book is often reccomended. It’s really good.

https://www.springer.com/us/book/9781441922571

More advanced Cox et al book

https://www.springer.com/us/book/9780387207063

Bernd Sturmfels, Mateusz Michalek (including video lectures)

https://personal-homepages.mis.mpg.de/michalek/ringvorlesung.html

https://personal-homepages.mis.mpg.de/michalek/NonLinearAlgebra.pdf

(Bernd is da man!)

https://math.berkeley.edu/~bernd/math275.html

Maculay 2 book

https://faculty.math.illinois.edu/Macaulay2/Book/

Singular books

https://www.singular.uni-kl.de/index.php/publications/singular-related-publications.html

https://www.springer.com/us/book/9783662049631

https://www.ima.umn.edu/2006-2007

Planning Algorithms, in particular chapter 6

Gröbner bases in Haskell: Part I

Summer school on tensor methods

https://www.mis.mpg.de/calendar/conferences/2018/nc2018.html

Extensions of

https://ieeexplore.ieee.org/document/4399968

Numerical Polynomial Algebra by Hans Stetter

https://epubs.siam.org/doi/book/10.1137/1.9780898717976?mobileUi=0&

Introduction to Non-Linear Algebra V. Dolotin and A. Morozov. A high energy physics perspective

https://arxiv.org/pdf/hep-th/0609022.pdf

Nonlinear algebra can also be approach via linear algebra surprisingly. Resultants. As soon as you see any nonlinearity, the linear part of your brain shuts down, but a good question is linear in WHAT? Consider least squares fitting, which works via linear algebra. Even though you’re fitting nonlinear functions, the expressions are linear in the parameters/coefficients so you’re all good.

Similarly you can encode root finding into a linear algebra problem. A matrix has the same eigenvalues as it’s characterstic polynomial has roots, so that already shows that it is plausible to go from linear algebra to a polynomial root finding problem. But also you can encode multiplying a polynomial by x has a linear operation on the coefficients. In this way we can .

[1 x x^2 x^3 …] dot [a0 a1 a2 a3 …] = p(x)

Multiplying by x is the shift matrix. However, we also are assuming p(x)=0 which gives use the ability to truncate the matrix. x * [1 x x^2 x^3 …] = Shift @ xbar. This is somewhat similar to how it feels to do finite differnce equations. The finite difference matrix is rectangular, but then boundary conditions give you an extra row. Multiplication by x returns the same polynomial back only when p(x)=0 or x = 0. The eigenvalues of this x matrix will be the value of x at these poisitions (the roots). This is the companion matrix https://en.wikipedia.org/wiki/Companion_matrix

We can truncate the space by using the zero equation.

It’s a pretty funky construction, I’ll admit.

To take it up to multivariable, we bring in a larger space [1 x y x^2 xy y^2 …] = xbar kron ybar

We now need two equations to reduce it to points. The X matrix is lifted to X kron I. and we can pad it with ?

Multiplying by an entire polynomial. Sylvester matrix for shared roots. Double root testing.

Sylvester matrix is based on something similar to bezout’s identity. To find out if some things p q has common factors you can find 2 things r s such that r*p + q*s = 0

https://en.wikipedia.org/wiki/Polynomial_greatest_common_divisor#B%C3%A9zout’s_identity_and_extended_GCD_algorithm

Sum of Squares is somewhat related material on systems of polynomial inequalities which can be translated to semidefinite matrix constraints. If you want to include equalities, you can use groebner bases to presolve them out.

Parrilo course material on Sum of Squares.

https://learning-modules.mit.edu/materials/index.html?uuid=/course/6/sp16/6.256#materials

Paper on using greobner and CAD (cylindrical algebraic decomposition) for opitmization and control

Using groebner basis for constraint satisfaction problems: x^n=1 gives a root of unity. There are n solutions. This gives a finite set to work with. Then you can add more equations. This is related to the max-cut thing. I saw this on Cox webpage.

You can require neighbors to have different vertices by 0=(xi^k – xj^k)/(xi – xj). You can encode many constraints using clever algebra.

an example using the same technique to solve sudoku

Sympy tutorial solving geoemtric theorems and map coloring

explicitly mentions toric groebner as integer programming.

other interesting exmaples

http://www.scholarpedia.org/article/Groebner_basis

Noncommutative grobner basis have application to solving differential equations? The differential operators are noncommutative. Not just silly quantum stuff. I mean the simple exmaple of non commutativty are the shcordinger momentum operators.

Automatic loop invariant finding

Geometric theorem rpvong

robotic kinematics

Optics? Envelopes, exchange of coordinates. Legendre transformations. Thermodynamics?

Global optimization? Find all local minima.

Nonlinear finite step.

Dynamic Prgramming. Add implicit V variab le for the vlaue function. Constrain via equations of modtion. Perform extermization keeping x0 v0 fixed. dx0=0 dv0=0 and dV=0. Grobner with ordering that removes x1 v1 V1. Iterate. Can keep dt as variable. Power series in t? Other integration schemes.

Probably need some method to simplify that left over relations so that they don’t get too complex. Smoothing? Dropping terms? Minimization may require factoring to find global minimum.

Differentiation. Add to every variable a dx. Collect up first order as a seperate set of constraints. Add conditions df=0 and dy=0 for fixed variables to perform partial differentiation and extremization. A very similar feel to automatic differentiation. Functions tend to not be functions, just other wriables related by constraints

Variable ordering

lex – good for elimination

deglex – total degree then a lex to tie break

grevlex – total degree + reverse lexicographic. The cheapest variable is so cheap that it goes last

block ordering, seperate variables into blocks and pick orderings inside blocks

general matrix ordering. Apply a matrix to the exponent vectors and lex comparse the results. Others are a subset.

Can’t I have a don’t care/ partial order? would be nice for blockwise elimination I feel like.

Non-commutative

http://sheaves.github.io/Noncommutative-Sage/

Physicsy

https://arxiv.org/pdf/hep-th/0609022

CAD book

https://link.springer.com/book/10.1007%2F978-3-7091-9459-1

Rings have addition and multiplication but not division necessarily. Polynomials, integers, aren’t guarenteed to have inverses that remain polynomials or integers.

ideal = a subset of a ring that absorbs multiplication. Also closed under addition

All polynomial conseqeunces of a system of equations

HIlbert Basis theorem – all ideals are egenrated by a finite set

ideal generated from set – any element of ring that can be generated via addition and multiplication by arbitary element. Is ideal because if you multiplied it by another object, it is still and sum of multiples.

Ideals are sometimes kind of a way of talking about factors without touching factors. Once something is a multiple of 5, no matter what you multiply it with, it is still a multiple of 5. If (x – 7) is a factor of a polynomial, then no matter what you multiply it with, (x-7) is still a factor. Zeros are preserved.

Principal ideal domain – every ideal is generated by a single object

Prime ideal. if a*b is in ideal then either a or b is in ideal. Comes from prime numbers ideal (all number divislable by prime number). If ab has a factor of p then either a or b had a factor of p. whereas consider all mutiples of 4. if a = b =2 then ab is a mutiple of 4, but neither a nor b are a multiple of 4.

1d polynomials. Everything is easy.

Polynomial division is doable. You go power by power. Then you may have a remained left over. It’s pretty weird.

You can perform the gcd of two polynomials using euclidean algorithm.

The ideal generated by a couple of them is generated by the multipolynomial gcd?

a = cx + dy + r

multivariate division: we can do the analog of polynomial division in the multivariate case. But we need an ordering of terms. reaminder is not unique.

But for certain sets of polynomials, remainder is unique.

Why the fixation on leading monomials?

The S-polynomial is the analog of one step of the euclidean algorithm. It also has the flavor of a wronskian or an anticommutator.

The bag euclidean algorithm. Grab the two things (biggest?). Take remainder between them, add remainder into bag.

This is the shape of the buchberger algorithm.

Finding homology or cohomology of solutions. Good question. One can see how this could lead to categorical nonsense since Category theory was invented for topological questions.

The variety is where a set of polynomials is 0. Roots and zero surfaces

List gives set of polynomials.

[forall a. Field a => (a,a,a) -> a ]

Or explicit

union and intersection can be achieved via multiplication and combining the sets

Krull dimension – Definition of dimension of algebraic variety. Maximal length of inclusion chain of prime ideals.

Ideals and Varieites have a relation that isn’t quite trivial.

The ideal of a variety

Envelopes – parametrized set of varieties f(x,t)=0 and partial_t f(x,t)=0. Eliminate t basically to draw the thing. Or trace out t?

Wu’s method for geometric theorem proving. You don’t need the full power of a grobner basis.

Polynomial maps. Talk about in similar language to differential geometry.

Boxes are a simple way to talk about subsets. Or lines, planes. Or polytopes.

Also any function that gives a true false value. But this is very limited in what you can actually do.

Varieties give us a concrete way to talk about subsets. Grothendieck schemes give unified languages supposedly using categorical concepts. Sounds like a good fit for Haskell.

class Variety

use powser. Functor composition makes multivariable polynomials. Tuples or V3 with elementwise multiplication

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
-- give variables names newtype X a = X [a] newtype Y a = Y [a] -- from k^n -> k^m type family PolyFun n m k where PolyFun n (S m) k = (PolyFun n 1, PolyFun n m) PolyFun (S n) 1 k = [PolyFun n 1 k] PolyFun 1 1 k = k -- Gonna need to Make a typeclass to actually do this. Yikes -- it's just not as simple as Cat a b type. You really need to do computation -- and input a is not same as class PolyF f where pcompose :: PolyFun b c k -> PolyFun a b k -> PolyFun a b k pid :: Num k => PolyFun b c k -- related to ideal of n generators on a space k^m -- this functions will compose some incoming -- or is this better thought of as a variety? type Idealish :: (PolyFun n 1) -> PolyFun m 1 makeidealish :: PolyFun m n -> Ideal makeidealish f = flip pcompose f -- apply turns polynomial into haskell function apply :: (PolyFun n m) -> V n -> V m -- somehow should be able to replace points with varieties. It's like a whole thing type VarietyFun = (PolyFun n 1 k) -> (PolyFun m 1 k) (PolyFun n 1 k -> PolyFun m 1 k) -> (PolyFun m 1 k -> PolyFun l) |

The polynomial as a type parameter for agda. Regular Functions are functions from one variety to another. They are the same as the polynomial ring quotiented out by the ideal of the variety.

Ring Space and Geometric Space (affine space)

Maximal ideals can be thought of as points. (ideal of x-a, y-b, …).

Free Polynomials ~ Free Num. Sparse representation. Uses Ordering of a. We should not assume that the are a power like in http://hackage.haskell.org/package/polynomial-0.7.3/docs/Math-Polynomial.html

Ord is monomial ordering. Think of a as [X,Y,X,X,X]

divmod :: (Integral a, Ord a) => Poly r a -> Poly r a -> Poly r a

newtype Monomial a = Monomial [a]

— different monomial newtype orderings for lex, etc.

Monomial (Either X Y)

divmod as bs = remove bs from as. if can’t remainder = as, div = 0

Intuition pumps : algebraic geometry, differential geoemtry, ctaegory theory, haskell agda.

In differential geometry, embedding sucks. We get around it by defining an atlas and differential maps.

There is a currying notion for polynomials. We can consider a polynomial as having coefficinets which themselves are polynomials in other variables or all at once.

What can be solved linearly? The Nullstullensatz certificate can be solved using linear equations

Resultants. What are they? Linear sums of monomials powers * the orginal polynomials. Det = 0 implies that we can find a polynomial combination

What is the deal with resultants

Toric Varieties. C with hole in it is C*. This is the torus because it is kind of like a circle. (Homologically?). There is some kind of integer lattice lurking and polytopes. Gives discrete combinatorial flavor to questions somehow. Apparently one of the more concrete/constructive arenas to work in.

binomaial ideals. the variety will be given by binomials

maps from one space to another which are monomial. can be implicitized into a variety. map is described by integer matrix. Integer programming?

Similar “cones” have been discussed in teh tropical setting is this related?

Algebraic statistics. Factor graph models. Probablisitc graphical models. Maybe tihs is why a PGM lady co taught that couse with Parillo

Modules

Tropical geometry

http://www.cmap.polytechnique.fr/~gaubert/papers.html

Loits of really intriguing sounding applications. Real time verification

gfan

How does the polynomial based optimization of the EDA course relate to this stuff? https://en.wikipedia.org/wiki/Logic_optimization

Mixed volume methods? Polytopes.

cdd and other polytopic stuff. Integration of polynomials over polytopes

Software of interest

Sage

Sympy

Singular – Plural non-commutative?

FGb – Faugiere’s implmentation of Grobner basis algorithms

Macaulay

CoCoa

tensorlab – https://en.wikipedia.org/wiki/Tensor_software

sostools

PolyBori – polynomials over boolean rings http://polybori.sourceforge.net/doc/tutorial/tutorial.html#tutorialli1.html

LattE

4ti2

normaliz

polymake – https://polymake.org/doku.php/tutorial/start slick

http://hep.itp.tuwien.ac.at/~kreuzer/CY/CYpalp.html Calabi Yau Palp????

TOPCOM

frobby – can get euler charactersitics of monomial ideals? http://www.broune.com/frobby/index.html

gfan

https://www.swmath.org/browse/msc

Homotopy continuation:

Bertini

http://homepages.math.uic.edu/~jan/phcpy_doc_html/index.html

phcpy and phcpack

hom4ps

https://www.juliahomotopycontinuation.org/

certification:

http://www.math.tamu.edu/~sottile/research/stories/alphaCertified/

cadenza

Jan

http://homepages.math.uic.edu/~jan/mcs563s14/index.html

www.math.uic.edu/~jan/tutorial.pdf

bezout thereom – for dense systems, number of solutions is bound by product of total degree of equations.

bernstein bound – newton polytope gives bound of number of solutions of polynomial system. useful for sparse

One could make an argument for the homotopy continuation methods being the analog of iterative solutions for linear equations if grobner basis are gaussian elimibnation. Take equation we know how to solve (~preconditioner) and perform some iterative thing on it.

add enough random linear equations to make system full (points).

Then you have a membership algorithm due to sweeping of planes. Once you have points on actual varieites, pairwise compare them.

Suggestion that “linear program” form helps auto differentiation?

local rings. thickening? Infinite power series modded out by local relation. One maximal ideal.

differential geometry on algebaric surfaces.

modules are like vector spaces.

Ring linear

Canonical example, a vector of polynomials.

1-d space of polynomials.

Module morphism – respects linearity with sresepct to scalar multiplacation and addition Can be specified compoent wise. But has to be specified in such a way that resepct.

Basis – Linearly Independent set that spans the whole module. May not exist.

So were are kind of stuck always working in overcomplete basis to make thje vector space analogy. The generators have non trivial relations that equal zero. These coefficients form their own vector space. The space whole image is zero because of the relations is called the first syzygy module.

But then do we have a complete basis of all the relations? Or is it over complete?

If you ignore that the entries of a vectors are polynomials it becomes vector space. But but because they are they have secret relations.

even 1 dimensional vector space has some funky structure because of the polynomial nautre of the ring.

Somehow fields save us?

Paramaetrized vector curves, surfaces.

Parametrzied matrices.

Noncommutative polynomials. We could perhaps consider the process of normal ordering something related to a grobner basis calcaultion. Perhaps a multi polynomial division process? Consider the ordering where dagger is greaer than no dagger. Canonical basis also has i<j (more important for fermion).

SOS gives you the exact minimum of 1-d polynomial. You could also imagine encoding this as a semidefintier program. H-lam>=0. Min lam. Where H is the characterstic matrix.

We can diagonalize to the sos form, and then take each individual term = 0 to solve for x*.

While integer programming does that funky toric variety stuff with the objective vector deswcribing the grobner basis, binary programming is simple. x^2=x + linear eequations and constraints

haskell grobener

1. Monomials. Exponent vectors. Logarithmic representation. Mutiplication is addition. Composition is elementwise multiplication. Type level tag for ordering.

newtype Mon3 ord = V3 Int

data Lex

data DegLex

Ordering of monomials is important. Map is perfect

Map (Mon3 ord) ring

Groebner bases can be used to describe many familiar operations. Linear algerba, gaussian elminiation. Using commutators. Building power series assuming terms are relatively irrelevant.

Can I get a power series solution for x^2 + ax + 1=0 by using a negative ordering for a? I need another equation. x = \sum c_n * a^n. (x+dx)? How do I get both solutions?

Dual numbers for differential equations. dx is in a ring such that dx^n = 0.

Subset sum. Find some of subset of numebrs that add up to 0.

s um variables s_i

Solutions obey

s_0 = 0

(s_i – s_{i-1})(s_i – s_{i-1}-a_{i-1})=0

s_N = 0

Factors give OR clauses. Sepearte oplynomials give AND clauses. pseudo CNF form. Can’t always write polys as factors though? This pattern also matches the graph coloring.

More interesting books:

Some fun with algebraic numbers

https://mattpap.github.io/masters-thesis/html/src/algorithms.html

https://en.wikipedia.org/wiki/Factorization_of_polynomials

Numerical vs Symbolic

Numeric

https://en.wikipedia.org/wiki/Root-finding_algorithm

Pick a random point. Then apply Newton’s method. Do this over and over. If you find N unique factors, you’ve done it. A little unsatisfying, right? No guarantee you’re going to find the roots.

2. Perturbation theory / Holonomy continuation. Start with a polynomial with the same number of total roots that you know how to factor. x^N – 1 = 0 seems like an easy choice. Given , . . You can use this ODE to track the roots. At every step use Newton’s method to cleanup the result. Problems can still arise. Do roots collapse? Do they smack into each other? Do they run off to infinity?

3. The Companion matrix. You can convert finding the roots into an eigenvalue problem. The determinant of a (A – \lambda) is a polynomial with roots at the eigenvalues. So we need tyo construct a matrix whose deteerminant equals the one we want. The companion matrix simulates multiplication by x. That is what the 1 above the diagonal do. Then the final row replaces x^(N+1) with the polynomial. In wikipedia, this matrix is written as the transpose. https://en.wikipedia.org/wiki/Companion_matrix

4. Stetter Numerical Polynomial Algebra. We can form representations basically of the Quotient Rings of an Ideal. We can make matrices A(j) that implement multiplication by monomials x^j in F[x]/I. Then we can take joint eigensolutions to diagonalize these multiplications. Something something lagrange polynomials. Then if the solutions respect some kind of symmettry, it makes sense that we can use Representation theory proper to possibly solve everything. This might be the technique of Galois theory metnoined in that Lie Algebra book. This is not unconnected with the companion matrix technique above. These matrices are going to grow very higher dimensional.

Thought. Could you use holonomy continuation to get roots, then interpolate those roots into a numerical grobner basis? Are the Lagrange polynomials of the zero set a grobner basis?

Symbolic

Part of what makes it seem so intimidating is that it isn’t obvious how to brute force the answer. But if we constrain ourselves to certain kinds of factors, they are brute forceable.

Given a suggested factor, we can determine whether it actually is a factor by polynomial division. If the remainder left over from polynomial division is 0, then it is a factor.

If we have an enumerable set of possibilities, even if large, then it doesn’t feel crazy to find them.

Any root of a polynomial with rational coefficients can be converted to integer coefficients by multiplying out all the denominators.

Let’s assume the polynomial has factors of integer coefficients.

Rational Root Test

Kronecker’s method

Finite Fields. It is rather remarkable that there exists finite thingies that have the algerbaic properties of the rationals, reals, and complex numbers. Typically when discretizing continuum stuff, you end up breaking some of the nice properties, like putting a PDE on a grid screws over rotational symmetry. Questions that may be hard to even see how to go about them become easy in finite fields in principle, because finite fields are amenable to brute force search. In addition, solutions in finite fields may simply extend to larger fields, giving you good methods for calculations over integers or rationals or what have you.

SubResultant. A curious property that if two polynomials share roots/common factors, it is pretty easy to seperate that out. The GCD of the polynomials.

Kind of the gold standard of root finding is getting a formula in terms of square roots. This is an old question. Galois Theory is supposedly the answer.

For four years running now we’ve done the fool’s rules regatta in Jamestown as Team Skydog in the unlimited category. http://www.jyc.org/FoolsRules/ThisYearsFoolsRules.htm

It’s a short boat race where you make a boat in 2 hours out of crap and then race ’em.

The first year we did trash bags filled with air with a platform. We were told we were very creative. We got incredibly dehydrated and sunburnt that year.

Then next two years we did a catamaran with triangular plywood pontoons held together with zip ties. Here’s some link from last year

http://www.jyc.org/FoolsRules/images/2018/foolsrules20183.jpg

http://www.jyc.org/FoolsRules/images/2018/foolsrules20187.jpg

This year we did a 2×4 and 2×3 frame with plastic sheeting stapled to the inside. It worked surprisingly well! Nearly unsinkable even when we intentionally punctured huge holes . The wind was blowing out to sea and almost everyone was drifting out to Newport. We got first place this year! We celebrated with bbq and our ice cream bucket prize.

Thoughts for next year: Make a keel / centerboard so we can tack? Maybe that is crazy talk.

Declan’s post on the same event: https://www.declanoller.com/2019/08/11/skydog-2019-winners-again/

Declan had the good idea to make a CNC machine. There is a popular plan available here

https://www.v1engineering.com/specifications/

The really cute part of it is using electrical conduit as rails, which are shockingly inexpensive. Like a couple bucks for 4 feet! Holy shnikes!

We’ve been printing up a storm for the last couple weeks. A ton of parts!

We already had a lot of motors and stuff lying around. Declan bought a lot of stuff too just for this. Assorted bearings and bolts. The plans have a bill of materials.

Repetier host seemed to work pretty well for controlling the board

Used the RAMPS branch of the mpcnc marlin repo

Edited the header files as described in this post so that we could use both extruders as extra x and y motor drivers. It did not seem like driving two motors from the same driver board was acceptable. Our bearings are gripping the rails a little too tight. It is tough to move.

Some useful links on the thingiverse version of the mpccnc https://www.thingiverse.com/thing:724999

He suggests using this program http://www.estlcam.com/ Seem like windows only? ugh.

The mpcnc plans don’t contain actual tool mounts but here are some examples

A pen holder: https://www.thingiverse.com/thing:1612207/comments

A dewalt mount: https://www.thingiverse.com/thing:944952

This is an interesting web based g-code maker. Ultimately a little to janky though. It works good enough to get started http://jscut.org/jscut.html . Not entirely clear what pocket vs interior vs whatever is. engrave sort of seemed like what I wanted. Went into inkscape with a reasonable png and traced bitmapped it, then object to path. It’s also nice to just find an svg on the internet

The following code was needed to zero repetier and the RAMPS at the touch off point. We added it as a macro. It is doing some confusing behavior though.

1 2 |
G92 X0 Y0 Z0 @isathome |

pycam is the best I can find for 3d machining. Haven’t actually tried it yet

http://pycam.sourceforge.net/getting-started/

We should probably upgrade the thing to have limit switches. It pains me every time we slam it into the edge.

All in all, a very satisfying project. Hope we build something cool with it.

Just got back from Chile from a vacation visiting Will. Nice place.

We took a leisurely amount of time getting to Torres del Paine, which was the main feature of our trip. We travelled through Santiago, Punta Arenas and Puerto Natales. We spent a very tired day in the children’s science museum and rode the funicular. There wasn’t that much to do in the latter two cities, maybe we could have shaved some time from them. Our hostel in Punta Arenas was notably ramshackle. We spent 5 days backpacking in the park. Absolutely gorgeous. The wind on the second day was like nothing I’ve ever experienced. I was a little concerned about staying on my feet. Hiking poles for the win. Day 4 was cold and wet and miserable, but it ended up ok in the end. We were able to get a spot in a refugio when it just was too overwhelming to try to set up our tents in a flooded muddy campsite on that day. I think Beth in particular was at the end of her rope. I basically didn’t poop the entire first week I was there, but one glorious day on the mountain the heavens parted for me, and I was fine from then on. I didn’t quite pack food right. There ended up being camp stores at most of the places we stayed, but if I hadn’t been able to re up on cookies it would have been a lean couple days food wise. Ramen Bombs for the win. We drank right from the streams, which is unusual for us. Usually we filter.

All told we did ok on food. I really like the al pobre thing. What isn’t to like a about steak, onions, and eggs on fries? Chileans seem a little eager on the mayo. Nobody does breakfast right except the US. The street food was good. I love the fried tortilla thing that you can just slather salsa on. It was like 30 cents. The empanadas were also pretty great cheap food. Ceviche was also very tasty. They toss out avocado like it’s nuthin down there. Sandwiches were kind of shitty. Don’t know if that is entirely fair, but that is how it felt. Highlight meal of the trip was at Cafe Artimana in Puerto Natales. Yeah, I got some al pobre. But also basil lemonade stuff

After the hiking, we scheduled an early return to Santiago rather than busting our asses to a glacier viewpoint. In the airport at Punta Arenas, we got the southernmost dominos in the entire world. Ben Will and Declan went on a taxi quest to go get it. Wandered around Santiago, saw some churches and cathedrals, a fort, ate churros, etc. Declan was on a quest for a neck pillow. We did a prison themed Escape Room. People felt like we got a little cheated because some of the puzzles felt like bullshit? I think they really expect to break room records. I suck at escape rooms. We were able to spend a day in Valparaiso, which had a super awesome street art scene.

I spent the last day puking my guts out. So it goes. Not sure how exactly. The street sausage may have put me over the top. I guess I’m a sensitive fellow? I get pretty consistently unwell on trips.

Chile has tons of fluffy street dogs. They’re pretty friendly, although they do chase cars and motorcycles. Idiots.

Chile has a way lower english quotient than other trips I made. I’ve been surprised how common at least some english has been in Europe and Asia, and was now equally surprised how little there was in Chile. It makes sense. A lot of the continent is spanish speaking. It was really useful to have Will around, who has gotten shockingly good at Spanish from an outsiders perspective.

SUCH MEMORIES

Wait, where are all my BOBBY PICS!?!

o there u r u cutie

overwhelmed

I’m a guy who is somewhat familiar with Haskell who is trying to learn Rust. So I thought I’d try to replicate some cool Haskell functionality in Rust. I would love to hear comments, because I’m trying to learn. I have no sense of Rust aesthetics yet and in particular I have no idea how this interacts with the borrow system. What follows is a pretty rough brain dump

GADTs (Generalized algebraic data types) are an extension in Haskell that allows you to write constrained type signatures for your data constructors. They also change how the type checking of pattern matching is processed.

GADTs are sometimes described/faked by being built by making data types that hold equality/unification constraints. Equality constraints in Haskell like `a ~ Int`

are fairly magical and the Rust compiler does not support them in an obvious way. Maybe this is the next project. Figure out how to fake ’em if one can. I don’t think this is promising though, because faking them will be a little wonky, and then GADTs are a little wonky on top of that. See https://docs.rs/refl/0.1.2/refl/ So we’ll go another (related) road.

This is roughly what GADTs look like in Haskell.

1 2 3 |
data MyGadt a where TBool :: Bool -> MyGadt Bool TInt :: Int -> MyGadt Int |

And here is one style of encoding using smart constructors and a typeclass for elimination (pattern matching is replicated as a function that takes callbacks for the data held in the different cases). Regular functions can have a restricted type signature than the most general one their implementation implies. The reason to use a typeclass is so that we can write the eliminator as returning the same type that the GADT supplies. There isn’t an explicit equality constraint. A kind of Leibnitz equality

1 |
(forall f. f a -> f b) |

is hiding in the eliminator. The Leibnitz equality can be used in place of (~) constraints at some manual cost. http://code.slipthrough.net/2016/08/10/approximating-gadts-in-purescript/

https://jesper.sikanda.be/files/leibniz-equality.pdf

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
data MyGadt2 a = TBool2 Bool | TInt2 Int -- smart constructors. Hide the original constructors tInt :: Int -> MyGadt2 Int tInt = TInt tBool :: Bool -> MyGadt2 Bool tBool = TBool -- gadt eliminator class MyGadtElim a where elimMyGadt :: forall f. MyGadt2 a -> (Bool -> f Bool) -> (Int -> f Int) -> f a instance MyGadtElim Int where elimMyGadt (TInt2 x) fb fi = fi x elimMyGadt _ _ _ = error "How did TBool2 get type MyGadt2 Int?" instance MyGadtElim Bool where elimMyGadt (TBool2 x) fb fi = fb x |

The

1 |
forall f :: * -> * |

is a problem for Rust. Rust does not have higher kinded types, although they can be faked to some degree. https://gist.github.com/CMCDragonkai/a5638f50c87d49f815b8 There are murmurs of Associated Type Constructors / GATs , whatever those are , that help ease the pain, but I’m pretty sure they are not implemented anywhere yet.

I’m going to do something related, a defunctionalization of the higher kinded types. We make an application trait, that will apply the given type function tag to the argument. What I’m doing is very similar to what happens in the singletons library, so we may be getting some things for free.

https://typesandkinds.wordpress.com/2013/04/01/defunctionalization-for-the-win/

1 |
trait App1<A> { type T;} |

Then in order to define a new typelevel function rip out a quick tag type and an App impl.

1 2 3 4 |
struct F1 {} impl<A> App1<A> for F1{ type T = Vec<Vec<A>>; } |

It might be possible to sugar this up with a macro. It may also be possible to write typelevel functions in a point free style without defining new function tag names. The combinators Id, Comp, Par, Fst, Snd, Dup, Const are all reasonably definable and fairly clear for small functions. Also the combinator S if you want to talk SKI combinatory calculus, which is unfit for humans. https://en.wikipedia.org/wiki/SKI_combinator_calculus For currying, I used a number for how many arguments are left to be applied (I’m not sure I’ve been entirely consistent with these numbers). You need to do currying quite manually. It may be better to work with tuplized arguments

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 |
struct Vec1 {} // number is number of aspplications left. impl<A> App1<A> for Vec1{ type T = Vec<A>; } struct Id(); impl<A> App1<A> for Id{ type T = A; } // partially applied const. // struct Const() // call this next one const1 struct Const1(); struct Const<B>(PhantomData<B>); // should be Const1 impl<A> App1<A> for Const1 { // partial application type T = Const<A>; } impl<A,B> App1<A> for Const<B>{ type T = B; } struct Fst {} impl <A,B> App1<(A,B)> for Fst { type T = A; } struct Snd {} impl <A,B> App1<(A,B)> for Snd { type T = B; } struct Dup{} impl <A> App1<A> for Dup { type T = (A,A); } struct Par2 {} struct Par1<A> (PhantomData<A>); struct Par<A,B> (PhantomData<A> , PhantomData<B> ); impl<F> App1<F> for Par2 { type T = Par1<F>; } impl<F,G> App1<G> for Par1<F> { type T = Par<F,G>; } impl<X,Y,F,G> App1<(X,Y)> for Par<F,G> where F : App1<X>, G : App1<Y> { type T = (F::T, G::T); } // In order to curry, i think I'd have to define a name for every curried form. // combinator calculus Const is K, Id is I, and here is S combinator. Yikes. type I = Id; type K<A> = Const<A>; struct S3{} struct S2<A>(PhantomData<A>); struct S1<A,B>(PhantomData<A>, PhantomData<B>); // struct S<A,B,C>(PhantomData<A>, PhantomData<B>, PhantomData<C>); impl <A,B,C> App1<C> for S1<A,B> where A : App1<C>, B : App1<C>, <A as App1<C>>::T : App1< <B as App1<C>>::T > { type T = < <A as App1<C>>::T as App1< <B as App1<C>>::T > >::T; } struct Comp2(); struct Comp1<F> (PhantomData<F>); struct Comp<F,G> (PhantomData<F>, PhantomData<G>); impl<F,G, X> App1<X> for Comp<F,G> where G : App1<X>, F : App1<<G as App1<X>>::T> { type T = <F as App1<<G as App1<X>>::T>>::T; } |

Anyway, the following is a translation of the above Haskell (well, I didn’t wrap an actual i64 or bool in there but I could have *I think*). You need to hide the actual constructors labeled INTERNAL in a user inaccessible module.

1 2 3 4 5 6 7 8 9 10 11 12 13 |
enum Gadt<A> { TIntINTERNAL(PhantomData<A>), TBoolINTERNAL(PhantomData<A>) } // then build the specialzied constructors that fn TBool() -> Gadt<bool>{ Gadt::TBoolINTERNAL(PhantomData) } fn TInt() -> Gadt<i64>{ Gadt::TIntINTERNAL(PhantomData) } |

The smart constructors put the right type in the parameter spot

Then pattern matching is a custom trait per gadtified type. Is it possible to unify the different elimination traits that will come up into a single Elim trait? I’m 50-50 about whether this is possible. What we’re doing is a kind of fancy `map_or_else`

if that helps you.

https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or_else

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
trait GadtElim { type Inner; fn gadtElim<F>(&self, case1 : <F as App1<bool>>::T , case2 : <F as App1<i64>>::T ) -> <F as App1<Self::Inner>>::T where F : App1<bool>, F : App1<i64>, F : App1<Self::Inner>; } impl GadtElim for Gadt<i64> { type Inner = i64; fn gadtElim<F>(&self, case1 : <F as App1<bool>>::T , case2 : <F as App1<i64>>::T ) -> <F as App1<Self::Inner>>::T where F : App1<bool>, F : App1<i64>, F : App1<Self::Inner>{ match self{ Gadt::TIntINTERNAL(PhantomData) => case2, Gadt::TBoolINTERNAL(PhantomData) => panic!("Somehow TBool has type Gadt<i64>")// Will never be reached though. god willing } } } impl GadtElim for Gadt<bool> { type Inner = bool; fn gadtElim<F>(&self, case1 : <F as App1<bool>>::T , case2 : <F as App1<i64>>::T ) -> <F as App1<Self::Inner>>::T where F : App1<bool>, F : App1<i64>, F : App1<Self::Inner>{ match self{ Gadt::TIntINTERNAL(PhantomData) => panic!("Somehow TInt has type Gadt<bool>"), Gadt::TBoolINTERNAL(PhantomData) => case1 // Will never be reached though. god willing } } } |

Usage. You have to explicitly pass the return type function to the eliminator. No inference is done for you. It’s like Coq’s match but worse. BTW the `dbg!`

macro is the greatest thing on earth. Well done, Rust.

1 2 3 4 5 6 7 8 |
let z = TInt().gadtElim::<Id>(true , 34); let z2 = TBool().gadtElim::<Id>(true , 34); dbg!(z); dbg!(z2); struct F7 {} // You need to do this. Kind of sucks. macroify? App!(F<A> = Vec<A>) impl<A> App1<A> for F7 { type T = Vec<A>; } dbg!(TInt().gadtElim::<F7>(vec!(true,false) , vec!(34,45,4,3,46))); |

You can make helpers that don’t require explicit types to be given

1 2 3 |
fn gadtRec<A>(x : impl GadtElim, case1 : A, case2 : A) -> A { x.gadtElim::<Const<A>>(case1 , case2) } |

One could also make an `Eq a b`

type with Refl similarly. Then we need typelevel function tags that take two type parameter. Which, with currying or tupling, we may already have.

Questions:

Is this even good? Or is it a road of nightmares? Is this even emulating GADTs or am I just playing phantom type games?

We aren’t at full gadt. We don’t have existential types. Rust has some kind of existential story evolving (already there?), but the state of it is confusing to me. Something to play with. Higher rank functions would help?

Are overlapping typeclasses a problem in Rust?

Again, I have given nearly zero thought to borrowing and how it interacts with this. I’m a Rust n00b. I should think about it. Different eliminators based on whether you own or are borrowing?.

How much of singleton style dependent types do we get from this? It feels like we have already paid the cost of defunctionalizing. http://hackage.haskell.org/package/singletons

My current playground for this is at https://github.com/philzook58/typo

Edit: Rust is an ML with affine types and a syntax facelift roughly. So Ocaml is a good place to pump. Oleg Kiselyov had a fascinating approach that sort of smuggles through an Option type in an equality type using mutation. I wonder how well that would mesh with Rust. It seems obviously not thread safe unless you can make the mutation and matching atomic.

http://okmij.org/ftp/ML/GADT.txt

okmij.org/ftp/ML/first-class-modules/

https://blog.janestreet.com/more-expressive-gadt-encodings-via-first-class-modules/

Networkx outputs scipy sparse incidence matrices

https://docs.scipy.org/doc/scipy/reference/sparse.html

Networkx also has it’s own flow solvers, but cvxpy gives you some interesting flexibility, like turning the problem mixed integer, quadratic terms, and other goodies. Plus it is very easy to get going as you’ll see.

So here’s a basic example of putting these two together. Very straightforward and cool.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
import networkx as nx import cvxpy as cvx import matplotlib.pyplot as plt import numpy as np from scipy.sparse import lil_matrix #graph is an networkx graph from somewhere #print(edgedict) nEdges = len(graph.edges) nNodes = len(graph.nodes) posflow = cvx.Variable(nEdges) negflow = cvx.Variable(nEdges) # split flow into positive and negative parts so we can talk about absolute value. # Perhaps I should let cvxpy do it for me constraints = [ 0 <= posflow, 0 <= negflow ] absflow = posflow + negflow flow = posflow - negflow L = nx.incidence_matrix(graph, oriented=True ) source = np.zeros(nNodes) #lil_matrix(n_nodes) # just some random source placement. source[7] = 1 source[25] = -1 # cvxpy needs sparse matrices wrapped. Lcvx = cvx.Constant(L) #sourcecvx = cvx.Constant(source) # flow conservation constraints.append(Lcvx*flow == source) # can put other funky inequality constraints on things. objective = cvx.Minimize(cvx.sum(absflow)) print("building problem") prob = cvx.Problem(objective, constraints) print("starting solve") prob.solve(solver=cvx.OSQP, verbose = True) #or try cvx.CBC, cvx.CVXOPT, cvx.GLPK, others np.set_printoptions(threshold=np.inf) print(absflow.value) |

Here was a cool visual from a multi commodity flow problem (nx.draw_networkx_edges)

Nice, huh.

Publishing these draft notes because it has some useful info in it and trying to reboot the project. It’s very ambitious. We’ll see where we get with it.

https://github.com/philzook58/nand2coq

Old Stuff (Last Edited 6/23/18):

So my friends and I are starting the nand2tetris course. I feel like I have some amount of familiarity with the topics involved, so I’d like to put it into challenge mode for me.

Week 1 is about basic combinatorial logic gate constructions and sort of the ideas of an HDL.

I was trying to keep up in Verilog and failing. The verilog syntax is a little bit more verbose.

Still not so bad.

The easiest thing to use was assign statements. The difference between = and <= in verilog is still a little opaque to me

I compiled them and ran them using Icarus verilog (iverilog and the vpp the output file).

I started using MyHDL but I’m not sure I saw why it was going to be easier? But the MyHdl docs did help me understand a bit why verilog is the way it is.

Here is a big list of interesting projects:

MyHDL – A python hardware description language. Can output VHDL and Verilog. based around python generators and some decorators.

Icarus Verilog – http://iverilog.wikia.com/wiki/Main_Page. iverilog Compiles verilog into a assembly format which can be run with vvp command.

example

1 2 3 |
iverilog myor.v not.v vpp a.out |

Verilator – Compiles Verilog to C for simulation

GTKWave – A Waveform viewer

IceStick – A cheap 20$ ish fpga usb board that can be programmed easily

IceStorm http://www.clifford.at/icestorm/ – An OpenSource toolchain for compiling to and programming ice40 fpga chips

IceStudio – a graphical block editor. Last I checked it was still a little clunky

EdaPlayground – online web app for writing code and giving to simulators

Formal tools:

yosys-smtbmc

symbiyosys

http://www.clifford.at/papers/2016/yosys-smtbmc/

http://zipcpu.com/blog/2017/10/19/formal-intro.html

icestick floorplan – https://knielsen.github.io/ice40_viewer/ice40_viewer.html

ZipCPU

open source fpga twitter https://twitter.com/ico_tc?lang=en

https://opencores.org/

Learning Verilog for FPGAs: The Tools and Building an Adder

Upduino – interesting set of boards. Cheap.

http://gnarlygrey.atspace.cc/development-platform.html#upduino

Questionable: Clash?

installing icestick on the mac

https://github.com/ddm/icetools

https://github.com/Homebrew/homebrew-core/issues/9229

Had to pip uninstall enum34. Weird.

Verilog

Start with module statement

end lines with semicolons.

You need to name instantiated elements

yosys -p “synth_ice40 -top not1 -blif not.blif” not.v

https://mcmayer.net/first-steps-with-the-icestorm-toolchain/

../icetools/arachne-pnr/bin/arachne-pnr -d 1k -P tq144 -o not.asc -p not.pcf not.blif

../icetools/icestorm/icepack/icepack not.asc not.bin

iceprog not.bin

1 |
sudo kextunload -b com.FTDI.driver.FTDIUSBSerialDriver |

The ftdi isn’t working

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
module alu( input [15:0] x , input [15:0] y , output [15:0] out , input zx // zero x , input zy // zero y , input nx // negate result on x , input ny // """ , input f // Plus is 1 or and if 0 , input no // negate result? , output zr // is it exactly zero , output ng // is out < 0 ); wire [15:0] zerox; wire [15:0] zeroy; wire [15:0] notx; wire [15:0] noty; wire [15:0] andplus; assign zerox = zx ? 0 : x; assign notx = nx ? ~zerox : zerox; assign zeroy = zy ? 0 : y; assign noty = ny ? ~zeroy : zeroy; assign andplus = f ? x + y : x & y; assign out = no ? ~andplus : andplus; assign zr = out == 0; assign ng = out[15] == 1; // check sign bit endmodule |

So I spent thanksgiving doing this and playing Zelda. Even though that sounds like a hell of a day, seems a little sad for thanksgiving :(. I should probably make more of an effort to go home next year.

I tried implementing a more traditional q-learning pipeline using cvxpy (rather than the inequality trick of the last time). Couldn’t get it to work as well. And it’s still kind of slow despite a lot of rearrangement to vectorize operations (through batching basically).

I guess I’m still entranced with the idea of avoiding neural networks. In a sense, that is the old boring way of doing things. The Deep RL is the new stuff. Using ordinary function approximators is way older I think. But I feel like it takes a problem out of the equation (dealing with training neural nets). Also I like modeling languages/libraries.

I kept finding show stopping bugs throughout the day (incorrectly written maxaction functions, typos, cross episode data points, etc.), so I wouldn’t be surprised if there is one still in here. It’s very surprising how one can convince oneself that it is kind of working when it is actually impossible it’s working. All these environments are so simple, that I suspect I could randomly sample controllers out of a sack for the time I’ve been fiddling with this stuff and find a good one.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 |
import gym import numpy as np import cvxpy as cvx from numpy.polynomial.chebyshev import chebval env = gym.make('Pendulum-v0') print(env.action_space) print(env.observation_space) print(env.observation_space.high) print(env.observation_space.low) # [1,1,8] print(env.action_space.high) # -2 print(env.action_space.low) chebdeg = 2 alpha = cvx.Variable(3*chebdeg**3) alpha.value = np.zeros(3*chebdeg**3) gamma = 1. - 1./50 def basis(s,a): n = np.arange(4) f1 = chebval(s[0]/1.,np.eye(chebdeg)).reshape(-1,1,1,1) f2 = chebval(s[1]/1.,np.eye(chebdeg)).reshape(1,-1,1,1) f3 = chebval(s[2]/8.,np.eye(chebdeg)).reshape(1,1,-1,1) f4 = ((a/2)**np.arange(3)).reshape(1,1,1,-1) return f1*f2*f3*f4 def vbasis(s,a): #n = np.arange(4) N = s.shape[0] #print(chebval(s[:,0]/1.,np.eye(chebdeg)).shape) f1 = chebval(s[:,0]/1.,np.eye(chebdeg)).T.reshape(N,-1,1,1,1) f2 = chebval(s[:,1]/1.,np.eye(chebdeg)).T.reshape(N,1,-1,1,1) f3 = chebval(s[:,2]/8.,np.eye(chebdeg)).T.reshape(N,1,1,-1,1) if type(a) == np.ndarray: f4 = (a.reshape(-1,1,1,1,1)/2)**(np.arange(3).reshape(1,1,1,1,-1)) else: f4 = (a/2)**(np.arange(3).reshape(1,1,1,1,-1)) return (f1*f2*f3*f4).reshape(N,-1) def qmaxaction(alpha,s): #n = np.arange(4) N = s.shape[0] #print(chebval(s[:,0]/1.,np.eye(chebdeg)).shape) f1 = chebval(s[:,0]/1.,np.eye(chebdeg)).T.reshape(N,-1,1,1,1) f2 = chebval(s[:,1]/1.,np.eye(chebdeg)).T.reshape(N,1,-1,1,1) f3 = chebval(s[:,2]/8.,np.eye(chebdeg)).T.reshape(N,1,1,-1,1) z = f1*f2*f3 a = (np.array([0,0,0.25]).reshape(1,1,1,1,-1)*z).reshape(N,-1)@alpha.value b = (np.array([0,0.5,0]).reshape(1,1,1,1,-1)*z).reshape(N,-1)@alpha.value acts = np.clip(-b/2/(a+1e-7), -2,2) q2 = vbasis(s,2)@alpha.value print(acts) qa = vbasis(s,acts)@alpha.value qn2 = vbasis(s,-2)@alpha.value return np.maximum(qa,np.maximum(qn2,q2)) def evalb(alpha, s,a): f = basis(s,a) return alpha*f.flatten() def maxaction(alpha, obs): f1 = np.sum(evalb(alpha.value, observation, 2)) f2 = np.sum(evalb(alpha.value, observation, 0)) f3 = np.sum(evalb(alpha.value, observation, -2)) coeff = np.polyfit([2,0,-2], [f1,f2,f3], deg=2) if coeff[0] < 0: action = -coeff[1]/2/coeff[0] action = min(max(action,-2),2) elif f1 > f3: #print("huh") action = 2 else: #print("how bout dat") action = -2 return np.array([action]) constraints = [] objective = 0 for x in range(4): constraints = [] epobs = [] eprewards = [] epactions = [] objective = 0 epsilon = 1.2/(x+1) print("epsilon: ", epsilon) epNum = 50 for i_episode in range(epNum): observations = [] rewards = [] actions = [] observation = env.reset() reward = -100 for t in range(100): prev_obs = observation prev_reward = reward if np.random.rand() < epsilon: action = env.action_space.sample() else: action = maxaction(alpha, observation) observation, reward, done, info = env.step(action) observations.append(observation) rewards.append(reward) actions.append(action) #constraints.append(evalb(alpha, prev_obs, action) >= prev_reward + gamma*evalb(alpha, observation, action + np.random.randn()*0.2))#maxaction(alpha, observation))) #objective += evalb(alpha, observation, env.action_space.sample()) if done: print("Episode finished after {} timesteps".format(t+1)) break epobs.append(observations) epactions.append(actions) eprewards.append(rewards) for qiter in range(20): print("Q iteration: ", qiter) objective = 0 for (observations,actions, rewards) in zip(epobs,epactions,eprewards): obs = np.array(observations) act = np.array(actions) bprev = vbasis(obs[:-1],act[1:]) # np.array([basis(s,a).flatten() for (s,a) in zip(observations[:-1],actions[1:])]) #bnext = np.array([basis(s,a+np.random.randn()*0.2).flatten() for (s,a) in zip(observations[1:],actions[1:])]) #obs = np.array(observations)[:-1] #q2 = np.array([basis(s,2).flatten() for s in observations[:-1]])@alpha.value #q2 = vbasis(obs[1:],2)@alpha.value #q0 = vbasis(obs[1:],0)@alpha.value #qn2 = vbasis(obs[1:],-2)@alpha.value #q1 = vbasis(obs[1:],1)@alpha.value #qn1 = vbasis(obs[1:],-1)@alpha.value #qs = [] #for a in np.linspace(-2,2,5): # qs.append(vbasis(obs[1:],a+np.random.randn()*0.5)@alpha.value) #for a in np.linspace(-0.1,0.1,3): # qs.append(vbasis(obs[1:],a)@alpha.value) #qmax = np.max(np.stack([q2,q0,qn2,q1,qn1],axis=1),axis=1) #qmax = np.max(np.stack(qs,axis=1),axis=1) qmax = qmaxaction(alpha, obs[1:]) #q0 = np.array([basis(s,0).flatten() for s in observations[:-1]])@alpha.value #qn2 = np.array([basis(s,-2).flatten() for s in observations[:-1]])@alpha.value #qmax = np.maximum(np.maximum(q0,q2),qn2)#np.array([np.dot(basis(s, maxaction(alpha,s)).flatten(),alpha.value) for s in observations[1:]]) rewards = np.array(rewards)[:-1] objective += cvx.sum(cvx.huber(bprev@alpha - (rewards + gamma*qmax))) #cvx.sum_squares(bprev@alpha - (rewards + gamma*qmax)) #bcost = np.array([basis(s,env.action_space.sample()).flatten() for s in observations]) #objective = cvx.sum(bcost@alpha) # np.sum(bnext,axis=0) * alpha loss = 0.001*cvx.sum_squares(alpha-alpha.value) + objective prob = cvx.Problem(cvx.Minimize(loss), constraints) # The optimal objective value is returned by `prob.solve()`. print("solving problem") result = prob.solve(verbose=True) print(result) # The optimal value for x is stored in `x.value`. print(alpha.value) #inspection loop for i_episode in range(4): observation = env.reset() #env.env.state[0] = np.random.randn()*sigma for t in range(200): env.render() #print(observation) prev_obs = observation action = maxaction(alpha, observation) #print(action) observation, reward, done, info = env.step(action) if done: print("Episode finished after {} timesteps".format(t+1)) break |

I also did the easy cartpole environment using the inequality trick. Seems to work pretty well.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 |
import gym import numpy as np import cvxpy as cvx from numpy.polynomial.chebyshev import chebval env = gym.make('CartPole-v1') print(env.action_space) print(env.observation_space) print(env.observation_space.high) print(env.observation_space.low) # [1,1,8] #print(env.action_space.high) # -2 #print(env.action_space.low) chebdeg = 3 alpha = cvx.Variable(2*chebdeg**4) alpha.value = np.zeros(2*chebdeg**4) gamma = 1. - 1./25 def basis(s,a): n = np.arange(4) f1 = chebval(s[0]/4.,np.eye(chebdeg)).reshape(-1,1,1,1,1) f2 = chebval(s[1]/10.,np.eye(chebdeg)).reshape(1,-1,1,1,1) f3 = chebval(s[2]/0.4,np.eye(chebdeg)).reshape(1,1,-1,1,1) f4 = chebval(s[3]/10.,np.eye(chebdeg)).reshape(1,1,1,-1,1) f5 = ((a/2)**np.arange(2)).reshape(1,1,1,1,-1) return f1*f2*f3*f4*f5 def evalb(alpha, s,a): f = basis(s,a) return alpha*f.flatten() def maxaction(alpha, obs): f1 = np.sum(evalb(alpha.value, observation, 0)) f2 = np.sum(evalb(alpha.value, observation, 1)) if f1 > f2: action = 0 else: action = 1 return action constraints = [] objective = 0 for x in range(4): constraints = [] objective = 0 epsilon = 1.2/(x+1) print("epsilon: ", epsilon) for i_episode in range(200): observation = env.reset() observations = [] rewards = [] actions = [] reward = -100 for t in range(50): prev_obs = observation prev_reward = reward if np.random.rand() < epsilon: action = env.action_space.sample() else: action = maxaction(alpha, observation) observation, reward, done, info = env.step(action) observations.append(observation) rewards.append(reward) actions.append(action) #constraints.append(evalb(alpha, prev_obs, action) >= prev_reward + gamma*evalb(alpha, observation, action + np.random.randn()*0.2))#maxaction(alpha, observation))) #objective += evalb(alpha, observation, env.action_space.sample()) if done: print("Episode finished after {} timesteps".format(t+1)) break #print(rewards) bprev = np.array([basis(s,a).flatten() for (s,a) in zip(observations[:-1],actions[1:])]) bnext = np.array([basis(s,env.action_space.sample()).flatten() for (s,a) in zip(observations[1:],actions[1:])]) rewards = np.array(rewards)[:-1] constraints += [(bprev@alpha) >= rewards + gamma*(bnext)@alpha] bcost = np.array([basis(s,env.action_space.sample()).flatten() for s in observations]) objective += cvx.sum(bcost@alpha) # np.sum(bnext,axis=0) * alpha loss = 1*cvx.sum(cvx.abs(alpha-alpha.value)) + objective prob = cvx.Problem(cvx.Minimize(loss), constraints) print("solving problem") result = prob.solve(verbose=True) print(result) print(alpha.value) #inspection loop for i_episode in range(4): observation = env.reset() #env.env.state[0] = np.random.randn()*sigma for t in range(200): env.render() #print(observation) prev_obs = observation action = maxaction(alpha, observation) #print(action) observation, reward, done, info = env.step(action) if done: print("Episode finished after {} timesteps".format(t+1)) break |

I also have some Work in Progress on getting full swingup cartpole. Currently is not really working. Seems to kind of be pumping about right? The continuous force control easy cartpole does work though.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 |
import gym import numpy as np import cvxpy as cvx from numpy.polynomial.chebyshev import chebval env = gym.make('CartPole-v1') print(env.action_space) print(env.observation_space) print(env.observation_space.high) print(env.observation_space.low) # [1,1,8] #print(env.action_space.high) # -2 #print(env.action_space.low) chebdeg = 2 varNum = 2*2*4*2*2*3 alpha = cvx.Variable(varNum) alpha.value = np.zeros(varNum) gamma = 1. - 1./150 def basis(s,a): n = np.arange(4) f1 = chebval(s[0]/4.,np.eye(2) ).reshape(-1,1,1,1,1,1) f2 = chebval(s[1]/10.,np.eye(2)).reshape(1,-1,1,1,1,1) f3 = chebval(s[2]/1,np.eye(4) ).reshape(1,1,-1,1,1,1) f4 = chebval(s[3]/1,np.eye(2) ).reshape(1,1,1,-1,1,1) f5 = chebval(s[4]/10.,np.eye(2)).reshape(1,1,1,1,-1,1) f6 = ((a/10)**np.arange(3)).reshape(1,1,1,1,1,-1) return f1*f2*f3*f4*f5*f6 def evalb(alpha, s,a): f = basis(s,a) return alpha*f.flatten() ''' def maxaction(alpha, obs): f1 = np.sum(evalb(alpha.value, observation, 0)) f2 = np.sum(evalb(alpha.value, observation, 1)) if f1 > f2: action = 0 else: action = 1 return action ''' def maxaction(alpha, obs): f1 = np.sum(evalb(alpha.value, observation, 10)) f2 = np.sum(evalb(alpha.value, observation, 0)) f3 = np.sum(evalb(alpha.value, observation, -10)) coeff = np.polyfit([10,0,-10], [f1,f2,f3], deg=2) if coeff[0] < 0: action = -coeff[1]/2/coeff[0] action = min(max(action,-10),10) elif f1 > f3: action = 10 else: action = -10 return np.array([action]) constraints = [] objective = 0 for x in range(4): constraints = [] objective = 0 epsilon = 1.2/(x+1) print("epsilon: ", epsilon) for i_episode in range(100): observation = env.reset() observations = [] rewards = [] actions = [] reward = -100 env.env.state[2] = np.random.rand()*2*np.pi for t in range(100): prev_obs = observation prev_reward = reward if np.random.rand() < epsilon or len(observation) < 5: action = np.random.rand()*20-10 else: action = maxaction(alpha, observation) a = action env.env.force_mag = abs(a) #min(100, abs(a)) #print(a) if a < 0: daction = 0 else: daction = 1 observation, reward, done, info = env.step(daction) o = observation observation = np.array([o[0],o[1],np.cos(o[2]),np.sin(o[2]),o[3]]) observations.append(observation) bonus = 0 if observation[2] > 0.9: bonus = 1 rewards.append( (bonus + observation[2] + 1)/2)#- observation[0]**2) #reward #print(rewards[-1]) actions.append(action) #constraints.append(evalb(alpha, prev_obs, action) >= prev_reward + gamma*evalb(alpha, observation, action + np.random.randn()*0.2))#maxaction(alpha, observation))) #objective += evalb(alpha, observation, env.action_space.sample()) x = observation[0] if x < -4 or x > 4: break #if done: # print("Episode finished after {} timesteps".format(t+1)) # break #print(rewards) bprev = np.array([basis(s,a).flatten() for (s,a) in zip(observations[:-1],actions[1:])]) bnext = np.array([basis(s,a+np.random.randn()*0.3).flatten() for (s,a) in zip(observations[1:],actions[1:])]) rewards = np.array(rewards)[:-1] #print(bprev.shape) #print(bnext.shape) constraints += [(bprev@alpha) >= rewards + gamma*(bnext)@alpha] bcost = np.array([basis(s,a+np.random.randn()*0.3).flatten() for s in observations]) objective += cvx.sum(bcost@alpha) # np.sum(bnext,axis=0) * alpha loss = 1*cvx.sum(cvx.abs(alpha-alpha.value)) + objective prob = cvx.Problem(cvx.Minimize(loss), constraints) print("solving problem") result = prob.solve(verbose=True) print(result) print(alpha.value) #inspection loop for i_episode in range(4): observation = env.reset() env.env.state[2] = np.random.rand()*2*np.pi #env.env.state[0] = np.random.randn()*sigma o = observation observation = np.array([o[0],o[1],np.cos(o[2]),np.sin(o[2]),o[3]]) for t in range(200): env.render() #print(observation) prev_obs = observation o = observation observation = np.array([o[0],o[1],np.cos(o[2]),np.sin(o[2]),o[3]]) action = maxaction(alpha, observation) a = action env.env.force_mag = abs(a) #min(100, abs(a)) #print(a) if a < 0: daction = 0 else: daction = 1 print(action) observation, reward, done, info = env.step(daction) if x < -4 or x > 4: break |

Now I feel that a thing that matters quite a bit is what is your choice of action for the next time step. Hypothetically you want a ton of samples here. I now think that using an action that is just slightly perturbed from the actual action works well because the actual action is tending to become roughly the optimal one. Subsequent time steps have roughly the same data in them.

One advantage of discrete action space is that you can really search it all.

Does that mean I should seriously investigate the sum of squares form? A semidefinite variable per data point sounds bad. I feel like I’d have to seriously limit the amount of data I’m using. Maybe I’ll be pleasantly surprised.

I haven’t even gotten to playing with different polynomials yet. The current implementation is exponentially sized in the number of variables. But in kind of a silly way. I think it would be better to use all terms of a bounded total degree.

http://web.mit.edu/~pucci/www/discountedLP.pdf

http://underactuated.mit.edu/underactuated.html?chapter=dp

There is a fun idea of using Linear Programming to do dynamic programming I originally saw in the underactuated robotics textbook.

In my experience reinforcement learning is finicky and depressing. It usually doesn’t work and is very hard to troubleshoot. Do you just need to run it for 10 minutes? 10 years? Is there a bug? God knows. I end up wriggling hyperparameters and praying a lot.

One part of this is the relative finickiness of neural network optimization compared to the technology of convex optimization. Convex optimization solvers are quite reliable and fast.

There is a way of phrasing Q learning as a linear programming problem

The linear programming approach relaxes the Bellman equations.

to

We can approach this forall in a couple ways, one of which is just sampling actions somehow. To make the constraint tight in places you minimize a weighting of Q

If Q is written as a linear combination of basis functions

The all of this put together is a linear program in the variables .

For ease, I used cvxpy. I don’t even store my state action pairs, which is quite lazy of me. Even here, compiling the linear program via cvxpy is kind of slow. This preprocessing step takes longer than the actual solve does. You could avoid cvxpy and directly interface a linear programming solver much faster, if that is your thing.

The whole process is still model free. I didn’t plug in pendulum dynamics anywhere. I run openAI gym and use the resulting state-action-state tuples to add inequalities to my cvxpy model. I weight where I want the inequalities to be tightest by using the actual states experienced.

Unfortunately, it still took a couple hours of hyper parameter tuning and fiddling to get the thing to work. So not a grand success on that point.

I made a lot of guesswork for what seemed reasonable

I parametrized the dependence of Q on `a`

by a quadratic so that it is easy to maximize analytically. That is what the polyfit stuff is about. Maximum of is at . I really should be checking the sign of the a coefficient. I am just assuming it is positive. Naughty boy.

m assuming that it

Chebyshev polynomials are probably good.

It seemed to help to use a slight perturbation of the actual action used on the right hand side of the Bellman inequality. My reasoning here is that the pendulum is actually a continuous system, so we should be using the differential Bellman equation really.

Should I allow for some kind of slack in the equations? Getting a bad reward or data point or one weird unusual state could ruin things for everyone. Inequalities are unforgiving.

Gamma seemed to matter a decent amount

The regularization of alpha seemed largely irrelevant.

Epsilon greediness seems to not matter much either.

Future ideas:

Might be good to replace the sampling of a with a Sum of Squares condition over the variable a.

Should I damp the update in some way? Add a cost the changing alpha from it’s previous value. A kind of damped update / using a prior.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 |
import gym import numpy as np import cvxpy as cvx from numpy.polynomial.chebyshev import chebval env = gym.make('Pendulum-v0') print(env.action_space) print(env.observation_space) print(env.observation_space.high) print(env.observation_space.low) # [1,1,8] print(env.action_space.high) # -2 print(env.action_space.low) chebdeg = 4 alpha = cvx.Variable(3*chebdeg**3) gamma = 1. - 1./100 def basis(s,a): n = np.arange(4) f1 = chebval(s[0]/1.,np.eye(chebdeg)).reshape(-1,1,1,1) f2 = chebval(s[1]/1.,np.eye(chebdeg)).reshape(1,-1,1,1) f3 = chebval(s[2]/8.,np.eye(chebdeg)).reshape(1,1,-1,1) f4 = ((a/2)**np.arange(3)).reshape(1,1,1,-1) return f1*f2*f3*f4 def evalb(alpha, s,a): f = basis(s,a) return alpha*f.flatten() def maxaction(alpha, obs): f1 = np.sum(evalb(alpha.value, observation, 2)) f2 = np.sum(evalb(alpha.value, observation, 0)) f3 = np.sum(evalb(alpha.value, observation, -2)) coeff = np.polyfit([2,0,-2], [f1,f2,f3], deg=2) action = -coeff[1]/2/coeff[0] action = min(max(action,-2),2) return np.array([action]) constraints = [] objective = 0 for x in range(4): constraints = [] objective = 0 epsilon = 1.2/(x+1) print("epsilon: ", epsilon) for i_episode in range(50): observation = env.reset() reward = -100 for t in range(100): prev_obs = observation prev_reward = reward if np.random.rand() < epsilon: action = env.action_space.sample() else: action = maxaction(alpha, observation) observation, reward, done, info = env.step(action) constraints.append(evalb(alpha, prev_obs, action) >= prev_reward + gamma*evalb(alpha, observation, action + np.random.randn()*0.2))#maxaction(alpha, observation))) objective += evalb(alpha, observation, env.action_space.sample()) if done: print("Episode finished after {} timesteps".format(t+1)) break loss = 0.1*cvx.sum(cvx.abs(alpha)) + objective prob = cvx.Problem(cvx.Minimize(loss), constraints) # The optimal objective value is returned by `prob.solve()`. print("solving problem") result = prob.solve() print(result) # The optimal value for x is stored in `x.value`. print(alpha.value) #inspection loop for i_episode in range(4): observation = env.reset() #env.env.state[0] = np.random.randn()*sigma for t in range(200): env.render() #print(observation) prev_obs = observation action = maxaction(alpha, observation) #print(action) observation, reward, done, info = env.step(action) if done: print("Episode finished after {} timesteps".format(t+1)) break |

Edit:

A improved version. Fixed the bug in my maxaction function. I shouldn’t have been assuming that it was always concave down.

Also vectorized slightly. Fairly significantly improves the solve time. Not much time is spent in cvxpy, now the solve is dominated by about 3 legitimate seconds in OSQP.

You can flip stuff in and out of loops to try different versions. This method is off-policy, so I could keep data around forever. However, it mostly just slowed the solve time.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 |
import gym import numpy as np import cvxpy as cvx from numpy.polynomial.chebyshev import chebval env = gym.make('Pendulum-v0') print(env.action_space) print(env.observation_space) print(env.observation_space.high) print(env.observation_space.low) # [1,1,8] print(env.action_space.high) # -2 print(env.action_space.low) chebdeg = 3 alpha = cvx.Variable(3*chebdeg**3) alpha.value = np.zeros(3*chebdeg**3) gamma = 1. - 1./100 def basis(s,a): n = np.arange(4) f1 = chebval(s[0]/1.,np.eye(chebdeg)).reshape(-1,1,1,1) f2 = chebval(s[1]/1.,np.eye(chebdeg)).reshape(1,-1,1,1) f3 = chebval(s[2]/8.,np.eye(chebdeg)).reshape(1,1,-1,1) f4 = ((a/2)**np.arange(3)).reshape(1,1,1,-1) return f1*f2*f3*f4 def evalb(alpha, s,a): f = basis(s,a) return alpha*f.flatten() def maxaction(alpha, obs): f1 = np.sum(evalb(alpha.value, observation, 2)) f2 = np.sum(evalb(alpha.value, observation, 0)) f3 = np.sum(evalb(alpha.value, observation, -2)) coeff = np.polyfit([2,0,-2], [f1,f2,f3], deg=2) if coeff[0] < 0: action = -coeff[1]/2/coeff[0] action = min(max(action,-2),2) elif f1 > f3: action = 2 else: action = -2 return np.array([action]) constraints = [] objective = 0 for x in range(4): constraints = [] objective = 0 epsilon = 1.2/(x+1) print("epsilon: ", epsilon) for i_episode in range(50): observation = env.reset() observations = [] rewards = [] actions = [] reward = -100 for t in range(100): prev_obs = observation prev_reward = reward if np.random.rand() < epsilon: action = env.action_space.sample() else: action = maxaction(alpha, observation) observation, reward, done, info = env.step(action) observations.append(observation) rewards.append(reward) actions.append(action) #objective += evalb(alpha, observation, env.action_space.sample()) if done: print("Episode finished after {} timesteps".format(t+1)) break bprev = np.array([basis(s,a).flatten() for (s,a) in zip(observations[:-1],actions[1:])]) bnext = np.array([basis(s,a+np.random.randn()*0.2).flatten() for (s,a) in zip(observations[1:],actions[1:])]) rewards = np.array(rewards)[:-1] constraints += [(bprev@alpha) >= rewards + gamma*(bnext)@alpha] bcost = np.array([basis(s,env.action_space.sample()).flatten() for s in observations]) objective += cvx.sum(bcost@alpha) # np.sum(bnext,axis=0) * alpha loss = 0.1*cvx.sum(cvx.abs(alpha-alpha.value)) + objective prob = cvx.Problem(cvx.Minimize(loss), constraints) print("solving problem") result = prob.solve(verbose=True) print(result) print(alpha.value) #inspection loop for i_episode in range(4): observation = env.reset() #env.env.state[0] = np.random.randn()*sigma for t in range(200): env.render() #print(observation) prev_obs = observation action = maxaction(alpha, observation) #print(action) observation, reward, done, info = env.step(action) if done: print("Episode finished after {} timesteps".format(t+1)) break |