Haskell Gloss is awesome

Gloss is a super simple binding to drawing 2d stuff in a window for haskell


Given how relatively bewildering Hello World feels in Haskell, it’s surprising how easy it is to get an animated window going. I think that may be because of expectations. You expect Hello world to be really easy and have no confusion and yet there is an inexplicable IO monad-what? , whereas you expect drawing to involve a bunch of inexplicable boiler plate bullshit.

This simulates a pendulum. Drawing it as a line.

simulate takes a pile of arguments

first thing describes a window, with title text, size and screen position I think?

then background color

frames per second

initial state (0 angle 0 angular velocity)

a drawing function from state to a picture, which is a gloss type of lines and crap

and a state update function taking a time step dt and current state.


I feel like there is room for a completely introductory tutorial to Haskell using gloss. It’s so rewarding to see stuff splash up on the screen.

MAYBE I’LL DO IT. or maybe not.

An ignoramus thinking about Compiling to Categories


I haven’t thoroughly read this (in fact barely at all) and yet I have some thoughts.

Conal Elliott is up to some good shit. Maybe ignore me and just checkout the links.

Simply Typed Lambda Calculus (STLC)

Hask is a category of haskell types with functions as arrows between them, but there is a lot going on in Haskell. Polymorphism for example, which is often modeled as some kind of Natural Transformation of EndoFunctors from Hask to Hask (but I don’t think this covers every possible use of polymorphism. Maybe it does).

It is commonly mentioned that STLC is the more direct mapping. STLC is kind of a subset of haskell with no polymorphism or with the polymorphism stripped out (Every time you use a polymorphic function just monomorphize it. This blows up the number of functions floating around).

STLC is a Cartesian Closed Category (CCC), which means it is always possible to create pairs between any two types and functions between any two types.

data STLCType a = Prim a | Pair (STLCType a) (STLCType a) | Arr (STLCType a) (STLCType a)

data STLCTerm a = Lam Var STLCTerm | App STLCTerm | Var Var | Prim a

data Var = Var String STLCType


which maybe be extendible with a bunch of primitive operations and types (primitives might include addition, integers, bits, bit operations, etc.). It isn’t clear to me where it is most elegant to put type annotations. Maybe it is best to keep them separate and compare them.

Apparently it is possible to compile this in a point free form

data CatTerm a = Comp STLCTerm STLCTerm | App STLCTerm STLCTerm | Prim a

or maybe

data CatTerm a = App STLCTerm STLCTerm | Prim a| Comp

Dealing with the labeling of variables correctly is a real pain in the ass, so this is a win from the compiler standpoint. It is a pain to manually write functions using this style.

The compiling to categories project I think is using Haskell as the language and GHC to do the parsing and some other stuff, but then grabbing Core (the GHC intermediate language) and converting it into the Category form. I don’t see why you couldn’t use an STLC DSL and do the same. It would be less ergonomic to the user but also much less complicated. I wonder. I’ve written interpreters for STLC and they are very simple.

Circuits form a CCC. Basic Prim type is a wire with a Boolean value on it. Pairs of these make a bus. Composition is just attaching wires between subunits. PrimOps might include Ands and Ors and Nands and multiplexers and such. Or maybe you want to work at the level where 32-bit integers are primitives and addition and subtraction and other arithmetic functions are primops.

The Arrow type is more questionable. Can you really do higher order functions in fixed circuitry? In principle, yes. Since every prim type is finite and enumerable, arrows are also finitely enumerable. You could use a varying lookup table for example as the incoming data. This is an exponential blowup though. Maybe you ought to be working in the category where arrows are types of functions that are compactly encodable and runnable. You don’t want really stupidly massive circuits anyhow. Some kind of complexity theory restriction. For example, maybe you want all functions encodable in a tight BDD. You really need to shape of the BDD to be fixed. Maybe BDD whose width is bounded by some polynomial of the depth? If you don’t need the full width, maybe you could leave sections dead. Just spitballin’

Or in many cases I assume the higher order functions will come applied at compile time, in which case you can just substitute the given circuit in to the locations it is needed or share it somehow (probably needs a clock to time multiplex its use) at the multiple locations it is needed to save space.

Also of extreme interest:


He’s using this compiling to categories perspective with the intent to layout parallel circuits.

He uses Generic DataTypes with are very functory, which implies type parameters which tempts one into polymorphism. But I think again he is using polymorphism as a scheme which upon compilation gets turned into a bunch of different legit types. Maybe you’d want

data STLCType f a = Prim (f a) | Pair (STLCType f a) (STLCType f a) | Arr (STLCType f a) (STLCType f a)


You could do well to make way more lightweight operator synonyms for this lambda calculus

Lam String LC | App LC LC | Var String

(–>) = Lam


(\\) = Lam

and some synonyms for common variables

x = “x”

y = “y”



($$) = App

to dereference

(**) = Var  – bind this very close. Trying to look kind of like pointer derferencing?

maybe also add pattern matching into the language

(Pat) =

And some kind of autocurrying

Curry [Var] |

Maybe use Vec instead of list for compile time size.

I guess this is gonna be funky and having the actual lambda syntax compile is super cool and nice. But it is also nice to have a userland haskell concept of this stuff without weird plugins. A OverloadLambda extension would be cool. I don’t know how it would work. Type directed inference of which kind of lambda to use? Is that possible?


Also, could one piggyback on the haskell type system to make the well typing lighter weight a la the well-typed interpeter. Maybe using GADTs. http://docs.idris-lang.org/en/latest/tutorial/interp.html

It does seem unlikely that one could use raw lambdas in a good way


Checkin out the OpenAI Baselines

These are my notes on trying to edit the opeai baselines codebase to balance a cartpole from the down position. They are pretty scattered.

First I just run the built in examples to get a feel and try out deepq networks.

The PPO algorithm at the bottom is the reccommended one still I think. I got the pole kind of upright and would balance for a couple seconds maybe. Work in progress. The ppo example has some funky batching going on that you need to reshape your observations for.




Some of these are ready to roll on anything you throw at them

They use python3.

pip3 install tensorflow-gpu

pip3 install cloudpickle

running a deepq cartpole example

checkout source


what is all this shit.

Has quickly suppressed  exploration

has a clear upward trend. But the increase dies off at episode 300 or so. Learning rate decay?

took around 300 episodes to reach reward 200

Pretty good looking

The pong trainer is now called run_atari.py

sudo apt install python3-opencv

Hmm. Defaults to playing breakout actually. Highly exploratory at first. Still basically totally random after a couple mins

The buffer size is actually smaller than I’d expected. 10000?

simple.py is where the learn function is defined.

Hmmm explloration fraction is fraction of time for random play to be turned off over.

Is it getting better? Maybe. Still at 90% random. reward ~0.2

After about 2 hours 13,000 episodes still at 32% exploration. reward ~ 2

How much is from reduction of randomness, vs improvement of q-network? I suppose I could make a graph of reward vs exploration % using current network


trying ppo2 now (this is apparently openai goto method at the moment for first attempt)


I don’t know what to attribute this to necessarily but it is very quickly outperforming the q learning

like in 30seconds

clipfrac is probably how often the clipping bound gets hit

approxkl is approximate kullback-leibler divergence

looking inside the networks,

pi is the move probability distribution layer

vf is the value function

The MlpPolicy class will be useful for lower dimensional tasks. That is the multilayer perceptron, using  a bunch of fully connected layers

on a non simulated system, lower processor count to 1. The -np option on mpirun

Or just follow the mujoco example which is using only 1 cpu

Ah. This whole time it was saving in the /tmp folder in a time stamped folder



without threshold stopping, the thing went absolutely insane. You need to box in the pole


trying to make the square of height, so  that it more emphasizes a total height achieved? Maybe only give reward for new record height? + any height nearly all the way up

beefing up the force magntidue. I think it is a little too wimpy to get it over vertical

Maybe lower episode end so it spends more time trying to get higher than trying to just stay in bounds

Wait, should I not add those vector wrapper guys?


Hey! Sort of kind of success… It gets it up there sometimes, but then it can’t really keep it up there… That’s odd.

Wow. This is not a good integrator. Under no force the pendulum is obviously gaining energy. Should that matter much? I switched around the ordering to get a leapfrog. Much better.

custom cartpole instance to work from down position and editted to work with ppo2. To more closely match mujoco example, i switched from discrete actions to a continuous choice. I was trying to shape the reward to keep it in bounds. We’re getting there maybe. Picking the starting position happens in reset. Most everything else is a straight copy from the gym env


A script to view resulting network






How I Ruined Today Reading Haskell Posts

Lot of gold here (comments included). Graphs are important and I haven’t yet grokked the FGL thing or other methods of handling them

Functional programming with graphs from haskell



A new methodology for Arduino Haskell bindings


Sharing in Haskell using Data.Reify.


Resources on String Diagrams, and Adjunctions, and Kan Extensions

I’ve been trying to figure out Kan Extensions

Ralf Hinze on Kan Extensions



But while doing that I went down a rabbit hole on String Diagrams

This post is the first one on String Diagrams that made sense to me.


I had seen this stuff before, but I hadn’t appreciated it until I saw what Haskell expressions it showed equivalence between. They are not obvious equivalences

This seems like a very useful video on this topic.


In Summary, it is an excellent notation for talking about transformations of a long sequence of composed Functors  F G H … into some other long sequence of Functors. The conversion of functors runs from up to down. The composition of functors procedes left to right.  F eta is the fmap of eta, and eta F is eta with the forall’ed type unified to be of the form F a.

Adjunctions L -| R are asymmetric between cups and caps. L is on the left in cups and on the right in caps. That’s what makes squiggles pull straightable

I think I have an interesting idea for a linear algebra library based on this stuff


John Baez and Mike Stay’s Rosetta Stone (A touch stone I keep returning to)


Dan Piponi gave a talk which is another touch stone of mine that I come back to again and again. There is a set of corresponding blog posts.

Other resources:

NCatLab article


John Baez hosted seminars





Dan Marsden’s Article


Marsden and Hinze have been collaborating



Stephen Diehl on Adjunctions



A Section From an old Oregon Programming Language Summer School (a rich set of resources)


Marsden and Hinze have been collaborating




Mike Stay doing a very interesting series of Category Theory in Javascript. He uses contracts in place of types. Defeats one of the big points of types (static analysis), but still pretty cool



I think that about covers everything I know about.

Oh yeah, there is the whole Coecke and Abramsky categorical quantum mechanics stuff too.

Using the Purescript Servant Bridge

Alright, here is some garbage code. Hope it helps you, sorry if it confuses you.

Checkout the Counter example


that is where I pulled everything from. I ripped out just about everything fancy just so you and I can see the truly bare bones example.

It’s mostly boiler plate on the generator side.

He does a couple fancier things that you may want, like rearranging names and changing folders. Here is a more basic example

This goes in app/PSGenerator.ps if you’re using a stack template

Mostly things can just be the defaults. Look at the Counter Example for some more config stuff you can do.

You do need to separate out the user json api by itself. If you hand writeAPIModuleWithSettings an api that has the RAW serving the index.html, it freaked out. Maybe there is a way to handle that, but it’s not like that is probably what you want anyhow.

The myTypes Sum Type you want to add to for every type that you want to export over to purescript. frontEndRoot is where the generated files will go.

The Proxy business is a bunch of typelevel programming boilerplate. So is the empty MyBridge type.

There is basically no content to this code.

You also need to add this app/PSGenerator.hs file to your cabal file.


Every time you want to run the generator, you need to run

stack exec psGenerator


This then will put an API file and a Data type file into your purescript source in frontend/src

Using the API is a touch annoying but correct. If you look at the generated signature

There are a lot of constraints you need to satisfy in the monad m in order to call this thing. You need a monad that is Reader-like for getting the SPSettings_, needs to handle a Possible AjaxError, and needs to be an Aff-like monad. Woof.

It makes sense that you’d want to do all of this, but it is a burdensome mental overhead to get started.

Here’s some very basic source that shows how to at least get to the stage where you can log it the resulting request. I’m just dumping the error like a bad boy.

Note that you have to install purescript-servant-support as it tells you when you run psGenerator. I’ve been using psc-package. It is often helpful to go in to the psc-package.json file and update to the latest package-set. Just a little tip.

You see that the ExceptT handles the AjaxError and the ReaderT supplies the settings, which uses the defaults + a baseURL to point the request to

The whole thing is run inside an Aff monad.


Here’s the basic servant code


Again, I started using the stack servant template, whose directory structure I’m complying with.


Edit: Some more comments: Purescript bridge is a seperate project from servant-purescript. Purescript bridge will translate your Haskell types. Servant purescript writes your api calls. The two lines in the main of PSGenerator.hs do these sepearte tasks. the writeAPI writes the API calls and  writePSTypes writes the types.

If you want to transport a parametrized data type like (Maybe a) in the myTypes things, hand the type a special type from here


works like a charm



Downloading and Collecting Coursera videos

I like to watch and listen to my coursera videos on my commute. The app has download functionality but the quizzes and crap require your intervention. I need just a block of stuff so I can be hands free.

coursera-dl is a command line tool to download coursera content


basic usage is like so

coursera-dl -h is a help menu

you can pass in your username and password with -u and -p or setup a ~/.netrc file as described in the README

coursera-dl –list-courses -n

I think it should list courses by default honestly.

This downloads all mp4 videos

coursera-dl cloud-computing -n -f “mp4”


I then made a dirty script that will go through each week and concatenate the videos of that week hopefully in order into a single mp4 file. It is not a clean script. It will throw some errors and build some weird extra files, but it gets the job done. Run it in the course directory





Deep Learning Coursera Notes


cross-entropy – expectation value of log(p).

initialization – randn for weights. use 2/sqrt(input size) if using relu. See He. Avoids blow up

epoch – one run through all data

mini-batch – break up data into 1 gpus worth chunks. Worth trying different values to see

momentum – smooths gradients that are oscillating and lets build up

Adam – combined momentum and RMS prop. Works better often? 0.9 for beta1 and 0.999 for beta2 are common parameters.


Hyperparameter search – random points use log scales for some things.

reevalute your hyperparametrs occasionally

batch normalization – adds a normalization and mean subtraction at every hidden layer. makes later neurons less susceptible to earlier changes

tensorflow – variables – placeholder, make sessions, run a trainer

strategy – fit training set, dev set, test set, real world

use better optimizer bigger network if not fitting training

use more dat, rgularize if not

satisficing metric

add weight to realyy important examples

bias  – perforance on triainig set – human level is good benchmark

error analysis – ceiling on performance. Find out how many of some kind of problem are happening to figure out what is worthwhile. Do it manually

reasonably robust to random errors in training set

build first fast, iterate fast

if you need to use a different distro from training set, use the real stuff mostly in your dev and test

Break up into train dev and train-dev. so that you can know if the problem is due to mismatch or due to overfitting

manually try to make training set more like dev set on problem cases. Maybe add noise or find more examples of the error prone thing

Transfer learning

Multi-task learning

end to end – use subproblems if you have data for subproblems


And… I can’t access the two last ones yet. Poo.