See also:

  • Datalog
  • concurrency

Key Value Store

Log-structured storage: a log is an append-only store. LSM (log-structured merge) trees: writes go to an in-memory table (the memtable), which is flushed to disk. Multiple read-only files (SSTables) accumulate on disk and are coalesced (compacted) in the background. Deletes are written as tombstone records.
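A toy LSM tree, to make the memtable / SSTable / tombstone flow concrete. This is a minimal in-memory sketch under simplifying assumptions: the class name `TinyLSM` and the `memtable_limit` parameter are hypothetical, "disk" is just a Python list of sorted runs, and compaction runs synchronously when called rather than in the background.

```python
import bisect

TOMBSTONE = object()  # sentinel value marking a deleted key

class TinyLSM:
    """Toy LSM tree: a dict memtable plus immutable sorted runs (stand-ins for SSTables)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just a write of a tombstone; nothing is removed in place.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Write the memtable out as a new immutable sorted run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            v = self.memtable[key]
            return None if v is TOMBSTONE else v
        # Search runs newest-first so later writes shadow earlier ones.
        for run in reversed(self.sstables):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                v = run[i][1]
                return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Merge all runs (newer entries win) and drop tombstones.
        # Safe here because this full compaction leaves no older run behind.
        merged = {}
        for run in self.sstables:
            merged.update(run)
        self.sstables = [sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)]

db = TinyLSM(memtable_limit=2)
db.put("a", 1); db.put("b", 2)      # second put fills the memtable and flushes it
db.put("a", 10); db.delete("b")     # a newer run shadows the older values
print(db.get("a"), db.get("b"))     # 10 None
db.compact()
print(db.sstables)                  # [[('a', 10)]]
```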

Wide-column stores; key/value stores



OLTP: online transaction processing. OLAP: online analytical processing. Probabilistic sketches: HyperLogLog, Bloom filters, cuckoo filters.
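A Bloom filter in a few lines, as one example of the probabilistic structures above. This is a sketch, not a tuned implementation: the bit-array size `m`, hash count `k`, and the trick of salting SHA-256 with the hash index are illustrative choices. The key property is one-sided error: it can report false positives but never false negatives.

```python
import hashlib

class BloomFilter:
    """Bloom filter: each item sets k positions in an m-slot bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index i.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # All k bits set means "probably present"; any zero bit means "definitely absent".
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter(m=1024, k=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print("apple" in bf)   # True: added items are always reported present
print("durian" in bf)  # almost certainly False, but a false positive is possible
```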


SQL injection. Is everything foreign keys? Interning.
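SQL injection in one picture, using Python's stdlib `sqlite3`. The `users` table and its rows are made up for illustration. Splicing user input into the SQL string lets the input rewrite the query; a `?` placeholder passes it as a plain value instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, secret TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [("alice", "a-secret"), ("bob", "b-secret")])

evil = "nobody' OR '1'='1"

# UNSAFE: the quote in `evil` closes the string literal and adds OR '1'='1'.
unsafe = con.execute(f"SELECT name FROM users WHERE name = '{evil}'").fetchall()

# SAFE: the placeholder binds `evil` as data, so it is compared as one literal string.
safe = con.execute("SELECT name FROM users WHERE name = ?", (evil,)).fetchall()

print(unsafe)  # every row comes back
print(safe)    # []
```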

Recursive common table expressions (WITH RECURSIVE) let you do Datalog-like stuff.

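CREATE TABLE edge(a INTEGER, b INTEGER);
CREATE TABLE path(a INTEGER, b INTEGER);
INSERT INTO edge(a,b) VALUES (1,2), (2,3), (3,4);

--SELECT * FROM edge;

-- path(x,y) :- edge(x,y).
-- path(x,z) :- edge(x,y), path(y,z).
-- UNION (not UNION ALL) discards duplicate rows, so this terminates even on cyclic graphs.
WITH RECURSIVE
  path0(x,y) AS
    (SELECT a,b FROM edge UNION SELECT edge.a, path0.y FROM edge, path0 WHERE path0.x = edge.b)
  INSERT INTO path SELECT x,y FROM path0;
SELECT a,b FROM path;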
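Roughly the same computation as the recursive query, done by hand as semi-naive Datalog evaluation: each round joins only the newly derived `path` facts (the delta) against `edge`, and stops when a round derives nothing new. This is a sketch of the evaluation strategy, not a description of SQLite's internals.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of
         path(x,y) :- edge(x,y).
         path(x,z) :- edge(x,y), path(y,z).
    """
    path = set(edges)
    delta = set(edges)
    while delta:
        # Join edge only against facts that are new since the previous round.
        new = {(a, z) for (a, b) in edges for (y, z) in delta if b == y}
        delta = new - path  # drop facts we already have (like SQL UNION)
        path |= delta
    return path

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```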


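  -- canonical representative = smallest element reachable through eq (assumes eq is symmetric)
  WITH RECURSIVE parent(x,y) AS
    (SELECT a, b FROM eq UNION SELECT parent.x, eq.b FROM parent, eq WHERE parent.y = eq.a)
  SELECT x, min(y) FROM parent GROUP BY x;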
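A runnable check of the "minimum reachable element as representative" idea in SQLite, a union-find-like grouping. The `eq` table and its rows are hypothetical. The aggregate sits outside the recursive CTE because SQLite does not allow aggregates in the recursive part; a helper CTE `sym` makes `eq` symmetric and reflexive first.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE eq (a INTEGER, b INTEGER)")
# Hypothetical asserted equalities: {1, 2, 3} and {4, 5} form two classes.
con.executemany("INSERT INTO eq VALUES (?, ?)", [(2, 1), (3, 2), (5, 4)])

rows = con.execute("""
WITH RECURSIVE
  -- symmetric + reflexive closure input, so reachability means "same class"
  sym(a, b) AS (SELECT a, b FROM eq UNION SELECT b, a FROM eq
                UNION SELECT a, a FROM eq UNION SELECT b, b FROM eq),
  parent(x, y) AS (SELECT a, b FROM sym
                   UNION SELECT parent.x, sym.b FROM parent, sym WHERE parent.y = sym.a)
-- the smallest reachable element serves as the class representative
SELECT x, min(y) FROM parent GROUP BY x ORDER BY x
""").fetchall()
print(rows)  # [(1, 1), (2, 1), (3, 1), (4, 4), (5, 4)]
```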

Python has sqlite3 in the standard library.

import sqlite3
con = sqlite3.connect(':memory:')
cur = con.cursor()
# Create table
cur.execute('''CREATE TABLE stocks
               (date text, trans text, symbol text, qty real, price real)''')

# Insert a row of data
cur.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")

# Insert many rows from any iterable, using ? placeholders
rows = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
        ('2006-04-06', 'SELL', 'IBM', 500, 53.00)]
cur.executemany("INSERT INTO stocks VALUES (?, ?, ?, ?, ?)", rows)
con.commit()

for row in cur.execute('SELECT * FROM stocks ORDER BY price'):
    print(row)

Adapters and converters map between Python types and SQLite values.
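A small adapter/converter round trip with the stdlib `sqlite3` API (`register_adapter`, `register_converter`, `PARSE_DECLTYPES`). The `Point` class, the `"x;y"` text encoding, and the `point` column type name are made up for illustration. The adapter turns a Python object into something SQLite can store; the converter, selected by the declared column type, turns the stored bytes back into a Python object.

```python
import sqlite3

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def adapt_point(p):
    # Python -> SQLite: store a Point as the text "x;y"
    return f"{p.x};{p.y}"

def convert_point(s):
    # SQLite -> Python: converters always receive bytes
    x, y = map(float, s.split(b";"))
    return Point(x, y)

sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

# PARSE_DECLTYPES tells sqlite3 to pick converters by declared column type.
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE shapes (p point)")
con.execute("INSERT INTO shapes VALUES (?)", (Point(1.0, 2.5),))
p = con.execute("SELECT p FROM shapes").fetchone()[0]
print(type(p).__name__, p.x, p.y)  # Point 1.0 2.5
```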

SQLite loadable extensions



Views: saved queries that act as virtual tables
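A view in SQLite via Python, showing that it stores the query rather than a snapshot of its result. The `stocks` columns and the view name `position` are illustrative. After a new row is inserted into the base table, querying the view again reflects it with no refresh step.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stocks (symbol TEXT, qty REAL, price REAL)")
con.executemany("INSERT INTO stocks VALUES (?, ?, ?)",
                [("RHAT", 100, 35.14), ("IBM", 1000, 45.0), ("IBM", 500, 53.0)])

# The view is re-evaluated against the base table every time it is queried.
con.execute("""CREATE VIEW position AS
               SELECT symbol, SUM(qty) AS total_qty FROM stocks GROUP BY symbol""")

before = con.execute("SELECT * FROM position ORDER BY symbol").fetchall()
print(before)  # [('IBM', 1500.0), ('RHAT', 100.0)]

con.execute("INSERT INTO stocks VALUES ('RHAT', 50, 36.0)")
after = con.execute("SELECT total_qty FROM position WHERE symbol = 'RHAT'").fetchone()
print(after)   # (150.0,)
```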


This is interesting

Aggregate functions
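Built-in aggregates next to a user-defined one, registered with `sqlite3.Connection.create_aggregate`. The `product` aggregate and the table `t` are hypothetical. An aggregate class needs a `step` method (called per row in a group) and a `finalize` method (called once per group).

```python
import sqlite3

class Product:
    """User-defined aggregate: multiply the non-NULL values in each group."""
    def __init__(self):
        self.acc = 1
    def step(self, value):      # called once per input row
        if value is not None:
            self.acc *= value
    def finalize(self):         # called once per group to produce the result
        return self.acc

con = sqlite3.connect(":memory:")
con.create_aggregate("product", 1, Product)
con.execute("CREATE TABLE t (g TEXT, v INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("a", 2), ("a", 3), ("b", 5), ("b", 7)])

# COUNT and SUM are built in; product(v) uses the class above.
rows = con.execute(
    "SELECT g, COUNT(*), SUM(v), product(v) FROM t GROUP BY g ORDER BY g").fetchall()
print(rows)  # [('a', 2, 5, 6), ('b', 2, 12, 35)]
```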

Window Functions
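Window functions keep every input row and add a computed column, unlike GROUP BY which collapses rows. A sketch on a made-up `sales` table; it assumes the SQLite bundled with Python is 3.25 or newer, which is when window functions were added.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20), (3, 5), (4, 15)])

rows = con.execute("""
    SELECT day, amount,
           SUM(amount) OVER (ORDER BY day)    AS running_total,
           RANK()      OVER (ORDER BY amount DESC) AS amount_rank
    FROM sales ORDER BY day
""").fetchall()
print(rows)  # [(1, 10, 10, 3), (2, 20, 30, 1), (3, 5, 35, 4), (4, 15, 50, 2)]
```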


Functional Dependencies

Armstrong axioms
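The usual way to decide whether a functional dependency follows from others is the attribute-closure algorithm, which agrees with derivability under Armstrong's axioms (reflexivity, augmentation, transitivity) because those axioms are sound and complete. A minimal sketch; the representation (single-letter attributes, FDs as pairs of frozensets) and the helper names are illustrative.

```python
def closure(attrs, fds):
    """Closure X+ of attribute set X under FDs given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If all of lhs is already determined, everything in rhs is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    # lhs -> rhs follows from fds iff rhs is contained in closure(lhs)
    return set(rhs) <= closure(lhs, fds)

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

F = [fd("A", "B"), fd("B", "C"), fd("CD", "E")]
print(sorted(closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E']
print(implies(F, "A", "C"))      # True  (transitivity via B)
print(implies(F, "A", "E"))      # False (E also needs D)
```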

Normal Forms

Tuple Generating Dependencies

Query Optimization

Cascades framework

The Chase
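A tiny restricted chase for tuple-generating dependencies, as a sketch under assumptions: facts are `(relation, args)` tuples, variables are strings starting with `?`, labeled nulls are named `_N0`, `_N1`, … (assumed not to clash with constants), and there are no equality-generating dependencies. "Restricted" means a rule fires only when its head is not already satisfied; the unrestricted (oblivious) chase would fire for every body match.

```python
from itertools import count

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def matches(atoms, facts, subst):
    """Yield every extension of subst that maps all atoms into facts."""
    if not atoms:
        yield subst
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for frel, fargs in facts:
        if frel != rel or len(fargs) != len(args):
            continue
        s, ok = dict(subst), True
        for a, v in zip(args, fargs):
            if (is_var(a) and s.setdefault(a, v) != v) or (not is_var(a) and a != v):
                ok = False
                break
        if ok:
            yield from matches(rest, facts, s)

def restricted_chase(facts, tgds, max_rounds=100):
    facts, fresh = set(facts), count()
    for _ in range(max_rounds):
        added = False
        for body, head in tgds:
            for s in list(matches(body, facts, {})):
                if next(matches(head, facts, s), None) is not None:
                    continue  # head already satisfied: the restricted chase skips it
                s = dict(s)
                for _, args in head:
                    for a in args:
                        if is_var(a) and a not in s:
                            s[a] = f"_N{next(fresh)}"  # existential -> fresh labeled null
                for rel, args in head:
                    facts.add((rel, tuple(s.get(a, a) for a in args)))
                added = True
        if not added:
            return facts
    raise RuntimeError("chase did not terminate within max_rounds")

tgds = [
    ([("Employee", ("?x",))], [("WorksIn", ("?x", "?d"))]),   # Employee(x) -> ∃d WorksIn(x,d)
    ([("WorksIn", ("?x", "?d"))], [("Dept", ("?d",))]),        # WorksIn(x,d) -> Dept(d)
]
facts = {("Employee", ("alice",)), ("Employee", ("bob",)), ("WorksIn", ("bob", "sales"))}
for f in sorted(restricted_chase(facts, tgds)):
    print(f)  # bob keeps "sales"; only alice gets a fresh null _N0
```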

Equality Generating Dependencies

Yisu: uses of the chase include query optimization, data integration, and querying incomplete databases. Benchmarking the chase: ChaseBench.

Chasefun, DEMo, Graal, LLunatic, PDQ, Pegasus, DLV, E, RDFox

Strategies: restricted, unrestricted, parallel, Skolem, fresh-null

Chase strategies vs. SIPS (sideways information passing strategies)

The power of the terminating chase

Is the chase meant to be applied to actual databases, symbolic databases/schemas, or other dependencies? Is it fair to say that the restricted chase for full dependencies is Datalog?

Alice book (Foundations of Databases), chapters 8-11

Graal: defeasible programming. Something about extra negation power? Defeasible rules are defeated if something contradicts them. Pure is part of Graal.

LLunatic

RDFox

DLGP (Datalog+ format): variables that appear in a rule head but not its body are existentials. Variables are allowed in facts. There is a notion of constraint (! :- ...) and a notion of query. Hmm.

Ontology Formats

Graph databases, OWL, RDF, SPARQL (SPARQL slides), SHACL

Semantic web

Knowledge representation handbook. Course. Very similar to the BAP knowledge base.

Optimal Joins

Worst-case optimal join algorithms, e.g. leapfrog triejoin. Dovetail join (RelationalAI, unpublished; somewhat Julia-specific?) uses the sparsity of all relations to narrow down the search.

  • Worst-case optimal join algorithms, Ngo et al., PODS 2012
  • Leapfrog triejoin: a simple, worst-case optimal join algorithm, ICDT 2015
  • Worst-case optimal join for SPARQL
  • Worst-case optimal graph joins in almost no space
  • Correlated subqueries: Unnesting Arbitrary Queries
  • How Materialize and other databases optimize SQL subqueries
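The core idea behind these algorithms, sketched on the triangle query Q(a,b,c) :- R(a,b), S(b,c), T(a,c): bind one variable at a time, and take each variable's candidates as the intersection of what every relation mentioning it allows. This is a "generic join" style sketch with hash-set indexes, not leapfrog triejoin itself (which uses sorted trie iterators) and not Dovetail join. The worst-case bound depends on intersections costing time proportional to the smaller side, which this sketch leaves to Python's set operations.

```python
from collections import defaultdict

def triangles(R, S, T):
    """Variable-at-a-time join for Q(a,b,c) :- R(a,b), S(b,c), T(a,c)."""
    R_idx, S_idx, T_idx = defaultdict(set), defaultdict(set), defaultdict(set)
    for a, b in R:
        R_idx[a].add(b)
    for b, c in S:
        S_idx[b].add(c)
    for a, c in T:
        T_idx[a].add(c)

    out = []
    for a in R_idx.keys() & T_idx.keys():       # a must appear in R and in T
        for b in R_idx[a] & S_idx.keys():        # b extends R(a,_) and starts some S edge
            for c in S_idx[b] & T_idx[a]:        # c satisfies both S(b,_) and T(a,_)
                out.append((a, b, c))
    return out

E = {(1, 2), (2, 3), (1, 3)}
print(triangles(E, E, E))  # [(1, 2, 3)]
```

Unlike a pairwise plan (join R and S first, then filter by T), no intermediate result larger than the worst-case output size is built.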

RelationalAI

Snowflake, Databricks, BigQuery, dbt, Fivetran

Data apps (dapps)

LookML, Sigma, Legend

Responsive compilers (Matsakis), Salsa.jl, Umbra/LeanStore

Incremental computation:

  • Convergence of Datalog over (pre-)semirings
  • Differential dataflow, CIDR 2013
  • Reconciling differences, Green, 2011
  • F-IVM: incremental view maintenance with triple lock factorization benefits
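A minimal incremental view maintenance sketch for a two-way join, in the spirit of relations annotated with integer multiplicities (negative counts are deletions). The relations, sample tuples, and helper names are hypothetical, and this is the textbook delta rule, not the specific algorithm of any paper listed above: for V = R ⋈ S, the change is ΔV = ΔR ⋈ S + R ⋈ ΔS + ΔR ⋈ ΔS, using the old R and S.

```python
from collections import Counter

def join(R, S):
    """Join R(a, b) with S(b, c) on b. Relations are Counters mapping a tuple
    to an integer multiplicity; multiplicities multiply through the join."""
    out = Counter()
    for (a, b), m in R.items():
        for (b2, c), n in S.items():
            if b == b2:
                out[(a, c)] += m * n
    return out

def add(*bags):
    """Sum multiplicities across bags and drop tuples whose count cancels to 0."""
    total = Counter()
    for bag in bags:
        for t, m in bag.items():
            total[t] += m
    return Counter({t: m for t, m in total.items() if m != 0})

R = Counter({(1, 10): 1, (2, 20): 1})
S = Counter({(10, "x"): 1, (20, "y"): 1})
V = join(R, S)                            # materialized view

dR = Counter({(3, 10): 1, (2, 20): -1})   # insert (3,10), delete (2,20)
dS = Counter({(10, "z"): 1})              # insert (10,"z")

dV = add(join(dR, S), join(R, dS), join(dR, dS))   # delta rule on the old R, S
V_new = add(V, dV)

assert V_new == join(add(R, dR), add(S, dS))       # matches full recomputation
print(sorted(V_new))  # [(1, 'x'), (1, 'z'), (3, 'x'), (3, 'z')]
```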

SystemML became Apache SystemDS.

Semantic optimization:

  • FAQ: Questions Asked Frequently, Ngo, Rudra, PODS 2016
  • What do Shannon-type inequalities, submodular width, and disjunctive Datalog have to do with one another? PODS 2017
  • Precise complexity analysis for efficient Datalog queries, PPDP 2010
  • Functional aggregate queries with additive inequalities
  • Convergence of Datalog over (pre-)semirings

Relational machine learning:

  • Layered aggregate engine for analytics workloads (Schleich, Olteanu, Abo Khamis)
  • Learning models over relational data using sparse tensors
  • The relational data borg is learning (Olteanu, VLDB keynote)
  • Structure-aware machine learning over multi-relational databases
  • Relational knowledge graphs as the foundation for artificial intelligence
  • Rk-means: fast clustering for relational data
  • Learning Models over Relational Data: A Brief Tutorial

For SQL support: DuckDB, Apache Calcite, the PostgreSQL parser.

Fortress library traits. Optimization and parallelism. Triangle view maintenance.


Streaming 101: unbounded data

Lambda architecture: a low-latency streaming layer gives fast but inaccurate results, then a batch layer provides accurate ones.

event time vs processing time
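Why event time vs. processing time matters, in a few lines: the same events land in different tumbling windows depending on which timestamp you window by. The events, timestamps, and 5-unit window width are made up; the event `"d"` happened at t=4 but arrived late, at t=8.

```python
from collections import defaultdict

# Hypothetical events in arrival order: (event_time, processing_time, value)
events = [(1, 1, "a"), (2, 3, "b"), (6, 6, "c"), (4, 8, "d"), (7, 9, "e")]

def tumbling(events, key_index, width=5):
    """Group values into fixed, non-overlapping windows of `width` time units,
    keyed by the chosen timestamp (0 = event time, 1 = processing time)."""
    windows = defaultdict(list)
    for ev in events:
        windows[ev[key_index] // width].append(ev[2])
    return dict(windows)

print(tumbling(events, 0))  # {0: ['a', 'b', 'd'], 1: ['c', 'e']}  by event time
print(tumbling(events, 1))  # {0: ['a', 'b'], 1: ['c', 'd', 'e']}  by processing time
```

A streaming system windowing by event time has to decide how long to wait for stragglers like `"d"` before emitting window 0.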


Flink, Apache Beam, MillWheel, Spark Streaming


Conflict-free replicated data types (CRDTs), Martin Kleppmann

CRDTs for strings: consider fractional positions. Tie breaking. The bad interleaving problem. Unique identifiers.

  • LSeq
  • RGA
  • TreeSeq crdt rich text
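A toy string CRDT using fractional positions with tie breaking, much simpler than LSeq or RGA. Assumptions: the class name `FracSeq` is hypothetical, ops are delivered reliably (in any order, possibly more than once), and there is no delete. Keys are `(position, replica_id, counter)`: position orders the text, replica id breaks ties between concurrent inserts at the same spot, and the counter keeps keys unique. It does not solve the interleaving problem, and inserting between two elements that tied at the same position can misplace the new element; real designs use variable-length identifiers for that.

```python
from fractions import Fraction

class FracSeq:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counter = 0
        self.elems = {}  # (position, replica_id, counter) -> character

    def insert(self, index, char):
        """Insert char at `index` of the current text; return the op to broadcast."""
        keys = sorted(self.elems)
        lo = keys[index - 1][0] if index > 0 else Fraction(0)
        hi = keys[index][0] if index < len(keys) else Fraction(1)
        op = ((lo + hi) / 2, self.replica_id, self.counter, char)  # midpoint position
        self.counter += 1
        self.apply(op)
        return op

    def apply(self, op):
        # Adding a uniquely keyed element is commutative and idempotent.
        pos, rid, ctr, char = op
        self.elems[(pos, rid, ctr)] = char

    def text(self):
        return "".join(self.elems[k] for k in sorted(self.elems))

a, b = FracSeq("A"), FracSeq("B")
b.apply(a.insert(0, "x"))
op_a = a.insert(1, "1")    # concurrent inserts after "x" pick the same position 3/4...
op_b = b.insert(1, "2")
a.apply(op_b); b.apply(op_a)
print(a.text(), b.text())  # x12 x12  (...and the replica id breaks the tie)
```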

Automerge: a library of data structures for collaborative applications in JavaScript. Local-first: use local persistent storage. Git for your app’s data. Rust implementation?

Isabelle CRDT. "I was wrong. CRDTs are the future"

"Conflict-free Replicated Data Types", "A comprehensive study of Convergent and Commutative Replicated Data Types"

Operational transformation (OT): concurrent insert and delete operations on a sequence are transformed against each other before being applied. Possibly moves too.
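The heart of OT for the insert-only case: before applying a remote insert, shift its position by any concurrent insert that landed at or before it. This is a minimal sketch; the op format `(position, text, site_id)` and tie breaking by lower site id first are assumptions, and deletes, which need more cases, are left out.

```python
def transform_insert(op, other):
    """Transform insert `op` against a concurrent insert `other` already applied."""
    pos, text, site = op
    opos, otext, osite = other
    if opos < pos or (opos == pos and osite < site):
        return (pos + len(otext), text, site)  # other landed first: shift right
    return op

def apply_insert(doc, op):
    pos, text, _ = op
    return doc[:pos] + text + doc[pos:]

doc = "abc"
op1 = (1, "X", 1)   # site 1 inserts "X" after 'a'
op2 = (1, "Y", 2)   # site 2 concurrently inserts "Y" at the same spot

# Each site applies its own op, then the transformed remote op.
site1 = apply_insert(apply_insert(doc, op1), transform_insert(op2, op1))
site2 = apply_insert(apply_insert(doc, op2), transform_insert(op1, op2))
print(site1, site2)  # aXYbc aXYbc
```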

Delta-based vs. state-based CRDTs
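A grow-only counter (G-Counter), the standard first CRDT, used here to contrast shipping full state with shipping deltas. A sketch with hypothetical class and method names. The merge is a pointwise max, which is commutative, associative, and idempotent, so replicas converge regardless of delivery order or duplication; a delta is just the part of the state that changed.

```python
class GCounter:
    """Grow-only counter: replica id -> number of increments made at that replica."""

    def __init__(self, replica_id):
        self.id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.id] = self.counts.get(self.id, 0) + n
        return {self.id: self.counts[self.id]}  # delta: only the changed entry

    def merge(self, other_counts):
        # Join = pointwise max; works for a full state or for a delta.
        for rid, c in other_counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()
delta = b.increment(5)
b.merge(a.counts)   # state-based: ship the whole state
a.merge(delta)      # delta-based: ship only what changed
b.merge(a.counts)   # merging again is harmless (idempotent)
print(a.value(), b.value())  # 7 7
```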


JSON CRDT for vibes patches?

Tree move operation. Create and delete subtrees.

Big Data

Spark Hadoop MapReduce Dask Flink Storm
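The MapReduce programming model in miniature: word count with explicit map, shuffle, and reduce phases, all in one process. A sketch for intuition only; the function names are hypothetical, and in Hadoop or Spark the shuffle moves data between machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # map: emit (key, value) pairs, here (word, 1)
    return [(w, 1) for w in doc.lower().split()]

def shuffle(pairs):
    # shuffle: group all values by key
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(key, values):
    # reduce: combine all values for one key
    return key, sum(values)

docs = ["the cat sat", "the dog sat down"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1}
```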

Mahout Vowpal Wabbit



Spark: Databricks is the company behind it. BigDatalog, MLlib, Spark Streaming, GraphX

Message brokers

RabbitMQ Kafka


BigQuery Snowflake Azure AWS

Graph systems

It isn’t that relational systems can’t express graph problems; graph systems may just be more optimized for them. Neo4j, Giraph, PowerGraph, GraphRex, GraphX, Myria, GraphChi, X-Stream, GridGraph, GraphLab
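The vertex-centric "think like a vertex" style that Pregel-like systems such as Giraph use, sketched as connected components by min-label propagation. Everything runs in one process here; the function name, the `max_supersteps` limit, and the graph are illustrative. Each superstep, vertices whose label changed send it to their neighbors, and a vertex adopts the smallest label it receives. The run ends when no vertex changes.

```python
def pregel_components(vertices, edges, max_supersteps=100):
    nbrs = {v: set() for v in vertices}
    for a, b in edges:                # treat edges as undirected
        nbrs[a].add(b)
        nbrs[b].add(a)
    label = {v: v for v in vertices}
    active = set(vertices)
    for _ in range(max_supersteps):
        inbox = {}
        for v in active:              # send phase: active vertices message neighbors
            for u in nbrs[v]:
                inbox[u] = min(inbox.get(u, label[v]), label[v])
        active = set()
        for u, m in inbox.items():    # compute phase: keep the smallest label seen
            if m < label[u]:
                label[u] = m
                active.add(u)
        if not active:                # nobody changed: every vertex halts
            return label
    raise RuntimeError("did not converge within max_supersteps")

print(pregel_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```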


  • CREATE TABLE
  • CREATE INDEX
  • EXPLAIN QUERY PLAN (I saw EXPLAIN ANALYZE elsewhere, e.g. Postgres)
  • SELECT
  • VACUUM: defragment and garbage-collect the database file
  • BEGIN TRANSACTION
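CREATE INDEX and EXPLAIN QUERY PLAN together, driven from Python: the plan switches from a full scan to an index search once the index exists. The table and index names are made up. The plan's `detail` strings differ between SQLite versions (e.g. `SCAN stocks` vs. `SCAN TABLE stocks`), so the expected output is described rather than quoted exactly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stocks (symbol TEXT, qty REAL, price REAL)")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable `detail` column.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM stocks WHERE symbol = 'IBM'"
before = plan(query)
con.execute("CREATE INDEX idx_stocks_symbol ON stocks(symbol)")
after = plan(query)
print(before)  # a full-table SCAN
print(after)   # a SEARCH ... USING INDEX idx_stocks_symbol (symbol=?)
```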


SQLite shell (sqlite3 CLI) commands that are interesting:

  • .help
  • .dump
  • .tables
  • .schema
  • .indexes
  • .expert: suggests indexes for queries (experimental)



DuckDB: an embedded database like SQLite, but aimed at analytical (OLAP) workloads.

Conjunctive-query containment and constraint satisfaction
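The classic link between the two: by Chandra and Merlin, conjunctive query Q1 is contained in Q2 exactly when there is a homomorphism from Q2 to Q1 (head to head, every body atom of Q2 to some body atom of Q1). That makes containment a constraint-satisfaction problem, and NP-complete in general. A brute-force sketch; the query representation (head tuple plus a list of `(relation, args)` atoms) and the "capitalized means variable" convention are assumptions.

```python
from itertools import product

def is_var(t):
    return t[0].isupper()  # convention: variables are capitalized

def homomorphism(src_body, src_head, dst_body, dst_head):
    """Brute-force search for a map of src variables to dst terms sending the head
    to dst_head and every src body atom to some dst body atom."""
    src_vars = sorted({t for _, args in src_body for t in args if is_var(t)} |
                      {t for t in src_head if is_var(t)})
    dst_terms = sorted({t for _, args in dst_body for t in args} | set(dst_head))
    facts = set(dst_body)  # Q1's body acts as a "canonical database"
    for image in product(dst_terms, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if tuple(h.get(t, t) for t in src_head) != tuple(dst_head):
            continue
        if all((rel, tuple(h.get(t, t) for t in args)) in facts for rel, args in src_body):
            return h
    return None

def contained_in(q1, q2):
    # q1 ⊆ q2 iff some homomorphism maps q2 into q1
    (h1, b1), (h2, b2) = q1, q2
    return homomorphism(b2, h2, b1, h1) is not None

Q1 = (("X",), [("E", ("X", "Y")), ("E", ("Y", "X"))])  # Q1(X) :- E(X,Y), E(Y,X)
Q2 = (("X",), [("E", ("X", "Y"))])                     # Q2(X) :- E(X,Y)
print(contained_in(Q1, Q2), contained_in(Q2, Q1))     # True False
```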

Designing Data-Intensive Applications, Martin Kleppmann

Scalability! But at what COST? Big systems vs. laptops.

Data Integration: The Relational Logic Approach

Postgres indexes for newbies, Postgres tutorial, raytracer in SQL, Advent of Code in SQL, SQLancer: detecting logic bugs in DBMSs

  • Differential Datalog
  • CRDTs
  • Differential Dataflow
  • Nyberg Accumulators
  • Verkle Trees
  • Cryptrees
  • Byzantine Eventual Consistency
  • Self-renewable hash chains
  • Binary pebbling