Databases
- Key Value Store
- Algorithms
- SQL
- Schema
- Ontology Formats
- Optimal Joins
- Relational AI
- Streaming
- CRDTs
- Big Data
- Graph systems
- Resources
See also:
- Datalog
- concurrency
Key Value Store
log structured storage a log is a append only store LSM - log structured merge trees. In memory table for writes. Flushed to disk. Multiple read only written to disk, coalesced in background. sstable Tombstone records for deletes. https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
wide-column store key/value store
Algorithms
B-trees
OLTP online transaction processing OLAP online analytical processing hyperloglog bloom filters cuckoo filter
SQL
sql injection https://ctf101.org/web-exploitation/sql-injection/what-is-sql-injection/ everything is foreign keys? Interning
Recursive tables let you do datalog like stuff.
CREATE TABLE edge(a INTEGER, b INTEGER);
INSERT INTO edge(a,b)
VALUES
(1,2),
(2,3),
(3,4);
SELECT a,b FROM edge;
CREATE TABLE path(a INTEGER, b INTEGER);
--INSERT INTO path
--SELECT * FROM edge;
-- path(x,z) :- edge(x,y), path(y,z).
WITH RECURSIVE
path0(x,y) AS
-- SELECT 1,2
(SELECT a,b FROM edge UNION SELECT edge.a, path0.y FROM edge, path0 WHERE path0.x = edge.b )
INSERT INTO path SELECT x,y FROM path0;
SELECT a,b FROM path;
.quit
UF
WITH RECURSIVE
parent(x,y) AS
SELECT a, min(b) (SELECT (a,b) FROM eq UNION eq, parent)
import sqlite3
con = sqlite3.connect(':memory:')
cur = con.cursor()
# Create table
cur.execute('''CREATE TABLE stocks
(date text, trans text, symbol text, qty real, price real)''')
# Insert a row of data
cur.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
#cur.executemany("insert into characters(c) values (?)", theIter)
for row in cur.execute('SELECT * FROM stocks ORDER BY price'):
print(row)
adapters to python types https://en.wikipedia.org/wiki/Materialized_view
indices
views
Saved queries that act as virtual tables
triggers
This is interesting
Aggregate functions
Window Functions
Schema
https://en.wikipedia.org/wiki/Database_normalization
Functional Dependencies
Armstrong axioms
Normal Formals
Tuple Generating dependencies
Query Optimization
The Chase
Equality Generating Dependencies
Yisu: query optimization data integration querying incomplete databases benchmarking the chase chasebench
Chasefun, DEMOo, Graal, llunatic, pdg, pegasus, dlv, e, rdfox
Stratgeies - (restricted, unrestricted, parallel, skolem, fresh-null
Chase Strategies vs SIPS
The power of the terminating chase
Is the chase meant to be applied to actual databases, symbolic databases / schema, or other dependencies? Is it fair the say that the restricted chase for full dependencies is datalog?
Alice book chapter 8-11
Graal - https://github.com/hamhec/DEFT https://hamhec.github.io/DEFT/ defeasible programming http://lidia.cs.uns.edu.ar/delp_client/ Something about extra negation power? Defeatable rules if something contradicts them Pure is part of graal
llunatic - https://github.com/donatellosantoro/Llunatic
RDfox - https://docs.oxfordsemantic.tech/
dlgp - datalog plus format. Allows variables in head = existentials. Variables in facts.
Notion of constraint ! :-
and notion of query. Hmm.
Ontology Formats
graph database OWL RDF sparql sparql slides shacl -
semantic web
Knowdlege representation handbook Course https://web.stanford.edu/class/cs227/Lectures/lec02.pdf very similar to bap knoweldge base
Optimal Joins
worst case optimal join algorithm leapfrog triejoin https://github.com/frankmcsherry/blog/blob/master/posts/2018-05-19.md Dovetail join - relational ai unpublished. Julia specific ish? https://relational.ai/blog/dovetail-join use sparsity of all relations to narrow down search Worst case optiomal join Ngo pods 2012 leapfrog triejoin simpel worst case icdt 2015 worst case optimal join for sparql worst case optimal graph joins in almost no space Correlated subqueries: unnesting arbitrary queries How materializr and other databases optimize sql subqueries
Relational AI
https://www.youtube.com/watch?v=WRHy7M30mM4&ab_channel=CMUDatabaseGroup
snowflake databricks bigquery dbt fivetran
data apps - dapps
lookml sigma legend
Resposnive compilter - matsakis salsa.jl umbra/leanstore
incremental COnvergence of datalog over presmeirings differential dataflor cidr2013 reconciling idfferences 2011 Green F-IVM incrmenetal view mantinance with triple lock fotrization benefits
systemml vecame apache systemds https://systemds.apache.org/
Semantic optimization FAW question asked frequence : Ngo Rudra PODS 2016 What do shannon type ineuqlaities submodular width and disjunctive datalog have to do with one another pods 2017 precise complexity analysis for efficient datalog queries ppdp 2010 functional aggregate queries with additive inequalities convergence of dtalog over pr-esemirign
Relational machine learning Layered aggregate engine for analystics worloads schelich olteanu khamis leanring models over relational data using sparse tenosrs The relational data borg is learning olteanu vldb keynote sturcture aware machine learning over multi relational database relational know graphs as the ofundation for artifical intelligence km-means: fast clustering for relational data https://arxiv.org/abs/1911.06577 Learning Models over Relational Data: A Brief Tutorial
duckdb for sql support calcite postgresql parser
Fortress library traits. OPtimization and parallelism https://relational.ai/blog/categories/research
https://arxiv.org/abs/2004.03716 triangle view mantenance
Streaming
streaming 101 unbounded data
https://en.wikipedia.org/wiki/Stream_processing
lambda architecture - low latency inaccurate, then batch provides accurate
event time vs processing time
Watermark
Flink Apache Beam millwheel spark streaming
https://materialize.com/blog
CRDTs
Conflict Free replicated datatypes https://crdt.tech/ martin Kleppmann
CRDT of string - consider fractional positions. Tie breaking. Bad interleaving problem unique identifiers
- LSeq
- RGA
- TreeSeq
https://www.inkandswitch.com/peritext/ crdt rich text
https://github.com/josephg/diamond-types https://josephg.com/blog/crdts-go-brrr/
https://github.com/yjs/yjs
automerge: library of data structures for collab applications in javascript https://mobiuk.org/abstract/S4-P5-Kleppmann-Automerge.pdf local first. use local persistent storage. git for your app’s data. rust implementation?
isabelle crdt I was wrong. CRDTs are the future
Conflict-free Replicated Data Types” “A comprehensive study of Convergent and Commutative Replicated Data Types
Operational Transformation - sequences of insert and delete. Moves possibly.
delta-based vs state-based https://bartoszsypytkowski.com/the-state-of-a-state-based-crdts/
counters
json crdt for vibes patches?
Tree move op. Create delete subtrees.
Big Data
Spark Hadoop MapReduce Dask Flink Storm
Mahout Vowpal Wabbit
hadboop
Giraph
Spark
https://en.wikipedia.org/wiki/Apache_Spark Databricks - company bigdatalog https://www.cis.upenn.edu/~susan/cis700/Papers/BigDataAnalyticsSPARK.pdf https://github.com/ashkapsky/BigDatalog MLlib spark streaming graphx
Message brokrs
RabbitMQ Kafka
Services
BigQuery Snowflake Azure AWS
Graph systems
It isn’t that relational systems can’t express graph problems. But maybe graph systems are more optimized for the problem neo4j Giraph Powergraph graphrex graphx myria graphchi xsteam gridgraph graphlab
SQL
create table
create index
explain query plan
I sawexplain analyze
elsewhereselect
vacuum
- defrag and gabrage collect the dbbegin transaction
sqlite
sqlite commands that are interesting
.help
.dump
.tables
.schema
.indexes
.expert
suggests indices?
Resources
Conferences
- SIGMOD PODS https://sigmod.org/pods-home/ pods uutorials https://sigmod.org/pods-home/pods-tutorials/ Testy of time awards Cool stuff in here.
- VLDB
- HYTRADBOI https://www.hytradboi.com/ also very cool stuff.
Misc
Datavases, types, and the relational model The third manifesto
duckdb embedded like sqlite?
Conjunctive-query containment and constraint satisfaction
Designing Data intensive systems martin kleppmann
scalability but at what cost? big systems vs laptops.
Data integration the relational logic approach
postgres indexes for newbies postgres tutorial raytracer in sql [advent of code sql(https://news.ycombinator.com/item?id=29467671)] sqllancer detecting lgoic bugs in dbms
- Differential Datalog
- CRDTs
- Differential Dataflow
- Nyberg Accumulators
- Verkle Trees
- Cryptrees
- Byzantine Eventual Consistency
- Self-renewable hash chains
- Binary pebbling