Table Structure
- longer, narrower tables are generally better than wide, short tables
- tradeoff between query speed and index size
- NULLs
- 3NF
- Idea: don’t duplicate data, use keys and more tables; keep like with like
Keys & Indices
- PRIMARY: every row should have a unique identifier
- a table may have only one PRIMARY key
- generally bad practice not to have a PRIMARY KEY
- INDEX (see below)
- UNIQUE: unlike PRIMARY, can have NULLs
- search may be \(O(\log{n})\) instead of \(O(n)\)
- Optimizing queries with EXPLAIN
Example
DROP INDEX IF EXISTS individuals_state_idx;
DROP INDEX IF EXISTS committees_cmte_st_idx;
-- wicked slow, running time forever...
-- note sequential scans
EXPLAIN SELECT sum(1) as N FROM committees c
JOIN individuals i ON c.cmte_st = i.state
WHERE i.state = 'MA';
-- Build index on big table
CREATE INDEX ON individuals (state);
-- now it uses the index, running time
EXPLAIN SELECT sum(1) as N FROM committees c
JOIN individuals i ON c.cmte_st = i.state
WHERE i.state = 'MA';
-- Build index on small table
CREATE INDEX ON committees (cmte_st);
-- now it uses BOTH indexes, running time
EXPLAIN SELECT sum(1) as N FROM committees c
JOIN individuals i ON c.cmte_st = i.state
WHERE i.state = 'MA';