Optimization

Table Structure

  • longer, narrower tables are generally better than wide, short tables
  • tradeoff between query speed and index size
  • NULLs
  • 3NF
    • Idea: don’t duplicate data, use keys and more tables; keep like with like

Keys & Indices

  • PRIMARY: every row should have a unique identifier
    • a table may have only one PRIMARY key
    • generally bad practice not to have a PRIMARY KEY
  • INDEX (see below)
  • UNIQUE: unlike PRIMARY, can have NULLs
  • search may be \(O(\log{n})\) instead of \(O(n)\)
  • Optimizing queries with EXPLAIN

Foreign Keys

Partitioning

Housekeeping

Example

DROP INDEX IF EXISTS individuals_state_idx;
DROP INDEX IF EXISTS committees_cmte_st_idx;

-- wicked slow, running time forever...
-- note sequential scans
EXPLAIN SELECT sum(1) as N FROM committees c
JOIN individuals i ON c.cmte_st = i.state
WHERE i.state = 'MA';

-- Build index on big table
CREATE INDEX ON individuals (state);

-- now it uses the index, running time 
EXPLAIN SELECT sum(1) as N FROM committees c
JOIN individuals i ON c.cmte_st = i.state
WHERE i.state = 'MA';

-- Build index on small table
CREATE INDEX ON committees (cmte_st);

-- now it uses BOTH indexes, running time 
EXPLAIN SELECT sum(1) as N FROM committees c
JOIN individuals i ON c.cmte_st = i.state
WHERE i.state = 'MA';

Activity

Questions

  1. An entity relationship schema shows how _____are related to each other in a database.
  2. A primary key ____-ly identifies an observation in a ____.
  3. A primary key in one table can be a foreign key in another table. Can the foreign key alone uniquely describe a row of data in a table?
  4. What’s the purpose of primary keys and foreign keys?
  5. Can primary keys have repeat values in a table?
  6. Can foreign keys have repeat values in a table?
  7. Are primary and foreign keys related only through one-to-one relationships?