How to implement really small and fast ORM with PHP Part 4 specification error handling

How to implement really small and fast ORM with PHP (Part 4: specification error handling)

Queries are gaining more and more complexity, data is getting bigger and bigger. Most optimizations in database technology are done in the database server. This is an approach to optimize queries on the client side.

We need a good API, so …

  • it should be easy to learn
  • method names must be short and intuitive
  • the goal is to map datasets and relations to objects
  • the API should offer method chaining
  • special features like auto-increments should be included
  • the code should be small, no getters and setters
  • the database schema is created before writing PHP code
  • relationships should be defined in the database, not in the code
  • we get low latencies combined with low memory usage


To make things easier, we make some restrictions:

  • only UTF-8
  • only MySQL (mysqli)
  • only PHP 5.4.0+
  • only buffered queries

Our ORM should have the same efficiency as handwritten SQL. So the following statements should produce only 1 query:

  • DBo::Guestbook(42)->Comments()->delete(); // “Guestbook” and “Comments” are tables
  • DBo::Guestbook(42)->Comments()->save(‘hidden’, 1);
  • DBo::Guestbook(42)->Comments()->count();
  • DBo::Student()->Attend()->Lecture()->Uses()->Book(); // “Uses”, “Book”, etc. are tables

The following statements should only update 1 column:

  • DBo::Guestbook(42)->Comments()->save(‘hidden’, 1);
  • DBo::Guestbook(42)->save(‘title’, ‘hello’);

The following statements should only select 1 column (that’s the hard one):

  • foreach (DBo::Guestbook() as $gb) echo $gb->title;

This gets implemented by a two-pass method: We store the columns used in the first run in DBo::usage_col[code position] and reuse it in the second run. The “code position” is the line where the constructor was called (returned from debug_backtrace()).

Meta data like columns or indexes should be fetched statically into the code (not during runtime). To create a join out of “Guestbook()->Comments()”, the relevant columns are chosen during runtime. The syntax is:

  • table.primary_key_field = other_table.table_primary_key_field
  • e.g. sale.id = salepos.sale_id

To export the schema, we use:


// export schema to schema.php
DBo::conn(new mysqli('127.0.0.1', 'root', 'some_pw'));
DBo::exportSchema();

To keep control over the queries, we allow normal queries and debugging:


$entries = DBo::query('SELECT * FROM guestbook WHERE id=?', [42]); // Iterator
foreach ($entries as $obj) {...}

$id = DBo::query('INSERT INTO guestbook VALUES (...)'); // LastInsert ID
// 42

$affected = DBo::query('UPDATE guestbook SET active=0'); // Affected rows
// 10

$subject = DBo::value('SELECT subject FROM guestbook WHERE id=42'); // String
// 'Hello World'

$categories = DBo::values('SELECT DISTINCT categories FROM guestbook'); // Array
// ['Sports', 'Movies', 'Music']

$row = DBo::one('SELECT * FROM guestbook WHERE id=42'); // Array
// [id=>42, title=>'hello']

$row = DBo::keyValue('SELECT id,title FROM guestbook'); // Array
// [42=>'hello', 43=>'world']

$row = DBo::keyValues('SELECT id,title,subject FROM guestbook'); // Array
// [42=>['title'=>'hello', 'subject'=>'world'], 43=>['title'=>...]

echo DBo::Guestbook(42)->Comments(); // SQL String
// SELECT a.* FROM Comments a, Guestbook b WHERE ...

echo DBo::Guestbook(42)->Comments()->explain(); // explain SQL string
// EXPLAIN SELECT ... id | select_type | table | type ...

DBo::Guestbook(42)->Comments()->print_r(); // print_r related comments
// Array( id=... )

We also allow transactions:


DBo::begin();
$dbo = DBo::Guestbook(42);
$dbo->Comments()->delete(); // DELETE FROM Comments ...
$dbo->save('comments_count', 0); // UPDATE Guestbook SET ...
DBo::commit();

Here are some examples how the ORM should work:


// create a new entry in table "Guestbook"
// set attribute values to "hello" and "world"
// finally print out the primary key (auto-generated by auto-increment)
// - gives INSERT INTO Guestbook SET subject='hello', details='world'
$obj = DBo::Guestbook();
$obj->subject = 'hello';
$obj->details = 'world';
$obj->save();
echo $obj->id; // 43

// map entry in table "Guestbook" with primary key "42" to "$obj"
// - gives SELECT * FROM Guestbook WHERE id=42
$obj = DBo::Guestbook(42);
if (!$obj->exists()) {...}
echo $obj->subject;
// or
echo DBo::Guestbook(42)->subject;

// update antry in table "Guestbook" with primary key "42", set "hidden" to "1"
// - gives UPDATE Guestbook SET hidden=1 WHERE id=42
$obj = DBo::Guestbook(42);
$obj->hidden = 1;
$obj->save();
// or
DBo::Guestbook(42)->save('hidden', 1);

// delete entry in table "Guestbook" with primary key "42"
// - gives DELETE FROM Guestbook WHERE id=42
DBo::Guestbook(42)->delete();

// increment a field in table "Guestbook" with primary key "42"
// - gives UPDATE Guestbook SET likes=likes+1 WHERE id=42
DBo::Guestbook(42)->save('likes=likes+1');

Doing 1:n and n:m relations should be also very easy:


// get all Comments for Guestbook entry with primary key 42
// join a 1:n relationship (table.id = table2.table_id)
// - gives SELECT * FROM Comments WHERE guestbook_id=42
$comments = DBo::Guestbook(42)->Comments();
foreach ($comments as $comment) {...}

// update comments
// - gives UPDATE Comments SET hidden=1 WHERE guestbook_id=24
// - note that traditional ORMs do one update statement for each dataset
DBo::Guestbook(42)->Comments()->save('hidden', 1);

// update comments with where predicate
// - gives UPDATE Comments SET active=0 WHERE active=1 AND guestbook_id=24
DBo::Guestbook(42)->Comments('active=1')->save('active', 0);

// deleting comments works in the same way
// - gives DELETE FROM Comments WHERE guestbook_id=24
DBo::Guestbook(42)->Comments()->delete();

// n:m Students attend Lectures
// - gives SELECT a.* FROM Lecture a, Attend b WHERE b.student_id=21 AND
// b.lecture_id = a.id
$lectures = DBo::Student(21)->Attend()->Lecture();
foreach ($lectures as $lecture) {...}

// n:m Students attend Lectures, Lecture uses Books
// - gives SELECT a.* FROM Book a, Uses b, Lecture c, Attend d
// WHERE d.student_id = 21 AND d.lecture_id = c.id
// AND c.id = b.lecture_id AND b.book_id = a.id
$books = DBo::Student(21)->Attend()->Lecture()->Uses()->Book();
foreach ($books as $book) {...}

Sometimes it is better to avoid normalization and store multiple values inside a string. This reduces the number of tables, relations and costly joins. This technique is useful if there is a limited number of related values, e.g. using a string value like “100,101,102” instead of a join. The data can be also encoded as a JSON string with ‘[100,101,102]’ or ‘[“100″,”101″,”102”]’. To do the encoding and decoding automatically, the names of the members can be prefixed with “arr_” and “json_”. Here is an example:


// automatic encoding and decoding of values
// - gives UPDATE Guestbook SET tags='sport,music,tv' WHERE id=42
// UPDATE Guestbook SET tags2='{"a":"b","c":"d"}' WHERE id=42
$obj = DBo::Guestbook(42);
$obj->save('arr_tags', ['sport','music','tv']); // field tags (Varchar)
// or
$obj->save('json_tags2', ['a'=>'b', 'c'=>'d']); // field tags2 (Varchar)

$obj = DBo::Guestbook(42);
print_r($obj->arr_tags); // Array([0] => sport\n [1] => music\n [2] => tv)
// or
print_r($obj->json_tags2); // Array([a] => b\n [c] => d)

Predicates can be defined in several ways:


// select one dataset
// - gives SELECT * FROM Guestbook WHERE id=10
DBo::Guestbook('id=10');
DBo::Guestbook('id=?', 10);
DBo::Guestbook(['id'=>10]);
DBo::Guestbook(10); // id is a numeric primary key

// select multiple datasets
// - gives SELECT * FROM Guestbook WHERE id IN (10,11,12)
DBo::Guestbook('id in (10,11,12)');
DBo::Guestbook('id in ?', [10,11,12]);
DBo::Guestbook(['id'=>[10,11,12]]);
DBo::Guestbook([10,11,12]); // id is a primary key

// select multiple primary keys
// - gives SELECT * FROM Guestbook WHERE (id,id2) IN ((10,11))
DBo::Guestbook('id=10 and id2=11');
DBo::Guestbook('(id,id2) in ?', [10,11]);
DBo::Guestbook('id=? and id2=?', 10, 11);
DBo::Guestbook(['id'=>10, 'id2'=>11]);
DBo::Guestbook([[10,11]]); // id and id2 are a primary key

// select multiple datasets with multiple primary keys
// - gives SELECT * FROM Guestbook WHERE (id,id2) IN ((10,1),(11,2))
DBo::Guestbook('(id,id2) in ((10,1), (11,2))');
DBo::Guestbook('(id,id2) in ?', [[10,1], [11,2]]);
DBo::Guestbook([[10,1], [11,2]]); // id and id2 are a primary key

// additional predicates
DBo::Guestbook(10, "active=1");
DBo::Guestbook("active=1 and foo=?", "bar");

// limit
foreach (DBo::Order("status=open")->limit(100) as $obj) {...

Custom SQL can be used:


$comments = DBo::object("SELECT * FROM Comments WHERE guestbook_id=?", [42]);
foreach ($comments as $comment) {...}

Custom classes can also be used:


class DBo_Sales extends DBo {
public function save() {
// execute some pre trigger, e.g. validation
if (empty($this->some_val)) throw new Exception(...);
parent::save();
// execute some post trigger
}

public function delete() {
if ($this->status != 'draft') throw new Exception(...);
parent::delete();
}

public function completed() {
return DBo::query('SELECT * FROM sales WHERE completed=1');
}

public function get_age() {
return date_diff(new DateTime($this->birthdate), new DateTime())->y;
}
}

// DBo_{table} is automatically used as class
print_r(DBo::Sales());
=> DBo_Sales Object (...

print_r(DBo::Sales()->completed());
=> returns completed() from Dbo_Sales

print_r(DBo::object("SELECT * FROM Sales")->completed());
=> returns completed() from Dbo_Sales

// get_{field}() is automatically used when the member not exists
print_r(DBo::Sales()->age);
=> returns get_age()

The database connection should be opened when the first query is being executed. We use:


class mysqli_lazy extends mysqli {
public function query($query) {
if (!@$this->host_info) parent::connect('127.0.0.1', 'root', '', 'db');
// or persistent connection: 'p:127.0.0.1'
return parent::query($query);
}
}
DBo::conn(new mysqli_lazy, 'db');

// instead of
DBo::conn(new mysqli('127.0.0.1', 'root', '', 'db'), 'db');
// or persistent connection: 'p:127.0.0.1'

To get all queries on stdout, we use:


class mysqli_log extends mysqli {
public function query($query) {
echo $query."\n";
return parent::query($query);
}
}
DBo::conn(new mysqli_log('127.0.0.1', 'root', '', 'db'), 'db');

The fastest way to get data is reading it from a hash table in the main memory. With the APC extension, we can persist data between many requests. The syntax for caching looks like this:


// cache categories for 60 seconds
$payments = DBo::Categories()->cache(60);
// [{id=>0, name=>Sports}, {id=>1, name=>Movies}, ...]

$payments = DBo::Categories()->cArray(60);
// [[id=>0, name=>Sports], [id=>1, name=>Movies], ...]

$payments = DBo::Categories()->ovalues('col_name', 60);
// [Sports, Movies, ...]

$payments = DBo::Categories()->okeyValue('col_id', 'col_name', 60);
// [0=>Sports, 1=>Movies, ...]

$payments = DBo::Categories()->count(60);
// 42

Broken queries or connection errors can be handled with try-catch:


try {
DBo::conn(new mysqli('127.0.0.1', 'root', '', 'db'), 'db');
DBo::query('select * from invalid');
}
catch (mysqli_sql_exception $e) {
echo $e->getMessage();
exit(1);
}

If column names are ambiguous, we need to prefix them with “@”:


// SELECT a.* FROM app a, os b WHERE a.os_id=b.id AND id=42 AND id=13
DBo::os('id=42')->app('id=13');

// SELECT a.* FROM app a, os b WHERE a.os_id=b.id AND b.id=42 AND a.id=13
DBo::os('@id=42')->app('@id=13');

The implementation in detail:

Running DBo::Student()->Attend()->Lecture()->array() as a single query can be implemented by using a stack. The query itself presents a chain of tables being joined by some predicates. Attend() and others are handled by PHP’s magic __call(). Every table in the chain pushes a new element on the stack containing the name of the table, the primary key and the required predicate(s).

Even with the best ORM, you can still write bad code:


foreach (DBo::os(10)->app() as $app) {
if (!$app->active) continue; // bad
...
}
foreach (DBo::os(10)->app('active=1') as $app) {...} // good, less data

foreach (DBo::os(10)->app() as $app) {
foreach ($app->compontent() as $compontent) {...} // bad, many queries
}
foreach (DBo::os(10)->app()->compontent() as $compontent) {...} // good

foreach (DBo::os(10)->app() as $app) $app->save('active', 1); // bad
DBo::os(10)->app()->save('active', 1); // good, 1 query

foreach (DBo::sale(10)->salepos() as $salepos) {
if (DBo::logistics($salepos->id, 'shipped=1')->exists()) {
$salepos->save('complete', 1); // bad
}
}
DBo::sale(10)->salepos()->logistics('shipped=1')
->salepos()->save('complete', 1); // good

In general, it is better to select as little data as possible from the database. Joins are often expensive, but if the database handles them correctly, it is much more efficient than performing joins directly in PHP.

More coming soon …