Rise of the Monster Query

Now we are handed a new task: find the n most critically acclaimed of the local movies. We assume the following declarations/definitions:

struct critic {
    std::string name;
    serial favorite_movie;     // foreign key to movies table
};

extern table<critic> critics;

extern const int n;

First we define a C++ function to build the scoring part of our query:

// Build a server-side expression that counts the critics who like a given movie:
//
exprn_mapper<int64_t>
score(const exprn_mapper<serial> &movie_id) {
    return scalar(
        critics
        .where(critics->favorite_movie == movie_id)
        .select(count_all)
    );
}

Then we can define our query:

const query<movie> local_hits =
    local_movies
    .distinct()
    .order(- score(local_movies->id))
    .limit(n);

Problem solved, but now the boss wants something else: generate the titles of these local hits, ordered alphabetically, case-insensitive. Okay:

const query<std::string> alphabetized_local_hit_titles =
    local_hits
    .order(upper(local_hits->title))
    .select(local_hits->title);

The SQL that quince generates for local_hits (assuming that we execute it) is more complex than the SQL it generates for local_movies; and the SQL for alphabetized_local_hit_titles is more complex again. But the C++ code that we are writing, reading and (if this were a real application) maintaining is not getting more complex. It's just one little step after another. When we use a query A in the definition of a larger query B, we don't look back at how A was defined; we treat it as though it were a simple table. Then we do the same with B to define C, and then D, and so on. We kind of know that somewhere in a dungeon there is an SQL statement with a thick fur of nested parentheses, and column aliases seeping from its every pore, but that is not something we humans have to confront ^[1] . As the SQL complexity grows, our cognitive load stays constant. We could keep this up all day.

Is it wise to build monster queries? Often no, but sometimes yes. If a single monster query replaces a C++ algorithm that issues many small queries, then the monster query might perform better, for several reasons. It might shift load from an application host to a DBMS host, which can be a win, depending on the circumstances; it probably reduces traffic between the application and the DBMS, and it may be advantaged by query optimization strategies within the DBMS.

How far you go in the direction of monster queries is up to you, the application designer. Quince has no opinion, but whatever you decide, quince allows you to express your choice in manageable C++.

^[1] We can confront it though. Queries have a to_string() method, which returns the SQL text, including placeholders for any bound data, in a std::string. E.g., assuming the most recent definition of local_movies, and assuming that our tables reside on a PostgreSQL database, alphabetized_local_hit_titles.to_string() reveals the whole ugly truth:

SELECT r$10 FROM (SELECT r$10 FROM (SELECT DISTINCT r$9, r$10 FROM (SELECT "movies"."id" AS r$9, "movies"."title" AS r$10 FROM (SELECT "screens"."id" AS r$14, "screens"."cinema_id" AS r$15, "screens"."current_movie_id" AS r$16 FROM (SELECT "cinemas"."id" AS r$11, "cinemas"."location.x" AS r$12, "cinemas"."location.y" AS r$13 FROM "test"."cinemas" WHERE ((("cinemas"."location.x")-($1))*(("cinemas"."location.x")-($2)))+((("cinemas"."location.y")-($3))*(("cinemas"."location.y")-($4))) <= ($5)*($6)) AS q$1 INNER JOIN "test"."screens" ON "screens"."cinema_id" = r$11) AS q$2 INNER JOIN "test"."movies" ON "movies"."id" = r$16) AS q$3) AS q$4 ORDER BY (SELECT "count"(*) AS r$20 FROM "test"."critics" WHERE "critics"."favorite_movie" = r$9 LIMIT 1) DESC LIMIT 2) AS q$5 ORDER BY "upper"(r$10)