One solution is to run the query twice, once with just a count(*) and then the actual query, but that'd be very inefficient.
A few experiments later I think I have a solution; a bit inefficient than running the plain query but much better than running the query twice.
with REAL_QUERY as (select relname, oid from pg_class limit 5) -- User should ignore the rest of the query select *, show_rowcounter() as row_count from (select REAL_QUERY.*, increment_rowcounter() as row_number from REAL_QUERY, (select reset_rowcounter()) as _c order by row_number offset 0) as _subq;
Don't get scared by the length of the query above, all one needs to do is put the real query between the first pair of parenthesis after the REAL_QUERY identifier.
The result would be the same as the original query, but with two additional columns: row_number and row_count. row_number numbers each row starting from 1, and row_count shows the total number of rows in the result.
. relname | oid | row_number | row_count ---------------+------+------------+----------- pg_statistic | 2619 | 1 | 5 pg_type | 1247 | 2 | 5 pg_attribute | 1249 | 3 | 5 pg_toast_1262 | 2844 | 4 | 5 pg_toast_2604 | 2830 | 5 | 5 (5 rows)The 'ORDER BY row_number' clause adds the overhead, but it is necessary so that the row_count is calculated before the first row is produced at the top level. I wish I could introduce a MATERIALIZE node in the query plan at will.
If your REAL_QUERY has an ORDER BY or GROUP BY clause then you can remove the ORDER BY row_number clause from the outer query, since Postgres will make sure that show_rowcounter() will not be called before the last call of increment_rowcounter().
The above trick uses the Common-Table-Expression feature (introduced in Postgres version 8.4), because I wanted to make it look like a template where user's real query is visible at the top.
If you are running on an older version you can easily modify it to be a simple query because CTE used above is not recursive.
select *, show_rowcounter() as row_count from (select REAL_QUERY.*, increment_rowcounter() as row_number from (select relname, oid from pg_class limit 5) as REAL_QUERY, (select reset_rowcounter()) as _c order by row_number offset 0) as _subq;Now the guts of the trickery: PL/Perl functions:
-- PL/Perl create or replace function reset_rowcounter() returns int as $p$ $_SHARED{rowcounter} = 0; $p$ language plperl stable; create or replace function increment_rowcounter() returns int as $p$ $_SHARED{rowcounter} = $_SHARED{rowcounter} + 1; return $_SHARED{rowcounter}; $p$ language plperl; create or replace function show_rowcounter() returns int as $p$ return $_SHARED{rowcounter}; $p$ language plperl;BTW, this also shows how one can get Oracle's ROWNUM like feature in Postgres.
I use a simpler solution that takes advantage of using COUNT() as a window function:
ReplyDeleteSELECT ...
,COUNT(*) OVER() AS fullRowCount
FROM ...
WHERE ...
ORDER BY ...
LIMIT 25 OFFSET 50;
That's interesting... I didn't think about that one. Maybe this will help me eliminate that expensive ORDER BY; all I need is a blocking plan node that waits until all rows have been consumed, and the emits the rows in a FIFO order.
ReplyDeleteThis wouldn't work in pre-8.4 versions though.
Gurjeet,
ReplyDeleteWhat is the PL/PgSQL equivalent of $_SHARED{}? Thanks!
Thanks for the suggestion, Michael. I tested the tip and it worked both correctly and faster.
ReplyDeleteIn the old design, my "count(*)" query took 300 ms to run, while the query for a single page of results took 58 ms. (So, 358 ms total). Using the window function method, the single query took 314 ms. That's about 10% faster, not counting any overhead associated with running two queries instead of one.
A great tip!
:-) Glad it worked for you, Mark. I adapted it from a strategy I used when we were running Oracle.
ReplyDelete@gwchamb, PL/PgSQL does not have any equivalent of $_SHARED. I'd have loved to use it if there were anything similar, to avoid external dependency on perl.
ReplyDelete@Mark, if you adapted from the above query, can you please share your version of the windowing query, thanks.
Here's an example of doing it w/ window functions. The following will even work on databases that support window functions but don't have a LIMIT OFFSET feature.
ReplyDeleteWITH base AS (
-- Real Query
SELECT r.*,
100.0 AS rpp, -- results per page
row_number() OVER (ORDER BY rental_date) row_num,
COUNT(1) OVER () row_count
FROM rental r
WHERE staff_id = 1
)
SELECT ceil(row_num / rpp) AS page,
ceil(row_count / rpp) AS pages,
base.*
FROM base
WHERE ceil(row_num / rpp) = 2
ORDER BY row_num
Why not use CURSORs instead?
ReplyDeleteMOVE by required offset, then FETCH page of query results and MOVE FORWARD ALL to get number of total query results.
@Tomas, it appears that cursors are not well supported in DBD::Pg:
ReplyDeletehttp://search.cpan.org/dist/DBD-Pg/Pg.pm#Cursors
Besides, the windowing method only adds a small amount of syntax to existing query... what could be simpler than that?
Cursors use PL/PGSQL, which ads more complexity.