Example Queries¶
We show several example queries here. To run the following examples, first import verdict and create its instance as follows:
import verdict
v = verdict.presto(presto_host='presto')
Traditional Mode¶
Use the sql(query_string) method to obtain a single accuracy-guaranteed answer. The error level
can be specified at query time (1% relative error, by default) .
Note
To see details of the class of SQL queries supported by Verdict, see Query Syntax.
Count with arbitrary filters¶
v.sql("""
select l_shipmode, count(*)
from hive.tpch_sf1.lineitem_premerged
where l_shipdate >= date '1994-01-01' or l_shipdate <= date '1995-01-01'
group by l_shipmode
order by l_shipmode
""")
Joins of two large tables¶
v.sql("""
select o_orderstatus, count(*)
from hive.tpch_sf100.lineitem_premerged l inner join
hive.tpch_sf100.orders_premerged o on l_orderkey = o_orderkey
group by o_orderstatus
order by o_orderstatus
""")
Exists predicate expressed using a join (TPC-H Q4)¶
v.sql("""
select
o_orderpriority,
count(*) as order_count
from
hive.tpch_sf100.orders_premerged o left join
(
select l_orderkey, count(*) exist_count
from hive.tpch_sf100.lineitem_premerged
where l_commitdate < l_receiptdate
group by l_orderkey
) t on o_orderkey = l_orderkey
where
o_orderdate >= date '1996-05-01'
and o_orderdate < date '1996-08-01'
and exist_count > 0
group by
o_orderpriority
order by
o_orderpriority
""")
Stream Mode¶
Simply change sql to sql_stream. Then, verdict returns an iterator from which you can obtain
a series of answers that converge to the exact one. Often, this is called progressive analytics.
For example,
Count with arbitrary filters¶
results_itr = v.sql_stream("""
select count(*)
from hive.tpch_sf100.lineitem_premerged
where l_returnflag = 'R' and
(l_shipdate >= date '1994-01-01' or l_shipdate <= date '1995-01-01')
""")
for result in results_itr:
print(result)
These successive results can be used by upfront applications (e.g., visualization libraries) to deliver the results in a more intuitive way.