Using presto to query SQL Server, but it has a significant low performance

Question

I have configured the presto to connect my Sql Server DB (only configure one connector) to test distributed query, and below is the content of my sqlserver.properties file:

cat catalog/sqlserver.properties 
connector.name=sqlserver
connection- 
url=jdbc:sqlserver://172.3.207.68:1433;databaseName=Rich_FinanceCheckSystem;
connection-user=sunbiqi
connection-password=****

Because presto can't use TOP, so I use limit instead, but I found that presto fetched the whole table data, and then take the top several records, even when I use paging sql. The two sql sentences are:

select * from sqlserver.dbo.fc _jcxx limit 10;

select * from(
    select row_number() over (order by id asc) rownum,* 
        from sqlserver.dbo.fc _jcxx
) as T 
where T.rownum between 10000 and 10100;

so I hope if there would be someone can help me out this confusion.I will appreiate it sincerely.

After I monitored my sql server's sql commands, I found it scans the whole table's rows while I only want the top 10 rows back or fetch the paging data in memory, so they are extremely slow, as both of them took over 2 minutes (only about 300,000 records in this table).

I want my application can use presto to improve performance as the data becomes larger and larger. But this result made me disappointed.

I think in this situation the indexes will not work either...I will confirm this later.

Is there anybody can help me out? Thank you very much!

If your engine is too limited to emit `TOP`, then it is too limited. Every database system has [its own way](https://stackoverflow.com/a/595155/4137916) of expressing this idiom, so any serious approach needs to take that into account. Using `ROW_NUMBER()` for paging is, as you've discovered, usually not the solution beyond the first few pages, as this cannot be effectively optimized. You could consider writing a stored procedure to fetch data, if the engine supports calling those. Otherwise, rewrite things in terms of `indexed_key >= X and indexed_key < Y`. — Jeroen Mostert, Aug 07 '18 at 14:59
Presto will never execute a simple query from single SQL Server table faster than SQL Server -- you're using a tool that is not designed for your use case (if I understand your use-case correctly). Also, please see https://stackoverflow.com/a/45114359/65458 explanation about approaches alternative to LIMIT/OFFSET queries. — Piotr Findeisen, Aug 07 '18 at 20:45
You mean if I want to use paging or top clauses in my application, I couldn't use presto to perform distributed querys? I don't need it to query faster than a single sql server instance, but I shouldn't fetch the whole table data in each query, which might make my application crash... — Nicolas Chain, Aug 08 '18 at 02:31

Using presto to query SQL Server, but it has a significant low performance

0 Answers0