0

I have configured the presto to connect my Sql Server DB (only configure one connector) to test distributed query, and below is the content of my sqlserver.properties file:

cat catalog/sqlserver.properties 
connector.name=sqlserver
connection- 
url=jdbc:sqlserver://172.3.207.68:1433;databaseName=Rich_FinanceCheckSystem;
connection-user=sunbiqi
connection-password=****

Because presto can't use TOP, so I use limit instead, but I found that presto fetched the whole table data, and then take the top several records, even when I use paging sql. The two sql sentences are:

select * from sqlserver.dbo.fc _jcxx limit 10;

select * from(
    select row_number() over (order by id asc) rownum,* 
        from sqlserver.dbo.fc _jcxx
) as T 
where T.rownum between 10000 and 10100;

so I hope if there would be someone can help me out this confusion.I will appreiate it sincerely.

After I monitored my sql server's sql commands, I found it scans the whole table's rows while I only want the top 10 rows back or fetch the paging data in memory, so they are extremely slow, as both of them took over 2 minutes (only about 300,000 records in this table).

I want my application can use presto to improve performance as the data becomes larger and larger. But this result made me disappointed.

I think in this situation the indexes will not work either...I will confirm this later.

Is there anybody can help me out? Thank you very much!

  • If your engine is too limited to emit `TOP`, then it is too limited. Every database system has [its own way](https://stackoverflow.com/a/595155/4137916) of expressing this idiom, so any serious approach needs to take that into account. Using `ROW_NUMBER()` for paging is, as you've discovered, usually not the solution beyond the first few pages, as this cannot be effectively optimized. You could consider writing a stored procedure to fetch data, if the engine supports calling those. Otherwise, rewrite things in terms of `indexed_key >= X and indexed_key < Y`. – Jeroen Mostert Aug 07 '18 at 14:59
  • Presto will never execute a simple query from single SQL Server table faster than SQL Server -- you're using a tool that is not designed for your use case (if I understand your use-case correctly). Also, please see https://stackoverflow.com/a/45114359/65458 explanation about approaches alternative to LIMIT/OFFSET queries. – Piotr Findeisen Aug 07 '18 at 20:45
  • You mean if I want to use paging or top clauses in my application, I couldn't use presto to perform distributed querys? I don't need it to query faster than a single sql server instance, but I shouldn't fetch the whole table data in each query, which might make my application crash... – Nicolas Chain Aug 08 '18 at 02:31

0 Answers0