Parse SQL scripts to find table dependencies and outputs

Question

I currently have a large set of sql scripts transforming data from one table to another, often in steps, like for example

select input3.id as cid
    , input4.name as cname
into #temp2
from input3
inner join input4
    on input3.match = input4.match
where input3.regdate > '2019-01-01';

truncate table output1;

insert into output1 (customerid, customername)
select cid, cname from #temp2;

I would like to "parse" these scripts into their basic inputs and outputs

in: input3, input4
out: output1

(not necessarily this format, just this info)

To have the temporary tables falsely flagged would not be a problem:

in: input3, input4, #temp2
out: #temp2, output1

It is OK to take a little bit of time, but the more automatic, the better.

How would one do this?

Things I tried include

regexes (straight forward but will miss edge cases, mainly falsely flagging tables in comments)
Use an online parser to list the DB objects, postprocessing by hand
Look into solving it programmatically, but for example writing a C# program for this will cost too much time

This is a very hard task to do it without errors. I'd definitely discard doing a parser on your own. What happens when you encounter different SQL versions that support different syntax? What about dynamic SQL? If the code is stored as procedures, you can try querying `sys.sql_dependencies`. — EzLo, Aug 14 '19 at 08:30
For sure don't want to write my own parser, now looking into https://sqlparse.readthedocs.io/en/latest/analyzing/ Sadly the code is not stored as procedure. — TacoV, Aug 14 '19 at 09:44
There's a fully fledged T-SQL parser in .NET written by MS itself, but using it can have a bit of a learning curve. Still, the actual amount of code required wouldn't be too bad. See [this answer](https://stackoverflow.com/a/37868244/4137916) for what that sort of thing looks like (not a duplicate, as it solves a different problem). — Jeroen Mostert, Aug 14 '19 at 10:44

score 1 · Answer 1 · answered Aug 14 '19 at 10:00

I usually wrap the scripts' content into stored procedures and deploy them into the same database where the tables are located. If you are sufficiently acquainted with (power)shell scripting and regexps, you can even write the code which will do it for you.

From this point on, you have some alternatives:

If you need a complete usage / reference report, or it's a one-off task, you can utilise the sys.sql_expression_dependencies or other similar system views;
Create a SSDT database project from that database. Among many other things that make database development easier and more consistent, SSDT has the "Find all references" functionality (Shift+F12 hotkey) which displays all references of a particular object (or column) across the code.

AFAIK neither of them sees through dynamic SQL, so if you have lots of it, you'll have to look elsewhere.

Parse SQL scripts to find table dependencies and outputs

1 Answers1