3

Problem: My task is to create a database to hold information about various products and to create RESTful api to serve and manage this information. But the client doesn't know exactly what all information they will need on these products, so the database is likely to have new columns and tables added later on to accommodate new product properties. My question is about producing a database that will readily accept these changes and constructing queries that can fetch products securely based upon product properties that don't yet exist with little to no modification.

Proposed Solution: I have a test database setup with the following structure.

+------------------+
|      item        |
+----+------+------+
| id | name | cost |
+----+------+------+
|  0 | test |   50 |
+----+------+------+

+--------------+
|    color     |
+----+---------+
| id |   val   |
+----+---------+
|  0 |  blue   |
|  1 | purple  |
+----+---------+

+--------------------+
|      item_color    |
+---------+----------+
| item_id | color_id |
+---------+----------+
|       0 |        0 |
|       0 |        1 |
+---------+----------+

The 'item' table will likely have columns added later and more junction tables will likely be added too.

A request to retrieve products would like http://www.example.com/api/products?color=purple&cost=50 With php I'm dynamically constructing prepared statements to retrieve the relevant products, hopefully without opening the door for sql injections. First I determine which properties are included in the 'item' table and which are in separate tables using the following functions:

function column_exists($column, $pdo) {
    $statement = $pdo -> prepare("DESCRIBE item");
    $statement -> execute();
    $columns = $statement -> fetchAll(PDO::FETCH_COLUMN);
    $column_exists = in_array($column, $columns);
    return $column_exists;
}

function table_exists($table, $pdo) {
    $statement = $pdo -> prepare("SHOW TABLES");
    $statement -> execute();
    $tables = $statement -> fetchAll();
    $table_exists = in_array($table, $tables);
    return $table_exists;
}

If the property is not found as a column in the 'item' table or as a table name, then an exception is thrown.

The prepared statement my code constructs would look like:

$sql = "SELECT * FROM item WHERE cost = :cost
AND id IN (SELECT item_id FROM item_color 
WHERE color_id IN (SELECT id FROM color WHERE val = :color));";

And would be executed like so,

$statement = $pdo -> prepare($sql);
$statement -> execute(Array(":cost" => $cost, ":color" => $color));

What I want to know: Will I encounter major bottlenecks as the database grows and is accessed more frequently? Is my method of retrieval safe against 1st order sql injection attacks?

What I have done: I have read about rudimentary database design principals and basic sql injection attack/defense methods. I've tried to read about dynamically creating prepared statements, but the material I've found isn't what I'm looking for. I have tested my design against a basic Bobby Tables attack.

Why this question is relevant: The design principles I've read warn against trying to make database too flexible Those new to the field have no way to gauge how flexible is too flexible and could benefit from an analysis of this example. Also, it seems that prepared statements are only intended to be used as static templates. If I find this dynamic way to constructing them to be enticing then other newbies probably will too, so we need to know if we're creating a great security vulnerability. Lastly, I've asked these question together because the database design and the structure of the queries to be run against it are directly related.

Details: php v5.6.17 mysql v5.6.35

Community
  • 1
  • 1
Hatch
  • 33
  • 1
  • 6
  • I think a problem could be that you could access the whole database by guessing tables and columns. – LKKP4ThX Oct 18 '17 at 21:31
  • What kind of problem? Do you think this will create a bottleneck down the road? I really don't know how much overhead the checks are adding. – Hatch Oct 18 '17 at 21:36
  • No, just from the security side. What if anyone tries database user, column password? – LKKP4ThX Oct 18 '17 at 21:41
  • There is no basic design problem with adding (some) properties. "Too flexible"-problems can arise if you e.g. overdue it or e.g. use an [EAV-design](https://stackoverflow.com/q/2224234/6248528). You are trying to have a flexible layer between the database and the app, and the main design issue seems to be that you are mixing app and that layer. Luckily for you, people already made "frameworks" to abstract database access (which also allow flexible columns), see e.g. [here](https://stackoverflow.com/q/9007110/6248528). You will still have to do some work, but it should tackle all your concerns. – Solarflare Oct 19 '17 at 11:03
  • @LKKP4ThX I misunderstood your comment. You'll see in the example query above that "SELECT * FROM item" is hard coded, so I think I'm safe unless I leave an opening for an injection attack. – Hatch Oct 19 '17 at 15:33
  • @Solarflare Thank you for the suggestion. I'm really bad about coding first and checking for preexisting solutions later. I'm not sure that an ORM is right for me, but I'll keep them in mind. – Hatch Oct 19 '17 at 15:42

1 Answers1

0

My first reaction to your SQL is that you really need to learn how to use JOIN in SQL. The join operation is really fundamental to SQL and relational data. Using only subqueries in lieu of JOIN is like using another programming language, but refusing to use a while() loop. Sure, you can do it, but why?

$sql = "SELECT * FROM item WHERE cost = :cost
AND id IN (SELECT item_id FROM item_color 
WHERE color_id IN (SELECT id FROM color WHERE val = :color));";

Should be

$sql = "
 SELECT i.id, i.name, i.cost, c.color 
 FROM item AS i
 INNER JOIN item_color AS ic ON i.id = ic.item_id
 INNER JOIN color AS c ON c.id = ic.color_id
 WHERE i.cost = :cost AND c.val = :color";

Any reference or tutorial on SQL covers joins.

As for your question about safety, yes—using query parameters is safe with respect to SQL Injection. By making your base query hard-coded and separating the dynamic parts into parameters, you eliminate any chance for unsafe data to change the parsing of your SQL query.

You might like my presentation SQL Injection Myths and Fallacies (video: https://www.youtube.com/watch?v=VldxqTejybk).

Your requirements make me think you'd be better off using a document database like MongoDB, where you can add attributes to any document. It doesn't mean you don't still have to be careful about database design, but it gives you the opportunity to add attributes after design more easily.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank you for the great example. That is a much more readable query. I have to use mysql. Do you know of any resources that may aid me in constructing an access layer for this database design? – Hatch Oct 19 '17 at 18:22
  • You might try reading about the JSON data type in MySQL 5.7: https://dev.mysql.com/doc/refman/5.7/en/json.html – Bill Karwin Oct 19 '17 at 18:40
  • You can still store any JSON in a TEXT blob in MySQL 5.6, but it won't store as compactly as the 5.7 data type, you don't get all the functions to play with it, and you can't index it with virtual columns. – Bill Karwin Oct 19 '17 at 19:04
  • You might also like to read my presentation [Extensible Data Modeling with MySQL](https://www.slideshare.net/billkarwin/extensible-data-modeling). – Bill Karwin Oct 19 '17 at 19:04
  • I see. Adding the serialized lob approach into the mix will make retrieving a specific item MUCH easier. – Hatch Oct 19 '17 at 19:44