In the last couple of years I've worked on projects using NoSQL databases in the publishing industry. As a programmer, and as a person who cut his teeth designing SQL databases, I strive to be DRY.

In document-centric databases, DRY seems to be eschewed; it may even be detrimental to performance and scalability. Certainly, this is the belief of my colleagues, who have worked with and even for some of the NoSQL vendors. They should know.

Still, I struggle to make the mental leap, because I find it hard to swallow that DRY and NoSQL are immiscible. So many things in life begin with a push too far one way and then settle with a compromise that works best.

Data is so often repeated, and I see integrity problems all the time. The attitude among my programmers and BAs is to embrace it; it's life. Consuming services must deal with it, or it's the upstream team's problem.

I wonder why documents aren't composed of many small sub-documents that are referenced and stitched together at read time, like a view, I guess.
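To make the idea concrete, here is a minimal sketch of stitching a referenced sub-document into its parent at read time. The in-memory Maps, the field names (authorId, author) and the readArticle helper are all hypothetical stand-ins for illustration, not any particular database's API.

```javascript
// Hypothetical in-memory stores standing in for two collections.
const authorStore = new Map([
  ['a1', { _id: 'a1', name: 'Jane Doe' }],
]);
const articleStore = new Map([
  ['d1', { _id: 'd1', title: 'On DRY', authorId: 'a1' }],
]);

// Resolve the reference at read time, so the author is stored only once.
function readArticle(id) {
  const article = articleStore.get(id);
  if (!article) return undefined;
  return { ...article, author: authorStore.get(article.authorId) };
}

// A single update to the author is visible in every stitched view.
authorStore.get('a1').name = 'Jane Smith';
const stitched = readArticle('d1');
console.log(stitched.author.name);
```

The cost of this design is an extra lookup on every read, which is the trade-off the NoSQL camp tends to reject.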

Data that's 'printed' into each document can only be updated by writing custom 'find and replace'-like operations that must work across terabytes. These features add cost but have no explicit need that would justify an Agile story, so they don't get built, and the data gets more inconsistent.
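As a rough illustration of such a 'find and replace' operation, the sketch below propagates a changed value into every document that carries a copy of it. The document shape and the propagateAuthorName helper are hypothetical; a real system would run something like this as a batch job over the whole collection.

```javascript
// Hypothetical documents, each carrying its own copy of the author's name.
const denormalizedDocs = [
  { _id: 'd1', title: 'On DRY', author: { id: 'a1', name: 'Jane Doe' } },
  { _id: 'd2', title: 'On NoSQL', author: { id: 'a1', name: 'Jane Doe' } },
  { _id: 'd3', title: 'Other', author: { id: 'a2', name: 'Sam Roe' } },
];

// Scan every document and rewrite each stale copy; returns how many changed.
function propagateAuthorName(collection, authorId, newName) {
  let touched = 0;
  for (const doc of collection) {
    if (doc.author.id === authorId && doc.author.name !== newName) {
      doc.author.name = newName;
      touched += 1;
    }
  }
  return touched;
}

const touchedCount = propagateAuthorName(denormalizedDocs, 'a1', 'Jane Smith');
console.log(touchedCount);
```

Note that the scan visits every document regardless of how many actually hold the value, which is why this gets expensive at terabyte scale.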

The problem with breaking a document into sub-documents is that the native database search stops working, since it knows nothing about the document-sub-document relationship. So search would have to be a custom process that searches sub-documents and then collates the foreign keys of the main docs to which they're linked.
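A minimal sketch of that two-step search, assuming sub-documents and main documents live in separate collections linked by an id. The arrays, the field names and the searchArticlesByAuthorName helper are hypothetical; in practice each step would be a separate query against the database.

```javascript
// Hypothetical sub-documents and the main documents that reference them.
const authorDocs = [
  { _id: 'a1', name: 'Jane Doe' },
  { _id: 'a2', name: 'Sam Roe' },
];
const articleDocs = [
  { _id: 'd1', title: 'On DRY', authorId: 'a1' },
  { _id: 'd2', title: 'On NoSQL', authorId: 'a1' },
  { _id: 'd3', title: 'Other', authorId: 'a2' },
];

function searchArticlesByAuthorName(term) {
  const needle = term.toLowerCase();
  // Step 1: search the sub-documents and collect their ids.
  const matchingIds = new Set(
    authorDocs
      .filter((author) => author.name.toLowerCase().includes(needle))
      .map((author) => author._id)
  );
  // Step 2: collate the main documents that link to those ids.
  return articleDocs.filter((article) => matchingIds.has(article.authorId));
}

const hits = searchArticlesByAuthorName('jane');
console.log(hits.map((article) => article._id));
```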

Is there a middle ground or is it really a case of accepting the trade-off? It's actually not easy to find much discussion on this issue.

Luke

Luke Puplett
  • I found this question when googling for the exact same thing ("nosql dry"), as I'm just starting to read about document databases now (coming from a relational database background), and this is one main thing that I'm wary of. It seems a huge waste of storage space in so many applications. I'm surprised you had no answers / comments on this question! Did you find out anything more? – Dan Aug 22 '14 at 20:12
  • None so far, Dan. I find this topic immediately striking upon working with NoSQL, so I'm as baffled as you as to why we appear to be alone! [echoes] – Luke Puplett Aug 26 '14 at 10:21
  • possible duplicate of [Schema-less design guidelines for Google App Engine Datastore and other NoSQL DBs](http://stackoverflow.com/questions/2653074/schema-less-design-guidelines-for-google-app-engine-datastore-and-other-nosql-db) – Paul Sweatte Jun 12 '15 at 23:58
  • Found this question while googling the same topic. Still no takers – AJ. Mar 25 '16 at 02:00
  • Great question, almost a year on from @AJ.'s comment and no answer. Emperor's new clothes. – Vanquished Wombat Jan 05 '17 at 15:04

1 Answer

Not sure which platform you are on, but have you looked into putting a layer between the DB and the client?

For example, I like the architecture of feathersjs, which lets you run hooks before and after DB queries. In the context of documents and sub-documents, look into the included hooks populate and dePopulate.

Here is the documentation example of a 1:1 relationship:

// users like { _id: '111', name: 'John', roleId: '555' }
// roles like { _id: '555', permissions: ['foo', 'bar'] }
import { populate } from 'feathers-hooks-common';

const userRoleSchema = {
  include: {
    service: 'roles',
    nameAs: 'role',
    parentField: 'roleId',
    childField: '_id'
  }
};

app.service('users').hooks({
  after: {
    all: populate({ schema: userRoleSchema })
  }
});

// result like
// { _id: '111', name: 'John', roleId: '555',
//   role: { _id: '555', permissions: ['foo', 'bar'] } }

PS: Feathers supports many databases, including SQL databases.

arve0