In the last couple of years I've worked on projects using NoSQL databases in the publishing industry. As a programmer, and as a person who cut his teeth designing SQL databases, I strive to be DRY.
In document-centric databases, DRY seems to be eschewed; it may even be detrimental to performance and scalability. Certainly, this is the belief of my colleagues, who have worked with and even for some of the NoSQL vendors. They should know.
Still, I struggle to make the mental leap, because I find it hard to swallow that DRY and NoSQL are immiscible. So many things in life begin with a push too far one way and then settle with a compromise that works best.
Data is so often repeated, and I see integrity problems all the time. The attitude among my programmers and BAs is to embrace it: it's life. Consuming services must deal with it, or it's the upstream team's problem.
I wonder why documents aren't composed of many small sub-documents that are referenced and stitched together, like a view, I guess.
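To make the idea concrete, here is a minimal Python sketch of what I have in mind. It is not any vendor's API: the `$ref` key, the `SUB_DOCS` store, the `resolve` function and the sample data are all hypothetical, and a plain dict stands in for the database.

```python
# Hypothetical store of shared sub-documents, keyed by id.
# In a real system this would be a collection in the database.
SUB_DOCS = {
    "author:1": {"name": "A. Writer", "bio": "Writes things."},
    "publisher:9": {"name": "Example Press"},
}


def resolve(node, sub_docs):
    """Return a copy of `node` with every {"$ref": id} replaced by the
    sub-document it points to (resolved recursively)."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return resolve(sub_docs[node["$ref"]], sub_docs)
        return {key: resolve(value, sub_docs) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, sub_docs) for item in node]
    return node  # scalars are returned unchanged


# The main document stores references instead of copies.
article = {
    "title": "On DRY",
    "author": {"$ref": "author:1"},
    "publisher": {"$ref": "publisher:9"},
}

# The "view": a fully stitched document built at read time.
view = resolve(article, SUB_DOCS)
```

With this shape, correcting the author's name in `SUB_DOCS` once is enough: every document resolved afterwards picks up the change. The cost is extra lookups on every read.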
Data that's 'printed' into each document can only be updated by writing custom 'find and replace' style operations that must work across terabytes. Because these operations add cost and no Agile story explicitly calls for them, they don't get built, and the data gets more inconsistent.
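For illustration, the kind of 'find and replace' job I mean looks roughly like the sketch below. The document shape (an embedded `author` with an `id` and a `name`) and the function name `propagate_author_name` are assumptions of mine, and a Python list stands in for a collection that would really have to be scanned in batches.

```python
def propagate_author_name(docs, author_id, new_name):
    """Rewrite the embedded copy of an author's name in every document
    that carries it. Returns the number of documents changed."""
    changed = 0
    for doc in docs:
        author = doc.get("author", {})
        if author.get("id") == author_id and author.get("name") != new_name:
            author["name"] = new_name
            changed += 1
    return changed


# Hypothetical denormalized documents: the author is copied into each one.
docs = [
    {"title": "On DRY", "author": {"id": 1, "name": "A. Writer"}},
    {"title": "On NoSQL", "author": {"id": 1, "name": "A Writer"}},  # already drifted
    {"title": "Other", "author": {"id": 2, "name": "B. Editor"}},
]

updated = propagate_author_name(docs, author_id=1, new_name="A. N. Writer")
```

Every embedded field needs its own variant of this job, and each run touches every document that might hold a copy.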
The problem with breaking a document into sub-documents is that the native database search stops working, since it knows nothing about the relationship between a document and its sub-documents. So search would have to be a custom process that searches the sub-documents and then collates the foreign keys of the main docs to which they're linked.
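A rough sketch of that custom search, again in Python with dicts standing in for the database. The `refs` field, the `build_reverse_index` and `search` functions and the sample data are hypothetical, and a naive substring match stands in for the database's real search.

```python
# Hypothetical main documents that list the ids of their sub-documents.
MAIN_DOCS = {
    "article:1": {"title": "On DRY", "refs": ["author:1", "publisher:9"]},
    "article:2": {"title": "On NoSQL", "refs": ["author:2", "publisher:9"]},
}
SUB_DOCS = {
    "author:1": {"name": "A. Writer"},
    "author:2": {"name": "B. Editor"},
    "publisher:9": {"name": "Example Press"},
}


def build_reverse_index(main_docs):
    """Map each sub-document id to the set of main documents that reference it."""
    index = {}
    for main_id, doc in main_docs.items():
        for sub_id in doc["refs"]:
            index.setdefault(sub_id, set()).add(main_id)
    return index


def search(term, main_docs, sub_docs):
    """Return the sorted ids of main documents that match `term` directly
    or through any sub-document they reference."""
    term = term.lower()
    reverse = build_reverse_index(main_docs)
    hits = set()
    # Direct matches on the main documents themselves.
    for main_id, doc in main_docs.items():
        if term in doc["title"].lower():
            hits.add(main_id)
    # Matches on sub-documents, collated back to their parent documents.
    for sub_id, sub in sub_docs.items():
        if any(term in str(value).lower() for value in sub.values()):
            hits |= reverse.get(sub_id, set())
    return sorted(hits)
```

For example, `search("example press", MAIN_DOCS, SUB_DOCS)` finds both articles through the shared publisher sub-document. The reverse index is the extra piece of state that native search doesn't maintain for you.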
Is there a middle ground or is it really a case of accepting the trade-off? It's actually not easy to find much discussion on this issue.
Luke