
This question is a follow-up to a previous question I asked about how best to model different kinds of time quantities and timeframes: In a database, how to store event occurrence dates and timeframes for fast/elegant querying?

Given a table of events, I'd like the simplest way to model and query events that have these kinds of occurrences:

  • One-time: XY Rock band has a show on Dec. 12, 2014 at the Rockhouse
  • Annually: Volunteer at the soup kitchen on Thanksgiving morning
  • Monthly: Free night at the MoMA every first Saturday
  • Weekly: Regular business hours

I've been kicking around doing a schema in this form:

  • Name
  • Description
  • start_datetime
  • end_datetime
  • frequency_type (string, e.g. 'Weekly', 'Monthly')
  • mon (boolean)
  • tues
  • wed
  • thu
  • fri
  • sat
  • sun (all booleans)
  • schedule (text)
  • frequency_description (text)
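
The single-table schema above could be sketched like this in SQLite (the column names come from the list; the types and defaults are my assumptions):

```python
import sqlite3

# In-memory database for illustration; a real deployment would use a file or server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        description TEXT,
        start_datetime TEXT,          -- ISO-8601 strings; SQLite has no native datetime type
        end_datetime TEXT,
        frequency_type TEXT,          -- e.g. 'Once', 'Weekly', 'Monthly'
        mon INTEGER DEFAULT 0,        -- booleans stored as 0/1 integers
        tues INTEGER DEFAULT 0,
        wed INTEGER DEFAULT 0,
        thu INTEGER DEFAULT 0,
        fri INTEGER DEFAULT 0,
        sat INTEGER DEFAULT 0,
        sun INTEGER DEFAULT 0,
        schedule TEXT,                -- non-normalized key/value store
        frequency_description TEXT
    )
""")
```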

A common use case I foresee is that on a given Tuesday, say 2016-04-05, I want to find everything happening that day, including all businesses that are open on regular Tuesdays, anything that happens monthly on a Tuesday, and anything happening on that specific date.

So the pseudocode query would be something like:

SELECT * FROM events WHERE tues = TRUE OR DATE(start_datetime) = '2016-04-05';

At the application/controller level I could apply the necessary logic to exclude all "monthly" Tuesday events that don't happen on the first Tuesday, using a key-store in frequency_description (for discussion's sake I'm ignoring the "annual" edge case in which something happens every fourth Thursday of November or the like). It'd be nice to do that exclusion in the query itself, but I'm not sure how to design the table to allow that and still keep a simple SELECT.
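
That application-level exclusion could be sketched like this, assuming (my convention, for illustration only) that frequency_description stores a simple ordinal word such as 'first':

```python
from datetime import date

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4}

def weekday_ordinal(d: date) -> int:
    """Which occurrence of its weekday this date is within its month
    (e.g. 2016-04-05 is the 1st Tuesday of April 2016)."""
    return (d.day - 1) // 7 + 1

def occurs_on(frequency_type: str, frequency_description: str, d: date) -> bool:
    """Post-filter for 'Monthly' rows returned by the broad SELECT."""
    if frequency_type != "Monthly":
        return True  # weekly/one-time rows were already matched by the SQL
    wanted = ORDINALS.get(frequency_description)
    return wanted is not None and weekday_ordinal(d) == wanted

# 2016-04-05 is the first Tuesday of April, so a 'first'-Tuesday event matches:
print(occurs_on("Monthly", "first", date(2016, 4, 5)))   # True
print(occurs_on("Monthly", "second", date(2016, 4, 5)))  # False
```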

I'm also predicting that I won't need a query that finds all businesses open on Tuesday at 9 AM, so the individual day fields can just be space-efficient booleans, with the schedule field acting as a non-normalized data store for my specific information. The application will have logic to parse and format it for display.

Is this overkill? Say 70% of my events will be one-time, which eliminates the need for mon, tues, wed, etc., and for the schedule and frequency_description text key-stores...

Should I instead have two tables? One for events, and one for some kind of event_relation in which the day_fields and key-store-textfields are joined?

That seems like a more efficient use of space...on the other hand, my query would have to be a SELECT and JOIN...which may be slower.

When dealing with record counts on the order of 10k to 100k, hosted on a simple EC2 instance, should I care more about efficient space usage in my database (not just raw data storage, but all the overhead associated with text fields and numerous columns), or should I care more about simple SELECT statements?

– Zando

1 Answer


You could just have your recurring events insert rows into the 'one-off' event table, with a key referencing back to the master recurring event record (in a separate table).

While it's not great for space usage, you can take shortcuts: for an event that occurs "every Tuesday from now until the end of time", the end date might default to, say, 200 years in the future. Even in that extreme case you're only populating about 10k records (52 × 200 = 10,400).

This would simplify your reads greatly: you would just look for any occurrence that falls on the given date, and then apply all your exclusions based on the master recurring event record.

So you have something like this:

Events table = Your current schema
Event occurrence table = {event_id, start_datetime, end_datetime}

Suppose you have 1,000 weekly recurring events (and assume the 200-year cap when no end date is given): that's roughly 10M records. Index the start_datetime field of the event occurrence table and your query will be very quick, even with many more records than this. Compare the costs of this approach (slower writes and more space used) against having to find every event where today falls between start date and end date, and then calculating whether the event actually occurs today.
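
A sketch of expanding one weekly recurring event into occurrence rows and then querying by date (the table and column names follow the outline above; the 200-year cap is shortened to one year for illustration):

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_occurrences (event_id INTEGER, start_datetime TEXT, end_datetime TEXT)"
)
# The index that makes the date lookup fast even at ~10M rows:
conn.execute("CREATE INDEX idx_occ_start ON event_occurrences (start_datetime)")

def expand_weekly(event_id, first_date, weeks):
    """Materialize one row per weekly occurrence, up to a capped horizon."""
    rows = [
        (event_id, (first_date + timedelta(weeks=w)).isoformat(), None)
        for w in range(weeks)
    ]
    conn.executemany("INSERT INTO event_occurrences VALUES (?, ?, ?)", rows)

# Every Tuesday starting 2016-04-05, capped at 52 occurrences (one year):
expand_weekly(1, date(2016, 4, 5), 52)

# The read side is now a plain indexed equality lookup:
hits = conn.execute(
    "SELECT event_id FROM event_occurrences WHERE start_datetime = ?",
    (date(2016, 4, 12).isoformat(),),
).fetchall()
print(hits)  # [(1,)]
```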

In the end it all comes down to:

  • 'how much does space cost you?'
  • 'how often are you going to update records (and do you want to update all records, including historical ones, for an event)?'
  • 'how often are you going to run a select on a specific date?' (likely very often)
– Seph
  • That's an interesting perspective, though it seems to have additional risk in unsynced data if the main event record changes and all the linked occurrence data is not successfully updated to reflect that. However, it does make me realize I have to think more about the use-case...the application won't be relying heavily on user-input, i.e. frequent UPDATE/INSERT statements. But the admin will routinely be pushing in updates and inserts. – Zando Jan 17 '12 at 20:34
    @Zando To make your updates flow through, the easiest way to do it is via a trigger on the Events table, that way it will cascade through every time the events table is updated at all. You even get to test which fields have changed, just remember that more than one event might be updated in the same update command (eg: `UPDATE Events SET Active = 0` with no condition). – Seph Jan 18 '12 at 05:55
  • A simple check would be to test if anything has changed that affects when the event occurs (event frequency, start/end times etc), if not then don't do anything, if it has then delete all events in the `event occurrence` table and recalculate all the occurrences and write them out. Since as you say only an Admin will be changing them you can expect them to be more patient if the updates take a couple seconds to process. – Seph Jan 18 '12 at 06:09
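
The delete-and-recalculate approach from the last comment could be sketched at the application level like this (a trigger-based version would live in the database instead; the function name here is hypothetical):

```python
import sqlite3
from datetime import date, timedelta

def recompute_occurrences(conn, event_id, first_date, weeks):
    """Wipe and regenerate all occurrence rows for one recurring event.
    Run whenever a schedule-affecting field (frequency, start/end) changes."""
    with conn:  # one transaction, so readers never see a half-rebuilt schedule
        conn.execute("DELETE FROM event_occurrences WHERE event_id = ?", (event_id,))
        conn.executemany(
            "INSERT INTO event_occurrences VALUES (?, ?, ?)",
            [(event_id, (first_date + timedelta(weeks=w)).isoformat(), None)
             for w in range(weeks)],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_occurrences (event_id INTEGER, start_datetime TEXT, end_datetime TEXT)"
)

recompute_occurrences(conn, 1, date(2016, 4, 5), 52)  # initial schedule: 52 Tuesdays
recompute_occurrences(conn, 1, date(2016, 4, 6), 52)  # admin moves the event to Wednesdays
count = conn.execute("SELECT COUNT(*) FROM event_occurrences").fetchone()[0]
print(count)  # 52 — the old Tuesday rows were replaced, not duplicated
```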