
I have a DTD file which describes what columns my table should have.

The problem is, it gives no info on what data type I should use for the columns, i.e. whether INT, VARCHAR, or TEXT, and no info on the max length of the columns. In most places it says #PCDATA, which I believe simply means mixed data.

Is there a way for me to find out what data types and max lengths I should use, or should I simply make a table full of VARCHAR(255)s?

Ali

1 Answer

SGML is (in)famously lacking a type system, so there is no mechanized way to infer the correct type for any sort of element. Note that #PCDATA doesn't mean "mixed data", but "parsed character data" -- an element with content #PCDATA mustn't contain any other elements, but it can contain entity references (and in SGML it is subject to inclusion/exclusion exceptions, but those are not present in XML). "Mixed content" is something like (element1 | #PCDATA), which would be a lot harder to translate into a database schema.

Your best bet is to deduce the content type from the element type names or from helpful comments in the DTD, and/or to inspect a series of documents to observe their usage patterns.

Kerrek SB
  • I can figure out the content type but not the max lengths. Do you suppose it'll have to be varchar 255s for all non-numeric fields? – Ali Jul 31 '11 at 13:10
  • The DTD doesn't put any restrictions on the length, so a valid document may contain any amount of content. If it's really a problem and you cannot preprocess your document with an SGML parser, you could use the `TEXT` column type, which is slower but can deal with arbitrary text data. – Kerrek SB Jul 31 '11 at 13:12
  • Thanks for the help, can you tell me what an SGML parser is and what it will do? Will it help me in defining the schema for my MySQL table that has to be populated by data from XML files that follow this DTD file? – Ali Jul 31 '11 at 14:30
  • An SGML parser, or an XML parser, deserializes your file into memory, so you'll need one at some point to insert the data into the right place. I'm not aware of any parser that creates a database schema, though it's entirely possible that one exists (search the internet for "xml-based database" or something like that). My suggestion would be to do a preliminary parse of your data and track the maximum element lengths to see what you're up against. – Kerrek SB Jul 31 '11 at 14:34
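
The preliminary parse suggested in the last comment could look roughly like the following. This is only a minimal sketch, assuming the input files are well-formed XML and that Python's standard `xml.etree.ElementTree` is acceptable; the function name `profile_elements` and the shape of the returned dictionary are made up for illustration. It records, per element name, the longest text value seen and whether every value looked like an integer.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def profile_elements(xml_sources):
    """Scan XML documents and summarize the text content of each element.

    xml_sources: iterable of file paths or binary file objects.
    Returns {tag: {"max_len": int, "all_int": bool}} for every element
    that had non-empty text in at least one document.
    """
    stats = defaultdict(lambda: {"max_len": 0, "all_int": True})
    for source in xml_sources:
        for elem in ET.parse(source).iter():
            # elem.text is the text before the first child element; for
            # #PCDATA-only elements that is the whole content.
            text = (elem.text or "").strip()
            if not text:
                continue  # container elements and empty values are skipped
            entry = stats[elem.tag]
            # Length is counted in characters, not bytes.
            entry["max_len"] = max(entry["max_len"], len(text))
            # Rough integer check: optional sign followed by digits only.
            if not text.lstrip("+-").isdigit():
                entry["all_int"] = False
    return dict(stats)
```

The result is only a hint based on the documents sampled: since the DTD places no limit on length, a later document can always exceed the observed maximum, so it is worth leaving headroom or falling back to `TEXT` for free-form fields.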