0

I have a 25GB's text file with that structure(headers):

Sample Name Allele1 Allele2 Code metaInfo...

So it just one table with a few millions of records. I need to put it to database coz sometimes I need to search that file looking, for example, specific sample. Then I need to get all row and equals to file. This would be a basic application. What is important? File is constant. It no needed put function coz all samples are finished.

My question is:

Which DB will be better in this case and why? Should I put a file in SQL base or maybe MongoDB would be a better idea. I need to learn one of them and I want to pick the best way. Could someone give advice, coz I didn't find in the internet anything particular.

martin
  • 1,145
  • 1
  • 7
  • 24

2 Answers2

2

Your question is a bit broad, but assuming your 25GB text file in fact has a regular structure, with each line having the same number (and data type) of columns, then you might want to host this data in a SQL relational database. The reason for choosing SQL over a NoSQL solution is that the former tool is well suited for working with data having a well defined structure. In addition, if you ever need to relate your 25GB table to other tables, SQL has a bunch of tools at its disposal to make that fast, such as indices.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • 1
    There are caveats to what I said in my answer. If you're looking at 1TB files, then SQL starts to get really expensive. I am also assuming that you would want to query your data, e.g., using any of the columns, and to be able to do that quickly. – Tim Biegeleisen Jul 08 '19 at 05:15
  • Okay, I get it. A valuable remark. So in this particular case I should use SQL, but I should do it with my head. And NoSQL may also be useful in the future – martin Jul 08 '19 at 05:17
  • A general use case where you'd want to use NoSQL would be very large data sets (SQL gets too expensive) and also unstructured data (not must point to a relational database). Gear your choice using these benchmarks. – Tim Biegeleisen Jul 08 '19 at 05:19
1

Both MySQL and MongoDB are equally good for your use-case, as you only want read-only operations on a single collection/table. For comparison refer to MySQL vs MongoDB 1000 reads

But I will suggest going for MongoDB because of its aggeration pipeline. Though your current use case is very much straight forward, in future you may need to go for complex operations. In that case, MongoDB's aggregation pipeline will come very handy.

Rajat Goel
  • 2,207
  • 17
  • 35