The love of Data, Database Engineering, Architecture, Entrepreneurship, and other assorted bits
16 September 2014
It’s a pretty common question these days: what is the real-world difference in storage footprint between the various compression schemes in TokuMX, and how do they compare to MongoDB? So I thought I would do a quick comparison and post the results.
It should be noted that these are purely storage footprint tests. This is not a comparison of the runtime performance of each option. Each compression setting comes with a set of tradeoffs that I will try to enumerate in a follow-up post.
If you aren’t already familiar with TokuMX, it’s a fork of MongoDB with a completely retooled storage subsystem that takes advantage of Tokutek’s Fractal Tree index technology. In a TokuMX instance, the collection itself is actually a Fractal Tree index on _id.
Yes, that means it’s similar to an index-organized table or a clustered index. One notable component of the TokuMX storage layer is compression. Each collection, including the oplog, may have its own compression scheme.
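As a rough sketch of what per-collection compression can look like in practice, the helper below creates one collection per compression method so that their footprints can be compared side by side. The helper name `createCompressedCollections` and the collection naming scheme are hypothetical. The `compression` option on `createCollection`, and values such as `none`, `quicklz`, `zlib`, and `lzma`, are assumptions based on the TokuMX documentation and should be checked against your version.

```javascript
// Hypothetical helper: create one collection per compression method.
// Assumes a TokuMX-style `db` handle whose createCollection() accepts a
// `compression` option (an assumption; verify against your TokuMX release).
function createCompressedCollections(db, baseName, methods) {
    var created = [];
    for (var i = 0; i < methods.length; i++) {
        var name = baseName + "_" + methods[i];
        db.createCollection(name, { compression: methods[i] });
        created.push(name);
    }
    return created;
}

// In the shell, this might be called as:
// createCompressedCollections(db, "ocean_data", ["none", "quicklz", "zlib", "lzma"]);
```

Loading the same documents into each resulting collection would then let you compare the footprint of each scheme on identical data.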
Let’s jump right to the results; they speak for themselves.
du -k
// TokuMX
db.ocean_data.stats()['storageSize'] +
db.ocean_data.stats()['totalIndexStorageSize']
// MongoDB
db.ocean_data.stats()['storageSize']
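The two expressions above can be wrapped in a small function so the same calculation is applied to every test collection. This is only a sketch: the function names `storageFootprintBytes` and `toKilobytes` and the `engine` argument are hypothetical, and the field names are simply the ones used in the expressions above.

```javascript
// Hypothetical helper: compute the storage footprint in bytes from the
// document returned by db.<collection>.stats().
// For TokuMX, the collection and index storage sizes are summed, as in the
// expressions above; for MongoDB, only storageSize is used.
function storageFootprintBytes(stats, engine) {
    if (engine === "tokumx") {
        return stats.storageSize + stats.totalIndexStorageSize;
    }
    return stats.storageSize;
}

// Convert bytes to whole kilobytes (rounded up), for rough comparison
// with the kilobyte figures reported by `du -k`.
function toKilobytes(bytes) {
    return Math.ceil(bytes / 1024);
}

// In the shell, this might be called as:
// toKilobytes(storageFootprintBytes(db.ocean_data.stats(), "tokumx"));
```

Keeping the engine-specific logic in one place makes it harder to accidentally compare a TokuMX total that includes indexes against a differently computed MongoDB figure.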
The dataset for this test is some of the NOAA sample data I posted about previously. It’s about 500,000 documents with just one secondary index. Each document has the following structure:
db.ocean_data.findOne()
{
    "_id" : ObjectId("53e4fc2a2239c2398fd45521"),
    "station_id" : 9440910,
    "loc" : {
        "type" : "Point",
        "coordinates" : [
            -123.9669,
            46.7075
        ]
    },
    "name" : "Toke Point",
    "lon" : "-123.9669",
    "products" : [
        {
            "v" : 61,
            "t" : ISODate("2014-08-08T16:24:00Z"),
            "name" : "air_temperature",
            "f" : "0,0,0"
        },
        {
            "d" : "295.00",
            "g" : "13.80",
            "f" : "0,0",
            "s" : "9.91",
            "t" : ISODate("2014-08-08T16:24:00Z"),
            "dr" : "WNW",
            "name" : "wind"
        },
        {
            "v" : 1020.8,
            "t" : ISODate("2014-08-08T16:24:00Z"),
            "name" : "air_pressure",
            "f" : "0,0,0"
        }
    ],
    "lat" : "46.7075",
    "fetch_date" : ISODate("2014-08-08T16:34:47.714Z"),
    "id" : 9440910
}
db.ocean_data.getIndexes()
[
    {
        "key" : {
            "_id" : 1
        },
        "unique" : true,
        "ns" : "kg_spacetest.ocean_data",
        "name" : "_id_",
        "clustering" : true
    },
    {
        "key" : {
            "station_id" : -1
        },
        "ns" : "kg_spacetest.ocean_data",
        "name" : "station_id_-1"
    }
]
Here are the settings and raw output from the test.