Mutable vs Immutable datastructures - Serialization vs Performance

In my last post, I was playing around with methods to serialize Clojure data structures, especially a complex record that contains a number of other records and refs. Chas Emerick and others mentioned in the comments there, that putting a ref inside a record is probably a bad idea – and I agree in principle. But this brings me to a dilemma.

Lets assume I have a complex record that contains a number of "sub" records that need to be modified during a program's execution time. One scenario this could happen in is a record called "Table", that contains a "Row" which is updated (Think database tables and rows). Now this can be implemented in two ways,

  • Mutable data structures – In this case, I would put each row inside a table as a ref, and when the need to update happens, just fine the row ID and use a dosync – alter to do any modifications needed.
    • The advantage is that all data is being written to in place, and would be rather efficient.
    • The disadvantage however, is that when serializing such a record full of refs, I would have to build a function that would traverse the entire data structure and then serialize each ref by dereferencing it and then writing to a file. Similarly, I'd have to reconstruct the data structure when de-serializing from a file.
{:filename "tab1name", :tuples #<ref :field="" :tupdesc="" nil="">}, :tup #<ref :name="">} {:recordid nil, :tupdesc {:x #<ref :field="">}, :tup #<ref :name="">}}>, :tupledesc {:x #<ref :field="">}} </ref></ref></ref></ref></ref>
  • Immutable data structures – This case involves putting a ref around the entire table data structure, implying that all data within the table would remain immutable. In order to update any row within the table, any function would return a new copy of the table data structure with the only change being the modification. This could then overwrite the existing in-memory data structure, and then be propagated to the disk as and when changes are committed.
    • The advantage here is that having just one ref makes it very simple to serialize – simply de-ref the table, and then write the entire thing to a binary file.
    • The disadvantage here is that each row change would make it necessary to return a new "table", and writing just the "diff" of the data to disk would be hard to do.
#<ref :field="" :name="" :tup="" :tupdesc="" :tupledesc="" :tuples="" nil=""> <p> So at this point, which method would you recommend?</p> <p></p> </ref>

--

If you have any questions or thoughts, don't hesitate to reach out. You can find me as @viksit on Twitter.