RE: Soft Deletion Probably Isn’t Worth It

a reply to Brandur’s post about soft deletion in databases

Tue, 19 Jul 2022

I’ve worked on entirely too many CRUD applications at my day job, so the question of how to get rid of data comes up all too frequently. Unfortunately, the answer is often to add an IsDeleted flag and a DeletedOn timestamp to each table. Sometimes we even purge rows flagged for deletion with a periodic batch process.
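For readers who haven’t run into it, the pattern looks roughly like this. This is a minimal sketch using an in-memory SQLite database; the `users` table and its columns are hypothetical, and production schemas would use a proper timestamp type and partial indexes.

```python
import sqlite3

# In-memory database as a stand-in; the "users" table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,
        email     TEXT NOT NULL,
        IsDeleted INTEGER NOT NULL DEFAULT 0,
        DeletedOn TEXT
    )
""")
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")

# "Deleting" a user just sets the flag and the timestamp.
conn.execute(
    "UPDATE users SET IsDeleted = 1, DeletedOn = datetime('now') WHERE id = ?",
    (1,),
)

# Every live query must now remember to filter on the flag...
live = conn.execute(
    "SELECT count(*) FROM users WHERE IsDeleted = 0"
).fetchone()[0]

# ...and a batch job purges flagged rows past some retention window.
conn.execute(
    "DELETE FROM users WHERE IsDeleted = 1 "
    "AND DeletedOn < datetime('now', '-90 days')"
)
```

The hidden cost is in that “every live query” comment: forget the filter once and deleted data leaks back into the application.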

Brandur has a different approach. He goes into detail about why soft-delete is a bad idea, and gets into the headaches that come with soft-deleting normalized data and dealing with foreign keys, but I’m not sure his alternative approach is much better:

Although I’ve never seen an undelete work in practice, soft deletion wasn’t completely useless because we would occasionally use it to refer to deleted data – usually a manual process where someone wanted to see a deleted object for purposes of assisting with a support ticket or trying to squash a bug.

And while I’d argue against the traditional soft deletion pattern due to the downsides listed above, luckily there’s a compromise.

Instead of keeping deleted data in the same tables from which it was deleted, there can be a new relation specifically for storing all deleted data, and with a flexible jsonb column so that it can capture the properties of any other table.

This “deleted records” table still stores old data inside the application’s primary database, which generally conforms to the OLTP model, and still allows the retention of data flagged for deletion. It also still requires a batch process for periodic purging.
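The compromise can be sketched as follows. This is a minimal illustration, again using SQLite, where a plain TEXT column holding JSON stands in for Postgres’s jsonb; the `delete_user` helper and the table names are my own inventions, not Brandur’s.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL
    );
    -- One relation for all deleted rows; "data" stands in for a jsonb column.
    CREATE TABLE deleted_records (
        id           INTEGER PRIMARY KEY,
        deleted_at   TEXT NOT NULL DEFAULT (datetime('now')),
        source_table TEXT NOT NULL,
        source_id    INTEGER NOT NULL,
        data         TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users (id, email) VALUES (1, 'alice@example.com')")

def delete_user(conn, user_id):
    """Copy the row into deleted_records as JSON, then hard-delete it."""
    row = conn.execute(
        "SELECT id, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO deleted_records (source_table, source_id, data) "
        "VALUES (?, ?, ?)",
        ("users", row[0], json.dumps({"id": row[0], "email": row[1]})),
    )
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))

delete_user(conn, 1)

# Support can still inspect the record, at the cost of deserializing JSON.
payload = json.loads(
    conn.execute(
        "SELECT data FROM deleted_records "
        "WHERE source_table = 'users' AND source_id = 1"
    ).fetchone()[0]
)
```

Live tables stay clean of flags and filters, but anyone digging a record back out has to reconstruct it from JSON by hand.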

But, as Brandur admits, it’s a compromise. It might not be feasible to archive this data in a separate database, data warehouse, or data lake for economic or regulatory reasons. And it would make it a bit harder for Facebook to re-surface embarrassing photos that you were sure you had deleted once you had slept it off.

For my part, I’d rather outright delete records unless there is an explicit legal or regulatory requirement for data retention. As long as there’s a reasonable schedule of full and incremental backups, along with procedures for restoring backups to a staging database and copying records from staging back to production, it shouldn’t be that hard to restore a mistakenly deleted record. It can’t be much worse than pulling a record out of the “deleted records” table and de-serializing JSON.

However, since I’m just an individual contributor, it isn’t always my call. Unfortunately, “Don’t blame me, man; I just work here” didn’t fly at Nuremberg, and I doubt it would do me much good either if I got stuck holding the bag after implementing a bad design despite my objections.

You see, dark patterns don’t magically implement themselves. Sure, you can refuse, but more often than not your objection will be noted as a reason to fire you and make room in the payroll for somebody possessed of greater ethical flexibility. Thus the use of an archive table as Brandur suggests is not only a technical compromise, but an ethical one. After all, the data isn’t immediately accessible, especially if you serialize it to JSON and then encrypt it before inserting it into the database.