Compressing Data in Entity Framework Core

I recently had an issue where the size of the database for one of my projects was increasing, leading to increased storage requirements and associated costs with Azure SQL. This was a good problem to have because it was occurring because the project was getting used more. However, I investigated the data and how it was being used, and found that I had one table where I was storing a lot of data, but that data was not read very often. It also did not need to be searched. Within the application, it would occasionally be retrieved using an ID to obtain an entire row and then all the data from that row would be read at once.

I decided that the data within this table would be a good candidate for compression. I also decided it would be good to compress the data with code before being sent to SQL Server so that less data is being sent over the wire. Also, the App Service plan the application is running under has more processing power than the database.

I started off by writing two extension methods for a byte array which would compress (Zip) and uncompress (Unzip) the bytes. It would make use of the GZip algorithm already built into .NET.

public static class ByteExtensions
{
    /// <summary>
    /// Adapted from: https://stackoverflow.com/a/63049728
    /// </summary>
    public static byte[] Zip(this byte[] inBytes)
    {
        using (var memoryStreamIn = new MemoryStream(inBytes))
        {
            using (var memoryStreamOut = new MemoryStream())
            {
                using (var gZipStream = new GZipStream(memoryStreamOut, CompressionMode.Compress))
                {
                    memoryStreamIn.CopyTo(gZipStream);
                }

                return memoryStreamOut.ToArray();
            }
        }         
    }

    /// <summary>
    /// Adapted from: https://stackoverflow.com/a/63049728
    /// </summary>
    public static byte[] Unzip(this byte[] bytes)
    {
        using (var memoryStreamIn = new MemoryStream(bytes))
        {
            using (var memoryStreamOut = new MemoryStream())
            {
                using (var gs = new GZipStream(memoryStreamIn, CompressionMode.Decompress))
                {
                    gs.CopyTo(memoryStreamOut);
                }

                return memoryStreamOut.ToArray();
            }
        }          
    }
}

Within the ApplicationDbContext class, I created two private generic methods which make use of the extension methods to compress and uncompress entities before and after sending them to, or retrieving them from the database.

private string CompressEntity<T>(T entity)
{
    return Convert.ToBase64String(Encoding.UTF8.GetBytes(JsonSerializer.Serialize(entity)).Zip());
}

private T UncompressEntity<T>(string compressedEntity)
{
    return JsonSerializer.Deserialize<T>(Encoding.UTF8.GetString(Convert.FromBase64String(compressedEntity).Unzip()));
}

Then I modified the ModelBuilder for the entity to call these methods.

taskHistory
    .Property(th => th.Description)
    .HasConversion(
        d => CompressEntity(d),
        d => UncompressEntity<TaskDescription>(d));

All the data currently in the database was not yet compressed, so I had to write some code to go through the database and update all the entries. In the end it was worth it as my database shrunk by about 30%.

Leave a comment

Your email address will not be published.