LUC #56: How to Approach Scaling Your Database

Plus, how the most prominent Git branching strategies work, HTTPS workflow explained, and how do you decide which database type

This week’s issue brings you:


A big thank you to our partner Postman who keeps this newsletter free to the reader.

Did you know Postman just released a package library feature that enables reusable scripts (i.e., change once, use everywhere)? Significantly faster development. Check it out.

How to Approach Scaling Your Database

If your application is experiencing load problems, it's time to bring out the champagne! Your app's growing pains are actually a sign of its success.

As the user base expands, so do the demands on your system. You may notice database queries become sluggish, network requests time out, and user interfaces start to lag.

These signs indicate that it's time to scale your infrastructure to keep up with demand.

Today we’ll be honing in on how to approach scaling your database. Let’s jump in.

The Cost of Scaling

Before you start you start diving into implementing scaling solutions, an important principle should be kept in mind.

You shouldn’t implement premature optimizations or attempt to scale your app before it’s actually needed.

Implementing scaling solutions introduces complexities, including:

  • New features take longer to develop

  • The system becomes more complex with more pieces and variables involved

  • Code can be more difficult to test

  • Finding and resolving bugs becomes harder

You should only accept these trade-offs if your app is reaching capacity. Keep the system simple, don’t introduce scaling complexities unless it’s warranted.

Finding Bottlenecks Using Metrics

To scale effectively, first, identify your bottlenecks.

Time to check your resource monitoring system or create one if you haven’t already.

If you’re running on any of the leading IaaS providers, such as AWS and Azure, there are already application performance management tools available out of the box.

Analyze and observe resource usage. Pinpoint what’s holding your app back. Look for spikes or flat tops in your monitoring tools, indicating that a resource is overwhelmed.

If performance issues aren't immediately apparent, but your app seems to be running slow, try sprinkling logs in heavily used operations to check for slow-loading resources.

This analysis will guide your scaling strategy, ensuring you address the right problems.

Scaling an App From a Birdseye View

Now that we’ve got a good sense of what/where the problems/bottlenecks are, we can start implementing solutions to address these issues.

Remember, simplicity is key, we want to avoid introducing unnecessary complexities.

The high-level goal of scaling solutions is to have the stack do less work for the application’s most common requests or effectively distribute across multiple resources the workload that can’t be eliminated.

The way that scaling techniques do this usually translates into one or more of the following:

  • Reusing data the app has already looked up

  • Eliminating requests from the client for data the app already possesses

  • Storing results of common operations in order to reduce repeating computations

  • Avoiding complex operations in the request-response cycle

Many scaling techniques boil down to some form of caching.

Cache Database Queries

Caching database queries is one of the simplest improvements you can make to handle database load.

Usually, an application will have a handful of queries that make up the majority of the requests made.

Rather than making a round trip over the network each time for that data, it can simply be cached in memory on the webserver.

To take it one step further, it can be cached in a dedicated in-memory store such as Redis.

Data that’s cached can become ‘stale’ or out of date quite quickly. You will have to be mindful of which data you chose to cache and how long for.

Database Indexing

Proper indexing is another efficient lower hanging fruit to enhance performance.

Indexes help locate data quickly without scanning the entire database, significantly speeding up query times.

It’s another simple “early on” solution that provides outsized returns.

However, maintaining the right balance in indexing is important.

Overly indexed databases can slow down because each insert, update, or delete operation requires additional processing to update the indexes. This can lead to slower write performance and increased storage requirements.

Database Read Replication

If your database is still under heavy read load even after implementing caching, and efficient indexing, considering read replication can often be a next step.

Read replication involves writing to a single primary database, which is then cloned into multiple read replicas on different machines.

This distributes the read load across several servers, taking the pressure off the primary database and improving write performance.

Additionally, having replicas in different regions can dramatically increase read speed and reduce latency.

However, writes to the primary database need to propagate to the replicas, which can lead to temporary data inconsistencies. For immediate read-after-write consistency, such as updating and immediately viewing a profile, read from the primary database.

Read replication is a powerful scaling solution but comes with its fair share of complexities. It's advisable to implement it only after exhausting simpler solutions and ensuring optimal application performance.

Database Sharding

Most scaling solutions focus on reducing database read loads.

Database sharding, however, manages both reads and writes by partitioning the primary database into multiple smaller databases (shards). This horizontal scaling technique splits data across multiple nodes or servers.

There are two types of sharding: horizontal and vertical.

Horizontal sharding involves placing rows across different machines with identical columns.

Vertical sharding splits a table into distinct tables with separate rows and columns across multiple machines.

Sharding can speed up queries and improve resilience to failures, as an outage typically affects only a single shard rather than the entire database.

Database sharding is powerful and carries a lot of benefits. However, sharding is complex and costly to implement and maintain. It should be considered after exhausting other scaling solutions as the ramifications of ineffective implementation can be quite severe.

Leveraging DBaaS for Efficient Scaling

These days a lot of the manual implementation can be abstracted away.

Database as a service (DBaaS) solutions, including Amazon RDS, Google Cloud SQL, and Microsoft Azure SQL Database, offer scalable database management with minimal manual intervention.

These services facilitate the scaling process (both vertical and horizontal) with automation features, enabling you to concentrate more on optimizing your application and less on intensive infrastructure management.

Wrapping Up

Scaling your database effectively involves more than just throwing resources at the problem.

It requires a strategic approach tailored to your specific challenges.

There are general patterns, best practices, and solutions that are typically applied early on and later on.

As you plan your scaling strategy, remember that each step not only addresses immediate needs but also sets the groundwork for sustained growth.

How the Most Prominent Git Branching Strategies Work (Recap)

When formulating your branching strategy, take the most relevant features from the strategies below and apply your own set of tweaks. Every project and team has its own unique needs and boundaries, which should be reflected in their Git branching strategy.

Feature branching: A popular method where each feature gets its own branch to ensure that changes are isolated and code reviews are simplified.

Gitflow: has two permanent branches — a production and a pre-production branch, often referred to as the “prod” and “dev” branches. Features, releases, and urgent bug fixes get temporary branches. It’s a great approach for scheduled releases and handling multiple production versions.

GitLab flow: A blend of feature and environment-based branching. Changes merge into a main branch, then to branches aligned with the CI/CD environment stages.

GitHub flow: Similar to feature branching but with a twist. The main branch is always production-ready, and changes to this branch set off the CI/CD process.

Trunk-based development: Branches are short-lived. Changes merge into the main branch within a day or two, and feature flags are used for changes that require more time to complete. This is ideal for large teams with disciplined processes.

HTTPS Workflow (Recap)

HTTPS encrypts data in transit to ensure its security and privacy, helping to prevent unauthorized access. TLS protocol facilitates this encryption, establishing a secure connection between the user and the website.

The HTTPS process involves several key steps:

1) Client hello —The browser sends a message indicating which encryption methods (cipher suites) it supports.

2) Server hello — In response, the server selects an encryption method from the list provided by the client and sends its TLS certificate.

3) Verification — To confirm the validity of the server's certificate, the client checks it against the issuing Certificate Authority (CA). The client also checks the certificate's validity period and revocation status.

4) Key exchange — A secure exchange of keys takes place. Initially, asymmetric encryption is used to securely exchange the symmetric key that will be used for the session.

5) Secure communication — Once the secure channel is established, the session continues with encrypted data exchanges to guarantee the integrity and confidentiality of the information shared.

SQL, NoSQL, or Something Else—How Do You Decide Which Database? (Recap)

The performance of your application can suffer if you choose the incorrect database type, and going back on a bad choice can be time-consuming and expensive.

There are several types of databases, each designed and optimized for specific use cases; relational, document, graph, vector, columnar, time-based, key-value, and time-series, to name a few.

Considerations that should be made to choose the optimal database for your use case:

  • How structured is your data?

  • How often will the schema change?

  • What type of queries do you need to run?

  • How large is your dataset and do you expect it to grow?

  • How large is each record?

  • What is the nature of the operations you need to run? Is it read-heavy or write-heavy?

  • Which databases do your team have experience with?

That wraps up this week’s issue of Level Up Coding’s newsletter!

Join us again next week where we’ll explore API architectural styles, deployment patterns, serverless architecture, and more.