LUC #18: Designing an Efficient Database Schema

Plus, MVC vs MVP architecture, and Kafka explained.

We're thrilled to bring you another issue of Level Up Coding’s newsletter!

In today’s issue:

Read time: 6 minutes

Designing an Efficient Database Schema

How well a database schema is designed directly affects the performance, scalability, and maintainability of an application. Taking the time to craft an efficient database schema pays off in the long run and leads to:

  • Better performance: quicker data retrieval and optimized storage.

  • Improved scalability: better ability to handle a growing dataset and increasing traffic.

  • Data integrity: using constraints where applicable ensures data accuracy and integrity.

  • Reduced costs: optimized storage requires less computing power and memory, leading to cost savings.

  • Improved user experience: faster data retrieval means faster load times and a more responsive experience.

There are many considerations when designing a database schema. Let's explore some of the most important aspects of database design.

Normalize (but not too much)

Normalization is a technique that organizes data into well-structured tables and relationships to reduce redundancy and prevent update, insert, and delete anomalies. While it is key to good schema design, it is important not to over-normalize: splitting data across too many tables leads to complex queries with multiple joins, which can degrade read performance.
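Here's a minimal sketch of that trade-off, using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: customer details live in one place only.
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com');
    INSERT INTO orders VALUES (10, 1, 2500), (11, 1, 4000);
""")

# Changing the email touches one row, so no order can hold a stale copy.
conn.execute("UPDATE customers SET email = 'ada@new.example.com' WHERE id = 1")

# The cost of normalization: reading an order with its customer needs a join.
order_rows = conn.execute("""
    SELECT o.id, c.email
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(order_rows)
```

Both orders pick up the new email because it is stored once. Had the email been copied into every order row, the same change would need one update per order, and a missed row would leave inconsistent data.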

Choose the Right Data Types

Using the appropriate data types not only reduces storage costs but also improves query performance, because smaller values let the database engine fit more rows into each page and into memory. For example, using an INT when a SMALLINT would suffice can result in unnecessary storage overhead and slower data retrieval.

Strict data types also protect data integrity: the database rejects values of the wrong type, which reduces the validation the application has to do and can help the engine optimize query execution.
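To put a rough number on the INT versus SMALLINT example, here's a back-of-the-envelope sketch. It assumes the common widths of 4 bytes for INT and 2 bytes for SMALLINT and a hypothetical table of 10 million rows. Real engines add row overhead, alignment, and sometimes variable-length encodings, so treat the result as an estimate only.

```python
import struct

# Standard sizes ("<" prefix) are the same on every platform.
SMALLINT_BYTES = struct.calcsize("<h")  # 16-bit signed integer
INT_BYTES = struct.calcsize("<i")       # 32-bit signed integer


def column_bytes(rows: int, bytes_per_value: int) -> int:
    """Raw bytes needed for one fixed-width column, ignoring row overhead."""
    return rows * bytes_per_value


rows = 10_000_000  # hypothetical table size
saved = column_bytes(rows, INT_BYTES) - column_bytes(rows, SMALLINT_BYTES)
print(f"{saved / 1_000_000:.0f} MB saved on one column")
```

The saving applies per column, and it grows again if the column is also indexed, since the index stores its own copy of the values.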

Use Indexing Carefully

Indexing is a technique that speeds up data retrieval by providing a way to locate data without scanning the entire table. Strategically applying indexes to columns that are frequently filtered can significantly boost query performance. However, it’s important to reserve indexes for a small set of frequently used columns, as each index consumes additional storage and slows down write operations. To ensure that indexing is appropriately applied, identify the most frequently run queries across the application and index the columns that are most commonly used in their WHERE clauses.
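One way to check whether an index is actually being used is to ask the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 module. The users table, the idx_users_email index, and the query_plan helper are hypothetical names, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(sql: str) -> str:
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return "; ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE email = 'user42@example.com'"

plan_before = query_plan(lookup)  # no index yet, so a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(lookup)   # now a search through the index

print(plan_before)
print(plan_after)
```

Running the same check against an application's most frequent queries is a quick way to confirm that each index earns its storage and write cost.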

Apply Constraints

Applying constraints on data columns helps maintain data integrity and consistency. These include primary keys, foreign keys, unique constraints, NOT NULL, and value restrictions such as ENUM. Database engines can also use these constraints to make assumptions that optimize how operations are executed. As with most optimization strategies, overusing constraints has a negative impact on performance: each constraint adds a validation step that must pass before a write operation completes, which can slow down writes.
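The sketch below shows several of these constraints rejecting bad writes, again with Python's sqlite3 module. The users and posts tables are hypothetical. SQLite has no ENUM type, so a CHECK constraint stands in for it here, and SQLite only enforces foreign keys once the foreign_keys pragma is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        -- CHECK used as a stand-in for an ENUM column
        status  TEXT NOT NULL CHECK (status IN ('draft', 'published'))
    );
    INSERT INTO users (id, email) VALUES (1, 'ada@example.com');
""")

bad_writes = [
    ("duplicate email", "INSERT INTO users (email) VALUES ('ada@example.com')"),
    ("missing email", "INSERT INTO users (email) VALUES (NULL)"),
    ("unknown user", "INSERT INTO posts (user_id, status) VALUES (99, 'draft')"),
    ("invalid status", "INSERT INTO posts (user_id, status) VALUES (1, 'archived')"),
]

rejected = []
for label, sql in bad_writes:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        rejected.append(label)  # the database refused the write

print(rejected)
```

All four writes are refused by the database itself, so the rules hold no matter which service or script performs the write.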

Consider Materialized Views

Materialized views are a great way to reduce the time it takes to fetch data from complex queries. With a materialized view, a complex query is computed ahead of time and its results are stored in a database table for retrieval. The results are then refreshed at a specified interval. Removing the need to execute the query every time it is needed can significantly improve the performance of an application. However, materialized views are essentially a duplicate of the original data, which can take up significant storage space. Because they are refreshed periodically, they are not an ideal technique when data is frequently updated or results are needed in real time.
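SQLite does not support materialized views, so the sketch below only emulates the idea with a plain summary table and a manual refresh function. Databases with native support, such as PostgreSQL with CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW, handle the storage for you. The orders and customer_totals tables and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER);
    INSERT INTO orders (customer, total_cents) VALUES
        ('ada', 2500), ('ada', 4000), ('bob', 1000);
    -- Stand-in for a materialized view: a plain table holding query results.
    CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total_cents INTEGER);
""")


def refresh_customer_totals() -> None:
    """Recompute the stored results; a scheduler would call this periodically."""
    with conn:  # run both statements in one transaction
        conn.execute("DELETE FROM customer_totals")
        conn.execute("""
            INSERT INTO customer_totals
            SELECT customer, SUM(total_cents) FROM orders GROUP BY customer
        """)


def read_totals() -> dict:
    """Cheap read: no aggregation happens at query time."""
    return dict(conn.execute("SELECT customer, total_cents FROM customer_totals"))


refresh_customer_totals()
conn.execute("INSERT INTO orders (customer, total_cents) VALUES ('bob', 500)")
stale = read_totals()   # still reflects the last refresh
refresh_customer_totals()
fresh = read_totals()   # includes the new order
print(stale, fresh)
```

The stale read is the staleness trade-off in action: between refreshes, readers see fast but slightly outdated results.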

Each of the techniques mentioned above comes with its own set of trade-offs. It’s important to strike a balance between performance, complexity, and cost. The ideal approach is one that is customized to the application’s use cases and requirements.

MVC vs MVP Architecture (recap)

  • MVC (Model View Controller) and MVP (Model View Presenter) are design patterns that separate an application into distinct components.

  • MVP emerged after MVC to address some of MVC's drawbacks and improve maintainability.

  • Both patterns aim to separate concerns, but they have some differences in their approach.

  • The most significant differences relate to which component handles the business logic and how the UI gets updated.

  • MVC is typically less complex but can have tightly coupled components, whereas MVP is generally more complex but has more decoupled components.
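To make the "how the UI gets updated" difference concrete, here's a small sketch of one common interpretation of each pattern. Frameworks vary in the details, and every class name below is hypothetical. In the MVC flavor the view observes the model directly. In the MVP flavor the view is passive and the presenter pushes formatted text into it.

```python
class CounterModel:
    """Shared model: holds state and notifies observers when it changes."""

    def __init__(self) -> None:
        self.count = 0
        self._observers = []

    def subscribe(self, callback) -> None:
        self._observers.append(callback)

    def increment(self) -> None:
        self.count += 1
        for callback in self._observers:
            callback(self.count)


# --- MVC flavor: the view observes the model and renders itself. ---
class MvcView:
    def __init__(self, model: CounterModel) -> None:
        self.rendered = ""
        model.subscribe(self.render)  # the view knows about the model

    def render(self, count: int) -> None:
        self.rendered = f"Count: {count}"


class MvcController:
    def __init__(self, model: CounterModel) -> None:
        self.model = model

    def on_click(self) -> None:
        self.model.increment()  # the view updates via the model's notification


# --- MVP flavor: the view is passive; the presenter updates it. ---
class MvpView:
    def __init__(self) -> None:
        self.rendered = ""

    def show_text(self, text: str) -> None:
        self.rendered = text


class MvpPresenter:
    def __init__(self, model: CounterModel, view: MvpView) -> None:
        self.model = model
        self.view = view

    def on_click(self) -> None:
        self.model.increment()
        # The presenter formats the data and pushes it into the view.
        self.view.show_text(f"Count: {self.model.count}")


mvc_model = CounterModel()
mvc_view = MvcView(mvc_model)
MvcController(mvc_model).on_click()

mvp_view = MvpView()
MvpPresenter(CounterModel(), mvp_view).on_click()
print(mvc_view.rendered, "|", mvp_view.rendered)
```

Both views end up showing the same text. The difference is that MvpView never touches the model, which makes it easy to swap for a fake in tests, at the cost of an extra presenter class.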

What is Kafka (recap)

  • Kafka is an open-source distributed streaming platform designed for building real-time data pipelines and streaming applications.

  • Kafka operates as a distributed pub-sub messaging system, allowing applications to publish and subscribe to real-time or near-real-time data feeds.

  • The high throughput, scalability, fault-tolerance, durability, and ecosystem Kafka provides have made it a very popular choice for use cases where real-time data feeds are required.

  • The key components of Kafka include Producer, Consumer, Broker, Topic, and Partition.

  • Kafka has many use cases, from aggregating data from different sources to monitoring and real-time analytics.
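To show how those components fit together, here's a toy in-memory model written in plain Python. It is not the Kafka API and leaves out replication, persistence, and networking. The class names mirror the components listed above, while the crc32 hash and the per-consumer offset list are simplifying assumptions made for this sketch.

```python
import zlib


class Broker:
    """Toy broker: each topic is a list of append-only partition logs."""

    def __init__(self) -> None:
        self.topics = {}

    def create_topic(self, name: str, num_partitions: int) -> None:
        self.topics[name] = [[] for _ in range(num_partitions)]

    def append(self, topic: str, key: str, value: str) -> int:
        partitions = self.topics[topic]
        # Stand-in hash so equal keys share a partition; not Kafka's partitioner.
        index = zlib.crc32(key.encode()) % len(partitions)
        partitions[index].append((key, value))
        return index


class Producer:
    def __init__(self, broker: Broker) -> None:
        self.broker = broker

    def send(self, topic: str, key: str, value: str) -> int:
        return self.broker.append(topic, key, value)


class Consumer:
    """Tracks its own offset per partition, so consumers read independently."""

    def __init__(self, broker: Broker, topic: str) -> None:
        self.log = broker.topics[topic]
        self.offsets = [0] * len(self.log)

    def poll(self) -> list:
        records = []
        for i, partition in enumerate(self.log):
            records.extend(partition[self.offsets[i]:])  # only unread records
            self.offsets[i] = len(partition)
        return records


broker = Broker()
broker.create_topic("page-views", num_partitions=3)
producer = Producer(broker)
p1 = producer.send("page-views", "user-1", "/home")
p2 = producer.send("page-views", "user-1", "/pricing")
producer.send("page-views", "user-2", "/docs")

analytics = Consumer(broker, "page-views")
monitoring = Consumer(broker, "page-views")
first_poll = analytics.poll()
print(p1 == p2, len(first_poll), len(analytics.poll()), len(monitoring.poll()))
# → True 3 0 3
```

Records with the same key land in the same partition, which preserves their order, and each consumer reads the full feed at its own pace because reading does not remove anything from the log.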

That wraps up this week’s issue of Level Up Coding’s newsletter!

Join us again next week, when we’ll explore service communication strategies, RESTful APIs, and how local area networks (LANs) work.