LUC #63: Understanding Database Types — Relational, Vector, Graph, and More

Plus, API architectural styles you should know, understanding the OSI model, and PACELC theorem explained

This week’s issue brings you:

READ TIME: 5 MINUTES

Thank you to our partners who keep this newsletter free to the reader.

Postman have made it possible to develop and test your API without leaving your code editor thanks to Postman’s VS Code extension. Streamline your development workflow by testing your APIs in the same application you use to develop them. Try it out!

Understanding Database Types — Relational, Vector, Graph, and More

The performance of a software application often relies on choosing the correct database(s).

As software developers, we encounter a wide range of database choices.

Recognizing the distinctions among these choices and selecting those that most closely match the needs of our project is essential. Typically, a complex application employs multiple databases, each tailored to meet a particular need of the application.

Let’s delve into the world of database types and explore where each one fits best.

Relational Databases

Being at the heart of traditional data storage, relational databases organize data into structured tables, with rows representing records and columns storing corresponding data fields.

These setups, enhanced by the use of SQL, excel in managing and querying structured information, making them ideal for tasks requiring precise data organization, like customer records or inventory tracking.

Relational databases are particularly effective when ACID compliance is required, and where predefined schemas can be established.

However, their structured approach, while beneficial for specific tasks, presents limitations in handling unstructured data, posing challenges in environments with evolving data needs.
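To make the ACID point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and the place_order helper are assumptions made up for illustration; they are not from any particular system. The sketch shows a predefined schema with a constraint, and a transaction that either applies both of its statements or neither.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical inventory example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE inventory (
           sku TEXT PRIMARY KEY,
           quantity INTEGER NOT NULL CHECK (quantity >= 0)
       )"""
)
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           sku TEXT NOT NULL,
           amount INTEGER NOT NULL
       )"""
)
conn.execute("INSERT INTO inventory (sku, quantity) VALUES ('widget', 5)")
conn.commit()


def place_order(conn, sku, amount):
    """Record an order and decrement stock as one atomic unit.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (sku, amount) VALUES (?, ?)", (sku, amount)
            )
            conn.execute(
                "UPDATE inventory SET quantity = quantity - ? WHERE sku = ?",
                (amount, sku),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative quantity.
        return False


def stock(conn, sku):
    """Return the current quantity for a SKU."""
    row = conn.execute(
        "SELECT quantity FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0]


def order_count(conn):
    """Return how many orders have been committed."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

If an order would push the quantity below zero, the constraint fails and the order row inserted earlier in the same transaction is rolled back along with it.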

Columnar Databases

Columnar databases, in contrast to traditional row-based relational databases, store data in columns rather than rows.

This architectural design significantly boosts their performance for analytical processing, where complex queries across large datasets, particularly involving aggregate functions, are common.

These databases excel in environments requiring rapid and frequent access to specific data columns, such as in customer analytics or financial data analysis. Because a query reads only the columns it needs, the column-based structure speeds up data retrieval and aggregation, making columnar databases highly effective for handling and analyzing extensive datasets. Columnar databases can be an ideal choice for big data analytics and business intelligence applications.

However, their focus on column-based storage and retrieval may not be as efficient for transactional systems, where data is typically written in small, regular transactions. This specialization can limit their suitability in scenarios that require a balanced approach to both data handling and transactional processing.
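The difference between the two layouts can be sketched in plain Python. This is only an illustration of the idea, not how any real columnar engine stores data; the sample records and function names are assumptions.

```python
# Row-oriented layout: all fields of a record are kept together.
rows = [
    {"customer": "a", "region": "EU", "amount": 120.0},
    {"customer": "b", "region": "US", "amount": 80.0},
    {"customer": "c", "region": "EU", "amount": 50.0},
]


def to_columns(rows):
    """Pivot row records into one list per column (column-oriented layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def total_amount(columns):
    """Aggregate over a single column without touching the others."""
    return sum(columns["amount"])


def total_by_region(columns):
    """Group-by style aggregate that reads only the two columns it needs."""
    totals = {}
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

The aggregate functions never read the customer column, which is the property that helps analytical queries. Appending one new record, by contrast, means touching every column list, which hints at why small transactional writes are a weaker fit.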

Document Databases

When it comes to data that doesn’t fit a fixed schema, document databases stand out with their ability to store records as self-contained documents in semi-structured formats like JSON or XML.

This method offers exceptional flexibility in data management, making these databases a top choice for environments with complex or continually changing data structures, such as content management systems and e-commerce platforms.

Their schema-less approach facilitates rapid development and iteration, enabling them to adapt seamlessly to evolving data requirements. Yet, this same flexibility can sometimes complicate ensuring data consistency and integrity, particularly in large-scale or complex systems where maintaining structured data relationships is crucial.
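As a rough sketch of the document model, the snippet below keeps JSON-like documents with different shapes in one collection. The collection, the find helper, and the missing_fields check are hypothetical names for illustration and do not mirror any specific product's API.

```python
import json

# Two documents in the same collection with different fields.
products = [
    {"_id": 1, "name": "T-shirt", "price": 15, "sizes": ["S", "M", "L"]},
    {"_id": 2, "name": "E-book", "price": 9, "download": {"format": "epub"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match all the given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def missing_fields(doc, required):
    """List required fields a document lacks.

    With no enforced schema, checks like this one live in application code.
    """
    return [field for field in required if field not in doc]


# Documents serialize to JSON and back without any schema definition.
round_tripped = json.loads(json.dumps(products))
```

The missing_fields helper hints at the trade-off: the database accepts any shape, so consistency checks tend to move into the application.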

Key-Value Databases

Key-value databases are the most straightforward form of database: each value is stored and retrieved using a unique key.

This simplicity makes them highly efficient for operations involving inserting, updating, and retrieving data. Often utilized for smaller datasets, key-value databases are particularly popular for temporary purposes like caching or session management, where speed and simplicity are paramount.

Their uncomplicated structure is great for rapid access and modification of data, streamlining processes where quick data retrieval is crucial. But this simplicity also means key-value databases might not be the best fit for complex data handling or scenarios requiring detailed data relationships.
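The caching and session use case can be sketched as a tiny in-memory key-value store with expiry. The TTLCache class and its method names are assumptions for illustration, not the interface of any real key-value database.

```python
import time


class TTLCache:
    """Minimal key-value store where each entry expires after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        # The clock is injectable so expiry can be tested without sleeping.
        self._clock = clock

    def set(self, key, value, ttl_seconds):
        """Store a value under a key along with its expiry time."""
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        """Return the value for a key, or the default if absent or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # drop expired entries lazily on read
            return default
        return value
```

Every operation is a single lookup by key. There is no way to ask "which sessions belong to user X" without scanning everything, which illustrates why the model is a weak fit for relationship-heavy data.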

Graph Databases

Graph databases take a unique approach to data management, emphasizing the storage and querying of highly connected data. In these databases, records are represented as nodes and relationships as edges, utilizing graph theory to efficiently traverse connections between nodes.

This design makes them exceptionally well suited for applications involving complex relationships, such as social networks, recommendation engines and fraud detection systems, where navigating intricate data connections is key.

One of the key strengths of graph databases is their ability to reveal insights from the relationships and interconnections within data.

One of the main drawbacks is that they can be overkill for simpler, less connected datasets. In situations where data relationships are straightforward or minimal, the advanced capabilities of graph databases might not be fully utilized, potentially leading to unnecessary complexity in the data management process.
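To illustrate traversal over nodes and edges, here is a small sketch that stores a social graph as an adjacency map and uses breadth-first search to find accounts two hops away, a simple basis for recommendations. The sample data and function names are hypothetical; a real graph database would expose this through its own query language.

```python
from collections import deque

# Nodes are users; each edge means "follows". Sample data is made up.
follows = {
    "ana": {"ben", "cy"},
    "ben": {"dee"},
    "cy": {"dee", "eli"},
    "dee": set(),
    "eli": set(),
}


def within_hops(graph, start, max_hops):
    """Breadth-first search returning {node: hop_count} up to max_hops away."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


def suggestions(graph, user):
    """Suggest accounts exactly two hops away (followed by people you follow)."""
    reachable = within_hops(graph, user, 2)
    return sorted(node for node, hops in reachable.items() if hops == 2)
```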

Time-Series Databases

Time-series databases are the go-to choice for managing sequential, time-stamped data, vital in fields like IoT and monitoring systems.

With built-in time-based functions, they are adept at storing, querying, and analyzing large datasets over time, making them a great fit for applications requiring trend analysis, forecasting, and real-time insights.

This specialized design is exceptionally good at capturing and analyzing changes over time, a crucial feature for environments where understanding temporal dynamics is important. The same specialization that makes time-series databases highly effective for sequential, time-stamped data also limits them for most other purposes.

Time-series databases may struggle in scenarios that require handling diverse data types or general-purpose data storage, as they are optimized primarily for time-focused data management.
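A typical time-based function is downsampling: grouping readings into fixed time windows and aggregating each window. The sketch below does this in plain Python; the sample readings and the downsample_mean name are assumptions for illustration.

```python
from collections import defaultdict

# (timestamp in seconds, temperature reading); sample sensor data is made up.
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 25.0), (150, 30.0)]


def downsample_mean(points, bucket_seconds):
    """Average readings per fixed-size time bucket, ordered by bucket start."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = (timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]
```

With 60-second buckets, the five readings collapse into three averaged points, which is the kind of rollup used for trend analysis over long time ranges.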

Vector Databases

Vector databases are designed for complex searches and AI-driven applications. They use a vector space model to represent data as high-dimensional vectors, which supports similarity-based queries and pattern recognition.

Their main strength lies in supporting AI and machine learning, offering deep insights and relevant search results, ideal for recommendation systems and complex search functionalities.

If you’re working on machine learning projects, you might find vector databases to be a top consideration. Yet, the same architecture behind these databases that makes them great in AI and ML-based applications also makes them less suitable for basic data management tasks.

The complexity and specialized knowledge required for vector databases can be excessive for straightforward storage or retrieval needs, and the vector space model generally makes them a poor fit for data relationships that are not vector-based.
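The core query in a vector space model is "which stored vectors are most similar to this one". Below is a brute-force sketch using cosine similarity. The three-dimensional vectors are toy values standing in for real embeddings, and the function names are assumptions. Production systems typically rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(index, query, k=1):
    """Return the k item names whose vectors are most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(item[1], query),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy "embeddings"; real ones usually have hundreds of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
```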

A Database for Every Need

Each database type has its specialty: Relational for structured data and ACID compliance, Columnar for analytics, Document for unstructured data flexibility, Graph for complex relationships, Time-Series for time-stamped data, Vector for AI and ML scenarios, and Key-value for simple, fast data access.

Using the right database type can be a game changer for performance, while the wrong one can wreak havoc. The right choice depends on the project's specific needs. Understanding these differences enables teams to pick the right database(s) for their application or system, a key decision for ensuring efficient data management, scalability, and overall system reliability and performance.

Understanding The OSI Model (Recap)

The OSI Model is a conceptual framework that standardizes how different systems communicate over a network. It divides network interactions into seven layers, each responsible for specific functions, ensuring data flows from one device to another in a structured way. Key concepts include:

  • Encapsulation: as data moves from the top (Application layer) to the bottom (Physical layer), each layer adds its own header, wrapping the data for transmission.

  • Decapsulation: on the receiving end, the process reverses as each layer removes its corresponding header, eventually delivering the original data to the application.

The OSI model promotes interoperability across different systems, simplifies troubleshooting by isolating problems to a specific layer, and supports reliable, consistent communication across global networks.

Layers: Application, Presentation, Session, Transport, Network, Data Link, Physical.
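Encapsulation and decapsulation can be sketched as wrapping and unwrapping a payload, one layer at a time. This is a toy model that uses bracketed text labels as "headers". Real protocols use binary headers (and sometimes trailers), and the physical layer transmits bits rather than prepending a header, so treat the uniform treatment of all seven layers as a simplifying assumption.

```python
# Layers ordered from top (Application) to bottom (Physical).
LAYERS = [
    "Application",
    "Presentation",
    "Session",
    "Transport",
    "Network",
    "Data Link",
    "Physical",
]


def encapsulate(payload):
    """Wrap the payload with one header per layer, going down the stack."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]{frame}"  # lower layers end up outermost
    return frame


def decapsulate(frame):
    """Strip headers in reverse order, going up the stack on the receiver."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame
```

The receiver removes the outermost header first, which is why the lowest layer's header is the last one added by the sender.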

API Architectural Styles You Should Know (Recap)

REST — Uses standard HTTP methods for operations, which provides a consistent interface. Its stateless nature supports scalability, while URI-based resource identification provides structure.

GraphQL — Unlike REST, which typically exposes many endpoints, GraphQL uses a single endpoint. Clients specify exactly the data they need, and the server delivers it in a single query.

SOAP — Once dominant, SOAP remains vital in enterprises for its security and transactional robustness. It’s XML-based, versatile across various transport protocols, and includes WS-Security for comprehensive message security.

gRPC — Runs over HTTP/2, which gives it bidirectional streaming and multiplexing, and uses Protocol Buffers for efficient serialization. It supports various programming languages and diverse use cases across different domains.

WebSockets — Provides a full-duplex communication channel over a single, long-lived connection. It is ideal for applications requiring real-time communication.

MQTT — A lightweight messaging protocol optimized for high-latency or unreliable networks. It uses an efficient publish/subscribe model.

PACELC Theorem Explained (Recap)

PACELC expands on the CAP theorem by adding a new dimension: latency. While CAP focuses on trade-offs between availability (A) and consistency (C) during network partitions (P), PACELC considers both partitioned and normal operations.

In a partition (P), you choose between availability (A) and consistency (C), just like CAP. However, when there is no partition (Else), you face the trade-off between latency (L) and consistency (C).
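The "Else" half of the trade-off can be sketched with simple arithmetic. The model below assumes a write lands on a local node first and is then sent to all replicas in parallel. The function names and the latency numbers are illustrative assumptions, not measurements from any system.

```python
def write_latency_ms(local_ms, replica_ms, wait_for_replicas):
    """Latency a client observes for one write when there is no partition."""
    if wait_for_replicas:
        # Favor consistency (C): acknowledge only after the slowest replica
        # has confirmed the write.
        return local_ms + max(replica_ms, default=0)
    # Favor latency (L): acknowledge after the local write and let the
    # replicas catch up in the background.
    return local_ms


def may_read_stale(wait_for_replicas):
    """Whether a read from a replica can return an older value."""
    return not wait_for_replicas
```

With a 2 ms local write and replicas at 5, 40, and 12 ms, waiting for replicas costs 42 ms per write but rules out stale reads, while acknowledging early costs 2 ms and accepts that a replica may briefly serve old data.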

That wraps up this week’s issue of Level Up Coding’s newsletter!

Join us again next fortnight where we’ll explore and visually distill more important engineering concepts.