
LUC #58: Understanding Data Streams — The Solution to Handling Continuous Flows of Big Data

Plus, concurrency is NOT parallelism, explaining SOLID principles, and API design best practices

This week’s issue brings you: understanding data streams, why concurrency is not parallelism, the SOLID principles explained, and API design best practices.

READ TIME: 4 MINUTES

A big thank you to our partner Postman, who keeps this newsletter free to the reader.

Did you know there is a VS Code extension for Postman? It’s recently moved out of beta to general availability. Check it out.

Demystifying Data Streams

We live in a time when everything is online and interconnected. From social media posts and sensor readings to real-time transaction logs, modern systems are inundated with information at a scale and speed previously unimagined.

With this mountain of information, the challenge lies in managing it all.

Enter the world of data streams—a brilliant solution to tackle this very problem.

What Exactly Is a Data Stream?

A data stream is a sequence of data that is generated continuously and often at high velocity.

Rather than processing data as a static batch, streaming processes the data in real time (or near real time), enabling applications to react swiftly to incoming information.
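To make this concrete, here’s a minimal sketch in Go (chosen since Go comes up later in this issue) that consumes an unbounded channel of hypothetical sensor readings, reacting to each value as it arrives rather than waiting for a complete batch. The source, values, and alert threshold are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// sensorReadings simulates a continuous data stream: values arrive
// one at a time over a channel, with no batching.
func sensorReadings() <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for i := 0; i < 20; i++ { // bounded here only so the demo terminates
			out <- 20.0 + float64(i%5) // hypothetical temperature reading
			time.Sleep(50 * time.Millisecond)
		}
	}()
	return out
}

func main() {
	// Process each record the moment it arrives, instead of
	// collecting a static batch and analyzing it later.
	for reading := range sensorReadings() {
		if reading > 23.0 {
			fmt.Println("alert: high reading:", reading)
		}
	}
}
```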

Types of Data Streams

The landscape of data streams can be quite diverse. Some streams are never-ending (continuous data streams), always supplying apps with fresh data.

Others have a clear beginning and end (bounded data streams), often originating from specific datasets.

There are also differences in how organized this data is.

Some streams are structured and may follow a schema, similar to database tables (structured data streams), while others are more free-form, stemming from sources like text files or media content (unstructured data streams).

Lastly, there’s the factor of plurality: how many sources feed the stream.

Single-source streams originate from one data source, whilst multi-source streams mix and merge data from multiple sources.

As you’ve probably guessed, multi-source streams are more complex to work with but provide much richer insights.
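As a rough illustration of the multi-source case, the sketch below fans two hypothetical bounded sources into one merged stream using Go channels. Real systems would typically put a broker or stream processor in the middle, but the fan-in shape is the same.

```go
package main

import (
	"fmt"
	"sync"
)

// source emits a small, bounded stream of labeled events.
func source(name string, n int) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			out <- fmt.Sprintf("%s-%d", name, i)
		}
	}()
	return out
}

// merge fans in multiple source streams into a single multi-source stream.
func merge(sources ...<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(c <-chan string) {
			defer wg.Done()
			for v := range c {
				out <- v
			}
		}(src)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	// Two hypothetical sources merged into one stream.
	for event := range merge(source("clicks", 3), source("payments", 3)) {
		fmt.Println(event)
	}
}
```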

Implementing a Data Stream

To implement a data stream, it’s key to understand the volume, velocity, and variability of the data. These three Vs determine the demands of the streaming system.

Volume refers to the amount of data generated over a specific timeframe, dictating the storage and processing capacities needed.

Velocity touches on the speed at which data is produced and ingested into the system, affecting real-time processing capabilities.

Variability, on the other hand, delves into the inconsistencies or fluctuations in the data rate, which can pose challenges in terms of predictability and resource allocation.
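To put rough numbers on the three Vs: a hypothetical stream of 10,000 events per second at about 1 KB per event works out to roughly 10 MB/s of ingest, or around 864 GB per day, which immediately frames the processing throughput and storage capacity the system must sustain.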

With these factors in mind, you can make an appropriate selection of stream processing frameworks and supporting tools, such as data storage solutions.

There are several other factors to keep in mind when building a streaming system.

First and foremost, the integrity of the data is vital; it should be consistent, accurate, and reliable.

Given the continuous flow of streams, inconsistencies can easily emerge, so proactive measures to prevent them are essential.

Additionally, security is paramount; implement security measures to ensure that the data remains protected from unauthorized access.

Best Practices

Data streams, like many other areas of tech, come with a set of guidelines to keep in mind.

First off, as the landscape of data continues to expand, it's vital to ensure our setups can scale horizontally.

To ensure no data is lost, implement fault tolerance; a minimal checkpointing sketch follows after these practices.

Lastly, given how dynamic data can be, staying flexible is key.

Data streams might evolve due to new sources, differing formats, or varying data amounts. Anticipating these shifts helps systems stay robust and responsive.
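As one minimal sketch of fault tolerance, the Go program below persists a consumer offset to a local checkpoint file after each processed record, so a restarted process resumes where it left off. The file name, in-memory record source, and commit strategy are illustrative assumptions, not a prescribed design.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const checkpointFile = "offset.checkpoint" // hypothetical checkpoint location

// loadOffset restores the last committed position so a restarted
// consumer resumes instead of losing or re-reading all data.
func loadOffset() int {
	b, err := os.ReadFile(checkpointFile)
	if err != nil {
		return 0 // no checkpoint yet: start from the beginning
	}
	n, _ := strconv.Atoi(string(b))
	return n
}

// commitOffset durably records progress after each processed record.
func commitOffset(n int) {
	os.WriteFile(checkpointFile, []byte(strconv.Itoa(n)), 0o644)
}

func main() {
	records := []string{"a", "b", "c", "d"} // stand-in for a partitioned stream
	for i := loadOffset(); i < len(records); i++ {
		fmt.Println("processing:", records[i])
		commitOffset(i + 1) // commit after processing: at-least-once semantics
	}
}
```

Committing after processing gives at-least-once delivery; committing before would give at-most-once. Streaming frameworks make the same trade-off with broker-managed offsets.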

As modern applications attempt to deal with a growing amount of data, strategies like data streams have become standout solutions. With their ability to manage real-time data efficiently, adapt to changing conditions, and provide invaluable insights, utilizing data streams has become, and will continue to be, essential for crafting robust and streamlined digital systems.

Concurrency is NOT Parallelism (Recap)

As Rob Pike (one of the creators of Go) succinctly put it: “Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.”

Concurrency — Manages several tasks on a single processor, creating an illusion of simultaneous execution by switching between tasks. This optimizes processor time, especially when tasks wait for others to complete.

Parallelism — Performs multiple tasks simultaneously by efficiently utilizing multiple processors or cores in a computing system.
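The distinction is easy to see in Go, where goroutines express concurrency and the runtime decides how much parallelism to apply. In this rough sketch, the same four CPU-bound tasks first share a single thread (interleaved, concurrent), then spread across all available cores (truly parallel); the task and the GOMAXPROCS toggling are purely for demonstration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// work is a stand-in CPU-bound task.
func work(id int, wg *sync.WaitGroup) {
	defer wg.Done()
	sum := 0
	for i := 0; i < 1_000_000; i++ {
		sum += i
	}
	fmt.Println("task", id, "done, sum =", sum)
}

func main() {
	var wg sync.WaitGroup

	// Concurrency: one OS thread, so the scheduler interleaves the
	// goroutines (dealing with lots of things at once).
	runtime.GOMAXPROCS(1)
	for id := 1; id <= 4; id++ {
		wg.Add(1)
		go work(id, &wg)
	}
	wg.Wait()

	// Parallelism: multiple threads, so the same goroutines can run
	// at the same time on separate cores (doing lots of things at once).
	runtime.GOMAXPROCS(runtime.NumCPU())
	for id := 5; id <= 8; id++ {
		wg.Add(1)
		go work(id, &wg)
	}
	wg.Wait()
}
```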

Explaining SOLID Principles

SOLID represents five principles of object-oriented programming.

Single Responsibility Principle (SRP): Each unit of code should only have one job or responsibility. A unit can be a class, module, function, or component. This keeps code modular and reduces the risk of tight coupling.

Open-Closed Principle (OCP): Units of code should be open for extension but closed for modification. You should be able to extend functionality with additional code rather than modifying existing code. This principle can be applied to component-based systems such as a React frontend.

Liskov Substitution Principle (LSP): You should be able to substitute objects of a base class with objects of its subclass without altering the ‘correctness’ of the program.

Interface Segregation Principle (ISP): Provide multiple interfaces with specific responsibilities rather than a small set of general-purpose interfaces. Clients shouldn’t need to know about the methods and properties that don’t relate to their use case. This decreases complexity and increases code flexibility.

Dependency Inversion Principle (DIP): You should depend on abstractions, not on concrete classes. Use abstractions to decouple dependencies between different parts of the system; rather than making direct calls between units of code, go through interfaces or abstractions, as sketched below.
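To ground that last principle, here’s a small Go sketch of dependency inversion: a hypothetical OrderService depends on a Notifier interface rather than a concrete notifier, so implementations can be swapped (an SMS notifier, a fake in tests) without changing the service.

```go
package main

import "fmt"

// Notifier is the abstraction that high-level code depends on.
type Notifier interface {
	Send(message string) error
}

// EmailNotifier is one concrete implementation of the abstraction.
type EmailNotifier struct{}

func (EmailNotifier) Send(message string) error {
	fmt.Println("email:", message)
	return nil
}

// OrderService depends on the Notifier interface, not on
// EmailNotifier directly, so the dependency points at an abstraction.
type OrderService struct {
	notifier Notifier
}

func (s OrderService) PlaceOrder(item string) {
	// ... order-handling logic would go here ...
	s.notifier.Send("order placed: " + item)
}

func main() {
	service := OrderService{notifier: EmailNotifier{}}
	service.PlaceOrder("book")
}
```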

API Design Best Practices (Recap)

There are several aspects, techniques, and best practices in API design.

Idempotency, security, versioning, clear resource naming, use of plurals, cross-referencing resources, sorting, and filtering are all aspects that can be observed in the URL.

However, best practices go far beyond what can be observed in API URLs.

Thorough documentation, robust monitoring and logging, consistent error handling, and rate limiting are some of the other primary best practices that should be implemented to design effective and safe APIs.
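As a small illustration of a few of the URL-level practices, the Go handler below exposes a versioned, plural-named resource with sorting and filtering passed as query parameters. The route, parameters, and response shape are invented for the example.

```go
package main

import (
	"encoding/json"
	"net/http"
)

func main() {
	mux := http.NewServeMux()

	// Versioned, plural resource naming: GET /v1/users?sort=name&limit=20
	mux.HandleFunc("/v1/users", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		// Sorting and filtering live in query parameters, not the path.
		sort := r.URL.Query().Get("sort")   // e.g. "name"
		limit := r.URL.Query().Get("limit") // e.g. "20"

		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{
			"sort":  sort,
			"limit": limit,
			"users": []string{}, // payload omitted in this sketch
		})
	})

	http.ListenAndServe(":8080", mux)
}
```

Putting the version in the path keeps breaking changes isolated: a /v2/users can evolve independently while existing clients stay on /v1.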

That wraps up this week’s issue of Level Up Coding’s newsletter!

Join us again next week where we’ll explore load balancing algorithms, SQL execution order, strategies to optimize CI/CD pipeline performance, and more.