DOP 97: Processing Event Streams With Apache Kafka

Transcript

Vik: [00:00:00]
I also would like to use the chess analogy to explain this. When you look at the position of the figures on the board at a particular moment in time, you see the current state of the game. But every time anyone makes a move, they record it. Basically, if the state is ever destroyed and they still have each move written on this paper, they will be able to reconstruct the state by replaying all the moves that happened in the game.

Darin:
This is DevOps Paradox episode number 97. Processing Event Streams With Apache Kafka

Darin:
Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darin and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can, and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make it sound like we know what we're doing. Occasionally, we invite guests who do know something, but we do not do that often, since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. PS: it's Darin reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darin Pope and Viktor Farcic.

Darin: [00:01:33]
Today's episode is going to be a little strange. Not only do I have one Viktor. I have two Viktors.

Viktor: [00:01:43]
Both are not spelled correctly. Right?

Darin: [00:01:45]
Both are not spelled correctly, but they're both spelled the same.

Vik: [00:01:48]
I was just waiting for your sign when we need to start, because maybe there will be some interesting intro and things like that.

Darin: [00:01:55]
Well, that was the intro right there. This second Viktor, which we will now continue to call him Vik, just so I don't mess up. Vik, why don't you introduce yourself, tell us what you do, who you are, all the things.

Vik: [00:02:10]
Yeah. Vik Gamov, developer advocate here at Confluent, talking to people, doing podcasts and all this kind of stuff. Today's my third show for today, apart from doing one podcast and one conference and I'm really pleased to be here.

Darin: [00:02:29]
That's sort of like that same word that we get every once in a while. People say they're honored to talk with us and it's like, don't you know who we are? Do you realize you don't necessarily want to be honored or pleased to be with us?

Vik: [00:02:41]
Well, it's a very nice way to put this. Whenever it's your first time somewhere, this is how my mother taught me: be polite, and after that you can switch to talking shit about stuff. But when you introduce yourself, be polite. That's what I use.

Viktor: [00:02:55]
This needs to go on a record. This is the first episode where I wasn't the person saying shit.

Vik: [00:03:04]
and I'm looking at Darin right now and not sure if we like overstepping or like shit. Oh, no next and again, I need to put this explicit content, the sign. So it's a pain.

Darin: [00:03:15]
It's fine. At least I know what's going to happen, so it's fine. I'll leave it in and I'll mark it explicit and life goes on.

Viktor: [00:03:20]
We just increased his editing time by 30 minutes.

Darin: [00:03:24]
Okay. So Vik, for people who do not know who Confluent is, why don't you explain Confluent and then we will wrap in why you're here.

Vik: [00:03:34]
Yeah. So I work for a company named Confluent. We have nothing to do with wikis. We have nothing to do with Jira and that kind of stuff. Confluent is in the business of building an event streaming platform based on a very popular open source project called Apache Kafka. So, Apache Kafka is an open source project. We're a company that actively contributes to this open source project, and we also build all sorts of tools and other projects, some of them commercial, some of them open, based on Apache Kafka, surrounding Apache Kafka, or helping people use Kafka more efficiently.

Darin: [00:04:15]
And if people do not know what Kafka is?

Vik: [00:04:17]
That's a very good question, Darin. Thank you so much for asking this one. For people who don't know what Apache Kafka is, it's what we call an event streaming platform. Apache Kafka is essentially a system that allows you to move data around. Think about it as a very interesting and very weird database, or a very interesting and very weird messaging system. It really depends on who you're talking to. Usually I use this anecdote about a few blind but wise people who meet an elephant, and because they were blind, they were touching the elephant on different parts. Depending on where they touched the elephant, one of them said, oh, it looks like some weird animal like a snake, because it has the nose. Another one touched the tail and said, oh, it looks like a possum. Someone touched the skin and said it feels very rough. It's the same thing with Apache Kafka. It depends on where you're coming from and what your background is. That's how you can say, oh, okay, Kafka looks like a messaging system, because you came from messaging systems and Kafka has topics. Or you came from databases and distributed NoSQL databases, and you say, yeah, it looks like a database, because Kafka provides persistence. Kafka provides data distribution and consistent hashing and all this kind of stuff. Essentially it's a system that allows you to move things around from one place to another. It's also a system that allows you to process this data as it moves. That's why it's called streaming or stream processing. When the data comes into Kafka, you can do something with it using the tools that surround this world of stream processing. After that, you can move it on to another system.

Darin: [00:05:59]
And the reason why Vik is here is if you've been listening, you know we've been talking about events a lot recently and I wanted to have somebody on that lives in the event space. Viktor and I play in the event space. We play a doctor on TV but we're not one. We wanted to talk to somebody that actually has to deal with it on a day-to-day basis.

Vik: [00:06:25]
I'm not a real doctor as well, so...

Darin: [00:06:27]
Yeah, you're not a real doctor as well but you do work for a company that certifies doctors so not doctors. We had this weird idea a couple of episodes ago of in operations, people still tend to put pipelines together in a very monolithic early way. It's all highly defined and we were theorizing, well, what if everything that happens in a pipeline is actually just events and instead of it being completely orchestrated, it's more choreographed and we can't tell what the beginning was until the end is done. What do you think about that?

Vik: [00:07:07]
Let's do step back. With the changing the mindset from expressing the system in a sense that there are some things like we used to do. I have a background in programming and still program. Object oriented language teaches us that there are some objects of specific class. Those are translated into some sort of entities. That's how you can persist those and if you need to do some operation on those, there's different approaches how you can do this. You can pass this object. You mutate this object's state and pass this around. This object with the changed state. You have the bank account. Simplest possible thing. You withdraw some money from this bank account and you mutate the state then your balances changes or have a thermometer. The status is changing based on the temperature outside. There's a thing called a thermometer and we mutate the state based on certain conditions. Now, if we want to design the system around this concept, there's nothing wrong with this. It's totally fine. We were doing this for ages, but once you want to enable some of the different use cases. For example, you change the temperature or the current state. You know the current temperature right now but you don't know since you changed the state of this thermometer, you didn't capture what was the previous state of this. So in this case, how we can model the same use case, like measuring temperature, not from perspective of the things and changing state of this but rather the capturing the something that happened. I don't want to sound too scientific but like a shared narrative that every party of the system understand what are we talking about. So in this case instead of going there and changing temperature of the thermometer, we're saying the temperature changed and there's a event, temperature changed, and the new temperature is I don't know like 72 degrees. We need to have a system that allows to model around this concept. You can definitely do this with relational databases. 
Instead of doing an update or upsert, you do insert and after that you're capturing all the rows, but it goes directly against the concept of designing these databases, designing using third normal form where you do not include any duplications and things like that. This is where denormalization comes into play. Obviously, at least the way how I was approaching this problem when I started learning these things, for me it was obvious that it actually increases redundancy. The amount of data is becoming bigger compared to the things that you do with traditional databases when you trying to limit this. That's why you develop the third normal form so that you can reduce redundancy. Once you start developing, you start thinking about the stuff that happens in terms of events. You need to know where to capture this. I just tell you that the traditional database maybe not very suitable for capturing this. So, what can? That's why the system that can capture those events comes into play. It captures, stores them. The events will have certain characteristics and based on these characteristics, we can say, yeah, we will preserve a order which is important because the events happened in real life and there are certain order of events and this is important. When you said something to your significant other that you're not supposed to be saying, you essentially cannot change that. It's another interesting thing because with the initial example of the thermometer changing the state but in the real life, you only thing that you can do is just issue another event. So you can issue an apology and it's dependent on the receiver system how the state of the receiver system will change based on new events. So we have event like telling the bullshit to your wife and after that, you send the event that's contains the apology. Based on the latency and how the system processes all these events, it would be internal state of the system will change. 
So it requires changing, a little bit, the way you approach this problem. But if you think about it, for us as humans it's much easier to operate in an event-driven fashion rather than an object-oriented one, as in the example I just gave you of the conversations we have with people. Plus, events happen in your life. You can't change them, but you can remember them. You can replay those events in your memory. In the same way, a system that captured those events will allow you to do this. So, capturing events: events are immutable, they are ordered, and there's a way to replay them in order to generate some different output. Another example from your life: you replay those events in your mind and you can record a podcast where you're talking about those events, or you can write a book, but essentially the facts, the events that are captured in the system, are not changing. What changes is just the way you process them and what the product of that processing will be, if that makes sense. I'll just stop for a second if you need time to process it.
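
The three properties Vik lists (events are immutable, ordered, and replayable) can be illustrated with a minimal sketch. It assumes a purely in-memory log; `TemperatureChanged` and `EventLog` are hypothetical names made up for this illustration and are not part of any Kafka API.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)  # frozen: an event cannot be modified once recorded
class TemperatureChanged:
    sequence: int    # position in the log, which preserves ordering
    degrees: float   # the new temperature reported by the event


class EventLog:
    """Hypothetical in-memory, append-only log (a stand-in, not a Kafka API)."""

    def __init__(self):
        self._events = []

    def append(self, degrees):
        event = TemperatureChanged(sequence=len(self._events), degrees=degrees)
        self._events.append(event)
        return event

    def replay(self):
        # Hand out a copy so callers cannot reorder or drop recorded events.
        return tuple(self._events)


log = EventLog()
for reading in (68.0, 74.5, 72.0):
    log.append(reading)

# Replay 1: derive the current state (the last reported temperature).
current = None
for event in log.replay():
    current = event.degrees

# Replay 2: same facts, different output (the highest temperature seen).
highest = max(event.degrees for event in log.replay())

# Editing a recorded fact is rejected; a correction would be a new event.
try:
    log.replay()[0].degrees = 0.0
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The same recorded facts feed two different computations, which is the point of the podcast-or-book remark: the events stay fixed and only the processing changes.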

Viktor: [00:11:58]
No, it makes a lot of sense. Actually, I didn't think about it in those terms before, that events may basically be the first example of immutability, now that I think about it, right? But I'm interested in the other side of the story. You're generating some events and then somebody or something is listening. If I go back to the example of the wife, there is no guarantee that the wife is really listening while you're saying something that you shouldn't, right? Or reacting to it. Or maybe she's wearing headphones and you just said it and nothing comes out of that. That's how real life operates. What I'm constantly curious or puzzled about is why it happens to be so hard for us to comprehend that way of thinking when we design our systems. Whenever I speak about similar subjects with people, it's: I just created the event that somebody put this in a shopping cart. How do I guarantee? The questions start like that. How do I guarantee this? How do I ensure that it will be processed by something else?

Vik: [00:13:06]
Answer is you shouldn't worry about this. In this case it is a separation of concerns between a producer which is like entity that writes these events into the streaming system versus the consumer because in real life, as you point out perfectly, there's some delayed processing based on certain conditions. When I say something, it may be ephemeral, so it more look like the message in traditional messaging system that you'll get the acknowledgement so those words are gone. But when I actually putting some post-it notes where something is written that is how the system can return and read those messages whenever it's ready. So in my opinion and I think this is the right way to think and you really shouldn't worry about much like all of these like end to end. This is not the concern of the producer if someone will consume this. I'll give you another example. Kafka. The name usually people asking why this name is Kafka? Is it something to do with the famous writer who known for his interesting views and interesting stories about the interesting things that happened with interesting people like some person turns into insect for example or some injustice that might happen with anyone. So there's not much common with Franz Kafka and Apache Kafka. However there is a very significant similarity. So essentially, Kafka is designed to be many consumers can read a lot of stuff and similar writer is designed to write a book and I think that there will be millions of people who will read this book. Franz Kafka really not responsible for anyone who was able to read his book and he don't need to track them down and say Hey did you read my book? Hey did you read my book? Same thing with Kafka. So there's two concepts where your producer is disconnected from your consumer. When the producer writes the data, it doesn't anticipate how this data will be used, how this data will be processed. Here's another example. 
Think about Kafka as a book where the writer writes the messages in real time and the reader is also reading this real time. With Kafka we rather reading pages and put the bookmark where we stop reading in order to return to this and read this from the point where we stopped. In Kafka, we call this offsets and this is where we read something. But if we would compare this to the messaging system, with the messaging system, read something, you turn page and after that, throw it away. In Kafka, your book is still there and many readers can read the same book at the same time. They might have different use cases, like our example if I read the book I will looking for the pictures and if Viktor will be reading the book, he will be counting words and with Darin would be reading the same book. We're reading the same book, same materials and he will be for example finding the duplicates of the words in same book. It will not change the facts. It will not change the words. It will not change the pages, but each of us as a consumer will get a result that we would be interested in. I would be interested because I don't need to store the state. I just need to find a picture and save it somewhere. For me it's just a simple filtering, so I'm not doing anything. My process would be quite fast, so I would just switching the pages and finding these pictures. Viktor is trying to calculate number of words, so in this case his processing is more stateful because he needs to store this aggregation somewhere, store intermediate step when I have a individual word, we need to increment this. For Darin where he's looking for duplicates, he needs to maintain a state or set of words that already were there so he can eliminate those duplicates. Our use cases are different and the when this book was written, no one ever thought about certain use cases that Viktor and Darin might have. At this point it's great because it allows the companies to enable future developments of the system. 
Once they extract this data from some legacy system, like for example a mainframe, they can put it in Kafka. That potentially enables the creation of new applications that will use the same data in ways it was never intended to be used, if that makes sense.
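
The book analogy can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a Kafka client: the `Reader` class, its `poll` method, and the sample pages are all hypothetical. Each reader keeps its own bookmark (offset), and reading never removes pages from the book.

```python
from collections import Counter

# The "book": an ordered log of pages that is never modified by reading it.
book = [
    "the cat sat",
    "[picture: cat]",
    "the dog sat",
    "[picture: dog]",
    "the end",
]


class Reader:
    """Hypothetical reader that tracks its own bookmark (offset) in the log."""

    def __init__(self, pages):
        self.pages = pages
        self.offset = 0  # index of the next page to read

    def poll(self, max_pages):
        batch = self.pages[self.offset:self.offset + max_pages]
        self.offset += len(batch)  # move the bookmark forward
        return batch


def is_picture(page):
    return page.startswith("[picture")


# Vik: stateless filtering; reads in two sittings, resuming from the bookmark.
vik = Reader(book)
pictures = [p for p in vik.poll(2) if is_picture(p)]
pictures += [p for p in vik.poll(10) if is_picture(p)]

# Viktor: stateful aggregation; keeps a running count per word.
viktor = Reader(book)
word_counts = Counter()
for page in viktor.poll(len(book)):
    if not is_picture(page):
        word_counts.update(page.split())

# Darin: stateful deduplication; remembers every word seen so far.
darin = Reader(book)
seen, duplicates = set(), set()
for page in darin.poll(len(book)):
    if is_picture(page):
        continue
    for word in page.split():
        if word in seen:
            duplicates.add(word)
        seen.add(word)
```

All three readers consume the same unchanged pages and end up with different results, and none of them needed the writer to know about their use case.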

Viktor: [00:17:11]
I guess you're probably meeting people who have flows of things happening through events and people who prefer a more procedural type, or I don't know what the name would be. Do you think there is some kind of relation between whether to use events or something else and how we're organized? Is it some kind of reflection of organizational structures?

Vik: [00:17:32]
This is a very good question. There are two things that I already said a few times, but I didn't emphasize their importance. I started my conversation with the example of a thermometer, where the thermometer has a state. State usually represents something that happens with objects at a particular point in time. This state might change over time, and it makes a difference when we are querying it. That's why in traditional systems like databases you issue a query and you execute this query on top of the existing data. When you issue this query the next time, you might get a different result because something might have changed the data in between. This is a request-response type of pattern. The state is something that can be derived from the stream. Essentially, a stream represents the progression of facts over time, whereas state is something like a point-in-time look at the stream. This is what we call table-stream duality. Essentially, a table usually represents the state. A stream represents the history. Another way to think about this: have you folks seen The Queen's Gambit, the TV show on Netflix? It was a pretty good one and it spiked interest in chess, and I would also like to use the chess analogy to explain this. When you look at the position of the figures on the board at a particular moment in time, you see the current state of the game. But every time anyone makes a move, they record it. Basically, if the state is ever destroyed and they still have each move written on this paper, they will be able to reconstruct the state by replaying all the moves that happened in the game. Same thing with the stream and the table. You can always reconstruct the current state if you have the history of events that happened.
So in terms of how you approach this problem in organizations and what kind of challenges organizations might have with it, it's a change in how people think. They ask how they can query a stream, and you don't query a stream. You query state. So you need to materialize the state from the stream somewhere. Usually with traditional databases, traditional systems, the storage of your data and your query engine are baked in together. Kafka provides you the storage, and you can provide an external query engine for your application. An interesting challenge comes up when I'm working with people and they ask how they can get to a particular message. Kafka only provides ways to read in one direction. That's why it's fast. It doesn't provide ways to do random access. In order to access things randomly, you need to have some sort of system. It can be as simple as a hash map. You can easily materialize any stream in a hash map, or it can be something more sophisticated like a distributed cache or a distributed database. That's why Kafka can take things from those systems and turn them into streams. Imagine you do database manipulation and your database was not designed to be a system where you can easily capture the changes that happen in it. You do a classic upsert if you need to update certain fields. However, your database is designed around a concept called the log, which captures everything that happens in your database, and after that the database engine turns this log into the table. That's how a database works. Essentially, Kafka takes the same approach that database systems were designed around, applies it to streams, and allows people to design systems that way and makes people think about designing systems around events, if that makes sense again. Don't hesitate to interrupt me because I can talk about these things all day. I'm like the Captain America of Kafka. I can do this all day.
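
Materializing a stream into a hash map, as described above, might look like the following sketch. It assumes the stream is an ordered list of (key, value) records; the `materialize` function and the sample readings are hypothetical and only illustrate the idea.

```python
def materialize(stream):
    """Fold a stream of (key, value) records into a table of latest values."""
    table = {}
    for key, value in stream:
        table[key] = value  # a later record for the same key wins
    return table


# The stream: every temperature reading, in the order it happened.
readings = [
    ("kitchen", 68),
    ("garage", 55),
    ("kitchen", 72),
    ("garage", 57),
]

table = materialize(readings)
kitchen_now = table["kitchen"]   # random access by key, via the hash map

table = None                     # the state is destroyed...
rebuilt = materialize(readings)  # ...and reconstructed by replaying history

# Replaying only a prefix gives the state as of an earlier moment.
earlier = materialize(readings[:2])
```

The table answers "what is the value now" while the stream keeps the full history, so the table can always be thrown away and rebuilt, like the chess board from the list of moves.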

Viktor: [00:21:26]
Normally I interrupt people a lot, but that's usually to argue against something, and I've yet to find something. I might get there, but not yet. So basically, if you would like to, let's say, have the state stored somewhere, you would be reading Kafka events? What do you call them? Do you call them events?

Vik: [00:21:46]
Yeah, events, and it goes back to the time when Kafka was designed, because if you look into the APIs, the APIs talk about records. Some people call them messages. Some people call them events. The terms are really interchangeable.

Viktor: [00:22:01]
If I would go back to your thermometer example, the event wouldn't be set the temperature to 50 degrees, it would be increase it by one degree or decrease by three which is I guess a valid example, right? Then you would have some other system that would be listening to those changes, figuring out what is the current temperature.

Vik: [00:22:21]
Yeah, correct. There would be a system that captures all the events that happen in the environment, like temperature increases. There would be a system that stores those events; in this particular case, it would be Kafka. There would be a system that is interested in those events. For example, you have a dashboard that needs to show the current temperature. In order to show the current temperature, you need to constantly update the current state. It's a very typical example of how streaming systems are used as the driver of constant updates in real time. There's another very popular use case that is also based on the same concept we just discussed and that may be closer to what our listeners are used to. You have a database and you have a system that allows you to speed up access to the database. Say you have a cache, and usually your application layer talks to this cache because it's much faster to get data from the cache than going to the database. Plus, if it's a distributed system, you can scale this cache independently from the database. I remember the times when licenses for databases were very costly, so putting a cache between your application layer and your database would be a serious performance boost without increasing the budget for buying new database licenses. Here's the problem though. One of the hard problems of computer science is naming things. We've already addressed this with why Kafka is called Kafka. Another one is cache invalidation. There are two approaches to dealing with a cache. It's either a read-through cache, where you hit the cache and the cache is smart enough to figure out how to read the data from the underlying system, or it's cache-aside, where your application logic gets the data from the database and after that pushes it into the cache, and if the data is not available in the cache, it goes and reads it from the database again.
Your application logic needs to get this result updated somehow, because no one likes to see outdated information about their shopping cart or about the inventory that you have. But the database that your application is dealing with may be updated by a batch system that you don't have any control over. It's a system that was written a long time ago. There's no integration. It cannot send you a notification so that you can update your cache. What do you do? A system like Kafka, and the component available in Kafka called Kafka Connect, allows you to integrate with the underlying database functionality, the transaction log or write-ahead log, to capture those changes and push them directly to the cache. In this case your application becomes real time immediately, without you even changing the code. All the work happens behind the scenes, and your application logic doesn't need to change in order to get the benefit. In some systems this concept is called hot cache or refresh cache or something like that. This is a very common use case, where change data capture, essentially the process of capturing these changes from a data system, works together with other systems.
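
The change data capture flow described above can be sketched as follows. This is an in-memory stand-in under stated assumptions: `Database`, `sync_cache`, and the SKU keys are hypothetical names invented for illustration, and nothing here uses the real Kafka Connect API. The idea is only that the cache is kept fresh by reading the database's change log rather than by the application itself.

```python
class Database:
    """Hypothetical database that records every change in a log first."""

    def __init__(self):
        self.change_log = []  # stand-in for a transaction / write-ahead log
        self.rows = {}

    def upsert(self, key, value):
        self.change_log.append(("upsert", key, value))
        self.rows[key] = value

    def delete(self, key):
        self.change_log.append(("delete", key, None))
        self.rows.pop(key, None)


def sync_cache(change_log, cache, offset):
    """Apply log entries from `offset` onward to the cache; return new offset."""
    for operation, key, value in change_log[offset:]:
        if operation == "upsert":
            cache[key] = value
        else:
            cache.pop(key, None)
    return len(change_log)


inventory_db = Database()
cache = {}
offset = 0

# A batch job the application does not control writes to the database.
inventory_db.upsert("sku-1", 10)
inventory_db.upsert("sku-2", 4)
offset = sync_cache(inventory_db.change_log, cache, offset)

# Later changes are picked up from the saved offset, with no application code
# involved: the application only ever reads `cache`.
inventory_db.upsert("sku-1", 7)
inventory_db.delete("sku-2")
offset = sync_cache(inventory_db.change_log, cache, offset)
```

After both syncs the cache matches the database rows even though neither the batch job nor the application was changed to notify the other.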

Darin: [00:25:21]
Do people come to you and say all of this eventing stuff is too hard, we just don't want to do it, the only thing we understand is a database and we have to know state all the time? How do you convince them otherwise, that events are actually the truth? You've already stated that the transaction log in a database is the truth. It's just that tables are materialized from it.

Viktor: [00:25:46]
Let me give it a try. Is it either or? It's not that you have to have events or you have to have only the state. Those things work together in a way right?

Vik: [00:25:55]
It's a very good question, and it's more of a philosophical question, but let's discuss it. There are actually three, maybe not so distinct, versions of it. I think we do have a quorum. No pun intended, because Kafka is a distributed system and there should be a quorum in order to figure out where to write and things like that. Yes, I, as a representative of the vendor who develops and builds a platform based on Apache Kafka, want to make Kafka the central nervous system of your organization. The source of truth, if you'd like. All streams lead to Kafka. Everything just gets dumped over there, and after that whoever needs it will pick it up and do something with the data. Ultimately I want people to go all in on everything being streaming. However, I'm a practitioner. I worked in consultancy. I did a lot of professional services engagements, even with Confluent, before I started doing developer advocacy. I understand this is not the reality that people are facing, and usually there are plenty of different systems with different characteristics that each serve a particular purpose for whatever business line they support. Viktor is also correct when he says maybe it's somehow aligned with organizational structures, because in many organizations, if one department builds some system, it's very difficult to get data out. You need to convince them somehow to build an API and provide the data that you require. For example, you have a system that requires some data about payments, and they don't want to build this. The payment system is running on the mainframe. They don't have a budget to build the API for you, but they also want to cover their backs and they don't want to give you direct access to their databases, because when there are schema changes they would also need to notify you and say, okay, we changed the database now, your application will be broken.
Standard database is incredible tool for capturing the state of the organization but database is 100% useless to do some sort of state negotiation or a conversation negotiation type of thing. If you try to talk two or three systems through one same database, you will get into very unpleasant situations very quickly. It would be either cumbersome from perspective of separation of concerns, separation of securities aspects and the people cannot access to particular fields or they cannot change the schema but they only can do queries and things like that. It leads to silos inside organizations. One of the problems that Kafka was trying to solve, back in the day it wasn't a startup, it was in LinkedIn where the Kafka came from, is to break those silos, allows to different departments communicate correctly through the data that's there both systems might require. Some ad systems that run some advertising, they need some data that comes from the profile systems in order to show relevant advertisement for particular users, they need to know data about profiles but those updates from the profiles, it needs to be offloaded into HDFS and after that the Hadoop will crunch those numbers and after that they will be loaded in the system where they can do the ads. By that time, someone will update profile and again the ad might be not 100% relevant. They were trying to speed up this process but increasing the velocity how these changes will be propagated between the systems. This is where the idea of grabbing this in one place and after that not wait until the end of the day but get this data as soon as it arrived. I cannot stress enough to emphasize the concept of transaction log, write-ahead log that was there for many years and how this concept. They usually call this turning database inside out. Instead of looking at database from perspective of public API, we just extracting whatever we need from this log. 
That's why the concept of using database as a source of truth, it's all right. It's fine. However there's a lot of complications will arise if you're trying to use database as your point of conversation between the systems. You can build the systems around one database when you can count number of stakeholders by your hands. If you go any bigger than that, you probably need to have a committee that will approve certain changes that will go into particular schema on database. You need to have a tooling that will be responsible for rolling those changes and things like that. Kafka don't really care what kind of stuff you put in there. It's really about how your consumer and your producer can negotiate but it's a little bit orthogonal conversation where we can talk about schemas but those schemas can be more flexible. Schemas can support backward compatibility and forward compatibility and some sort of full compatibility if you change something. The cool thing about this is that also Viktor pointed out in the very beginning how you would know that consumer will be able to read the data. You don't, because a consumer and the producer, they can use different schemas and the tooling will be able to manage those. For example, producer can produce messages with version of data version two because they want to be more progressive. They want to do more data but consumer can be behind and they don't really care about the data that comes with version two. That's why they still will be able to extract information that they need based on version one, because version one is backward compatible with version two. That's how you can do this in Kafka. There's a third approach. First approach, we're using database as a source of truth. Second approach, using Kafka as a source of truth. Third approach is okay so let's use a distributed transaction and we will commit the same time in database and the Kafka and in this case will be win-win. 
Who wants to fight with me about the third approach and talking about this not fun thing to do. No, it is an option and in the field when I was talking to multiple customers, it comes again from the mindset where people tend to do this with messaging systems. Remember XA transactions where you can send the message and write this in database as a part of the same loop. It works to a certain degree. It works when you don't need scalability. It works when you have a single node database and single node messaging system. Once you go into distributed system world, latency increases and the complications to support increases. Some people might say all these startups, they just invented all the NoSQL technologies just because they couldn't afford expensive distributed transaction solutions. This is a little bit of truth about this. A little bit of truth about this because the startups who designed those systems they designed those to run on generic hardware. They don't have the money to have Exadata clusters. That's why they put more effort to designing the software aspect of the system. Software is winning over the hardware in terms of scale horizontally would be cheaper than scaling vertically like putting more beef in your computers where we can just throw the clusters. This is where distributed transactions stop working because it would be too expensive to do this coordination and make sure that consistency would be sufficient. This is where a third option is actually something that I cannot recommend and I'm explaining to customers why I cannot. This is why like you are either going with database as your source of truth and after that you're using the change data capture connector that will be extracting data and pushing this data into Kafka or using Kafka as your source of truth. You're always producing events into Kafka and after that using the different type of connector that will populate those databases for you instead of run this as one single transaction loop.
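
The earlier point in this answer, that a producer can move to version two of the data while a consumer keeps reading what it needs from version one, can be sketched like this. It assumes records are plain dictionaries; `V1_FIELDS`, `read_as_v1`, and the order fields are hypothetical, and real deployments would typically rely on a serialization format and schema tooling rather than hand-written projection.

```python
# Fields a hypothetical version 1 consumer knows, with defaults for absent ones.
V1_FIELDS = {"order_id": None, "amount": 0.0}


def read_as_v1(record):
    """Project any record onto the version 1 fields, ignoring unknown ones."""
    return {name: record.get(name, default) for name, default in V1_FIELDS.items()}


# Records as written by an older and a newer producer.
v1_record = {"order_id": "A-1", "amount": 19.99}
v2_record = {"order_id": "A-2", "amount": 5.00, "currency": "EUR"}

from_v1 = read_as_v1(v1_record)
from_v2 = read_as_v1(v2_record)  # the extra "currency" field is dropped
```

The consumer never has to coordinate a release with the producer: it keeps extracting the fields it understands, and the added field is simply invisible to it.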

Viktor: [00:33:22]
But there is, I guess, a fourth option. That is, if you do split systems into smaller pieces, let's say a product catalog and a shopping cart, and they're managed by different teams, they can each have their own source of truth. This is my catalog. This is me, and this is your shopping cart. This is you. I don't care about you. You don't care about me. The only thing that we really, really need is a way to communicate with each other. Right now, we're speaking in English even though my native language is Serbian. You don't really care whether I'm memorizing things in Serbian or in English. What's important is that we have a communication language. Each of us can have our own source of truth, and then we have Kafka or something like that as a communication mechanism between us. You don't care where I store my stuff. I just added a new product to the catalog that can enter into your shopping cart. I don't know. I don't care.

Vik: [00:34:19]
It makes sense. In this particular example, it can be anything. It can be Kafka. It can be API calls. One of the things that the event-driven approach will give you is also asynchronous communication, or asynchronous processing, and resilience. We were talking about sunny-day scenarios: when my product catalog and my shopping cart service are up and they communicate with each other through RPC, REST, or whatever, it's all good and everything's fine. But what's going to happen if a network partition happens between those systems? The idea of microservices is that the microservice itself needs to be self-sufficient and reliable and must not depend on the failures of others. We all know about cascading failures of systems. In this case, Kafka can be used in multiple ways, since it's the system that will be the backbone of the communication and the data is stored inside of Kafka. Kafka provides persistence, so when your shopping cart system is down, or, for example, the shopping cart depends on inventory and the inventory system is down, there will still be some information that can be retrieved either from Kafka or from what has been materialized on the shopping cart side of things. Duplication of data is fine. It's okay, because this data is required for your service to execute the tasks it needs to be doing. You don't really need to depend on the other system. It's better to display stale data than to display an error 500. You can even do the sale and after that send an email saying, "Oh sorry, this item is not correct," rather than not doing the sale because your inventory system is down. Kafka allows you to fulfill those use cases. Kafka is a very ubiquitous type of thing. It doesn't necessarily need to be used in the way it's prescribed. It is ubiquitous like Unix pipes. You can tie together systems that were not intended to be used together, as long as they work with the same protocols. It's exactly what Kafka does. You have a cat, you have a tar, and you pipe the output of the cat into tar so you can have a zip folder. Those tools might each do one small thing, but they do that thing very well. In this case, Kafka will only bring some benefits.
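The point above about stale data versus an error 500 can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `InventoryService`, `ShoppingCart`, and the event shape are invented for illustration, and the events would in practice arrive from a topic rather than from a direct method call.

```python
class InventoryDown(Exception):
    """Raised by the stand-in inventory service when it is unreachable."""


class InventoryService:
    def __init__(self):
        self.stock = {"book-1": 5}
        self.up = True

    def get_stock(self, sku):
        if not self.up:
            raise InventoryDown(sku)
        return self.stock[sku]


class ShoppingCart:
    def __init__(self):
        self.local_stock = {}  # duplicated data, materialized from events

    def on_inventory_event(self, event):
        self.local_stock[event["sku"]] = event["quantity"]

    def can_sell(self, sku):
        # Answers from local, possibly stale data, so an inventory outage
        # does not cascade into the cart.
        return self.local_stock.get(sku, 0) > 0


def can_sell_synchronously(inventory, sku):
    # The sunny-day design: a direct call that fails when inventory is down.
    return inventory.get_stock(sku) > 0


inventory = InventoryService()
cart = ShoppingCart()
cart.on_inventory_event({"sku": "book-1", "quantity": 5})
inventory.up = False  # simulate the outage or network partition
```

With the inventory service down, `cart.can_sell("book-1")` still answers from the materialized copy, while `can_sell_synchronously(inventory, "book-1")` raises `InventoryDown`. The trade-off is the one described in the conversation: the cart may occasionally sell on stale numbers and has to correct that afterwards.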

Viktor: [00:36:35]
Exactly. That's the model I'm very interested in. That's why I was mentioning organizations before and shopping cart. I like that concept. I'm an echo command. I don't care who is going to get my output and I'm a sed command. I don't care who provided me input and it's up to somebody to figure out how to connect all that stuff.

Vik: [00:36:52]
Exactly. Darin, in the very beginning, brought up a very interesting point: orchestration versus choreography. Kafka can be used in both cases. The system can be designed so that you have an orchestrator service that is responsible for routing events, or so that the events themselves are enough to establish communication and provide enough information for the other systems to react to. That's why events can be used as state transfer. That's the example that I brought up: we have a sensor reading, and some part of the sensor reading changed. In this case, we are not sending every parameter of the sensor, just the particular parameter that changed. Or an event can be just a simple notification. Some events might include data that only the systems interested in it will be able to decode and do something with, versus a pure notification: an order happened. Maybe this is going to be a notification that needs to be propagated to multiple systems that are reading this particular message. Kafka does not prescribe a style the way ESBs do, for example. ESB is all about choreography because you need to put some logic inside your ESB in order to do routing of the messages based on headers, based on some payload and some other things. With Kafka, your system design will dictate whether you're using choreography or orchestration.
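To make the two event styles mentioned above concrete, here is a small Python sketch of a state-transfer event (a changed sensor parameter) next to a plain notification (an order happened). The `Bus` class is an in-memory stand-in for topics, and the topic names, handlers, and event fields are all assumptions made for this example.

```python
from collections import defaultdict


class Bus:
    """In-memory publish/subscribe stand-in for a set of topics."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)


bus = Bus()
readings = {}  # consumer-side state built from state-transfer events
notified = []  # services that only needed to know something happened


def apply_reading(event):
    # State transfer: the event carries the changed parameter and its value.
    readings.setdefault(event["sensor"], {})[event["parameter"]] = event["value"]


def billing(event):
    # Notification: the event only says that an order happened.
    notified.append(("billing", event["order_id"]))


def shipping(event):
    notified.append(("shipping", event["order_id"]))


bus.subscribe("sensor-readings", apply_reading)
bus.subscribe("orders", billing)
bus.subscribe("orders", shipping)

bus.publish("sensor-readings", {"sensor": "s1", "parameter": "temp", "value": 21.5})
bus.publish("sensor-readings", {"sensor": "s1", "parameter": "humidity", "value": 40})
bus.publish("orders", {"order_id": 42})
```

Note that the bus contains no routing logic: each service decides for itself what to subscribe to and how to react, and the publisher does not know who is listening.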

Viktor: [00:38:14]
You just mentioned something that died, and I think that's a signal to stop. ESB. Dead. Finished.

Darin: [00:38:23]
Yeah. We have gone down some paths that we've never gone down before and you took us down one place that I never wanted to go again with ESBs. So Vik, if people wanted to follow you, listen to more of what you have to say, what can they do? You've got a podcast.

Vik: [00:38:41]
Yes, I have a podcast that you probably don't want to listen to because it's in Russian. Lots of Russian swear words and things like that. But there's a very good podcast where you can learn about stream processing and the event-driven paradigm called Streaming Audio, which I participate in from time to time, where we discuss some technical aspects or some of the more architectural aspects and talk about the things that we discussed today. Streaming Audio. It's a great podcast. My friend, Tim Berglund, is hosting it. Sometimes I'm also there. I like showing stuff and showing how certain concepts that we're talking about can be implemented with a certain technology. That's why I do weekly streams on the Confluent YouTube channel, where I'm just taking someone's use case. For example, we were talking about the use case of a materialized view: how your application can leverage a materialized view. There are tons of events happening, so how will my application be able to build a materialized view on these events? How to turn a stream into a table, for example, or how you can deploy Kafka on Kubernetes, or how you can use Kotlin and serverless together with Kafka. All these things are happening on the Confluent YouTube channel, and as always, you can follow me on Twitter, which is gAmUssA. This is my go-to place. If you have any questions, just DM me. DMs are open. We also have, in general, a very good place for asking these questions: the community Slack, cnfl.io/slack, which is probably the largest community in the data and data processing space. You can ask any questions about the internals of Kafka or some externals, what the best practices would be for using Kafka with DevOps tools, or how to automate deployment and do blue-green deployments of Kafka clusters or your stream processing applications.

Darin: [00:40:31]
I think you might be underestimating. I think we would have a lot of our listeners enjoy listening to swearing Russians.

Vik: [00:40:39]
That's great. There is the razborpoletov.com podcast, whose name can be translated as debrief, like an after-flight debrief type of show. I have been doing this since 2011, I guess. Almost 10 years. Yeah. It's been a pretty nice run. I also did a podcast that, for some reason, didn't go very far. It's called Crazy Russians in Devoops. We were talking about all sorts of DevOps topics. It's still there. You can find it at pod.link/crid. Crazy Russians in Devoops.

Darin: [00:41:06]
All right, Vik. Thanks for hanging out with us today.

Darin:
We hope this episode was helpful to you. If you want to discuss it or ask a question, please reach out to us. Our contact information and the link to the Slack workspace are at https://www.devopsparadox.com/contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at https://www.devopsparadox.com/ to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.

DOP 97: Processing Event Streams With Apache Kafka

Show Notes

Guests

Viktor Gamov

Hosts

Darin Pope

Viktor Farcic

