Best Practices for Event-Driven Microservice Architecture

If you’re an enterprise architect, you’ve probably heard of and worked with a microservices architecture. And while you might have used REST as your service communications layer in the past, more and more projects are moving to an event-driven architecture. Let’s dive into the pros and cons of this popular architecture, some of the key design choices it entails, and common anti-patterns.

Very comprehensive article.
Regarding streams-based event-driven microservices using Kafka, I would just add that it enables local materialized views, which let you keep application state close to the services with no hassle; you may check my post for details.
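To illustrate the idea (a minimal sketch, not the commenter's actual post or the Kafka Streams API): a service can maintain a local materialized view by folding the event stream into an in-memory map. In a real deployment the events would come from a Kafka topic and the view would typically be backed by a state store; the event shapes below are hypothetical.

```python
def apply_event(view, event):
    """Fold one order event into the local view, keyed by order id."""
    order_id = event["order_id"]
    if event["type"] == "OrderCreated":
        view[order_id] = {"status": "created", "total": event["total"]}
    elif event["type"] == "OrderShipped":
        view[order_id]["status"] = "shipped"
    return view

def materialize(events):
    """Replay the whole stream to (re)build the local view from scratch."""
    view = {}
    for event in events:
        apply_event(view, event)
    return view

events = [
    {"type": "OrderCreated", "order_id": "o-1", "total": 42.0},
    {"type": "OrderShipped", "order_id": "o-1"},
]
print(materialize(events))  # {'o-1': {'status': 'shipped', 'total': 42.0}}
```

Because the view is rebuilt purely by replaying events, a service can recover its state after a restart by re-consuming the topic from the beginning (or from a snapshot).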

I would add orchestration as another approach to event-driven service architecture. In a large set of use cases it is a much better fit than the choreography described in this article. Look at the example at the beginning of the article:

  1. the order service which could write an order record to the database
  2. the customer service which could create the customer record, and
  3. the payment service which could process the payment.

A real application would never create payment, customer, and order records based on a single message, as the three services above have to coordinate to achieve the business objective. For example, the payment should be processed only after the order is shipped, and not processed at all if there is not enough inventory. Orchestration solves this problem much more cleanly, as it allows you to specify the whole business transaction in a centralized component.
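The contrast with choreography can be sketched in a few lines (a hedged illustration only; all service calls below are hypothetical stubs, not the Cadence API): with orchestration, one central workflow function encodes the whole business transaction, so ordering rules like "pay only after shipping" live in a single place instead of being scattered across event handlers.

```python
def reserve_inventory(order):  # stub for the inventory service
    return order["quantity"] <= 5

def create_order(order):       # stub for the order service
    return {**order, "status": "created"}

def ship_order(order):         # stub for the shipping step
    return {**order, "status": "shipped"}

def process_payment(order):    # stub for the payment service
    return {**order, "status": "paid"}

def order_workflow(order):
    """Centralized orchestrator: each step runs only if the previous one succeeded."""
    if not reserve_inventory(order):
        return {**order, "status": "rejected"}  # not enough inventory: never pay
    order = create_order(order)
    order = ship_order(order)
    return process_payment(order)  # payment happens only after shipping

print(order_workflow({"id": "o-1", "quantity": 2})["status"])   # paid
print(order_workflow({"id": "o-2", "quantity": 99})["status"])  # rejected
```

In a real orchestrator such as Cadence, the workflow function would additionally survive process crashes, retry failed steps, and run compensating actions, which is exactly what makes the centralized approach attractive.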

I would recommend looking at Cadence Workflow, an open-source, developer-friendly orchestrator developed by Uber. For example, it is used to process tip requests in the Uber application.

Disclaimer: I’m the tech lead of the Cadence project.

Great article, Jason. I enjoyed reading it.

I’d add that it’s now possible to have your event-driven architecture documented with AsyncAPI. Disclaimer: I’m the AsyncAPI creator.

Thanks for sharing this great article. I’ve recently started working on Kalium, which provides an easy-to-use API for building event-driven architectures on top of Kafka. It is currently built for the JVM and supports handling of POJOs and protobuf.
I’d be more than happy to get your feedback.

Maxim, I recently discovered Cadence and it looks very interesting. I watched the video of the talk you gave back in 2017, when you had just open-sourced it.
I plan to (partially) rewrite my PoC event-driven microservice application (currently implemented with Kafka) to use Cadence instead. I think it will be a good learning exercise, and afterwards I will of course blog about it :slight_smile:
After having played with it, if I am satisfied, I may push for Cadence at work (I work for a major European financial organization, and we are currently looking into new technologies to replace our legacy stack).
Quick question, if I may: as I understand it, Cadence (as the orchestrator) talks to every single one of the microservices, and each microservice talks only to Cadence, correct?
Doesn’t that introduce a single point of failure? Thanks in advance and keep up the good work!

Victor, thanks for looking into Cadence!
Cadence is indeed the single point of failure in such a system, as all the components interact through it. The solution is to avoid having a single point of failure inside Cadence itself. Each Cadence cluster is highly available and able to sustain multiple host failures without loss of availability. It also runs on a replicated database store that is immune to host failures.
Obviously this is not enough, as there are failure modes, such as a bad database schema deployment or a full-region power outage, that can bring the whole cluster down. For these situations Cadence provides cross-cluster asynchronous replication. With cross-cluster replication, each workflow gets replicated to more than one Cadence cluster, usually in different regions (using AWS terminology). This keeps the system up and running even in the case of a complete region or Cadence cluster outage.
This setup was considered reliable enough for Uber to bet on it for dozens of business-critical use cases. For example, when the “tip driver” button in the Uber app is pressed, a Cadence workflow is started.

Feel free to join the Cadence Slack Channel to discuss.

Thanks so much for such a detailed answer! I have just joined the Cadence Slack; once I finish my initial research I will post any questions that arise there.