Udi Dahan’s interview, part 3/3

Udi Dahan is known as The Software Simplist, an internationally renowned expert on software architecture and design. He has also been recognized by Microsoft with the Most Valuable Professional award for 5 years in the areas of connected systems and architecture.
Like Thomas Erl before him, he did us the great honor of answering our questions, always with simplicity.

Today: NServiceBus, content-based routing and long-running processes.

Can you please present NServiceBus and explain why you decided to create your own ESB?

I guess the simple answer is I got tired of waiting for Microsoft to do it.
I was working on projects with large distributed systems and I needed this type of capability. There were all sorts of middleware products that gave me a certain level of functionality, like Tibco, a fantastic middleware solution with Rendezvous, very fast and powerful messaging. But it was very low level; it didn’t give enough support for developers in terms of guiding them in how they structure their code.

So when I was looking at the various technologies, not just around Microsoft but from all sorts of vendors, I found that a lot of them give strong messaging abilities but nothing at a higher level. And that’s pretty much where I focused, because I had no intention of writing my own type of middleware.
So I looked at distributed middleware in the form of Rendezvous, or in another form, Microsoft Message Queuing (MSMQ), where the concept was that you have local queues on all machines, so you have a fully distributed P2P type of model without any central single point of failure.

So now we have this fully distributed model, what type of API should developers be using to communicate on top of this? When a message arrives, how should that message be dispatched to the developers’ code? If there are errors when processing a message, how should those things be handled? How should retries be done? All of those types of concerns were things that I just didn’t see addressed by the tools that were out there, and I needed them to build the projects that I was working on.

So I started writing out this type of code for myself and put together a set of libraries.
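
To make those questions concrete, here is a minimal Python sketch of dispatching messages from a local queue to handlers, with retries and an error queue. It is an illustration only and not NServiceBus’s API (NServiceBus is a .NET library); `LocalEndpoint`, `MAX_RETRIES`, `OrderPlaced` and the in-memory queues are all hypothetical names chosen for this example.

```python
from collections import deque

MAX_RETRIES = 3  # hypothetical limit, chosen for illustration


class LocalEndpoint:
    """Toy in-memory endpoint: one input queue and one error queue."""

    def __init__(self):
        self.input_queue = deque()
        self.error_queue = []
        self.handlers = {}  # message type -> list of handler callables

    def register(self, message_type, handler):
        self.handlers.setdefault(message_type, []).append(handler)

    def send(self, message):
        # Each queue entry carries the number of failed attempts so far.
        self.input_queue.append((message, 0))

    def process_all(self):
        while self.input_queue:
            message, attempts = self.input_queue.popleft()
            try:
                for handler in self.handlers.get(type(message), []):
                    handler(message)
            except Exception:
                attempts += 1
                if attempts < MAX_RETRIES:
                    self.input_queue.append((message, attempts))  # retry later
                else:
                    self.error_queue.append(message)  # give up, park the message


class OrderPlaced:
    def __init__(self, order_id):
        self.order_id = order_id


calls = []


def flaky_handler(message):
    # Fails on the first attempt only, to simulate a transient error.
    calls.append(message.order_id)
    if len(calls) < 2:
        raise RuntimeError("transient failure")


endpoint = LocalEndpoint()
endpoint.register(OrderPlaced, flaky_handler)
endpoint.send(OrderPlaced(42))
endpoint.process_all()
```

In this sketch the handler is invoked twice for the same message: the first attempt fails and is retried, the second succeeds, and nothing lands in the error queue. A message that keeps failing would end up in `error_queue` after `MAX_RETRIES` attempts.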

What are the main features of NServiceBus and why does it align with SOA?

When we talked about SOA, one of the things we mentioned was these autonomous, loosely coupled services that interact in a bus architectural style meaning an event-driven approach.
And the way that NServiceBus is built makes it very easy to do event-driven communication between different parts of a system.
That’s something that a lot of service buses support with a topic-based communication model.
But there are two elements that NServiceBus provides and others don’t.

The first one is with regard to topology. A lot of the so-called enterprise service bus technologies that exist today are centrally deployed, meaning that any services that want to publish an event have to remotely connect to the ESB in order for the event to be published, and if there is a connectivity issue, they’re stuck. So it’s kind of hard to say that they really are going to be autonomous if there is a central point of failure like an ESB.

Because NServiceBus is entirely distributed, it’s hosted together with the service that is using it. There is never a situation where a service can’t access the service bus, because the bus is deployed together with it. Internally, NServiceBus uses store-and-forward queuing to make sure that the events are propagated later on, when connectivity comes back, but the service is never blocked. So that’s one thing that NServiceBus does and that most ESBs don’t do.
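
The idea of queuing locally and propagating later can be sketched in a few lines of Python. This is a hedged illustration of the general store-and-forward pattern, not of NServiceBus internals; `StoreAndForwardSender`, `transport` and the `network` flag are hypothetical, and the in-memory deque stands in for a durable local queue.

```python
from collections import deque


class StoreAndForwardSender:
    """Publishing only writes to a local store; delivery happens separately."""

    def __init__(self, transport):
        self.transport = transport  # callable that may raise ConnectionError
        self.outgoing = deque()     # stands in for a durable local queue

    def publish(self, event):
        # Always succeeds locally, so the publishing service is never blocked.
        self.outgoing.append(event)

    def flush(self):
        """Try to forward stored events in order; stop at the first failure."""
        delivered = 0
        while self.outgoing:
            try:
                self.transport(self.outgoing[0])
            except ConnectionError:
                break  # keep the event stored and try again later
            self.outgoing.popleft()
            delivered += 1
        return delivered


network = {"up": False}  # simulated connectivity
received = []            # what the remote subscriber has seen


def transport(event):
    if not network["up"]:
        raise ConnectionError("network unavailable")
    received.append(event)


sender = StoreAndForwardSender(transport)
sender.publish("OrderPlaced")        # works even though the network is down
delivered_while_down = sender.flush()
network["up"] = True                 # connectivity comes back
delivered_after_recovery = sender.flush()
```

While the simulated network is down, `publish` still succeeds and the event stays in the local store; once connectivity returns, the next `flush` forwards it.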

The second thing, with regard to publish-subscribe, is that NServiceBus takes semantics and brings them to a whole new level.
Traditional ESBs allow you to define what’s known as a topic hierarchy. So you can have a root-level topic and a couple of other topics that inherit from that base topic. What NServiceBus does is go a step further: it allows for multiple inheritance.
So you can define messages using interfaces rather than classes, because interfaces allow for multiple inheritance. This allows developers to model much richer semantics, where they can say that a certain event has an IS-A relationship with multiple other events.
This allows them to compose and recompose the semantics of the system over a long period of time without having to resort to things like content-based routing.

Content-based routing is ultimately what you end up with when you have already inherited from one topic and you would like to semantically model the fact that you’re related to other topics: you have to go for composition, so the payload itself has to be opened and examined in order to see where it needs to be routed. Content-based routing can thus be thought of as a workaround for the fact that the topic hierarchy only allows you to inherit from one other topic.
NServiceBus allows you to inherit from as many other topics as you like, which means that you hardly ever have to resort to content-based routing, and ultimately that means that you don’t need to put complicated business logic into the service bus itself.

So ultimately it drives down complexity, gives you stronger semantic modeling and gives you better versioning over the longer term as your services continue to evolve.
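
As a rough illustration of an event that IS-A several other events, here is a small Python sketch. NServiceBus expresses this with .NET interfaces; the sketch below uses Python multiple inheritance instead, and `Bus`, `OrderEvent`, `BillingEvent` and `OrderBilled` are hypothetical names, not part of any real API.

```python
class OrderEvent:
    """Hypothetical base event that ordering subscribers care about."""


class BillingEvent:
    """Hypothetical base event that billing subscribers care about."""


class OrderBilled(OrderEvent, BillingEvent):
    """One event that IS-A OrderEvent and also IS-A BillingEvent."""

    def __init__(self, order_id, amount):
        self.order_id = order_id
        self.amount = amount


class Bus:
    def __init__(self):
        self.subscriptions = {}  # event type -> list of handlers

    def subscribe(self, event_type, handler):
        self.subscriptions.setdefault(event_type, []).append(handler)

    def publish(self, event):
        # Deliver to subscribers of the concrete type and of every base type,
        # so no handler has to open the payload to decide on routing.
        for event_type in type(event).__mro__:
            for handler in self.subscriptions.get(event_type, []):
                handler(event)


order_log = []
billing_log = []

bus = Bus()
bus.subscribe(OrderEvent, lambda e: order_log.append(e.order_id))
bus.subscribe(BillingEvent, lambda e: billing_log.append(e.amount))
bus.publish(OrderBilled(order_id=7, amount=99.5))
```

Both subscribers receive the single published event purely because of its type relationships; with a single-inheritance topic hierarchy, one of them would have needed content inspection to find it.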

Apparently NServiceBus doesn’t support the content-based routing pattern because it is a dangerous pattern in a service bus context? Can you explain the reason?

I said that content-based routing is absolutely necessary if you’re doing integration-centric work, like with a broker. In other words, you have several existing applications that you need to integrate and there may be some duplicate responsibilities between them.

For example, you might have two fulfillment systems and, when an order arrives, you need to open up that order entity to see, based on some type of product ID, whether that order should be routed to fulfillment system A, which is responsible for one subset of the product IDs, or to fulfillment system B, which is responsible for the other subset.
So that type of integration work is proper broker work, and that’s where content-based routing comes from. It’s not the kind of thing that you want to do in a service bus type of environment.

Now it’s not entirely accurate to say that NServiceBus doesn’t allow you to do content-based routing. It’s better to say that NServiceBus doesn’t make it easy for you to do content-based routing. You always have the ability to take control in your own code and say “now I’ve got the message, I’d like to look at it and decide based on its content whether I want to send it to destination A or destination B”. So you can do that in your own code and have that code hosted in NServiceBus just like everything else. It’s sort of a core philosophy of NServiceBus: we want to make working according to the bus architectural style as easy as possible, and we want to encourage you to do the right thing. When you need to do more broker-centric work, it’s not that you can’t do it, but it will be a little bit harder: you have to write an extra two or three lines of code.

That tends to nudge people in the right direction with regard to the design of their solution.
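
Those “extra two or three lines” might look something like the following Python sketch of the two-fulfillment-systems example. It only illustrates doing the routing in your own handler code; the `Order` type, the `send` callable, the endpoint names and the product ID boundary are all assumptions made up for the example, not NServiceBus API.

```python
from dataclasses import dataclass

# Hypothetical split: product IDs up to this value belong to fulfillment system A.
FULFILLMENT_A_MAX_PRODUCT_ID = 4999


@dataclass
class Order:
    order_id: int
    product_id: int


def handle_order(order, send):
    """Handler code that opens the message and picks a destination itself."""
    if order.product_id <= FULFILLMENT_A_MAX_PRODUCT_ID:
        destination = "fulfillment-a"
    else:
        destination = "fulfillment-b"
    send(destination, order)


sent = []


def record_send(destination, message):
    # Stand-in for a real send operation; it just records what was routed where.
    sent.append((destination, message.order_id))


handle_order(Order(order_id=1, product_id=100), record_send)
handle_order(Order(order_id=2, product_id=9000), record_send)
```

The routing decision lives in ordinary application code, which keeps the business rule out of the bus itself.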

Could you explain the use of long-running processes with NServiceBus?

In NServiceBus we call long-running processes Sagas, which is a pattern that the relational database community came up with for the purposes of handling long-lived transactions where ultimately you create an explicit set of smaller transactions to handle it.

In NServiceBus that means that the way you handle a long-running process is by dividing it up into a series of messages, where each message is a single small transaction which is reliable and fault tolerant. And you are provided with a larger state management facility in order to handle the reliability and the fault tolerance of the long-running process as a whole.
So this means that even if a server crashes midway through processing a message that is part of a long-running process, not only will the message being processed be rolled back, but the long-running process itself can continue running on some other machine, in the state that it was in before the message processing crashed on the first machine.

So this type of ability allows developers to focus on the core business problem without having to deal with reliability concerns, performance concerns, scalability concerns in their environment.
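
The rollback-and-resume behavior described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not how NServiceBus persists sagas: `SagaStore` is a hypothetical in-memory stand-in for durable storage, and a “transaction” is modeled as working on a private copy of the state that is saved only if handling succeeds.

```python
import copy


class SagaStore:
    """Toy stand-in for durable saga storage, keyed by saga id."""

    def __init__(self):
        self._data = {}

    def load(self, saga_id):
        # Hand out a copy so uncommitted changes never leak into the store.
        return copy.deepcopy(self._data.get(saga_id, {"steps": []}))

    def save(self, saga_id, state):
        self._data[saga_id] = copy.deepcopy(state)


def handle_message(store, saga_id, message, crash=False):
    """Process one message of the long-running process as one small transaction."""
    state = store.load(saga_id)
    state["steps"].append(message)
    if crash:
        # Simulated server crash before commit: the state change is lost.
        raise RuntimeError("server crashed mid-message")
    store.save(saga_id, state)  # commit only when handling succeeded
    return state


store = SagaStore()
handle_message(store, "order-1", "OrderPlaced")

try:
    handle_message(store, "order-1", "PaymentReceived", crash=True)
except RuntimeError:
    pass  # the message would be redelivered, possibly to another machine

state_after_crash = store.load("order-1")

# "Another machine" picks up the redelivered message from the same stored state.
state_after_retry = handle_message(store, "order-1", "PaymentReceived")
```

After the simulated crash the stored state still contains only the first step, and the retried message continues from exactly that point.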

The second thing that is significant about the long-running process development model in NServiceBus is that we provide very strong unit testing capabilities for these types of processes. This is something that is really quite critical when you’re talking about core business processes. You don’t want to have to deploy them to production in order to see if they are actually going to work correctly. When you compare this to other orchestration engines or business process execution language type runtimes, they make it very easy to graphically draw out some kind of process but it’s often a lot more difficult to isolate that process and to test its behavior correctly.

So it’s like getting back to the broker philosophy where you can have an activity as part of your business process which calls a mainframe.
The problem with that is that when you actually think about going to test that process, you have to find some way to mock out the behavior of that mainframe and to simulate various types of things in your test, and that’s very hard to do.

In NServiceBus, because we don’t do integration, there is an inherent decoupling between the process part itself and any integration parts that are off to the side. That way, if you needed to talk to the mainframe, you’d be sending a message to another endpoint and it would be talking to the mainframe. In that way you can simulate all sorts of scenarios, saying “when I send you this message I expect you to send that message to the mainframe”, and then I can simulate a response back with a concrete message without having to actually stub out the API of the mainframe.
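
A test in that style could be sketched like this in Python. It does not use NServiceBus’s own testing library; `FakeBus`, `OrderProcess`, the message dictionaries and the `mainframe-gateway` endpoint name are hypothetical, chosen only to show that the process can be tested through messages alone.

```python
class FakeBus:
    """Test double that records outgoing messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, destination, message):
        self.sent.append((destination, message))


class OrderProcess:
    """Hypothetical long-running process; it never calls the mainframe directly."""

    def __init__(self, bus):
        self.bus = bus
        self.confirmed = False

    def handle(self, message):
        if message["type"] == "OrderPlaced":
            # Ask the endpoint that fronts the mainframe, purely via a message.
            self.bus.send(
                "mainframe-gateway",
                {"type": "CheckStock", "order_id": message["order_id"]},
            )
        elif message["type"] == "StockChecked":
            self.confirmed = message["in_stock"]


# "When I send you this message..."
bus = FakeBus()
process = OrderProcess(bus)
process.handle({"type": "OrderPlaced", "order_id": 5})

# "...I expect you to send that message to the mainframe."
expected = ("mainframe-gateway", {"type": "CheckStock", "order_id": 5})
sent_as_expected = bus.sent == [expected]

# Simulate the response with a concrete message, no mainframe API stub needed.
process.handle({"type": "StockChecked", "order_id": 5, "in_stock": True})
```

The mainframe never appears in the test: both the outgoing request and the simulated reply are plain messages.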

Another thing that is really quite significant in how NServiceBus helps you model your long-running processes is that it makes time explicit. So in other words you can say that “when I get a message of type A, I’m going to send a message of type B to that other system over there and if I don’t get a response from that system within 15 minutes then I want to send an alert to an administrator” (which is another kind of message).

The important thing is that the way we handle time-bound processes also uses the same messaging mechanisms. That means that even if a server were to crash midway through a process, the amount of time remaining while you were waiting for that response to come back would be remembered. So you’ll never have a business process that gets stuck just because a message didn’t come back.
We also make it very easy for you to test these types of scenarios, so that you can actually simulate the passage of time in your tests, in the order in which the long-running process requested its timeouts.

So this might not be so significant if we’re talking about timeouts that are on the order of minutes, but consider really long-running business processes, for example mortgage applications, or processes in insurance, loans and those types of things, which can take days, weeks or months. Having the ability to unit test that the passage of time is handled correctly, without actually having to run the system and go to a database and fiddle with times yourself, and without testers having to wait days, weeks or months, really makes a big difference towards making sure that the behavior of those critical business processes is correct.
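
The 15-minute example and the simulated passage of time can be sketched together in Python. This is an assumption-laden model rather than NServiceBus’s timeout or testing API: `FakeTimeBus`, `request_timeout`, `advance` and `RequestProcess` are hypothetical, and time is a plain counter of minutes that the test moves forward by hand.

```python
import heapq


class FakeTimeBus:
    """Test double with a manual clock; a timeout is just another message."""

    def __init__(self):
        self.now = 0          # simulated minutes since the test started
        self.sent = []
        self._timeouts = []   # heap of (due_time, sequence, message)
        self._sequence = 0

    def send(self, destination, message):
        self.sent.append((destination, message))

    def request_timeout(self, delay_minutes, message):
        due = self.now + delay_minutes
        heapq.heappush(self._timeouts, (due, self._sequence, message))
        self._sequence += 1

    def advance(self, minutes, process):
        """Move the clock forward and deliver due timeouts in due-time order."""
        deadline = self.now + minutes
        while self._timeouts and self._timeouts[0][0] <= deadline:
            due, _, message = heapq.heappop(self._timeouts)
            self.now = due
            process.handle(message)
        self.now = deadline


class RequestProcess:
    def __init__(self, bus):
        self.bus = bus
        self.got_response = False

    def handle(self, message):
        if message == "A":
            self.bus.send("other-system", "B")
            self.bus.request_timeout(15, "ResponseTimeout")
        elif message == "BResponse":
            self.got_response = True
        elif message == "ResponseTimeout" and not self.got_response:
            self.bus.send("administrator", "Alert")


bus = FakeTimeBus()
process = RequestProcess(bus)
process.handle("A")

bus.advance(14, process)          # 14 minutes pass, nothing is due yet
sent_before_timeout = list(bus.sent)

bus.advance(1, process)           # minute 15: the timeout message is delivered
sent_after_timeout = list(bus.sent)
```

The test covers fifteen simulated minutes instantly; the same approach would work for timeouts of days or months, since only the counter changes.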

Today, who are the NServiceBus customers?

There are a whole bunch, pretty much from all walks of life. I can’t really talk about all of them because many of them don’t want to expose the fact that they are using NServiceBus. Among the better-known names, Rackspace, for example, is using NServiceBus for their e-mail and apps division and rolling it out even more broadly.
One of the largest insurance companies in the northern regions called ?. They’re using NServiceBus for the core policy.
There are Fortune 500 companies like Johnson Controls. They have all sorts of projects that monitor the energy utilization of all sorts of buildings and large structures, and they’re using NServiceBus to handle the connectivity between those buildings and the propagation of the events within that environment into the cloud and back again.
There is also a company called Plexas that is using NServiceBus for their benefit and claims processing.
We see all sorts of software and service companies, some of them in advertising, some of them providing human resources software as a service.
We have banks, all sorts of financial services companies that are doing all sorts of stuff with NServiceBus. Investment banks, commercial banks, some working directly with the private sector on the actions that people perform through SMS banking. It’s a big thing in developing countries, where not everybody has a smartphone but everybody has a phone that can text, so people can perform all the actions on their bank account that way.

So just about every domain you can think about, customers are using NServiceBus.
If we look at just downloads, the number of repeat downloads and the people who keep coming back to upgrade to the newest version, we’re in the area of two or three thousand people who are continuously using NServiceBus. So it’s getting fairly big!

A big thanks to Udi for his simplicity!

More information

Udi Dahan’s interview: Part 1/3

Udi Dahan’s interview: Part 2/3

Creative Commons License: Think Service is made available under the terms of the Creative Commons Attribution 3.0 Unported license.