January 28, 2006

SOA's technical landmarks

I think there's a lot of curiosity about what has led people toward SOA as a preferred architectural style for distributed computing. Market and business factors, especially SOA's focus on IT governance, are likely the primary reasons, but in my opinion there are also big, solid technical reasons for the shift.

I think the technical reasoning is threefold. First, SOA recognizes and reuses the facets of object orientation that apply best at a systems-wide scale. Services are definitely not distributed objects, but they retain a few basic facets of the general object-oriented paradigm: the primacy of extensible message passing, with all of its implications, and the importance of designing the interactions between objects (instead of their internals) when trying to construct an evolvable, growable, and interoperable system. Alan Kay, Smalltalk's father, dropped this nugget of insight 8 years ago:

I'm sorry that I long ago coined the term "objects" for this topic because it gets many people to focus on the lesser idea.

The big idea is "messaging" -- that is what the kernal of Smalltalk/Squeak is all about (and it's something that was never quite completed in our Xerox PARC phase). The Japanese have a small word -- ma -- for "that which is in between" -- perhaps the nearest English equivalent is "interstitial".

The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be. Think of the internet -- to live, it (a) has to allow many different kinds of ideas and realizations that are beyond any single standard and (b) to allow varying degrees of safe interoperability between these ideas.

The second reason SOA is so important is that it accepts the lesson of a long-fought, hard-won (and still not fully decided) battle: distributed computing is fundamentally different from local computing. To me, the watershed paper in this debate, now a classic, is Sun Microsystems Laboratories' 1994 paper A Note on Distributed Computing. I recall the debates in 1996 on the (sadly defunct) dist-obj mailing list about the importance of this paper, and how it shattered a number of the then-prevalent CORBA and DCOM assumptions. Its major point was that distributed system endpoints require explicit boundaries to deal with the fundamental differences in latency, reliability, availability, concurrency, and memory access when moving from local computing to distributed computing.
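To make the idea of an explicit boundary concrete, here is a minimal sketch in Python. It is my own illustration, not something taken from the paper: the names RemoteResult and call_across_boundary are hypothetical, and a thread with a timeout stands in for a real network call. The point is only that the caller is forced to see latency and failure as possible outcomes instead of pretending the call is local.

```python
import concurrent.futures as cf
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class RemoteResult:
    """Outcome of a call that crossed a (simulated) service boundary."""
    ok: bool
    value: Any = None
    error: Optional[str] = None


def call_across_boundary(operation: Callable[[], Any], timeout_s: float) -> RemoteResult:
    """Run `operation` as if it were remote: it may be slow, or it may fail."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(operation)
    try:
        return RemoteResult(ok=True, value=future.result(timeout=timeout_s))
    except cf.TimeoutError:
        # Latency is part of the interface: the caller decides how long to wait.
        return RemoteResult(ok=False, error="timeout")
    except Exception as exc:
        # The other side failed independently of the caller.
        return RemoteResult(ok=False, error=f"failure: {exc}")
    finally:
        pool.shutdown(wait=False)


def slow_service() -> str:
    """Hypothetical endpoint that takes longer than the caller will wait."""
    time.sleep(0.3)
    return "late answer"


def broken_service() -> str:
    """Hypothetical endpoint that is unavailable."""
    raise ConnectionError("endpoint unavailable")
```

A caller would then branch on result.ok rather than assume a return value, which is the kind of design pressure a local method call never applies.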

SOA doesn't have any explicit approaches to dealing with the above, other than recognizing that you have to. A service is the combination of implementation, interface, and contract, where the contract contains the "rules of engagement". A contract is a mapping of service implementations to standard, well-understood "policies" for interaction: the message exchange patterns; the availability, reliability, latency, and expected volume characteristics; and how these policies are realized through the service interface.
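As a rough sketch of what writing such a contract down might look like, here is a small Python example. Everything in it is an assumption made for illustration: the class names, the field names, the "QuoteService" example, and the numbers are all hypothetical, and it does not correspond to any particular policy standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ExchangePattern(Enum):
    REQUEST_REPLY = "request-reply"
    ONE_WAY = "one-way"


@dataclass(frozen=True)
class Policy:
    """Interaction characteristics a consumer can rely on (hypothetical fields)."""
    pattern: ExchangePattern
    availability_pct: float      # promised availability, e.g. 99.9
    max_latency_ms: int          # upper bound on response time
    max_requests_per_sec: int    # expected volume ceiling


@dataclass(frozen=True)
class Contract:
    """Ties one operation on a service interface to its policy."""
    service: str
    operation: str
    policy: Policy

    def violations(self, observed_latency_ms: int, observed_rps: int) -> List[str]:
        """List which promised characteristics the observed numbers exceed."""
        problems = []
        if observed_latency_ms > self.policy.max_latency_ms:
            problems.append("latency")
        if observed_rps > self.policy.max_requests_per_sec:
            problems.append("volume")
        return problems


# Hypothetical contract for a quote lookup operation.
quote_contract = Contract(
    service="QuoteService",
    operation="getQuote",
    policy=Policy(ExchangePattern.REQUEST_REPLY, 99.9, 200, 50),
)
```

Even a record this small gives people something to read and argue about, and a check like violations() is only a first, manual step toward the automated enforcement discussed next.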

Explicit contracts and policies, even if they aren't automated, are useful because they guide people to the correct usage of both legacy technology and newer technology. Progress towards automated policy enforcement will be slow, as we're still mired in the muck of yesterday: SOAP/WSDL's RPC heritage; MOM's proprietary transport and fixed-message-format heritage; and Java Remote Method Invocation (RMI), which in practice missed important aspects mentioned in the paper, such as dealing with concurrency and interoperability. And that's not to mention the myriad security, reliability, and availability standards and facilities out there.

Finally, SOA acknowledges the importance of shared data semantics for interoperability. A lot of the work in the data warehousing community is important here, for it was the first real-world attempt to integrate disparate systems under a common umbrella. Building practical enterprise canonical data models is absolutely necessary to ensure interoperability in SOA. The point is not to create a universal model for all audiences; the point is that groups of services that hope to interoperate must have an explicit mapping between their interface's representation and semantics and some other canonical representation and semantics. This may involve deterministic mappings, as would be the case with most transformation technologies, but it may also involve probabilistic mappings, as would be the case with search technologies or data cleansing/matching engines.
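The difference between the two kinds of mapping can be sketched in a few lines of Python. This is a toy illustration under my own assumptions: the field names, the CRM_TO_CANONICAL table, and the 0.8 threshold are all made up, and the standard library's difflib stands in for a real data matching engine.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List, Optional, Tuple

# Deterministic mapping: one service's field names to the canonical model's.
CRM_TO_CANONICAL = {"cust_nm": "customer_name", "cust_id": "customer_id"}


def to_canonical(record: Dict[str, Any], field_map: Dict[str, str]) -> Dict[str, Any]:
    """Rename known fields; fields with no canonical meaning are dropped."""
    return {field_map[k]: v for k, v in record.items() if k in field_map}


def best_match(name: str, candidates: List[str],
               threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Probabilistic mapping: return the most similar canonical name and its
    similarity score, or None if nothing is similar enough."""
    if not candidates:
        return None
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, candidate = max(scored)
    return (candidate, score) if score >= threshold else None
```

The deterministic mapping always gives the same answer, while the probabilistic one returns a score that somebody, or some policy, has to decide whether to trust.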

Posted by stu at January 28, 2006 10:30 AM