August 31, 2003

why distributed computing?

Why is distributed computing such a big deal? My focus here is on enterprise computing systems, not internet-based systems. The dream of distributed objects and "n tier" architectures was to have all of these services floating around. I was a huge proponent; it's probably what made me want to be a programmer. Today I'm asking: why? WHY do they need to be distributed?

Why can't they be co-located? Can we not design things that distribute because there's an actual reason behind it other than politics?

The over-complexification of software is continuing its takeover of our enterprise systems. Why can't we take the simple solution to XYZ problem: write 10 stored procedures, wrap them with a SOAP service, and call it a day?

I see way too many systems that require physical component separation if they're going to get any logical separation. A client I'm dealing with now has this problem: dozens of CORBA services, an EJB cache, an MQSeries-based bus, a JMS-based bus somewhere else, and pieces of C++, Java, and .NET code everywhere. All on, literally, hundreds of physical servers costing millions of dollars. The amount of hardware resources being squandered is staggering for what piddly performance they get.

Greg Pfister, author of In Search of Clusters, had a "standard litany" of "why go distributed". Scalability was the big buzz word, though I rarely see any systematic analysis of scalability beyond a few load-test scripts on any particular island in the distributed system - rarely on the end-to-end distributed system itself. Pfister's main point is still correct: the reason we don't have scalable & reliable clusters is that the clustering software still sucks. And yet we think our enterprise developers can do a better job than the cluster vendors. In our enterprises, we only have clustering software on the "islands". We don't leverage it in a larger sense by recombining our logical components into cluster nodes and designing the distributed system in a truly co-ordinated manner. (What's fascinating about this is that Gartner actually predicted this trend -- that we'll cobble our own approaches vs. taking a vendor's clustering approach to a large scale).

So here's my theory: It seems that many developers and architects need that physical split to wrap their heads around logical separation of concerns.

We've seen this kind of mental block for years in other areas of software: the need for physical libraries vs. logical modules, or the inability of people to see data as "set oriented", so that they require pointer-based access to data or use a cursor to access it one record at a time. There are other examples.

All of these islands based on "skill sets" pop up here and there. I've seen it happening with CORBA: you have a C++ island here, a Java island there, etc. Now we have a .NET island somewhere else, a MOM island that bridges to CORBA, and an EJB cache that talks CORBA and MOM, with a bridge on the MOM that talks XML. All of these little "skill set islands" are being deployed out into the enterprise without any attention paid to the fundamental problems with such an approach: tremendous complexity in debugging, latencies between each component that limit scale, and reliability concerns, because each island can fail independently and has its own recoverability mechanisms. The most planning I've seen is a "hot standby" plus a DR site per island; usually it's just a cron job that restarts the failed process. This is staggering! Though perhaps this "distributed shanty town" actually is what enterprises really want. I guess time will tell, if the reliability problems bite them enough economically. Thus far, I'm skeptical. Look at the current virus problems endemic on the web, especially with certain e-mail systems. What are people doing about them?

My feeling is that we're going to see service outages and tremendous scale problems in many enterprise systems. I hope I'm wrong. I hope that the reliability concerns I see are really just an overreaction to the changing economics of software development, and the "shanty town" approach is really just the emergence of a new "ecosystem".

Now Microsoft has adopted this model of the "interoperable island" as their way of touting Windows in the enterprise, which is a good thing in a way. In the past, you were locked in the Microsoft world and couldn't talk to anyone else. Today you're still locked in, but now they're happy to let you talk to others.

So .. as Gerry Weinberg once said, once you solve problem #1, you're the one responsible for promoting problem #2. Problem #1 was to eliminate platform & language religion from distributed interoperability. Problem #2 (in my mind today) was pushing distributed computing for systems that didn't need to be distributed. Thus non-technical reasons such as divergent skill sets and the inability of developers to make a logical/physical mental split became the rationale behind such designs.

We deserve it, I guess.

Maybe this is a good thing: the evolution of a "city plan" based approach to IT architecture, in Gartner's terms. The manager in me thinks it's probably a good thing. The technologist in me is frightened, because the only way I've seen end-to-end systems problems solved is by someone that could see the whole picture - forest and trees - and fix the problems locally. That's a rare quality... and something I see lacking: true architectural guidance that doesn't wind up being the hated "design cops".

Posted by stu at 08:56 AM

sometimes it's better to keep quiet

Instead of posting my thoughts, sometimes I think it's better for me to shut up and watch the fireworks. I find after I post any longish post or rant, I change my mind after I hit "send". Sigh.

Ok, so after my longish post to TSS about web services, I read Werner Vogels' excellent Web Services are NOT Distributed Objects. This is indeed the first time I've seen someone who's been a part of the dist-obj community in the past actually take this position, so I'm quite interested in what he has to say. I'm wondering if this will change my mind about what I said just a few hours ago...

At first I thought "great, just when I thought I had it figured out, my paradigm's going to have to shift". After reading the article, though, I find that most of what he says is stuff I already believed. But it did change my mind about some things, in a sense. It reminded me of things I knew a couple of years ago but, mired in the practicalities of day-to-day technology, had forgotten.

I do believe web services are different from distributed objects. I've detailed in other places what I felt web services to be, and that page hasn't changed much in the 2+ years since I wrote it, so I think it's still relevant. My view of the differences is primarily that a) web services have an intrinsic message structure vs. dependence on an extrinsic definition language; b) thus the web services "document" model is a lot more like a dynamically interpreted interface (DII / IDispatch / RMI+Reflection) than a traditional static dist-obj interface in the CORBA/COM world; and c) it's not based on any object model at all, so we don't have to worry about platform religious wars.

Werner takes the perspective of the old "stateful vs. stateless" debate. Web services are stateless at their base level he says, whereas distributed objects (COM notwithstanding) are stateful. Granted. I have, however, been somewhat biased by practice here: stateless EJB session beans tend to be the norm as entry points into today's distributed object systems.

As for "document oriented computing", while I've believed this is important, I've seen a lot of mixed messages here from various parties, so I've been kind of confused about what it means. It seems to me that every vendor is hell-bent on retrofitting RPC on top of web services as if that was the end goal. I've seen a few fairly influential articles from old COM folks that were very eager to make Web Services as static as COM/CORBA. On the other hand, Microsoft .NET has almost totally embraced the doc+literal model, which is good. Perhaps I just stayed out of the web services front-line discussions for too long and there was a switch at some point among these types, but I don't think so -- I'd also like to note that the other major platform vendor out there dropped their messaging-oriented web services API (JAXM) to focus on JAX-RPC. That's a very sad thing, and I wonder if it will hurt Sun in the long run. But on the other hand, I do admit, programming for JAXM is more painful than JAX-RPC given current tools, but perhaps JMS will subsume what JAXM was.

One thing I hear a lot, and Werner's essay echoes this, is that "document exchange" is very different from "object interfaces". At the core, I suppose this is true. But, as I go around training people in J2EE, EJB, .NET, and Web Services, I try to find a way to relate the approaches. How is "document exchange" really different from traditional message passing, specifically the self-describing messages of some products like TIBCO RV?

The way I've taught it is as such: take your object interface, and reduce it to one method: void execute(Object) or Object execute(Object). (Or, alternatively, look at it like you would a JMS MessageListener: void onMessage(Message). Or like any other MOM-based system where you get a callback). The object coming in and going out can be an object graph (bound to XML through some mapping), or a DOM tree, or a pipeline of SAX events - the idea is to take the "surface area" of your interface and place it into your data. With only one physical entrypoint, the data itself can map to further logical actions and/or events - multiple schema instances in the same message. All of this is fairly fuzzy, but it's the general direction I've been thinking. I've always seen WSDL as a "crutch" technology - it's a way of hacking up a document into procedures & arguments, but it doesn't necessarily imply "RPC". But perhaps I'm wrong on this, I know there are many that view WSDL as essential (perhaps they're just subscribing to the RPC uber alles school).
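To make the "one method" idea a bit more concrete, here is a minimal sketch in Java. Everything in it is an assumption made for illustration: the Document class, the SingleEntryPointService class, and the "order" / "orderAck" / "fault" document types are hypothetical names, and a plain map of fields stands in for whatever XML binding, DOM tree, or SAX pipeline you would really use. The point is only that the interface's "surface area" moves out of the method list and into the data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical document: a type tag plus a self-describing payload.
final class Document {
    final String type;                 // which kind of document this is
    final Map<String, String> fields;  // stand-in for an XML-bound object graph

    Document(String type, Map<String, String> fields) {
        this.type = type;
        this.fields = fields;
    }
}

// Hypothetical service with exactly one physical entry point.
final class SingleEntryPointService {
    // Logical actions are keyed by document type instead of being methods.
    private final Map<String, Function<Document, Document>> handlers = new HashMap<>();

    void register(String type, Function<Document, Document> handler) {
        handlers.put(type, handler);
    }

    // The equivalent of "Object execute(Object)": the data picks the action.
    Document execute(Document request) {
        Function<Document, Document> handler = handlers.get(request.type);
        if (handler == null) {
            Map<String, String> fault = new HashMap<>();
            fault.put("reason", "unknown document type: " + request.type);
            return new Document("fault", fault);
        }
        return handler.apply(request);
    }
}

final class SingleEntryPointDemo {
    // Builds a service that understands one made-up document type, "order".
    static SingleEntryPointService newService() {
        SingleEntryPointService service = new SingleEntryPointService();
        service.register("order", doc -> {
            Map<String, String> ack = new HashMap<>();
            ack.put("orderId", doc.fields.get("id"));
            ack.put("status", "accepted");
            return new Document("orderAck", ack);
        });
        return service;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("id", "42");
        Document reply = newService().execute(new Document("order", fields));
        System.out.println(reply.type + " " + reply.fields);
    }
}
```

Adding a new logical action here means registering a handler for a new document type; the physical interface never changes, which is roughly the trade I'm describing.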

I do take Werner's points on lifecycle to heart. That's one area I haven't paid a lot of attention to. What use is a create() method on a stateless session bean? There is none!

And then there's the actual document's data representation, which is something nobody really talks about that much. Is XML's data model really appropriate for most uses? Where does it break down, or become too complicated? Isn't it just hierarchies all over again - aren't relations still "the thing"? I remember sitting in a BearingPoint/KPMG briefing on EAI a few months ago and listening to one of their chief architects wax poetic about the cognitive studies IBM did in the late 1960's about the "folding" of data being a natural way that humans perceive and deal with information, and it's "interesting" how XML relates to that, that we may be going full circle back into a more hierarchical or network view of data. I wonder. I've become a bit of a relational/Oracle nerd in the past year thanks in part on one hand to a well-known troublemaker and on the other hand, someone who is probably the most inspirational technologist I've come across (in terms of the ability for one man to master his speciality), Tom Kyte.

A lot of this distributed computing stuff ignores that you could do a lot of commercial processing pretty cheap and fast on a relational database by just slapping the XML APIs on top. Oracle's done a nice job with this; I've seen examples where it takes only a few lines of code to expose a fairly large Oracle system to RSS or SOAP. From the theoretical side, Pascal and Date have been having troubles with this XML hype wave because it further drives the need for a complete implementation of the relational model out of the mainstream.

Werner mentions versioning as a key problem. This is something I've really found a dearth of discussion on. Some refer to namespaces as a versioning mechanism, but I've heard others vehemently oppose such an idea. A session at JavaOne 2003 proposed a mechanism that leveraged UDDI, which struck me as similar to how we do it today with MOM, distributed objects and LDAP.

Versioning should be fertile ground for innovation. XML effectively eliminates the need for "positional" semantics in messages, which was a problem with static dist-obj RPCs: adding extra elements or tacking other schemas onto a document should be a lot easier in the XML world. A couple of problems (being naive here): the pervasive use of xsd:sequence ensures there is ordering to elements (does there really need to be?) .. furthermore, do people really design their schemas to be flexible (i.e. allow any namespace attribute or element to be tacked on in spots)? I see some evidence of this, but I doubt the discipline will be there in the mainstream.
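As a rough illustration of what non-positional messages buy you, here is a small Java sketch of a reader that picks out the elements it knows by name and skips everything else. The namespace urn:example:order, the element names id and quantity, and the TolerantOrderReader class are all made up for this example; only the standard JAXP/DOM calls are real. It is a sketch of the idea under those assumptions, not a recommendation for how any particular toolkit does it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical reader: looks elements up by name, never by position.
final class TolerantOrderReader {
    static final String ORDER_NS = "urn:example:order"; // made-up namespace

    static Map<String, String> read(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();

            Map<String, String> known = new HashMap<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) continue;
                // Elements from other namespaces are extensions: skip them.
                if (!ORDER_NS.equals(child.getNamespaceURI())) continue;
                String name = child.getLocalName();
                // Keep only the elements this version of the reader understands.
                if (name.equals("id") || name.equals("quantity")) {
                    known.put(name, child.getTextContent().trim());
                }
            }
            return known;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse order document", e);
        }
    }
}
```

A document with the elements reordered and an extra element from some other namespace tacked on reads the same as the original one. Of course, this only works if nothing upstream validates against a schema that forbids the reordering or the extension, which is exactly the discipline question above.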

Anyway, this has gone on for too long, but getting back to my original reason for writing this: in the end, I don't really think I've changed my mind too much about what I said about Don's stuff on SOA. Though now that I read his blog I realize that he was quoted out of context (silly me, should have known). I think web services are different than distributed objects, but SOA really is just distributed component based development in new clothes with some more flexible underpinnings. Perhaps I'll change my mind tomorrow. :-)

Posted by stu at 08:30 AM

The problem with Schumpeter

Related to a Slashdot story on the unstoppable flow of IT jobs to India, I replied to a comment that suggested that none of this really matters because of Schumpeterian growth theory, that economic growth is really about innovation and technological change. I agree, but my views have been tempered by some modern problems of both a political and social nature. Here are the comments.

Posted by stu at 12:18 AM

August 30, 2003

SOA vs. OO

Don Box says that Service-Oriented Architectures will defeat Object Oriented ones. This is a bit of a fluff piece, but I really respect Don, and I learn a lot from him at times. Here are some of my comments on this article.

Posted by stu at 10:37 PM

August 28, 2003

if you think harder, your code will be better

some comments on John Carmack's latest programming efforts.

Posted by stu at 04:06 PM

August 24, 2003

a sad day

Wesley Willis has passed away.

Posted by stu at 04:59 PM

Music copying in Canada

... is apparently free? This could be a hilarious case of the industry not recognizing what it was getting itself into.

Posted by stu at 06:54 AM


Computing monoculture. Makes a lot of sense in light of the MSblaster worm et al recently.

Posted by stu at 06:53 AM

Apple has class

Apple has class. It's probably one of the main reasons I use their software. I mean, they're a flawed company, but there's something about using a product that makes you feel better for using it.

Anyway, here's my latest example. They replaced their front page with a tribute to Gregory Hines for around 2 or 3 days a few weeks ago.

Of course you could say this is crass commercialism taking advantage of emotions over someone's death. But in this case, Hines was an avid Mac user (he was an "AppleMaster", or registered celebrity Mac user) so it makes sense.

Here's the link to the tribute.

Posted by stu at 06:49 AM