December 31, 2007

Continuing the debate

Some comments on this and that, because comments on JJ's blog are truncated at 3000 characters.

- "Talking with Subbu, I think I understand the disconnect. He works for Yahoo (same would be true for Amazon, Google, Microsoft Live...). For these people, who operate vast data centers, every CPU cycle counts. It speaks directly to their user base: if yahoo is sluggish people will switch to Google and vice versa. I, on the contrary, work as an IT architect. "

Subbu is ex-BEA. I think he understands IT architecture, thanks.

- "For IT, If I can reuse a piece of code 2-5 times, this is a tremendous savings: imagine the cost of duplicating assets in IT? re-implementing, re-testing, integrating? how about maintenance? now I need to apply my changes to several code bases in different technologies?"

I've discussed this in email with you, but besides the technical challenges, SOA reuse is a pipe dream for most organizations because they are not willing to change their investment evaluation windows or mindset about the economics of software. Most are just looking to improve their agility -- which is about the way we design interfaces & interactions, not about reused logic.

- "Guys, in the Ganesh's style, I have news for you. There has been a big composite system running for decades: EDI. "

It's not news. Mark Baker has been talking about EDI for years as an example of interchange with a type of uniform interface.

- "Stu, I may misunderstand your point but you seem to say that one thing (we need actions) and the opposite (a uniform interface gives more loose coupling, therefore don't use actions)."

What I agreed with you on was that we need a *contract* to generate shared understanding. I did not claim that we needed specific actions to generate shared understanding. On the other hand, I do think it would be useful to define higher-level abstractions in terms of uniform operations, though I am not convinced this will enhance interoperability.

- Your definition of loose coupling seems to reflect a very producer-focused view of services.

For example:

"RESTful Web Services indeed offer a coupling worse than CORBA, much worse because at least with CORBA you have an explicit interface to re-write the implementation on the opposite side. So in REST, if Pi changes, someone has to communicate to the other side (pick your method: email, blog, telephone, SMS...) what has changed."

Last I checked, Yahoo!, Google, or Amazon do not email me whenever they change Pi.

" As a matter of fact, thinking that any uniform interface is going to do the job is the biggest fallacy of the decade."

You have not demonstrated this beyond a bunch of hand waving that somehow "action oriented interfaces" are going to enable evolvability. I don't see it happening very often in practice. We'll see when your article comes out, but again, I caution against writing an article that is based on a strawman of your own design.

- "Guy, this is no heresy, this is a fact and you know, facts are stubborn: our world is made up of actions, it is not 'uniform'."

This is not a fact, it is your opinion. This is sort of like saying "the world is not made up of objects" or "functions" or any of the many abstractions & constraints we choose to model our information systems. One idea is to take a look at Alexander Galloway's book _Protocol_ (if you don't mind philosophy). It talks a lot about the control structure of internetworks and how it is this very uniform control that enables an explosion of diversity at higher levels.

- "Eliminating Pi and Ci is the worst architectural choice you can make. It means a) your implementation is directly wired at the Pe and Ce levels and b) you constantly rewrite application semantics protocols on top of this uniform interface"

*Shrug*. I think the best bang for the buck is to lower the barrier to change for consumers by completely decoupling Pi and Ci from their external representations. You want to lower the barrier to change for producers by tying Pe and Ce more closely to Pi and Ci.

Example: You want to enable people to buy books from you. Do you
a) expose your API with WSDL?
b) conform to what clients already understand and build a set of uniform resources (like a website)?

(b) arguably requires more thought than (a) but has been more successful in practice. And plenty of web frameworks are closing the productivity gap, making it nearly as easy to expose resources.

Your argument seems to reflect a desire to make external representations align with the programmer's reality as much as possible, instead of designing the external interface to induce properties for the overall system. That's contrary to good architecture, especially "collaborative systems" architecture, as Rechtin & Maier would call it, where there is no overall owner or controlling entity.

One could argue that the enterprise isn't like this -- it has a controlling entity (the CIO, or whatever). Except most enterprises that I've seen are decentralized messes, run with a Feudal governance model, or a Federal (i.e. politicized) model. But it is not centralization vs. decentralization that matters, it is the balance of power. Thus I believe most large organizations could use a dose of uniformity baked into their systems architecture -- it will help them gain quite a bit of traction in maintaining that balance.

- "XML, XSD, WSDL, SCA, BEPL, WS-CDL (and ebBP), WS-TX(and WS-CAF), WS-Security, WS-Eventing"

Half of these are not implemented widely. WS-Eventing and CDL probably will never be. SCA, I continue to note, is an implementation-level technology and doesn't enhance interoperability at the Ce / Pe level in any way. They help link Ci / Pi to the external representation, and frankly I could see an SCA binding for RESTful interfaces, though I think there would be gaps for dealing with link traversal.

- "They will understand that they have a lot of work to do, very hard work (not just to establish a robust communication infrastructure), to come even close to what WS-* has to offer today (not tomorrow, not in ten years)."

WS-* doesn't offer half of what you seem to claim it does today. Yes, it's a useful stack, yes it has some benefits, but reuse and loose coupling are not part of them.

- "(Stu I will also respond on choreography -I am not caught up on choreography, choreography is just another way to express Pe and Ce in a single artifact. It also adds some sequencing of operation compared to WSDL alone)."

WSDL alone does not specify much about the semantics or constraints on interactions beyond MEPs and expected data types. Ordering constraints are fundamental! In WSDL today, you understand them by reading a human language document! We're back to this fiction that somehow WS-* provides you a machine-readable contract. It doesn't. It gives you tiny slices of it to help generate code or activate your security or messaging subsystem, but the rest is an exercise for the reader.
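To make the point concrete: ordering constraints can be written down in a form a machine can check, and WSDL simply has no place to put them. Below is a minimal sketch in Python, assuming a hypothetical purchase interaction with made-up operation names (`create_order`, `add_item`, `submit`, `pay`, `cancel`). Nothing here comes from WSDL or any WS-* spec; it only illustrates the kind of information that today lives in the prose documentation.

```python
# Hypothetical ordering contract: state -> {operation: next_state}.
# WSDL describes message types and MEPs, but not a table like this.
ORDERING_CONTRACT = {
    "start":     {"create_order": "open"},
    "open":      {"add_item": "open", "submit": "submitted", "cancel": "cancelled"},
    "submitted": {"pay": "paid", "cancel": "cancelled"},
    "paid":      {},
    "cancelled": {},
}

def check_sequence(operations, contract=ORDERING_CONTRACT, start="start"):
    """Return (True, final_state), or (False, offending_operation)."""
    state = start
    for op in operations:
        allowed = contract[state]
        if op not in allowed:
            return False, op  # op is not permitted in the current state
        state = allowed[op]
    return True, state

print(check_sequence(["create_order", "add_item", "submit", "pay"]))  # (True, 'paid')
print(check_sequence(["create_order", "pay"]))                        # (False, 'pay')
```

A consumer could check each call against such a table before sending it; with WSDL alone, a developer has to find this out by reading.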

Anyway, I think I'm finished debating you for a while. Please don't take this as implicit support for the ideas I have not directly addressed. You are on the right track in some ways, and very far off in others. I look forward to your article -- if you'd like feedback on a draft, I am willing to look at it purely to ensure there are no major strawmen :-)

In any case, off to a New Year's party. Happy New Year!

Posted by stu at 07:19 PM

December 19, 2007

SimpleDB simply sucks

I mean, I really like the idea of Amazon's SimpleDB. Auto-indexing is great. Scalable is great. The price is great. Forget that their REST API is a joke that will have to change as soon as people start losing or corrupting their data. Why the fuck are they throwing out relational theory? The database barely even supports 1st normal form! You can't do any kind of aggregate operation -- no count, min, max, avg, grouping! There are no integrity constraints anywhere!

Take a look at the example they give:

ID      | Category                   | Subcategory | Name              | Color                | Size                 | Make   | Model
Item_01 | Clothes                    | Sweater     | Cathair Sweater   | Siamese              | Small, Medium, Large |        |
Item_02 | Clothes                    | Pants       | Designer Jeans    | Paisley Acid Wash    | 30x32, 32x32, 32x34  |        |
Item_03 | Clothes                    | Pants       | Sweatpants        | Blue, Yellow, Pink   | Large                |        |
Item_04 | Car Parts                  | Engine      | Turbos            |                      |                      | Audi   | S4
Item_05 | Car Parts                  | Emissions   | 02 Sensor         |                      |                      | Audi   | S4
Item_06 | Motorcycle Parts           | Bodywork    | Fender Eliminator | Blue                 |                      | Yamaha | R1
Item_07 | Motorcycle Parts, Clothing | Clothing    | Leather Pants     | Small, Medium, Large | Black                |        |

Let's ignore that item_07 has the Color & Size backwards. You'll note that Color and Size are multi-valued attributes. If you look up Multi-Valued Attributes in any relational textbook, they'll usually say something like: "Abandon all hope, ye who enter here."

Lately, however, even the diehards are allowing for nested relations & complex types inside domains, so this in and of itself isn't a bad thing if you treat them as nested relations. With that interpretation, this table is intended to manage "names & options for a particular item". It is interpretable in (at least) 1NF. If I retrieve "item_01", for example, I know that this Cathair Sweater comes in Siamese, and in Small, Medium, and Large.

But the danger comes if you treat this as a table for, oh, say, ordering items! One needs to know if this is a Small or a Large sweater. The only way to get to 1NF is to treat {ID, Color, Size} as a compound primary key. All of your multi-valued attributes implicitly become part of your primary key! But there are no integrity constraints, so you had better make sure your code and gateway API understand that in the above ITEMs table the primary key for item_01 through item_03 and for item_06 and item_07 is {ID, Color, Size}, and for item_04 & item_05 it is just {ID} -- for now!
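Here is what that interpretation implies, as a minimal sketch in Python: each multi-valued key attribute is expanded into one row per value, so the compound key {ID, Color, Size} falls out of the cross product. The items are copied from the table above; the dictionary layout and the `flatten` helper are illustrative only, not part of SimpleDB's API.

```python
from itertools import product

# A few items from the example, with every attribute stored as a list of values.
ITEMS = {
    "Item_01": {"Name": ["Cathair Sweater"], "Color": ["Siamese"],
                "Size": ["Small", "Medium", "Large"]},
    "Item_03": {"Name": ["Sweatpants"], "Color": ["Blue", "Yellow", "Pink"],
                "Size": ["Large"]},
    "Item_04": {"Name": ["Turbos"], "Make": ["Audi"], "Model": ["S4"]},
}

def flatten(items, key_attrs=("Color", "Size")):
    """Expand multi-valued key attributes into 1NF rows keyed by (ID, Color, Size).

    An item lacking a key attribute contributes None for it, so its key
    degenerates to effectively just {ID}. Non-key attributes are assumed
    single-valued here, and only their first value is kept.
    """
    rows = {}
    for item_id, attrs in items.items():
        value_lists = [attrs.get(a, [None]) for a in key_attrs]
        for combo in product(*value_lists):
            rows[(item_id, *combo)] = {
                a: v[0] for a, v in attrs.items() if a not in key_attrs
            }
    return rows

rows = flatten(ITEMS)
print(len(rows))  # 7: three sweater rows, three sweatpants rows, one turbo row
```

Nothing in SimpleDB enforces that callers use this key: if someone adds a second Color to Item_04 tomorrow, its effective key silently changes.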

So, while it is possible to treat SimpleDB with some level of logic, beware that it's not necessarily designed to be logical.

I am also looking forward to the nightly SimpleDB extracts to MS Excel or SQLite, or Oracle or MySQL so I can, you know, query my data for trends & business analysis. On the bright side, maybe this is Amazon's way of convincing you to build your data warehouse early.
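Until then, every aggregate has to be computed by the client after pulling the items down. A minimal sketch of a client-side GROUP BY / COUNT in Python, over the Category values from the example table; it assumes the items have already been fetched into a dictionary, and the fetching itself is left out.

```python
from collections import Counter

# Category values from the example table; Item_07 is multi-valued.
CATEGORIES = {
    "Item_01": ["Clothes"],
    "Item_02": ["Clothes"],
    "Item_03": ["Clothes"],
    "Item_04": ["Car Parts"],
    "Item_05": ["Car Parts"],
    "Item_06": ["Motorcycle Parts"],
    "Item_07": ["Motorcycle Parts", "Clothing"],
}

def count_by_category(categories):
    """GROUP BY Category / COUNT(*), done on the client.

    A multi-valued item is counted once per value, which is only one of
    several defensible interpretations.
    """
    counts = Counter()
    for values in categories.values():
        counts.update(values)
    return counts

print(count_by_category(CATEGORIES))
```

Note that the result reports "Clothes" and "Clothing" as separate groups: without integrity constraints, nothing stopped the example data from using two spellings for the same category.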

Posted by stu at 04:51 AM

A rant, followed by RESTful choreography

This entry is a response to this entry from JJ. The first part of this is a rant of frustration. The second part is a discussion about the use for choreography in RESTful services.


"These were the only two people that had the courage to go to the end of the discussion even though they saw some limitations to the REST approach. Others, have stopped all communication as soon as they understood the problems with REST."

JJ, I hope it might occur to you that people in the REST community do have their own priorities, and answering your pages & pages of debate is not necessarily one of them. I'd personally like to take the time to get into this in detail, but time has been scarce leading up to the holidays.

Secondly, you have not exactly been providing a lot of incentives to participate. You have consistently put words in the community's mouth, you have made outlandish and derogatory claims about the future of REST, made erroneous assumptions as to the motivations of the community, and have questioned the level of knowledge & competency in the community. Yet you expect people to actually give you the time of day.

In short, I believe you are acting like a bully, someone who challenges all to a duel, but claims victory before anyone has the energy & time to go several rounds with you. I don't think many are that interested in trying to prove their architecture "street cred" with you, they really just want to build better network-based software.

Thirdly, it feels as if there is no way to actually have a fruitful conversation with you via your blog because it seems you're not trying to understand how REST might fit into your set of priorities. You seem to be primarily trying to poke holes in it and ultimately try to limit its potential audience in the enterprise. That in and of itself is a good thing, but when you are tearing down strawmen of your own making, it becomes very difficult to communicate.

Most debate rounds so far have resulted in you flooding your blog with observations that are either misrepresentations of REST (redefining terms as you see fit, denying explanations that are spelled out in Roy's thesis, etc.) or are orthogonal to REST (even if interesting & worthy of discussion!). You seem to continue to claim that when REST doesn't somehow auto-magically fix a particular problem, it is a reason to ignore/discard/mock it as a failed architectural style, and to deride the community as a bunch of loons. It's extremely discouraging to have a debate when you continue to flout common courtesy in this way.

Obviously we'd like someone like you to understand where we're coming from, and many of us would like to understand your concerns -- but there's only so much time in the day. Please be patient.


Having said this, I haven't exactly given up, and hope to have more time to discuss some of your observations. Here are a couple of responses to your latest entry:

"The fact and the matter is that you CANNOT DO WITHOUT A CONTRACT to establish the shared understanding."

This has been the core argument I've seen you make, and I agree with it, but I think WS-* vs. REST is irrelevant here, as they address different concerns. So I think it's time we looked at this problem in detail. I do not believe that the WS-* stack gives you any such thing today, and mainstream WS-*, as it is currently led, is not heading in any clear direction to support this. WS-CDL, SCA, and WS-Policy will not get you there, not even 15% of the way, and will probably make life worse.

Today, in WS-* land, a contract is described by human-readable documentation, with certain degenerate yes/no policies described by WS-Policy. WSDL does not give anyone a shared understanding; at best, it is a template to generate code. A developer has to read the documentation that goes with the interface to know the ordering constraints, non-functional SLAs, any guards, preconditions, postconditions, etc. WS-CDL is not mainstream and is likely not an option (more on this below).

SCA is not a pervasive solution to this because it is just an implementation-level container & component composition model -- it's a multi-language (but still Java-centric) alternative to the J2EE deployment model and EJB. It will not be adopted by Microsoft. And it doesn't (yet) help to specify contractual constraints any more than the WS* specs do.

Now, in REST, today, the contract is defined by the transfer protocol, the media type (which is usually a human readable spec), and perhaps an independent contract addendum to talk about specific ordering constraints (though forms & link traversal provide this information too), SLAs, etc. But in REST, just like in WS-*, there is no reasonable way to create a machine-readable shared contract of interactions & expectations.

So far, I would claim the difference is that due to the uniformity constraint, RESTful services naturally have a lot more loose coupling between service implementations than if we defined our own semantic community for service interfaces that include actions unique to a particular business domain. The data transfer protocol should not have to deal with business-semantics!

I *think* that what you're getting at is that you need a choreography language to truly build a shared understanding at a business-action level. If so, I agree! And I think this actually would be *great* for both REST and WS-* if the mainstream would embrace it.

In a RESTful choreography, all interactions, units of work, etc. should boil down to some kind of primitive uniform interface that everyone understands.
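A minimal sketch of what "boiling down" might look like, in Python: the business-level steps of a hypothetical purchase choreography, each bound to one of HTTP's uniform methods on a kind of resource. The step names and resource kinds are invented for illustration and do not come from WS-CDL or any existing choreography language.

```python
UNIFORM_METHODS = {"GET", "PUT", "POST", "DELETE"}

# Hypothetical choreography: (business step, uniform method, kind of resource).
PURCHASE_CHOREOGRAPHY = [
    ("browse catalog", "GET",    "catalog"),
    ("place order",    "POST",   "orders collection"),
    ("revise order",   "PUT",    "order"),
    ("check status",   "GET",    "order"),
    ("cancel order",   "DELETE", "order"),
]

def non_uniform_steps(choreography):
    """Return the names of steps that are not bound to a uniform method."""
    return [step for step, method, _ in choreography if method not in UNIFORM_METHODS]

print(non_uniform_steps(PURCHASE_CHOREOGRAPHY))  # []
```

The business vocabulary lives in the choreography, while the transfer protocol only ever sees the uniform methods.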

So, one might wonder -- what about WS-CDL? Sadly, WS-CDL has a number of problems:

  1. It doesn't seem to be gaining much popularity;
  2. It has some notable issues so far, mainly because it was blazing new trails way ahead of its time in a committee venue that's not built for such innovation;
  3. It embraced WSA without giving any love to Webarch, to its detriment;
  4. It also doesn't have a compact syntax, so many early adopters, especially those who don't like GUI modeling tools, aren't going to touch it.

But it serves as a model to improve on and a set of invaluable lessons.

A choreography language to describe RESTful interactions is absolutely doable, in my opinion.

To me, RESTful choreography would actually fix one of the bigger problems with WS-CDL today: it tightly binds the choreography to a particular WSDL and set of XML namespaces. Yet, a choreography arguably should be reusable across a variety of operation-level interfaces and/or schema documents. Furthermore, a set of services may participate in a variety of choreographies, particularly if we want any sort of reuse.

In short, the WSA way to improve WS-CDL so that it is more "reusable" would be to provide some kind of indirection between WSDL and the choreography and role descriptions.

The Webarch way would be to eliminate variation in the primitive bindings available in any ground choreography, and enforce uniformity. Hyperlinking would also make it much easier to manage tokens, identity references and channel references, I think.
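A minimal sketch of the hyperlinking point, in Python: the client knows only link relation names, and each next URI (the "channel reference") is taken from the previous representation instead of from a static binding. The in-memory `REPRESENTATIONS` table, the relation names and the URIs are all hypothetical stand-ins for real HTTP responses.

```python
# Hypothetical in-memory "server": URI -> representation carrying links.
REPRESENTATIONS = {
    "/orders/42": {"status": "open",
                   "links": {"payment": "/orders/42/payment",
                             "cancel": "/orders/42"}},
    "/orders/42/payment": {"status": "awaiting payment",
                           "links": {"receipt": "/receipts/9001"}},
    "/receipts/9001": {"status": "paid", "links": {}},
}

def follow(start_uri, relations, fetch=REPRESENTATIONS.__getitem__):
    """Traverse link relations from start_uri and return the URIs visited.

    The server supplies every URI after the first, so it can move a
    channel without renegotiating a binding with the client.
    """
    uri, visited = start_uri, [start_uri]
    for rel in relations:
        uri = fetch(uri)["links"][rel]  # KeyError if the server did not offer rel
        visited.append(uri)
    return visited

print(follow("/orders/42", ["payment", "receipt"]))
```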

"The fact and the matter is that a Result Set IS-NOT a resource"

Sez you.

A result set absolutely can be a resource: when I go to Google and get back a page of search results, that's a resource (it has a URI, after all). Anything with identity, no matter how transient or persistent, is, by definition, a resource.
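As a minimal sketch, here is one way to give a result set an identity in Python: the query parameters become part of a URI, so the result set can be linked to, bookmarked and cached like any other resource. The host `search.example.com` is a placeholder, and sorting the parameters is just one design choice for keeping the identifier stable.

```python
from urllib.parse import urlencode

def result_set_uri(base, **query):
    """Name a result set with a URI.

    Parameters are sorted so that the same query always yields the same
    identifier, regardless of the order the caller passed them in.
    """
    return base + "?" + urlencode(sorted(query.items()))

print(result_set_uri("http://search.example.com/search", q="rest", page=2))
# http://search.example.com/search?page=2&q=rest
```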

"For those of you who are not convinced yet, I suggest that tomorrow you try to drive your car with a (GET,PUT) interface (no links allowed) and then you tell me how you felt: : a state machine is a state machine and there is no way around it"

This is an absurd strawman. If you have no links, you're not doing REST, sorry. I have no idea what you're trying to prove by suggesting one can't drive a car via hypermedia.... what would the benefit be even if we tried?

"It has been notorious that REST is really bad at versioning (I am preparing an article on this topic that will be published early January)..... Have you tried to bake in versioning in a RESTful resource access? you mean that the URI of the resource depends of the version? Ouch..."

It has only been notorious in your own mind. I caution against writing an article based on a strawman of your own making.

Versioning information is usually included in representation state, not in the URI. There are times where you may want a new resource altogether, but that depends on the extent of the change and whatever backwards compatibility policy you are following.
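A minimal sketch of the first approach in Python, assuming a hypothetical JSON order representation with a `version` field: the URI stays the same, the consumer reads the version from the representation, and fields it does not recognize are ignored, so additive changes need no new URI. JSON and the field names are illustrative choices, not anything REST prescribes.

```python
import json

def read_order(body):
    """Read a hypothetical order representation, tolerating additive changes.

    Unknown fields are ignored; only an unsupported major version is
    rejected. A missing version is treated as 1.0.
    """
    doc = json.loads(body)
    major = int(str(doc.get("version", "1.0")).split(".")[0])
    if major != 1:
        raise ValueError("unsupported major version: %d" % major)
    return {"id": doc["id"], "total": doc["total"]}

# A 1.3 producer added "gift_wrap"; this 1.x consumer is unaffected.
print(read_order('{"version": "1.3", "id": 42, "total": 9.5, "gift_wrap": true}'))
```

A breaking change (a new major version here) is the case where a new resource, or a new media type, may be the better answer.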

"The second detail they missed is that Amazon is probably going to publish BigDB at some point and maybe they will want to develop a true CRUD, SQL based API. Have you ever tried to implement this kind of API in a RESTful way? huh? you mean you can't?"

This is the kind of "putting words in people's mouth" I ranted about above.

No one is claiming that REST is the only type of architectural style that's appropriate. Remote Data Access styles like SQL gateways are very useful. Just don't expect millions of diverse users to hit your SQL service with good scalability, reliability, and visibility! I mean, even in component-oriented SOA one tends not to expose a generic SQL endpoint except in scenarios where a generic interface for a relatively small audience is required.

The points against Amazon are that they're claiming that SimpleDB has a "REST API", but they are making a mockery of the term. Their implementation is running *against* the way the web is supposed to work, and means that no pre-fetching user agents or intermediaries can safely be used with SimpleDB, as they may be a source of data integrity problems. This has nothing to do with religion; it's about Amazon's REST API author being completely oblivious to 15-year-old architecture and recent history like the Google Web Accelerator.
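A toy simulation of why this matters, in Python. It assumes, as the paragraph implies, a service that changes state in response to a GET carrying an action parameter; the URIs and the `Action=DeleteItem` / `ItemName` parameters are made up for illustration and are not SimpleDB's actual request format. A prefetcher that, like the Google Web Accelerator, fetches GET links ahead of the user then deletes data nobody asked to delete.

```python
from urllib.parse import urlsplit, parse_qs

# Toy datastore, plus a handler that (wrongly) mutates state on GET.
store = {"item_01": "Cathair Sweater", "item_02": "Designer Jeans"}

def handle_get(uri):
    params = parse_qs(urlsplit(uri).query)
    if params.get("Action") == ["DeleteItem"]:  # side effect on a "safe" method
        store.pop(params["ItemName"][0], None)
        return "deleted"
    return "ok"

def prefetch(links):
    """An intermediary that trusts GET to be safe and fetches every link."""
    for uri in links:
        handle_get(uri)

page_links = [
    "http://db.example.com/?Action=GetItem&ItemName=item_01",
    "http://db.example.com/?Action=DeleteItem&ItemName=item_02",
]
prefetch(page_links)
print(sorted(store))  # ['item_01']: item_02 is gone without any user action
```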

Posted by stu at 04:12 AM

December 11, 2007

A note on science

Gary Taubes' new book, Good Calories, Bad Calories, along with his recent UC Berkeley webcast, drew my attention to the great and quotable Claude Bernard, who was the father of the science of medicine, and the man who discovered homeostasis.

Some quotes I think are quite worthy of reflection:

"Particular facts are never scientific; only generalization can establish science."

"A great discovery is a fact whose appearance in science gives rise to shining ideas, whose light dispels many obscurities and shows us new paths."

"In experimentation, it is always necessary to start from a particular fact and proceed to the generalization....but above all, one must observe."

Some ways to look at this:
- If you observe something that contradicts your prevailing theory, perhaps that theory is wrong.

- If you observe something that no mainstream theory explains, perhaps an alternative hypothesis is worthy of further study.

- One does not improve knowledge in a scientific manner by just building, specifying, or explaining new things. One improves knowledge by observing effects, and working back and fitting a consistent hypothesis.

I find that in our profession we most often fall back on arguments from authority rather than arguments from empirical evidence. This takes several forms: "If a particular vendor/community/person builds it, it MUST be good", "if the experts agree, they MUST be right", "if the analysts say it will be so, we MUST invest in it", etc.

Perhaps all of this is because it's so hard to create a controlled experiment when dealing with systems science (except perhaps as simulation). Or perhaps it's because most empirical observations in our field are anecdotal, since competition keeps us from openly sharing results. I also think it may have to do with business managers' need to make technical policy decisions where a YES/NO is required; they tend to be taught that deferment is bad.

Taubes' book, by the way, is a very deep technical read on the science of obesity, heart disease, and fat accumulation, and a political history of how policy makers acting on inconclusive science may lead to a generation or more of disastrous consequences.

I take heart that technologists aren't the only ones known for their great blunders, but I pity the victims. The world needs paradigmatic subversives.

Posted by stu at 01:55 PM