June 15, 2006

What I hate about my Mac

I love my MacBook Pro. But there are some things that are driving me crazy.

- Microsoft Powerpoint for the Mac was always a bit annoying, with dozens of "Converting Metafile" popups for any Windows-drawn presentation, as it converts the pictures into a more useable format. If you don't resave the presentation, it will do this every time you load it.

Running PPT on Rosetta makes it intolerable -- any presentation I open requires 2 to 3 minutes of waiting while it figures out how to render it. Saving a file takes 15 to 20 seconds. Once it gets going, it's somewhat useable. But frankly it's faster to boot up the Parallels VM and use MS Office there.

- I migrated my iMac G5 onto the MacBook Pro. This normally works well, when I went between my Powerbook and iMac, and for all observable effects also worked well for me on the MacBook, but may be the source of some of my woes below.

- I have many, many Microsoft Word, Excel, and Powerpoint documents. Perhaps as a side effect of this Rosetta-only support, Spotlight absolutely crawls on my system. A typical query takes around 25 to 40 seconds to run.

- The design of Spotlight is completely unusable for a system with a large number of files if the queries take this long. As you type, the incremental search kicks in, and usually pauses while in mid-word. If I have a spelling mistake, it takes a good 5 to 8 seconds for my delete and rekey to take effect, wasting a significant amount of time.

- Furthermore, I can't select an item on the Spotlight list until the entire query has finished, as the list is continually shifting around -- one minute the file I want is there, the next minute it disappears. I try to click "Show All" to get a more stable view, but every time I scroll down the window, it insists on resetting the scrollbar to the top of the window as it adds more files to my search set.

This is utterly maddening -- it means I have to sit and wait the better half of a minute for any search. The whole point of Spotlight was to make it quick to find anything within 10 seconds. It's almost faster to poke around with the Finder now unless I'm completely clueless as to where the file is.

I've heard that Quicksilver is a better interface to Spotlight, but I haven't acclimated to it yet.

- One final note on spotlight: Sometimes, for inexplicable reasons, "mds" and "coreservicesd" (which I believe are Spotlight services) will take up 50% to 80% of my CPU for 2 to 5 minutes, which means I'm basically using almost a single core in my Core Duo for indexing.

If I'm running Parallels VM at the same time, this translates to around 120% CPU usage at idle. Now, this normally goes away down to acceptable levels (Parallels tends to consume 15% CPU at idle).

- There is currently no great way to play WMV media files on the Intel Mac platform. The options are, in order of performance: use the legacy and deprecated Windows Media Player 9 from Microsoft under Rosetta (around 15-18 fps), install Flip4Mac WMV Components 2.0.2 under Rosetta (which is not supported and requires flag setting contortions to get to work, and is maybe 10-12 fps). VLC is not an option, as it doesn't play WMV3 files.

- Sometimes I get the "spinning wheel of death" upon awakening the Macbook Pro and have to perform a hard reset. This last happened when I was trying to select a WiFi network shortly after awakening.

- While I know its not supported (and there's a cabal of Mac users at BEA that keep clamouring for it), WebLogic Server 9 (based on the AIX install) seems to be really, really slow on my Mac. WLS 8.1 was much better. I haven't had time to investigate whether there's a new "fast=true" flag I'm missing.

- Boot Camp Beta's repartitioning feature is not foolproof -- it's best to run this on a fresh boot. I ran Boot Camp a few months ago and undo it. Decided to re-install XP recently. Boot Camp locked up at the end of its repartitioning (spinning wheel of death). After a hard reset, XP proceeded to install, but my OS X Install would kernel panic every time I selected it.

After picking myself off the floor, I discovered through Apple's support forums that the repartitioning apparently didn't properly handle swapfiles, corrupting my filesystem a bit. Performing an "fsck -fy" resulted in an "invalid extent entry". So, I boot to single user mode (Cmd-S), move the old swap files (with bad blocks) to a new location, so the next time OS X boots it recreates them on "good extents". It's good there was a way to recover with command prompts, I guess, but most sane people would have just re-installed....

Minor quibbles:
- When looking at album booklets in iTunes, they show up in the little postage stamp of a window, but when I click on it, it doesn't expand into a larger window, no matter what I do. I have to drag the file from the Finder onto Quicktime to actually play it. iTunes Videos work fine, just album booklets seem to have this problem (I am referring specifically to Zero 7's The Garden, which I bought off iTunes).

The list of things I love about my Mac would be much longer than this list, which is why I stay on the platform. But I really needed to vent, because my likely interim solution is a reinstall -- something I left Windows for in the first place.

Posted by stu at 06:08 AM

June 07, 2006

http can scale!

A co-worker drew my attention to an article from last summer, entitled When SOAP Fails HTTP. It discusses scenarios where HTTP is not scalable, and proceeds to suggest that the OMG's IIOP (Internet Inter ORB Protocol) should be a useful alternative to HTTP.

Given the authors' pedigree, I wanted to write a detailed rebuttal, respectful of the technical arguments. I agree with the premise, that HTTP isn't suitable for all use cases, but I think the examples are extremely flawed, and the conclusion doesn't follow.

Firstly, there is an assumption that HTTP's request/response orientation requires servers to "wait" for responses, thus making it unscalable. The same observation applies to database connection pooling, for example. Every database has its own network protocol, and most do not support interleaving requests. Yet, there are many examples of servers pooling database connection requests to handle thousands of concurrent users, despite the general lack of support for interleaving in many native database network protocols. If scalability challenges creep into an HTTP-oriented world, there is no technical roadblock to pooling HTTP connections in a similar manner.

Secondly, there is an assumption that servers are limited by the number of inbound and outbound network connections, and that it is more scalable to do things on a single connection. While there are niche cases where this is true (I'll discuss later), HTTP handles the vast majority of uses quite well.

Some context: HTTP has become so widespread that operating systems, TCP stacks and application servers have been tuned over the past 10 years to enable large numbers of concurrent connections. A scalable TCP stack, for example, will only require constant-time access to the TCP table. Most operating systems have the ability to set huge file descriptor limits to allow hundreds of thousands of concurrent connections. All that's required is enough memory -- 100,000 connections requires around 1 GB kernel-level RAM, for example. Beyond the TCP stack, a scalable server uses non-blocking I/O to handle the processing of these connections to ensure efficient use of CPU resources (such as threads).

Here is the major mistake the article makes, in my opinion: they describe a scenario that is an example of poor I/O architecture in a server, and really has nothing to do with the actual protocol being used.

An unscalable application server will dedicate CPU resources to connections, such as a 1:1 thread to connection mapping. This works well for some use cases (such as large file transfer), but less well with large numbers of small requests. Thus, a more scalable application server will dissociate CPU resources (threads) from connections.

For example, HTTP requests in the BEA AquaLogic Service Bus are processed in a different thread from HTTP responses, to enable the server to "do other things" while it's waiting for something. This is referred to as a non-blocking I/O architecture, and is essential to any scalable client or server. It's how Azeurus can support huge P2P BitTorrent transfers over TCP, or how any web server supports thousands of concurrent connections.

Certainly there are cases where HTTP isn't optimal: if you have an application with extremely high volumes of event streams with very low millisecond latency, you will not require the reliability levels that TCP gives you, for one, nor the verbosity of the HTTP header for each event. Cases such as real time stock ticks have used IP Multicast & hybrid usage of UDP and TCP to handle such cases, with products like TIB/Rendezvous and WebLogic JMS. UDP is also the basis that real time media streaming protocols take, such as RTSP.

Now, perhaps you do want TCP's reliability features (TCP window retry intervals can become problematic when you get into low latencies, but let's assume you're OK with it for now) , you could (as the article implies) gain significant performance benefits from an interleaved protocol on top of TCP. But the interleaving isn't the interesting thing -- it's the use case of communication style -- event notification, unsolicited responses, etc. HTTP is also not particularly well suited to generate unsolicited responses from servers, for publish/subscribe communication (though one could retrofit such behaviour onto TCP with SOAP and WS-*). Whenever Roy Fielding decides to publish a reference implementation of waka, we may have a shot at a globally interoperable protocol to tackle these challenges. Until that day, it's my belief that we will have to make do with proprietary transfer protocols in spots, with or without SOAP.

Both IIOP (as the authors propose) and JMS products are a suitable alternative behind the firewall for the cases where HTTP is not appropriate for the use case (as JMS can even wrap IIOP!) , but, one must recognize the limits of these approaches. These are not broadly interoperable protocols. Firstly, IIOP, while a standard, is not as widely deployed for this use case (event notification) as are proprietary messaging protocols such as WebSphere MQ, WebLogic JMS, or TIBCO EM4JMS. Second, IIOP likely will never be widely deployed for this use case, or even on the public internet for even request/response cases. It is a niche protocol, at this point -- CORBA works well behind the firewall, but the major case for CORBA today is an interoperable wire protocol for distributed transactions. And in my experience, most distributed transaction interoperability occurs at the language level, with the XA resource manager (and MSDTC or JTA) interfaces. Further, SOAP over IIOP is extremely rare, and not supported by anybody except perhaps IONA's ESB. The biggest problem is that IIOP is not native to Windows, and Microsoft will likely never support it. The other big problem is that it's a complex specification and is unlikely a high performance implementation and bindings will be available for different programming environments, even for a modest fee.

I don't mean to trash CORBA here, I was a proponent in the 1990's and it continues to do great work. But, in my experience, IIOP was rarely used in the use case they're describing -- it was used for request/response RPC style mostly. CORBA messaging, eventing and other message exchange patterns weren't widely used, even in niche enterprise systems -- those systems tended towards proporietary message-oriented middleware if they were based on events. I have seen market data feeds based on IIOP more than once, but they tended to be the exception, and the latencies / volumes were not at the level where HTTP would be inappropriate. I've also seen (and built) market data feeds that used a mod_pubsub type approach where the data content of the HTTP connection was an event stream (simulating a large, slow data transfer) when pushing events to applet or ActiveX based order blotters. I'm open minded, though, if someone could point me to some public benchmark or scalability test results of IIOP used in a P2P event notification scenario, send it my way!

JMS, on the other hand, is a binding to Java, that can wrap proprietary protocols, which may have performance characteristics beyond IIOP. Unfortunately, these proprietary protocols have all of the interoperability limitations of IIOP, with the exception that a JMS binding is standardized and generally deployable on any OS. Support for non-Java languages will vary, but I will note that most scripting languages do have JVM implementations (PHP, JRuby, Jython, Groovy, etc.), and .NET has J#.

To summarize, there will not likely be an Internet-scale interoperable event notification protocol with extreme performance characteristics requried of some applications unless you're going with the multimedia protocols like RTP and RTSP. Thus, intermediaries such as ESBs will be needed if you need to bridge between varying QoS levels, as one must adapt between a standard protocol (HTTP), a less widely supported standard (IIOP), and a proprietary protocol (with a JMS binding, for example). When choosing a transfer protocol (and its underlying transport, UDP or TCP), it should be obvious that HTTP should be considered the default choice, and one should have solid numbers to back any alternative choice up. Test, test, test the use case under load and extrapolate where the bottlenecks are. In most cases, they are not likely to be in the network transfer protocol, they will be in the application itself, or in the I/O architecture of the server infrastructure it utilizes.

Posted by stu at 10:00 PM