On my flight to Minneapolis today, I relaxed with my orange juice and the Wall Street Journal, breathing deep in prep for a day of meetings. Then I started to tense up...my eyes were drawn to the bottom of the front page: "glitch"... "London"... "exchange"... "paralyzed"... words that should not be in the same headline. For a minute, I lived the vicarious terror of London Stock Exchange (LSEX) IT executives responding to an outage on Monday, September 8, 2008, that lasted seven hours and froze trading on the LSEX and several of its partner organizations.
OK, so maybe I dramatize just a bit. Suffice it to say I'm glad I'm not part of the Microsoft Account Team for the LSEX this week. In the article (see this online version), the author mentions only in passing that Microsoft technologies are at the core of the LSEX platform that handles high-volume, high-speed transactions. Around 9 in the morning, connectivity into the platform was interrupted and was not restored until after 4 in the afternoon. The exact cause of the failure was not outlined in the article. The emphasis was rightly on the financial ramifications and impact on perception. As London works to position itself as the world's financial center (above NYC), an outage like this will take some time to put out of our minds.
All this on the first business day after Fannie Mae and Freddie Mac were taken over by the US Treasury. The inability to trade on the LSEX must have felt like one of those frustrating dreams where you can't reach your destination. The precipitous drop in trading volume for the major London banks surely caused panic attacks in the City and Canary Wharf.
The LSEX CEO, Clara Furse, is clearly in the hot seat, and is likely grilling her technical executives about the ongoing stability of the platform. For their part, the tech execs have seen the LSEX platform perform well during other high-traffic days, and are performing forensics on the Monday outage. Microsoft is probably scrambling to run interference for its products, consulting partners (Accenture, Avanade), and architecture. The LSEX platform is a mix of .NET and some BizTalk Server, and includes the TradElec infrastructure, which has been pinpointed as the source of the failure.
Cause of outage aside, the tremors sent through the LSEX and the possible impact on strategic direction became the topic of my dinner conversation in the Twin Cities. There may be pressure to rearchitect the TradElec platform, but that would unacceptably increase risk. The cause of the outage must be carefully traced, and the surrounding processes examined. Simulation of the failure must be performed. To shift platforms would require a retooling of both infrastructure and staff, even though the result may be a more stable infrastructure capable of surpassing the transaction requirements of the LSEX. All of this calls into question the readiness of the .NET platform to handle the enormous load and critical uptime of the world's most sensitive applications.
It is likely that the senior technical execs will stick with their architecture direction based on .NET, but turn up the heat on Microsoft et al. to ensure that it does not happen again.
A surprising amount of .NET infrastructure underlies sophisticated trading applications worldwide, both on exchanges and within financial services companies. Executives in charge of those other applications should watch the LSEX story carefully for root causes, and compare those weaknesses against their own implementations. I suspect the failure was due to something relatively simple...I just hope we find out and save other large applications from suffering a similar fate.