Jason Lenhart

A Note on Distributed Computing

by Jason Lenhart on Mar.12, 2009, under Computers

Every so often I find myself re-reading papers that have shifted my professional opinions.  One such paper is A Note on Distributed Computing, which makes the fundamental case for why objects that interact in a distributed system need to be dealt with in ways intrinsically different from objects that interact within a single address space.  It is a great piece of work in my mind: the authors managed to keep their opinions very matter-of-fact and hard to disagree with.  The vision of unified objects will never be realized, and not for the reason most like to cite, latency.

I often find myself cringing when I read a book or tutorial, or see a presentation, and the statement is made:

“It is just an interface you don’t care where the implementation resides….” or “Remote is just like local…”

I have been there. I had those presentations, whether on paper or in my head, and then through implementation I learned about the wonderful world of “partial failure”.

While reading and thinking at length about partial failure, I found myself going off on a tangent, especially when I reached this early statement:

“Partial failure requires that programs deal with indeterminacy.  When a local component fails, it is possible to know the state of the system that caused the failure and the state of the system after the failure.  No such determination can be made in the case of a distributed system.  Instead, the interfaces that are used for the communication must be designed in such a way that it is possible for the objects to react in a consistent way to possible partial failure.”

When you throw in the possibility of a resource manager providing deterministic state in a local component, I could not help but think of how a BPM suite can offer some level of comfort in partial failures.  In most BPM-related applications there is the suite (holding the state of a long-running transaction) and then there is the database (holding the attributes of the long-running transaction).  A partial failure offers a problem domain where the two can get out of whack very quickly.  Why?  Because most large-scale applications that utilize a BPM suite will make their business logic calls in a distributed context (the vendors would never want you putting your code into their environment and then providing support for it, not to mention the scaling and partitioning issues).

The trade-off that I have made is to have a workflow process instance attempt to act as a very simplistic (but deficient) external resource manager.  The interfaces are simple and descriptive enough to allow the process manager to direct a partial failure to someone’s attention (manual interaction, as partial failure should not be the norm).  For example, you might want to receive data into a system, a great opportunity to use the Pipes and Filters pattern:

  • Receive a proprietary format
  • Validate format
  • Normalize the format
  • Enrich the format
  • Persist the format
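
As a rough sketch, the steps above can be chained as plain functions in a single process. Everything here is my own illustration: the "key=value;key=value" input format, the function names, the region lookup and the in-memory STORE are assumptions, not part of any real system or BPM product.

```python
from functools import reduce
from typing import Callable, Dict, List

Message = Dict[str, str]
Filter = Callable[[Message], Message]

# Hypothetical stand-in for a database table.
STORE: List[Message] = []


def receive(raw: str) -> Message:
    """Wrap the incoming proprietary text in a message envelope."""
    return {"raw": raw}


def validate(msg: Message) -> Message:
    """Reject input that is not in the assumed 'key=value;key=value' format."""
    if not all("=" in part for part in msg["raw"].split(";")):
        raise ValueError("malformed input")
    return msg


def normalize(msg: Message) -> Message:
    """Turn the raw text into lower-cased keys and trimmed values."""
    pairs = (part.split("=", 1) for part in msg["raw"].split(";"))
    return {key.strip().lower(): value.strip() for key, value in pairs}


def enrich(msg: Message) -> Message:
    """Add a derived field; the lookup rule is made up for the example."""
    region = "EU" if msg.get("country") == "DE" else "UNKNOWN"
    return {**msg, "region": region}


def persist(msg: Message) -> Message:
    """Store the finished message."""
    STORE.append(msg)
    return msg


def run_pipeline(raw: str, filters: List[Filter]) -> Message:
    """Feed the received message through each filter in order."""
    return reduce(lambda msg, step: step(msg), filters, receive(raw))


PIPELINE: List[Filter] = [validate, normalize, enrich, persist]
result = run_pipeline("Country = DE; Amount = 10", PIPELINE)
```

In this local form a failing filter simply raises and nothing is persisted, so the state before and after the failure is known. That guarantee is what goes away once each filter becomes a call across the network.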

Now within a local context, I could receive the format and persist it all in one transaction.  However, when using “services” spread out over the network (some web services, queues, etc.), I have an opportunity for a partial failure.  Within the BPM suite, I think this situation plays nicely into my hands.

If I have an issue with enrichment of the data, a simple YES/NO unit-style interface can provide the “state of the system after the failure”, such that the process instance can be routed to a repair state with metadata describing the nature of the problem (particular elements in the format could not be enriched because …. environment, application domain, or business logic).  What is the trade-off?  Now my users have to understand the business workflow under the covers and how this flow is broken down at the invocation level (e.g. the data could not be persisted).  Otherwise the repair may not be evident.  You can offer better evidence of what a user needs to do via a user interface, but now we are mired in additional solutions to problem domains presented by partial failure (a definite trade-off).
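
To make the YES/NO idea a little more concrete, here is a minimal sketch of how a process instance might route a failed step to a repair state. The names (StepResult, FailureKind, ProcessInstance, run_process) and the example steps are hypothetical and do not come from any BPM suite's API. I am also assuming that a network error surfaces as a Python ConnectionError or TimeoutError, which the sketch reports as an environment problem because the real outcome of the remote call is unknown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class FailureKind(Enum):
    ENVIRONMENT = "environment"
    APPLICATION_DOMAIN = "application domain"
    BUSINESS_LOGIC = "business logic"


@dataclass
class StepResult:
    """The YES/NO answer of one step, plus metadata for the NO case."""
    ok: bool
    kind: Optional[FailureKind] = None
    detail: str = ""


@dataclass
class ProcessInstance:
    state: str = "RUNNING"
    failed_step: Optional[str] = None
    repair_note: str = ""


Step = Tuple[str, Callable[[dict], StepResult]]


def run_process(data: dict, steps: List[Step]) -> ProcessInstance:
    """Run steps in order; stop at the first NO and route to REPAIR."""
    instance = ProcessInstance()
    for name, call in steps:
        try:
            result = call(data)
        except (ConnectionError, TimeoutError) as exc:
            # The remote side may or may not have done the work.
            result = StepResult(False, FailureKind.ENVIRONMENT, str(exc))
        if not result.ok:
            kind = result.kind.value if result.kind else "unknown"
            instance.state = "REPAIR"
            instance.failed_step = name
            instance.repair_note = f"{kind}: {result.detail}"
            return instance
    instance.state = "COMPLETED"
    return instance


# Hypothetical steps standing in for remote service calls.
def validate_step(data: dict) -> StepResult:
    return StepResult(True)


def enrich_step(data: dict) -> StepResult:
    if "country" not in data:
        return StepResult(False, FailureKind.BUSINESS_LOGIC,
                          "country is missing, cannot derive region")
    return StepResult(True)


def persist_step(data: dict) -> StepResult:
    raise TimeoutError("database did not answer in time")


STEPS: List[Step] = [("validate", validate_step), ("enrich", enrich_step)]
broken = run_process({}, STEPS)
done = run_process({"country": "DE"}, STEPS)
unknown = run_process({"country": "DE"}, STEPS + [("persist", persist_step)])
```

The repair note is what a person would see in the repair queue. Note that the timeout case only says the call did not answer; whether the data was actually persisted still has to be checked by hand, which is exactly the indeterminacy the paper describes.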

I think I have a strong argument for users taking on my technical understanding of partial failures: aren’t BPM suites heralded for just this sort of scenario?  They act as a medium of collaboration that bridges functional requirements to a technical specification.

Okay — so I told you this was a pretty big tangent.
