Tuesday, February 01, 2011

On UML as a Modeling Standard for Internal IT Purposes

UML has evolved into a very large and fairly comprehensive standard (http://www.uml.org/). Its size and complexity allow it to be applied to most IT problems, but they can also create problems of their own.

I do believe that, with proper application, it could solve some of the problems I face in my day-to-day job.

'What we've got here is (a) failure to communicate' 
- from 'Cool Hand Luke'. 

Communication problems are the crux of the matter as I see it, and they come in two basic forms: challenges between two or more people trying to communicate an idea or situation at a point in time, and challenges communicating over time. In the first case the parties have an opportunity to interact and discuss. In the second they typically do not: a document is created and then read months or years later, and the original author may no longer be with the company.

Aspects of the problem I see are:
  1. Integrity (correctness and completeness), 
  2. Comprehensibility (succinctness and semantics), 
  3. Usability (navigation and accessibility)

Before I go into these in more detail and describe how UML may fit into the solution, I need to define a scope of applicability - the context within which I am thinking about this. In my current job I deal mostly with IT investments, processes, business systems (solutions), and requirements. I don't often deal with software architectures (the deep structures of software inside those business solutions) or the deep details of deployment. I also usually deal only with the results of the business requirements process and not with the detailed creation of them.

Very often the results of my work (or of a team of people with whom I work) are depicted in the form of diagrams within a PowerPoint deck. Although the diagram titles are used over and over again (business context, system context, and anything followed by the word architecture), there are no common rules about what goes into them, no common definitions for what lines and boxes mean, and very little consistency across teams or over time. My common tools for creating and managing this information are Visio, Excel, Word, PowerPoint, and SharePoint. Although we do get a lot done with these tools, there are challenges with the integrity, comprehensibility, and usability of the resulting documents. Let me illustrate with a Visio example.

A box likely means something exists (physically or conceptually) and a line means that two things are somehow related. The precise meaning of the diagram's parts is defined by the author and may not be included in the diagram itself. This description might sound like total chaos, but that would be misleading. People within my organization have learned what those things usually mean within the context in which they are presented, and can deal with the ambiguities and information gaps that exist. In most cases, the fact that a diagram abstracts a complicated physical deployment and represents it as a simple line is useful.

For example, I might depict the communication pattern between two applications as a simple line, and maybe I put the label MQ on it. That tells many of my peers quite a bit. If they also know that those two systems are deployed across our own datacenter and a vendor datacenter, they can guess that there are probably two queue managers involved, a few MQ channels, several servers, at least two firewalls, and more IP switches than we would care to document. However, if I were the network engineer, I would care about each and every one of those switches and circuits and not much at all about the communication activity between the two applications. That holds until there is a problem or we need to make a major change; at that point all parties are interested in ensuring that all levels are understood.

So we need to be able to communicate the idea that systemA talks to systemB, and we want to show that as a simple line. We would like to be able to look inside that line to see the MQ-specific characteristics and topology. Drilling further down we would see IP socket-level details, IP addresses and ports - useful to network engineers and people writing firewall rules. Below that there is even more detail: circuits and so on. Of course these layers of abstraction have been documented in another standard reference - the OSI 7-layer model (http://en.wikipedia.org/wiki/OSI_model). Should our architectural diagrams follow the same layers? Perhaps. Perhaps we should be producing artifacts that align with TOGAF 9 (http://www.togaf.com/)? Or Zachman (http://en.wikipedia.org/wiki/Zachman_Framework)? This is a major outstanding question for me. Maybe there are other possibilities.
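
To make the drill-down idea concrete, here is a minimal Python sketch, not a proposal for any particular tool or UML profile. The class name Connection, the queue manager names, and the addresses and port are all hypothetical placeholders; the only point is that one top-level line can carry progressively more detailed views underneath it.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """One line on a diagram, optionally refined by more detailed lines."""
    source: str
    target: str
    label: str
    details: list["Connection"] = field(default_factory=list)

    def view(self, depth: int) -> list[tuple[str, str, str]]:
        """Return the lines visible after drilling down `depth` levels.

        Depth 0 is the simple line; a connection with no recorded
        details just returns itself, however deep the request.
        """
        if depth == 0 or not self.details:
            return [(self.source, self.target, self.label)]
        rows = []
        for detail in self.details:
            rows.extend(detail.view(depth - 1))
        return rows


# Hypothetical example data: names, addresses and port are illustrative only.
app_link = Connection("systemA", "systemB", "MQ", details=[
    Connection("QM_A", "QM_B", "MQ channel", details=[
        Connection("192.0.2.10", "198.51.100.20", "TCP/1414"),
    ]),
])

business_view = app_link.view(0)   # what most of my peers want to see
mq_view = app_link.view(1)         # queue managers and channels
network_view = app_link.view(2)    # what a firewall rule author needs
```

Whether the levels should follow the OSI layers, TOGAF, or Zachman is exactly the open question; the sketch only assumes that some agreed layering exists.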

Some of these frameworks provide better guidance on that than others. Zachman, for instance, provides guidance on many ways we should partition our models. TOGAF takes a different approach and is a little more concrete in some areas, but is generally less specific about deliverables. Whichever framework is chosen, I am thinking the most important part of applying UML to the problem is getting guidance that is specific enough that two people would produce models that are similar at the semantic level. What do I mean? If I were to point two people at a running system, including all the code, deployment descriptions, and operational documents, and tell them to draw the UML diagrams that describe the system at OSI layers 4 through 7, would they produce diagrams that use the same symbols in the same way? Would they define the same stereotypes and make the same profile extensions? I don't think so. There does not seem to be a standard methodology that can be applied to UML and that gives good guidance on this.

I recently was involved in the review of a large number of infrastructure diagrams that were produced by a team. It was quite clear that the individual artists had different ideas and rules in their heads. One person would focus on the network topology, another on the data flow (process oriented), and yet another on the IP-level session. Production and disaster recovery (DR) paths were always on the same diagram, but sometimes the DR paths were dotted or pink, sometimes not. Sometimes you could identify active-active clusters, sometimes not. If we were to re-execute the same task using UML and a common UML tool, would it be any better? I don't see that it would be. The symbols would be more consistent, but beyond that I don't expect much improvement.

It may actually be worse. When you have a free-form Visio diagram you know that there are no defined semantics for the graphical objects. But once you put them into UML and then into a repository, you might think you know things with more confidence than you actually do. Garbage in, garbage out. I have seen the progression from Visio diagram to Excel spreadsheet, and then the aggregation of all the spreadsheets into a database against which queries are run. Once we have gotten to that point we are in danger of drawing conclusions from data that was shaky to start with. When it was a Visio diagram it was unstructured, and it was apparent that only so much could be done with it. Once it becomes more structured (without adding knowledge in the process) one might lose sight of the inherent weaknesses in the source.

So back to my MQ example. In its simplest form this is two boxes and one line. Should the two applications be 'components' in a UML model? Should the MQ connection be a 'Usage', a 'Component Realization', an 'Interface Realization', or an 'Association'? Or something else? As I have poked around I do find answers, but I am left to think that somebody else doing the same job might get different guidance. When we try to aggregate our work at some future time, we might discover a large amount of rework ahead of us.
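
The rework risk can be shown with a small Python sketch. Everything here is hypothetical: the Relationship class, the system names, and the "tag" field (standing in for something like an agreed stereotype) are illustrative assumptions, and the sketch takes no position on which UML relationship is the right one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    kind: str      # the UML relationship type each modeler happened to choose
    target: str
    tag: str = ""  # hypothetical shared convention, e.g. an agreed stereotype


# Two people model the same sort of MQ link, following different guidance.
model_from_person_1 = [Relationship("systemA", "Usage", "systemB", tag="MQ")]
model_from_person_2 = [Relationship("systemC", "Association", "systemD", tag="MQ")]
combined = model_from_person_1 + model_from_person_2


def links_of_kind(model, kind):
    """Return (source, target) pairs whose relationship type matches `kind`."""
    return [(r.source, r.target) for r in model if r.kind == kind]


# Querying the aggregate by relationship type silently misses one link.
usage_links = links_of_kind(combined, "Usage")

# Only a convention both modelers actually followed makes the query reliable.
mq_links = [(r.source, r.target) for r in combined if r.tag == "MQ"]
```

The first query returns only the systemA link, with no error to warn that the systemC link was missed. That silent gap is the rework waiting in the aggregate.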

Is there an undiscovered part of the internet that holds the answer to my quest? Leave a comment and let me know what you think on the topic.

7 comments:

Elton said...

Terry,

You're touching on a number of interesting points here. The point of UML (or ArchiMate, or...) is to provide a common modelling language for these topics to hang off of.

We have a couple of challenges though.

UML is deep but incomplete; ArchiMate is broader but shallower. Which do you use? One, both, neither? How do you possibly hang together models from different taxonomies, used in different disciplines, that may or may not play nice together?

The more insidious challenge is the uptake problem. Most architecture happens in documents because they are easy to approach and easy to generate. Can you get a group of people to make the cognitive leap into using one language to rule them all? Is that simply too big a jump for an organization to make? And if so, what are the baby steps that practitioners should be taking to elevate the overall "game" of their organizations and make this achievable?

I have lots of questions and few crystalized thoughts yet. But I come back to the purpose of architecture which is (among other definitions) to simplify and communicate complex structures.

You do make an interesting comment though. Artist pops up in a number of places. Right now architecture is very much a cottage industry with individual artisans producing content. To make the jump to a proper profession we will need to bridge to some level of common language, taxonomy, modelling and approaches.

Elton

Anonymous said...

Terry,

I agree with you that UML might not be the right solution for everything. However, UML has continued to evolve, bringing forth a unified standard modeling notation that IT professionals had been wanting for years (http://www.omg.org/gettingstarted/what_is_uml.htm). Additionally, you can extend standard UML to fit your specific problem domain, e.g. extending UML for Data Warehousing and Business Intelligence: Common Warehouse Metamodel (CWM).

Can UML be a standard for internal IT purposes? Certainly, since it can help us to:

Understand:
  Simplifies by abstracting
  Provides blueprints for the solution
  Reveals reuse opportunities
  Exposes areas of risk

Communicate:
  Communicates key information
  Provides different views or perspectives
  Captures the concerns of all stakeholders
  Makes all design decisions explicit and traceable

Manage:
  Breaks down the work and allows for team development
  Prescribes all development work to be done
  Helps establish and maintain consistent style
  Guides team development

Doesn’t this match the imperatives of why we do Architecture?

Additionally, I will also look at what Domain-Specific Languages (DSL) can bring to the table.

A DSL:

 Is used to solve problems in a particular problem domain
 Provides a restricted and common vocabulary of terms
 Provides common approaches to problem-solving

Two types of domains become the focus of DSLs. Vertical domains focus on the key characteristics and abstractions from a particular business or industry. Horizontal domains focus on technologies and frameworks.

A DSL focuses on business aspects or on technical aspects:

Vertical DSL:
  Business oriented
  Focused on one industry
  Examples: Banking, retail, insurance, and so on

Horizontal DSL:
  Framework-based
  Focused on technical aspects
  Examples: Persistence, EJB, SOA, and so on

The DSL value proposition for architects:

A DSL makes it easier for the architect to communicate effectively with technical and nontechnical domain experts.
As with other kinds of modeling, with the right tooling it is also possible to reuse domain-specific artifacts: DSLs themselves are reusable across teams and projects, and domain-specific models that capture best practices can be reused across different projects.
An important function for a DSL is to constrain the solution space. A formal DSL prevents the introduction or duplication of elements into the domain downstream. The use of a DSL can encourage or even enforce architectural approaches and best practices.

Cheers,

Alfredo Ramirez

Terry said...

Thanks Alfredo and Elton for your comments.

Whether UML, SysML, or ArchiMate is the right tool for the problem is, to me, an open question and a key part of the reason for my post.

One of my tests to answer that question is indicated in the second last paragraph. If we were each given the same system to model, would we each produce semantically equivalent results?

If we each model MQ connections in a different way, then when we try to aggregate the results of several people, we won't be able to interrogate the model reliably.

Is there a solution?

Paul Lythgoe said...

Terry,

Is the value in the diagram or in the information that is conveyed? A cylinder is used to symbolize a database, and when we see a cylinder in a diagram we know it is some sort of database or data repository (we also need to standardize the language for what something is ...). What is important when we see this cylinder? The fact that it is a database, sure ..., but as you point out, Terry, we also need to know the specific details of the object and its connections: vendor product, tables, resiliency, protocols, ports, replication methods, etc ... to properly design and deploy the solution.

If memory serves me well, at my previous employer we entered the data about the application and infrastructure environments into the architecture tool, and the tool created the diagram. We used System Architect, which is now owned by IBM. Diagrams are not the value, as they are a point-in-time representation from the perspective of the current project or concern, which could be an incident. The value is in the data and the maintenance of that data. The architecture diagrams that we spend a lot of time preparing are soon out of date, possibly as soon as the application goes into production and generally by the first change request. The diagrams are not updated after a project finishes, but going forward the data about the objects and subjects of that application's architecture should be.

What may come out of this blog entry is the question of what information is necessary to collect. It will be essential to capture and store not only the capabilities that any solution needs to implement, as architectural entities, but also the specific implementation details in the form of policies, rules and configuration, so that future solutions can be integrated successfully into TD’s IT ecosystem, which is in constant motion.

Here is some of the information I will be collecting, which will be added to the specific implementation configuration:

-> Project Vision – a one to two sentence statement outlining what the project is looking to accomplish
-> Business Architecture: Who uses what processes to deliver what products and services?
-> Information Architecture: What information/data is required to support a business process or customer purchasing decision?
-> Application Architecture: What IT capabilities are required to gather, access, store and secure the information/data required?
-> Technology Infrastructure: What IT capabilities are required to gather, access, store and secure the information/data required? What technologies, lifecycle stage/state, service and operation levels, events, alerts, security, location

Pictures and architecture diagrams are effective in delivering information as an easily understood message. However, information is the essential element for the decisions needed to position, implement and support a chosen technology solution so that a business process can meet its delivery mandate. Information and the individual data elements are key to deciding efficiently which technology solution is best suited for the business and is a good fit within the existing IT ecosystem. As long as the tool is effective at gathering, storing, maintaining and manipulating the IT technical and architecture data, and the data can be expressed in pictorial form, the choice comes down to usability and cost, and to how this information can be further used in other processes: development, incident/problem, change.

Thanks for Listening ...

Paul Lythgoe

Terry said...

Thanks Paul. The information I wish to capture is not the diagram - that is just one representation. I feel that we have outgrown "just a diagram" as the source of information.

We need to capture the facts, to use your example, surrounding the database, so that we can answer questions like "Which services access this database?", "If this database were to go offline, which client-facing applications would be impacted?", "What is the aggregate inquiry rate for this database?", "Which databases used by my line of business use SQL Server?", etc.

In our current world we make a change to application X that uses database Y and we put that in the diagram (Visio). We may not include the fact that application Z also uses the database, because it is not of interest for that change. We cannot easily answer questions like the ones above because the diagrams are all just graphics.
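
As a rough illustration of capturing facts rather than graphics, here is a minimal Python sketch. The Fact class, the relation names, and databaseW are hypothetical, invented only for this example; a real repository would need an agreed vocabulary, which is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str


# Hypothetical facts, recorded regardless of which project they matter to.
facts = {
    Fact("applicationX", "uses", "databaseY"),
    Fact("applicationZ", "uses", "databaseY"),
    Fact("applicationX", "is", "client-facing"),
    Fact("databaseY", "runs_on", "SQL Server"),
    Fact("databaseW", "runs_on", "Oracle"),
}


def subjects(relation, obj):
    """Return, sorted, every subject that has `relation` to `obj`."""
    return sorted(f.subject for f in facts
                  if f.relation == relation and f.obj == obj)


# "Which services access this database?"
users_of_y = subjects("uses", "databaseY")

# "If this database were to go offline, which client-facing applications
# would be impacted?"
impacted = [a for a in users_of_y if Fact(a, "is", "client-facing") in facts]

# "Which databases use SQL Server?"
sql_server_dbs = subjects("runs_on", "SQL Server")
```

Application Z shows up in the first answer even though no single project diagram needed to draw it.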
