...

The following diagram depicts a simple service model:

[Image: diagram of a simple service model]

In this example, there is an entity ("Server") whose status depends on a number of sub-entities (in this case, metrics or KPIs). The status of the server in the model is determined by the combination of those three metrics. Arrows and their directions represent both dependency and impact rules: child nodes affect the status of their parent node, and each parent node's status depends on a combination of the individual statuses of all its child nodes.
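The parent/child relationship described above can be sketched in a few lines of Python. This is only an illustration, not how Service Operations is implemented: the `Status` levels, the `Entity` class, the metric names `cpu` and `memory`, and the "worst child wins" combination rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical status levels; a higher value means more severe.
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Entity:
    name: str
    own_status: Status = Status.OK  # only meaningful for leaf metrics/KPIs
    children: list["Entity"] = field(default_factory=list)

    def status(self) -> Status:
        # Leaves report their own status.
        if not self.children:
            return self.own_status
        # Assumed combination rule: a parent is as bad as its worst child.
        return max(child.status() for child in self.children)


# A server whose status depends on three metrics (names are illustrative).
cpu = Entity("cpu")
memory = Entity("memory")
disk = Entity("disk", own_status=Status.WARNING)
server = Entity("Server", children=[cpu, memory, disk])

# The same structure can be nested to represent a larger model.
datacenter = Entity("datacenter", children=[server])

print(server.status().name)      # WARNING
print(datacenter.status().name)  # WARNING
```

Because a parent is just another `Entity`, the model extends to deeper trees without any change to the combination logic.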

This model can be iterated and extended to represent more complex scenarios:

[Image: diagram of an extended service model]

Service Operations builds on this foundation to translate business and/or operational realities into discrete, identifiable, contextualized, and measurable items. The result is a highly extensible and versatile mechanism that can summarize the status of a number of heterogeneous entities in a single pane, and then provide the tools to diagnose and pinpoint the root cause of any issues. Since entities and their definitions are industry- and purpose-agnostic, Service Operations can work and provide value in virtually any scenario: IT operations, security, business processes, application monitoring, and so forth.

...

Using the impact/propagation rules defined for the model, the abnormal condition for the disk metric might or might not affect the overall status of the server. In this case, we assume the model implements such a relationship, so the overall status of the server is affected and transitions to "warning".
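To make the "might or might not affect" distinction concrete, here is a minimal sketch in which each child carries a flag saying whether it impacts its parent. The function `parent_status`, the `impacts` mapping, and the metric names other than the disk metric are hypothetical; actual impact rules in Service Operations may be expressed quite differently.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical status levels; a higher value means more severe.
    OK = 0
    WARNING = 1
    CRITICAL = 2


def parent_status(children: dict[str, Status], impacts: dict[str, bool]) -> Status:
    """Combine child statuses, skipping children whose rule has no impact.

    Children missing from `impacts` are assumed to impact the parent.
    """
    relevant = [s for name, s in children.items() if impacts.get(name, True)]
    return max(relevant, default=Status.OK)


metrics = {"cpu": Status.OK, "memory": Status.OK, "disk": Status.WARNING}

# Model A: the disk metric impacts the server, so the warning propagates.
with_impact = parent_status(metrics, {"disk": True})

# Model B: the disk metric has no impact rule, so the server stays OK.
without_impact = parent_status(metrics, {"disk": False})

print(with_impact.name, without_impact.name)  # WARNING OK
```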

[Image: service model after the server status transitions to "warning"]

Service status

As this process repeats from the leaf nodes up through the parent nodes in the tree, Service Operations can determine the overall status of a service (represented as a datacenter entity in the same example). In doing so, it provides two important benefits:

1. Determines the root cause of a problem reported at the top level ("datacenter") based on the impact and correlation rules set in the model.

...

Incidents can be associated with actions, so that closed-loop responses can be defined and executed automatically through the native alerting mechanisms in the Devo platform. This way, the data center incident reported by Service Operations could be linked to an automatic action (such as filing a Jira ticket or performing automatic remediation).
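The idea of linking an incident to an automatic action can be sketched as a small registry of callbacks. Everything here is hypothetical and independent of the Devo platform's actual alerting mechanisms: the `Incident` fields, the `on_incident` decorator, and the `file_ticket` action (which only returns a message instead of calling a ticketing system) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Incident:
    entity: str      # e.g. "datacenter"
    status: str      # e.g. "warning"
    root_cause: str  # e.g. "disk"


# Hypothetical registry: entity name -> actions to run for its incidents.
ACTIONS: dict[str, list[Callable[[Incident], str]]] = {}


def on_incident(entity: str):
    """Register the decorated function as an action for the given entity."""
    def register(action: Callable[[Incident], str]):
        ACTIONS.setdefault(entity, []).append(action)
        return action
    return register


@on_incident("datacenter")
def file_ticket(incident: Incident) -> str:
    # Placeholder: a real action would call a ticketing system here.
    return f"ticket filed: {incident.entity} is {incident.status} ({incident.root_cause})"


def dispatch(incident: Incident) -> list[str]:
    """Run every action linked to the incident's entity and collect results."""
    return [action(incident) for action in ACTIONS.get(incident.entity, [])]


print(dispatch(Incident("datacenter", "warning", "disk")))
# ['ticket filed: datacenter is warning (disk)']
```

Keeping the actions in a registry keyed by entity means new responses (for example, a remediation step) can be added without changing the dispatch logic.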