Pegasus Enhancement Proposal (PEP)

PEP #: 299

Title: Support for the indication delivery retry.

Version: 1.1

Created: 26th March 2007

PEP Type: Concept

Status:  Approved

Version History:

Version  Date           Author                           Change Description
0.1      26 March 2007  Dave Sudlik                      Placeholder PEP with initial details
0.2      29 Sep 2008    Venkat Puvvada                   Added more questions/modified definition of the problem
0.3      19 May 2009    Venkat Puvvada/David Judkovics   Incorporated design issues from latest voice balloted DSP1054 spec
0.4      30 June 2009   Venkat Puvvada/David Judkovics   Added mechanism to identify delivery retry and clarification from DMTF on IndicationIdentifier design.
1.0      03 July 2009   Venkat Puvvada                   Made version change for the ballot.
1.1      14 April 2010  Venkat Puvvada                   Rewrite based on DSP1054 ver 1.1.0 and 2.24.1 Event schema

 


Abstract: Add support to OpenPegasus for the indication delivery retry as defined in DSP1054 ver 1.1.0.


Definition of the Problem

If a management application uses indications to keep track of the status of a managed resource, it is essential that all indications it has subscribed to are actually delivered. The current behavior for indication delivery in OpenPegasus is a single delivery attempt with no persistence of indications, which is considered unreliable. To achieve indication delivery retry, the following questions need to be considered.

  1. How is it determined whether the indication delivery was successful or not?
  2. What are the different types of listeners/handlers that are considered for delivery retry, and what are the successful delivery mechanisms?
  3. How are the indications stored in the CIMServer during delivery retry?
  4. What are the parameters defined in DSP1054 that are used by the retry functionality?
    1. How are indications uniquely identified?
    2. What is the lifetime of the stored indications?
  5. What happens to the indications in the queue store when delivery retry parameters are exceeded?
    1. Do those indications need to be logged?
  6. How are subscriptions handled when delivery retry parameters are exceeded?
  7. How will all functions associated with indication delivery retry be controlled at build time?
  8. Is there a minimum or maximum delivery retry time requirement for a failed indication?
  9. What is the relative priority of the delivery retry function with respect to other CIM server functions?
  10. Is the order of delivery for indications handled via the retry function guaranteed to be identical to the order in which they were generated?
  11. Shall the CIMListener be able to handle duplicate/out-of-order/missing indications?
  12. What happens to the existing indications in the queue that are awaiting delivery retry when the corresponding subscription has been deleted/removed?
  13. What is the minimum CIM Schema version required to support this functionality?

Proposed Solution

This PEP proposes to define the parameters involved in reliable indication delivery. Reliable indication delivery means delivery of indications within reasonable time limits under the given constraints (defined by standards). We try to deliver the indication within reasonable limits, which are guided by the CIM_IndicationService class properties DeliveryRetryAttempts and DeliveryRetryInterval from DSP1054. This proposal also uses the sequence identifier of the CIM_Indication class added in version "2.24.0".

The following solutions are proposed for the above problems.

  1. Protocol specifications shall define the way to determine the successful delivery of indications.
  2. The following listeners/handlers will be considered for indication delivery retry.
    1. CIM-XML Handler - For successful delivery, the CIMServer MUST receive a CIMExportIndicationResponseMessage from the CIMListener without any exception.
  3. Indications are stored in memory per listener destination, based on a configurable queue length.
  4. The delivery retry function uses the CIM_IndicationService class properties DeliveryRetryAttempts and DeliveryRetryInterval, and the sequence-identifier-lifetime, as defined in DSP1054.
    1. Indications are uniquely identified using the sequence-identifier of the CIM_Indication class as defined in the 2.24.1 final schema. The CIMServer populates the sequence-identifier.
    2. The lifetime of the indication is equal to the sequence-identifier-lifetime (= DeliveryRetryAttempts * DeliveryRetryInterval * 10) as defined in DSP1054 ver 1.1.0. After the sequence-identifier-lifetime has expired for an indication whose delivery is being retried, that indication is considered undeliverable to that listener and is removed from the queue, even if the DeliveryRetryAttempts property would otherwise indicate that further retries should be attempted.
  5. Indications are deleted from the queue and are logged.
  6. Subscriptions are managed using the CIM_IndicationService.SubscriptionRemovalTimeInterval and CIM_IndicationService.SubscriptionRemovalAction properties, based on failed delivery attempts. Subscriptions have an 'OnFatalErrorPolicy' property, which can be used to manage individual subscriptions. If the OnFatalErrorPolicy property value is 4 (Remove), the subscription abides by the CIM_IndicationService.SubscriptionRemovalAction setting and behavior. Subscription deletion can also happen when indication delivery to transient handlers has failed or when they expire.
  7. Delivery retry is available when indication profile support is enabled at build time by setting PEGASUS_ENABLE_DMTF_INDICATION_PROFILE_SUPPORT=true. Another config option specifies the size of the delivery retry queue per destination. If the queue length is zero, no retry attempt is made.
  8. DSP1054 defines the property CIM_IndicationService.DeliveryRetryInterval, which is the minimum time interval in seconds that the indication service waits before retrying delivery of an indication to a particular listener destination that previously failed. A maximum time is not defined, and a retry can take longer due to other processing in the CIMOM. Note that delivery retry priority is very low.
  9. The priority of the delivery retry function will be kept low enough so as not to adversely affect the CIM server's ability to respond to client requests.
  10. No. The CIMListener can check the sequence-identifier of the indication to detect out-of-order indications.
  11. Yes. Using the sequence-identifier of the indication, the CIMListener can detect duplicate/out-of-order/missing indications.
  12. Indications are discarded and deleted from the queue as defined in DSP1054.
  13. CIM Schema version 2.24.1.
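
As an illustration of items 4 and 5 above, the retry decision for a single queued indication might be sketched as follows. This is a minimal Python sketch and not OpenPegasus code: the RetryPolicy class, the should_retry function and the example values 3 and 20 are hypothetical, and it assumes that DeliveryRetryAttempts counts retries made after the initial delivery attempt. Only the lifetime formula is taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical holder for the two CIM_IndicationService properties."""
    delivery_retry_attempts: int    # DeliveryRetryAttempts
    delivery_retry_interval: float  # DeliveryRetryInterval, in seconds

    @property
    def sequence_identifier_lifetime(self) -> float:
        # Formula quoted in this PEP from DSP1054 ver 1.1.0.
        return self.delivery_retry_attempts * self.delivery_retry_interval * 10


def should_retry(policy: RetryPolicy, retries_made: int,
                 first_attempt_time: float, now: float) -> bool:
    """Return True if one more delivery retry is allowed.

    Assumption: retries_made excludes the initial delivery attempt.
    """
    age = now - first_attempt_time
    if age >= policy.sequence_identifier_lifetime:
        # Lifetime expired: give up even if retry attempts remain.
        return False
    return retries_made < policy.delivery_retry_attempts


# Example values only; they are not taken from this PEP.
policy = RetryPolicy(delivery_retry_attempts=3, delivery_retry_interval=20)
print(policy.sequence_identifier_lifetime)
print(should_retry(policy, retries_made=1, first_attempt_time=0.0, now=45.0))
```

With these example values the lifetime works out to 3 * 20 * 10 = 600 seconds. An indication that fails should_retry would then be removed from the queue and logged, as described in item 5.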

Sequence Identifier:

The sequence identifier is the combination of the SequenceContext and SequenceNumber properties of the CIM_Indication class. SequenceContext is populated by the CIMServer in the following way.

SequenceContext = CIM_IndicationService.Name + CIM_ObjectManager.Name + CIMServer startup time stamp + CIM_ListenerDestination creation time.

SequenceNumber starts at 0 initially and restarts at 0 whenever the sequence context string changes. Otherwise, it is increased by 1 for every new indication to that listener destination, and it wraps to 0 when the value range is exceeded.
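
The rules above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the OpenPegasus implementation: the function and class names are hypothetical, the "+" in the SequenceContext formula is taken to mean plain string concatenation, and the value range of SequenceNumber is assumed to be that of a signed 64-bit integer.

```python
SINT64_MAX = 2**63 - 1  # assumption: SequenceNumber is a signed 64-bit value


def make_sequence_context(service_name, object_manager_name,
                          server_start_time, destination_creation_time):
    """Build SequenceContext in the order given by the formula in this PEP.

    Assumption: the four parts are strings and are simply concatenated.
    """
    return (service_name + object_manager_name +
            server_start_time + destination_creation_time)


class SequenceNumberGenerator:
    """Hypothetical per-listener-destination counter."""

    def __init__(self, max_value=SINT64_MAX):
        self._max = max_value
        self._context = None
        self._next = 0

    def next_identifier(self, context):
        """Return (SequenceContext, SequenceNumber) for a new indication."""
        if context != self._context:
            # A changed context string restarts numbering at 0.
            self._context = context
            self._next = 0
        number = self._next
        # Wrap to 0 once the value range is exhausted.
        self._next = 0 if number >= self._max else number + 1
        return context, number


# A tiny max_value is used here only to make the wrap visible.
gen = SequenceNumberGenerator(max_value=2)
ctx = make_sequence_context("svc", "om", "20100414100000", "20100414100500")
print([gen.next_identifier(ctx)[1] for _ in range(4)])  # [0, 1, 2, 0]
```

One generator instance per listener destination keeps the numbering independent across destinations, which matches the "for every new indication to that listener destination" wording above.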

WBEMListener requirements

See section 7.10.3 of DSP1054 ver 1.1.0 for the WBEMListener requirements for identifying duplicate/out-of-order/missing indications.
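
As a rough illustration of how a listener could use the sequence identifier, the sketch below classifies an incoming indication against the last one accepted from the same destination. It is a simplified Python sketch and not the algorithm from DSP1054: the classify function and its result labels are hypothetical, and wrap-around of SequenceNumber is deliberately ignored. DSP1054 section 7.10.3 remains the authoritative description.

```python
def classify(last_seen, context, number):
    """Classify an incoming indication.

    last_seen is the (SequenceContext, SequenceNumber) pair of the last
    accepted indication, or None if nothing has been received yet.
    Simplification: SequenceNumber wrap-around is not handled.
    """
    if last_seen is None or last_seen[0] != context:
        # First indication, or the sender started a new sequence context.
        return "new-context"
    last_number = last_seen[1]
    if number == last_number + 1:
        return "in-order"
    if number == last_number:
        return "duplicate"
    if number > last_number + 1:
        # One or more indications in between are missing (so far).
        return "gap"
    # An older number: a late arrival or a repeat of an earlier indication.
    return "late"


last = ("ctx-A", 4)
for incoming in [("ctx-A", 5), ("ctx-A", 4), ("ctx-A", 7), ("ctx-A", 2)]:
    print(incoming, classify(last, *incoming))
```

A real listener would also need to remember which numbers inside a gap eventually arrive, so that a "late" indication can be told apart from a true duplicate.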

References

Future work/ideas

  1. Consider the following listeners/handlers for future work.
    1. SNMP Handler.
    2. EMAIL Handler.
    3. Syslog destination Handler.
    4. Consumer providers residing in the CIMServer.
  2. How are the indications persisted in the CIMServer?
    1. If a secondary store is used for the persistence of indications, what is the organization of the secondary store? 1 file per subscription, 1 file per OS instance, or something else?
    2. How is the overall size of the secondary store controlled? At build time, at run time, or both?
    3. When should an indication whose delivery failed be written to the secondary store?
    4. Do indications persist across CIMServer restarts?
  3. Do we need delivery retry parameters (e.g. delivery retry attempts/interval, persistence storage size, etc.) on a per-subscription basis?
  4. Should the CIM server generate an indication every time it starts?
  5. Should the CIM server generate heartbeat indications so that client applications know when indication delivery has been interrupted?
  6. Does the provider need some kind of reliable indicator that an indication was accepted for delivery? For example, should the indication first be persisted before success is returned to the provider?

    Consider various levels of reliability as follows

Discussion

(r_kumpf)  How important is it to prevent the same indication from being delivered to the same listener multiple times?
(venkat_puvvada) DSP1054 says the IndicationIdentifier of the CIM_Indication class shall provide uniqueness to identify possible duplicate indications that occur during CIMServer delivery retry attempts.

(r_kumpf) This definition is specific to the CIM-XML protocol, so it is insufficient. It also raises questions about multiple delivery of the same indication  and possibly significant extra overhead for delivery retry if it can be determined that delivery will never be successful.
(venkat_puvvada) CIMListener shall be able to distinguish duplicate indications. We can consider the following classification for successful indication delivery.
a. CIM-XML handler: We consider delivery successful when the listener sends back a CIMExportIndicationResponseMessage without any exception. Predicting indication delivery that would never be successful, for example a permanent failure such as an incorrect hostname.
b. Email handler: Unknown at the moment; need to discuss and elaborate with current users.
c. SNMP handler: Unknown at the moment; need to discuss and elaborate with current users.

Discussion on future items

(r_kumpf) What is the expected/desired result when the CIM Server is not running? A provider will not generate indications during the time it is not running. Can the stated requirement, 'it is essential that all indications it (a management application) has subscribed for are actually delivered,' be met in that case? I'm struggling to understand why it is interesting to persist indications across a CIM Server restart, when indications can be lost during that time anyway. A CIM Server crash would also presumably lose the indications since the persistence code would not be invoked. What, specifically, is the value of trying to persist indications across CIM Server restarts?
(venkat_puvvada) The assumption is that the CIMOM is a service that runs forever, so we are not trying to solve the case for when it is not running. However, the CIMOM is taken off-line during reboots, sometimes when applications are adding/removing providers, or when an unexpected failure (crash) occurs. Indication delivery is a more complex scenario when compared to normal get and enum operations where the client can simply retry. In this case, each layer of the stack may work normally, but due to some external issue (network, crash, reboot, etc.), the indication is lost. And, since indications usually communicate an event of interest, the CIMOM should take extra precautions to detect failures and reduce indication loss within the stack.

Discussion on 0.3 version

(marek_szermutzky) I guess queue length means the number of possible indication entries. How does a systems administrator know what implications a change to this configuration has on memory and CPU usage?
(venkat_puvvada) It depends upon the number of destinations for which the CIMServer is attempting delivery retry. It is implementation specific, and how to provide delivery retry statistics can be discussed.

(r_kumpf) What happens when an indication is generated and the delivery retry queue is full? Is the indication at the head of the queue discarded or the newly generated one?
(venkat_puvvada) Indication at the head of the queue is discarded.
(k_schopmeyer) Is there a log entry for this?
(venkat_puvvada) Yes, indications are logged.
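
The behavior agreed in this exchange (a bounded per-destination queue that discards and logs the indication at the head when full) might look like the following Python sketch. It is illustrative only and not OpenPegasus code: the RetryQueue class, its method names and the logger name are hypothetical. Treating a queue length of zero as "never queue" follows the statement in the proposed solution that no retry attempt is made in that case.

```python
import logging
from collections import deque

logger = logging.getLogger("delivery_retry")  # hypothetical logger name


class RetryQueue:
    """Hypothetical bounded delivery retry queue for one destination."""

    def __init__(self, max_length):
        self._max_length = max_length
        self._items = deque()

    def enqueue(self, indication):
        """Queue an indication for retry.

        Returns the indication that was dropped as a result, or None.
        """
        if self._max_length == 0:
            # Queue length zero: retry is disabled, nothing is queued.
            logger.warning("Retry disabled; dropping %r", indication)
            return indication
        discarded = None
        if len(self._items) >= self._max_length:
            # Queue full: discard the indication at the head (the oldest).
            discarded = self._items.popleft()
            logger.warning("Retry queue full; discarding %r", discarded)
        self._items.append(indication)
        return discarded

    def pending(self):
        """Return queued indications, oldest first."""
        return list(self._items)


queue = RetryQueue(max_length=2)
for name in ["ind-1", "ind-2", "ind-3"]:
    queue.enqueue(name)
print(queue.pending())  # ['ind-2', 'ind-3']
```

Discarding from the head favors the most recent events over the oldest ones. A listener can still notice the loss through the resulting gap in sequence numbers.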

(marek_szermutzky) Who (server, provider, client) will generate the identifier? How (algorithm used for uniqueness, what grade of uniqueness) will the identifier be generated? What grade of uniqueness is required? I think unique on a specific CIM server should suffice, i.e. a guaranteed unique number (atomic count) not overflowing within 'SubscriptionRemovalTimeInterval'.
(venkat_puvvada) The provider should keep the IndicationIdentifier unique. The construction algorithm for IndicationIdentifier is defined in the CIM_Indication class definition. Yes, the provider should not reuse an IndicationIdentifier within SubscriptionRemovalTimeInterval.
There has been a lot of discussion in the DMTF on the design of IndicationIdentifier and on the time and context within which the IndicationIdentifier must remain unique. For the initial implementation we can consider the above factors.
(k_schopmeyer) Sadly, it is probably more complex than simply a counter. We have to account for provider restart somewhere, and the provider must be capable of knowing the identifier (the case of correlating indications). At this point we don't have to account for server restart because I am assuming that there are no delivery retries through server restart. They are all dropped so that there are no retries through a server restart. I would assume that we are going to have to do something like a two-part id where the provider gets some initial part when starting or on request and then can add additional uniqueness for its indications with an additional component (e.g. an incrementing integer). The DMTF is trying to come up with a definition for version 1.1.0 of the indication profile now.


Copyright (c) 2008 Hewlett-Packard Development Company, L.P.; IBM Corp.; EMC Corporation; Symantec Corporation; The Open Group.

Permission is hereby granted, free of charge, to any person obtaining a copy  of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED IN ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS PROVIDED  "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.