Pegasus Enhancement Proposal (PEP)
PEP #: 299
Title: Support for the indication delivery retry.
Version: 1.1
Created: 26th March 2007
PEP Type: Concept
Status: Approved
Version History:
Version | Date | Author | Change Description |
0.1 | 26 March 2007 | Dave Sudlik | Placeholder PEP with initial details |
0.2 | 29 Sep 2008 | Venkat Puvvada | Added more questions/modified definition of the problem |
0.3 | 19 May 2009 | Venkat Puvvada/David Judkovics | Incorporated design issues from latest voice balloted DSP1054 spec |
0.4 | 30 June 2009 | Venkat Puvvada/David Judkovics | Added mechanism to identify delivery retry and clarification from DMTF on IndicationIdentifier design. |
1.0 | 03 July 2009 | Venkat Puvvada | Made version change for the ballot. |
1.1 | 14 April 2010 | Venkat Puvvada | Rewrite based on DSP1054 ver 1.1.0 and 2.24.1 Event schema |
Abstract: Add support to OpenPegasus for the indication
delivery retry as defined in DSP1054 ver 1.1.0.
Definition of the Problem
If a management application uses indications to keep track of the status of a managed resource, it is essential that all indications it has subscribed for are actually delivered. Currently OpenPegasus makes a single delivery attempt per indication and does not persist indications, which is considered unreliable. To achieve indication delivery retry, the following questions need to be considered.
- How is it determined whether an indication delivery was successful?
- Which types of listeners/handlers are considered for delivery retry, and what counts as successful delivery for each?
- How are the indications stored in the CIMServer during delivery retry?
- Which parameters defined in DSP1054 are used by the retry functionality?
- How are indications uniquely identified?
- What is the lifetime of the stored indications?
- What happens to the indications in the queue when the delivery retry parameters are exceeded?
- Should those indications be logged?
- How are subscriptions handled when the delivery retry parameters are exceeded?
- How is the indication delivery retry function controlled at build time?
- Is there a minimum or maximum delivery retry time requirement for a failed indication?
- What is the relative priority of the delivery retry function with respect to other CIM server functions?
- Is the order of delivery for indications handled via the retry function guaranteed to be identical to the order in which they were generated?
- Shall the CIMListener be able to handle duplicate/out-of-order/missing indications?
- What happens to indications already in the queue and being retried when the corresponding subscription is deleted/removed?
- What is the minimum CIM Schema version required to support this functionality?
Proposed Solution
This PEP proposes to define the parameters involved in reliable indication delivery. Reliable indication delivery means delivering indications within reasonable time limits under the given constraints (defined by standards). These limits are guided by the CIM_IndicationService class properties DeliveryRetryAttempts and DeliveryRetryInterval from DSP1054. This proposal also uses the sequence identifier of the CIM_Indication class added in version "2.24.0".
The solutions proposed for the above problems are as follows.
- Protocol specifications shall define how successful delivery of indications is determined.
- The following listeners/handlers will be considered for indication delivery retry.
- CIM-XML Handler - CIMExportIndicationResponseMessage MUST be received by the CIMServer from the CIMListener without any exception for the delivery to be considered successful.
- Indications are stored in memory per listener destination, based on a configurable queue length.
- The delivery retry function uses the CIM_IndicationService class properties DeliveryRetryAttempts and DeliveryRetryInterval and the sequence-identifier-lifetime as defined in DSP1054.
- Indications are uniquely identified using the sequence-identifier of the CIM_Indication class as defined in the 2.24.1 final schema. The CIMServer populates the sequence-identifier.
- The lifetime of an indication is equal to the sequence-identifier-lifetime (= DeliveryRetryAttempts * DeliveryRetryInterval * 10) as defined in DSP1054 ver 1.1.0. Once the sequence-identifier-lifetime has expired for an indication whose delivery is being retried, that indication is considered undeliverable to that listener and is removed from the queue, even if the DeliveryRetryAttempts property would otherwise indicate that further retries should be attempted.
- Indications are deleted from the queue and are logged.
- Subscriptions are managed using the CIM_IndicationService.SubscriptionRemovalTimeInterval and CIM_IndicationService.SubscriptionRemovalAction properties, based on failed delivery attempts. Subscriptions have an 'OnFatalErrorPolicy' property which can be used to manage individual subscriptions. If the OnFatalErrorPolicy property value is 4 (Remove), the subscription abides by the CIM_IndicationService.SubscriptionRemovalAction setting and behavior. Subscription deletion can also happen when indication delivery to transient handlers has failed or when they expire.
- The delivery retry function is built when indication profile support is enabled by setting PEGASUS_ENABLE_DMTF_INDICATION_PROFILE_SUPPORT=true. A further config option specifies the size of the delivery retry queue per destination. If the queue length is zero, no retry attempt is made.
- DSP1054 defines the property CIM_IndicationService.DeliveryRetryInterval, the minimum time interval in seconds that the indication service waits before retrying delivery of an indication to a listener destination where delivery previously failed. No maximum time is defined, and a retry can take longer because of other processing in the CIMOM. Note that delivery retry priority is very low.
- The priority of the delivery retry function will be kept low enough so as not to adversely affect the CIM server's ability to respond to client requests.
- No. The CIMListener can use the sequence-identifier of each indication to detect out-of-order indications.
- Yes. Using the sequence-identifier of each indication, the CIMListener can detect duplicate/out-of-order/missing indications.
- Indications are discarded and deleted from the queue as defined in DSP1054.
- CIM Schema version 2.24.1.
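The lifetime rule in the list above can be illustrated with a short sketch. This is not OpenPegasus code: the function names, the bookkeeping arguments attempts_made and age_s, and the choice to treat an indication as expired once its age reaches the lifetime are assumptions made for the example.

```python
def sequence_identifier_lifetime(retry_attempts: int, retry_interval_s: int) -> int:
    """Lifetime in seconds: DeliveryRetryAttempts * DeliveryRetryInterval * 10."""
    return retry_attempts * retry_interval_s * 10


def should_retry(attempts_made: int, age_s: float,
                 retry_attempts: int, retry_interval_s: int) -> bool:
    """Return True if a failed indication may be retried again.

    attempts_made: retries already performed (hypothetical bookkeeping).
    age_s: seconds since the indication was first queued (hypothetical).
    """
    lifetime = sequence_identifier_lifetime(retry_attempts, retry_interval_s)
    if age_s >= lifetime:
        # Lifetime expired: give up even if retry attempts remain.
        return False
    return attempts_made < retry_attempts
```

With example values of 3 attempts and a 30 second interval, the lifetime works out to 3 * 30 * 10 = 900 seconds, and an indication older than that is dropped regardless of how many retries it has used.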
Sequence Identifier:
The sequence identifier is the combination of the SequenceContext and SequenceNumber properties of the CIM_Indication class. SequenceContext is populated by the CIMServer as follows.
SequenceContext = CIM_IndicationService.Name + CIM_ObjectManager.Name + CIMServer startup time stamp + CIM_ListenerDestination creation time.
SequenceNumber starts at 0 initially and whenever the sequence context string changes. Otherwise, it is increased by 1 for every new indication to that listener destination, and it wraps to 0 when the value range is exceeded.
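The construction of the sequence identifier might be sketched as follows. The names make_sequence_context and SequenceCounter are hypothetical, plain concatenation without a separator is an assumption (the text only joins the parts with "+"), and the upper bound assumes that SequenceNumber is a signed 64-bit integer.

```python
SINT64_MAX = 2**63 - 1  # assumption: SequenceNumber is a signed 64-bit value


def make_sequence_context(service_name, object_manager_name,
                          server_start_time, destination_creation_time):
    """Join the four components named in the text.

    Concatenation without a separator is an assumption of this sketch.
    """
    return (service_name + object_manager_name
            + server_start_time + destination_creation_time)


class SequenceCounter:
    """Per-listener-destination counter (hypothetical helper)."""

    def __init__(self, context):
        self.context = context
        self._next = 0

    def next_identifier(self, context=None):
        """Return (SequenceContext, SequenceNumber) for the next indication."""
        if context is not None and context != self.context:
            # The context string changed: restart numbering at 0.
            self.context = context
            self._next = 0
        number = self._next
        # Wrap to 0 once the value range is exhausted.
        self._next = 0 if number == SINT64_MAX else number + 1
        return self.context, number
```

One counter per listener destination keeps the numbering independent between destinations, which matches the "for every new indication to that listener destination" wording above.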
WBEMListener requirements
See section 7.10.3 of DSP1054 ver 1.1.0 for the WBEMListener requirements for identifying duplicate/out-of-order/missing indications.
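As an illustration of how a listener could use the sequence identifier, the sketch below classifies incoming indications from one destination. It does not reproduce the requirements of section 7.10.3; the class name SequenceTracker and the category labels are hypothetical, and the sketch ignores SequenceNumber wraparound and does not bound the memory used to remember gaps.

```python
class SequenceTracker:
    """Classify indications from one destination by sequence identifier."""

    def __init__(self):
        self.context = None
        self.expected = 0      # next SequenceNumber we expect to see
        self.missing = set()   # numbers skipped so far and not yet received

    def classify(self, context, number):
        if context != self.context:
            # New context (for example after a server restart): start over.
            self.context = context
            self.missing = set(range(0, number))
            self.expected = number + 1
            return "new-context"
        if number == self.expected:
            self.expected += 1
            return "in-order"
        if number > self.expected:
            # Gap: remember the numbers that have not arrived yet.
            self.missing.update(range(self.expected, number))
            self.expected = number + 1
            return "gap"
        if number in self.missing:
            # A previously skipped number arrived late.
            self.missing.discard(number)
            return "out-of-order"
        return "duplicate"
```

Numbers that remain in missing after the sequence-identifier-lifetime has passed could be treated as lost indications, although that timing logic is left out here.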
References
- Refer to DSP0107 CIM Indications (Events) White Paper
- Refer to draft DSP1054 Indications Profile for more information.
Future work/ideas
- Consider the following listeners/handlers for future work.
- SNMP Handler.
- EMAIL Handler.
- Syslog destination Handler.
- Consumer providers residing in the CIMServer.
- How are the indications persisted in the CIMServer?
- If a secondary store is used for the persistence of indications, how is the secondary store organized? 1 file per subscription, 1 file per OS instance, or something else?
- How is the overall size of the secondary store controlled? At build time, at run time, or both?
- When should an indication whose delivery failed be written to the secondary store?
- Do indications persist across CIMServer restarts?
- Do we need delivery retry parameters (e.g. delivery retry attempts/interval, persistence storage size, etc.) on a per-subscription basis?
- Should the CIM server generate an indication every time it starts?
- Should the CIM server generate heartbeat indications so that client applications know that indication delivery has been interrupted?
- Does the provider need some kind of reliable indicator that an indication was accepted for delivery? For example, should the indication be persisted before success is returned to the provider?
Consider various levels of reliability as follows.
- 0 = single (exactly 1) delivery attempt, no retry, no persistence.
- 1 = delivery retry per Indication Profile params, no persistence across server starts, log the delivery failed indications, delete on successful delivery or when delivery params exceeded.
- 2 = delivery retry per Indication Profile params, persistence across server starts, log the delivery failed indications, delete on successful delivery or when delivery params exceeded.
- 3 = delivery retry per Indication Profile params, persistence across server starts, persist elsewhere on successful delivery.
Discussion
(r_kumpf) How important is it to prevent the same indication from being delivered to the same listener multiple times?
(venkat_puvvada) DSP1054 says the IndicationIdentifier of the CIM_Indication class shall provide uniqueness to identify possible duplicate indications that occur during CIMServer delivery retry attempts.
(r_kumpf) This definition is specific to the CIM-XML protocol, so it is insufficient. It also raises questions about multiple delivery of the same indication and possibly significant extra overhead for delivery retry if it can be determined that delivery will never be successful.
(venkat_puvvada) CIMListener shall be able to distinguish duplicate indications. We can consider the following classification for successful indication delivery.
a. CIM-XML handler: We consider delivery successful when the listener sends back CIMExportIndicationResponseMessage without any exception. Predicting the indication delivery that would never be successful, for example a permanent failure like an incorrect hostname.
b. Email handler: Unknown at the moment, need to discuss and elaborate with current users.
c. SNMP handler: Unknown at the moment, need to discuss and elaborate with current users.
Discussion on future items
(r_kumpf) What is the expected/desired result when the CIM Server is not running? A provider will not generate indications during the time it is not running. Can the stated requirement, 'it is essential that all indications it (a management application) has subscribed for are actually delivered,' be met in that case? I'm struggling to understand why it is interesting to persist indications across a CIM Server restart, when indications can be lost during that time anyway. A CIM Server crash would also presumably lose the indications since the persistence code would not be invoked. What, specifically, is the value of trying to persist indications across CIM Server restarts?
(venkat_puvvada) The assumption is that the CIMOM is a service that runs forever, so we are not trying to solve the case for when it is not running. However, the CIMOM is taken off-line during reboots, sometimes when applications are adding/removing providers, or when an unexpected failure (crash) occurs. Indication delivery is a more complex scenario when compared to normal get and enum operations where the client can simply retry. In this case, each layer of the stack may work normally, but due to some external issue (network, crash, reboot, etc.), the indication is lost. And, since indications usually communicate an event of interest, the CIMOM should take extra precautions to detect failures and reduce indication loss within the stack.
Discussion on 0.3 version
(marek_szermutzky) I guess queue length means the number of possible indication entries. How does a systems administrator know what implications a change to this configuration has on memory and CPU usage?
(venkat_puvvada) It depends upon the number of destinations for which the CIMServer has been attempting delivery retry. It's implementation specific, and how to provide delivery retry statistics can be discussed.
(r_kumpf) What happens when an indication is generated and the delivery retry queue is full? Is the indication at the head of the queue discarded or the newly generated one?
(venkat_puvvada) The indication at the head of the queue is discarded.
(k_schopmeyer) Is there a log entry for this?
(venkat_puvvada) Yes, indications are logged.
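The queue-full behavior described in this exchange (discard the head of the queue and log it) can be sketched as a bounded per-destination queue. This is an illustration only: the RetryQueue class, its methods, and the logger name are hypothetical, and returning the dropped entry to the caller is a choice made for the example.

```python
from collections import deque
import logging

log = logging.getLogger("retry_queue_sketch")


class RetryQueue:
    """Bounded per-destination retry queue (hypothetical helper)."""

    def __init__(self, max_length):
        self.max_length = max_length
        self._items = deque()

    def enqueue(self, indication):
        """Queue an indication; return the discarded one, if any."""
        if self.max_length == 0:
            # A queue length of zero means no retry attempt is made.
            log.warning("retry disabled, dropping %r", indication)
            return indication
        dropped = None
        if len(self._items) >= self.max_length:
            # Queue full: discard the oldest entry (the head) and log it.
            dropped = self._items.popleft()
            log.warning("retry queue full, dropping %r", dropped)
        self._items.append(indication)
        return dropped

    def pending(self):
        """Return the queued indications, oldest first."""
        return list(self._items)
```

Dropping the head favors the most recent events, so a listener that comes back after a long outage receives the newest indications and can detect the dropped ones as a gap in the sequence numbers.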
(marek_szermutzky) Who (server, provider, client) will generate the identifier? How (algorithm used for uniqueness, what grade of uniqueness) will the identifier be generated? What grade of uniqueness is required? I think unique on a specific CIM server should suffice, i.e. a guaranteed unique number (atomic count) not overflowing within 'SubscriptionRemovalTimeInterval'.
(venkat_puvvada) The provider should keep the IndicationIdentifier unique. The construction algorithm for IndicationIdentifier is defined in the CIM_Indication class definition. Yes, the provider should not reuse the same IndicationIdentifier within SubscriptionRemovalTimeInterval.
There has been a lot of discussion in the DMTF on the design of IndicationIdentifier and on the time and context within which the IndicationIdentifier must remain unique. For the initial implementation we can consider the above factors.
(k_schopmeyer) Sadly, it is probably more complex than simply a counter. We have to account for provider restart somewhere and the provider must be capable of knowing the identifier (the case of correlating indications). At this point we don't have to account for server restart because I am assuming that there are no delivery retries through server restart. They are all dropped so that there are no retries through a server restart. I would assume that we are going to have to do something like a two part id where the provider gets some initial part when starting or on request and then can add additional uniqueness for its indications with an additional component (ex. incrementing integer). The DMTF is trying to come up with a definition for version 1.1.0 of the indication profile now.
Copyright (c) 2008 Hewlett-Packard Development Company, L.P.; IBM Corp.; EMC Corporation; Symantec Corporation; The Open Group.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED IN ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.