Pegasus Enhancement Proposal (PEP)
PEP #: 324
PEP Type: Functional
Title: DMTF Indications Profile (DSP1054) Implementation, stage 2.
Version: 1.1
Authors: Venkateswara Rao Puvvada
Status: Approved
Version History:
Version | Date | Author | Change Description
0.1 | 11 Apr 2008 | Venkat Puvvada | Initial Submission
0.2 | 21 Apr 2008 | Venkat Puvvada | Added more design details, removed CIM_IndicationServiceSettingData class, removed modification to CIM_IndicationService and CIM_IndicationServiceCapability classes.
0.3 | 1 May 2008 | Venkat Puvvada | Full rewrite using implementation experience. Removed indication persistence.
0.4 | 5 May 2008 | Venkat Puvvada | Decided to move retry logic from IndicationService to HandlerService. Disabling/removing subscriptions does not affect indications in the retry queue.
0.5 | 8 May 2008 | Venkat Puvvada | Added RetryThread Algorithm
0.6 | 15 May 2008 | Venkat Puvvada | Modified RetryThread Algorithm, decided to retry the indication DeliveryRetryAttempts + 1 times when the indication was not attempted for initial delivery because the retry queue already exists.
0.7 | 19 May 2008 | Venkat Puvvada | Added flowchart/design picture
0.8 | 14 July 2009 | Venkat Puvvada | Full rewrite using approved concept PEP 299
1.0 | 21 May 2010 | Venkat Puvvada | Rewrite using the DSP1054 ver 1.1.0
1.1 | 04 June 2010 | Venkat Puvvada | Incorporated review comments, ballot version.
Abstract: This PEP implements indication delivery retry using the
CIM_IndicationService DeliveryRetryAttempts and DeliveryRetryInterval
properties when indication delivery has failed because of 'temporary'
errors in the protocol. The proposed implementation is based on DMTF
Indications Profile (DSP1054) ver 1.1.0 and requires CIM Schema version
2.24.1 or above.
Definition of the Problem
Today, an indication that fails to be delivered on the first attempt is
never delivered, even if only a temporary error occurred. When
indication delivery fails, a trace message is written and the
indication is ignored; no attempt is made to retry the delivery. A
mechanism is needed to avoid losing indications during temporary
network problems. The central class of the Indications Profile
(DSP1054), CIM_IndicationService, has the properties
DeliveryRetryAttempts and DeliveryRetryInterval, which define how many
retry attempts should be made, and at what interval, before the
indication is discarded.
PEP 323 implements all the mandatory classes from DSP1054. As part of
PEP 323, CIM_IndicationService is implemented, but the properties
DeliveryRetryAttempts and DeliveryRetryInterval are not used to
implement delivery retry.
Current behavior
The following sequence of operations happens during indication delivery at present.
- The provider generates the indication.
- IndicationService receives the indication and gets all the matching subscriptions. If subscriptions are found, a CIMHandleIndicationRequestMessage is constructed for each matching subscription and sent to the HandlerService. Indications are sent using SendAsync() with a callback method.
- Indications in the form of CIMHandleIndicationRequestMessage come to the HandlerService from the IndicationService. HandlerService loads the appropriate Handler (it gets the Handler information from the CIMHandleIndicationRequestMessage) and gives the indication to the Handler for delivery.
- The Handler (CIM-XML, SNMP, etc.) tries to send the indication and returns the delivery status to the HandlerService.
- HandlerService constructs a CIMHandleIndicationResponseMessage and adds the exception if delivery has failed. The response is sent to the IndicationService.
- IndicationService receives the CIMHandleIndicationResponseMessage through the callback method; a trace message is written if indication delivery has failed.
Proposed Solution
This PEP proposes to retry failed indication deliveries a limited number of times, so that an indication is still delivered if only a temporary error broke the first delivery attempt. The goal is to improve the protocol so that deliveries can be accomplished in case of 'temporary' errors in the protocol. DeliveryRetry is attempted for failed indications within reasonable limits, which are governed by the CIM_IndicationService class properties DeliveryRetryAttempts and DeliveryRetryInterval. An indication is discarded when DeliveryRetryAttempts is exceeded or when the sequence-identifier-lifetime has expired for that indication, as defined in DSP1054.
Note: This PEP proposes a solution for CIMXMLIndication listener destinations only. Solutions for other types of listener destinations, such as server-based consumer providers, SNMP handlers and Email handlers, are not implemented as part of this PEP.
Terms used:
DestinationQueue: Queue maintained for each ListenerDestination to store the indications to be retried.
DestinationQueueTable: A hash table consisting of all DestinationQueues.
DeliveryRetry: An indication is retried for delivery after CIM_IndicationService.DeliveryRetryInterval has expired.
Sequence Identifier: Combination of the SequenceContext (type String) and SequenceNumber properties of the CIM_Indication class.
Sequence Identifier Lifetime: The Sequence Identifier Lifetime is equal to (DeliveryRetryAttempts * DeliveryRetryInterval * 10), as defined in DSP1054. After the sequence-identifier-lifetime has expired for an indication whose delivery is being retried, that indication is considered undeliverable to that listener and is removed from the queue, even if the DeliveryRetryAttempts property would otherwise indicate that further retries should be attempted. The reason is that DeliveryRetryInterval is a minimum time, so that property alone does not define an upper time limit. The Sequence Identifier Lifetime does define an upper limit.
DispatcherThread: Thread which monitors all DestinationQueues in the DestinationQueueTable and attempts DeliveryRetry using a dedicated thread pool.
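The Sequence Identifier Lifetime arithmetic can be illustrated with a small sketch. It uses the values this PEP quotes elsewhere (3 attempts, 20 seconds); the constant and function names are hypothetical and are not taken from the Pegasus sources, and treating "age equal to the lifetime" as expired is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Values quoted in this PEP; the constant and function names are illustrative.
constexpr std::uint32_t kDeliveryRetryAttempts = 3;
constexpr std::uint32_t kDeliveryRetryIntervalSec = 20;

// DSP1054: lifetime = DeliveryRetryAttempts * DeliveryRetryInterval * 10 (seconds).
constexpr std::uint64_t sequenceIdentifierLifetimeSec(
    std::uint32_t retryAttempts, std::uint32_t retryIntervalSec)
{
    return static_cast<std::uint64_t>(retryAttempts) * retryIntervalSec * 10;
}

// Assumption: an indication counts as expired once its age reaches the lifetime.
inline bool lifetimeExpired(
    std::uint64_t arrivalSec, std::uint64_t nowSec, std::uint64_t lifetimeSec)
{
    assert(nowSec >= arrivalSec);  // the clock must not run backwards
    return (nowSec - arrivalSec) >= lifetimeSec;
}
```

With the quoted values the lifetime evaluates to 3 * 20 * 10 = 600 seconds, which is the upper bound on how long an indication can stay queued for retry.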
Brief summary of the proposed solution.
- SequenceContext and SequenceNumber are set on the indication when it arrives at the HandlerService. This is done by creating the DestinationQueue for the destination of the indication. The DestinationQueue holds the SequenceContext and the next available SequenceNumber.
- The SequenceContext value of an indication is equal to "CIM_IndicationService.Name + CIM_ObjectManager.Name + CIMServer startup time stamp + CIM_ListenerDestination creation time". SequenceNumber is a sint64 value that starts from 0, is incremented by 1 for each indication arriving for this destination, and wraps to 0 when it reaches the maximum positive value. CIM_ObjectManager.Name is added to differentiate between different flavors of CIMOM running on the same machine; CIM_ObjectManager.Name + CIMServer startup time stamp constitutes the cim-service-start-id in this implementation. For performance and footprint reasons the time stamp is a plain Uint64 value of microseconds since the epoch.
- Indication delivery is considered successful only when a CIMExportResponseMessage is received from the CIMListener without any exception, for CIM-XML Listener destinations.
- The CIM_IndicationService class properties DeliveryRetryAttempts (=3) and DeliveryRetryInterval (=20 seconds) are used to implement delivery retry in case of temporary failures for CIM-XML Handlers. These values are not modifiable. They are obtained from the CIM_IndicationService instance, which is built dynamically in the IndicationService. ModifyInstance is not supported on the CIM_IndicationService instance because the CIM_IndicationServiceCapabilities instance setting does not allow it (optional) in the current Pegasus implementation. This implementation chooses the default values specified in DSP1054 sec 7.1.2. DSP1054 allows an implementation to make these properties non-modifiable.
- If indication delivery fails, the indication is enqueued to the DestinationQueue for later DeliveryRetry. Because of performance and memory consumption concerns, not all received indications can be kept in memory. The maximum size of the DestinationQueue is determined using the formula MaxDestinationQueueSize = (10*DeliveryRetryAttempts*DeliveryRetryInterval)/IndicationRateInterval, where IndicationRateInterval is the average rate of indication flow (expressed in seconds per indication) for a particular destination over a period of one sequence-identifier-lifetime (10*3*20 = 600 seconds in this implementation), with a minimum value of 0.25 seconds and a maximum value of 3 seconds. The minimum and maximum values for IndicationRateInterval ensure that the implementation does not allocate a queue that is too large or too small. With IndicationRateInterval ranging from 0.25 to 3 seconds, the queue size varies from 200 to 2400.
- IndicationService sends a message (CIMNotifySubscriptionNotActiveResponseMessage) to HandlerService when a subscription is disabled or deleted. HandlerService deletes and logs the matching indications for the subscription from the DestinationQueue.
- IndicationService sends a message (CIMNotifyListenerNotActiveResponseMessage) to HandlerService when a ListenerDestination has been deleted. HandlerService deletes and logs all indications in the DestinationQueue and finally deletes the DestinationQueue.
- A new class PG_ListenerDestinationQueue is added in the root/PG_Internal namespace for DeliveryRetry statistics. HandlerService instruments this new class; the EnumerateInstances and EnumerateInstanceNames operations are supported.
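The MaxDestinationQueueSize formula from the summary above can be sketched as follows. This is a minimal illustration, not the Pegasus implementation: the function name is hypothetical, and the way the average IndicationRateInterval is measured is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: queue capacity derived from the average interval
// (in seconds) between indications arriving for one destination.
inline std::uint32_t maxDestinationQueueSize(
    std::uint32_t retryAttempts,
    std::uint32_t retryIntervalSec,
    double indicationRateIntervalSec)
{
    assert(retryAttempts > 0 && retryIntervalSec > 0);
    // Clamp the interval to the 0.25 .. 3 second range given in the PEP.
    const double clamped =
        std::min(3.0, std::max(0.25, indicationRateIntervalSec));
    // Sequence identifier lifetime: 10 * attempts * interval.
    const double lifetimeSec = 10.0 * retryAttempts * retryIntervalSec;
    return static_cast<std::uint32_t>(lifetimeSec / clamped);
}
```

For DeliveryRetryAttempts = 3 and DeliveryRetryInterval = 20, an interval of 0.25 seconds gives 2400 entries and an interval of 3 seconds gives 200, matching the range stated in the summary.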
Note: For more information on the SequenceContext and SequenceNumber properties, see the CIM_Indication class.
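The SequenceNumber behavior described in the summary (start at 0, increment by 1, wrap to 0 at the maximum positive sint64 value, protected by a mutex) might look like the sketch below. The class name and its interface are hypothetical; the real DestinationQueue class carries more state, and the optional start value exists only to make the wrap-around easy to demonstrate.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical per-destination sequence state (not the Pegasus class).
class SequenceState
{
public:
    explicit SequenceState(std::string context, std::int64_t start = 0)
        : context_(std::move(context)), next_(start)
    {
        assert(start >= 0);  // sequence numbers are never negative
    }

    const std::string& context() const { return context_; }

    // Returns the number to stamp on the next indication and advances,
    // wrapping to 0 after the maximum positive sint64 value.
    std::int64_t nextSequenceNumber()
    {
        std::lock_guard<std::mutex> guard(mutex_);
        const std::int64_t current = next_;
        next_ = (current == std::numeric_limits<std::int64_t>::max())
            ? 0
            : current + 1;
        return current;
    }

private:
    std::string context_;
    std::int64_t next_;
    std::mutex mutex_;
};
```

Checking for the maximum before incrementing avoids signed overflow, which is undefined behavior in C++.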
Functionality not implemented as part of this PEP.
- This PEP does not propose a solution for other types of handlers, such as SNMP handlers and Email handlers. See the Future Work section below.
- The properties CIM_IndicationService.SubscriptionRemovalTimeInterval and CIM_IndicationService.SubscriptionRemovalAction are not used as part of this PEP. The default value of CIM_IndicationService.SubscriptionRemovalAction is 'Ignore'. Subscriptions will not be removed or disabled when CIM_IndicationService.SubscriptionRemovalTimeInterval expires. There is no change to the current behavior.
- WBEM Listener support for detection of lost indications, detection of duplicated indications, and re-establishing the original order of indications, as defined in DSP1054.
Proposed implementation
Thread and Queue model
The major goals of the thread and queue model are as follows.
- No major impact on indication delivery to destinations that are able to receive indications without any problems.
- No major impact on normal CIMServer operations.
- Should be extensible for future DeliveryRetry implementations for handlers other than CIM-XML.
- Leverage the existing message queue/thread infrastructure.
The following sequence of operations explains the above diagram.
1. Indications in the form of CIMHandleIndicationRequestMessage come to the HandlerService from the IndicationService. Indications are sent using SendForget().
2. HandlerService sets the SequenceIdentifier. The following steps are involved.
   - Create the DestinationQueue for the destination of the indication and add it to the DestinationQueueTable if it does not already exist; set the SequenceContext for the DestinationQueue.
   - Get the next SequenceNumber for the DestinationQueue. SequenceContext and SequenceNumber are stored per DestinationQueue (C++ class, see below). A Mutex is used for protection.
   - Set the SequenceContext and SequenceNumber on the indication.
3. HandlerService loads the appropriate Handler and gives the indication to the Handler for delivery.
4. The Handler tries to send the indication and returns the delivery status to the HandlerService. If delivery is not successful, HandlerService enqueues the indication on the destination queue (currently only for CIM-XML Handlers) and starts the DispatcherThread if it is not already running.
5. The DispatcherThread checks the DestinationQueueTable every DISPATCHER_THREAD_WAITTIME and gets all the indications eligible for DeliveryRetry. DISPATCHER_THREAD_WAITTIME = 100 milliseconds + MIN (all DestinationQueues' next eligible indication DeliveryRetryInterval expiration time in milliseconds). The 100 milliseconds is a constant time for the DispatcherThread to wait; this avoids a spike of activity when the network suddenly comes up. Indications eligible for DeliveryRetry are found in the following way from each DestinationQueue in the DestinationQueueTable. The DispatcherThread acquires a ReadLock on the DestinationQueueTable during this process. Each DestinationQueue's Mutex is also acquired while accessing it.
   - Delete all indications whose sequence-identifier-lifetime has expired from the DestinationQueue.
   - If there are indications in the DestinationQueue and the DeliveryRetryInterval has expired for the indication at the front of the queue, that indication is eligible for DeliveryRetry. Remove the indication from the queue.
6. The DeliveryRetry-eligible indications found in step 5 are enqueued into the DeliveryQueue.
7. The DispatcherThread starts a worker thread from the DeliveryThreadPool. The default maximum number of worker threads is 5.
8. The worker thread started in step 7 removes the indication at the front of the delivery queue.
9. DeliveryRetry is attempted for the indication obtained in step 8.
10. The indication is deleted if the DeliveryRetry is successful. If the DeliveryRetry fails, the following actions are taken.
   - If the DestinationQueue of the indication is full (see the formula for MaxDestinationQueueSize in the Proposed Solution section), discard the indication (from the listener's perspective this indication is lost).
   - Increment the DeliveryRetryAttempts count for this indication.
   - If the retry attempts exceed the maximum DeliveryRetryAttempts, delete the indication; otherwise put the indication at the back of its DestinationQueue.
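The decision taken after a DeliveryRetry attempt (the last step above) can be sketched as a small pure function. All names are hypothetical. The sketch reads "exceeds" literally as a strict greater-than comparison after the increment; the PEP text does not spell out whether the boundary is inclusive, so treat that as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class RetryOutcome
{
    Delivered,                // retry succeeded, indication is deleted
    DroppedQueueFull,         // DestinationQueue already at its maximum size
    DroppedAttemptsExceeded,  // retry budget used up
    Requeued                  // goes to the back of its DestinationQueue
};

struct PendingIndication
{
    std::uint16_t retryAttemptsMade;  // retries attempted so far
};

// Hypothetical helper: decide what happens to an indication after a retry.
inline RetryOutcome afterDeliveryRetry(
    bool delivered,
    PendingIndication& indication,
    std::size_t queueSize,
    std::size_t maxQueueSize,
    std::uint16_t maxRetryAttempts)
{
    assert(maxQueueSize > 0);
    if (delivered)
        return RetryOutcome::Delivered;
    if (queueSize >= maxQueueSize)
        return RetryOutcome::DroppedQueueFull;
    ++indication.retryAttemptsMade;
    if (indication.retryAttemptsMade > maxRetryAttempts)
        return RetryOutcome::DroppedAttemptsExceeded;
    return RetryOutcome::Requeued;
}
```

Keeping the decision separate from the queue manipulation makes each drop reason easy to count, which lines up with the per-reason counters of the statistics class defined later in this PEP.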
Note 1: Indications are sent using SendForget() from IndicationService to HandlerService. At present, IndicationService does not take any action when a response reporting a failed delivery arrives through the callback method from HandlerService. In future enhancements, HandlerService will send a message to IndicationService to reconcile SubscriptionRemovalAction, which implements the subscription's OnFatalErrorPolicy. This greatly reduces the burden on the metadispatcher of routing responses back to IndicationService and improves performance.
Note 2: All discarded/deleted indications are traced under the Handler (TRC_IND_HANDLER) component with SequenceContext and SequenceNumber information.
Note 3: During CIMServer shutdown all the indications being retried are discarded and traced (see Note 2 above). This implementation does not propose a solution for the persistence of indications, for the following reasons.
- CIMServer may shut down or crash before an indication arrives at the HandlerService. There might be many indications in the IndicationService awaiting evaluation, or with the meta-dispatcher awaiting routing, when CIMServer goes down. An elegant solution would need to persist each indication right after the provider generates it and retrieve it later for evaluation and delivery.
- When CIMServer has crashed or shut down, resource monitoring has stopped from the listener/client perspective (if it knows through some kind of heartbeat indications from CIMServer with a subscription). Is there any value in sending the persisted indications after restart?
- If indications are persisted, the sequence-identifier-lifetime might expire for all persisted indications if CIMServer is not restarted immediately (this implementation uses DeliveryRetryAttempts (=3) and DeliveryRetryInterval (=20 seconds), which gives a lifetime of 600 seconds).
Considering the above problems, persistence of indications has been deferred in this implementation. Note that the proposed implementation does not preclude persistence of indications in the future.
DeliveryRetry overview
When delivery of an indication has failed for a particular ListenerDestination, the unsent indications are queued up on a per-listener-destination basis. Every listener destination has a DestinationQueue, which is created when the first indication arrives for that destination. DestinationQueue is a list (a C++ class with methods to insert and delete indications) that can grow dynamically up to the size given by MaxDestinationQueueSize. DestinationQueueTable is a hash table that consists of all DestinationQueues. The Handler name is used as the key to look up the DestinationQueue in the DestinationQueueTable.
The first delivery failure of any indication starts a DispatcherThread, which monitors all the DestinationQueues in the DestinationQueueTable and attempts DeliveryRetry according to the DeliveryRetryAttempts and DeliveryRetryInterval properties of the CIM_IndicationService instance. When a new indication is added to a DestinationQueue that is full, the indication at the front of the DestinationQueue is deleted and the new indication is added at the back of the DestinationQueue.
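The queue-full rule for newly arriving indications (drop the oldest entry at the front, append the new one at the back) can be sketched with a bounded queue. The class below is a hypothetical stand-in that stores strings instead of indication objects and omits the mutex that the real DestinationQueue uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical bounded queue illustrating the eviction policy only.
class BoundedDestinationQueue
{
public:
    explicit BoundedDestinationQueue(std::size_t maxSize) : maxSize_(maxSize)
    {
        assert(maxSize > 0);
    }

    // Adds an indication at the back. Returns true if the oldest entry
    // had to be dropped from the front to make room.
    bool enqueue(const std::string& indication)
    {
        bool dropped = false;
        if (items_.size() >= maxSize_)
        {
            items_.pop_front();
            ++queueFullDropped_;
            dropped = true;
        }
        items_.push_back(indication);
        return dropped;
    }

    const std::string& front() const { return items_.front(); }
    std::size_t size() const { return items_.size(); }
    std::uint64_t queueFullDropped() const { return queueFullDropped_; }

private:
    std::size_t maxSize_;
    std::deque<std::string> items_;
    std::uint64_t queueFullDropped_ = 0;
};
```

A `std::deque` is used here because both ends are touched; the PEP only says that DestinationQueue is a list-like C++ class, so the container choice is an assumption.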
If there are 'n' DestinationQueues and DeliveryRetry was successful for a particular DestinationQueue, the DispatcherThread does not continue delivering all the indications from that same DestinationQueue. Instead it continues iterating over all DestinationQueues, trying to deliver their indications, and then comes back to the successful DestinationQueue. This iterative approach gives all DestinationQueues the same priority when attempting DeliveryRetry. It also addresses the case where a consumer is too slow to receive indications continuously without any time delay. If there are many DestinationQueues and the network suddenly comes up, this does not cause a spike of activity.
Once started, the DispatcherThread sleeps for DISPATCHER_THREAD_WAIT_TIME before monitoring the queues. Each indication has a lastDeliveryTime, which is used to find out whether the DeliveryRetryInterval has expired for the indication. The DispatcherThread acquires the ReadLock on the DestinationQueueTable, removes one indication whose DeliveryRetryInterval has expired from each DestinationQueue in the DestinationQueueTable, enqueues them into the DeliveryQueue and starts worker threads in the DeliveryThreadPool. The maximum number of delivery worker threads is 5, which is configurable at build time. The DispatcherThread exits if there are no indications in the DestinationQueueTable.
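One pass of the DispatcherThread, as described above, can be sketched as two helpers: one that takes at most one eligible indication from the front of each queue, and one that computes the wait time as 100 milliseconds plus the shortest remaining retry interval. This is a simplified sketch under several assumptions: all type and function names are hypothetical, locking is omitted, `std::map` replaces the hash table to keep iteration order deterministic, and the fallback to a full interval when nothing is queued is invented for the example (the PEP says the thread exits in that case).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <vector>

struct QueuedIndication
{
    std::string id;                    // stand-in for the indication itself
    std::uint64_t lastDeliveryTimeMs;  // time of the last delivery attempt
};

using DestinationQueueTable =
    std::map<std::string, std::deque<QueuedIndication>>;

// Milliseconds left until the retry interval of an indication expires.
inline std::uint64_t remainingMs(
    const QueuedIndication& ind, std::uint64_t nowMs, std::uint64_t retryIntervalMs)
{
    assert(nowMs >= ind.lastDeliveryTimeMs);
    const std::uint64_t elapsed = nowMs - ind.lastDeliveryTimeMs;
    return elapsed >= retryIntervalMs ? 0 : retryIntervalMs - elapsed;
}

// Removes at most one eligible indication from the front of each queue,
// so that every destination gets the same priority.
inline std::vector<QueuedIndication> collectEligible(
    DestinationQueueTable& table, std::uint64_t nowMs, std::uint64_t retryIntervalMs)
{
    std::vector<QueuedIndication> deliveryQueue;
    for (auto& entry : table)
    {
        std::deque<QueuedIndication>& queue = entry.second;
        if (!queue.empty() &&
            remainingMs(queue.front(), nowMs, retryIntervalMs) == 0)
        {
            deliveryQueue.push_back(queue.front());
            queue.pop_front();
        }
    }
    return deliveryQueue;
}

// 100 ms constant plus the shortest remaining interval among queue fronts.
// Assumption: falls back to a full interval when nothing is queued.
inline std::uint64_t dispatcherWaitTimeMs(
    const DestinationQueueTable& table, std::uint64_t nowMs, std::uint64_t retryIntervalMs)
{
    std::uint64_t shortest = retryIntervalMs;
    for (const auto& entry : table)
    {
        if (!entry.second.empty())
        {
            shortest = std::min(
                shortest,
                remainingMs(entry.second.front(), nowMs, retryIntervalMs));
        }
    }
    return 100 + shortest;
}
```

Taking only the front entry of each queue per pass is what keeps a destination that has just recovered from monopolizing the worker threads.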
Server behavior from a client/listener perspective
DSP1054 ver 1.1.0 sec 7.10.3 defines the WBEM Listener requirements. A WBEM Listener can detect missed, out-of-order and duplicate indications using the SequenceIdentifier and its lifetime. WBEM Listener support is not part of this PEP.
MaxDestinationQueueSize vs DSP1054 ver 1.1.0 spec conformance
An indication may be deleted when the DestinationQueue is full (size exceeds MaxDestinationQueueSize), even if the DeliveryRetryAttempts property would otherwise indicate that further retries should be attempted. Is this a violation of the DSP1054 spec? No. Note that DSP1054 has the property CIM_IndicationService.DeliveryRetryInterval, which defines the minimal time interval in seconds for the indication service to wait before retrying delivery of an indication to a particular listener destination that previously failed. A maximum time is not defined, and a retry can take longer due to other processing in CIMServer. The same condition applies here. From the CIMListener perspective the indication is lost.
Note: It may not be very "reliable" from a provider perspective to drop indications just because the DestinationQueue is full. Note that indication delivery retry has low priority and should not impact normal CIMServer operations. MaxDestinationQueueSize is introduced to limit the resource (memory, CPU, etc.) consumption of the delivery retry functionality.
Indication Information
Indications are stored in the DestinationQueue in the form of the IndicationInfo class. The IndicationInfo class has the following properties.
CIMInstance indication; - Indication to be delivered.
CIMInstance subscription; - Subscription that the indication matched.
OperationContext context; - OperationContext for the delivery of the indication.
String nameSpace; - Namespace from which the indication originated.
DestinationQueue *queue; - Pointer to the DestinationQueue to which this indication belongs; used to route the indication to the appropriate queue.
Uint16 deliveryRetyAttemptsMade; - Number of delivery retry attempts made.
CIMDateTime arrivalTime; - Arrival time of the indication at the HandlerService. Used to calculate the sequence-identifier-lifetime.
CIMDateTime lastDeliveryRetryTime; - Last DeliveryRetry time for this indication.
New CIMMessages
CIMNotifySubscriptionNotActiveRequestMessage is sent to HandlerService from IndicationService when a subscription is deleted or disabled. HandlerService deletes and logs all indications matching the subscription. As mentioned earlier in this PEP, all discarded/deleted indications are traced under the Handler (TRC_IND_HANDLER) component with SequenceContext and SequenceNumber information.
class PEGASUS_COMMON_LINKAGE CIMNotifySubscriptionNotActiveResponseMessage
    : public CIMResponseMessage
{
public:
    CIMNotifySubscriptionNotActiveResponseMessage(
        const String& messageId_,
        const CIMException& cimException_,
        const QueueIdStack& queueIds_)
    : CIMResponseMessage(
          CIM_NOTIFY_SUBSCRIPTION_NOT_ACTIVE_RESPONSE_MESSAGE,
          messageId_, cimException_, queueIds_)
    {
    }
};
CIMNotifyListenerNotActiveResponseMessage is sent to HandlerService from
IndicationService when a listener destination has been deleted.
HandlerService deletes and logs all indications from the DestinationQueue
and finally deletes the queue itself.
class PEGASUS_COMMON_LINKAGE CIMNotifyListenerNotActiveResponseMessage
    : public CIMResponseMessage
{
public:
    CIMNotifyListenerNotActiveResponseMessage(
        const String& messageId_,
        const CIMException& cimException_,
        const QueueIdStack& queueIds_)
    : CIMResponseMessage(
          CIM_NOTIFY_LISTENER_NOT_ACTIVE_RESPONSE_MESSAGE,
          messageId_, cimException_, queueIds_)
    {
    }
};
DeliveryRetry statistics
The following new class is added to the root/PG_Internal namespace for DeliveryRetry statistics. This class is instrumented by the HandlerService; the EnumerateInstances and EnumerateInstanceNames operations are supported. This feature is useful in production, development and test environments. It provides DeliveryRetry statistics such as which destinations are unable to receive indications and the count of dropped indications.
// ===================================================================
// PG_ListenerDestinationQueue
// ===================================================================
[Version("2.11.0"), Description (
    "PG_ListenerDestinationQueue is a representation of queue maintained "
    "for the listener destination for indication delivery.")]
class PG_ListenerDestinationQueue
{
    [Key, Propagated ("CIM_ListenerDestination.Name"),
     Description ("Name of the listener destination.")]
    string ListenerDestinationName;
    [Description ("Listener destination queue creation time in microseconds "
        "since epoch.")]
    uint64 CreationTime;
    [Key, Propagated ("CIM_Indication.SequenceContext"),
     Description ("SequenceContext of the listener destination.")]
    string SequenceContext;
    [Key, Propagated ("CIM_Indication.SequenceNumber"),
     Description ("Next available sequenceNumber for the listener "
        "destination.")]
    sint64 NextSequenceNumber;
    [Description ("The Sequence Identifier Lifetime in seconds.")]
    uint32 SequenceIdentifierLifetime;
    [Description ("The maximum number of indications that queue can hold.")]
    uint32 MaxQueueSize;
    [Description ("The number of indications in the queue at present.")]
    uint32 CurrentIndications;
    [Description ("The number of indications dropped because of the "
        "maximum indications that queue can hold were exceeded.")]
    uint64 QueueFullDroppedIndications;
    [Description ("The number of indications dropped because of the "
        "sequence-identifier-lifetime was expired.")]
    uint64 LifetimeExpiredIndications;
    [Description ("The number of indications dropped because of the "
        "DeliveryRetryAttempts were exceeded.")]
    uint64 RetryAttemptsExceededIndications;
    [Description ("The number of indications dropped because of the "
        "corresponding subscription has been disabled or deleted.")]
    uint64 SubscriptionNotActiveDroppedIndications;
    [Description ("Last indication successful delivery time in microseconds "
        "since epoch.")]
    uint64 LastSucessfulDeliveryTime;
};
DestinationQueue is created when an indication arrives for the listener destination for the first time and remains active until the listener destination has been deleted. DestinationQueues are not persisted across CIMServer restarts. The average rate of indications arriving at the DestinationQueue can be derived from the CreationTime and NextSequenceNumber properties. The instances of this class are built dynamically, so there is no major impact on footprint or server performance.
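As a sketch of how a client of this class might derive the average arrival rate from CreationTime and NextSequenceNumber: the function name is hypothetical, the current time is assumed to be supplied by the caller in microseconds since the epoch, and the calculation ignores the (unlikely) case where NextSequenceNumber has wrapped to 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average indications per second for one destination,
// computed from PG_ListenerDestinationQueue property values.
inline double averageIndicationsPerSecond(
    std::uint64_t creationTimeUsec,
    std::uint64_t nowUsec,
    std::int64_t nextSequenceNumber)
{
    assert(nextSequenceNumber >= 0);
    if (nowUsec <= creationTimeUsec)
        return 0.0;  // no elapsed time yet, rate is undefined; report 0
    const double elapsedSec =
        static_cast<double>(nowUsec - creationTimeUsec) / 1e6;
    // NextSequenceNumber equals the number of indications seen so far,
    // because numbering starts at 0.
    return static_cast<double>(nextSequenceNumber) / elapsedSec;
}
```

For example, a queue created 10 seconds ago whose NextSequenceNumber is 50 has received 5 indications per second on average.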
Testcases
Testcases will be added to test the functionality proposed in this PEP. For example: create the subscription without starting the listener. The provider generates 'n' indications. Now start the listener; the listener should get the 'n' indications generated by the provider before DeliveryRetryAttempts has expired for the first indication. Tests are also added to test discarded indications.
Schedule
Available in 2.11
Future Work
- Verify whether the indication delivery retry implementation is feasible for other types of handlers such as SNMP, Email, Syslog and Consumer Providers.
- Implement CIM_IndicationService.SubscriptionRemovalAction.
- DynamicListener support for
detection of lost/duplicate/out-of-order indications.
Discussion
Comments on version 0.1
(r_kumpf) What about this PEP is
specific to CIM-XML? Why would the IndicationService care about the
type of the listener destination?
(venkat_puvvada) There is no reason for not supporting the other Listener destination types. I am not sure how best we can match these parameters for Email and SNMP handlers, so I decided not to include support for them at this stage.
(r_kumpf) Do indications continue
to be retried for delivery after the associated
CIM_IndicationSubscription and CIM_ListenerDestination instances are
deleted?
(venkat_puvvada) No, indications
will be discarded.
(r_kumpf) What is the rationale
for persisting indications across cimserver restarts? When the
cimserver is stopped, indications will cease to be generated. When it
is restarted, the listener may receive stale indications and not
receive more current ones for events that occurred while the cimserver
was stopped. This could result in an administrator getting paged about
a critical problem that was fixed months earlier.
(venkat_puvvada) The reason for persisting indications is that the client may not want to lose any indications. The client must be intelligent enough to discard out-of-date indications by looking at the timestamp of the delivered indication.
(r_kumpf) The traceFilePath is a
poor choice for a directory to persist data needed for CIM Server
operation. This directory is generally world writable.
(venkat_puvvada) Yes, I agree; this needs to be discussed.
(r_kumpf) What are the contents
and format of this file? How is compatibility protected on CIM Server
upgrade?
(venkat_puvvada) The file will have Handler, subscription and Indication (with the content language list added to the indication instance) instances in XML form. Indications are saved for each subscription under each listener destination. It will be compatible across CIMServer upgrades.
Comments on version 0.2
(r_kumpf) Doesn't the CIMHandleIndicationRequestMessage already contain
all the information that is needed to deliver the indication? The only
extra data the IndicationService should need to track is related to the
retry algorithm. What am I missing?
(venkat_puvvada) CIMHandleIndicationRequestMessage does not have the
following information.
subscriptionInstanceNames
providerName
pendingRetryCount
These are required to construct CIMProcessIndicationRequestMessage
request again.
(r_kumpf) How is it determined which
exceptions indicate an indication delivery failure? For example, why
does CannotCreateSocketException cause a retry but not bad_alloc?
(venkat_puvvada) Though it is difficult to examine the CannotCreateSocketException, it is possible to retry when socket() fails with errno set to ENOBUFS or ENOMEM, which means that resources at the TCP/IP layer or memory are exhausted and the operation can be retried later.
Comments on version 0.3
(k_schopmeyer) Nit. This is only
one component in moving from 'sort of best effort' to reliable
delivery. I suggest that this is simply improving the protocol so that
deliveries can be accomplished in case of 'temporary' errors in the
protocol and not really reliable delivery.
(venkat_puvvada) ok
(r_kumpf) Why is the HandlerRetryQueue logic in the IndicationService?
Retrying delivery seems like it should be the
IndicationHandlerService's job. I think it is more of a protocol-level
thing than an indication processing thing.
(venkat_puvvada) Yes, I agree. Actually we have decided to discard the indications on the RetryQueue when the matched subscription is removed/disabled. If we implement this in IndicationService, we can directly access the ActiveSubscriptionTable to see whether the subscription is active or not. This has a performance benefit. Keeping this implementation in the HandlerService requires either checking the repository for subscription validity or having IndicationService send a message to HandlerService when a subscription is removed/disabled.
(r_kumpf) What happens when a delivery retry fails? Is the indication
put back on the queue? At the beginning or the end? What if the queue
is full?
(venkat_puvvada) If DeliveryRetry fails, the indication is inserted at the front of the queue. When the queue is full, the indication at the front of the queue is removed and the new indication is added at the back of the queue.
(r_kumpf) Is a new exception class the best way for a handler to
communicate the delivery status? It might make sense to change the
CIMHandler::handleIndication return type from void to a status value.
Possible values could be Success, Error, and FatalError, for example.
An interesting question here is what is the behavior when the handler
throws an exception which is not DeliveryFailedException? Is it assumed
that the delivery was successful or permanently failed?
(venkat_puvvada) This is a good idea. We can have the possible values Success, Error, and FatalError.
Success - Delivery success
Error - Error, can be retried later
FatalError - Permanent failure, no retry.
If the handler throws anything other than DeliveryFailedException, that is either a permanent failure or a post-delivery failure; we don't retry in those cases.
Comments on version 0.4
(k_schopmeyer) Should we consider some maximum limit on the number of
retry queues? This is just another possible memory protector.
(venkat_puvvada) I am ok with that, need to discuss this.
(r_kumpf) I presume these test cases will get pretty interesting. Do you
have thoughts about how they will work?
(venkat_puvvada) Create the subscription, don't start the Listener.
Provider generates the 'n' indications. Now start the listener,
listener should get 'n' indications generated by the provider when
DeliveryRetryInterval expired.
Comments on version 0.5
(r_kumpf) Can you characterize the
threading implications here? If each
of the retries is done by the RetryThread, that would mean the
DestinationQueueTable would potentially be locked for a long time.
If the delivery retry fails, the IndicationHandlerService will need to
put the indication back into the DestinationQueueTable. Will
deadlock occur?
If a new thread is started for each delivery retry, that would cause a
spike of activity on each interval, affecting the delivery of
indications to listeners that have not experienced failures.
(venkat_puvvada) No deadlock will occur. It works in the following way:
1. Take the lock on the queue table.
2. Iterate through the queue table, get one indication from each queue,
and store them in an array.
3. Release the lock on the table.
4. Send each indication in the array to the HandlerService, using the
SendAsync() method.
5. If the delivery retry fails, the HandlerService puts the indication
back on to the queue.
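The locking pattern in the steps above can be sketched as follows. This is a simplified illustration, not the Pegasus implementation: the Indication struct, the DestinationQueueTable methods, and retryPass() are hypothetical names, the send is synchronous here rather than using SendAsync(), and putting a failed indication back at the front of its queue is an assumption made to keep delivery order.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a queued indication.
struct Indication {
    std::string destination;
    int sequence;
};

class DestinationQueueTable {
public:
    void enqueue(const Indication& ind) {
        std::lock_guard<std::mutex> guard(_lock);
        _queues[ind.destination].push_back(ind);
    }
    // Puts a failed indication back at the head of its queue.
    void requeueFront(const Indication& ind) {
        std::lock_guard<std::mutex> guard(_lock);
        _queues[ind.destination].push_front(ind);
    }
    // Lock the table, take one indication from each queue, then unlock.
    std::vector<Indication> takeOnePerQueue() {
        std::lock_guard<std::mutex> guard(_lock);
        std::vector<Indication> batch;
        for (auto& entry : _queues) {
            if (!entry.second.empty()) {
                batch.push_back(entry.second.front());
                entry.second.pop_front();
            }
        }
        return batch;
    }
    std::size_t pending(const std::string& destination) {
        std::lock_guard<std::mutex> guard(_lock);
        auto it = _queues.find(destination);
        return it == _queues.end() ? 0 : it->second.size();
    }
private:
    std::mutex _lock;
    std::map<std::string, std::deque<Indication>> _queues;
};

// One retry pass. The table lock is not held while sending, so a failed
// send can take the lock again to requeue without deadlocking.
void retryPass(DestinationQueueTable& table,
               const std::function<bool(const Indication&)>& send)
{
    std::vector<Indication> batch = table.takeOnePerQueue();
    for (const Indication& ind : batch) {
        if (!send(ind)) {
            table.requeueFront(ind);
        }
    }
}
```

Because the batch is copied out before any delivery is attempted, the table is locked only for the duration of the iteration, not for the duration of the network sends.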
(r_kumpf) Is the specification clear about the meaning of the
DeliveryRetryAttempts value? It seems like it should be the number of
delivery retry attempts made AFTER an initial failed delivery attempt.
Karl volunteered to follow up with the DMTF on this item.
Comments on version 0.7
(r_kumpf) Shouldn't the lastRetryTime be tracked per indication rather
than per queue?
(dmitry_mikulin) If lastRetryTime is per queue, how are you going to
tell which indications are ready to be re-tried?
(venkat_puvvada) If we maintain the lastRetryTime for each indication,
we cannot deliver the indications in sequence. For example, if there
are many indications in the retry queue and we try to deliver them
according to each indication's lastRetryTime, it is possible that we
deliver newer indications in the queue before older ones.
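A minimal sketch of the per-queue approach described in this answer, under stated assumptions: RetryQueue and retryHead() are hypothetical names, indications are represented only by their sequence numbers, and only the head of the queue is attempted on each pass.

```cpp
#include <chrono>
#include <deque>
#include <functional>

using Clock = std::chrono::steady_clock;

// One timestamp for the whole queue, not one per indication.
struct RetryQueue {
    std::deque<int> pending;            // sequence numbers, oldest first
    Clock::time_point lastRetryTime{};  // time of the last retry attempt
};

// Attempts the oldest indication if the retry interval has elapsed.
// Nothing behind the head is attempted, so queue order is preserved.
// Returns true if the head was delivered and removed.
bool retryHead(RetryQueue& queue,
               Clock::time_point now,
               Clock::duration interval,
               const std::function<bool(int)>& deliver)
{
    if (queue.pending.empty() || now - queue.lastRetryTime < interval) {
        return false;  // nothing to do, or not due yet
    }
    queue.lastRetryTime = now;
    if (!deliver(queue.pending.front())) {
        return false;  // the head stays in place for the next interval
    }
    queue.pending.pop_front();
    return true;
}
```

Since the due check looks only at the queue's timestamp, an indication can never overtake one that was queued before it.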
(r_kumpf) This step seems like it would unnecessarily delay the
delivery of queued indications once the intermittent problem (network
error, for example) is resolved.
(venkat_puvvada) The RETRY_THREAD_WAIT_TIME value is configurable. This
also prevents a spike of activity when all clients/listeners suddenly
come up, and it solves the problem where consumers are too slow to
receive the indications.
(b_whiteley) I would prefer to see a solution where all handler types
are supported. In addition to extending this functionality to the other
handler types, I suspect this implementation would be cleaner.
(venkat_puvvada) Yes, this can be tried in the next stage of
implementation.
(b_whiteley) I'm not very familiar with the current Indication Handler
Service, so I apologize for the lack of specifics. As I read through
this PEP, my gut feeling is that the approach proposed in this PEP will
introduce a lot of problems and instability.
It doesn't seem right to have the Handler Service hand indications to
other components that will ultimately hand the indications back to the
Handler Service.
I would prefer a design that incorporates the following:
* Refactor the HandlerService itself to handle all of the delivery
retry logic, rather than having a separate component reinsert
indications into the HandlerService.
* Enhance the Handler interface so that delivery retry is applicable to
all types of Handlers, not just CIM-XML.
* Design it in a way that is consistent with turning Handlers into
Handler Providers at a later date, so that new handlers can be added
just as instance providers are added today.
Comments on version 0.8
(k_schopmeyer) The trace is primarily a development tool. Should we not
be logging something when we throw indications away? The original use of
the discarded data was for 'abnormal' discards, those things that were
probably due to Pegasus problems. This is a normal event:
queue-too-big, discard.
(venkat_puvvada) Yes, I agree. Discarded indications will just be
logged.
(k_schopmeyer) Since we are now going to have a mechanism that uses
memory to store data for possibly long periods of time, can cause log
entries when indications are discarded, and also is going to ask the
administrator to set config variables, I think we are going to have to
have some tools so that the admin can figure out what is happening: are
there indications in retry, how many, how long, possibly which
destinations, what are the high-water marks, etc. Without this type of
information, the admin will not really understand when his server
develops memory issues because of large numbers of retries in queue and
will not have any real clue how to set the config variables.
(venkat_puvvada) Yes, I will add a class such as
PG_IndicationDeliveryQueue, which will have properties such as name,
size, creation time, last delivery time, number of indications
discarded, and number of indications successfully delivered. Users can
enumerate instances of PG_IndicationDeliveryQueue to check the number
of delivery queues and their status.
(k_schopmeyer) At this point, we are getting to where we will have a
number of different 'scheduled' thread mechanisms (provider unload,
pull operations timers, etc.), and I wonder if it is not time to define
a simple scheduler instead of everybody doing their own thread/wait
mechanism. This should not be too difficult: one thread to run the
scheduler and an API to enter new timed events in the scheduler.
(venkat_puvvada) With the proposed solution, the DispatcherThread is
only created when there are delivery failures, and the thread
automatically terminates when there are no indications to be delivered.
Having a scheduler is a nice idea; it can definitely be proposed and
discussed in a separate PEP.
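As a rough illustration of the scheduler idea raised above (it is not part of this PEP), the API for entering timed events could look like the sketch below. SimpleScheduler, schedule(), and runDue() are hypothetical names; in a real design a single scheduler thread would call runDue() in a loop and sleep until the next event is due.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

class SimpleScheduler {
public:
    // API to enter a new timed event.
    void schedule(Clock::time_point when, std::function<void()> action) {
        std::lock_guard<std::mutex> guard(_lock);
        _events.push(Event{when, _nextId++, std::move(action)});
    }

    // Runs every event due at or before 'now', earliest first.
    // Returns the number of events that were run.
    std::size_t runDue(Clock::time_point now) {
        std::size_t ran = 0;
        for (;;) {
            std::function<void()> action;
            {
                std::lock_guard<std::mutex> guard(_lock);
                if (_events.empty() || _events.top().when > now) {
                    break;
                }
                action = _events.top().action;
                _events.pop();
            }
            action();  // run outside the lock so it may call schedule()
            ++ran;
        }
        return ran;
    }

private:
    struct Event {
        Clock::time_point when;
        unsigned long id;  // keeps insertion order for equal times
        std::function<void()> action;
    };
    struct Later {
        bool operator()(const Event& a, const Event& b) const {
            return a.when != b.when ? a.when > b.when : a.id > b.id;
        }
    };
    std::mutex _lock;
    unsigned long _nextId = 0;
    std::priority_queue<Event, std::vector<Event>, Later> _events;
};
```

Components such as provider unload or delivery retry would then register events through schedule() instead of each running its own wait loop.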
Comments on version 1.0
(k_schopmeyer) While this PEP covers only CIM-XML, I am having
problems determining 1) why, and 2) what the common part is, so we know
what has to be added to other handlers to make them 'reliable'.
(venkat_puvvada) We need to determine the feasibility of implementing
indication delivery retry for other handlers. This implementation does
not prohibit extending delivery retry to other handlers; that would be
a future enhancement.
(k_schopmeyer) Since this is defined as a statistics class, should we
not start keeping some more statistics that will give the admin
information on how much this queue is being used? Currently the
statistical information is effectively about overruns, but how about
things like a high-water mark (i.e. the highest point the queue
reached), average, etc.?
(venkat_puvvada) The average rate of indications arriving at the
DestinationQueue can be determined using the CreationTime and
NextSequenceNumber properties. This implementation provides no
mechanism to find the highest size the queue has reached. To some
extent this can be inferred from the QueueFullDroppedIndications
property, which shows whether the queue reached the maximum queue size.
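The average-rate calculation mentioned above is simple arithmetic. The sketch below assumes that NextSequenceNumber starts at zero when the queue is created and increases by one for each arriving indication; averageArrivalRate() is a hypothetical helper, not part of the implementation.

```cpp
#include <cstdint>

// Average indications per second since the queue was created.
// queueAgeSeconds is the current time minus CreationTime, in seconds.
double averageArrivalRate(std::uint64_t nextSequenceNumber,
                          double queueAgeSeconds)
{
    if (queueAgeSeconds <= 0.0) {
        return 0.0;  // queue was just created; no meaningful rate yet
    }
    return static_cast<double>(nextSequenceNumber) / queueAgeSeconds;
}
```

For example, a queue created 60 seconds ago whose NextSequenceNumber is 120 has received, on average, 2 indications per second.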
Copyright (c) 2006 Hewlett-Packard Development Company, L.P.; IBM Corp.;
EMC Corporation; Symantec Corporation; The Open Group.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED IN
ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS PROVIDED
"AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Template last modified: March 26th 2006 by Martin Kirk
Template version: 1.11