Pegasus Enhancement Proposal (PEP)


PEP #: 324

PEP Type: Functional

Title: DMTF Indications Profile (DSP1054) Implementation, stage 2.

Version: 1.1

Authors: Venkateswara Rao Puvvada

Status:  Approved

Version History:

Version  Date          Author          Change Description
0.1      11 Apr 2008   Venkat Puvvada  Initial submission.
0.2      21 Apr 2008   Venkat Puvvada  Added more design details; removed the CIM_IndicationServiceSettingData class; removed modifications to the CIM_IndicationService and CIM_IndicationServiceCapability classes.
0.3      1 May 2008    Venkat Puvvada  Full rewrite using implementation experience; removed indication persistence.
0.4      5 May 2008    Venkat Puvvada  Decided to move retry logic from IndicationService to HandlerService; disabling/removing subscriptions does not affect indications in the retry queue.
0.5      8 May 2008    Venkat Puvvada  Added the RetryThread algorithm.
0.6      15 May 2008   Venkat Puvvada  Modified the RetryThread algorithm; decided to retry the indication DeliveryRetryAttempts + 1 times when initial delivery was not attempted because a retry queue already exists.
0.7      19 May 2008   Venkat Puvvada  Added flowchart/design picture.
0.8      14 July 2009  Venkat Puvvada  Full rewrite using the approved concept PEP 299.
1.0      21 May 2010   Venkat Puvvada  Rewrite using DSP1054 ver 1.1.0.
1.1      04 June 2010  Venkat Puvvada  Incorporated review comments; ballot version.


Abstract: This PEP implements indication delivery retry using the CIM_IndicationService DeliveryRetryAttempts and DeliveryRetryInterval properties when indication delivery fails because of 'temporary' errors in the protocol. The proposed implementation is based on DMTF Indications Profile (DSP1054) ver 1.1.0 and requires CIM Schema version 2.24.1 or later.


Definition of the Problem

Today, indications that fail to be delivered at the first attempt are never delivered, even if only a temporary error occurred. Currently, when indication delivery fails, a trace message is written and the indication is ignored. No attempt is made to retry the failed delivery. A mechanism is needed to avoid losing indications during temporary network problems. The central class of the Indications Profile (DSP1054), CIM_IndicationService, has the properties DeliveryRetryAttempts and DeliveryRetryInterval, which define how many retry attempts should be made, and at what interval, before the indication is discarded.

PEP 323 implements all the mandatory classes from DSP1054. CIM_IndicationService is implemented as part of PEP 323, but its DeliveryRetryAttempts and DeliveryRetryInterval properties are not used there to implement delivery retry.

Current behavior

The following sequence of operations currently takes place during indication delivery.
  1. The provider generates the indication.
  2. IndicationService receives the indication and finds all matching subscriptions. If subscriptions are found, a CIMHandleIndicationRequestMessage is constructed for each matching subscription and sent to the HandlerService. Indications are sent using SendAsync() with a callback method.
  3. HandlerService receives the indication from IndicationService in the form of a CIMHandleIndicationRequestMessage, loads the appropriate Handler (the Handler information is taken from the CIMHandleIndicationRequestMessage) and gives the indication to the Handler for delivery.
  4. The Handler (CIM-XML, SNMP, etc.) tries to send the indication and returns the delivery status to the HandlerService.
  5. HandlerService constructs a CIMHandleIndicationResponseMessage, adding the exception if delivery failed, and sends the response to IndicationService.
  6. IndicationService receives the CIMHandleIndicationResponseMessage through the callback method and writes a trace message if indication delivery failed.

Proposed Solution

This PEP proposes to retry failed indication deliveries a limited number of times, so that an indication is still delivered when only a temporary error broke the delivery attempt. The aim is to improve the protocol so that deliveries can be accomplished in case of 'temporary' errors in the protocol. DeliveryRetry is attempted for failed indications within reasonable limits, which are governed by the CIM_IndicationService class properties DeliveryRetryAttempts and DeliveryRetryInterval. An indication is discarded when DeliveryRetryAttempts is exceeded or when its sequence-identifier-lifetime has expired, as defined in DSP1054.

Note: This PEP proposes a solution for CIMXMLIndication listener destinations only. Solutions for other types of listener destinations, such as server-based consumer providers, SNMP handlers and Email handlers, are not implemented as part of this PEP.

Terms used:

DestinationQueue: Queue maintained for each ListenerDestination to store the indications to be retried.
DestinationQueueTable: A hash table containing all DestinationQueues.
DeliveryRetry: A new delivery attempt for an indication, made after CIM_IndicationService.DeliveryRetryInterval has expired.
Sequence Identifier: The combination of the SequenceContext (type string) and SequenceNumber properties of the CIM_Indication class.
Sequence Identifier Lifetime: Equal to (DeliveryRetryAttempts * DeliveryRetryInterval * 10), as defined in DSP1054. Once the sequence-identifier-lifetime has expired for an indication whose delivery is being retried, that indication is considered undeliverable to that listener and is removed from the queue, even if the DeliveryRetryAttempts property would otherwise indicate that further retries should be attempted. The reason is that DeliveryRetryInterval is a minimum time, so that property alone does not define an upper time limit. The Sequence Identifier Lifetime does define an upper limit.
DispatcherThread: Thread that monitors all DestinationQueues in the DestinationQueueTable and attempts DeliveryRetry using a dedicated thread pool.
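
As an illustration of the Sequence Identifier Lifetime definition above, the following is a minimal sketch rather than the actual Pegasus code. The function names are hypothetical, times are assumed to be plain microsecond counts since the epoch, and treating "elapsed time equal to the lifetime" as expired is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: Sequence Identifier Lifetime in seconds, computed as
// DeliveryRetryAttempts * DeliveryRetryInterval * 10 (the DSP1054 formula
// quoted in the Terms list).
constexpr std::uint64_t sequenceIdentifierLifetimeSec(
    std::uint32_t retryAttempts,
    std::uint32_t retryIntervalSec)
{
    return static_cast<std::uint64_t>(retryAttempts) * retryIntervalSec * 10;
}

// Hypothetical helper: true when an indication that arrived at arrivalUsec
// has outlived its lifetime at nowUsec. Both times are microseconds since
// the epoch. Assumption: an elapsed time equal to the lifetime counts as
// expired.
inline bool lifetimeExpired(
    std::uint64_t arrivalUsec,
    std::uint64_t nowUsec,
    std::uint64_t lifetimeSec)
{
    return nowUsec >= arrivalUsec &&
        (nowUsec - arrivalUsec) >= lifetimeSec * 1000000ULL;
}
```

With DeliveryRetryAttempts = 3 and DeliveryRetryInterval = 20 seconds, the values this PEP uses, the sketch yields a lifetime of 600 seconds.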

Brief summary of the proposed solution.
  1. SequenceContext and SequenceNumber are set on the indication when it arrives at the HandlerService. To do this, a DestinationQueue is created for the destination of the indication; the DestinationQueue holds the SequenceContext and the next available SequenceNumber.
  2. The SequenceContext value of an indication is equal to "CIM_IndicationService.Name + CIM_ObjectManager.Name + CIMServer startup time stamp + CIM_ListenerDestination creation time". SequenceNumber is a sint64 value that starts at 0, is incremented by 1 for each indication arriving for this destination, and wraps to 0 when it reaches the maximum positive value. CIM_ObjectManager.Name is added to differentiate between different flavors of CIMOM running on the same machine; CIM_ObjectManager.Name + CIMServer startup time stamp constitutes the cim-service-start-id in this implementation. For performance and footprint reasons, the time stamp is a plain Uint64 value in microseconds since the epoch.
  3. For CIM-XML listener destinations, indication delivery is considered successful only when a CIMExportResponseMessage is received from the CIMListener without any exception.
  4. The CIM_IndicationService class properties DeliveryRetryAttempts (=3) and DeliveryRetryInterval (=20 seconds) are used to implement delivery retry in case of temporary failures for CIM-XML Handlers. These values are not modifiable. They are read from the CIM_IndicationService instance, which is built dynamically in the IndicationService. ModifyInstance is not supported on the CIM_IndicationService instance because the CIM_IndicationServiceCapabilities instance setting does not allow it (it is optional) in the current Pegasus implementation. This implementation chooses the default values specified in DSP1054 sec 7.1.2. DSP1054 allows an implementation to make these properties non-modifiable.
  5. If indication delivery fails, the indication is enqueued on the DestinationQueue for later DeliveryRetry. For performance and memory consumption reasons, not all received indications can be held in memory. The maximum size of the DestinationQueue is determined using the formula MaxDestinationQueueSize = (10 * DeliveryRetryAttempts * DeliveryRetryInterval) / IndicationRateInterval, where IndicationRateInterval is the average interval between indications arriving for a particular destination, measured over a period of sequence-identifier-lifetime (10 * 3 * 20 = 600 seconds in this implementation), with a minimum value of 0.25 seconds and a maximum value of 3 seconds. The minimum and maximum values of IndicationRateInterval ensure that the implementation does not allocate a queue that is too large or too small. With IndicationRateInterval ranging from 0.25 to 3 seconds, the queue size varies from 200 to 2400.
  6. IndicationService sends a message (CIMNotifySubscriptionNotActiveResponseMessage) to HandlerService when a subscription is disabled or deleted. HandlerService deletes the indications in the DestinationQueue that match the subscription and logs them.
  7. IndicationService sends a message (CIMNotifyListenerNotActiveResponseMessage) to HandlerService when a ListenerDestination has been deleted. HandlerService deletes and logs all indications in the DestinationQueue and finally deletes the DestinationQueue itself.
  8. A new class, PG_ListenerDestinationQueue, is added to the root/PG_Internal namespace for DeliveryRetry statistics. HandlerService instruments this new class; the EnumerateInstances and EnumerateInstanceNames operations are supported.
Note: For more information on the SequenceContext and SequenceNumber properties, see the CIM_Indication class.
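
The SequenceNumber wrap-around in item 2 and the queue size formula in item 5 can be sketched as follows. This is a hedged illustration, not the Pegasus implementation: both function names are hypothetical, and IndicationRateInterval is assumed to be supplied by the caller as a value in seconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: next SequenceNumber for a destination. Increments by 1
// and wraps to 0 after the maximum positive sint64 value.
inline std::int64_t nextSequenceNumber(std::int64_t current)
{
    return current == std::numeric_limits<std::int64_t>::max()
        ? 0
        : current + 1;
}

// Hypothetical helper:
//   MaxDestinationQueueSize =
//       (10 * DeliveryRetryAttempts * DeliveryRetryInterval)
//           / IndicationRateInterval
// with IndicationRateInterval clamped to the range [0.25, 3] seconds.
inline std::uint32_t maxDestinationQueueSize(
    std::uint32_t retryAttempts,
    std::uint32_t retryIntervalSec,
    double indicationRateIntervalSec)
{
    const double interval =
        std::min(3.0, std::max(0.25, indicationRateIntervalSec));
    const double lifetimeSec = 10.0 * retryAttempts * retryIntervalSec;
    return static_cast<std::uint32_t>(lifetimeSec / interval);
}
```

With the defaults of 3 attempts and 20 seconds, an interval at or below 0.25 seconds gives the upper bound of 2400 entries, and an interval at or above 3 seconds gives the lower bound of 200.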

Functionality not implemented as part of this PEP.
  1. This PEP does not propose a solution for other types of handlers, such as SNMP handlers and Email handlers. See the Future Work section below.
  2. The properties CIM_IndicationService.SubscriptionRemovalTimeInterval and CIM_IndicationService.SubscriptionRemovalAction are not used as part of this PEP. The default value of CIM_IndicationService.SubscriptionRemovalAction is 'Ignore'. Subscriptions are not removed or disabled when CIM_IndicationService.SubscriptionRemovalTimeInterval expires. There is no change to the current behavior.
  3. WBEM Listener support for detection of lost indications, detection of duplicated indications, and re-establishing the original order of indications, as defined in DSP1054.

Proposed implementation

Thread and Queue model

The major goals of the thread and queue model are as follows.
  1. No major impact on indication delivery to the destinations which are able to receive the indications without any problems.
  2. No major impact on the normal CIMServer operations.
  3. Should be extensible for the future DeliveryRetry implementation for the handlers other than CIM-XML.
  4. Leverage existing message queue/thread  infrastructure.


The following sequence of operations explains the above diagram.
  1. Indications arrive at HandlerService from IndicationService in the form of CIMHandleIndicationRequestMessage. Indications are sent using SendForget().
  2. HandlerService sets the SequenceIdentifier. The following steps are involved.
    1. Create a DestinationQueue for the destination of the indication and add it to the DestinationQueueTable if it does not already exist; set the SequenceContext for the DestinationQueue.
    2. Get the next SequenceNumber for the DestinationQueue. SequenceContext and SequenceNumber are stored per DestinationQueue (C++ class, see below). A Mutex is used for protection.
    3. Set the SequenceContext and SequenceNumber on the indication.
  3. HandlerService loads the appropriate Handler and gives the indication to the Handler for delivery.
  4. The Handler tries to send the indication and returns the delivery status to the HandlerService. If delivery is not successful, HandlerService enqueues the indication on the destination queue (currently only for CIM-XML Handlers) and starts the DispatcherThread if it is not already running.
  5. DispatcherThread checks the DestinationQueueTable every DISPATCHER_THREAD_WAITTIME and collects all indications eligible for DeliveryRetry. DISPATCHER_THREAD_WAITTIME = 100 milliseconds + MIN (time in milliseconds until DeliveryRetryInterval expires for the next eligible indication of each DestinationQueue). The 100 milliseconds is a constant wait time for the DispatcherThread; it avoids a spike of activity when the network suddenly comes back up. Indications eligible for DeliveryRetry are found in each DestinationQueue in the DestinationQueueTable as described below. DispatcherThread holds the ReadLock on the DestinationQueueTable during this process, and also acquires each DestinationQueue's Mutex while accessing it.
    1. Delete all indications whose sequence-identifier-lifetime has expired from the DestinationQueue.
    2. If there are indications in the DestinationQueue and the DeliveryRetryInterval has expired for the indication at the front of the queue, that indication is eligible for DeliveryRetry. Remove it from the queue.
  6. The eligible indications found in step 5 are enqueued into the DeliveryQueue.
  7. DispatcherThread starts a worker thread from the DeliveryThreadPool. The default maximum number of worker threads is 5.
  8. The worker thread started in step 7 removes the indication at the front of the delivery queue.
  9. DeliveryRetry is attempted for the indication obtained in step 8.
  10. The indication is deleted if the DeliveryRetry is successful. If the DeliveryRetry fails, the following actions are taken.
    1. If the DestinationQueue of the indication is full (see the formula for MaxDestinationQueueSize in the Proposed Solution section), discard the indication (from the listener perspective this indication is lost).
    2. Increment the DeliveryRetryAttempts count for this indication.
    3. If the retry attempts exceed the maximum DeliveryRetryAttempts, delete the indication; otherwise put the indication at the back of its DestinationQueue.
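
The failure handling in step 10 could look roughly like the sketch below. It is illustrative only: QueuedIndication, RetryOutcome and handleFailedRetry are hypothetical names, the queue is modelled as a plain std::deque without the Mutex the design calls for, and the strict "greater than" comparison is an assumption read from the word "exceeds".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, reduced stand-in for the per-indication bookkeeping.
struct QueuedIndication
{
    std::int64_t sequenceNumber;
    std::uint16_t retryAttemptsMade;
};

enum class RetryOutcome
{
    DiscardedQueueFull,
    DiscardedAttemptsExceeded,
    Requeued
};

// Applies steps 10.1 to 10.3 to an indication whose DeliveryRetry failed.
inline RetryOutcome handleFailedRetry(
    std::deque<QueuedIndication>& destinationQueue,
    QueuedIndication indication,
    std::size_t maxQueueSize,
    std::uint16_t maxRetryAttempts)
{
    // Step 10.1: a full queue means the indication is lost for the listener.
    if (destinationQueue.size() >= maxQueueSize)
    {
        return RetryOutcome::DiscardedQueueFull;
    }

    // Step 10.2: count this failed attempt.
    ++indication.retryAttemptsMade;

    // Step 10.3: drop the indication once the attempts exceed the maximum,
    // otherwise put it at the back of its destination queue.
    if (indication.retryAttemptsMade > maxRetryAttempts)
    {
        return RetryOutcome::DiscardedAttemptsExceeded;
    }
    destinationQueue.push_back(indication);
    return RetryOutcome::Requeued;
}
```

Returning an outcome value rather than logging inside the function keeps the sketch testable; a real implementation would trace the discarded indication at this point.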
Note 1: Indications are sent using SendForget() from IndicationService to HandlerService. At present, IndicationService takes no action when a delivery failure is reported in the response that arrives through the callback method from HandlerService, so the response is not needed. Using SendForget() greatly reduces the burden on the meta-dispatcher, which no longer has to route responses back to IndicationService, and improves performance. In future enhancements, HandlerService will send a message to IndicationService to reconcile SubscriptionRemovalAction, which implements the subscription's onFatalErrorPolicy.

Note 2: All the discarded/deleted indications are traced under the Handler(TRC_IND_HANDLER) component with SequenceContext and SequenceNumber information.

Note 3: During CIMServer shutdown, all the indications being retried are discarded and traced (see Note 2 above). This implementation does not propose a solution for the persistence of indications, for the following reasons.
  1. The CIMServer may shut down or crash before an indication arrives at the HandlerService. There might be many indications in the IndicationService awaiting evaluation, or with the meta-dispatcher awaiting routing, when the CIMServer goes down. A complete solution would have to persist each indication as soon as the provider generates it and retrieve it later for evaluation and delivery.
  2. When the CIMServer crashes or shuts down, resource monitoring has stopped from the listener/client perspective (assuming it learns of this through some kind of heartbeat indication from the CIMServer for the subscription). Is there any value in sending the persisted indications after restart?
  3. If indications are persisted, the sequence-identifier-lifetime, which is 600 seconds given that this implementation uses DeliveryRetryAttempts (=3) and DeliveryRetryInterval (=20 seconds), might expire for all persisted indications if the CIMServer is not restarted immediately.
Considering the problems mentioned above, persistence of indications has been deferred in this implementation. Note that the proposed implementation does not prevent persistence of indications from being added in the future.

DeliveryRetry overview

When delivery of an indication fails for a particular ListenerDestination, the unsent indications are queued per listener destination. Every listener destination has a DestinationQueue, which is created when the first indication arrives for that destination. DestinationQueue is a list (a C++ class with methods to insert indications into and delete them from the queue) that grows dynamically up to MaxDestinationQueueSize. DestinationQueueTable is a hash table containing all DestinationQueues. The Handler name is used as the key to look up a DestinationQueue in the DestinationQueueTable.

The first delivery failure of any indication starts a DispatcherThread, which monitors all the DestinationQueues in the DestinationQueueTable and attempts DeliveryRetry according to the DeliveryRetryAttempts and DeliveryRetryInterval properties of the CIM_IndicationService instance. When a new indication is added to a DestinationQueue that is full, the indication at the front of the DestinationQueue is deleted and the new indication is added at the back.
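
The "drop the oldest, append the newest" behavior for a full DestinationQueue can be sketched as below. BoundedDestinationQueue is a hypothetical class that stores only sequence numbers; the real DestinationQueue holds full indication records and is protected by a Mutex, which is omitted here. A maximum size of at least 1 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, single-threaded model of a size-limited destination queue.
class BoundedDestinationQueue
{
public:
    // maxSize is assumed to be at least 1.
    explicit BoundedDestinationQueue(std::size_t maxSize)
        : _maxSize(maxSize)
    {
    }

    // Adds a new indication at the back. If the queue is already full, the
    // indication at the front is dropped first. Returns true when an
    // indication was dropped.
    bool enqueue(std::int64_t sequenceNumber)
    {
        bool dropped = false;
        if (!_items.empty() && _items.size() >= _maxSize)
        {
            _items.pop_front();
            ++_queueFullDropped;
            dropped = true;
        }
        _items.push_back(sequenceNumber);
        return dropped;
    }

    std::size_t size() const { return _items.size(); }
    std::int64_t front() const { return _items.front(); }
    std::uint64_t queueFullDropped() const { return _queueFullDropped; }

private:
    std::size_t _maxSize;
    std::deque<std::int64_t> _items;
    std::uint64_t _queueFullDropped = 0;
};
```

The drop counter in this sketch corresponds in spirit to the QueueFullDroppedIndications statistic described later in this PEP.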

If there are 'n' DestinationQueues and DeliveryRetry succeeds for a particular DestinationQueue, DispatcherThread does not go on to deliver all the remaining indications from that same DestinationQueue. Instead, it continues iterating over all DestinationQueues, trying to deliver their indications, and then comes back to the successful DestinationQueue. This iterative approach gives all DestinationQueues the same priority when attempting DeliveryRetry. It also addresses the case where a consumer is too slow to receive indications continuously without any delay. If there are many DestinationQueues and the network suddenly comes back up, this approach does not cause a spike of activity.

Once started, DispatcherThread sleeps for DISPATCHER_THREAD_WAIT_TIME before monitoring the queues. Each indication has a lastDeliveryTime, which is used to determine whether the DeliveryRetryInterval has expired for that indication. DispatcherThread acquires the ReadLock on the DestinationQueueTable, removes one indication whose DeliveryRetryInterval has expired from each DestinationQueue in the DestinationQueueTable, enqueues them into the DeliveryQueue and starts worker threads in the DeliveryThreadPool. The maximum number of delivery worker threads is 5, which is configurable at build time. DispatcherThread exits if there are no indications in the DestinationQueueTable.
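
One pass of the DispatcherThread, together with the wait time calculation, might be sketched as follows. This is an assumption-laden illustration: PendingIndication, QueueTable, collectEligible and dispatcherWaitMs are hypothetical names, times are plain millisecond counters, locking is omitted, and falling back to a full retry interval when every queue is empty is a choice made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, reduced stand-ins for IndicationInfo and the queue table.
struct PendingIndication
{
    std::int64_t sequenceNumber;
    std::uint64_t lastDeliveryMs;  // time of the last delivery attempt
};
using QueueTable =
    std::unordered_map<std::string, std::deque<PendingIndication>>;

// One dispatcher pass: take at most one indication from the front of each
// queue, and only if its retry interval has elapsed.
inline std::vector<PendingIndication> collectEligible(
    QueueTable& table,
    std::uint64_t nowMs,
    std::uint64_t retryIntervalMs)
{
    std::vector<PendingIndication> deliveryQueue;
    for (auto& entry : table)
    {
        std::deque<PendingIndication>& queue = entry.second;
        if (!queue.empty() &&
            nowMs >= queue.front().lastDeliveryMs + retryIntervalMs)
        {
            deliveryQueue.push_back(queue.front());
            queue.pop_front();
        }
    }
    return deliveryQueue;
}

// Wait time = 100 ms + the shortest remaining time until the front
// indication of any queue becomes eligible. Assumption: with no pending
// indications, one full retry interval is used.
inline std::uint64_t dispatcherWaitMs(
    const QueueTable& table,
    std::uint64_t nowMs,
    std::uint64_t retryIntervalMs)
{
    std::uint64_t minRemaining = retryIntervalMs;
    for (const auto& entry : table)
    {
        if (entry.second.empty())
        {
            continue;
        }
        const std::uint64_t due =
            entry.second.front().lastDeliveryMs + retryIntervalMs;
        const std::uint64_t remaining = due > nowMs ? due - nowMs : 0;
        if (remaining < minRemaining)
        {
            minRemaining = remaining;
        }
    }
    return 100 + minRemaining;
}
```

Taking only one indication per queue per pass is what gives every destination the same priority, as described above.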

Server behavior from a client/listener perspective

DSP1054 ver 1.1.0 sec 7.10.3 defines the WBEM Listener requirements. A WBEM Listener can detect missed, out-of-order and duplicate indications using the SequenceIdentifier and its lifetime. WBEM Listener support is not part of this PEP.

MaxDestinationQueueSize vs DSP1054 ver 1.1.0 spec conformance

An indication may be deleted when the DestinationQueue is full (its size exceeds MaxDestinationQueueSize), even if the DeliveryRetryAttempts property would otherwise indicate that further retries should be attempted. Is this a violation of the DSP1054 spec? No. DSP1054 defines the property CIM_IndicationService.DeliveryRetryInterval as the minimum time interval, in seconds, that the indication service waits before retrying delivery of an indication to a listener destination for which delivery previously failed. A maximum time is not defined, and the retry can take longer because of other processing in the CIMServer. The same condition applies here. From the CIMListener perspective the indication is lost.

Note: From a provider perspective it may not be very "reliable" to drop indications just because the DestinationQueue is full. However, indication delivery retry has low priority and should not impact normal CIMServer operations. MaxDestinationQueueSize is introduced to limit the resources (memory, CPU, etc.) consumed by the delivery retry functionality.

Indication Information

Indications are stored in the DestinationQueue in the form of the IndicationInfo class. The IndicationInfo class has the following properties.

    CIMInstance indication; - Indication to be delivered.
    CIMInstance subscription; - Subscription that the indication matched.
    OperationContext context; - OperationContext for the delivery of the indication.
    String nameSpace; - Namespace from which the indication originated.
    DestinationQueue *queue; - Pointer to the DestinationQueue to which this indication belongs; used to route the indication to the appropriate queue.
    Uint16 deliveryRetryAttemptsMade; - Number of delivery retry attempts made.
    CIMDateTime arrivalTime; - Arrival time of the indication at the HandlerService. Used to calculate the sequence-identifier-lifetime.
    CIMDateTime lastDeliveryRetryTime; - Time of the last DeliveryRetry for this indication.


New CIMMessages

CIMNotifySubscriptionNotActiveRequestMessage is sent to HandlerService from IndicationService when a subscription is deleted or disabled. HandlerService deletes and logs all indications matching the subscription. As mentioned earlier in this PEP, all discarded/deleted indications are traced under the Handler (TRC_IND_HANDLER) component with SequenceContext and SequenceNumber information.

// Carries no data beyond the base CIMResponseMessage fields.
class PEGASUS_COMMON_LINKAGE CIMNotifySubscriptionNotActiveResponseMessage
    : public CIMResponseMessage
{
public:
    CIMNotifySubscriptionNotActiveResponseMessage(
        const String& messageId_,
        const CIMException& cimException_,
        const QueueIdStack& queueIds_)
    : CIMResponseMessage(CIM_NOTIFY_SUBSCRIPTION_NOT_ACTIVE_RESPONSE_MESSAGE,
        messageId_, cimException_, queueIds_)
    {
    }
};

CIMNotifyListenerNotActiveResponseMessage is sent to HandlerService from IndicationService when a listener destination has been deleted. HandlerService deletes and logs all indications from the DestinationQueue and finally deletes the queue itself.

// Carries no data beyond the base CIMResponseMessage fields.
class PEGASUS_COMMON_LINKAGE CIMNotifyListenerNotActiveResponseMessage
    : public CIMResponseMessage
{
public:
    CIMNotifyListenerNotActiveResponseMessage(
        const String& messageId_,
        const CIMException& cimException_,
        const QueueIdStack& queueIds_)
    : CIMResponseMessage(CIM_NOTIFY_LISTENER_NOT_ACTIVE_RESPONSE_MESSAGE,
        messageId_, cimException_, queueIds_)
    {
    }
};

DeliveryRetry statistics

The following new class is added to the root/PG_Internal namespace for DeliveryRetry statistics. This class is instrumented by the HandlerService; the EnumerateInstances and EnumerateInstanceNames operations are supported. The feature is useful in production, development and test environments: it provides DeliveryRetry statistics such as which destinations are unable to receive indications and how many indications have been dropped.

// ===================================================================
// PG_ListenerDestinationQueue
// ===================================================================
[Version("2.11.0"), Description (
    "PG_ListenerDestinationQueue is a representation of queue maintained "
    "for the listener destination for indication delivery.")]

class PG_ListenerDestinationQueue
{
        [Key, Propagated ("CIM_ListenerDestination.Name"),
         Description ("Name of the listener destination.")]
    string ListenerDestinationName;

        [Description ("Listener destination queue creation time in "
            "microseconds since epoch.")]
    uint64 CreationTime;

        [Key, Propagated ("CIM_Indication.SequenceContext"),
         Description ("SequenceContext of the listener destination.")]
    string SequenceContext;

        [Key, Propagated ("CIM_Indication.SequenceNumber"),
        Description ("Next available sequenceNumber for the listener "
             "destination.")]
    sint64 NextSequenceNumber;

        [Description ("The Sequence Identifier Lifetime in seconds.")]
    uint32 SequenceIdentifierLifetime;

        [Description ("The maximum number of indications that queue can hold.")]
    uint32 MaxQueueSize;

        [Description ("The number of indications in the queue at present.")]
    uint32 CurrentIndications;

        [Description ("The number of indications dropped because of the "
            "maximum indications that queue can hold were exceeded.")]
    uint64 QueueFullDroppedIndications;

        [Description ("The number of indications dropped because of the "
            "sequence-identifier-lifetime was expired.")]
    uint64 LifetimeExpiredIndications;

        [Description ("The number of indications dropped because of the "
            "DeliveryRetryAttempts were exceeded.")]
    uint64 RetryAttemptsExceededIndications;

        [Description ("The number of indications dropped because of the "
            "corresponding subscription has been disabled or deleted.")]
    uint64 SubscriptionNotActiveDroppedIndications;

        [Description ("Last indication successful delivery time in "
            "microseconds since epoch.")]
    uint64 LastSuccessfulDeliveryTime;
};

A DestinationQueue is created when an indication arrives for the listener destination for the first time, and it remains active until the listener destination is deleted. DestinationQueues are not persisted across CIMServer restarts. The average rate of indications arriving at a DestinationQueue can be derived from the CreationTime and NextSequenceNumber properties. The instances of this class are built dynamically, so there is no major impact on footprint or server performance.
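
As a hedged illustration of how a client might derive that average rate from a PG_ListenerDestinationQueue instance, the sketch below divides NextSequenceNumber by the queue's age. The function name is hypothetical, and the calculation assumes that the SequenceNumber has not wrapped since the queue was created.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average indications per second for one destination.
// creationTimeUsec and nowUsec are microseconds since the epoch;
// nextSequenceNumber equals the number of indications numbered so far,
// assuming no wrap-around has occurred.
inline double averageIndicationsPerSecond(
    std::uint64_t creationTimeUsec,
    std::int64_t nextSequenceNumber,
    std::uint64_t nowUsec)
{
    if (nowUsec <= creationTimeUsec || nextSequenceNumber <= 0)
    {
        return 0.0;
    }
    const double elapsedSec =
        static_cast<double>(nowUsec - creationTimeUsec) / 1000000.0;
    return static_cast<double>(nextSequenceNumber) / elapsedSec;
}
```

For example, a queue created 60 seconds ago whose NextSequenceNumber is 120 has received, on average, 2 indications per second.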

Testcases

Testcases will be added to test the functionality proposed in this PEP. For example:
Create the subscription without starting the listener. The provider generates 'n' indications. Now start the listener; it should receive the 'n' indications generated by the provider before DeliveryRetryAttempts is exhausted for the first indication. Tests will also be added for discarded indications.

Schedule

Available in 2.11

Future Work

  1. Verify whether the indication delivery retry implementation is feasible for other types of handlers, such as SNMP, Email, Syslog and Consumer Providers.
  2. Implement CIM_IndicationService.SubscriptionRemovalAction.
  3. DynamicListener support for detection of lost/duplicate/out-of-order indications.

Discussion

Comments on version 0.1

(r_kumpf) What about this PEP is specific to CIM-XML? Why would the IndicationService care about the type of the listener destination?
(venkat_puvvada) There is no reason for not supporting the other listener destination types. I am not sure how best we can match these parameters for Email and SNMP handlers, so I decided not to include support for them at this stage.

(r_kumpf) Do indications continue to be retried for delivery after the associated CIM_IndicationSubscription and CIM_ListenerDestination instances are deleted?
(venkat_puvvada) No, indications will be discarded.

(r_kumpf) What is the rationale for persisting indications across cimserver restarts? When the cimserver is stopped, indications will cease to be generated. When it is restarted, the listener may receive stale indications and not receive more current ones for events that occurred while the cimserver was stopped. This could result in an administrator getting paged about a critical problem that was fixed months earlier.
(venkat_puvvada) The reason for persisting indications is that the client may not want to lose any indications. The client must be intelligent enough to discard out-of-date indications by looking at the timestamp of the delivered indication.

(r_kumpf) The traceFilePath is a poor choice for a directory to persist data needed for CIM Server operation. This directory is generally world writable.
(venkat_puvvada) Yes, I agree; this needs to be discussed.

(r_kumpf) What are the contents and format of this file? How is compatibility protected on CIM Server upgrade?
(venkat_puvvada) The file will have Handler, subscription and Indication (with the content language list added to the indication instance) instances in XML form. Indications are saved for each subscription under each listener destination. The file will remain compatible across CIMServer upgrades.

Comments on version 0.2

(r_kumpf) Doesn't the CIMHandleIndicationRequestMessage already contain all the information that is needed to deliver the indication? The only extra data the IndicationService should need to track is related to the retry algorithm. What am I missing?
(venkat_puvvada) CIMHandleIndicationRequestMessage does not have the following information.
subscriptionInstanceNames
providerName
pendingRetryCount
These are required to construct CIMProcessIndicationRequestMessage request again.

(r_kumpf)  How is it determined which exceptions indicate an indication delivery failure? For example, why does CannotCreateSocketException cause a retry but not bad_alloc?
(venkat_puvvada) Though it is difficult to examine the CannotCreateSocketException, we can retry when socket() returns errno ENOBUFS or ENOMEM, which means that resources at the TCP/IP layer or memory are exhausted and the operation can be retried later.

Comments on version 0.3

(k_schopmeyer) Nit. This is only one component in moving from 'sort of best effort' to reliable delivery. I suggest that this is simply improving the protocol so that deliveries can be accomplished in case of 'temporary' errors in the protocol and not really reliable delivery.
(venkat_puvvada) ok

(r_kumpf) Why is the HandlerRetryQueue logic in the IndicationService? Retrying delivery seems like it should be the IndicationHandlerService's job. I think it is more of a protocol-level thing than an indication processing thing.
(venkat_puvvada) Yes, I agree. Actually we have decided to discard the indications on the RetryQueue when the matched subscription is removed/disabled. If we implement this in IndicationService, we can directly access the ActiveSubscriptionTable to see whether the subscription is active. This has a performance benefit. Keeping this implementation in the HandlerService requires either checking the repository for subscription validity or having IndicationService send a message to HandlerService when a subscription is removed/disabled.

(r_kumpf) What happens when a delivery retry fails? Is the indication put back on the queue? At the beginning or the end? What if the queue is full?
(venkat_puvvada) If DeliveryRetry fails, the indication is inserted at the front of the queue. When the queue is full, the indication at the front of the queue is removed and the new indication is added at the back of the queue.

(r_kumpf) Is a new exception class the best way for a handler to communicate the delivery status? It might make sense to change the CIMHandler::handleIndication return type from void to a status value. Possible values could be Success, Error, and FatalError, for example. An interesting question here is what is the behavior when the handler throws an exception which is not DeliveryFailedException? Is it assumed that the delivery was successful or permanently failed?
(venkat_puvvada) This is good idea. We can have possible values Success, Error, and FatalError.
Success - Delivery success
Error - Error, can be retried later
FatalError - Permanent failure, no retry.

If the handler throws anything other than DeliveryFailedException, that is either a permanent failure or a post-delivery failure; we don't retry in those cases.

Comments on version 0.4

(k_schopmeyer) Should we consider some maximum limit on the number of retry queues? This is just another possible memory protector.
(venkat_puvvada) I am ok with that, need to discuss this.

(r_kumpf) I presume these test cases will get pretty interesting. Do you have thoughts about how they will work?
(venkat_puvvada) Create the subscription, don't start the Listener. Provider generates the 'n' indications. Now start the listener, listener should get 'n' indications generated by the provider when DeliveryRetryInterval expired.

Comments on version 0.5

(r_kumpf) Can you characterize the threading implications here? If each of the retries is done by the RetryThread, that would mean the DestinationQueueTable would potentially be locked for a long time. If the delivery retry fails, the IndicationHandlerService will need to put the indication back into the DestinationQueueTable. Will deadlock occur?
If a new thread is started for each delivery retry, that would cause a spike of activity on each interval, affecting the delivery of indications to listeners that have not experienced failures.
(venkat_puvvada) No deadlock will occur. It works in the following way.
1. Take the lock on the queue table.
2. Iterate through the queue table, take one indication from each queue, and store them in an array.
3. Release the lock on the table.
4. Send each indication in the array to the HandlerService, using the SendAsync() method.
5. If the delivery retry fails, the HandlerService puts the indication back on the queue.
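
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Pegasus code: PendingIndication, DestinationQueueTable and runRetryPass are hypothetical names, and a synchronous callback that returns true on success stands in for the SendAsync() call to the HandlerService.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for an indication waiting for a retry.
struct PendingIndication
{
    std::string destination;
    std::string payload;
};

// Sketch of a table holding one retry queue per destination.
class DestinationQueueTable
{
public:
    void enqueueBack(const PendingIndication& ind)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _queues[ind.destination].push_back(ind);
    }

    // Puts a failed indication back at the head of its queue.
    void requeueFront(const PendingIndication& ind)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _queues[ind.destination].push_front(ind);
    }

    // Lock the table, remove one indication from each non-empty queue,
    // then release the lock when the guard goes out of scope.
    std::vector<PendingIndication> takeOneFromEachQueue()
    {
        std::vector<PendingIndication> batch;
        std::lock_guard<std::mutex> lock(_mutex);
        for (auto& entry : _queues)
        {
            if (!entry.second.empty())
            {
                batch.push_back(entry.second.front());
                entry.second.pop_front();
            }
        }
        return batch;
    }

    std::size_t pending(const std::string& destination) const
    {
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _queues.find(destination);
        return it == _queues.end() ? 0 : it->second.size();
    }

private:
    mutable std::mutex _mutex;
    std::map<std::string, std::deque<PendingIndication>> _queues;
};

// One pass of the retry thread. The table lock is not held while
// sending, so the send path may touch the table without deadlock.
void runRetryPass(
    DestinationQueueTable& table,
    const std::function<bool(const PendingIndication&)>& send)
{
    std::vector<PendingIndication> batch = table.takeOneFromEachQueue();
    for (const PendingIndication& ind : batch)
    {
        if (!send(ind))
        {
            table.requeueFront(ind);
        }
    }
}
```

Because the lock is held only while the batch is collected, a slow or failing destination delays the deliveries in that pass but does not block other threads from adding indications to the table.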

(r_kumpf) Is the specification clear about the meaning of the DeliveryRetryAttempts value? It seems like it should be the number of delivery retry attempts made AFTER an initial failed delivery attempt. Karl volunteered to follow up with the DMTF on this item.

Comments on version 0.7

(r_kumpf) Shouldn't the lastRetryTime be tracked per indication rather than per queue?
(dmitry_mikulin) If lastRetryTime is per queue, how are you going to tell which indications are ready to be re-tried?
(venkat_puvvada) If we maintain lastRetryTime for each indication, we cannot deliver the indications in sequence. For example, if there are many indications in the retry queue and we deliver them according to each indication's lastRetryTime, it is possible that we deliver the later indications in the queue before the earlier ones.

(r_kumpf) This step seems like it would unnecessarily delay the delivery of queued indications once the intermittent problem (network error, for example) is resolved.
(venkat_puvvada) The RETRY_THREAD_WAIT_TIME value is configurable. It also prevents a spike of activity when all clients/listeners suddenly come up, and it solves the problem of consumers that are too slow to receive the indications.

(b_whiteley) I would prefer to see a solution where all handler types are supported. In addition to extending this functionality to the other handler types, I suspect this implementation would be cleaner.
(venkat_puvvada) Yes, this can be tried in the next stage of implementation.

(b_whiteley) I'm not very familiar with the current Indication Handler Service, so I apologize for the lack of specifics. As I read through this PEP, my gut feeling is that the approach proposed in this PEP will introduce a lot of problems and instability.
It doesn't seem right to have the Handler Service hand indications to other components that will ultimately hand the indications back to the Handler Service.
I would prefer a design that incorporates the following:
* Refactor the HandlerService itself to handle all of the delivery retry logic, rather than having a separate component reinsert indications into the HandlerService.
* Enhance the Handler interface so that delivery retry is applicable to all types of Handlers, not just CIM-XML.
* Design it in a way that is consistent with turning Handlers into Handler Providers at a later date, so that new handlers can be added just as instance providers are added today.

Comments on version 0.8

(k_schopmeyer) The trace is primarily a development tool. Should we not be logging something when we throw indications away? The original use of the discarded data was for 'abnormal' discards, those things that were probably due to Pegasus problems. This is a normal event: queue too big, discard.
(venkat_puvvada) Yes, I agree. Discarded indications will just be logged.

(k_schopmeyer) Since we are now going to have a mechanism that uses memory to store data for possibly long periods of time, can cause log entries when indications are discarded, and also is going to ask the administrator to set config variables, I think we are going to have to have some tools so that the admin can figure out what is happening. Are there indications in retry, how many, how long, possibly which destinations, what are the high-water marks, etc. Without this type of information, the admin will not really understand when his server develops memory issues because of large numbers of retries in queue and will not have any real clue how to set the config variables.
(venkat_puvvada) Yes, I will add a class like PG_IndicationDeliveryQueue, which will have properties such as name, size, creation time, last delivery time, number of indications discarded, and number of indications successfully delivered. The user can enumerate instances of PG_IndicationDeliveryQueue to check the number of delivery queues and their status.

(k_schopmeyer) At this point, we are getting to where we will have a number of different 'scheduled' thread mechanisms (provider unload, pull operations timers, etc.), and I wonder if it is not time to define a simple scheduler instead of everybody doing their own thread/wait mechanism. This should not be too difficult: one thread to run the scheduler and an API to enter new timed events in the scheduler.
(venkat_puvvada) With the proposed solution, the DispatcherThread is created only when there are delivery failures, and the thread terminates automatically when there are no indications to be delivered. Having a scheduler is a nice idea; we can definitely have it proposed and discussed in a separate PEP.

Comments on version 1.0

(k_schopmeyer) While this PEP covers only CIM-XML, I am having problems determining 1) why, and 2) what the common part is, so we know what has to be added to other handlers to make them 'reliable'.
(venkat_puvvada) We need to determine the feasibility of implementing indication delivery retry for other handlers. This implementation does not prohibit extending delivery retry to other handlers; that would be a future enhancement.

(k_schopmeyer) Since this is defined as a statistics class, should we not start keeping some more statistics that will give the admin information on how much this queue is being used? Currently the statistical information is effectively about overruns, but how about things like a high-water mark (i.e. the highest point the queue reached), average, etc.?
(venkat_puvvada) The average rate of indications arriving at the DestinationQueue can be determined from the CreationTime and NextSequenceNumber properties. This implementation provides no mechanism to find the highest size the queue reached. To some extent it can be inferred from the QueueFullDroppedIndications property if the queue reached its maximum size.
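
The average arrival rate mentioned in this reply amounts to dividing the number of indications seen so far by the age of the queue. A minimal sketch follows; the function name and parameters are hypothetical, times are assumed to be in seconds, and NextSequenceNumber is assumed to have started at 0 when the queue was created and not to have wrapped.

```cpp
#include <cstdint>

// Average number of indications per second that have arrived at a
// destination queue since it was created.
// ASSUMPTIONS: nextSequenceNumber counts from 0 at queue creation and
// has not wrapped; both times are in seconds on the same clock.
double averageArrivalRate(
    std::uint64_t nextSequenceNumber,
    std::uint64_t creationTimeSec,
    std::uint64_t nowSec)
{
    if (nowSec <= creationTimeSec)
    {
        // No elapsed time yet, so no meaningful rate can be computed.
        return 0.0;
    }
    return static_cast<double>(nextSequenceNumber) /
        static_cast<double>(nowSec - creationTimeSec);
}
```

For example, a queue created 60 seconds ago whose NextSequenceNumber is 120 has received, on average, 2 indications per second.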



Copyright (c) 2006 Hewlett-Packard Development Company, L.P.; IBM Corp.;
EMC Corporation; Symantec Corporation; The Open Group.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED IN
ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS PROVIDED
"AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


 

Template last modified: March 26th 2006 by Martin Kirk
Template version: 1.11