Pegasus Enhancement Proposal (PEP)

PEP #: 316

PEP Type: Functional

Title: In-Memory Tracing

Version: 1.1

Created: 14th April 2008

Authors:  Marek Szermutzky, Thilo Boehm

Status: Approved

 

Version History:

Version Date Author Change Description
0.1 13 Feb 2008
 Marek Szermutzky
Draft to gather ideas, needs and requirements
0.2 20 Feb 2008 Thilo Boehm
 Updates based on first discussion
0.3
19 Mar 2008
Thilo Boehm
Add implementation details
1.0
14 Apr 2008
Thilo Boehm
Final version
1.1
22.Apr. 2008
Thilo Boehm
Updated  Version

 


Abstract: Enablement of the CIM Server to keep traces in-memory available on-the-fly


Definition of the Problem

Today when an error raises with a CIM Server in production, diagnosing the problem is difficult.
Tracing needs to be explicitly enabled and the error needs to be reproduced. In the case of intermittent errors this becomes hard as tracing to hard disk itself changes the timing, sometimes reproducing the error even becomes impossible.

Proposed Solution

We propose to keep trace information on-the-fly available in memory. One major objective is not to introduce a major impact on performance. Meaning both memory usage and instruction path length have to stay at a reasonable value.
To keep the trace below a certain amount of available memory, the trace will have to be write in a wrap-around manner.
The solution will be implemented in at least two stages:

Stage 1 - cleanup and efficiency optimization of existing trace statements in the source code

In stage 1 the currently existing trace statements will be checked for their performance. This probably means that the inefficient PEG_TRACE_STRING macro will disappear completely and all current occurrences of it be replaced with the faster PEG_TRACE macro.
At the same time tracing code that generate objects, which in the end do not get traced, will be incorporated into the trace macros.
In PEP#315 the usage of trace levels will be cleaned up.

Stage 2 - Implement In-Memory trace

The Tracer will be extended to route the trace messages into memory in addition to routing them into a file.

Writing trace messages

The trace messages are selected by the configuration option traceComponentes and traceLevel. It keeps the semantic the same as for writing the traces to a different locations (e.g. Disk) than memory. It also gives you the capability to change the destination dynamically and you still geting the same trace content.

The memory buffer for the trace is allocated in one chunk with the configured size. At the top of the memory block the eye-catcher "PEGASUSMEMTRACE" is written to be able to locate the trace in a memory dump.

The traces are written as full formated strings with trailing CR or CR/LF, dependent on the platform, into the Buffer. The buffer is organized as a wraparound buffer. If a message does not fit into the buffer, the buffer is filled up with the message until it is full, then the remaining message is written to the beginning of the trace buffer. The advantage of writing fully formated strings with CR/LF into the buffer is, that the buffer can be read by any editor or tool. The last written message gets additionally the chars "EOTRACE" after CR/LF to indicate the end of current trace. If "EOTRACE" spans the wrap-around point, the left chars are filled up with spaces and "EOTRACE" is written to the beginning of the trace buffer. THE "EOTRACE" flag will be overwritten by the next trace message, which itself is the terminated with the "EOTRACE" flag again. To clean up the wrapping, you can search for "EOTRACE", from there cut the buffer until the end, and paste it to the beginning. Then you have a fully readable trace.

If the allocation of the memory buffer fails in the main program or a out-of-process agents, tracing is switched off, a message is written to the Logging facility.

Configuration Options

traceFacility=Memory

To route the trace messages into the memory buffer the destination of the trace messages has to be set by the configuration option traceFacility to the new facility Memory. This configuration option is dynamic. If the trace facility is dynamically set from Memory to any other location, the memory buffer is flushed using the function Tracer::dumpTrace() and the memory buffer is deallocated. If the trace facility is dynamically set from any other facility to Memory, a new  memory buffer is allocated. If the allocation of the memory buffer fails in the main program or a out-of-process agents, tracing is switched off, a message is written to the Logging facility.

traceMemoryBufferKbytes=<size of in-memory buffer in kB>

The size of the of the memory trace buffer is configured by the configuration option traceMemoryBufferKbytes. The value is the allocated memory size in kbytes ( 1024 bytes ). The default value is 10240 kB. The lowest value is 16 kB. The default value and the lowest value depends of the platform requirements and may differ from the default. It can not be set to 0 kB. 
This configuration option is ignored, if traceFacility is not set to Memory.
The configuration option traceMemoryBufferKbytes is static. It only can be set at server startup

Dumping the memory trace

function Tracer::flushTrace()
There will be a function Tracer::flushTrace() to flush the trace.
The implementation for the in-memory trace is writing the buffer to the file defined by the configuration property traceFilePath. Existing security checks are preserved when the file is written.
If the traceFilePath is not set, the buffer is not flushed.
Also, if traceFilePath is not writeable or another error occurs, the buffer is not flushed.


To allow other dump methods to flush the in-memory buffer, the function for the in-memory trace handler will be exchangeable.
For other trace handlers this function does not do anything.
PEGASUS_ASSERT
The PEGASUS_ASSERT is using Tracer::flushTrace() to write the in-memory trace if the test fails.

Out-Of-Process consideration

The Out-Of-Process agents are using their own memory trace buffer and the function Tracer::flushTrace() is used to flush the buffer.
The  memory trace buffer size will be the fraction of 16 ( 1/16 ) of the configured trace buffer size (traceMemoryBufferKbytes).

This fraction depends of the platform requirements and may differ from the default.

Implementation details

A new Pegasus::TraceMemoryHandler class is manging the memory buffer, writing the traces and dumping the buffer.
When switching the traceFacility to Memory the Pegasus::Trace is using a new implemented Pegasus::TraceMemoryHandler class to handle the trace records.
It has the same interface as the Pegasus::TraceFileHandler but extents it with two additional methods:

void setMemoryBufferSize(Uint32 sizeInKbyte);
virtual void dumpMemoryTrace();

These functions are also added at the Pegasus::TraceFileHandler class but with no operations.

The Pegasus::TraceMemoryHandler implementation of the dumpMemoryTrace() method is declared virtual to be exchanged by an implementation other than the initial implementation.
The initial implementation dumps the memory traces to the traceFile.

The Pegasus::Trace class interface is extended with the method Tracer::flushTrace(). This method triggers the dumpMemoryTrace() to flush the memory trace buffer.

Testing

The OpenPegasus unit tests must be modified and extended to test the trace buffer itself and the integration into the Trace.

Update for PEP #308: Recommended Configuration Options

traceMemoryBufferKbytes

      Description: Defines the size of the buffer for in memory tracing in kbytes ( 210 bytes ).  Smallest value is 16.
      Default Value: 10240
      Recommended Default Value (Development Build): 10240
      Recommended Default Value (Release Build): 10240
      Recommend To Be Fixed/Hidden (Development Build): No/No
      Recommend To Be Fixed/Hidden (Release Build): No/No
      Dynamic?: No
      Considerations: Is ignored if configuration property traceFacility is not set to Memory.
      Source Configuration File: Pegasus/Config/TracePropertyOwner.cpp

traceFacility

      Description: Defines the desired facility for writing trace messages. Valid values include: File,Log,Memory
      Default Value: File
      Recommended Default Value (Development Build): File
      Recommended Default Value (Release Build): File
      Recommend To Be Fixed/Hidden (Development Build): No/No
      Recommend To Be Fixed/Hidden (Release Build): No/Yes
      Dynamic?: Yes
      Case sensitive?: No
      Considerations:
      Source Configuration File: Pegasus/Config/TracePropertyOwner.cpp

Input for Release 2.8 Notes

Please extend the input for the release notes 2.8 from PEP# 315:


 traceFacility 
Description
Memory
The trace messages are written to a wrap-around memory buffer.  The size of the memory buffer is specified by the configuration option traceMemoryBufferKbytes.

If the traceFacility is set to "File" the behavior is the same as before.
If the traceFacility is set to "Log" the trace messages are written to the OpenPegasus Logger to combine the log message stream with the trace message stream. If a trace message is longer than the Logger can handle, the message is written in chunks.
If the traceFacility is set to  "Memory" the trace messages are collected in a wrap-around memory buffer. If this memory is part of a dump the trace messages can be found by the eye-catcher "PEGASUSMEMTRACE" at the top of the memory buffer. The trace is in clear text and the last written message has the suffix "EOTRACE".

The configuration of the size of the memory trace buffer is done by the new configuration property:

traceMemoryBufferKbytes=<size of in-memory buffer in kB>

The size of the memory trace buffer is configured by the configuration option traceMemoryBufferKbytes. The value is the allocated memory size in kbytes ( 1024 bytes ). The default value is 10240 kB. The lowest value is 16 kB. It can not be set to 0 kB. This configuration option is ignored, if traceFacility is not set to Memory.
The configuration option traceMemoryBufferKbytes is static. It only can be set at server startup

Rationale

The decision to trace into memory was made as writing to a file on hard disk does bode a severe performance penalty. In addition the exact amount of information that is traced to memory has to be kept to the minimum needed to track down an error that occurred at runtime.
The advantage of the in-memory trace is that on a crash on many system the core dump will now also contain the last trace entries, making clearer how the error was caused and what led to a specific state.

Schedule

Ideas for future work

  1. develop a tool to grab the current trace information available in the memory buffer from the CIM server.

Discussion

1. Do we need/want new trace macros for in-memory traces ?
Decision: No. It is more flexible to define the trace content by subcomponents & traceLevel.

2. If in-memory trace and trace to disk share the same facility, should in-memory trace be restricted to certain components and do those have to be configurable ?
Decision: No. It is more flexible to define the trace content by traceComponets & traceLevel. This provides the flexibility what should be traced.

3. Which trace components do we deem worth to be traced into memory ? (DiscardedData, XmlIO?, Http?, ProviderManager?, ...?)
Decision: There will be no predefined trace settings. Look at section "Writing trace messages".

Version 1.0 -> 1.1

[ (r_kumpf) What is the behavior when allocation of the trace memory buffer fails in one of the processes (cimserver or cimprovagt)?
(thilo_boehm) If the allocation of the buffer fails, the trace facility is not switched.¤73]
[(r_kumpf) Regarding the comment above, do you mean that a buffer allocation in a cimprovagt process is communicated back to the cimserver and the traceFacility is then restored to the previous value in all processes? Or is the traceFacility set inconsistently across the processes?
(thilo_boehm) After rethinking my comment <73>, I come to the conclusion that it is not the feasible to switch back to the previous log facility due to the missing function to send messages from an OOP client to the cimserver to be able to switch back the whole setup to the previous trace facility, Instead of swtiching back, I would use a behavior similar to writing traces to files on a full file system. If the memory can not be allocated, tracing is switched off for the OOP agent or the cimserver itself and a message is send to the Logging facility.¤74]
Integrated into design

[ (r_kumpf) Also, if traceFilePath is not writeable or another error occurs, the buffer is not flushed. Remember that the traceFilePath is usually in a world writeable directory (to account for OOP providers), so the existing security checks need to be preserved when the file is written.¤85]
Integated into design.

[ (r_kumpf) Should this be named traceMemoryBufferKilobytes? ¤113]
Will be renamed into
traceMemoryBufferKbytes. On Marek's objection Kilo was substiuted with 'K' only.  'Kilo' suggests a unit of 1000 Bytes but the 1024 is the multiplier.

Copyright (c) 2006 Hewlett-Packard Development Company, L.P.; IBM Corp.;
EMC Corporation; Symantec Corporation; The Open Group.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED IN
ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS PROVIDED
"AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


 

Template last modified: March 26th 2006 by Martin Kirk
Template version: 1.11