Pegasus Enhancement Proposal (PEP)

PEP #: 224

Title:  OpenPegasus Test Architecture

Version: 1.5

Created: 3 March 2005

Authors: Ed Boden, Dave Sudlik

Status:  approved

PEP type:  Concept 

 

 

Version History:

Version Date Author Change Description
1.0 11 March 2005 Ed Boden Initial Submission
1.1 22 March 2005 Ed Boden 1st discussion updates; changes were in blue
1.2 1 April 2005 Ed Boden 2nd round of comments & responses.  Changes are in blue.
1.3 18 April 2005 Ed Boden 3rd round of updates, changes in blue; added abbreviations for each test categories, supplied more specific dir lists for functional testing coverage, fixed a few typos.
1.4 22 April 2005 Ed Boden Removed recommendation of directories for improved code coverage and added basic amount-of-code per directory information.
1.5 30 April 2005 Ed Boden Edits for submittal to 'approved'; typos, small change, lines-of-code table updated.

 


Abstract: Describes the overall approach to testing OpenPegasus and its test automation, and makes recommendations for improvements and for follow-on PEPs.


 

Definition of the Problem

Various types of tests exist and are regularly executed for OpenPegasus.   Some are better defined (and documented) than others.  How these various kinds of tests relate to OpenPegasus, to each other, and to the existing test-related make targets is not clearly articulated.  In addition, the expectations for new testing that should accompany new functional contributions to OpenPegasus need to be documented.

Proposed Solution

It is proposed that OpenPegasus testing be organized into the following categories, or types.   This section describes each type.   Later sections describe the current status of OpenPegasus with respect to each test category, and following that, recommendations for testing improvements are made for each type of testing.

First, some general comments regarding OpenPegasus testing:

(1) The scope of OpenPegasus testing is automated testing that is included in OpenPegasus CVS and routinely executed on the two OpenPegasus reference platforms, Linux and Windows.   In addition, the platform advocates are encouraged to execute the automated 'daily build & tests' on their respective platforms and report the results (http://cvs.opengroup.org/cgi-bin/pegasus-build-status.cgi).  Frequency will vary by platform, but where practical, platform advocates are encouraged to report test results on a nightly basis.

(2) A 'test case' is an individual test within a test category.   The scope and meaning of a test case failure differ among the test categories.    Some test cases are very specific and narrow in the scope of code tested (particularly in the unit or function test categories), whereas in the performance test category a test case generally executes large portions of the CIM server and client code base, including some number of providers.   Test cases should be uniquely named using the category name and numbered within the category, e.g. 'functiontest0057', 'stresstest0008'.

(3) Testing of code not in OpenPegasus CVS is outside the scope of this architecture.

(4) A single-source, top-level document should be developed, and kept up to date, that explains all tests existing at a given point in time that can be executed (at least at the 'make target' level, or test category level), and how to execute them.  For purposes of reference in the current PEP, this new, to-be-written PEP is referred to as the 'OpenPegasus Test Execution' PEP.   This should be a release PEP.

(5) The OpenPegasus Test Execution PEP should include a definition of which tests are to be executed for the "daily build & test".   This definition should align with the test categories and the resulting (i.e. new and reorganized) make test targets, and be selected from among all tests defined in the OpenPegasus test architecture.  The PEP will also need to specify the appropriate build options.

 

Definition of the OpenPegasus test categories

  1. unit test -- basic testing of individual classes or smaller components of OpenPegasus

  2. function test -- testing of larger-scope functions, typically end-to-end (client, through the CIM server, to the provider, and back).  This category includes the PEP#167 command line utilities (e.g. cimprovider, cimconfig, cimserver, ...).

  3. performance test -- testing to determine how a given release of OpenPegasus or snapshot of CVS compares to an earlier release (or snapshot), for certain usage scenarios.  This category includes all important resource utilization testing: time, memory, and disk space.

  4. stress test -- testing designed to load the CIM server to achieve various possible execution limits, possibly to crash the server, generate excessive logs, etc.    One way to think about this kind of testing is that it is intended to see how fast, or whether, we can get the CIM server to crash or hang.

  5. continuous operation test -- testing designed to verify the stability of the CIM server and help detect memory leaks.   Clearly this type of testing relates to stress testing, but it is aimed at extended duration in what is estimated to be a normal ('business-as-usual') operational environment with normal operational load.

  6. compliance test -- tests to verify that a given OpenPegasus release complies with some externally defined capabilities.

  7. backward compatibility tests -- tests to verify that a new release of OpenPegasus maintains backward compatibility with previous releases of OpenPegasus.  For example, a 2.4 OpenPegasus client should continue to function with a 2.5 OpenPegasus CIM server.

  8. interoperability test -- tests to verify that a given OpenPegasus release successfully interoperates with other CIM implementations, e.g. non-OpenPegasus clients work with the OpenPegasus CIM server, OpenPegasus clients work with other CIM implementations, etc.   Includes provider interoperability for appropriate provider interfaces.

 

 

Existing OpenPegasus testing status, for each test category

  1. unit test -- executed normally as part of daily OpenPegasus build and test cycle on multiple platforms, via make tests.  (The PEP template should include a test section that can cover unit & function testing.)

  2. function test -- executed normally as part of the daily OpenPegasus build and test cycle on multiple platforms, via make poststarttests.  (A minimal end-to-end function test client is sketched after this list.)  The quality of testing in this category is judged subjectively by how many functional areas are tested, and quantitatively using tools that measure how much of the OpenPegasus execution paths are exercised.   Current testing is roughly in the 60-70% range for functions and 50-60% for execution paths.

  3. performance test -- currently not well defined for OpenPegasus, with no generally agreed-to definition and no overall automation.  There is a 'benchmark' provider and client (see pegasus/src/Clients/benchmarkTest/Makefile) which are occasionally used by members of the OpenPegasus community.  Some documentation exists in these directories.

  4. stress test -- currently not well-defined for OpenPegasus, with no automation.

  5. continuous operation test -- currently not well defined for OpenPegasus.   An informal practice exists: 4 or 5 clients each loop through a series of getInstance and enumerateNames operations, with a goal of 24 hours of operation without interruption.

  6. compliance test -- the only known example currently is testing for SNIA compliance, via CTP.

  7. backward compatibility tests -- these are done informally by members of the OpenPegasus community.

  8. interoperability test -- currently not done within OpenPegasus.   Some interoperability testing is done with OpenPegasus as part of SNIA plugfests.
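
To make the function test category concrete, here is a minimal sketch (in C++, against the OpenPegasus client API) of an end-to-end test case: the client connects to the local CIM server, drives a request through the server to a provider, and checks the result.  The test case name, namespace, and class name are illustrative assumptions only, not existing OpenPegasus test artifacts.

    #include <iostream>
    #include <Pegasus/Common/Config.h>
    #include <Pegasus/Common/CIMName.h>
    #include <Pegasus/Client/CIMClient.h>

    PEGASUS_USING_PEGASUS;

    int main()
    {
        try
        {
            // Connect to the CIM server running on the local system.
            CIMClient client;
            client.connectLocal();

            // End-to-end request: client -> CIM server -> provider -> back.
            // The namespace and class name are illustrative only.
            Array<CIMObjectPath> paths = client.enumerateInstanceNames(
                CIMNamespaceName("root/SampleProvider"),
                CIMName("Sample_InstanceProviderClass"));

            if (paths.size() == 0)
            {
                std::cerr << "------- functiontest0001 failed: no instances"
                          << std::endl;
                return 1;
            }
        }
        catch (const Exception& e)
        {
            std::cerr << "------- functiontest0001 failed: "
                      << e.getMessage() << std::endl;
            return 1;
        }
        return 0;
    }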

 

 

General Recommendations for each test category

An overall goal for all OpenPegasus testing is to automate the testing as much as possible.  Ideally, each test category should be able to execute independently of the other categories.   In addition, a goal of the test automation should be, to the extent possible, to allow testing to proceed if an error is encountered.

  1. unit test -- the goal is full automation.  Recommendations are a) ensure additional unit testing as new function is added to OpenPegasus, b) unit tests should be fully automated (as should all tests).  (A minimal unit test is sketched after this list.)  Abbreviation 'unitT'.

  2. function test -- improve test coverage to >90% (function & decision) for OpenPegasus.  (See the table below on the amount of code in key directories.)  For purposes of measuring test code coverage, the following types of code should be excluded: test code, all sample or example code (e.g. providers), utilities, and Java code.   In addition to using code coverage measures to determine function test quality, discussions have concluded that the external interface (see PEP 189 for 2.4) would benefit from a test focus.   It is recommended that an analysis of the entire external interface be done to determine how complete the testing is at the interface level.   This is the subject of a follow-on function PEP.   Automated comparisons of results from previous executions of function tests can be considered, to detect undesirable trends.  Abbreviation 'funcT'.

  3. performance test -- should be defined & automated (see below for more detail).   Abbreviation 'perfT'.

  4. stress test -- should be defined & automated (see below for more details).  Abbreviation 'stressT'.

  5. continuous operation test -- should be defined, documented (possibly in the OpenPegasus Test Execution PEP), and made usable via make test target(s) defined within the framework of these test categories.  Abbreviation 'continT'.

  6. compliance test -- include in OpenPegasus the necessary testing automation for SNIA CTP.  Abbreviation 'compliT'.

  7. backward compatibility tests -- should be defined in a follow-on PEP & automated appropriately.  Abbreviation 'compatT'.

  8. interoperability test -- this should be the subject of a future test enhancement function PEP, at a lower priority than the other categories.  Abbreviation 'interopT'.
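
As referenced in recommendation 1 above, the following is a minimal sketch of a unit test in C++.  It exercises a single class (the Pegasus String class) in isolation and signals failure through assertions and the process exit code; the specific assertions and the success message are illustrative of common practice, not a mandated format.

    #include <cassert>
    #include <iostream>
    #include <Pegasus/Common/String.h>

    PEGASUS_USING_PEGASUS;

    int main(int argc, char** argv)
    {
        // Basic construction and length.
        String s("OpenPegasus");
        assert(s.size() == 11);

        // Copy and append must not disturb the original string.
        String t(s);
        t.append(String(" unit test"));
        assert(t == String("OpenPegasus unit test"));
        assert(s == String("OpenPegasus"));

        std::cout << argv[0] << " +++++ passed all tests" << std::endl;
        return 0;
    }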

 

 

Additional specific recommendations for Performance testing

A set of basic usage scenarios should be defined and measured regularly, at least per OpenPegasus release (major and minor).  This should be as automated as possible for the OpenPegasus reference platforms.

Performance testing should be based on elapsed time as the basic unit of measure, and should assume that the CIM server is running on a system (OS image) dedicated to the CIM server (for the duration of the performance testing).
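
As an illustration only, the following C++ sketch measures elapsed (wall-clock) time for a fixed number of client operations against a running CIM server and reports the average time per operation.  The operation, namespace, class name, iteration count, and the use of std::chrono are assumptions made for the sketch; the follow-on performance PEP would define the actual scenarios and timing mechanism.

    #include <iostream>
    #include <chrono>
    #include <Pegasus/Common/Config.h>
    #include <Pegasus/Client/CIMClient.h>

    PEGASUS_USING_PEGASUS;

    int main()
    {
        const Uint32 ITERATIONS = 100;   // illustrative repetition count

        CIMClient client;
        client.connectLocal();

        // Time the same operation repeatedly and report the average.
        std::chrono::steady_clock::time_point start =
            std::chrono::steady_clock::now();

        for (Uint32 i = 0; i < ITERATIONS; i++)
        {
            // Namespace and class name are illustrative only.
            client.enumerateInstanceNames(
                CIMNamespaceName("root/cimv2"),
                CIMName("CIM_ComputerSystem"));
        }

        std::chrono::steady_clock::time_point end =
            std::chrono::steady_clock::now();

        double totalMs = std::chrono::duration_cast<std::chrono::milliseconds>(
            end - start).count();

        std::cout << "perfT sketch: " << ITERATIONS << " operations, average "
                  << (totalMs / ITERATIONS) << " ms elapsed per operation"
                  << std::endl;
        return 0;
    }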

The basic suggested performance test scenarios are;

Basic memory and disk utilization should be routinely measured as part of this test category, for selected usage scenarios.

Full OpenPegasus performance testing should be proposed in a follow-on functional PEP.

 

 

MAKE targets for running various tests (aka 'test targets')

A few additional thoughts for the follow-on PEP addressing changes to make test targets:

Test target recommendation 1: once an overall set of OpenPegasus test categories is agreed to, the make targets should be reorganized, as follow-on work (i.e. function PEP level work), to align clearly and directly with the OpenPegasus test categories.

Test target recommendation 2: As part of the follow-on work, the top-level OpenPegasus Test Execution PEP should document (among other things suggested here) all the make targets, their relationships, and their recommended use.    If the make target test reorganization PEP exists, refer to that.

Test target recommendation 3: A goal of this reorganization should also be that the build of all tests be separate from the build of OpenPegasus itself.   This PEP should recommend details on how this will be accomplished.

Test execution recommendation 4: As new test automation is introduced, it should clearly map to the OpenPegasus test categories, and this should be documented in the [to-be-written] OpenPegasus Test Execution PEP.

 

 

How other terminology relates to this test architecture

'Regression testing' -- another term for the function test category.   Alternatively, it might be defined as a subset of test cases in one or more of the test categories that is regularly executed to help ensure that an updated code base has not 'regressed' previously working function.   In this latter interpretation, regression testing is synonymous with the testing executed as part of the daily build & test.    Because of the widely varying interpretations, this term is not defined for OpenPegasus.

'Acceptance testing' -- for a given release, might be a specifically defined set of criteria, measurements, and test cases across the test categories.  Could be termed 'release testing'.   Not defined for OpenPegasus.

'Customer testing' -- not defined for OpenPegasus.  

'Verification testing' (and its variations, 'function verification testing', 'system verification testing') -- same as our function test category.   Not defined for OpenPegasus.

'Integrated testing' -- assumed to be covered by our function test category.   Not defined for OpenPegasus.

'Component testing' -- assuming a component is bigger than a unit (as in 'unit testing') and smaller than the whole thing, this sort of testing is subsumed by the function test category.   Not defined for OpenPegasus. 

'Installation testing' -- done for Linux via RPMs from OpenPegasus CVS.

 

 

 

Rationale

Improve the common understanding of OpenPegasus testing, set expectations for future OpenPegasus releases, establish a structure for testing improvements, and generally improve the quality of OpenPegasus testing.

Schedule

Once approved, this concept PEP provides the framework for improved testing of OpenPegasus.  It is expected to first apply during the OpenPegasus 2.6 release cycle.

 

List of proposed follow-on PEP's

  1. (working title:) OpenPegasus Test Execution -- top-level document that describes the automated testing available for a given release of OpenPegasus, referencing lower-level documents, make test targets, and other test targets as appropriate, and explains how to execute all of these.   It probably needs a refresh per OpenPegasus release, hence it could reasonably be a release PEP (also, this Test Execution PEP does not itself propose any changes to CVS).

  2. (working title:) Reorganize 'make' Test Targets -- A function PEP to reorganize or align the make test targets.

  3. (working title:) OpenPegasus Conventions for Testing & Test Automation -- A function PEP that explains our conventions and preferences for test automation, how and where to add new test cases, etc.  For example, across all the test categories, a failing test case should be 'reported' in the same, conventional way (e.g. "------- test 123 failed").   Do we assume that the absence of a fail message for any given test case means the test succeeded?   Conventions for naming test cases with categories, conventions for non-make-based automation, etc.   (Another thought: use TestMakefile, not Makefile, to drive tests.  And use the PEGASUS_TEST_VERBOSE environment variable to control output.)  Also these ideas: a) minimal output on success, b) the output of tests should always be somewhere below PEGASUS_HOME and should never be in PEGASUS_ROOT.  (A small sketch of these conventions appears after this list.)

  4. (working title:) Testing Improvements for OpenPegasus External Interfaces -- A function PEP on recommendations to improve testing of OpenPegasus external interfaces.

  5. (working title:) OpenPegasus Performance Testing -- A function PEP that details OpenPegasus performance testing.

  6. (working title:) OpenPegasus Stress Testing -- A function PEP that details OpenPegasus stress testing.

  7. (working title:) OpenPegasus Continuous Operation Testing -- A function PEP that details OpenPegasus continuous operations testing. 

  8. (working title:) OpenPegasus Compliance Testing -- A function PEP that details OpenPegasus compliance testing.

  9. (working title:) OpenPegasus Backwards Compatibility Testing -- A function PEP that details OpenPegasus backward compatibility testing.

  10. (working title:) OpenPegasus Interoperability Testing -- A future (beyond release 2.6 timeframe) function PEP should define and establish interoperability testing.

The order of the above list is not a recommendation as to the order in which the PEPs should be done; the numbers are for ease of reference.  There is one exception: PEPs 1-3 above seem more basic and key than the others, hence it is recommended that 1-3 be addressed sooner.
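
As an aid to the discussion of item 3 above, here is a small, hypothetical C++ sketch of the kinds of conventions that PEP could adopt: one uniform, easily searched failure line per failing test case, minimal output on success, and extra output only when the PEGASUS_TEST_VERBOSE environment variable is set.  The helper name, test case name, and exact message format are illustrative assumptions, not agreed-to conventions.

    #include <cstdlib>
    #include <iostream>

    // True when the (proposed) PEGASUS_TEST_VERBOSE environment variable is set.
    static const bool verbose = (std::getenv("PEGASUS_TEST_VERBOSE") != 0);

    // One conventional, easily searched failure line per failing test case.
    static void reportFailure(const char* testCase, const char* reason)
    {
        std::cerr << "------- " << testCase << " failed: " << reason << std::endl;
    }

    int main()
    {
        if (verbose)
            std::cout << "functiontest0057: starting" << std::endl;

        bool passed = true;   // the real test logic would set this

        if (!passed)
        {
            reportFailure("functiontest0057", "unexpected result");
            return 1;
        }

        // Minimal output on success; the absence of a failure line (and a
        // zero exit code) means the test case passed.
        if (verbose)
            std::cout << "functiontest0057: passed" << std::endl;
        return 0;
    }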

 

 

Discussion

Item 1:  How does 'regression' testing fit in this test architecture?   Is it another category of testing, a subcategory, or is it a part of the OpenPegasus Test Execution PEP's definition of 'daily build & test'?   Do we even need a 'regression' test?   The answer depends on how it is defined.   Since it is such a common term, at the least, this PEP or OpenPegasus Test Execution PEP should describe how it fits with the OpenPegasus test architecture. 

Here is a description of 'regression testing' from the c2.com wiki <http://c2.com/cgi/wiki?RegressionTesting> that may help the discussion:

"RegressionTesting is testing that a program has not regressed: that is, in a commonly used sense, that the functionality that was working yesterday is still working today.

(Historical note:)

This common use of the term "regressed" has drifted from the original strict meaning that the program behavior has backslid to some earlier, "known bad" state. In the dim mists of test automation antiquity, each regression test case would have been created in response to a known (and putatively fixed) bug.

So "Did the product regress?" was a terse way of asking "Did we re-inject any of those old 'fixed' problems somehow? Did a masking effect go away, revealing old buggy behavior for a new reason?"

It is meaningful to distinguish such test cases from ConformanceTest? cases, which would be created from looking at the spec or other expectations about the product, rather than in response to found bugs.

But in common parlance and practice, the routine testing of both gets called Regression Testing."

 

Item 2: There is, apparently, a kind of software testing which operates via repeated application of these steps: 1) a (relatively) large number of test cases is generated for a given API, command, function, usage scenario, or the like; these may be generated at random (note that no effort is made to determine whether any of these tests actually exercise unique, new, or any other category of the code being tested); 2) these test cases are saved in some suitable form for automation; 3) the tests are executed and all the outputs, results, appropriate logs, etc. are saved; 4) the saved results of all kinds are compared with those from a previous execution of these same tests; 5) all the differences between the previous results and the latest results are 'reported'; 6) these reported differences are used for some purpose.
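
To illustrate steps 4 and 5 of the workflow just described, here is a small C++ sketch that compares the saved output of the latest test execution with the saved output of a previous execution, line by line, and reports every difference.  The file names are illustrative only.

    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        // Saved results from a previous execution and from the latest execution.
        std::ifstream previous("results.previous.txt");
        std::ifstream latest("results.latest.txt");

        std::string prevLine;
        std::string newLine;
        unsigned long lineNumber = 0;
        unsigned long differences = 0;

        for (;;)
        {
            const bool morePrev = static_cast<bool>(std::getline(previous, prevLine));
            const bool moreNew = static_cast<bool>(std::getline(latest, newLine));
            if (!morePrev && !moreNew)
                break;

            lineNumber++;
            if (!morePrev || !moreNew || prevLine != newLine)
            {
                // Step 5: report each difference between the two executions.
                differences++;
                std::cout << "line " << lineNumber << " differs" << std::endl;
            }
        }

        std::cout << differences << " difference(s) found" << std::endl;
        return (differences == 0) ? 0 : 1;
    }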

Recommendation for this [to-be-named] kind of testing: OpenPegasus should postpone this kind of testing until all reasonable and agreed-to objectives for all the above categories of testing are achieved.

 

Where the code resides

(This table might be useful to help decide where to add tests so that testing code coverage is improved.)

(files = number of .cpp and .h files; loc = lines of code; percentages are shares of the totals in the last row.)

OpenPegasus 2.5 (late April 2005)

Directory                                          files   %files       loc     %loc
-----------------------------------------------    -----   ------    ------   ------
CQL                                                   57     6.6%      9816     7.1%
Client                                                24     2.8%      4674     3.4%
Common                                               317    36.6%     47119    34.3%
Compiler                                              30     3.5%      5828     4.2%
Config                                                47     5.4%      3260     2.4%
ControlProviders/ConfigSettingProvider                 3     0.3%       507     0.4%
ControlProviders/InteropProviders                      7     0.8%      1477     1.1%
ControlProviders/NamespaceProvider                     3     0.3%       461     0.3%
ControlProviders/ProviderRegistrationProvider          3     0.3%      1067     0.8%
ControlProviders/Statistic                             3     0.3%       208     0.2%
ControlProviders/UserAuthProvider                      3     0.3%       653     0.5%
DynListener                                           17     2.0%      1762     1.3%
ExportClient                                           7     0.8%       721     0.5%
ExportServer                                           8     0.9%       762     0.6%
getoopt                                                3     0.3%       481     0.3%
HandleService                                          5     0.6%       365     0.3%
IndicationService                                     11     1.3%      6564     4.8%
Listener                                               7     0.8%       563     0.4%
ManagedClient                                          7     0.8%       840     0.6%
Provider                                              27     3.1%      3347     2.4%
ProviderManager                                        0     0.0%         0     0.0%
ProviderManager2                                      31     3.6%      3499     2.5%
ProviderManager2/CMPI                                 57     6.6%      7335     5.3%
ProviderManager2/CMPIR                                21     2.4%      2804     2.0%
ProviderManager2/CMPIR/native                         19     2.2%      2524     1.8%
ProviderManager2/Default                              12     1.4%      2943     2.1%
ProviderManager2/JMPI                                 12     1.4%      5044     3.7%
Query/QueryCommon                                     19     2.2%       970     0.7%
Repository                                            21     2.4%      5205     3.8%
Security/Authentication                              20     2.3%      1320     1.0%
Security/UserManager                                  10     1.2%       994     0.7%
Server                                                30     3.5%     11569     8.4%
WQL                                                   24     2.8%      2834     2.1%
-----------------------------------------------    -----   ------    ------   ------
Total                                                865   100.0%    137516   100.0%

 

 

 


Copyright (c) 2005 EMC Corporation; Hewlett-Packard Development Company, L.P.; IBM Corp.; The Open Group; VERITAS Software Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy  of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED IN ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS PROVIDED  "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Template last modified: March 9th 2004 by Martin Kirk
Template version: 1.8