Pegasus Enhancement Proposal (PEP)

PEP #: 223

Title: Concept PEP: Pegasus Security Implementation Guidelines

Disclaimer:

THE SOFTWARE IS PROVIDED  "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Version History:


Version Date Status Authors Change Description
1.0 16 Feb 2005 draft Robert Fritz, Keith Buck Initial Submission
1.1 4 Mar 2005 draft Robert Fritz Made 1st-round changes, highlighted in yellow, from comments.
1.2 27 Mar 2005 draft Robert Fritz Made changes, highlighted in yellow, from comments.
1.3 5 Apr 2005 draft Robert Fritz Removed SSL Portion, replaced with reference to upcoming SSL guide, and added reference and definitions section.  Added disclaimer (not yet reviewed by legal).
1.4 8 Apr 2005 draft Robert Fritz Corrected and clarified SSL and network behavior.
1.5 8 Apr 2005 draft Robert Fritz Incorporated Feedback about support and responsibility concerns
1.6 8 Apr 2005 draft Robert Fritz Removed "Not" in provider section, per comment
1.7/1.8 8 Apr 2005 approved Robert Fritz Removed Highlighting for final archive.

 


Abstract: To minimize the possibility of security defects (defined as deviations between published and implemented authorization) when executing code at elevated privilege, this PEP specifies generally accepted secure coding practices for all of Pegasus, along with some coding practices specific to the needs of providers.  These practices are split into two categories: bugs and best practices.  Like all bugs, the bugs specified here should always be avoided.  The best practices are good guidelines, but there may be cases where a valid and secure implementation does not follow them.  Note that this guidance document may not be all-inclusive, and partners that comply with it will not necessarily be free of security defects.  It is simply a concrete way of summarizing many security best practices.  It is also not meant to imply a partner support obligation.


Proposed Required Process Changes:

These criteria will be brought into the developers' peer review and other processes, as applicable, and some of the checks may be automated with the aid of Flawfinder (http://www.dwheeler.com/flawfinder/).

Proposed Externally Visible Changes:

There are no external interface changes required by this PEP.


Problem Statement

Open Pegasus has a challenging role. It provides a portal for users and programs to access a wide variety of information on a system. Pegasus is responsible for user authentication and provides a framework for provider authors to authorize specific read and write operations on the server. Managing the resulting "trust delta" (the difference between what the provider or CIM Server could do in its current execution context and what a given user is authorized to do) is hard. The bigger the trust delta, the greater the incentive to "break in" past the authorizations in providers and in the Pegasus server's super-user/administrator execution context. Though PEP 197 does provide a way to lower the risk for a provider that takes advantage of the run-as-requestor context, risks remain for providers that decide to run as the privileged user, and for those portions of the CIM Server that run at elevated privilege (defined as an execution context with more permissions/abilities than the authorized user, which is hard to avoid when the code does not execute as that user).

Since so much of Pegasus runs at elevated privilege on many platforms, and since coding problems like unchecked buffers and race conditions are the largest sources of security defects in software today, we need either to rein in the coding problems occurring in the code or to find ways to architecturally limit which code runs at elevated privilege.  Since architectural solutions are harder, and may take longer, we can at least minimize the risk in the current architecture.

This problem is cast in sharper relief by the fact that the current Pegasus code base contains literally hundreds of unchecked buffers, calls to format-string-vulnerable functions, race conditions, and other risky function use (see bug #2934). Though the vast majority of these either deal with internal data or are in non-privileged client or test code, there are still enough to warrant concern that these issues exist in other places.

Requirements, Constraints or Assumptions

Most of the risks of failing to follow the guidelines below are present only when code is not run as the authenticated user (whether that code is in the server, the provider, or the client), or in deployments where the concept of authentication isn't used (e.g., SNMP public information). Even for more limited deployments, many of the problems below can still cause crashes or denial-of-service conditions.

Acceptance Criteria

This is a Concept PEP and does not directly mandate interface changes. Performing a WBEM regression test will ensure that the external functionality and behavior remain the same.  If there is no way to follow the guidelines other than to change an interface in a way that affects backward compatibility, those changes should be considered on a case-by-case basis.

Definitions:

Elevated code: code for which there is a difference between the actions that the logged-in user is authorized to perform and the execution context of the running program.  This "trust delta" must be managed by the code to ensure that the user doesn't perform more actions than they are authorized to, either directly or through side effects.  For example, a process run as the UID of the authenticated user is said to be "non-elevated"; a process running as administrator on behalf of a non-administrator user is called elevated.

Privilege: The collection of actions a process or user is not prevented from doing.  An "administrator"/root user is said to have full privilege with respect to a system, since that execution context is not prevented from taking any action on the system.

Trust: The degree to which an actor interacting with the component under consideration is believed to behave non-maliciously.  For example, an arbitrary user on the Internet has no trust; a junior operator or administrator is trusted not to attempt malicious activity, but may accidentally attempt damaging actions; root/administrator code is trusted to behave correctly.

Security Testing: Non-functional testing that centers on behavior in the presence of malicious use.  Examples include testing for crashes or security side effects in the presence of overly long inputs, special-character inputs, high system load, and network storm environments.

Security Side Effect: In accomplishing their goals, applications often perform actions beyond those visible to the user.  Examples include writing temporary files, or requesting and clearing memory.  Since this behavior is not specified in the functional requirements, it is often not tested.  This "side effect" behavior is often what a malicious user attempts to leverage when trying to gain privilege.  One example is exploiting a race condition in which a temporary file is momentarily world-writable before it is chmod-ed; this window is an opportunity for a malicious user to insert data that can change the behavior of the application.

References:

Architecture: http://www.opengroup.org/security/secarch.htm

Books and References (not endorsed by The Open Group or partners):

 

 

Proposed Solution

General Implementation Guidelines:

Failure to follow these guidelines in elevated code, including providers not running as-requestor, should be considered a bug. It is a best practice to follow these guidelines for all code.

 

  1. Avoid buffer overflow vulnerabilities in your code (which attackers use to insert arbitrary code)
    Buffer overflows in network-accessible software cause many, some believe the majority, of software vulnerabilities. Since Pegasus is written in C/C++, languages that lack built-in bounds checks, it is especially susceptible to buffer overflows. Strongly consider using a static tool like Flawfinder or RATS to look for common problems. Dynamic tools can also be used, but they only identify problems when an overflow actually happens, rather than finding potential overflows. Problematic functions include strcpy, sprintf, strcat, and gets (a minimal sketch follows this list).
  2. Avoid format string vulnerabilities in your code (which attackers use to read memory or insert arbitrary code)
    Format strings define the format and types of program variables that are substituted into an input or output string. Exploitation occurs when a function that requires a format string is passed a variable, and that variable is not validated. For example, printf(string_from_untrusted_user) is vulnerable code: the user can supply the format string and read or overwrite (using %n) arbitrary data. Each developer must ensure that any format string passed to such a function is a program constant; untrusted data should only ever be substituted through a conversion such as %s (a sketch follows this list).
  3. Adhere to the general, good programming practices:
    1. Always have people other than the coder review the code.
    2. Always have people other than the coder develop tests and test the end product.
    3. Check return codes from system or library calls, and handle errors or exceptions gracefully.
    4. Keep your code simple: simple code decreases the risk of defects; complex code increases the risk of defects.
    5. Don’t use uninitialized variables.
    6. Use symbolic constants (such as #define) to minimize typos and improve code maintainability.
    7. Use temporary files with care. Do not create temporary directories or files from your program that are world writable. Limit the permissions to what is needed by the program. Clean up temporary files or directories when you are finished using them.
    8. Do not put sensitive information in log files. For example, do not print social security numbers, passwords, credit card numbers, or any other sensitive or personal information for debugging purposes in the log files. Sometimes such information shows up in the web browser in case of exceptions or application failure.
    9. Enforce strong password policies and a delay on failed logins. This helps to prevent unauthorized access to private data. See libpam for a good way to implement this.
    10. Validate input to the program or system before processing the input. Test to see if the input is the proper type of data and in the range of acceptable or expected values and test both upper and lower bounds. For example, if you are reading in a year value (int), and you have already checked for buffer overflow and format strings:

      Correct: if (0 <= year && year <= 3000) { /* accept input and process */ }

      Unsafe: if (year <= 3000) { /* accept input and process */ }

      The vulnerability of the unsafe example is that someone could supply a value of -32769 to intentionally stop or corrupt a procedure. If the language you are using does not enforce strong types, then a type check should also be performed before accepting the input (a range-checking sketch follows this list).

  4. Use the principles of least and necessary privilege.
    Only grant the minimum set of privileges required to perform an operation, and grant these privileges for the minimum required amount of time. For example, if a provider needs to modify both mail queues and print spools, don’t run the application as root; instead, use /etc/logingroup or other facilities to give the application the privileges which it needs, but not more privileges than it needs.
  5. Use SSL securely:

    Please refer to the SSL partner guide for correct SSL use.

  6. Handle race conditions securely
    Race conditions occur when two or more processes access a shared resource in an order that was not expected by the program. Unordered access to resources is common in multitasking environments, and is mostly associated with signal handlers or file handling. Pegasus makes frequent use of chmod; a better approach is to set a restrictive umask (or pass restrictive permissions at creation time) so that there is no window during which a credential in the localauth file could be read (a file-creation sketch follows this list).
  7. Use secure defaults when possible, or clearly document when they aren't used
  8. Design Securely
    Design your code so that as little of it as possible runs as a privileged user. All privileged-user code (especially if it listens on a network or executes on behalf of other users) should be inspected very thoroughly, so it should be short and simple. Each module of code should have a clean interface for other modules to use and a well-defined perimeter around each module.  Additional details can be found in Architectural Patterns for Enabling Application Security (see References).
  9. Test for security (use both positive and negative tests)
    “Positive tests” verify that the functionality of the product works as specified. “Negative tests” attempt to subvert the security of the system, and are often overlooked when testing software. Spend some time thinking like a hacker and trying to break your system. Always test boundary conditions or corner cases for values of data, size of data, and type of data. Many common bugs are related to this. Sometimes this type of bug may result in wrong information being retrieved from the database instead of failing gracefully. Attempt to exploit the system with buffer overflow and format string attacks.
  10. Don’t bundle private copies of security code
    Security code (especially highly scrutinized open-source code) is likely to have security bulletins issued against it. When such security bulletins are inevitably issued against code you depend on, you don’t want to have to issue a bulletin against your product also. If you put a dependency in your code to a standard distribution of a component which you need (for example OpenSSL), rather than embedding a private copy, then whenever a security bulletin is issued against it, you won’t have to reissue the bulletin after repacking the fix for your private copy.
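
A minimal sketch of the unchecked-copy pattern flagged by guideline 1, next to a bounded alternative; the function name and buffer size are illustrative only:

    #include <cstring>

    void copyHostname(const char* untrusted)
    {
        char buffer[64];

        /* Unsafe: strcpy writes past 'buffer' whenever 'untrusted' is
           64 bytes or longer, letting an attacker overwrite adjacent
           memory: */
        /* strcpy(buffer, untrusted); */

        /* Safer: bound the copy to the destination size and guarantee
           NUL termination: */
        strncpy(buffer, untrusted, sizeof(buffer) - 1);
        buffer[sizeof(buffer) - 1] = '\0';
    }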
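
A minimal sketch of the format string risk from guideline 2; the fix is simply to make the format string a program constant:

    #include <cstdio>

    void echoInput(const char* untrustedInput)
    {
        /* Vulnerable: the user controls the format string and can read
           the stack with %x or write memory with %n: */
        /* printf(untrustedInput); */

        /* Safe: the format string is a constant, and the untrusted data
           is only ever substituted as a plain string: */
        printf("%s", untrustedInput);
    }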
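
The year-validation example from guideline 3.10 as a concrete sketch, checking form and both bounds before accepting the input (the range 0..3000 is taken from the example above):

    #include <cerrno>
    #include <cstdlib>

    /* Returns true, and stores the parsed year, only if the input is a
       well-formed integer within the accepted range. */
    bool parseYear(const char* input, long& year)
    {
        char* end = 0;
        errno = 0;
        long value = strtol(input, &end, 10);

        if (errno != 0 || end == input || *end != '\0')
            return false;    /* overflow, empty input, or trailing junk */
        if (value < 0 || value > 3000)
            return false;    /* out of the accepted range */

        year = value;
        return true;
    }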
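
A POSIX sketch of the race-free file creation recommended in guideline 6: the file is created with owner-only permissions in a single call, rather than created with default permissions and narrowed with chmod() afterwards:

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    int createPrivateFile(const char* path)
    {
        /* Mask group/world bits for the duration of the create; combined
           with the 0600 mode below there is no window in which another
           process can open or read the file. */
        mode_t old = umask(077);
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY,
                      S_IRUSR | S_IWUSR);
        umask(old);
        return fd;    /* -1 on error; the caller checks errno */
    }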

General Coding Best-Practices:

  1. Avoid implementing security functionality: Making security claims in your documentation (beyond the implied security claims of authentication and authorization done by the operating system) can increase your risk of having a security defect. This is because any such claim that is not fully implemented or enforced is by definition a security defect, and requires an expedited fix and a security bulletin to announce that fix. Reuse of tried-and-tested code that has been used in a security context is always a better choice. Never implement a random number generator or cryptographic algorithm unless you're a cryptographer by profession. You will almost certainly get it wrong.
        You should always, however, document your security behavior.
  2. Avoid duplicating authorization code: Related to risk #1, every WBEM provider has the risk of authorization-related defects because the authorization done in each provider duplicates the kernel authorization code. However, you can still decrease your risk by using common APIs. For example, many providers will need a way to tell if the authenticated user should have access to a given file. The code that does this needs to check the user ID and the group IDs of the file in question and all of its parent directories. Any defect in this code could easily be a security defect and would need to be fixed in every copy of that code. For this reason it is imperative that this logic exist in only one place and that your provider use that copy. Do not try to replicate this complex logic in your own provider unless you are the single owner of that code.
  3. WBEM provider/client combinations: Writing a WBEM provider that is also a WBEM client (one that makes requests of other providers) also increases your risk. There are two subcategories of this risk:
    1. The connectLocal() API uses the UID of the running process to do authentication. Thus, the provider initiating the request must ensure authorization of the other provider's data before making the request (another instance of risk #2, multiple copies of authorization code). One feasible way to do this is to check that the user is a privileged user before calling the other provider, in which case the UID matches the running process.
    2. The connect() API has additional complexities. Credentials must somehow be passed into the provider and then handled appropriately. There are also additional client responsibilities for certificate validation and testing, and the consequences are more severe because the client is running with elevated privileges.

Provider Implementation Guidelines 

Failure to follow these guidelines in providers running at elevated privilege should be considered a bug.
Providers running as-requestor can treat the following as general best practice.

  1. Check the username/uid and execute every method as if it were running as that user (i.e., as if the OS kernel or authorization service had done the authorization).
    Do this by checking each operation performed and ensuring that operations performed on behalf of a non-privileged user have no security side effects. Any discrepancy between the authorization the OS kernel would perform and the authorization the provider performs, if it is not part of documented behavior, is a security defect. If the user does not have the privileges to perform the requested operation, the provider must throw CIMAccessDeniedException (an identity-check sketch follows this list).
  2. Keep your design/provider simple.
    While this is difficult to quantify, it is important to minimize the amount of code running as a privileged user. As a general guideline, if you have a significant amount of code running with elevated privilege, the likelihood of a security defect is high. Remember that any defect in elevated-privilege code is a potential security defect, so all of this code must be straightforward and easy to review based on the principles in the General Implementation Guidelines above. The likelihood of a defect not being found is proportional to the amount and complexity of the code.
  3. Provider must not use calls such as setuid, or set environment variables (e.g., PATH), in ways that would alter the state of the process running the CIM Server.
    This could cause unexpected results for other providers or threads.
  4. Provider must document property authorizations.
    Specifically, the provider should describe which data elements they make available for reading, which system changes they are capable of making, and which users will be able to read those elements and make those changes.
  5. Provider must check all untrusted input for validity.
    While the CIM Server ensures that the input is a valid CIM request, the provider is responsible for validating that the CIM request does not cause any side effects by ensuring that the input strings contain only expected characters and that values are within an expected range. Examples of input data that must be checked include directory or file names, data within files that are read by the provider, and data returned from system calls.
  6. Provider must execute stress tests.
    These include operation in the presence of multiple interacting provider requests. Based on a white box analysis of your provider, identify ways in which testing could stress your provider. For example, sending large input strings, a large number of simultaneous requests, requests including out-of-bounds data, or ensuring that every branch is covered are just a few ways that you could stress your product to find potential defects. By exploring the way your provider fails, you can look for side effects that might lead to "infinite" resource requests, overwritten data, or other anomalies that could cause a denial-of-service or reveal a side-effect that can be leveraged as an exploit.
  7. Design your provider to expect belligerent input.
    For example, have a common method that validates all CIM requests, and ensure that that method gets called for every request. The method should assume that input is invalid unless it matches a specific format, and specific bounds should be checked. Also, if your provider allocates any memory buffers or writes to any file based on user input, all error conditions (out of memory, disk full, file is a symbolic link/device file/directory instead of the expected format, buffer/array too small for data, etc.) should be checked, and all of this should be enforced in a common place (a validation sketch follows this list).
  8. Do not allow group or world-write access to your shared library, any other executable code, configuration files, or any parent directory of any of the above.
    Although only a privileged user ought to be able to create the symbolic links or shortcuts to the provider shared library in the designated WBEM provider library directory, the actual provider shared library can be placed in any directory. A provider must ensure that their shared libraries are protected in such a way that only a privileged user can modify or delete the shared library or the directory where the shared library is located.
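
The following sketch shows the identity check from guideline 1, assuming the Pegasus OperationContext/IdentityContainer interfaces; userIsAuthorizedFor() is a hypothetical helper standing in for the provider's own authorization logic, and header names and the exact exception constructor may differ across Pegasus versions:

    #include <Pegasus/Common/OperationContext.h>
    #include <Pegasus/Common/CIMException.h>

    using namespace Pegasus;

    /* Hypothetical helper: returns true only if the OS would have
       authorized this user for the requested operation. */
    bool userIsAuthorizedFor(const String& userName);

    void checkRequestor(const OperationContext& context)
    {
        /* Recover the authenticated requestor from the operation
           context supplied with each provider call. */
        IdentityContainer container = context.get(IdentityContainer::NAME);
        String userName = container.getUserName();

        if (!userIsAuthorizedFor(userName))
            throw CIMAccessDeniedException(
                String("user not authorized for this operation"));
    }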
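
A sketch of the common validation choke point suggested in guideline 7: input is assumed invalid unless it matches an explicit character whitelist and length bound (the allowed set and the limit here are illustrative only):

    #include <string>

    bool isValidName(const std::string& input)
    {
        static const std::string allowed =
            "abcdefghijklmnopqrstuvwxyz"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "0123456789_-";

        /* Reject by default: empty, oversized, or containing any
           character outside the whitelist. */
        if (input.empty() || input.size() > 64)
            return false;
        return input.find_first_not_of(allowed) == std::string::npos;
    }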


Provider Best-Practices:

  1. Use "UserContext registration setting, defined in PEP 197" where available:
    In Pegasus 2.5 and after, you should strongly consider registering your provider to run as requestor context, or if not available, use Windows "impersonation" or fork a correct-user-running process. For providers in versions prior to 2.5, you may want to consider implementing your own out of process provider, to avoid the risks of running at elevated privilege. For those that must run privileged:
  2. Providers should consider the tradeoff between default and optional installation/registration:  An optional installation of a component (as part of an OS or software package) gives customers a choice as to whether or not to limit their interface exposure and maintenance/patch burden.  Your provider likely meets a real need for many customers, but there are also customers who do not need the functionality you provide, and many customers would prefer a lower patching/update cost and decreased security risk (risk is added whenever there is a new interface) over the functionality that your product provides. Although optional installation technically doesn't decrease the risk of having a security defect, it can give you more options for interim workarounds until you can get a critical fix out, and fewer customers would be affected by any given defect.  Provider writers and bundlers should weigh these benefits against the bundling benefits of mandatory inclusion.
  3. Log important events, such as unauthorized requests: This can help a customer track down a potential intrusion as well as debug problems. Do not include confidential information, such as passwords, in the log. Ensure that the confidentiality of information stored in the log is commensurate with access to the log. It is recommended that you use a common logging facility, such as syslog. Syslogd takes care of things like log rotation, etc. and the administrator already knows where to look for your logs.
  4. When making system changes, use platform security checks where possible vs. rewriting your own authorization code: Duplicating authorization code at least doubles the work and is more error-prone.

Client Implementation Guidelines:

Note: In general, these are the responsibility of the applications invoking the CIM client libraries, to the extent that the client libraries don't yet provide direct support.

Client code that doesn't follow these guidelines should be considered a bug:

  1. Use SSL in your remote production client as described in the SSL partner guide. Though WBEM does provide libraries to help, client behavior is the client's responsibility.
  2. Protect the keystore and truststore for remote production clients.
  3. Follow the general programming standards described in the General Implementation Guidelines above.

General client best-practices:

  1. Limit access to client data: Each user of a WBEM client should have his/her own WBEM client instance.  The WBEM client process should run as the correct user on the client machine.
  2. Local vs. Remote Requests and Username/Password Authentication: Use the connectLocal() API call to connect to the CIM Server whenever possible.  To use this API call properly, the process must run with the correct userid.

    Warning:  For Pegasus earlier than 2.5, doing client operations from a CIM provider significantly increases your security risk if the initial client requester was not running as root.  This is due to the implementation, which runs the provider in the CIM Server process space as a single, often privileged, user, so the provider it connects to is unable to use the built-in authorization based on the requesting user.  Providers issuing WBEM client operations must adequately address this security risk.  Two alternatives that address the concern are: 1) ensure (either at design time or at runtime in the provider) that the user is authorized to access the data being requested from the second provider, or 2) launch another process from the provider and issue the request to the second provider as the intended user.

     

    Background on connectLocal(): 
    A local connection mechanism exists for clients to communicate with the CIM Server on the same system. The connectLocal() function is used for this purpose and takes no arguments. When PEGASUS_LOCAL_DOMAIN_SOCKET is defined (the default on all platforms but Windows), the user ID passed to the provider is that of the process in which the client program is running. The CIM Server verifies that the user ID of the request is indeed that of the requesting process. Namespace authorization, if enabled, is still performed.

    When the client must connect to a CIM Server on a remote system, or must specify a different user than that of the process, it must use the connect() function, which accepts a hostname and port number as well as a username and password. If you need to use the connect() API, the WBEM client has several responsibilities to ensure correct authentication and to protect confidential information. Because connectLocal() does not use SSL, the SSL guidelines apply only to the connect() interface. Using connectLocal() bypasses those requirements except where PEGASUS_LOCAL_DOMAIN_SOCKET is not defined; in that case, it behaves like connect(), using HTTPS and/or HTTP as defined in the Pegasus settings (a connection sketch follows this list).
     

  3. Follow the general programming standards in the General Implementation Guidelines above.
  4. Apply the security testing guidelines described under "Test for security" above.
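
A minimal sketch of the two connection styles described in the connectLocal() background above, using the Pegasus CIMClient interface (the hostname, port, and credentials are illustrative only):

    #include <Pegasus/Client/CIMClient.h>

    using namespace Pegasus;

    int main()
    {
        CIMClient client;

        /* Local connection: no arguments; authentication is based on
           the UID of this process, so run as the intended user. */
        client.connectLocal();
        /* ... issue requests ... */
        client.disconnect();

        /* Remote connection: host, port, and credentials must be
           supplied, and the SSL partner guide's requirements apply. */
        /* client.connect("wbemhost.example.com", 5989,
                          "username", "password"); */

        return 0;
    }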

Globalization Checklist

N/A

Platform Considerations

For implementations where Pegasus and its providers are not run at elevated privilege, these coding guidelines may not help, but they will not hurt. Examples include environments with only one user, or those where Pegasus itself is executed as the requesting user.

Test Plan

This is not a functional PEP, so no interface testing is necessary beyond the normal regression testing expected of any code change. Developers may consider using scanning tools like RATS and Flawfinder, which test for many deviations from the general guidelines described herein. Note that security code scanners are still in their infancy and, as such, have a low signal-to-noise ratio.

 

OpenPegasus Release Roadmap

Begin using these guidelines with changes in 2.6, and look to subsequent PEPs to propose architectural changes, defaults, or fix timelines for old code.

Schedule

Concept PEP should guide new development as soon as it is approved.

PEP Milestone                      Planned Date  Revised Date  Actual Date  Comments
PEP Approval
Functionality Complete (FC)
Certification Test Complete (CTC)

 

Discussion

  1. We would like to discuss and address the impact of these changes on platforms that deploy Pegasus as non-privileged.
  2. We would like to discuss and address community concerns about addressing the current Pegasus security risk with defaults, vs. the code migration described in this PEP.

 


Copyright (c) 2005 EMC Corporation; Hewlett-Packard Development Company, L.P.; IBM Corp.; The Open Group; VERITAS Software Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy  of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED IN ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS PROVIDED  "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Template last modified: July 6, 2004 by Denise Eckstein
Template version: 1.11A