Quality Process as an IT Strategy
Annick Bernard, Sigrun Fredenucci
Grenoble University, France
Abstract
The "Centre Interuniversitaire de Calcul de Grenoble" (CICG) is the Center for Information Resources and Technology shared by the five universities of Grenoble (representing 63,000 students and a number of renowned research labs). CICG provides Management Information Systems (MIS) and backbone networks. The MIS software is client/server based and runs on Unix and Oracle. The major issues are availability, performance and security of data, as well as software maintenance and improvement.
Quality requirements led to the development of middleware around
BMC Software "Patrol" knowledge modules. In this context, documentation
about systems administration is designed according to ISO 9002 standards.
Issues discussed include staff incentives, control of computing activities,
risk assessment, quality and performance requirements, and responsiveness to
technical problems.
Introduction
Our Support Group team at CICG maintains the data center facilities, including all operating systems, Oracle/Unix system administration, software installations, automated network backups and security for all administrative applications of the five universities. This represents 69 university-wide databases, 9 Unix servers and 2 computer centers located in the cities of Grenoble and Chambéry. The Support Group comprises 3 teams, merged in 1995 from the 5 universities: Systems Administration (3 engineers, two of whom are the authors of this paper), Operations (2 technicians) and User Support (2 technicians).
After a while, the Support Group engineers identified several management needs:
1. To unify administration, operations and observation procedures,
2. To define a risk assessment strategy,
3. To write down the Support Group's know-how to ensure continuity of service.
Distributed software for systems administration had recently become available on the market and seemed a promising solution to the first two needs. After a comparative study, we bought the PATROL platform from BMC Software.
At the same time, because the Group had been formed by merging teams with different work cultures, the urgent need for written documentation led us to start a Quality Assurance process based on ISO standards.
This approach led us to view our relations with users as those of a private firm with its clients. We therefore decided to write a Quality Manual.
Implementing quality indicators, validity thresholds and work procedures is part of the Quality process, as is formalizing software administration by defining alarms and corrective actions.
We now present the two parallel, converging PATROL and Quality processes,
in particular the installation steps and difficulties, the present state and
the expectations of the project.
The PATROL management platform.
"Our aim is to pilot, like in industry, the university MIS and to reduce the complexity of managing our heterogeneous environment".
We looked for an application-oriented distributed management platform when the Grenoble MIS was reengineered within a nation-wide software consortium. CICG already used the Cisco SunNetManager platform to manage the common campus backbone network.
We needed tools to administer Unix systems and Oracle databases, covering database reorganization, event management, exception detection and, where possible, automatic correction. A log of all significant indicators and easy customization were also required. After a comparative study with the ECOTOOL suite and a three-month test period, CICG bought the PATROL Application Management products from BMC Software.
The most significant points are:
PATROL is designed for distributed applications, databases and systems, which fits our need to support in-house applications.
The platform allows us to unify and integrate our various home-made tools in order to maintain peak availability, reliability and performance of all Unix servers and Oracle databases.
It includes tools to provide statistics and integrated tools for subsequent database reorganization.
After staff training, we found PATROL easy to use. Everyone in the Support Group can now monitor the events within their area of competence and thus obtain a statistical log of production.
Support Group engineers immediately used PATROL with the knowledge modules (KMs) delivered for Unix and Oracle. These provide a common view, on one console, of our three Unix clusters. For example, for the student registration software (about 63,000 students), the operational tolerance for a service interruption is only 5 minutes. The PATROL event manager and KMs efficiently complement the IBM high-availability system HACMP.
PATROL is also used for database administration. A new knowledge module was developed for the student system to track a number of well-known events that put availability at risk. A complete list of these events has been drawn up.
Our experience showed that:
PATROL is user-friendly, but customizing it is quite time-consuming,
risky events and corrective actions must be listed with care and constantly updated,
statistics have to be kept online for 5 days for Unix servers, in order to monitor computer activity, system administration and resource performance,
performance gains are easily evaluated, as we saw when the backup library was upgraded and when the network was upgraded to 100 Mbit/s,
the PATROL database reorganization module is more complicated to implement than the others.
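The five-day online retention of statistics listed above can be pictured with a small sketch. It is only an illustration in Python, not PATROL code: the `StatisticsLog` class, its methods, the parameter name `cpu_percent` and the dates are all hypothetical, and samples are assumed to arrive in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta

RETENTION = timedelta(days=5)  # online retention period stated in the text


class StatisticsLog:
    """Hypothetical in-memory log that keeps only the last five days of samples."""

    def __init__(self, retention=RETENTION):
        self.retention = retention
        self.samples = deque()  # (timestamp, parameter, value), oldest first

    def record(self, timestamp, parameter, value):
        """Append a sample, then drop everything older than the retention period."""
        self.samples.append((timestamp, parameter, value))
        self.prune(now=timestamp)

    def prune(self, now):
        """Remove samples whose timestamp is earlier than now minus the retention."""
        cutoff = now - self.retention
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()


log = StatisticsLog()
start = datetime(1998, 1, 1)  # arbitrary starting date for the demonstration
for day in range(8):  # one made-up CPU sample per day over eight days
    log.record(start + timedelta(days=day), "cpu_percent", 40 + day)
print(len(log.samples))  # 6: the samples dated 3 to 8 January remain
```

A sample dated exactly five days before the newest one is kept; only strictly older samples are discarded.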
The need to manage the applications grew with the number of Oracle databases and instances (the student system Apogée, the new financial application NABUCO, payroll, and interfaces with the French Minitel and the WWW for 5 universities). Fixing Unix, Oracle and user problems became a major issue, as the number of Support Group staff remained constant.
To extend the use of PATROL, we are hiring 3 students (for 2 months each) on two projects: one to improve and customize the Unix Knowledge Module, the other to prototype an application Knowledge Module.
PATROL has improved the responsiveness and efficiency of the Support Group staff. Errors have been avoided and incidents prevented. It is especially useful for daily tasks such as monitoring file system space and CPU time, and checking printers.
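The kind of daily check mentioned here, monitoring file system space against thresholds, can be sketched as follows. This is a generic Python illustration and does not reproduce PATROL or its knowledge modules; the threshold values and function names are assumptions, since the paper gives none.

```python
import shutil

# Hypothetical thresholds (percent of space used); the paper gives no values.
WARNING_PERCENT = 80.0
ALARM_PERCENT = 95.0


def classify_usage(used_percent, warning=WARNING_PERCENT, alarm=ALARM_PERCENT):
    """Map a usage percentage to a state: 'OK', 'WARNING' or 'ALARM'."""
    if used_percent >= alarm:
        return "ALARM"
    if used_percent >= warning:
        return "WARNING"
    return "OK"


def check_file_system(path):
    """Return (used_percent, state) for the file system containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_percent = 100.0 * usage.used / usage.total
    return used_percent, classify_usage(used_percent)


percent, state = check_file_system(".")
print(f"Current file system is {percent:.1f}% full: {state}")
```

Separating the measurement (`check_file_system`) from the classification (`classify_usage`) means the same thresholds could be reused for other indicators, such as CPU time.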
The university environment is both a brake and a driving force for the
implementation of a system like PATROL. It is a brake because it lacks an
industrial culture and a notion of service quality; it is a driving force
because the universities demand continuous improvement of service.
Quality process
The Support Group ensures system availability, reliability and performance, software maintenance and user support.
Individuals in the Support Group are specialized and therefore not easily interchangeable. They generally work in pairs and cannot easily replace a colleague from another pair. Standardizing the procedures of the two computer center sites has also become essential.
The Quality Process we started aims to:
- meet customer requirements,
- formalize, standardize and disseminate the know-how needed to ensure a good service.
It began with the development of documentation based on risk assessment, following the EN ISO 9002 standard for quality assurance in production, installation and servicing. The Quality Manual lists the team's activities and explains "Who does What, Where, When, How and Why". Its purpose is "TO FORESEE".
It also describes the responsibilities, authorities and relationships of staff who manage, perform, verify or review work affecting quality; it refers to the quality system procedures and instructions, and to the instructions for reviewing, updating and controlling the manual itself. It explains the vocabulary and describes the quality policy and, accordingly, the stated and implied needs. This last point is very important because it determines the choice of quality indicators. In other words, the needs are translated into a set of requirements on an entity (process, product, organization…) so that they can be examined and fulfilled in the short or long term. Measurement tools therefore have to be standardized and used at defined intervals. We chose BMC Software PATROL, as explained above.
A part of the Quality Manual deals with the quality system procedures and instructions. It lists the risks taken with regard to the universities' needs and analyzes the preventive and corrective actions to be carried out. This section refers to:
- Documents about the organization of activities,
- Written procedures specifying how to perform an activity,
- Records, that is, documents which provide evidence of activities and results.
Process / Service | Unix administration | Oracle administration | Application administration | Hot-line | Operation | Person in charge of application | Consortium support
To maintain operating system | x | x | x | x | | |
To administer operating system | x | x | | | | |
To maintain database management system | x | x | x | x | | |
To administer database management system | x | x | x | x | | |
To administer application | x | x | x | x | | |
To manage database | x | x | x | x | x | |
To preserve database | x | x | x | | | |
This "process/organization" matrix illustrates how the quality approach is formalized.
The corresponding process is described in the chapter "Processes control", which refers to procedures, instructions and records.
PROCESS: To maintain Apogée database.
TASK: This process upgrades the software used by the test, training or production databases.
ACTIVITIES: For Apogée (the student system), a local update can be requested by the Apogée team. The request must be made in writing and must specify the location of the file to install, the name of this file, the name of the receiving file and the name of the database.
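The four items a written request must specify lend themselves to a simple completeness check. The following Python sketch is purely illustrative: the `UpdateRequest` class, its field names and the sample values are hypothetical and are not taken from procedure PR-SU/4.9/AA/01.

```python
from dataclasses import dataclass, fields


@dataclass
class UpdateRequest:
    """Hypothetical record of a written local-update request."""
    source_location: str   # location of the file to install
    source_file: str       # name of the file to install
    receiving_file: str    # name of the receiving file
    database: str          # name of the target database

    def missing_fields(self):
        """Return the names of the fields left empty or blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


request = UpdateRequest(
    source_location="/tmp/delivery",   # illustrative values only
    source_file="patch.sql",
    receiving_file="patch.sql",
    database="",                       # deliberately left empty
)
print(request.missing_fields())  # ['database']
```

A request with a non-empty list of missing fields would be returned to the requester before any installation starts.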
The application administrator follows the directives of procedure PR-SU/4.9/AA/01: he saves the current version of the file, installs the new version and sets the required access rights. He updates the database tables and reports to the requester, who tests the database and informs users if everything is in order.
The application administrator installs new versions following the installation note and procedure PR-SU/4.9/AA/01. Next, he prepares the environment (setting up a restricted session on the database, storing redo-log files, specific jobs) and checks the log files generated during the update.
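The file-level steps of this procedure (save the current version, install the new one, set access rights) can be sketched in Python. This is an assumption-laden illustration, not the content of the procedure: the function name, the `.bak` backup convention, the default permissions and the file names are all hypothetical, and the database update and reporting steps are not covered.

```python
import os
import shutil
import stat
import tempfile


def install_file(new_file, target, mode=stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP):
    """Save the current version of `target`, install `new_file`, set access rights.

    Returns the path of the backup copy, or None if `target` did not exist.
    """
    backup = None
    if os.path.exists(target):
        backup = target + ".bak"          # hypothetical backup naming convention
        shutil.copy2(target, backup)      # step 1: save the current version
    shutil.copy2(new_file, target)        # step 2: install the new version
    os.chmod(target, mode)                # step 3: set the required access rights
    return backup


# Demonstration in a temporary directory, with made-up file names.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "report.sql")
    new_file = os.path.join(workdir, "report_v2.sql")
    with open(target, "w") as f:
        f.write("old version\n")
    with open(new_file, "w") as f:
        f.write("new version\n")

    backup = install_file(new_file, target)
    with open(backup) as f:
        backup_text = f.read()
    with open(target) as f:
        target_text = f.read()
    print(backup_text.strip(), "->", target_text.strip())
```

Keeping the previous version next to the installed file makes it possible to roll back if the requester's tests fail.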
If a problem occurs, he contacts the consortium hot-line to fix it.
He then installs the locally developed middleware, informs the hot-line
and the Apogée team that the installation is complete, and records the
database version change.
Schedule of the project.
At the outset, the team manager set the Quality directions and appointed a quality manager. A monthly meeting is dedicated to quality, organizational improvements, etc. These meetings help to:
- motivate the team about quality,
- track down the weaknesses of our structure,
- choose the best way to carry out each process,
- give directions for writing the Quality Manual and other organizational documents.
At the same time, we draw up procedures, instructions and record templates. For each process, attendees of the quality meetings describe in detail "what, when, how and why" they do it, and explain their choices. However, some people do not like writing notes about what they do, and a quality approach cannot succeed unless the staff subscribe to it. We therefore decided that some descriptions could be given orally to the quality manager, who checks the accuracy and sequencing of the actions and writes the procedures according to the ISO 9002 standard and the defined vocabulary. The draft is then reviewed by the staff concerned.
A first assessment of our quality approach can be made:
- the extra work it involves is sometimes a problem, especially when new computers or systems have to be installed, but progress is now satisfactory,
- the difficulties are the usual ones in such cases: some staff fear being judged by colleagues or losing their power. One person said that these tasks brought no recognition to their authors. Bad past experiences were also cited. Discussions and debates during the meetings seem to have swept away reservations and hostility. Today the whole group is involved and motivated, especially as the first results of the quality approach can be seen,
- the analysis of practices reveals some failures in our organization, which we correct immediately. For example, we thought we had the same backup cycles in the two computer centers, which was not true,
- such an approach also shows the complexity of daily tasks and gives value to everybody's work,
- technicians' jobs can be improved by giving them new procedures,
- communication with clients is also improved.
These results will be even better once the Quality Manual is finished. We will then be looking forward to... total quality!
Conclusion
The first aims of the PATROL tools were to make production easier and to unify the various operation and observation procedures. Today PATROL is integrated into the quality approach, which is driven by the development of the Quality Manual. The quality approach has introduced changes in staff work and organization. It is emerging as a management tool, with some impact on other teams of the computer center.
Maintaining the MIS for the five universities of Grenoble, managing the archives of all processing, preventing risk and ensuring the transmission of know-how: all these important goals have now become the key targets for our team. We are close to reaching our objectives, as shown for example by the quick integration of a new engineer. We hope, as soon as the Quality Manual is ready, to be able to ensure full service when a team member is absent (vacation, illness).
Developing the Quality Manual and improving software administration have helped to establish a climate of confidence within our team and with our colleagues and clients. The quality approach encourages us to keep trying to better satisfy the universities' needs.
Acknowledgements
Many thanks to Jean-François Desnos, Department Head at
CICG, and Anne Farnier, Systems Administrator in the Support Group,
for fruitful discussions and help in reviewing and translating this paper.
References
NF EN ISO 9002, NF X50-163, NF X50-160: ISO standards documentation
PATROL, BMC Software: http://www.bmc.com
CICG: http://www.grenet.fr