[Future Technology Research Index] [SGI Tech/Advice Index] [Nintendo64 Tech Info Index]

[WhatsNew] [P.I.] [Indigo] [Indy] [O2] [Indigo2] [Crimson] [Challenge] [Onyx] [Octane] [Origin] [Onyx2]

Ian's SGI Depot: FOR SALE! SGI Systems, Parts, Spares and Upgrades

(check my current auctions!)
(note that this report predates the release of IRIX 6.2, 6.3, 6.4 and 6.5)

Indigo2 and POWER Indigo2 Technical Report

Section 8 The IRIX 6.0.1 Operating System

IRIX 6.0.1 is the first version of IRIX for R8000 POWER Indigo2 systems. This system will support today's 32-bit user code binaries without change, and will permit the creation of binaries that use the 64-bit features of the new MIPS R8000 streaming superscalar RISC family.

IRIX 6.0.1 is an upwardly compatible revision of IRIX 5 that incorporates substantial functionality from UNIX System V, Release 4.1. IRIX 6.0.1 complies with the Applications Binary Interface described in the System V Interface Definition, Issue 3 (SVID 3), the defining document for System V Release 4 (SVR4). This compliance places IRIX 6.0.1 in the mainstream of UNIX products, and in a leadership position in supplying an SVR4 UNIX with 64-bit features.

Outside of extensions to support 64 bits, the features of IRIX 6.0.1 are inherited directly from IRIX 5.3, so the kernel technology, file system, processor scheduler enhancements, memory management policies, and I/O enhancements behave as they do in IRIX 5.3.

This chapter explains:

This chapter concludes with suggestions for further reading.

8.1 Kernel Architecture

Figure 21 diagrams the IRIX 6.0.1 kernel architecture.

[IRIX 6.0.1 kernel architecture]

FIGURE 21 IRIX 6.0.1 Kernel Architecture.

Because the IRIX 6.0.1 kernel itself is a 64-bit entity, all pointers and longs are treated as 64-bit quantities. Table 7 summarizes other structures that were extended to provide for 64-bit support.

/sys/                 Other

<sys/ipc.h>           <dirent.h>
<sys/msg.h>           <fcntl.h>
<sys/procset.h>       <ftw.h>
<sys/resource.h>      <grp.h>
<sys/sem.h>           <locale.h>
<sys/shm.h>           <math.h>
<sys/siginfo.h>       <netconfig.h>
<sys/stat.h>          <netdir.h>
<sys/statvfs.h>       <nl_types.h>
<sys/time.h>          <poll.h>
<sys/times.h>         <pwd.h>
<sys/tiuser.h>        <rpc.h>
<sys/types.h>         <search.h>
<sys/uio.h>           <setjmp.h>
<sys/utime.h>         <sigaction.h>
<sys/utsname.h>       <signal.h>

TABLE 7 Kernel Structures Extended to Support 64 Bits

The effects of these adjustments are:

8.2 Memory Management

The IRIX virtual memory subsystem is essentially a "ground-up" design to support SVR4 virtual memory functionality, including mmap(), mprotect(), and SVR4 Executable and Linking Format (ELF) executables. The region structure is SVR3 in nature, although heavily modified for MP efficiencies related to cache flushing and TLB management. Virtual memory is paged in a classic demand-paged model.

It is possible to add and subtract swap partitions on a running system. One can also swap to NFS partitions, and to mirrored partitions, and to regular files. Another feature of IRIX is that one can overcommit swap allocation, although this practice is not recommended and requires good knowledge of the application's working set.

Numerous tunable variables that affect memory management policies and performance are documented in the systune(1M) man page.

8.3 Process Management (Scheduling)

The IRIX process represents a thread of execution; this abstract entity is defined by a collection of data. The virtual address space of a process, the contents of its user structure and proc table entry, and the values contained in machine registers when the process is running all constitute the context of the process.

To support multiple processes, IRIX implements a process-scheduling algorithm that assures a fairly equitable division of processor time among all processes. This algorithm is said to be nonpreemptive; that is, the running process cannot be preempted by another process (although it can be preempted by the kernel).

The running process can yield to another process "voluntarily," by making a system call (such as an I/O request) that causes it to sleep, in which case another process is selected to run. Also, the running process can be preempted by the kernel to handle an exception, in which case execution returns to the process after the exception handler has finished its business.

The kernel also enforces a limit on the amount of time a process can monopolize the processor. When this specified time has elapsed, an exception is generated, and the exception handler selects a new process to run and executes a context switch.

Although the process management model described above shares resources in an apparently equitable way, such simple scheduling does not make full use of the computing capability of a multiprocessor system. The resulting inefficiencies grow with the number of processors, and are central to how well a multiprocessor system scales in performance as processors are added.

These and other effects are accounted for in sophisticated scheduling algorithms and include:

For more information on these scheduler models, see the Silicon Graphics white papers "Processor Segmentation: A resource management facility for Shared Memory Multiprocessors" and "Parallel throughput performance of IRIX 5.X."

8.4 Synchronization Primitives

Silicon Graphics process synchronization primitives - locks, barriers, and semaphores - are allocated from a shared memory "arena". These arenas are memory-mapped between processes and permit pointers to be shared. Synchronization primitives are implemented directly with R4000 instructions, allowing for much faster synchronization than traditional System V mechanisms.

8.5 Lightweight Processes

sproc is an interface that permits users to create lightweight processes that share the virtual address space of the parent process. The parent and the child each have their own program counter value and stack pointer, but all the text and data space is visible to both processes. This scheme provides one of the basic mechanisms upon which parallel programs can be built.

A shared process group can be constructed from sproc calls from a common ancestor. In addition to virtual address space, members of a share group can share other attributes such as file tables, current working directories, effective userids, and so on.

Traditional POSIX threads are available in the Silicon Graphics ADA product, and are planned as a feature for general release on a future version of IRIX.

8.6 File Services

The IRIX file subsystem supports multiple physical file systems of different file system types, and gives them the appearance of a single logical file system with a hierarchical arrangement.

IRIX 6.0.1 uses the Virtual File System interface known as the vnode interface. The name vnode is derived from the name of the data structure used in the interface. This interface was developed to facilitate the incorporation of different file systems in the system. A de facto standard, this interface has been used by third-party providers of file system technology, such as the enhanced file system found in the CASEVision Tracker product. It facilitates the inclusion of additional file system types into IRIX 6.0.1.

The file system types supported for IRIX include:

8.7 Extent File System

The native file system, the Extent File System (EFS), provides storage and retrieval of data by storing data in contiguous blocks, and by trying to maintain locality between the inode and its data extents, and between parent directories and their children. Allocation of resources is sped up by a bitmap of allocated data blocks, and by cylinder group summary structures which track the number of free data blocks and inodes in each cylinder group.

EFS offers much higher performance than conventional UNIX file systems because it uses extents of up to 64 KBytes instead of the 8-KByte (or smaller) blocks commonly found in other systems. This arrangement can dramatically reduce the number of I/O operations compared to other file systems. EFS also permits more files in a file system through the use of 32-bit internal handles (inode numbers). EFS file systems can be up to 8 GBytes (an individual file is limited to 2 GBytes). Conventional UNIX file system structures are constrained by the use of a 16-bit inode number, and thus are limited to 64 K files in a file system and to 2 GBytes for both file size and file system size.

8.7.1 File System Reorganizer

The fsr (file system reorganizer) program improves the organization of mounted file systems. The reorganization algorithm operates on one file at a time, compacting or otherwise improving the layout of the file extents (contiguous blocks of file data) while simultaneously compacting the file system free space. The intended usage is to call fsr from crontab at a regular time; the default is once per week.

8.8 I/O Performance Improvement Options

IRIX 6.0.1 includes new interfaces that can be used in systems demanding the maximum possible I/O performance, such as DBMS. These new features require source changes, but, when used appropriately, the performance gains can be dramatic. These new features are:

8.8.1 Asynchronous I/O

Traditionally, UNIX systems have allowed a given process to have only one I/O operation in progress at a time. This situation has forced applications like database servers that required several concurrent I/O operations to adopt complex, multiprocess architectures. IRIX 6.0.1 includes support for multiple concurrent I/O operations within a single process. The interfaces comply with Draft 12 of the POSIX P1003.4 specifications. These features allow simpler, more efficient implementations of subsystems requiring multiple concurrent I/O operations. Since they are standards-compliant, applications that use them are potentially portable between systems using the same interfaces.

8.8.2 Memory-mapped I/O (mmap())

IRIX 6.0.1 supports the SVR4 interfaces for memory-mapped I/O, which can make disk files visible in the address space of the program and allow them to be accessed without explicit I/O operations. Behind the scenes, the kernel arranges to bring in the necessary pages similarly to the way in which it brings in the pages of an executable, using the demand paging features of the virtual memory subsystem.

mmap() permits the implementation of systems that are potentially more efficient by avoiding the copying of bytes to (from) user space from (to) I/O buffers in the kernel. IRIX 6.0.1 supports the auxiliary operations for memory-mapped files as defined in SVID 3, such as control of copying modified pages back to the disk. Full support for memory-mapped I/O has required internal changes to the virtual memory subsystem in the kernel to support page level protection.

8.8.3 Direct I/O

Like memory-mapped I/O, direct I/O also accomplishes the I/O directly from (to) the disk to (from) the user address space. Direct I/O, however, requires substantially less change to an existing application than memory-mapped I/O. Direct I/O requires that the program implement its own read-ahead and write-behind policies. Using both direct and asynchronous I/O, the program can have complete control of its I/O while retaining the benefits of the file system. Previously, this level of control was only available to programs using the raw disk interfaces; direct I/O provides performance approaching that of the raw disk. The interfaces for direct I/O have been defined by Silicon Graphics.

8.9 64-bit API for 32-bit programs

In preparation for a 64-bit file system, interfaces have been added to allow both 32-bit and 64-bit applications access to 64-bit file system features. To support the access of 64-bit files from 32-bit applications, new interfaces that take 64-bit parameters must be defined. These interfaces are cleanly supported in the kernel, without the information of whether an application is 64- or 32-bit filtering down below the system call level.

The need for these extensions is only to allow 32-bit applications to access 64-bit files. They are not needed by 64-bit applications, or by 32-bit applications that have no need to deal with 64-bit files.

A 32-bit program either recognizes the existence of large files or it does not. Those that do use the new interfaces to access these files. Those that do not see all files as having a maximum size of 2 GBytes. Data beyond the 2-GByte offset in a file will be inaccessible to applications that do not recognize the existence of large files. A file size greater than 2 GBytes will not be visible to an application that does not know of such files.

8.10 NFS

Silicon Graphics continues to provide support for the Sun Open Network Computing (ONC) software. This includes the Network File System (NFS) and Network Information Service (NIS).

Sun Microsystems has proposed 64-bit extensions to NFS as part of the ONC+ package. These extensions are required because the current offset is explicit in each NFS request and is a 32-bit quantity. Silicon Graphics will implement ONC+ features as soon as they are made available.

8.11 MIPS Applications Binary Interface (ABI)

One of the most important attributes of SVR4 is the definition of an Applications Binary Interface (ABI). The ABI permits application migration among all of the various implementations of SVR4 for a given processor family. The ABI is defined by two documents:

Both documents are published as part of the Prentice Hall/UNIX Press series of books for SVR4. See "References" at the end of this chapter for more information.

Compliance with the SVR4 ABI is tested with a variety of commercial and public-domain test packages. One of the most comprehensive is the Generic ABI Test Suite (gABI) developed by UNISOFT, a comprehensive test of the compliance of the system with the SVR4 ABI as defined by the System V Interface Definition Issue 3.0 (SVID 3.0). The test suite consists of over 7,800 tests of the details of ABI compliance.

No amount of care in writing specifications and in implementing systems based on them can ensure that binary executables will indeed execute on all systems. To address this issue, the community building SVR4 systems for MIPS processors has joined together in an organization called the MIPS ABI Group. The goal of the group has been to iron out the subtle inconsistencies between their systems to ensure that they do indeed support binary compatibility. The effort includes major Independent Software Vendors (ISVs) who are porting their packages to the systems. Silicon Graphics has supported this effort, including the setting up of laboratory space on the Mountain View campus to house a collection of systems from the member companies for testing of interoperability. IRIX 6.0.1 meets the ABI and can run the "shrink-wrapped" versions of the ISV packages. The reaction from the ISV community to this effort has been very positive, which will translate into a broader range of applications becoming available for IRIX, allowing us to better serve expanded markets.

8.12 IRIX 4 and IRIX 5 Binary Compatibility

One of the hallmarks of IRIX has been the ability to maintain binary upward compatibility between major operating system upgrades. This compatibility dates back to IRIX 2.2 on early Silicon Graphics platforms. Because IRIX 6.0.1 can execute IRIX 4, IRIX 5, and MIPS ABI binaries, it preserves investments in legacy applications, and allows the user to selectively port the applications that benefit the most from IRIX 6.0.1 and the R8000 processor.

8.13 References
