
Sunday, November 28

Colossus computer

A Colossus Mark 2 computer. The operator on the left is Dorothy Duboisson. The slanted control panel on the left was used to set the pin patterns on the Lorenz; the paper tape transport is on the right.
The Colossus machines were electronic computing devices used by British codebreakers to help read encrypted German messages during World War II. They were the world's first programmable, digital, electronic computing devices, using vacuum tubes (thermionic valves) to perform the calculations.
Colossus was designed by engineer Tommy Flowers with input from Harry Fensom, Allen Coombs, Sid Broadhurst and Bill Chandler at the Post Office Research Station, Dollis Hill to solve a problem posed by mathematician Max Newman at Bletchley Park. The prototype, Colossus Mark 1, was shown to be working in December 1943 and was operational at Bletchley Park by February 1944. An improved Colossus Mark 2 first worked on 1 June 1944, just in time for the Normandy Landings. Ten Colossi were in use by the end of the war.
The Colossus computers were used to help decipher teleprinter messages which had been encrypted using the Lorenz SZ40/42 machine—British codebreakers referred to encrypted German teleprinter traffic as "Fish" and called the SZ40/42 machine and its traffic "Tunny". Colossus compared two data streams, counting each match based on a programmable Boolean function. The encrypted message was read at high speed from a paper tape. The other stream was generated internally, and was an electronic simulation of the Lorenz machine at various trial settings. If the match count for a setting was above a certain threshold, it would be sent as output to an electric typewriter.
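The comparing-and-counting principle described above can be sketched in a few lines of modern code. This is a loose illustration only: the stream lengths, the threshold, and the `agree` predicate are invented for the example, not taken from Colossus's actual run configurations.

```python
import random

def count_matches(cipher_stream, key_stream, predicate):
    """Count positions at which a programmable Boolean predicate
    holds for the pair of 5-bit characters from the two streams."""
    return sum(1 for c, k in zip(cipher_stream, key_stream) if predicate(c, k))

# Illustrative predicate: the two characters agree exactly.
agree = lambda c, k: (c ^ k) == 0

rng = random.Random(42)
cipher = [rng.randrange(32) for _ in range(5000)]  # 5-bit teleprinter characters

# A wrong trial setting matches only by chance (about 1 character in 32);
# the correct setting scores far above any plausible threshold.
wrong_key = [rng.randrange(32) for _ in range(5000)]
print(count_matches(cipher, wrong_key, agree))   # roughly 5000/32, i.e. about 156
print(count_matches(cipher, cipher, agree))      # 5000
```

In Colossus the predicate was set up on switches and plug panels rather than expressed in software, and the "internally generated" stream came from electronic wheel-pattern generators rather than a list in memory.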
The Colossus was used to find possible key settings for the Lorenz machines, rather than to decrypt an intercepted message in its entirety.
The Colossus hardware and blueprints were destroyed as part of an effort to maintain project secrecy that was kept up into the 1970s, a secrecy that deprived some of the Colossus creators of credit, during their lifetimes, for their pioneering advances in electronic digital computing. In spite of this, a functional replica of a Colossus computer was completed in 2007.

Purpose and origins

The Lorenz machine was used by the Germans to encrypt high-level teleprinter communications. It contained 12 wheels with a total of 501 pins.
The Colossus computers were used in the cryptanalysis of high-level German communications, messages which had been encrypted using the Lorenz SZ 40/42 cipher machine; part of the operation of Colossus was to emulate the electromechanical Lorenz machine electronically. To encrypt a message with the Lorenz machine, the plaintext was combined with a stream of key bits, grouped in fives. The keystream was generated using twelve pinwheels: five were termed (by the British) χ ("chi") wheels, another five ψ ("psi") wheels, and the remaining two the "motor wheels". The χ wheels stepped regularly with each letter that was encrypted, while the ψ wheels stepped irregularly, controlled by the motor wheels.
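The stepping behaviour described above can be modelled in miniature. The sketch below is a deliberately simplified toy: the wheel sizes and pin patterns are invented, there is a single motor wheel rather than two, and none of the real SZ40/42's 501-pin configuration is reproduced. It only illustrates the structure: key = chi XOR psi, chi wheels stepping every character, psi wheels stepping irregularly.

```python
class ToyLorenz:
    """Highly simplified model of the SZ40/42 keystream generator.
    Wheel sizes and pin patterns are invented for illustration."""
    def __init__(self, chi_pins, psi_pins, motor_pins):
        self.chi = [list(p) for p in chi_pins]    # 5 chi wheels (bit patterns)
        self.psi = [list(p) for p in psi_pins]    # 5 psi wheels
        self.motor = list(motor_pins)             # single toy motor wheel
        self.chi_pos = [0] * 5
        self.psi_pos = [0] * 5
        self.motor_pos = 0

    def next_key(self):
        # Current key character: chi XOR psi, one bit per channel.
        bits = [w[p] ^ v[q] for w, p, v, q in
                zip(self.chi, self.chi_pos, self.psi, self.psi_pos)]
        key = sum(b << i for i, b in enumerate(bits))
        # Chi wheels step regularly, with every character.
        self.chi_pos = [(p + 1) % len(w) for p, w in zip(self.chi_pos, self.chi)]
        # Psi wheels step only when the motor pin is active (irregular motion).
        if self.motor[self.motor_pos]:
            self.psi_pos = [(p + 1) % len(w) for p, w in zip(self.psi_pos, self.psi)]
        self.motor_pos = (self.motor_pos + 1) % len(self.motor)
        return key

def encrypt(machine, plaintext_codes):
    # Vernam cipher: ciphertext = plaintext XOR keystream (5-bit characters).
    return [p ^ machine.next_key() for p in plaintext_codes]
```

Because XOR is its own inverse, feeding the ciphertext through a second, identically configured machine recovers the plaintext; this symmetry is what allowed Colossus to test trial settings by re-generating candidate keystreams.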
Bill Tutte, a cryptanalyst at Bletchley Park, discovered that the keystream produced by the machine exhibited statistical biases deviating from random, and that these biases could be used to break the cipher and read messages. Reading a message required two tasks. The first was wheel breaking: discovering the pin patterns for all the wheels. These patterns were set up once on the Lorenz machine and then used for a fixed period of time and for a number of different messages. The second was wheel setting, which could be attempted once the pin patterns were known. Each message encrypted using Lorenz was enciphered at a different start position for the wheels; the process of wheel setting found the start position for a message. Initially Colossus was used to help with wheel setting, but later it was found that it could be adapted to wheel breaking as well.
Colossus was developed for the Newmanry, the section at Bletchley Park responsible for machine methods against the Lorenz machine, headed by the mathematician Max Newman. It arose out of a prior project which produced a special purpose opto-mechanical comparator and counting machine called "Heath Robinson".
The main problems with the Heath Robinson were the relative slowness of its electro-mechanical relays and the difficulty of synchronising two paper tapes, one punched with the enciphered message, the other representing the patterns produced by the wheels of the Lorenz machine. The tapes tended to stretch when being read at some 2,000 characters per second, resulting in unreliable counts. Tommy Flowers of the Post Office Research Station at Dollis Hill was called in to look into the design of the Robinson's combining unit. He was not impressed with the machines and, on his own initiative, designed an electronic machine which stored the data from one of the tapes internally. He presented this design to Max Newman in February 1943, but the idea that the one to two thousand thermionic valves (vacuum tubes) proposed could work together reliably was greeted with scepticism, so more Robinsons were ordered from Dollis Hill. Flowers, however, persisted with the idea and obtained support from the Director of the Research Station.

The construction of Colossus

Tommy Flowers spent eleven months (early February 1943 to early January 1944) designing and building Colossus at the Post Office Research Station, Dollis Hill, in northwest London. After a functional test in December 1943, Colossus was dismantled and shipped north to Bletchley Park, where it was delivered on 18 January 1944 and assembled by Harry Fensom and Don Horwood; it attacked its first message on 5 February.
The Mark 1 was followed by nine Mark 2 Colossus machines, the first being commissioned in June 1944; the original Mark 1 machine was itself converted into a Mark 2. An eleventh Colossus was essentially finished at the end of the war. Colossus Mark 1 contained 1,500 electronic valves (tubes); Colossus Mark 2, with 2,400 valves, was both five times faster and simpler to operate than Mark 1, greatly speeding the decoding process. Mark 2 was designed while Mark 1 was being constructed, and Allen Coombs took over leadership of the Colossus Mark 2 project when Tommy Flowers moved on to other projects. For comparison, later stored-program computers such as the Manchester Mark 1 of 1949 used about 4,200 valves, while ENIAC (1946) used 17,468 valves but, unlike Colossus, was not a software-programmable machine.
Colossus dispensed with the second tape of the Heath Robinson design by generating the wheel patterns electronically, and processed 5,000 characters per second with the paper tape moving at 40 ft/s (12.2 m/s or 27.3 mph). The circuits were synchronised by a clock signal generated by the sprocket holes of the punched tape, so the speed of calculation was limited by the mechanics of the tape reader. Tommy Flowers tested the tape reader up to 9,700 characters per second (53 mph) before the tape disintegrated, and settled on 5,000 characters per second as the desirable speed for regular operation. Sometimes two or more Colossus computers tried different possibilities simultaneously, in what is now called parallel computing, perhaps doubling the rate of comparison.
Colossus included the first ever use of shift registers and systolic arrays, enabling five simultaneous tests, each involving up to 100 Boolean calculations, on each of the five channels on the punched tape (although in normal operation only one or two channels were examined in any run).
Initially Colossus was only used to determine the initial wheel positions used for a particular message (termed wheel setting). The Mark 2 included mechanisms intended to help determine pin patterns (wheel breaking). Both models were programmable using switches and plug panels in a way the Robinsons had not been.

Design and operation



In 1994, a team led by Tony Sale (right) began a reconstruction of a Colossus at Bletchley Park. Here, in 2006, Sale supervises the breaking of an enciphered message with the completed machine.
Colossus used state-of-the-art vacuum tubes (thermionic valves), thyratrons and photomultipliers to optically read a paper tape and then applied a programmable logical function to every character, counting how often this function returned "true". Although machines with many valves were known to have high failure rates, it was recognised that valve failures occurred most frequently with the current surge when powering up, so the Colossus machines, once turned on, were never powered down unless they malfunctioned.
Colossus was the first of the electronic digital machines with programmability, albeit limited in modern terms. It was not, however, a fully general Turing-complete computer, even though Alan Turing worked at Bletchley Park. It was not then realized that Turing completeness was significant; most of the other pioneering modern computing machines were also not Turing complete (e.g. the Atanasoff–Berry Computer, the Harvard Mark I electro-mechanical relay machine, the Bell Labs relay machines (by George Stibitz et al.), or the first designs of Konrad Zuse). The notion of a computer as a general purpose machine—that is, as more than a calculator devoted to solving difficult but specific problems—would not become prominent for several years.
Colossus was preceded by several computers, many of them a first in some category. Zuse's Z3 was the first functional fully program-controlled computer and was based on electromechanical relays, as were the (less advanced) Bell Labs machines of the late 1930s (George Stibitz et al.). The Atanasoff–Berry Computer was electronic and binary (digital) but not programmable. Assorted analog computers were semi-programmable; some of these much predated the 1930s (e.g., those of Vannevar Bush). Babbage's Analytical Engine design predated all of these in the mid-19th century; it was a decimal, programmable, entirely mechanical construction, but was only partially built and never functioned during Babbage's lifetime (the first complete mechanical Difference Engine No. 2, built in 1991, does work, however). Colossus was the first machine to combine digital operation, (partial) programmability, and electronic construction. The first fully programmable digital electronic computer was ENIAC, completed in 1946.
Defining characteristics of some early digital computers of the 1940s (in the history of computing hardware):

Name | First operational | Numeral system | Computing mechanism | Programming | Turing complete
Zuse Z3 (Germany) | May 1941 | Binary floating point | Electro-mechanical | Program-controlled by punched film stock (but no conditional branch) | Yes (1998)
Atanasoff–Berry Computer (US) | 1942 | Binary | Electronic | Not programmable—single purpose | No
Colossus Mark 1 (UK) | February 1944 | Binary | Electronic | Program-controlled by patch cables and switches | No
Harvard Mark I – IBM ASCC (US) | May 1944 | Decimal | Electro-mechanical | Program-controlled by 24-channel punched paper tape (but no conditional branch) | No
Colossus Mark 2 (UK) | June 1944 | Binary | Electronic | Program-controlled by patch cables and switches | No
Zuse Z4 (Germany) | March 1945 | Binary floating point | Electro-mechanical | Program-controlled by punched film stock | Yes
ENIAC (US) | July 1946 | Decimal | Electronic | Program-controlled by patch cables and switches | Yes
Manchester Small-Scale Experimental Machine (Baby) (UK) | June 1948 | Binary | Electronic | Stored-program in Williams cathode ray tube memory | Yes
Modified ENIAC (US) | September 1948 | Decimal | Electronic | Program-controlled by patch cables and switches plus a primitive read-only stored programming mechanism using the Function Tables as program ROM | Yes
EDSAC (UK) | May 1949 | Binary | Electronic | Stored-program in mercury delay line memory | Yes
Manchester Mark 1 (UK) | October 1949 | Binary | Electronic | Stored-program in Williams cathode ray tube memory and magnetic drum memory | Yes
CSIRAC (Australia) | November 1949 | Binary | Electronic | Stored-program in mercury delay line memory | Yes

Influence and fate

The use to which the Colossus computers were put was of the highest secrecy, and Colossus itself remained highly secret for many years after the war. Thus Colossus could not be included in accounts of the history of computing hardware for many years, and Flowers and his associates were deprived of the recognition they were due.
Not being widely known, it had little direct influence on the development of later computers; EDVAC was the early design which had the most influence on subsequent computer architecture.
However, the technology of Colossus, and the knowledge that reliable high-speed electronic digital computing devices were feasible, had a significant influence on the development of early computers in Britain and probably in the US. A number of people who were associated with the project and knew all about Colossus played significant roles in early computer work in Britain. In 1972, Herman Goldstine wrote that:
Britain had such vitality that it could immediately after the war embark on so many well-conceived and well-executed projects in the computer field.
In writing that, Goldstine was unaware of Colossus, and its legacy to those projects of people such as Alan Turing (with the Pilot ACE and ACE), and Max Newman and I. J. Good (with the Manchester Mark 1 and other early Manchester computers). Brian Randell later wrote that:
the COLOSSUS project was an important source of this vitality, one that has been largely unappreciated, as has the significance of its place in the chronology of the invention of the digital computer.
Colossus documentation and hardware were classified from the moment of their creation and remained so after the War, when Winston Churchill specifically ordered the destruction of most of the Colossus machines into "pieces no bigger than a man's hand"; Tommy Flowers personally burned blueprints in a furnace at Dollis Hill. Some parts, sanitised as to their original use, were taken to Newman's Royal Society Computing Machine Laboratory at Manchester University. The Colossus Mark 1 was dismantled and parts returned to the Post Office. Two Colossus computers, along with two replica Tunny machines, were retained, moving to GCHQ's new headquarters at Eastcote in April 1946, and moving again with GCHQ to Cheltenham between 1952 and 1954. One of the Colossi, known as Colossus Blue, was dismantled in 1959; the other in 1960. In their later years, the Colossi were used for training, but before that, there had been attempts to adapt them, with varying success, to other purposes. Jack Good relates how he was the first to use it after the war, persuading NSA that Colossus could be used to perform a function for which they were planning to build a special purpose machine. Colossus was also used to perform character counts on one-time pad tape to test for non-randomness.
Throughout this period Colossus remained secret, long after any of its technical details were of any importance. This was due to the UK intelligence agencies' use of Enigma-like machines, which they promoted and sold to other governments and then broke using a variety of methods. Had the existence of the codebreaking machines been widely known, no one would have accepted these machines; rather, governments would have developed their own methods for encryption, methods that the UK services might not have been able to break. The need for such secrecy ebbed away as communications moved to digital transmission and all-digital encryption systems became common in the 1960s.
Information about Colossus began to emerge publicly in the late 1970s, after the secrecy imposed was broken when Colonel Winterbotham published his book The Ultra Secret. More recently, a 500-page technical report on the Tunny cipher and its cryptanalysis – entitled General Report on Tunny – was released by GCHQ to the national Public Record Office in October 2000; the complete report is available online, and it contains a fascinating paean to Colossus by the cryptographers who worked with it:
It is regretted that it is not possible to give an adequate idea of the fascination of a Colossus at work; its sheer bulk and apparent complexity; the fantastic speed of thin paper tape round the glittering pulleys; the childish pleasure of not-not, span, print main header and other gadgets; the wizardry of purely mechanical decoding letter by letter (one novice thought she was being hoaxed); the uncanny action of the typewriter in printing the correct scores without and beyond human aid; the stepping of the display; periods of eager expectation culminating in the sudden appearance of the longed-for score; and the strange rhythms characterizing every type of run: the stately break-in, the erratic short run, the regularity of wheel-breaking, the stolid rectangle interrupted by the wild leaps of the carriage-return, the frantic chatter of a motor run, even the ludicrous frenzy of hosts of bogus scores.

Reconstruction



The Colossus rebuild seen from the rear.
Construction of a fully functional replica of a Colossus Mark 2 was undertaken by a team led by Tony Sale. In spite of the blueprints and hardware being destroyed, a surprising amount of material survived, mainly in engineers' notebooks, much of it in the U.S. The optical tape reader might have posed the biggest problem, but Dr. Arnold Lynch, its original designer, was able to redesign it to his own original specification. The reconstruction is on display, in the historically correct place for Colossus No. 9, at The National Museum of Computing, in H Block, Bletchley Park, Milton Keynes, Buckinghamshire.
In November 2007, to celebrate the project's completion and to mark the start of a fundraising initiative for The National Museum of Computing, a Cipher Challenge pitted the rebuilt Colossus against radio amateurs worldwide in a race to be the first to receive and decode three messages enciphered using the Lorenz SZ42 and transmitted from radio station DL0HNF in the Heinz Nixdorf MuseumsForum computer museum. The challenge was easily won by radio amateur Joachim Schüth, who had carefully prepared for the event and developed his own signal processing and decryption code using Ada. The Colossus team were hampered by their wish to use World War II radio equipment, which delayed them by a day because of poor reception conditions. Nevertheless, the victor's 1.4 GHz laptop, running his own code, took less than a minute to find the settings for all 12 wheels. The German codebreaker said: "My laptop digested ciphertext at a speed of 1.2 million characters per second—240 times faster than Colossus. If you scale the CPU frequency by that factor, you get an equivalent clock of 5.8 MHz for Colossus. That is a remarkable speed for a computer built in 1944."
The Cipher Challenge verified the successful completion of the rebuild project. "On the strength of today's performance Colossus is as good as it was six decades ago", commented Tony Sale. "We are delighted to have produced a fitting tribute to the people who worked at Bletchley Park and whose brainpower devised these fantastic machines which broke these ciphers and shortened the war by many months."


(source:wikipedia)

Virtual memory

In computing, virtual memory is a memory management technique developed for multitasking kernels. This technique virtualizes a computer architecture's various hardware memory devices (such as RAM modules and disk storage drives), allowing a program to be designed as though:
there is only one hardware memory device and this "virtual" device acts like a RAM module.
the program has, by default, sole access to this virtual RAM module as the basis for a contiguous working memory (an address space).
Systems that employ virtual memory:
use hardware memory more efficiently than systems without virtual memory.
make the programming of applications easier by:
hiding fragmentation.
delegating to the kernel the burden of managing the memory hierarchy; there is no need for the program to handle overlays explicitly.
obviating the need to relocate program code or to access memory with relative addressing.
Memory virtualization is a generalization of the concept of virtual memory.
Virtual memory is an integral part of a computer architecture; implementations (excluding emulators and virtual machines) require hardware support, typically in the form of a memory management unit built into the CPU. Consequently, older operating systems, such as DOS of the 1980s or those for the mainframes of the 1960s, generally have no virtual memory functionality, though notable exceptions include the Atlas, the Burroughs B5000, the IBM System/360 Model 67, the IBM System/370 mainframe systems of the early 1970s, and the Apple Lisa project circa 1980.
Embedded systems and other special-purpose computer systems that require very fast and/or very consistent response times may opt not to use virtual memory due to decreased determinism; virtual memory systems trigger unpredictable interrupts that may produce unwanted "jitter" during I/O operations. This is because embedded hardware costs are often kept low by implementing all such operations with software (a technique called bit-banging) rather than with dedicated hardware. In any case, embedded systems usually have little use for multitasking features or complicated memory hierarchies.

History

In the 1940s and 1950s, before the development of virtual memory, all larger programs had to contain logic for managing two levels of storage (primary and secondary); one such management technique is overlaying. The main reason for introducing virtual memory was therefore not simply to extend primary memory, but to make such an extension as easy as possible for programmers to use.
To allow for multiprogramming and multitasking, many early systems already had the ability to divide the memory between multiple programs even without providing virtual memory; for instance, early models of the PDP-10 provided memory access through what are called base and bounds registers.
Paging was first developed at the University of Manchester as a cheap way to extend the Atlas Computer's working memory by transparently supplementing its 16 thousand words of primary core memory with the additional 96 thousand words of secondary drum memory. Although the first Atlas was commissioned in 1962, working prototypes of paging had reportedly been developed by 1959. Also, Burroughs independently released in 1961 the first commercial computer with virtual memory, the B5000, which used segmentation rather than paging.
Like many technologies in the history of computing, virtual memory was not accepted without challenge. Before it could be implemented in mainstream operating systems, many models, experiments, and theories had to be developed to overcome numerous problems. Dynamic address translation required expensive specialized hardware that was difficult to build. Moreover, initial implementations slightly slowed down access to memory. There were also worries that new system-wide algorithms utilizing secondary storage would be less effective than previously used application-specific algorithms.
By 1969 the debate over virtual memory for commercial computers was over. An IBM research team led by David Sayre showed that their virtual memory overlay system consistently worked better than the best manually controlled systems.
Possibly the first minicomputer to introduce virtual memory was the Norwegian NORD-1. During the 1970s, other minicomputers implemented virtual memory, notably VAX models running VMS.
Virtual memory was introduced to the x86 architecture with the protected mode of the Intel 80286 processor; however, its segment-swapping technique scaled poorly to larger segment sizes. The Intel 80386 introduced paging support underneath the existing segmentation layer, and the page fault exception could be chained with other exceptions without causing a double fault. Loading segment descriptors, however, remained an expensive operation, causing operating system designers to rely strictly on paging rather than a combination of paging and segmentation.

Paged virtual memory

Almost all implementations of virtual memory divide the virtual address space of an application program into pages; a page is a block of contiguous virtual memory addresses. Pages are usually at least 4 KiB (4×1024 bytes) in size, and systems with large virtual address ranges or large amounts of real memory generally use larger page sizes.
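With the common 4 KiB page size mentioned above, a virtual address splits cleanly into a page number and an offset within the page; the low 12 bits are the offset and the remaining bits select the page. A minimal sketch:

```python
PAGE_SIZE = 4096          # 4 KiB pages
OFFSET_BITS = 12          # log2(4096)

def split_address(vaddr):
    """Split a virtual address into (page number, offset within the page)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

page, offset = split_address(0x00402ABC)
print(hex(page), hex(offset))   # 0x402 0xabc
```

Larger page sizes simply move the split point: a 2 MiB page would use 21 offset bits instead of 12.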

Page tables
Almost all implementations use page tables to translate the virtual addresses seen by the application program into physical addresses (also referred to as "real addresses") used by the hardware to process instructions. The hardware that handles this specific translation or mapping function between logical addresses and physical addresses is often known as the Dynamic Address Translation (DAT) box or as the Memory Management Unit (MMU). Each entry in the page table contains a mapping for a virtual page to either a subsidiary page table, the real memory address at which the page is stored, or an indicator that the page is currently held in a disk file. (Although most do, some systems may not support use of a disk file for virtual memory.)
Systems can have one page table for the whole system, a separate page table for each application, a separate page table for each segment, a tree of page tables for large segments or some combination of these. If there is only one, different applications which are running at the same time share a single virtual address space, i.e. they use different parts of a single range of virtual addresses. Systems that use multiple page or segment tables provide multiple virtual address spaces—concurrent applications think they are using the same range of virtual addresses, but their separate page tables redirect to different real addresses.

Dynamic address translation
If, while executing an instruction, a CPU fetches an instruction located at a particular virtual address, or fetches data from a specific virtual address or stores data to a particular virtual address, the virtual address must be translated to the corresponding physical address. This is done by a hardware component, sometimes called a memory management unit, which looks up the real address (from the page table) corresponding to a virtual address and passes the real address to the parts of the CPU which execute instructions. If the page tables indicate that the virtual memory page is not currently in real memory, the hardware raises a page fault exception (special internal signal) which invokes the paging supervisor component of the operating system (see below).
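The lookup-or-fault behaviour can be sketched as follows. This is a toy model: the flat dictionary standing in for the page table and the names are invented for illustration; real MMUs walk multi-level tables in hardware.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    """Signals that the requested virtual page is not in real memory;
    in a real system this exception invokes the paging supervisor."""
    def __init__(self, page):
        super().__init__(f"page {page:#x} not resident")
        self.page = page

def translate(page_table, vaddr):
    """Look up the real address for a virtual address.
    page_table maps page number -> frame number (or None if paged out)."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(page)          # hardware raises; OS handles
    return frame * PAGE_SIZE + offset  # real address = frame base + offset
```

Note that the offset passes through unchanged; only the page-number part of the address is remapped.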

Paging supervisor
This part of the operating system creates and manages page tables. If the address translation hardware raises a page fault exception, the paging supervisor accesses secondary storage, reads in the page containing the required virtual address, updates the page tables to reflect the physical location of the virtual address, and finally tells the dynamic address translation mechanism to restart the request. When all physical memory is already in use, as is typical, the paging supervisor must first free an area in primary storage to hold the swapped-in page. Freeing memory minimally requires updating the page table to say that the page is now only in secondary storage. The supervisor saves time by not writing out pages whose copies in secondary storage are already up to date.
Paging supervisors generally choose the page that has been least recently used, guessing that such pages are less likely to be requested. Every time the dynamic address translation hardware matches a virtual address with a real physical memory address, it time-stamps the page table entry for that virtual address.
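The timestamp bookkeeping described above amounts to something like the following toy model (real hardware typically maintains reference bits rather than full timestamps, but the selection logic is the same):

```python
class LRUPager:
    """Toy paging supervisor using least-recently-used replacement.
    Integer timestamps stand in for hardware-maintained reference times."""
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = {}   # page -> timestamp of last access
        self.clock = 0
        self.faults = 0

    def access(self, page):
        self.clock += 1
        if page not in self.resident:
            self.faults += 1
            if len(self.resident) >= self.num_frames:
                # Evict the page with the oldest timestamp (least recently used).
                victim = min(self.resident, key=self.resident.get)
                del self.resident[victim]
        # Time-stamp the entry on every access, hit or miss.
        self.resident[page] = self.clock
```

For example, with two frames the trace 1, 2, 1, 3 produces three faults, and page 2 (not page 1) is evicted, because page 1 was touched more recently.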

Permanently resident pages
Operating systems have memory areas that are "pinned", i.e., never swapped to secondary storage. For example:
Interrupt mechanisms generally rely on an array of pointers to their handlers (I/O completion, timer event, program error, page fault, etc.). If the pages containing these pointers or the code that they invoke were pageable, interrupt-handling would become far more complex and time-consuming; particularly in the case of page fault interrupts.
At least some part of the page table structures are almost always not pageable.
Data buffers that are accessed directly, for example by peripheral devices that use direct memory access (DMA) or by I/O channels. Usually such devices and the buses (connection paths) to which they are attached use physical memory addresses rather than virtual memory addresses. Even on buses with an IOMMU, which is a special memory management unit that can translate virtual addresses used on an I/O bus to physical addresses, the transfer cannot be stopped if a page fault occurs and then restarted when the page fault has been processed. So pages containing locations to which or from which a peripheral device is transferring data are either permanently pinned down or pinned down while the transfer is in progress.
Timing-dependent kernel/application areas cannot tolerate the varying response time caused by paging. In particular the paging supervisor code or drivers for secondary storage devices must not be swapped out.

Virtual=real operation
In OS/VS1, OS/VS2 (SVS), MVS, z/OS, and similar OSes, some parts of systems memory are managed in virtual=real mode, where every virtual address corresponds to a real address. Those are:
interrupt mechanisms
paging supervisor and page tables, in older systems
application programs that use non-standard methods of managing I/O, e.g., reading into an active channel program.
IBM's z/OS has 3 modes, V=V (virtual=virtual; fully pageable), V=R and V=F (virtual = fixed, i.e. "pinned down" but with DAT operating).

Segmented virtual memory

Some systems, such as the Burroughs B5500, do not use paging to implement virtual memory. Instead, they use segmentation, which divides virtual address spaces into variable-length segments. A virtual address consists of a segment number and an offset within the segment.
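Translation under pure segmentation can be sketched as a base-plus-length lookup with a bounds check (the names and table layout here are invented for illustration, not the B5500's actual descriptor format):

```python
class SegmentationFault(Exception):
    pass

def seg_translate(segment_table, seg_num, offset):
    """Translate a (segment number, offset) pair to a real address.
    segment_table maps segment number -> (base address, segment length)."""
    base, length = segment_table[seg_num]
    if offset >= length:                 # offsets are bounds-checked
        raise SegmentationFault(f"offset {offset} exceeds segment length {length}")
    return base + offset
```

Unlike a page, whose size is fixed, each segment carries its own length, so the hardware can reject out-of-range offsets, which is one reason segmentation doubles as a protection mechanism.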
Notably, the Intel 80286 supports a similar segmentation scheme as an option, but it is little used.
It is possible to combine segmentation and paging, dividing each segment into pages. In systems that combine them, such as Multics, IBM System/38 and IBM System i machines, virtual memory is usually implemented with paging, with segmentation providing memory protection.
In the Intel 80386 and later IA-32 processors, the segments reside in a 32-bit linear, paged address space. Segments can be moved in and out of that space, and pages in that space can "page" in and out of main memory, providing two levels of virtual memory; however, few if any operating systems do so, instead using only paging. Early x86 virtualization solutions (non-hardware-assisted) combined paging and segmentation because x86 paging offers only two protection domains, whereas a VMM / guest OS / guest applications stack (typically running on rings 0 / 1 / 3) needs three memory protection domains.
The difference between paging and segmentation systems is not only about memory division. In Multics, System/38 and Prime machines, segmentation is visible to user processes, as part of memory model semantics. In other words, instead of memory that looks like a single large vector, memory is structured into multiple spaces.
This difference has important consequences; a segment is not just a "page with a variable length", or a simple way to lengthen the address space (as in the Intel 80286). In Multics, segmentation can provide a single-level memory model, in which there is no differentiation between "process memory" and "file system": a process's active address space for both code and data consists only of a list of segments (files) which are mapped into the process's potential address space.
This is not the same as the mechanisms provided in TENEX and TOPS-20 (PMAP), modern Unix-like systems (mmap), and Win32 systems (MapViewOfFile), because inter-file pointers don't work when mapping files into semi-arbitrary places. In Multics, a file (or a segment from a multi-segment file) is mapped into a segment in the address space, so files are always mapped at a segment boundary. A file's linkage section can contain pointers that are "unsnapped links": a field in such a pointer can hold a value specifying that any attempt to dereference it, whether by loading it into a register or by making an indirect reference through it, should cause a trap. The unresolved pointer contains the name of the segment to which it refers and an offset within that segment. The trap handler "snaps the link" by mapping the segment into the address space, putting the segment number into the pointer, and changing the pointer's tag field so that it no longer causes a trap; it then returns to the code where the trap occurred, re-executing the instruction that caused it. This eliminates the need for a linker completely, and it also works when different processes map the same file into different places in their private address spaces.
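The link-snapping behavior can be modeled in a few lines. This is a loose software analogy, with all names and data structures invented for illustration; real Multics pointers are hardware-tagged words and the trap is taken by the processor, not by an explicit check.

```python
address_space = {}       # segment number -> list of words (the "memory")
next_segment = [0]

def map_segment(contents):
    """Simulate mapping a file/segment into the address space."""
    seg = next_segment[0]
    next_segment[0] += 1
    address_space[seg] = list(contents)
    return seg

class Pointer:
    """A pointer that starts as a symbolic (segment-name, offset) pair."""
    def __init__(self, segment_name, offset):
        self.snapped = False
        self.target = (segment_name, offset)   # unsnapped link

    def deref(self, file_system):
        if not self.snapped:                   # first dereference "traps"
            name, offset = self.target
            seg = map_segment(file_system[name])  # map the named segment
            self.target = (seg, offset)        # snap the link
            self.snapped = True
        seg, offset = self.target              # later derefs are direct
        return address_space[seg][offset]

files = {"libm": [3.14, 2.72]}
p = Pointer("libm", 1)
value = p.deref(files)      # traps, maps "libm", then reads word 1
```

After the first dereference the pointer holds a real segment number, so every later dereference skips the resolution step entirely.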

Avoiding thrashing

When paging is used, a potential problem called "thrashing" can occur, in which the computer spends a disproportionate amount of its capacity swapping pages to and from a backing store and therefore performs useful work more slowly. Adding real memory is the simplest response, although improving application design, scheduling, and memory usage can also help.


(source:wikipedia)

Multiprocessing

Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them. There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined (multiple cores on one die, multiple dies in one package, multiple packages in one system unit, etc.).
Multiprocessing sometimes refers to the execution of multiple concurrent software processes in a system as opposed to a single process at any one instant. However, the terms multitasking or multiprogramming are more appropriate to describe this concept, which is implemented mostly in software, whereas multiprocessing is more appropriate to describe the use of multiple hardware CPUs. A system can be both multiprocessing and multiprogramming, only one of the two, or neither of the two.
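The distinction drawn above can be made concrete with Python's standard library: a multiprogramming-style scheduler interleaves tasks on however many CPUs exist, while multiprocessing.Pool actually fans work out across hardware CPUs. A minimal sketch, assuming a POSIX-style system where worker processes are forked:

```python
import multiprocessing
import os

def square(n):
    return n * n

def parallel_squares(values):
    # One worker process per CPU by default; each worker runs on its
    # own hardware CPU when more than one is present.
    with multiprocessing.Pool() as pool:
        return pool.map(square, values)

if __name__ == "__main__":
    print("CPUs available:", os.cpu_count())
    print(parallel_squares(range(8)))
```

Whether this is "multiprocessing" in the hardware sense depends on os.cpu_count(): on a single-CPU machine the same code is merely multiprogrammed.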

Types


Processor symmetry
In a multiprocessing system, all CPUs may be equal, or some may be reserved for special purposes. A combination of hardware and operating-system software design considerations determine the symmetry (or lack thereof) in a given system. For example, hardware or software considerations may require that only one CPU respond to all hardware interrupts, whereas all other work in the system may be distributed equally among CPUs; or execution of kernel-mode code may be restricted to only one processor (either a specific processor, or only one processor at a time), whereas user-mode code may be executed in any combination of processors. Multiprocessing systems are often easier to design if such restrictions are imposed, but they tend to be less efficient than systems in which all CPUs are utilized.
Systems that treat all CPUs equally are called symmetric multiprocessing (SMP) systems. In systems where all CPUs are not equal, system resources may be divided in a number of ways, including asymmetric multiprocessing (ASMP), non-uniform memory access (NUMA) multiprocessing, and clustered multiprocessing (qq.v.).

Instruction and data streams
In multiprocessing, the processors can be used to execute a single sequence of instructions in multiple contexts (single-instruction, multiple-data or SIMD, often used in vector processing), multiple sequences of instructions in a single context (multiple-instruction, single-data or MISD, used for redundancy in fail-safe systems and sometimes applied to describe pipelined processors or hyper-threading), or multiple sequences of instructions in multiple contexts (multiple-instruction, multiple-data or MIMD).

Processor coupling
Tightly-coupled multiprocessor systems contain multiple CPUs that are connected at the bus level. These CPUs may have access to a central shared memory (SMP or UMA), or may participate in a memory hierarchy with both local and shared memory (NUMA). The IBM p690 Regatta is an example of a high end SMP system. Intel Xeon processors dominated the multiprocessor market for business PCs and were the only x86 option until the release of AMD's Opteron range of processors in 2004. Both ranges of processors had their own onboard cache but provided access to shared memory; the Xeon processors via a common pipe and the Opteron processors via independent pathways to the system RAM.
Chip multiprocessing, also known as multi-core computing, involves placing more than one processor on a single chip and can be thought of as the most extreme form of tightly-coupled multiprocessing. Mainframe systems with multiple processors are often tightly-coupled.
Loosely-coupled multiprocessor systems (often referred to as clusters) are based on multiple standalone single or dual processor commodity computers interconnected via a high speed communication system (Gigabit Ethernet is common). A Linux Beowulf cluster is an example of a loosely-coupled system.
Tightly-coupled systems perform better and are physically smaller than loosely-coupled systems, but have historically required greater initial investments and may depreciate rapidly; nodes in a loosely-coupled system are usually inexpensive commodity computers and can be recycled as independent machines upon retirement from the cluster.
Power consumption is also a consideration. Tightly-coupled systems tend to be much more energy efficient than clusters. This is because considerable economies can be realized by designing components to work together from the beginning in tightly-coupled systems, whereas loosely-coupled systems use components that were not necessarily intended specifically for use in such systems.

Software implementation issues

Flynn's taxonomy

                  Single Instruction   Multiple Instruction
  Single Data     SISD                 MISD
  Multiple Data   SIMD                 MIMD



SISD multiprocessing

In a single instruction stream, single data stream computer, one processor sequentially processes instructions, and each instruction processes one data item. One example is the classic "von Neumann" architecture, as found in simple RISC processors.

SIMD multiprocessing

In a single instruction stream, multiple data stream computer, one processor handles a stream of instructions, each one of which can perform calculations in parallel on multiple data locations.
SIMD multiprocessing is well suited to parallel or vector processing, in which a very large set of data can be divided into parts that are individually subjected to identical but independent operations. A single instruction stream directs the operation of multiple processing units to perform the same manipulations simultaneously on potentially large amounts of data.
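The data-parallel pattern described above can be sketched in software: one operation (the "instruction") is broadcast lane-by-lane across a whole vector of data. Real SIMD hardware performs all lanes in a single instruction; here the lanes are modeled as a Python list and the broadcast as a loop, so this shows the programming model rather than the hardware.

```python
def simd_apply(instruction, vector):
    """Apply one identical, independent operation to every data lane."""
    return [instruction(lane) for lane in vector]

# The same "add 5" instruction is applied across eight data items;
# on SIMD hardware the eight additions would happen simultaneously.
data = [1, 2, 3, 4, 5, 6, 7, 8]
result = simd_apply(lambda x: x + 5, data)
```

The key property, visible even in this toy form, is that no lane's result depends on any other lane, which is what makes the simultaneous execution legal.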
For certain types of computing applications, this type of architecture can produce enormous increases in performance, in terms of the elapsed time required to complete a given task. However, a drawback to this architecture is that a large part of the system falls idle when programs or system tasks are executed that cannot be divided into units that can be processed in parallel.
Additionally, programs must be carefully and specially written to take maximum advantage of the architecture, and often special optimizing compilers designed to produce code specifically for this environment must be used. Some compilers in this category provide special constructs or extensions to allow programmers to directly specify operations to be performed in parallel (e.g., DO FOR ALL statements in the version of FORTRAN used on the ILLIAC IV, which was a SIMD multiprocessing supercomputer).
SIMD multiprocessing finds wide use in certain domains such as computer simulation, but is of little use in general-purpose desktop and business computing environments.

MISD multiprocessing

MISD multiprocessing offers mainly the advantage of redundancy, since multiple processing units perform the same tasks on the same data, reducing the chances of incorrect results if one of the units fails. MISD architectures may involve comparisons between processing units to detect failures. Apart from the redundant and fail-safe character of this type of multiprocessing, it has few advantages, and it is very expensive. It does not improve performance. It can be implemented in a way that is transparent to software. It is used in array processors and is implemented in fault tolerant machines.
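The redundancy scheme can be sketched as majority voting: several independent "processing units" compute the same result from the same data, and a voter compares their outputs so one faulty unit cannot silently corrupt the answer. The three units and the injected fault below are invented for illustration.

```python
from collections import Counter

def unit_a(x): return x * x
def unit_b(x): return x * x
def unit_c(x): return x * x + 1      # a faulty unit

def vote(units, x):
    """Run every unit on the same data and take the majority output."""
    outputs = [unit(x) for unit in units]
    winner, count = Counter(outputs).most_common(1)[0]
    if count <= len(units) // 2:
        raise RuntimeError("no majority: too many units disagree")
    return winner

result = vote([unit_a, unit_b, unit_c], 6)   # faulty unit_c is outvoted
```

Note that the voter adds no performance: three units do the work of one, which is exactly the trade-off described above.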

MIMD multiprocessing

MIMD multiprocessing architecture is suitable for a wide variety of tasks in which completely independent and parallel execution of instructions touching different sets of data can be put to productive use. For this reason, and because it is easy to implement, MIMD predominates in multiprocessing.
Processing is divided into multiple threads, each with its own hardware processor state, within a single software-defined process or within multiple processes. Insofar as a system has multiple threads awaiting dispatch (either system or user threads), this architecture makes good use of hardware resources.
MIMD does raise issues of deadlock and resource contention, however, since threads may collide in their access to resources in an unpredictable way that is difficult to manage efficiently. MIMD requires special coding in the operating system of a computer but does not require application changes unless the programs themselves use multiple threads (MIMD is transparent to single-threaded programs under most operating systems, if the programs do not voluntarily relinquish control to the OS). Both system and user software may need to use software constructs such as semaphores (also called locks or gates) to prevent one thread from interfering with another if they should happen to cross paths in referencing the same data. This gating or locking process increases code complexity, lowers performance, and greatly increases the amount of testing required, although not usually enough to negate the advantages of multiprocessing.
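The gating just described can be shown with Python's threading primitives (a sketch of the locking pattern, not of any particular operating system's implementation): two threads increment a shared counter, and the lock makes each read-modify-write sequence atomic so no update is lost when their paths cross.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:            # the "gate": only one thread at a time
            counter += 1      # read, add, write back atomically

threads = [threading.Thread(target=worker, args=(100_000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, counter is exactly 200000; without it, interleaved
# read-modify-write sequences could silently drop increments.
```

The cost is also visible here: every increment now includes a lock acquire and release, which is the code-complexity and performance overhead the paragraph above refers to.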
Similar conflicts can arise at the hardware level between processors (cache contention and corruption, for example), and must usually be resolved in hardware, or with a combination of software and hardware (e.g., cache-clear instructions).


(source:wikipedia)