Electronic Design

  


[Design View / Design Solution]
Master On-Chip Embedded Multiprocessor Coherence
Although snoopy virtual-bus approaches are the first step, hybrid snoopy-directory schemes will be the next trend in embedded coherence.

Sanjay Vishin  |   ED Online ID #12219  |   March 30, 2006


Without a doubt, embedded systems-on-a-chip (SoCs) are becoming "software-rich,"1 and they're incorporating more and more processors on one chip. The driving forces behind these changes are advances in fabrication technology (propelled by Moore's Law), together with the need to address short time-to-market pressures, greater design complexity, and the amortization of high-cost ASIC fabrication through design reuse.

There's also the economic benefit of higher performance with backward-compatibility to a single-threaded model of computation (the so-called Von Neumann model). That model has long plagued general-purpose computing. Now, such a performance benefit becomes applicable to high-throughput, software-rich embedded SoCs. Examples include high-end set-top boxes, smart phones, automotive media centers, and printer/copier stations.

Current high-end embedded SoCs are mostly heterogeneous. The processors on these SoCs communicate through noncoherent, shared memory using some form of message passing. The classic RISC/DSP combination in a third-generation cell phone communicating through a dual-ported SRAM and interrupts represents a good example of these simple schemes.

When sheer clock-speed scaling ran out of steam, maintaining this single-threaded programming abstraction forced general-purpose uniprocessor designers to resort to dual- or quad-processor coherent systems. The same will happen for these software-rich, high-performance embedded systems—with slight modifications.

Future high-performance SoCs will be hierarchical and heterogeneous systems of processors with coherent clusters of homogeneous multiprocessors embedded in the hierarchy. Some of this transition already has been observed in one specific high-performance embedded market: networking (in the form of coherent network multiprocessors).2,3

The exact nature of future embedded chip multiprocessors (CMPs) is debatable (heterogeneous versus heterogeneous with hierarchical homogeneous processors). But for many of them, shared memory with coherence will be an important issue.

Definition And Basics
A multicore shared-memory system with caches is considered to be cache-coherent if the value returned by any Load (issued by a processor) is always the value of the latest Store to that memory location. To address the ambiguity of the term "latest Store," we're forced to take a small diversion into memory models. Consider a common memory model such as sequential consistency (SC): for any execution of a parallel program on an SC system, it's possible to construct a global serial order of all operations (mainly Loads and Stores) to a location that is consistent with the results of that execution. Coherence then implies:

  • The Loads and Stores from each processor appear in the system's global serial order in the same order in which that processor issued them to the memory system.
  • The value returned by each Load from a processor in the system is the value written by the last Store to that location in the global serial order.
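
The two conditions above can be sketched as a small check, here in Python. This is only an illustration under stated assumptions: the `Op` record, its field names, and the `is_coherent` function are hypothetical and not part of any real tool. Note that the check validates one proposed global serial order. An execution is coherent if at least one such order exists.

```python
from typing import Dict, List, NamedTuple


class Op(NamedTuple):
    """One memory operation (hypothetical record used for illustration)."""
    proc: int    # issuing processor
    seq: int     # position in that processor's issue order
    kind: str    # "load" or "store"
    addr: str    # memory location
    value: int   # value stored, or value the load returned


def is_coherent(global_order: List[Op], initial: int = 0) -> bool:
    """Check both conditions against one proposed global serial order."""
    last_seq: Dict[int, int] = {}   # highest seq seen so far, per processor
    memory: Dict[str, int] = {}     # value of the last store, per location
    for op in global_order:
        # Condition 1: each processor's operations appear in issue order.
        if op.seq <= last_seq.get(op.proc, -1):
            return False
        last_seq[op.proc] = op.seq
        if op.kind == "store":
            memory[op.addr] = op.value
        # Condition 2: a load returns the value of the last store in the order.
        elif op.value != memory.get(op.addr, initial):
            return False
    return True


# Processor 1 observes each store made by processor 0 as it happens.
good = [Op(0, 0, "store", "x", 1), Op(1, 0, "load", "x", 1),
        Op(0, 1, "store", "x", 2), Op(1, 1, "load", "x", 2)]
# Processor 1 reads a stale value after a newer store in this order.
stale = [Op(0, 0, "store", "x", 1), Op(0, 1, "store", "x", 2),
         Op(1, 0, "load", "x", 1)]
```

With these inputs, `is_coherent(good)` returns `True` and `is_coherent(stale)` returns `False`.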

Therefore, the term "global serial order" is a product of the memory consistency model (memory model for short) implemented by the system (informally termed Weak, Strong...). The memory model is analogous to the instruction set architecture (ISA) for single processors, which defines the operational contract between the compiler and the hardware (Fig. 1).

The memory model defines the contract between the programmer and the memory system for a multiprocessor or, more generally speaking, a multithreaded system. Hence, multithreaded languages like Java also have a defined memory model. In this article, most occurrences of multiprocessing can be substituted with multithreading.

SC, total store ordering (TSO), and processor consistency (PC) are some of the common memory models at the machine level (from strong to weak). A stronger model imposes more constraints on the parallel memory-system implementer, which makes the job of the parallel middleware or system-library writer a bit simpler.
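
The difference between a stronger and a weaker model can be seen in the classic "store buffering" test, sketched below in Python. The sketch rests on assumptions: SC is modeled as Stores reaching memory immediately, and TSO is approximated by a deliberately simplified operational model (one FIFO store buffer per processor, with a Load checking its own buffer first). The names `PROGRAMS` and `outcomes` are hypothetical. The function enumerates every interleaving and collects the final register values.

```python
# Each program lists instructions in issue order:
#   ("store", addr, value)  or  ("load", addr, register_name)
PROGRAMS = (
    (("store", "x", 1), ("load", "y", "r0")),   # processor 0
    (("store", "y", 1), ("load", "x", "r1")),   # processor 1
)


def outcomes(programs, store_buffers):
    """Return every reachable final register state as a tuple (r0, r1, ...)."""
    results = set()

    def step(pcs, buffers, memory, regs):
        done = True
        for p, prog in enumerate(programs):
            # Option A: processor p executes its next instruction.
            if pcs[p] < len(prog):
                done = False
                kind, addr, arg = prog[pcs[p]]
                new_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                if kind == "store" and store_buffers:
                    # Simplified TSO: the store waits in p's FIFO buffer.
                    queued = buffers[p] + ((addr, arg),)
                    step(new_pcs, buffers[:p] + (queued,) + buffers[p + 1:],
                         memory, regs)
                elif kind == "store":
                    # SC: the store is globally visible immediately.
                    step(new_pcs, buffers, {**memory, addr: arg}, regs)
                else:
                    # Load: newest matching entry in own buffer, else memory.
                    pending = [v for a, v in buffers[p] if a == addr]
                    value = pending[-1] if pending else memory.get(addr, 0)
                    step(new_pcs, buffers, memory, {**regs, arg: value})
            # Option B: drain the oldest entry of p's store buffer to memory.
            if buffers[p]:
                done = False
                (addr, value), rest = buffers[p][0], buffers[p][1:]
                step(pcs, buffers[:p] + (rest,) + buffers[p + 1:],
                     {**memory, addr: value}, regs)
        if done:  # all programs finished and all buffers empty
            results.add(tuple(regs[r] for r in sorted(regs)))

    n = len(programs)
    step((0,) * n, ((),) * n, {}, {})
    return results


sc = outcomes(PROGRAMS, store_buffers=False)
tso = outcomes(PROGRAMS, store_buffers=True)
```

Under these assumptions, `sc` never contains `(0, 0)`, while `tso` does: both Loads can run while both Stores are still buffered. A library writer who relies on the SC result would need an explicit fence on the weaker system, which is the kind of extra burden the text describes.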

Another way to look at coherence is that it's the weakest form of memory consistency, since it doesn't restrict memory operations any more than what is necessary to provide a reasonable memory system from a single-processor point of view. Informally, stronger models help the programmer by ensuring that a parallel memory system guarantees more than just "Reads return the value from the latest Store." These added guarantees are typically used to form efficient synchronizing constructs between threads or processors.

To achieve coherence, a system must have a few essential properties. For one, Writes to a particular memory location must be serialized at some point in the system. Note that serialization is a logical concept. In some high-performance speculative implementations, it serves only as a guideline for retiring transactions at commit. This is similar to "out-of-order" processors, which maintain a temporary state and an "architectural state" separated by a commit point.

Another property of coherent systems is Write propagation, which implies that a Write needs to eventually propagate to all agents that care about the new value. The third important property (a result of the memory model rather than coherence) is Write atomicity, which implies that a Write needs to be propagated in its entirety to all processors in the system after it's serialized.
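
These three properties can be illustrated with a toy model in Python. It is a minimal sketch under assumptions, not a real protocol: `SerializationPoint` and `Cache` are hypothetical classes, the ordering point stands in for something like a shared bus, and propagation is update-based (an invalidate-based scheme would drop the cached copy instead of updating it).

```python
class Cache:
    """Hypothetical per-processor cache that records what it observes."""

    def __init__(self):
        self.lines = {}      # addr -> most recently observed value
        self.history = []    # every write observed, in arrival order

    def observe(self, addr, value):
        self.lines[addr] = value
        self.history.append((addr, value))

    def load(self, addr):
        return self.lines.get(addr, 0)


class SerializationPoint:
    """Hypothetical single ordering point for all writes (e.g. a bus)."""

    def __init__(self, caches):
        self.caches = caches
        self.log = []        # the one global order of writes

    def write(self, addr, value):
        # Write serialization: appending to a single log fixes one order.
        self.log.append((addr, value))
        # Write propagation: every cache gets to see the new value.
        # Write atomicity: all caches see it before the next write starts.
        for cache in self.caches:
            cache.observe(addr, value)


caches = [Cache(), Cache(), Cache()]
bus = SerializationPoint(caches)
bus.write("x", 1)
bus.write("x", 2)
```

After the two writes, every cache returns 2 for `load("x")`, and each cache's `history` matches `bus.log`, so no two processors can disagree about the order of the Writes.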


    Reader Comments

    Full of buzzwords but the explanations were very good as are the footnoted references.

    Don Wilde - April 05, 2006
