NUMA

By PallabRay3 · 14 slides · Aug 24, 2016




Implementation of Non-Uniform Memory Access (NUMA) Systems
Project submitted by PALLAB KUMAR RAY (ME 2014-10001), under the supervision of Mr. Somak Das (Dept. of CSE/IT)

INTRODUCTION
In a shared memory multiprocessor, all main memory is accessible to and shared by all processors. When the cost of accessing shared memory is the same for every processor, the system is called a UMA (Uniform Memory Access) system. A particular category of shared memory multiprocessor is NUMA, or Non-Uniform Memory Access: a shared memory architecture in which access time depends on the placement of main memory modules relative to the processors. As with most other processor architectural features, ignorance of NUMA can result in sub-par application memory performance.

NUMA
In the NUMA shared memory architecture, each processor has its own local memory module that it can access directly, with a distinct performance advantage. At the same time, it can also access any memory module belonging to another processor over a shared bus (or some other type of interconnect), as seen in the diagram.
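The local/remote distinction above can be illustrated with a toy model. This is only a sketch: the node numbering and the latency figures (10 ns local, 40 ns remote) are invented for illustration, not measured NUMA timings.

```python
# Toy model of NUMA access cost: each processor lives on a node with its
# own local memory; touching another node's memory crosses the
# interconnect and costs more. Latency numbers are illustrative only.

LOCAL_NS = 10    # assumed local-access latency (nanoseconds)
REMOTE_NS = 40   # assumed remote-access latency (nanoseconds)

def access_cost(cpu_node: int, mem_node: int) -> int:
    """Return the modeled access latency in nanoseconds."""
    return LOCAL_NS if cpu_node == mem_node else REMOTE_NS

# A processor on node 0 touching node-0 memory vs node-1 memory:
print(access_cost(0, 0))  # local access
print(access_cost(0, 1))  # remote access over the interconnect
```

The ratio of remote to local latency (here 4:1) is often called the NUMA factor; real systems typically sit closer to 1.5:1–3:1, but the exact value is hardware-specific.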

Shared Memory
In contrast to "shared nothing" architectures, memory is globally accessible under shared memory. Communication is anonymous: there is no explicit recipient of a shared memory access, as there is in message passing, and processors may communicate without necessarily being aware of one another. Shared memory provides two services:
a. Direct access to another processor's local memory.
b. Automatic address mapping of a (virtual) memory address onto a (processor, local memory address) pair.

Convergence of parallel architectures: While cc-NUMA architectures add specialized support for shared memory, e.g. coherence control, they still rely on fine-grained message passing involving short messages, as do single-sided architectures. So designs appear to be converging, with the important details handled through a combination of software and specialized hardware support.

The Cache Coherence Problem: Owing to the use of cache memories in modern computer architectures, shared memory introduces the cache coherence problem. Cache coherence arises with shared data that is both written and read. If one processor modifies a shared cached value, then the other processor(s) must get the latest value. Coherence says nothing about when changes propagate through the memory subsystem, only that they will eventually happen. Other steps must be taken (usually in software) to avoid race conditions that could lead to non-deterministic program behavior.

Program order: If a processor writes and then reads the same location X, and there are no intervening writes to X by other processors, then the read will always return the value previously written.

Definition of a coherent view of memory: If a processor P reads from a location X that was previously written by a processor Q, then the read will return the value previously written, provided a sufficient amount of time has elapsed between the write and the read.
Serialization of writes: Multiple writes to a location X happen sequentially. If two processors write to the same location, then other processors reading X will observe the same sequence of values, in the order written. If a 10 and then a 20 are written to X, it is not possible for any processor to read 20 and then 10.
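The coherence problem described above can be made concrete with a small sketch. This models two processors with private caches and no coherence protocol at all (the caches and access functions are invented for illustration): after one processor writes, the other keeps reading its stale cached copy.

```python
# Sketch of the cache coherence problem: two processors cache the same
# shared location X. Without a coherence protocol, a write by one
# processor updates its own cache and main memory, but the other
# processor's cached copy is never refreshed or invalidated.

memory = {"X": 10}   # shared main memory
cache_p = {}         # processor P's private cache
cache_q = {}         # processor Q's private cache

def read(cache, addr):
    if addr not in cache:        # miss: fetch from main memory
        cache[addr] = memory[addr]
    return cache[addr]           # hit: use the cached copy, fresh or not

def write_no_coherence(cache, addr, value):
    cache[addr] = value          # update own cache only
    memory[addr] = value         # write through to main memory

read(cache_p, "X")                    # P caches X = 10
read(cache_q, "X")                    # Q caches X = 10
write_no_coherence(cache_p, "X", 20)  # P writes 20; Q is not told
print(read(cache_q, "X"))             # Q still sees the stale 10
```

This is exactly the situation a coherence protocol must prevent: Q's cached 10 violates the coherent-view property once enough time has passed after P's write.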

Managing coherence
There are two major strategies for managing coherence:
a. Snooping protocol. In this bus-based scheme, processors passively listen for bus activity, updating or invalidating cache entries as necessary. The scheme is ultimately non-scalable and is not appropriate for machines with tens of processors or more.
b. Directory-based. This is a scalable scheme employing point-to-point messages to handle coherence. A memory structure called a directory maintains information about data sharing. This scheme was first applied to cache-coherent multiprocessors by the DASH project at Stanford and is used in the SGI Altix 3000, which is a cc-NUMA architecture.
NUMA stands for Non-Uniform Memory Access: the time to access memory depends on the location of the processor and the address accessed. The three goals of DASH are:
- Scalable memory bandwidth
- Scalable cost (use commodity parts)
- Dealing with large memory latencies
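The snooping idea in (a) can be sketched in a few lines. This is a deliberately minimal model, not a real MSI/MESI protocol: it only shows the invalidation step, where every cache watches the shared bus and drops its copy of any address another cache writes, so the next read misses and fetches the fresh value.

```python
# Minimal sketch of bus snooping: every cache registers on a shared bus
# and, when another cache broadcasts a write, invalidates its own copy
# of that address. A real protocol (MSI/MESI) also tracks per-line
# states; this models only write-broadcast and invalidation.

memory = {"X": 10}   # shared main memory

class SnoopingCache:
    def __init__(self, bus):
        self.data = {}
        self.bus = bus
        bus.append(self)             # join the shared bus

    def read(self, addr):
        if addr not in self.data:    # miss: fetch from main memory
            self.data[addr] = memory[addr]
        return self.data[addr]

    def write(self, addr, value):
        memory[addr] = value
        self.data[addr] = value
        for cache in self.bus:       # broadcast on the bus:
            if cache is not self:    # every other cache snoops the write
                cache.data.pop(addr, None)  # ...and invalidates its copy

bus = []
p, q = SnoopingCache(bus), SnoopingCache(bus)
p.read("X"); q.read("X")             # both cache X = 10
p.write("X", 20)                     # bus write invalidates Q's copy
print(q.read("X"))                   # Q misses, re-fetches, sees 20
```

Because every write is broadcast to all caches, bus traffic grows with the processor count, which is why the slide calls snooping non-scalable and why DASH moved to a directory.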

Memory Architecture
As the block diagram shows, the memory unit has an 8-bit address bus, two 8-bit data buses (one for input and one for output), a clock input, and a 1-bit write-enable input. When write_enable is set HIGH (1), the incoming data (arriving on the data_in bus) is first stored at the memory address specified by the address bus; the newly written data is then fetched from the same address and output on the data_out bus.
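The behavior just described can be sketched in Python as a software model of the chip (a sketch mirroring the slide's write-then-read semantics; the class and method names are invented, while the 8-bit widths and the address/data values follow the slides):

```python
# Model of the slide's synchronous memory unit: 8-bit address bus,
# 8-bit data_in and data_out buses, 1-bit write_enable. On each clock
# tick, data_in is stored when write_enable is high, and data_out is
# then driven from memory[address].

class MemoryChip:
    def __init__(self):
        self.memory = [0] * 256      # 2^8 addresses, 8-bit words
        self.data_out = 0

    def clock(self, address, data_in, write_enable):
        """Simulate one clock tick; return the new data_out value."""
        if write_enable:
            self.memory[address] = data_in & 0xFF  # store first...
        self.data_out = self.memory[address]       # ...then read back
        return self.data_out

chip = MemoryChip()
chip.clock(0xA6, 0x9F, write_enable=1)              # store 9FH at A6H
print(hex(chip.clock(0xA6, 0x00, write_enable=0)))  # prints 0x9f
```

This matches the worked example later in the deck: after 9FH is written to address A6H, a read of A6H returns 9FH regardless of what is on data_in.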

Circuit Diagram
The chip has a write_enable input. Three more inputs are connected to the chip: the clk (which is held HIGH all the time), the 8-bit data input bus, and the 8-bit address bus. When data is written into the memory, write_enable is HIGH: the data passes into the chip and is stored at the location given by the address bus. When data is fetched, the output data appears on the data_out bus; at that time the clk is high (enabled).

Memory Chip Algorithm
This is the algorithm used for the memory chip (Verilog; the clocked always block surrounding the slide's fragment is implied by the design):

    always @(posedge clk) begin
        if (write_enable) begin
            memory[address] <= data_in;
        end
        data_out <= memory[address];
    end

An Example of the Memory Architecture — When Write Enable is LOW
At this time:
clk = HIGH
data_in = X (don't care)
address = (A6)H [10100110]
write_enable = 0
Then data_out = (8B)H [10001011], i.e. data_out = memory[A6], produced by the statement data_out <= memory[address]. The diagram shows how the chip works when write_enable is low.

When Write Enable is HIGH
At this time:
clk = HIGH
data_in = (9F)H [10011111]
address = (A6)H [10100110]
write_enable = 1
Then mem[A6] = 9F (data_in) and data_out = mem[A6] = (9F)H [10011111], again via data_out <= memory[address]. When write_enable is high, the data stored at address A6 is changed: 9FH is now the data present at address A6.

Simulation
Architecture of the memory chip

Output


THANK YOU