Abstractions for parallel computing

from 몸에 새기게 공부다 2013. 3. 21. 05:07

Three models of communication
- Shared address space
- Message passing
- Data parallel
Shared address space model: Abstraction
- Shared variables work like a bulletin board: any thread can come by to write or read them
- Threads communicate by
  - reading/writing to shared variables
    - interprocessor communication is implicit in memory operations
    - Thread 1 stores to X. Later, thread 2 reads X
  - Manipulating synchronization primitives

Natural extension of sequential programming model
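
As a minimal sketch of the abstraction above, the following Python snippet has several threads communicate through one shared variable and coordinate with a lock. The names `counter`, `worker`, and `run` are made up for illustration; they are not from the lecture.

```python
import threading

counter = 0                      # shared variable: visible to every thread
counter_lock = threading.Lock()  # synchronization primitive guarding it


def worker(increments: int) -> None:
    """Update the shared counter; communication is implicit in the stores."""
    global counter
    for _ in range(increments):
        with counter_lock:       # only one thread updates at a time
            counter += 1


def run(num_threads: int = 4, increments: int = 10_000) -> int:
    """Start the threads, wait for them, and read the shared result."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # all stores are done once every join returns
    return counter


if __name__ == "__main__":
    print(run())                 # 40000
```

Nothing is sent explicitly: the main thread learns the result simply by reading `counter` after the workers have written it. Without the lock, the read-modify-write in `counter += 1` could interleave between threads and lose updates.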

Shared address space model: implementation @ OS
- Option 1: threads share an address space (all data is sharable)
- Option 2: each thread has its own virtual address space; the shared portion of each address space maps to the same physical location

Non-uniform memory access NUMA
- in reality, all processors can access any memory location, but the cost of memory access (latency or bandwidth) is different for different processors
- problem with preserving uniform access time: scalability
  - good: costs are uniform; bad: memory is uniformly far away
- NUMA designs are more scalable
  - high bandwidth to local memory; low latency access to local memory
- cost: increased programmer effort for performance tuning

Shared address space summary
- Communication
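  - Threads read and write shared variables
  - Synchronization primitives (locks, semaphores) coordinate access
  - An extension of uniprocessor programming, though NUMA is a somewhat different story
- Hardware support to make implementations efficient
  - Any processor can load from and store to any address
  - NUMA designs scale better than uniform memory access designs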
Message passing model : abstraction
- threads operate within independent address spaces
- threads communicate by sending/receiving messages
  - explicit – point to point
  - send: specifies buffer to be transmitted, recipient, optional message tag
  - receive: specifies buffer to store data, sender and optional message tag
  - messages may be synchronous or asynchronous
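
A rough sketch of the send/receive abstraction follows. It uses Python threads with one `queue.Queue` mailbox each, standing in for independent address spaces; this is only a simulation of the model, not MPI, and `mailboxes`, `send`, and `receive` are hypothetical helpers invented for this example.

```python
import queue
import threading

# One mailbox per participant; in this sketch they stand in for the
# inboxes of threads that otherwise share no data.
mailboxes = {"main": queue.Queue(), "worker": queue.Queue()}


def send(sender: str, recipient: str, buffer: list, tag: str = "") -> None:
    """Asynchronous send: copy the buffer into the recipient's mailbox."""
    mailboxes[recipient].put((sender, tag, list(buffer)))  # copy, not share


def receive(me: str) -> tuple:
    """Blocking receive: wait for a message, return (sender, tag, data)."""
    return mailboxes[me].get()


def worker() -> None:
    """Receive numbers, then reply to whoever sent them with their sum."""
    sender, _tag, data = receive("worker")
    send("worker", sender, [sum(data)], tag="sum")


def run() -> tuple:
    t = threading.Thread(target=worker)
    t.start()
    send("main", "worker", [1, 2, 3, 4], tag="numbers")
    reply = receive("main")      # explicit, point-to-point communication
    t.join()
    return reply


if __name__ == "__main__":
    print(run())                 # ('worker', 'sum', [10])
```

Every exchange here is explicit: data moves only when `send` copies a buffer and `receive` picks it up. The copy made inside `send` is the kind of memory-copy cost a real implementation tries to minimize.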
Message passing model: implementation
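- Popular library: MPI
- Challenges: buffering messages (until the application receives them) and minimizing the cost of memory copies
- Hardware need not implement system-wide loads and stores
  - Independent systems can be connected together to build a cluster
Matching programming models to actual machine types is hard
- It is common for the message passing abstraction to run on hardware that supports a shared address space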
- Can implement shared address space abstraction on machines that do not support it in HW
  - mark all pages with shared variables as invalid
  - page-fault handler issues appropriate network requests
- Keep in mind what is the programming model (abstractions used to specify the program) and what is the HW implementation; remember the difference between the abstraction and its actual implementation
Data parallel model

Rigid computation structure
Historically: the same operation applied to each element of an array
- SIMD style
  - Connection Machine: thousands of processors, one instruction
  - Cray supercomputer vector processors
    - Add (A,B,n) : add arrays A and B of length n
Matlab is a good example
SPMD programming (single program, multiple data)
- map (function, collection)
- where function may be a complicated sequence of logic
- application of function to each element of collection is independent
  - in pure form : no communication between iterations of map
- synchronization is implicit at the end of the map
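
The map form above can be sketched with Python's standard `concurrent.futures`. The function `brighten` and the wrapper `parallel_map` are illustrative names, not part of any library; the point is only that each application is independent and that the caller continues after all of them have finished.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel: int) -> int:
    """Per-element function: reads only its own input, writes nothing shared."""
    return min(pixel + 40, 255)


def parallel_map(function, collection, workers: int = 4) -> list:
    """Apply function to every element independently and gather the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Collecting into a list waits for every element: this is the
        # implicit synchronization at the end of the map.
        return list(pool.map(function, collection))


if __name__ == "__main__":
    print(parallel_map(brighten, [0, 100, 230, 255]))  # [40, 140, 255, 255]
```

Because `brighten` has no side effects, the result is the same no matter how the elements are scheduled across workers. In CPython, threads will not speed up CPU-bound work like this because of the global interpreter lock, so treat the snippet as a model of the abstraction rather than a performance recipe.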
Stream programming

Mapping a single function onto a collection of data; functional = no side effects; no communication among invocations

In practice, however, the popular OpenCL and CUDA also accommodate an imperative style flexibly

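Pure functions with no side effects (cannot write a non-deterministic program)
Predictable data access when fetching data; the points of input and output are known in advance
Producer-consumer locality; the compiler takes care of a lot on its own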
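
To give a feel for producer-consumer locality, here is a small sketch using Python generators as stand-ins for stream kernels. The kernels `scale` and `clamp` are invented for this example, and a real stream compiler would do this fusion automatically rather than rely on generators.

```python
from typing import Iterable, Iterator


def scale(stream: Iterable[float], factor: float) -> Iterator[float]:
    """Kernel 1: pure, element-wise; yields one output per input."""
    for x in stream:
        yield x * factor


def clamp(stream: Iterable[float], lo: float, hi: float) -> Iterator[float]:
    """Kernel 2: consumes each value as soon as kernel 1 produces it."""
    for x in stream:
        yield max(lo, min(x, hi))


def pipeline(data: Iterable[float]) -> list:
    """Chain the kernels; no intermediate array is ever materialized."""
    return list(clamp(scale(data, 2.0), 0.0, 10.0))


if __name__ == "__main__":
    print(pipeline([1.0, 4.0, 7.0, -3.0]))  # [2.0, 8.0, 10.0, 0.0]
```

Each scaled value is handed straight to `clamp` instead of being written to a temporary array and read back later, which is the locality a stream compiler tries to exploit when it fuses kernels.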
Drawbacks:

Complex data flows need additional operators
You end up programming in the hope that the compiler is smart enough
In practice, you sometimes cannot find the operator you need
