ํ•˜๋‘ก ๋งต๋ฆฌ๋“€์Šค ํ”„๋ ˆ์ž„์›Œํฌ

๋งต๋ฆฌ๋“€์Šค ํ”„๋ ˆ์ž„์›Œํฌ(MapReduce Framework)

๋งต๋ฆฌ๋“€์Šค ํ”„๋ ˆ์ž„์›Œํฌ๋ž€?

Data-intensive processing (big data processing) requires expensive, complex computation. There are two ways to get the computing power this demands: scale-up (vertical scaling), which raises the performance of a single computer, and scale-out (horizontal scaling), which adds more computers. Of the two, scale-out is the more cost-effective.

The MapReduce framework is a programming model for implementing scalable parallel software that processes big data on a cluster built from inexpensive computers.

Google MapReduce and the open-source Apache Hadoop are implementations of the MapReduce framework.

MapReduce Phases

  • Map Phase

    • Runs first; it is invoked in parallel across the partitions of the data.
    • The Mapper running on each machine calls the Map function once per line of input data.
    • The Map function emits its results as (Key, Value) pairs.
    • Pairs emitted by the Map function that share the same Key are sent to the same machine.
  • Shuffling Phase

    • Starts once the Map phase has finished on every machine.
    • Sorts by Key the (Key, Value) pairs that the Map phase sent to each machine.
    • Collects the pairs that share a Key into a Value-List, then passes the result on as (Key, Value-List) pairs distributed across the machines.
  • Reduce Phase

    • Starts once the Shuffling phase has finished on every machine.
    • The Reduce function is called once for each (Key, Value-List) pair on a machine.
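The three phases above can be sketched as a single-process driver. This is only an illustration of the data flow, not how Hadoop or Google MapReduce is implemented: the names run_map_reduce, parse_reading, and max_reading are hypothetical, and everything runs on one machine with no parallelism.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_map_reduce(
    lines: Iterable[str],
    map_fn: Callable[[str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Map phase: call map_fn once per input line and collect (key, value) pairs.
    pairs = [pair for line in lines for pair in map_fn(line)]

    # Shuffling phase: gather the values that share a key into one value list.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce phase: call reduce_fn once per (key, value-list) pair, in key order.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


def parse_reading(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical map function: each line is "<name> <number>".
    name, number = line.split()
    yield name, int(number)


def max_reading(key: str, values: List[int]) -> int:
    # Hypothetical reduce function: keep the largest value seen for the key.
    return max(values)


result = run_map_reduce(["a 3", "b 5", "a 7"], parse_reading, max_reading)
print(result)  # {'a': 7, 'b': 5}
```

Only map_fn and reduce_fn change from job to job; the driver in between stays the same, which is the point of the programming model.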

ํ•˜๋‘ก(Hadoop)

์•„ํŒŒ์น˜์˜ ๋งต๋ฆฌ๋“€์Šค ํ”„๋ ˆ์ž„์›Œํฌ ์˜คํ”ˆ์†Œ์Šค

ํ•˜๋‘ก ๋ถ„์‚ฐ ํŒŒ์ผ ์‹œ์Šคํ…œ(HDFS, Hadoop Distributed File System)

๋น…๋ฐ์ดํ„ฐ ํŒŒ์ผ์„ ์—ฌ๋Ÿฌ ์ปดํ“จํ„ฐ์— ๋‚˜๋ˆ„์–ด ์ €์žฅํ•˜๋Š” ๊ธฐ์ˆ . ๊ฐ ํŒŒ์ผ์„ ์—ฌ๋Ÿฌ๊ฐœ์˜ ์ˆœ์ฐจ์ ์ธ ๋ธ”๋ก์œผ๋กœ ์ €์žฅํ•œ๋‹ค.

ํ•˜๋‚˜์˜ ํŒŒ์ผ์˜ ๊ฐ ๋ธ”๋ก์€ ์—ฌ๋Ÿฌ๋ฒˆ ๋ณต์‚ฌ๋˜์–ด ๋จธ์‹ ์˜ ๊ณณ๊ณณ์— ๋ถ„์‚ฐํ•˜์—ฌ ์ €์žฅ๋œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์‹œ์Šคํ…œ ์ผ๋ถ€์—์„œ ๊ฒฐํ•จ์ด ๋ฐœ์ƒํ•˜๋”๋ผ๋„ ๋Š๊น€์—†์ด ๊ธฐ๋Šฅ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค(fault tolerance).
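To make the idea of blocks and replicas concrete, here is a toy sketch. It assumes a tiny block size, made-up node names, and a simple round-robin placement; the real HDFS placement policy is more involved, so split_into_blocks, place_replicas, and survives are all hypothetical helpers.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    # Cut the file into consecutive fixed-size blocks (the last one may be shorter).
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    # Assumed policy: round-robin, so each block's replicas land on distinct nodes.
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def survives(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    # The file stays readable if every block has at least one replica on a live node.
    return all(any(node not in failed for node in replicas) for replicas in placement.values())


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)

print(blocks)                                         # [b'abcd', b'efgh', b'ij']
print(placement[0])                                   # ['node1', 'node2', 'node3']
print(survives(placement, {"node2"}))                 # True
print(survives(placement, {"node1", "node2", "node3"}))  # False
```

With three replicas per block, losing any one or two nodes still leaves a copy of every block; only losing all three holders of some block makes the file unreadable.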

Main Components

  • MapReduce : distributes the execution of software
  • HDFS : distributes the data

Namenode and Datanode

  • Namenode : the master; manages the file system through which clients access files
  • Datanode : the slaves; each manages access to the data stored on its own computer

MapReduce Functions

  • Map function

    • Called once for each line of the input text file
    • Its input is a (Key, Value) pair
    • Key : the offset from the start of the input text file to the first character of the line
    • Value : the full text of the line
  • Reduce function

    • Takes the output of the Shuffling phase as its input, in the form of (Key, Value-List) pairs
  • Combine function

    • Optional; if it is called, it runs after the Map phase and before the Shuffling phase
    • Reduces the size of the Map function's output (running the Shuffling and Reduce phases on the combined output lowers the computation cost)
    • Runs on each machine as if the Reduce function were applied locally
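The (Key, Value) input of the Map function described above can be illustrated with a small reader that turns raw file bytes into (offset, line) records. This is a sketch under the assumption that offsets are counted in bytes and lines end with a newline; line_records is a hypothetical helper, not a Hadoop API.

```python
from typing import Iterator, Tuple


def line_records(data: bytes) -> Iterator[Tuple[int, str]]:
    # Yield one (Key, Value) record per line:
    #   Key   = byte offset of the line's first character from the start of the file
    #   Value = the text of the line, without its line terminator
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


records = list(line_records(b"Financial IMF\nHarry Potter\n"))
print(records)  # [(0, 'Financial IMF'), (14, 'Harry Potter')]
```

A word-counting Map function would typically ignore the offset and look only at the line text.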

Word Counting Algorithm with MapReduce

1. Map Phase

Assume there are two machines, Machine 1 and Machine 2. Each machine holds several documents, and we want to count the words in the text of those documents.

<Machine 1>

  • Document 1 : Financial, IMF, Economics, Crisis
  • Document 2 : Financial, IMF, Crisis

<Machine 2>

  • Document 3 : Economics, Harry
  • Document 4 : Financial, Harry, Potter, Film
  • Document 5 : Crisis, Harry, Potter

A Mapper runs on each machine, and the Mapper in turn calls the Map function for each line of a document. The resulting pairs are then distributed across the machines so that pairs with the same Key end up on the same machine.

Stored on <Machine 1>

Key Value
Financial 1
Economics 1
Economics 1
Financial 1
Crisis 1
Financial 1
Film 1
Crisis 1
Crisis 1

Stored on <Machine 2>

Key Value
IMF 1
Harry 1
Harry 1
Potter 1
IMF 1
Harry 1
Potter 1
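The Map phase of this example can be sketched as follows. Each document is treated as one input line, and the assignment of keys to machines is hard-coded in a hypothetical PARTITION table so that it matches the tables above; a real framework would usually derive the target machine from the key itself, for example by hashing it. The order of pairs within a machine may differ from the tables.

```python
from typing import Dict, Iterator, List, Tuple

# Input documents held by each machine, one line per document.
documents: Dict[int, List[str]] = {
    1: ["Financial, IMF, Economics, Crisis", "Financial, IMF, Crisis"],
    2: ["Economics, Harry", "Financial, Harry, Potter, Film", "Crisis, Harry, Potter"],
}

# Hypothetical key-to-machine assignment, chosen to mirror the example.
PARTITION: Dict[str, int] = {
    "Financial": 1, "Economics": 1, "Crisis": 1, "Film": 1,
    "IMF": 2, "Harry": 2, "Potter": 2,
}


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word on the line.
    for word in line.split(", "):
        yield word, 1


# Run the Mapper on each machine and send every pair to the machine that owns its key.
stored: Dict[int, List[Tuple[str, int]]] = {1: [], 2: []}
for machine, lines in documents.items():
    for line in lines:
        for word, one in map_fn(line):
            stored[PARTITION[word]].append((word, one))

print(len(stored[1]), len(stored[2]))  # 9 7
print(stored[2])
# [('IMF', 1), ('IMF', 1), ('Harry', 1), ('Harry', 1), ('Potter', 1), ('Harry', 1), ('Potter', 1)]
```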

2. Shuffling Phase

From the Map phase output stored across the machines, pairs with the same Key are grouped into (Key, Value-List) pairs. The result is sorted by Key.

<Machine 1>

Key Value
Crisis 1, 1, 1
Economics 1, 1
Film 1
Financial 1, 1, 1

<Machine 2>

Key Value
Harry 1, 1, 1
IMF 1, 1
Potter 1, 1
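The grouping step can be sketched with a sort followed by a group-by. The input below is the list of pairs stored on Machine 1 after the Map phase; the function name shuffle is a hypothetical helper for illustration.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple


def shuffle(pairs: List[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    # Sort by key so equal keys become adjacent, then collect each run into a value list.
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


machine_1 = [
    ("Financial", 1), ("Economics", 1), ("Economics", 1),
    ("Financial", 1), ("Crisis", 1), ("Financial", 1),
    ("Film", 1), ("Crisis", 1), ("Crisis", 1),
]

print(shuffle(machine_1))
# [('Crisis', [1, 1, 1]), ('Economics', [1, 1]), ('Film', [1]), ('Financial', [1, 1, 1])]
```

Sorting first is what lets groupby work here, since it only merges keys that sit next to each other.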

3. Reduce Phase

After the Shuffling phase, a Reducer runs on each machine, and the Reducer in turn calls the Reduce function for each (Key, Value-List) pair.

<Machine 1>

Key Value
Crisis 3
Economics 2
Film 1
Financial 3

<Machine 2>

Key Value
Harry 3
IMF 2
Potter 2
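For word counting, the Reduce function only has to sum each value list. A minimal sketch, taking the Shuffling phase output of both machines as literal input (reduce_fn is a hypothetical name):

```python
from typing import Dict, List, Tuple


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    # Word count: the total for a key is the sum of its value list.
    return key, sum(values)


shuffled: Dict[int, List[Tuple[str, List[int]]]] = {
    1: [("Crisis", [1, 1, 1]), ("Economics", [1, 1]), ("Film", [1]), ("Financial", [1, 1, 1])],
    2: [("Harry", [1, 1, 1]), ("IMF", [1, 1]), ("Potter", [1, 1])],
}

# Each machine's Reducer calls reduce_fn once per (Key, Value-List) pair.
counts = {
    machine: [reduce_fn(key, values) for key, values in pairs]
    for machine, pairs in shuffled.items()
}

print(counts[1])  # [('Crisis', 3), ('Economics', 2), ('Film', 1), ('Financial', 3)]
print(counts[2])  # [('Harry', 3), ('IMF', 2), ('Potter', 2)]
```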

Extra. Applying the Combine Function

Normally, as shown above, the Map phase output is distributed across the machines and then the Shuffling phase begins. The Combine function is instead applied to each machine's own Map output before that output is distributed. Calling the Combine function produces the following output.

<Machine 1>

Key Value
Financial 2
IMF 2
Economics 1
Crisis 2

<Machine 2>

Key Value
Economics 1
Harry 3
Financial 1
Potter 2
Film 1
Crisis 1

As shown above, the Combine function partially sums the Map output, doing part of the Reduce function's work in advance.

With the Combine function applied, the Shuffling phase produces the following result. Each Value-List is shorter, so the Reduce function has less work to do and runs faster.

Key Value
Financial 2, 1
IMF 2
Economics 1, 1
Crisis 2, 1
Harry 3
Film 1
Potter 2
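The effect of the Combine function on this example can be checked with a short sketch. It assumes the Combine function is the same summation as the Reduce function, applied to each machine's local Map output; the names combine and map_output are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Words seen by each machine's Mapper, in document order.
words: Dict[int, List[str]] = {
    1: "Financial IMF Economics Crisis Financial IMF Crisis".split(),
    2: "Economics Harry Financial Harry Potter Film Crisis Harry Potter".split(),
}
map_output = {machine: [(word, 1) for word in ws] for machine, ws in words.items()}


def combine(pairs: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Partially sum the local Map output: one pair per distinct key on this machine.
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


combined = {machine: combine(pairs) for machine, pairs in map_output.items()}

# Shuffle the combined pairs: group the partial sums by key.
groups: Dict[str, List[int]] = defaultdict(list)
for machine in sorted(combined):
    for key, value in combined[machine]:
        groups[key].append(value)

pairs_before = sum(len(p) for p in map_output.values())
pairs_after = sum(len(p) for p in combined.values())
print(pairs_before, pairs_after)  # 16 10
print(groups["Financial"])        # [2, 1]
print({key: sum(values) for key, values in groups.items()}["Financial"])  # 3
```

The final counts are unchanged, but only 10 pairs instead of 16 have to be shuffled between machines.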

Combine Function

  • Called on each machine after the Map function finishes; it performs part of the Reduce function's role.
  • Improves performance by reducing the shuffling cost and network traffic.