Hadoop MapReduce wordcount example
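Hadoop installation and MapReduce algorithm practice
Environment setup
- Install VMware
- Download the Ubuntu Linux CD image
- Create a VMware virtual machine and set the Ubuntu image as its operating system image to build a virtual Linux environment
- Install Hadoop from the console
- Configure the Hadoop environment from the console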
Linux and HDFS directory structure
Linux directory
Home
├── Project                 /* MapReduce project */
│   ├── src                 /* MapReduce code */
│   │   ├── Driver.java     /* Java file used to compile the MapReduce code */
│   │   └── Wordcount.java
│   ├── template            /* templates */
│   ├── datagen             /* code for generating data */
│   ├── data                /* stores the data */
│   └── build.xml           /* configuration file for compiling the MapReduce code */
|
...
HDFS directory
hadoop
├── wordcount_test          /* data directory used to run the MapReduce code */
└── wordcount_test_out      /* directory that stores the results of running the MapReduce code */
Running the wordcount example
- Implement Project/src/Wordcount.java
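What Wordcount.java has to compute can be sketched without a cluster. The block below is a minimal, single-process illustration of the map, shuffle, and reduce phases of word counting in plain Java. It deliberately does not use the Hadoop API, and the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical names chosen for this sketch, not the contents of the actual Wordcount.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local, single-process sketch of word count; it does NOT use the Hadoop API.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key (a TreeMap keeps the keys sorted).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Run the three phases in order over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(List.of("hello hadoop", "hello mapreduce"));
        // Print one "word<TAB>count" line per word.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In a real job the same roles would be filled by a Mapper and a Reducer class, with Hadoop performing the shuffle between them.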
- Edit Project/src/Driver.java
  pgd.addClass("wordcount", Wordcount.class, "A map reduce program that performs word counting");
  - Arguments, in order: the alias used to run the program, the class defined in the program's file, and a description
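The alias registered in Driver.java is the name that later selects the program on the command line. As a rough illustration of that lookup, here is a small plain-Java stand-in. It is not Hadoop's driver class, and every name in it (`AliasDriverSketch`, `addProgram`, `run`) is hypothetical. It only shows the idea of mapping an alias to an entry point and handing over the remaining arguments.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for an alias-based driver; this is NOT Hadoop's ProgramDriver.
class AliasDriverSketch {

    // One registered program: the entry point to call and a short description.
    static final class Program {
        final Consumer<String[]> entryPoint;
        final String description;

        Program(Consumer<String[]> entryPoint, String description) {
            this.entryPoint = entryPoint;
            this.description = description;
        }
    }

    private final Map<String, Program> programs = new LinkedHashMap<>();

    // Register a program under an alias (the role played by addClass in Driver.java).
    void addProgram(String alias, Consumer<String[]> entryPoint, String description) {
        programs.put(alias, new Program(entryPoint, description));
    }

    // Look up args[0] as an alias and pass the remaining arguments to that program.
    // Returns 0 when a program ran, -1 when the alias is missing or unknown.
    int run(String[] args) {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            System.out.println("Known program names:");
            programs.forEach((alias, p) -> System.out.println("  " + alias + ": " + p.description));
            return -1;
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        programs.get(args[0]).entryPoint.accept(rest);
        return 0;
    }

    public static void main(String[] args) {
        AliasDriverSketch driver = new AliasDriverSketch();
        driver.addProgram("wordcount",
                rest -> System.out.println("wordcount called with " + Arrays.toString(rest)),
                "A map reduce program that performs word counting");
        driver.run(new String[] {"wordcount", "wordcount_test", "wordcount_test_out"});
    }
}
```

With this picture in mind, a misspelled alias fails at lookup time, before any MapReduce code runs.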
- Compile the MapReduce code
  - Move to the Project directory and run the command
  $ ant
- Put the test data into HDFS
  - Move to Project
  $ hdfs dfs -mkdir wordcount_test
  $ hdfs dfs -put data/wordcount-data.txt wordcount_test
- Delete the existing result directory to make room for the results of the new MapReduce run (the job fails if the output directory already exists)
  - Move to Project
  $ hdfs dfs -rm -r wordcount_test_out
- Run the MapReduce code
  - Move to Project
  $ hadoop jar [jar file] [program name] [saved data dir] [result dir]
  $ hadoop jar 결과.jar wordcount wordcount_test wordcount_test_out
  - For [program name], use the alias specified in Driver.java
- Check the results
  $ hdfs dfs -cat wordcount_test_out/part-r-00000 | more
  $ hdfs dfs -cat wordcount_test_out/part-r-00001 | more
  - One output file is created per reducer; the files are numbered from 0 to (number of reducers - 1)
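The link between reducer indices and the part-r-NNNNN file names can be sketched in a few lines of plain Java. The partition formula below is the one used by Hadoop's default hash partitioner. Everything else is an assumption made for illustration: the class and method names are hypothetical, `String.hashCode()` stands in for the hash of the real key type, and the reducer count of 2 is chosen only to match the two part files listed above.

```java
import java.util.List;

// Sketch of how reducer indices relate to output file names; not Hadoop code.
class PartitionSketch {

    // Formula used by Hadoop's default HashPartitioner.
    // Assumption: String.hashCode() stands in for the hash of the real key type (e.g. Text),
    // so the actual assignment of words to reducers on a cluster can differ.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Name of the output file written by the reducer with this zero-based index.
    static String partFileName(int reducerIndex) {
        return String.format("part-r-%05d", reducerIndex);
    }

    public static void main(String[] args) {
        int numReducers = 2; // assumed; matches the two part files listed above
        for (String word : List.of("hello", "hadoop", "mapreduce")) {
            System.out.println(word + " -> " + partFileName(partitionFor(word, numReducers)));
        }
    }
}
```

Because every occurrence of a word is sent to the same reducer, each word's total appears in exactly one part file, so the files have to be read together to see the full result.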