Monday, November 28, 2016

Hello World with Spark and sbt on the HDP 2.5.0 Sandbox

Preparation

1) Log in to the Docker version of the HDP Sandbox
ssh -p 2222 root@sandbox.hortonworks.com

2) Install sbt and Vim
http://www.scala-sbt.org/release/docs/Installing-sbt-on-Linux.html
curl https://bintray.com/sbt/rpm/rpm | tee /etc/yum.repos.d/bintray-sbt-rpm.repo
yum install -y sbt vim

2.1) Vim is hard to read out of the box, so add Scala syntax highlighting
http://bsnyderblog.blogspot.com.au/2012/12/vim-syntax-highlighting-for-scala-bash.html
mkdir -p ~/.vim/{ftdetect,indent,syntax} && for d in ftdetect indent syntax ; do curl -o ~/.vim/$d/scala.vim https://raw.githubusercontent.com/derekwyatt/vim-scala/master/$d/scala.vim; done

Main steps

1) Create a working directory and edit the required files
http://spark.apache.org/docs/1.6.2/quick-start.html#self-contained-applications
mkdir scala && cd ./scala
mkdir -p ./src/main/scala
vim simple.sbt
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2"

vim ./src/main/scala/SimpleApp.scala
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop() // release the SparkContext once the counts are printed
  }
}
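
Before packaging, it can help to confirm that the two files are where sbt expects them. The snippet below is only a rough sketch: check_layout is a made-up helper name, and the two paths are simply the ones used in this post.

```shell
# Hypothetical helper: verify the project layout used in this post.
# Prints each missing file to stderr and returns 1 if anything is missing.
check_layout() {
  dir="${1:-.}"
  status=0
  for f in simple.sbt src/main/scala/SimpleApp.scala; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run it from the working directory as "check_layout ." and move on to packaging only if it returns without complaints.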

2) Package
sbt package
...
[info] Packaging /root/scala/target/scala-2.10/simple-project_2.10-1.0.jar ...
[info] Done packaging.
[success] Total time: 98 s, completed Nov 24, 2016 11:35:26 PM
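
The jar file name in the log comes from the settings in simple.sbt: the project name is normalized, then the Scala binary version and the project version are appended. Here is a small sketch of that naming rule. jar_name is a hypothetical helper, and the normalization (lowercase, runs of non-word characters replaced by "-") is my understanding of sbt's behaviour rather than something taken from its documentation.

```shell
# Hypothetical helper: predict the jar name that `sbt package` writes.
# $1 = name, $2 = Scala binary version, $3 = version (all from simple.sbt)
# Assumption: sbt lowercases the name and replaces runs of non-word
# characters with a single "-".
jar_name() {
  normalized=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9_]+/-/g')
  printf '%s_%s-%s.jar\n' "$normalized" "$2" "$3"
}
```

For example, jar_name "Simple Project" 2.10 1.0 gives the file name that appears in the packaging log above.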

2.1) Prepare the HDFS side (editing the program is a hassle, so the directory gets an odd name)
hdfs dfs -mkdir YOUR_SPARK_HOME
locate README.md
hdfs dfs -put /usr/lib/hue/ext/thirdparty/js/test-runner/mootools-runner/README.md YOUR_SPARK_HOME
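
Why this works: SimpleApp passes the relative path YOUR_SPARK_HOME/README.md, and HDFS resolves a relative path against the user's home directory, which by default is /user/<username>. The sketch below just spells out that rule. resolve_hdfs_path is a made-up helper, not an HDFS command, and it assumes the default /user prefix has not been changed on the cluster.

```shell
# Hypothetical helper: show where a path ends up on HDFS.
# $1 = HDFS user name, $2 = path as given to `hdfs dfs` or sc.textFile
# Assumption: home directories live under the default /user prefix.
resolve_hdfs_path() {
  case "$2" in
    /*) printf '%s\n' "$2" ;;               # absolute paths are used as-is
    *)  printf '/user/%s/%s\n' "$1" "$2" ;;  # relative paths go under the home directory
  esac
}
```

So for root, resolve_hdfs_path root YOUR_SPARK_HOME/README.md points at /user/root/YOUR_SPARK_HOME/README.md, which can be checked with hdfs dfs -ls.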

3) Submit the job!
[root@sandbox hdfs]# spark-submit --class "SimpleApp" --master local[1] --driver-memory 512m --executor-memory 512m --executor-cores 1 /root/scala/target/scala-2.10/simple-project_2.10-1.0.jar 2>/dev/null
Lines with a: 23, Lines with b: 10
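
As a cross-check, the same counts can be computed without Spark. The filter in SimpleApp keeps lines that contain the given substring, which is what grep -c counts when the pattern is treated as a fixed string. count_lines_containing is a hypothetical helper name used only for this sketch.

```shell
# Hypothetical helper: count the lines of a local file that contain a substring.
# $1 = substring, $2 = file
# -c prints the number of matching lines, -F disables regex interpretation.
count_lines_containing() {
  grep -cF -- "$1" "$2"
}
```

Running it with "a" and then "b" against the local README.md that was uploaded in step 2.1 should give the same two numbers the job printed.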


3.1) Try it on Windows as well
http://www.ics.uci.edu/~shantas/Install_Spark_on_Windows10.pdf
https://wiki.apache.org/hadoop/WindowsProblems
Set the environment variable %HADOOP_HOME% to point to the directory above the BIN dir containing WINUTILS.EXE.

C:\Apps\spark-1.6.2-bin-hadoop2.6\bin>spark-submit --class "HdfsDeleteApp" c:\Users\Hajime\Desktop\hdfsdeleteapp-project_2.10-1.0.jar 2>nul 
