Preparation
1) Log in to the Docker version of the HDP Sandbox
ssh -p 2222 root@sandbox.hortonworks.com
2) Install SBT and Vim
http://www.scala-sbt.org/release/docs/Installing-sbt-on-Linux.html
curl https://bintray.com/sbt/rpm/rpm | tee /etc/yum.repos.d/bintray-sbt-rpm.repo
yum install -y sbt vim
2.1) Vim is hard on the eyes out of the box, so tweak it a little
http://bsnyderblog.blogspot.com.au/2012/12/vim-syntax-highlighting-for-scala-bash.html
mkdir -p ~/.vim/{ftdetect,indent,syntax} && for d in ftdetect indent syntax ; do curl -o ~/.vim/$d/scala.vim https://raw.githubusercontent.com/derekwyatt/vim-scala/master/$d/scala.vim; done
Actual work
1) Create a working folder and edit the necessary files
http://spark.apache.org/docs/1.6.2/quick-start.html#self-contained-applications
mkdir scala && cd ./scala
mkdir -p ./src/main/scala
vim simple.sbt
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2"
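Incidentally, the %% in the last line tells sbt to append the Scala binary version to the artifact name, so the dependency resolves to spark-core_2.10. The single-% form below is equivalent and just spells that out:

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.2"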
vim ./src/main/scala/SimpleApp.scala
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
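Hardcoding logFile is also why step 2.1 below creates an HDFS folder literally named YOUR_SPARK_HOME. A minimal sketch of a variant that takes the path from the command line instead (not in the original post; the object name is hypothetical):

/* SimpleAppArgs.scala -- hypothetical variant; reads the input path from args(0) */
import org.apache.spark.{SparkConf, SparkContext}

object SimpleAppArgs {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // take the input path from the command line instead of hardcoding it
    val logData = sc.textFile(args(0), 2).cache()
    val numAs = logData.filter(_.contains("a")).count()
    val numBs = logData.filter(_.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

With that, the input path would simply be appended after the jar on the spark-submit line.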
2) Package it
sbt package
...
[info] Packaging /root/scala/target/scala-2.10/simple-project_2.10-1.0.jar ...
[info] Done packaging.
[success] Total time: 98 s, completed Nov 24, 2016 11:35:26 PM
2.1) Prepare the HDFS side (the odd folder name is so the program doesn't have to change)
hdfs dfs -mkdir YOUR_SPARK_HOME
locate README.md
hdfs dfs -put /usr/lib/hue/ext/thirdparty/js/test-runner/mootools-runner/README.md YOUR_SPARK_HOME
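Since logFile in SimpleApp.scala is a relative path, Spark resolves it against the submitting user's HDFS home directory (/user/root here), which is why the put above lands in the right place. If you prefer being explicit, the logFile line could use a full URI instead; the host and port below are the usual sandbox defaults and are an assumption:

// hypothetical fully-qualified alternative to the relative path
val logFile = "hdfs://sandbox.hortonworks.com:8020/user/root/YOUR_SPARK_HOME/README.md"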
3) Submit the job!
[root@sandbox hdfs]# spark-submit --class "SimpleApp" --master local[1] --driver-memory 512m --executor-memory 512m --executor-cores 1 /root/scala/target/scala-2.10/simple-project_2.10-1.0.jar 2>/dev/null
Lines with a: 23, Lines with b: 10
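As a quick sanity check, the same counts can be reproduced interactively in spark-shell on the sandbox, where the SparkContext already exists as sc:

// in spark-shell; sc is provided by the shell
val logData = sc.textFile("YOUR_SPARK_HOME/README.md", 2).cache()
logData.filter(_.contains("a")).count()  // 23 against this README.md
logData.filter(_.contains("b")).count()  // 10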
3.1) Try it on Windows, too
http://www.ics.uci.edu/~shantas/Install_Spark_on_Windows10.pdf
https://wiki.apache.org/hadoop/WindowsProblems
Set the environment variable %HADOOP_HOME% to point to the directory above the BIN dir containing WINUTILS.EXE.
C:\Apps\spark-1.6.2-bin-hadoop2.6\bin>spark-submit --class "HdfsDeleteApp" c:\Users\Hajime\Desktop\hdfsdeleteapp-project_2.10-1.0.jar 2>nul
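If spark-submit still fails to find winutils.exe despite %HADOOP_HOME%, a workaround that is often suggested is setting hadoop.home.dir from inside the driver before the SparkContext is created; the path below is a placeholder assumption:

// hypothetical Windows workaround: point Hadoop at the dir whose bin\ holds winutils.exe
System.setProperty("hadoop.home.dir", "C:\\hadoop")
val sc = new SparkContext(new SparkConf().setAppName("Simple Application"))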