Download the installation package sparkling-water-1.6.13.zip
wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/13/sparkling-water-1.6.13.zip
or download it from http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/3/index.html
Unzip the installation package
Upload the package to /usr/local, then:
cd /usr/local; unzip sparkling-water-1.6.13.zip; cd sparkling-water-1.6.13
Start sparkling-shell and run the script
sudo -u hdfs bin/sparkling-shell --num-executors 3 --executor-memory 2g --master yarn-client --conf "spark.dynamicAllocation.enabled=false"

Run the example (excerpted from https://github.com/h2oai/sparkling-water/tree/rel-1.6):
1. Initialize H2O services on top of the Spark cluster:
scala> import org.apache.spark.h2o._
scala> val h2oContext = H2OContext.getOrCreate(sc)
scala> import h2oContext._
scala> import h2oContext.implicits._
2. Load weather data for Chicago O'Hare International Airport (ORD), with help from the RDD API:
scala> import org.apache.spark.examples.h2o._
scala> val weatherDataFile = "/tmp/examples/Chicago_Ohare_International_Airport.csv" // this path is on HDFS
scala> val wrawdata = sc.textFile(weatherDataFile, 3).cache()
scala> val weatherTable = wrawdata.map(_.split(",")).map(row => WeatherParse(row)).filter(!_.isWrongRow())
3. Load airlines data using the H2O parser:
scala> import java.io.File
scala> val dataFile = "/usr/local/sparkling-water-1.6.13/examples/smalldata/allyears2k_headers.csv.gz" // note: this local path varies with the node where resources are allocated
scala> val airlinesData = new H2OFrame(new File(dataFile))
4. Select flights destined for Chicago (ORD):
scala> val airlinesTable: RDD[Airlines] = asRDD[Airlines](airlinesData)
scala> val flightsToORD = airlinesTable.filter(f => f.Dest == Some("ORD"))
5. Compute the number of these flights:
scala> flightsToORD.count
res0: Long = 2103

API:
http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.4/1/scaladoc/index.html#org.apache.spark.h2o.H2OContext
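The filter in step 4 compares the destination against Some("ORD") rather than a bare "ORD" because fields parsed from CSV can be missing and are therefore typed as Option[_]. Below is a minimal, Spark-free sketch of the same pattern on plain Scala collections; the Airlines case class here is a hypothetical stand-in for the one in org.apache.spark.examples.h2o, reduced to a few illustrative fields:

```scala
// Hypothetical, simplified stand-in for the Airlines case class used above;
// fields are Option[_] because parsed rows may have missing values.
case class Airlines(FlightNum: Option[Int], Origin: Option[String], Dest: Option[String])

val flights = Seq(
  Airlines(Some(96), Some("SFO"), Some("ORD")),
  Airlines(Some(421), Some("LAX"), Some("JFK")),
  Airlines(Some(7), None, None) // a row with a missing destination
)

// Same predicate shape as step 4: compare the Option field against
// Some("ORD"); a missing Dest (None) simply fails the test.
val flightsToORD = flights.filter(f => f.Dest == Some("ORD"))

println(flightsToORD.size) // prints 1
```

On a real cluster the same predicate runs distributed over the RDD, and flightsToORD.count plays the role of the local .size here.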