This assignment gets you started with the basic tools you will need to complete all of your homework projects in Spark using Scala. This project will ensure that you have correctly installed Scala, SBT, Spark and IntelliJ.
You are a student who needs to install all the tools necessary to get started in CS4641.
In this assignment you will set up your computer as follows:
Install Scala for system-wide use on your computer by downloading the appropriate distribution from the bottom of https://www.scala-lang.org/download/
Download and install a programmer’s text editor (you can also use IntelliJ as a general text editor, but it can be awkward for quick file editing). In this course we will primarily use IntelliJ, but it’s important to be comfortable with general-purpose text editors too.
Install Spark using the Spark instructions on the course web site.
Install SBT for your operating system using the instructions linked on the Getting Started with Scala and SBT on the Command Line page on docs.scala-lang.org.
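Once everything is installed, a quick way to confirm that the tools are reachable from the command line is to look each one up on your PATH. The following is a minimal sketch, assuming a POSIX-style shell (Linux, macOS, or a Unix-like shell on Windows); the function name check_tools is made up for this illustration.

```shell
# Report whether each named command can be found on the PATH.
# Returns 0 if every tool was found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The four command-line tools this assignment relies on.
check_tools scala scalac sbt spark-submit || echo "Install the missing tools before continuing."
```

A tool reported as missing may still be installed; it may simply not be on your PATH yet, in which case opening a new terminal window often helps.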
Create a directory named cs4641 for your coursework and change into it (using the cd command):
mkdir cs4641
cd cs4641
Create a subdirectory of your cs4641 directory named hw0.
On the command line, make sure you are in the hw0 folder. Enter these commands (remember that ‘$’ is the shell prompt (something like ‘C:\cs4641\hw0>’ on Windows) – don’t type the shell prompt character(s)):
$ scalac -version > hw0-output.txt
$ scala -version 2>> hw0-output.txt
Please note what is happening here:
> redirects the standard output of a program.
2> (or 2>>) redirects stderr, which is used for diagnostics (such as version strings).
The first line creates the hw0-output.txt file, and the second line (with the extra >) adds more text to the file.
Here is a nice discussion of the file descriptors stdin, stdout and stderr.
What this means is that > (or 2>) will overwrite the file, so if you go back to repeat the first step, you’ll need to repeat all the other steps as well.
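To see the difference between overwriting and appending without touching your real hw0-output.txt, you can try the redirections on a scratch file. This is a minimal sketch, assuming a POSIX-style shell; emit is a made-up stand-in for a program that writes one line to stdout and one line to stderr.

```shell
# Scratch file, so the real hw0-output.txt is never touched.
demo_file=$(mktemp)

# Hypothetical stand-in program: one line to stdout, one line to stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

emit > "$demo_file" 2>/dev/null    # >   creates or overwrites the file with stdout
emit 2>> "$demo_file" >/dev/null   # 2>> appends stderr to the same file
cat "$demo_file"                   # prints "to stdout" then "to stderr"

emit > "$demo_file" 2>/dev/null    # >   again: the appended line is gone
cat "$demo_file"                   # prints only "to stdout"
```

The last two commands show why repeating the first step means repeating the later ones: the single > discards everything that was appended afterwards.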
Open your text editor and create the following files and directories (substitute your loginID for loginID):
.
├── build.sbt
└── src
    └── main
        └── scala
            └── edu
                └── gatech
                    └── cs4641
                        └── loginID
                            └── HelloSpark.scala
7 directories, 2 files
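If you prefer the command line to clicking through an editor, the same layout can be created with mkdir and touch. This is a sketch only, assuming a POSIX-style shell; make_layout is a hypothetical helper name, and the files it creates are empty placeholders that you still need to fill in.

```shell
# Create the directory layout and two empty files under a given root.
#   $1: root directory (use . for the current directory)
#   $2: your login ID
make_layout() {
  root=$1
  login_id=$2
  pkg_dir="$root/src/main/scala/edu/gatech/cs4641/$login_id"
  mkdir -p "$pkg_dir"                                  # -p also creates missing parent directories
  touch "$root/build.sbt" "$pkg_dir/HelloSpark.scala"  # empty files, to be edited later
}

# From inside the hw0 directory, substituting your own login ID:
#   make_layout . loginID
```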
In HelloSpark.scala enter the following Scala code (substitute your loginID for loginID):
package edu.gatech.cs4641.loginID.hw0

import org.apache.spark.sql.SparkSession

object HelloSpark {
  def main(args: Array[String]): Unit = {
    // The master URL is supplied by spark-submit, so it is not set here.
    val spark = SparkSession.builder.appName("Hello Spark").getOrCreate()
    println(s"Spark version: ${spark.version}")
    spark.stop()
  }
}
In build.sbt enter these contents:
name := "Hello Spark"
version := "1.0"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
Compile and package your HelloSpark application with the following command (the first run may take several minutes):
sbt package
Run your application by submitting it to Spark (Note: you can use --master local[2] in place of --master local[1] if you have 2 or more cores):
spark-submit --class "edu.gatech.cs4641.loginID.hw0.HelloSpark" --master local[1] target/scala-2.12/hello-spark_2.12-1.0.jar
Lots of output will appear on the console.
Run the application again and append its output to hw0-output.txt by entering
Unix/Linux:
spark-submit --class "edu.gatech.cs4641.loginID.hw0.HelloSpark" --master local[1] target/scala-2.12/hello-spark_2.12-1.0.jar >> hw0-output.txt
Don’t forget the double arrows in >>!
Most of the same output will appear on the console, except one line – the output of your println – which will be in hw0-output.txt.
Check your hw0-output.txt file to ensure that it contains the scalac version string, the scala version string, and the output of running your HelloSpark program.
hw0-output.txt File
At this point your hw0-output.txt file should contain
the scalac version string,
the scala version string, and
the output of the println statement in your HelloSpark program.
If your hw0-output.txt file is missing any of those elements you should redo all the steps that add content to hw0-output.txt in each of the previous sections.
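The check can also be done from the command line with grep. This is a rough sketch, assuming a POSIX-style shell; check_output is a made-up helper name. The first two search patterns are assumptions about how scalac and scala typically word their version strings, so adjust them if your installation prints something different. The third pattern comes from the println in HelloSpark.scala.

```shell
# Report which of the three expected pieces of output appear in a file.
# Returns 0 if all are present, 1 otherwise.
check_output() {
  file=$1
  status=0
  # Assumed wording of the version strings; "Spark version:" comes from the println.
  for pattern in "Scala compiler version" "Scala code runner version" "Spark version:"; do
    if grep -qs "$pattern" "$file"; then
      echo "ok:      $pattern"
    else
      echo "MISSING: $pattern"
      status=1
    fi
  done
  return "$status"
}

check_output hw0-output.txt || echo "Redo the steps that add content to hw0-output.txt."
```

This only confirms that matching text is present; it is still worth opening the file and reading it before you submit.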
Submit your hw0-output.txt file on Canvas as an attachment. When you’re ready, double-check that you have submitted and not just saved a draft.
Practice safe submission! Verify that your HW files were truly submitted correctly, the upload was successful, and that your program runs with no syntax or runtime errors. It is solely your responsibility to turn in your homework and practice this safe submission safeguard. NOTE: Unlike TSquare, Canvas will not send an email indicating that your assignment has been submitted successfully. Follow the steps outlined below to ensure you have submitted correctly.
This procedure helps guard against a few things.