This assignment gets you started with the basic tools you will need to complete all of your homework projects in Spark using Scala. This project will ensure that you have correctly installed Scala, SBT, Spark and IntelliJ.
You are a student who needs to install all the tools necessary to get started in CS4641.
In this assignment you will set up your computer as follows:
Install Scala for system-wide use on your computer by downloading the appropriate distribution from the bottom of https://www.scala-lang.org/download/
Download and install a programmer’s text editor (you can also use IntelliJ as a general text editor, but it can be awkward for quick file editing). In this course we will primarily use IntelliJ, but it’s important to be comfortable with general-purpose text editors too.
Install Spark using the Spark instructions on the course web site.
Install SBT for your operating system using the instructions linked on the Getting Started with Scala and SBT on the Command Line page on docs.scala-lang.org.
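Create a directory named cs4641 for this course. (On the command line, you move between directories with the cd command.) For example:
mkdir cs4641
cd cs4641
Create a subdirectory of your cs4641 directory named hw0.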
On the command line, make sure you are in the hw0 folder. Enter these commands (remember that ‘$’ is the shell prompt, something like ‘C:\cs4641\hw0>’ on Windows; don’t type the shell prompt character(s)):
$ scalac -version > hw0-output.txt
$ scala -version 2>> hw0-output.txt
Please note what is happening here: > redirects the standard output (stdout) of a program, and 2> (or 2>>) redirects stderr, which is used for diagnostics (such as version strings). The first line creates the hw0-output.txt file, and the second line (with the extra >) appends more text to the file. Here is a nice discussion of the file descriptors stdin, stdout and stderr.
What this means is that > (or 2>) will overwrite the file, so if you go back to repeat the first step, you’ll need to repeat all the other steps as well.
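To see the two streams from the Scala side, here is a minimal sketch: println writes to stdout, while Console.err.println writes to stderr. The object name Streams and the file names out.txt and err.txt are hypothetical and not part of the assignment.

```scala
// A tiny program that writes one line to each standard stream.
object Streams {
  val outMessage = "this line goes to stdout"
  val errMessage = "this line goes to stderr"

  def main(args: Array[String]): Unit = {
    println(outMessage)             // stdout: captured by > and >>
    Console.err.println(errMessage) // stderr: captured by 2> and 2>>
  }
}
```

After compiling with scalac Streams.scala, running scala Streams > out.txt 2> err.txt should leave one line in each file and print nothing on the console.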
Open your text editor and create the following files and directories (substitute your loginID for loginID):
.
├── build.sbt
└── src
    └── main
        └── scala
            └── edu
                └── gatech
                    └── cs4641
                        └── loginID
                            └── HelloSpark.scala

7 directories, 2 files
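If you would rather create the empty layout from code, the following is a minimal sketch using java.nio.file. It assumes the root . of the tree is your hw0 directory; the object name Hw0Layout and the example login jdoe3 are hypothetical. It only creates empty files, so you still need to enter their contents as described in the next steps.

```scala
import java.nio.file.{Files, Path, Paths}

object Hw0Layout {
  // Creates build.sbt and the source directory tree under `root`
  // for the given loginId, returning the path of HelloSpark.scala.
  def create(root: Path, loginId: String): Path = {
    val srcDir = root.resolve(Paths.get("src", "main", "scala", "edu", "gatech", "cs4641", loginId))
    Files.createDirectories(srcDir) // like `mkdir -p`: creates missing parents, ok if present
    val buildFile = root.resolve("build.sbt")
    if (!Files.exists(buildFile)) Files.createFile(buildFile)
    val source = srcDir.resolve("HelloSpark.scala")
    if (!Files.exists(source)) Files.createFile(source)
    source
  }

  def main(args: Array[String]): Unit = {
    // Usage: scala Hw0Layout <loginId>   (run from inside your hw0 directory)
    val loginId = args.headOption.getOrElse("loginID")
    println(s"Created ${create(Paths.get("."), loginId)}")
  }
}
```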
In HelloSpark.scala, enter the following Scala code (substitute your loginID for loginID):
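package edu.gatech.cs4641.loginID.hw0

import org.apache.spark.sql.SparkSession

object HelloSpark {
  def main(args: Array[String]): Unit = {
    // Create (or reuse) a SparkSession; the master URL comes from spark-submit
    val spark = SparkSession.builder.appName("Hello Spark").getOrCreate()
    // This is the one line of output that ends up in hw0-output.txt
    println(s"Spark version: ${spark.version}")
    spark.stop()
  }
}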
In build.sbt, enter these contents:
name := "Hello Spark"
version := "1.0"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
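The %% operator in build.sbt appends the Scala binary version to the artifact name, so "spark-sql" resolves to spark-sql_2.12. The same naming rule is why sbt package writes the jar to target/scala-2.12/hello-spark_2.12-1.0.jar. The sketch below reproduces that naming arithmetic in plain Scala. The normalize function approximates sbt's normalizedName rule (lowercase, with runs of non-word characters replaced by "-"), binaryVersion holds only for Scala 2.x, and the object ArtifactNames is hypothetical, not sbt's API.

```scala
import java.util.Locale

object ArtifactNames {
  // Approximates sbt's name normalization: "Hello Spark" -> "hello-spark".
  def normalize(name: String): String =
    name.toLowerCase(Locale.ENGLISH).replaceAll("""\W+""", "-")

  // Scala 2.x binary version: the first two components, e.g. "2.12.8" -> "2.12".
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // What `%%` does: append "_<binary version>" to the artifact name.
  def crossArtifact(artifact: String, scalaVersion: String): String =
    s"${artifact}_${binaryVersion(scalaVersion)}"

  // Path of the jar that `sbt package` writes for this build.
  def packagedJar(name: String, version: String, scalaVersion: String): String = {
    val bin = binaryVersion(scalaVersion)
    s"target/scala-$bin/${crossArtifact(normalize(name), scalaVersion)}-$version.jar"
  }

  def main(args: Array[String]): Unit = {
    println(crossArtifact("spark-sql", "2.12.8"))        // spark-sql_2.12
    println(packagedJar("Hello Spark", "1.0", "2.12.8")) // target/scala-2.12/hello-spark_2.12-1.0.jar
  }
}
```

If you change name, version, or scalaVersion in build.sbt, the jar path you pass to spark-submit changes accordingly.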
Compile and package your HelloSpark application with the following command (the first time you run it may take several minutes):
sbt package
Run your application by submitting it to Spark (note: you can use --master local[2] instead of --master local[1] if you have 2 or more cores):
spark-submit --class "edu.gatech.cs4641.loginID.hw0.HelloSpark" --master local[1] target/scala-2.12/hello-spark_2.12-1.0.jar
Lots of output will appear on the console.
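The --class argument must be the fully qualified name of your main object, that is, the package declared in HelloSpark.scala followed by the object name, and local[N] asks Spark to run locally with N worker threads. The following minimal sketch assembles the same command line; the object name SubmitCommand and the login jdoe3 are hypothetical.

```scala
object SubmitCommand {
  // Package from HelloSpark.scala plus the object name.
  def mainClass(loginId: String): String =
    s"edu.gatech.cs4641.$loginId.hw0.HelloSpark"

  // Builds the spark-submit argument list; `threads` becomes local[N].
  def command(loginId: String, threads: Int): Seq[String] = {
    require(threads >= 1, "local[N] needs at least one worker thread")
    Seq(
      "spark-submit",
      "--class", mainClass(loginId),
      "--master", s"local[$threads]",
      "target/scala-2.12/hello-spark_2.12-1.0.jar"
    )
  }

  def main(args: Array[String]): Unit =
    println(command(args.headOption.getOrElse("loginID"), 1).mkString(" "))
}
```

To launch the command from Scala instead of printing it, you could import scala.sys.process._ and call .! on the returned sequence.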
Run the application again and append its output to hw0-output.txt by entering:
Unix/Linux:
spark-submit --class "edu.gatech.cs4641.loginID.hw0.HelloSpark" --master local[1] target/scala-2.12/hello-spark_2.12-1.0.jar >> hw0-output.txt
Don’t forget the double angle brackets in >>!
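Most of the same output will appear on the console, except one line: the output of your println, which will be in hw0-output.txt. This is because >> redirects only stdout, where println writes, while Spark writes its log messages to stderr.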
Check your hw0-output.txt file to ensure that it contains the scalac version string, the scala version string, and the output of running your HelloSpark program.
hw0-output.txt File
At this point your hw0-output.txt file should contain the scalac version string, the scala version string, and the output of the println statement in your HelloSpark program. If your hw0-output.txt file is missing any of those elements, you should redo all the steps that add content to hw0-output.txt in each of the previous sections.
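This check can be automated with a short program. The following is a minimal sketch that assumes Scala 2.12-style version banners (lines beginning "Scala compiler version" and "Scala code runner version") plus the "Spark version:" line printed by HelloSpark; adjust the markers if your installed versions print something different. The object name CheckOutput is hypothetical.

```scala
import scala.io.Source

object CheckOutput {
  // Assumed markers for the three required pieces of output.
  val expected = Seq(
    "Scala compiler version",    // from scalac -version
    "Scala code runner version", // from scala -version
    "Spark version:"             // from the println in HelloSpark
  )

  // Returns the markers that do not appear anywhere in the file.
  def missing(path: String): Seq[String] = {
    val source = Source.fromFile(path)
    val text = try source.mkString finally source.close()
    expected.filterNot(marker => text.contains(marker))
  }

  def main(args: Array[String]): Unit = {
    val gaps = missing(args.headOption.getOrElse("hw0-output.txt"))
    if (gaps.isEmpty) println("hw0-output.txt looks complete")
    else println(s"Missing: ${gaps.mkString(", ")}")
  }
}
```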
Submit your hw0-output.txt
file on Canvas as an attachment. When you’re ready, double-check that you have submitted and not just saved a draft.
Practice safe submission! Verify that your HW files were truly submitted correctly, the upload was successful, and that your program runs with no syntax or runtime errors. It is solely your responsibility to turn in your homework and practice this safe submission safeguard. NOTE: Unlike TSquare, Canvas will not send an email indicating that your assignment has been submitted successfully. Follow the steps outlined below to ensure you have submitted correctly.
This procedure helps guard against a few things.