CMU-CS 15-415/615 - Homework #2

Solutions

  1. The solution pdf file for this assignment is available here.
  2. The solution sql queries for this assignment is available here.

Instructions

Download Homework Data Set

  1. The write-up for this assignment is available here.
  2. The data set tarball contains the start-up directory for setting up Postgres and template for SQL answers

Important - Stop the Server!

Please remember to stop the Postgres server, whenever you are done working on the assignment. Otherwise, when you resume your work there may be subtle, hard-to-detect problems (server running out of threads, etc). You can automatically do this by adding the following command to your .logout file.

$ pg_ctl -D $HOME/db415 stop

Getting Started with PostgreSQL

This document will help you get started with PostgreSQL, the DBMS that will be used for all the homeworks. For more information, check out the reference documentation.

The homeworks will be auto-graded (with our scripts), on one of the GHC cluster machines ghcXX.ghc.andrew.cmu.edu,, running PostgreSQL version 9.2.4. You may develop your code anywhere, but make sure it runs correctly on a ghc machine.

Each student is assigned with a ghc machine, number XX and port number YYYYY, to avoid conflicts. Ports (YYYYY) are unique; machines (XX) are shared among 2-3 people. You’ll find these numbers on blackboard, under 'Grade Center / HW2 machine and port assignments', and throughout this document we’ll refer to the machine as ghcXX as the port as YYYYY.

Initialization

Before using PostgreSQL you’ll have to initialize a database structure and create a database. You only have to do this once, and you don’t have to do it again if you move from one cluster machine to another. We provide set up commands in BASH - modify accordingly, for other shells.

Log into your cluster machine (remember to replace ghcXX with your assigned machine):

$ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu

On the cluster machine, run the following commands (remember to replace YYYYY with your assigned port):

$ export PGPORT=YYYYY   # sets the PORT to be used by PostgreSQL
$ export PGHOST=/tmp    # sets the directory for the socket files
$ initdb $HOME/db415    # initializes a database structure on the folder $HOME/db415
$ pg_ctl -D $HOME/db415 -o '-k /tmp' start # starts the server on the port YYYYY, using $HOME/db415 as data folder
$ createdb $USER        # creates a database, with your andrew id as its name

IMPORTANT: When you’re done, please stop the server (or add this line to your .logout file):

$ pg_ctl -D $HOME/db415 stop

Using the Database

Log into your assigned cluster machine:

$ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu

To work with the database, set up the environment variables, and start the server:

$ export PGPORT=YYYYY; export PGHOST=/tmp;    # set env. vars
$ pg_ctl -D $HOME/db415 -o '-k /tmp' start    # starts the server on port YYYYY, using db415 as data folder

To get the SQL prompt:

$ psql -d $USER                               # enter sql commands, now
And, again, when done, stop the server:

$ pg_ctl -D $HOME/db415 stop

Setting up for Homework #2

For this homework, we provide the script for setting up Postgres, along with the input data. Please do not change the provided makefile.

  1. From your local machine command prompt, log into one your cluster machine:

    $ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu
  2. If you have not done so already, initialize the postgres database, and start the server (initdb, pg_ctl start, createdb etc, as described above).
  3. Download and untar the hw2-data.tar.gz tarball.
    $ wget https://15415.courses.cs.cmu.edu/fall2016/hws/HW2/hw2-data.tar.gz
    $ tar zxvf hw2-data.tar.gz  # Warning: expands to about 70MB; creates or overwrites ./hw2 directory
    $ cd hw2
  4. Set up our "bike-sharing" database. From now on, we assume you are in the hw2 directory.

    $ make setup_postgres # will insert all into $USER db; after that, your may delete the csv files in bike_sharing_data directory.
  5. Check to make sure that the data loaded correctly by executing the following SQL command: SELECT count(*) from [table];
    Table Number of Tuples
    station 70
    trip 669959
    weather 3665
  6. Before working on the problems, do the "sample" question (Q1(a)) and place your SQL code in the file queries/q1a.sql. Important: Check your answer by executing make, which should give no differences. Also see hw2 handout for more details on this.
  7. Solve the rest of the questions and place your SQL answers in the corresponding place-holder queries/*.sql file. FYI, We will grade your homework (after replacing the outputs directory with correct outputs) by running the command above and expect to see no output.

Submitting

Put all your SQL answers in the queries directory. Then, generate the tarball to submit:
$ make submission

This creates a tarball named $USER.tar.gz in your hw2 directory, ready to submit on blackboard.

Check it again (tar xvzf $USER.tar.gz), to make sure nothing is missing. Then, on blackboard, submit your tarball under "Assignments - Homework 2 Tarball".

Frequently Asked Questions

Q: My assigned machine doesn't work!
Our apologies - please use some other machine in the range XX=25-86. Make sure you use it with your original port number (i.e., YYYYY).

Q: Help, I’m stuck!
Please post a question on blackboard ('HW2 forum'), or come to office hours.