This document will help you get started with PostgreSQL, the DBMS that will be used for all the homeworks. For more information, check out the reference documentation.
The homeworks will be auto-graded (with our scripts),
on one of the GHC cluster machines
ghcXX.ghc.andrew.cmu.edu,
, running
PostgreSQL version 9.2.4.
You may develop your code anywhere, but make sure
it runs correctly on a ghc machine.
Each student is assigned with a ghc machine, number XX and port number YYYYY, to avoid conflicts. Ports (YYYYY) are unique; machines (XX) are shared among 2-3 people. You’ll find these numbers on blackboard, under 'Grade Center / HW2 machine and port assignments', and throughout this document we’ll refer to the machine as ghcXX as the port as YYYYY.
Before using PostgreSQL you’ll have to initialize a database structure and create a database. You only have to do this once, and you don’t have to do it again if you move from one cluster machine to another. We provide set up commands in BASH - modify accordingly, for other shells.
Log into your cluster machine (remember to replace ghcXX with your assigned machine):
$ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu
On the cluster machine, run the following commands (remember to replace YYYYY with your assigned port):
$ export PGPORT=YYYYY # sets the PORT to be used by PostgreSQL
$ export PGHOST=/tmp # sets the directory for the socket files
$ initdb $HOME/db415 # initializes a database structure on the folder $HOME/db415
$ pg_ctl -D $HOME/db415 -o '-k /tmp' start # starts the server on the port YYYYY, using $HOME/db415 as data folder
$ createdb $USER # creates a database, with your andrew id as its name
IMPORTANT: When you’re done, please stop the server
(or add this line to your .logout
file):
$ pg_ctl -D $HOME/db415 stop
Log into your (="XX") cluster machine:
$ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu
To work with the database, set up the environment variables, and start the server:
$ export PGPORT=YYYYY; export PGHOST=/tmp; # set env. vars
$ pg_ctl -D $HOME/db415 -o '-k /tmp' start # starts the server on port YYYYY, using db415 as data folder
To get the SQL prompt:
$ psql -d $USER # enter sql commands, now
And, again, when done,
stop the server:
$ pg_ctl -D $HOME/db415 stop
For homework 2, we provide the script for setting up Postgres, along
with the input data.
Please do not change the provided makefile
.
Step 1) From your local machine command prompt, log into one your cluster machine:
$ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu
Step 2) Download and untar the hw2.tar.gz tarball. (Warning: it is large - about 45MB)
$ wget http://www.cs.cmu.edu/~christos/courses/dbms.F15/415F15HW2/hw2.tar.gz
$ tar xvzf hw2.tar.gz # Warning: expands to about 100MB; creates or overwrites ./hw2 directory
$ cd hw2
Step 3) Set up our "Yelp" database. From now on, we assume you are in the hw2 directory.
$ make setup_postgres # will insert all into $USER db; after that, your may delete the hw2.tar.gz and yelp_data.
Step 3') Do a sanity check for the data import: Do (SELECT count(*) from [table];), and you should get:
Table | number of tuples |
---|---|
business | 61184 |
yelp_user | 366715 |
business_category | 176697 |
review | 1569264 |
Step 4) Do the "sample" question (Q1(a)) and place your SQL code in the file queries/q1a.sql. Important: Check your answer by typing
$ make
which should give no differences.
Step 5) Solve the rest of the questions and place your SQL answers in the corresponding place-holder queries/*.sql file. FYI, we will grade your homework (after replacing the outputs directory with correct outputs) by running the command above and expect to see no output.
Again, please remember to stop the Postgres server, whenever you are done. Otherwise, in your next run, there may be subtle, hard-to-detect problems (server running out of threads, etc).
Put all your SQL answers in the queries directory. Then, generate the tarball to submit:
$ make submission
This creates a tarball named $USER.tar.gz in your hw2 directory, ready to submit on blackboard.
Check it again ( tar xvzf
), to make sure nothing is missing.
Then, on blackboard, submit your tarball under "Assignments - Homework 2 Tarball".
A: Our apologies - please use some other machine in the
range XX=25-86. Make sure you use it with your
original port (YYYYY).
A: Please post a question on blackboard ('HW2 forum'), or come to office hours — see the course website for locations and hours.