Carnegie Mellon University
Department of Computer Science
15-415/615 - Database Applications
C. Faloutsos & A. Pavlo, Fall 2015

Getting Started with PostgreSQL


Where to get help

This document will help you get started with PostgreSQL, the DBMS that will be used for all the homeworks. For more information, check out the reference documentation.

Before running PostgreSQL

The homeworks will be auto-graded (with our scripts), on one of the GHC cluster machines ghcXX.ghc.andrew.cmu.edu,, running PostgreSQL version 9.2.4. You may develop your code anywhere, but make sure it runs correctly on a ghc machine.

Each student is assigned with a ghc machine, number XX and port number YYYYY, to avoid conflicts. Ports (YYYYY) are unique; machines (XX) are shared among 2-3 people. You’ll find these numbers on blackboard, under 'Grade Center / HW2 machine and port assignments', and throughout this document we’ll refer to the machine as ghcXX as the port as YYYYY.

Setting up a database

Before using PostgreSQL you’ll have to initialize a database structure and create a database. You only have to do this once, and you don’t have to do it again if you move from one cluster machine to another. We provide set up commands in BASH - modify accordingly, for other shells.

Log into your cluster machine (remember to replace ghcXX with your assigned machine):

     $ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu

On the cluster machine, run the following commands (remember to replace YYYYY with your assigned port):


    $ export PGPORT=YYYYY   # sets the PORT to be used by PostgreSQL
    $ export PGHOST=/tmp    # sets the directory for the socket files
    $ initdb $HOME/db415    # initializes a database structure on the folder $HOME/db415
    $ pg_ctl -D $HOME/db415 -o '-k /tmp' start # starts the server on the port YYYYY, using $HOME/db415 as data folder
    $ createdb $USER        # creates a database, with your andrew id as its name

IMPORTANT: When you’re done, please stop the server (or add this line to your .logout file):


    $ pg_ctl -D $HOME/db415 stop

Using the database

Log into your (="XX") cluster machine:


    $ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu

To work with the database, set up the environment variables, and start the server:


    $ export PGPORT=YYYYY; export PGHOST=/tmp;    # set env. vars
    $ pg_ctl -D $HOME/db415 -o '-k /tmp' start    # starts the server on port YYYYY, using db415 as data folder

To get the SQL prompt:


    $ psql -d $USER                               # enter sql commands, now
And, again, when done, stop the server:


    $ pg_ctl -D $HOME/db415 stop

Setting up for Homework 2

For homework 2, we provide the script for setting up Postgres, along with the input data. Please do not change the provided makefile.

Step 1) From your local machine command prompt, log into one your cluster machine:

    $ ssh <your-andrew-id>@ghcXX.ghc.andrew.cmu.edu

Step 2) Download and untar the hw2.tar.gz tarball. (Warning: it is large - about 45MB)


    $ wget http://www.cs.cmu.edu/~christos/courses/dbms.F15/415F15HW2/hw2.tar.gz
    $ tar xvzf hw2.tar.gz       # Warning: expands to about 100MB; creates or overwrites ./hw2 directory
    $ cd hw2 

Step 3) Set up our "Yelp" database. From now on, we assume you are in the hw2 directory.


    $ make setup_postgres # will insert all into $USER db; after that, your may delete the hw2.tar.gz and yelp_data.

Step 3') Do a sanity check for the data import: Do (SELECT count(*) from [table];), and you should get:

Table number of tuples
business 61184
yelp_user 366715
business_category 176697
review 1569264

Step 4) Do the "sample" question (Q1(a)) and place your SQL code in the file queries/q1a.sql. Important: Check your answer by typing


    $ make

which should give no differences.

Step 5) Solve the rest of the questions and place your SQL answers in the corresponding place-holder queries/*.sql file. FYI, we will grade your homework (after replacing the outputs directory with correct outputs) by running the command above and expect to see no output.

Important - stop the server

Again, please remember to stop the Postgres server, whenever you are done. Otherwise, in your next run, there may be subtle, hard-to-detect problems (server running out of threads, etc).

Turn-in Instructions

Put all your SQL answers in the queries directory. Then, generate the tarball to submit:


    $ make submission

This creates a tarball named $USER.tar.gz in your hw2 directory, ready to submit on blackboard.
Check it again ( tar xvzf ), to make sure nothing is missing. Then, on blackboard, submit your tarball under "Assignments - Homework 2 Tarball".


FAQ

Q: My assigned machine doesn't work!

A: Our apologies - please use some other machine in the range XX=25-86. Make sure you use it with your original port (YYYYY).

Q: Help, I’m stuck!

A: Please post a question on blackboard ('HW2 forum'), or come to office hours —  see the course website for locations and hours.


Created by: Jinliang Wei. Last updated by Christos Faloutsos, Sep 20, 2015.