{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/tarabalakrishnan/.local/lib/python2.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated. You should import from traitlets.config instead.\n",
" \"You should import from traitlets.config instead.\", ShimWarning)\n",
"/Users/tarabalakrishnan/.local/lib/python2.7/site-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.\n",
" warn(\"IPython.utils.traitlets has moved to a top-level traitlets package.\")\n"
]
},
{
"data": {
"text/plain": [
"u'Connected: None@'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%load_ext sql\n",
"%sql sqlite:///"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Problem Set 2\n",
"=============\n",
"\n",
"\n",
"### Instructions / Notes:\n",
"\n",
"**_Read these carefully_**\n",
"\n",
"* This problem set does **not** come with a dataset to load; instead, make your own instances of tables, either as solutions to the problems or for testing solutions to the problems.\n",
"* You are encouraged to create new IPython notebook cells to use for testing, debugging, exploring, etc. **Just make sure that your final answer for each question is in its own cell when you submit.**\n",
"* When you see `In [*]:` to the left of the cell you are executing, this means that the code / query is _running_.\n",
" * **If the cell is hanging (i.e. running for too long): You can restart the SQL connection by restarting the entire python kernel.**\n",
" * To restart the python kernel: Use the menu bar (Kernel >> Restart).\n",
" * To restart the SQL connection: Re-execute the SQL connection cell at the top of the notebook\n",
" * You will also need to restart the connection if you want to load a different version of the database file\n",
"* Remember:\n",
" * `%sql [SQL]` is for _single line_ SQL queries\n",
" * `%%sql [SQL]` is for _multi line_ SQL queries\n",
"* **See Piazza for submission instructions**\n",
"* _Have fun!_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Problem 1\n",
"---------\n",
"\n",
"**_[20 points total]_**\n",
"\n",
"For each part of this problem you will provide a _single_ SQL query to check whether a certain condition holds on a specific instance of a relation in the following way: **your query should return an empty result if and only if the condition holds on the instance.** (If the condition _doesn't hold_, your query should return something non-empty, but it doesn't matter what this is).\n",
"\n",
"Note our language here: the conditions that we specify cannot be proved to hold **in general** without knowing the externally-defined functional dependencies. This means that you are _checking whether they **could** hold in general for the relation, given any specific set of tuples_.\n",
"\n",
"You may assume that there will be no `NULL` values in the tables, **and you may assume that the relations are _sets_ rather than multisets**, but otherwise your query should work for general instances. We define the schemas of the tables used below for convenience, but in this problem you will need to construct your own test tables if you wish to use them to check your answers!"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n",
"Done.\n",
"Done.\n"
]
},
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS R; CREATE TABLE R (A INT, B INT, C INT, D INT, E INT);\n",
"DROP TABLE IF EXISTS S; CREATE TABLE S (A INT, B INT, C INT);"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n",
"1 rows affected.\n"
]
},
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS R; \n",
"CREATE TABLE R (A INT, B INT, C INT, D INT, E INT);\n",
"INSERT INTO R VALUES(0, 0, 0, 0, 0);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a)\n",
"\n",
"**_[5 points]_**\n",
"\n",
"$\\{A, B\\} \\rightarrow \\{C\\}$ and $\\{C\\} \\rightarrow \\{A, B\\}$ hold for a relation $R(A,B,C,D,E)$."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
""
],
"text/plain": [
"[]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT DISTINCT R1.A, R1.B, R1.C\n",
"FROM R as R1, R as R2\n",
"WHERE (R1.A = R2.A \n",
"AND R1.B = R2.B \n",
"AND R1.C <> R2.C) or \n",
"(R1.C = R2.C and R1.B <> R2.B or R1.A <> R2.A);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b)\n",
"\n",
"**_[5 points]_**\n",
"\n",
"$\\{A, B, C\\}$ is a **superkey** for a relation $R(A,B,C,D,E)$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"SELECT *\n",
"FROM R r1, R r2\n",
"WHERE (r1.A = r2.A and r1.B = r2.B and r1.C = r2.C) and\n",
" (r1.D <> r2.D or r1.E <> r2.E);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c)\n",
"\n",
"**_[5 points]_**\n",
"\n",
"$\\{A\\}$ and $\\{B\\}$ are each **keys** for a relation $S(A,B,C)$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"SELECT *\n",
"FROM S s1, S s2\n",
"WHERE (s1.A = s2.A and (s1.B <> s2.B or s1.C <> s2.C)) or\n",
" (s1.B = s2.B and (s1.A <> s2.A or s1.C <> s2.C));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (d)\n",
"\n",
"**_[5 points]_**\n",
"\n",
"A **multi-valued dependency (MVD)** is defined as follows: let $R$ be a schema i.e. a set of attributes, and consider two _sets of attributes_ $X\\subseteq R$ and $Y\\subseteq R$. We say that a multi-valued dependency (MVD), written:\n",
"\n",
"$X\\twoheadrightarrow Y$\n",
"\n",
"**holds on $R$** if whenever there are two tuples $t_1,t_2$ such that $t_1[X] = t_2[X]$, there also exists a third tuple $t_3$ such that:\n",
"\n",
"* $t_3[X] = t_1[X] = t_2[X]$\n",
"* $t_3[Y] = t_1[Y]$\n",
"* $t_3[R \\setminus Y] = t_2[R \\setminus Y]$\n",
"\n",
"Note that $R \\setminus B$ is all the attributes in $R$ that are not in $B$, and that $t_3$ need not be distinct from $t_1$ or $t_2$. Note especially that an MVD holds on an entire _relation_, meaning that any two tuples (in any order) in the relation should satisfy the above conditions if the MVD holds. **See the end of the lecture 7 slides for more on MVDs!**\n",
"\n",
"\n",
"In this problem, write your query to check if the MVD $\\{A\\}\\twoheadrightarrow \\{C,E\\}$ holds for a relation $R(A,B,C,D,E)$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"SELECT *\n",
"FROM R t1, R t2\n",
"WHERE t1.A = t2.A and\n",
" NOT EXISTS (\n",
" SELECT *\n",
" FROM R t3\n",
" WHERE t3.A = t1.A and\n",
" t3.C = t1.C and\n",
" t3.E = t1.E and\n",
" t3.B = t2.B and\n",
" t3.D = t2.D);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Problem 2\n",
"---------\n",
"\n",
"Consider a relation $R(X, Y, Z)$ with some list of functional dependencies $f_1, f_2, \\ldots, f_n$. Suppose that $K$ is a **superkey** for this relation, given these functional dependencies (recall that any superkey $K$ must be a subset of the attributes of $R$, which are $\\{X, Y, Z\\}$). In each part of this problem, we will examine what can happen if we add an additional functional dependency to our relation.\n",
"\n",
"To answer **yes**, provide python code that assigns the variable ```answer``` to ```True``` and assigns ```explanation``` to be a python string which contains a (short!) explanation of why. For example:\n",
"\n",
"```python\n",
"answer = True\n",
"explanation = \"Alex is correct because all keys are superkeys.\"\n",
"```\n",
"\n",
"To answer **no**, provide python code that assigns the variable ```answer``` to ```False```. You must also assign the variable ```K``` to be a set of attributes, the variable ```FDs``` (functional dependencies) to be a python list having elements that are _pairs_ of sets, and the variable ``new_FD`` (the new functional dependency to be added) to be a pair of sets, such that these three variables together produce a counter-example for the desired statement. For example, a counter-example to the statement \"if $K \\rightarrow \\{Z\\}$ is false, then it will remain false when we add a new functional depencency\" could look like: \n",
"\n",
"```python\n",
"answer = False\n",
"K = set(\"X\")\n",
"FDs = [\n",
" (set(\"Y\"), set(\"Z\")),\n",
" (set((\"Y\", \"Z\")), set(\"X\"))\n",
"]\n",
"new_FD = (set(\"X\"), set(\"Z\"))\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a)\n",
"\n",
"CS145 student Alex claims that if we add any new functional dependency $f_{n+1} = U \\rightarrow V$ (where $U$ and $V$ are subsets of $\\{X, Y, Z\\}$) to our list of functional dependencies, then $K$ will still be a superkey for $R$ given $f_1, f_2, \\ldots, f_{n+1}$. Is Alex correct? If yes, explain why. If no, provide a counter-example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"answer = True\n",
"explanation = \\\n",
"\"\"\"\n",
"Consider the original set of functional dependencies $f_1,f_2,\\ldots,f_n$.\n",
"The assumption that $K$ is a superkey implies that the original set of FDs implies the FD $K \\rightarrow {A}$ for any attribute A.\n",
"Adding an additional functional dependency cannot cause any of the inferred conditions to become false, so all of the conditions for $K$ to be a superkey will still hold.\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b)\n",
"\n",
"Consider again a relation $R(X, Y, Z)$ with some list of functional dependencies $f_1, f_2, \\ldots, f_n$. Now suppose that $K$ is a **key** for this relation, given these functional dependencies.\n",
"\n",
"CS145 student Bryan claims that if we add any new functional dependency $f_{n+1} = U \\rightarrow V$ to our list of functional dependencies, then $K$ will still be a key for $R$ given $f_1, f_2, \\ldots, f_{n+1}$. Is Bryan correct? If yes, explain why. If no, provide a counter-example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"answer = False\n",
"X = \"X\"\n",
"Y = \"Y\"\n",
"Z = \"Z\"\n",
"K = set((X, Y))\n",
"FDs = [(set((X, Y)), set(Z))]\n",
"new_FD = (set(X), set(Y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c)\n",
"\n",
"Consider now a more general relation $R(X_1, X_2, \\ldots, X_m)$ with some list of functional dependencies $f_1, f_2, \\ldots, f_n$. Suppose again that $K$ is a **key** for this relation, given these functional dependencies (recall that any superkey $K$ must be a subset of the attributes of $R$, which are $\\{X_1, X_2, \\ldots X_m\\}$).\n",
"\n",
"CS145 student Chris claims that if we add any new functional dependency $f_{n+1} = U \\rightarrow V$ (where $U$ and $V$ are subsets of $\\{X_1, X_2, \\ldots, X_m\\}$) to our list of functional dependencies, and the new functional dependency **does not include any of the attributes that are in the key $K$** (that is, $U \\cap K = V \\cap K = \\emptyset$), then $K$ will still be a key for $R$ given $f_1, f_2, \\ldots, f_{n+1}$. Is Chris correct? If yes, explain why. If no, provide a counter-example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"answer = False\n",
"X1 = \"A\"\n",
"X2 = \"B\"\n",
"X3 = \"C\"\n",
"X4 = \"D\"\n",
"K = set((X1, X2))\n",
"FDs = [(set((X1, X2)), set((X3, X4))),\n",
" (set((X3, X4)), set(X2)),\n",
" (set(X1), set(X3))]\n",
"new_FD = (set(X3), set(X4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Problem 3\n",
"---------\n",
"\n",
"Dan wants to find the keys of a relation based on an instance (the concrete rows stored in a database), rather than the functional dependencies (which are external constraints on the schema), but he isn't sure when this problem is possible. In this problem, you will study potential errors that can occur when using an instance instead of the functional dependencies.\n",
"\n",
"Dan defines a set $K$ to be a **plausible key** for an instance `T` if $K$ could be a superkey for `T`, and no subset of $K$ could possibly be a superkey for `T`. (Equivalently, $K$ is a plausible key for `T` if there is an set of functional dependencies that are consistent with `T` in which $K$ is a superkey, and for all subsets $U \\subset K$, there are no sets of functional dependencies that are consistent with `T` in which $U$ is a superkey.)\n",
"\n",
"Consider a relation $R(A, B, C, D)$ that satisfies the following set of functional dependencies:\n",
"\\begin{align}\n",
"\\{A, B\\} &\\rightarrow \\{C\\} \\\\\n",
"\\{C\\} &\\rightarrow \\{D\\} \\\\\n",
"\\{D\\} &\\rightarrow \\{A, B\\}\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a)\n",
"\n",
"**_[10 points]_**\n",
"\n",
"Create an instance `T` of $R$ for which $\\{A, B\\}$ is a plausible key. If you believe that `T` cannot be created, provide it as an empty table.\n",
"\n",
"Use a series of `INSERT` statements below to populate the table `T`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS T; \n",
"CREATE TABLE T(A int, B int, C int, D int);\n",
"\n",
"INSERT INTO T VALUES(0, 0, 0, 0);\n",
"INSERT INTO T VALUES(0, 1, 1, 1);\n",
"INSERT INTO T VALUES(1, 0, 2, 2);\n",
"INSERT INTO T VALUES(1, 1, 3, 3);\n",
"INSERT INTO T VALUES(1, 2, 4, 5);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b)\n",
"\n",
"**_[10 points]_**\n",
"\n",
"Create an instance `T` of $R$ for which $\\{A\\}$ is a plausible key. If you believe that `T` cannot be created, provide it as an empty table.\n",
"\n",
"Use a series of `INSERT` statements below to populate the table `T`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS T; \n",
"CREATE TABLE T(A int, B int, C int, D int);\n",
"\n",
"INSERT INTO T VALUES(0, 0, 0, 0);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c)\n",
"\n",
"**_[10 points]_**\n",
"\n",
"Create an instance `T` of $R$ for which $\\{A, B, C\\}$ is a plausible key. If you believe that `T` cannot be created, provide it as an empty table.\n",
"\n",
"Use a series of `INSERT` statements below to populate the table `T`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS T; \n",
"CREATE TABLE T(A int, B int, C int, D int);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Problem 4\n",
"---------\n",
"\n",
"### Part (a)\n",
"\n",
"**_[10 points]_**\n",
"\n",
"Consider a schema $R(A, B, C, D)$ which has functional dependencies\n",
"\\begin{align}\n",
" \\{A, C\\} &\\rightarrow \\{ B \\} \\\\\n",
" \\{D \\} &\\rightarrow \\{ C \\}. \\\\\n",
"\\end{align}\n",
"Create an instance of $R$ which is consistent with these functional dependencies, but not with any other functional dependencies (i.e. **all functional dependencies that are not inferred from these are violated**).\n",
"\n",
"Use a series of `INSERT` statements below to populate the table `R`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS R; \n",
"CREATE TABLE R (A int, B int, C int, D int);\n",
"\n",
"INSERT INTO R VALUES(0, 0, 0, 0);\n",
"\n",
"-- Violate {A, B, C} => {D}\n",
"INSERT INTO R VALUES(0, 0, 0, 1);\n",
"\n",
"-- Nothing to violate for {A, B, D}\n",
"-- Nothing to violate for {A, C, D}\n",
"-- Violate {B, C, D} => {A}\n",
"INSERT INTO R VALUES(2, 0, 0, 0);\n",
"\n",
"-- Violate {A, B} => {C, D}\n",
"INSERT INTO R VALUES(0, 0, 3, 3);\n",
"\n",
"-- Violate {A, C} => {D}\n",
"-- (already violated for {A, B, C} => {D})\n",
"\n",
"-- Nothing to violate for {A, D}\n",
"-- Violate {B, C} => {A, D}\n",
"INSERT INTO R VALUES(4, 0, 0, 4);\n",
"\n",
"-- Violate {B, D} => {A}\n",
"-- (Already violated by {B, C, D} => {A})\n",
"\n",
"-- Violate {C, D} => {A, B}\n",
"INSERT INTO R VALUES(5, 5, 0, 0);\n",
"\n",
"-- Violate {A} => {B, C, D}\n",
"INSERT INTO R VALUES(0, 6, 6, 6);\n",
"\n",
"-- Violate {B} => {A, C, D}\n",
"INSERT INTO R VALUES(7, 0, 7, 7);\n",
"\n",
"-- Violate {C} => {A, B, D}\n",
"INSERT INTO R VALUES(8, 8, 0, 8);\n",
"\n",
"-- Violate {D} => {A, B}\n",
"-- (Already violated by {C, D} => {A, B})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b)\n",
"\n",
"**_[10 points]_**\n",
"\n",
"Now consider a schema $S(A, B, C, D, E)$ which has an additional attribute $E$, but the same functional dependencies\n",
"\\begin{align}\n",
" \\{A, C\\} &\\rightarrow \\{ B \\} \\\\\n",
" \\{D \\} &\\rightarrow \\{ C \\}. \\\\\n",
"\\end{align}\n",
"Create an instance of $S$ which is consistent with these functional dependencies, but not with any other functional dependencies (i.e. **all functional dependencies that are not inferred from these are violated**).\n",
"\n",
"Write one or more SQL statements that populate the table `S` using the table `R` you constructed in part (a)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS S;\n",
"CREATE TABLE S (A int, B int, C int, D int, E int);\n",
"\n",
"INSERT INTO S SELECT *, 0 FROM R;\n",
"INSERT INTO S SELECT *, 1 FROM R;"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}