Homework 5
Last updated: Mon, 18 Mar 2024 11:58:43 -0400
Out: Mon Mar 18, 12:00pm EDT (noon) Due: Mon Mar 25, 12:00pm EDT (noon)
This assignment begins to explore context-free grammars (CFGs) and context-free languages (CFLs).
Homework Problems
A Regular Language? (11 points)
A Context-Free Language? (11 points)
CFGs and Their String Derivations (11 points)
More Induction Practice (11 points)
README (1 point)
Total: 45 points
Submitting
Submit your solution to this assignment in Gradescope hw5. Please assign each page to the correct problem and make sure your solutions are legible.
A submission must also include a README containing the required information.
1 A Regular Language?
Here is a language that approximates basic whitespace checking in a language like Python.
\mathit{WS} = \left\{\sqcup^*\texttt{if:}\downarrow\sqcup^*\texttt{else:}\mid\downarrow\textrm{ is newline; }\sqcup\textrm{ is space; spaces on each line match}\right\}
\Sigma = \left\{\texttt{if:},\texttt{else:},\sqcup,\downarrow\right\} (in real-world grammars, terminal (i.e., alphabet) "symbols" can be, and often are, whole words)
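To make the set notation concrete, here is a minimal membership-check sketch in Python. It assumes that an ordinary space character stands for \sqcup and that "\n" stands for \downarrow. The function name in_ws is hypothetical and not part of the assignment.

```python
def in_ws(s: str) -> bool:
    # Assumption: ' ' plays the role of the space symbol and a line break
    # plays the role of the newline symbol from the alphabet.
    lines = s.split("\n")
    if len(lines) != 2:  # a string in WS contains exactly one newline symbol
        return False
    first, second = lines
    if not first.endswith("if:") or not second.endswith("else:"):
        return False
    indent1 = first[: -len("if:")]
    indent2 = second[: -len("else:")]
    # Both indents must consist only of spaces and must match exactly.
    return set(indent1) <= {" "} and indent1 == indent2


print(in_ws("  if:\n  else:"))  # indents match
print(in_ws("  if:\n else:"))   # indents differ
```

The checker simply compares the two indents, which a program with unbounded memory can do directly.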
Prove that this language is not a regular language.
The proof must use the Pumping Lemma and proof by contradiction. Make sure your proof has all the required components.
2 A Context-Free Language?
Now prove that the \mathit{WS} language from Problem 1 (A Regular Language?) above is a context-free language (CFL).
Your proof must be in the form of a Statements and Justifications table, and it must use the CFG representation of CFLs.
3 CFGs and Their String Derivations
Here’s a context-free grammar (CFG), called \mathit{LIKEPY}, representing the core of a Python-like language:
\begin{array}{rcl}
\left\langle STMTS+\right\rangle & \rightarrow & \left\langle STMT\right\rangle \texttt{;} \left\langle STMTS+\right\rangle \mid \left\langle STMT\right\rangle \\
\left\langle STMT\right\rangle & \rightarrow & \left\langle ID\right\rangle \texttt{=} \left\langle EXPR\right\rangle \mid \texttt{if:} \left\langle EXPR\right\rangle \texttt{then:} \left\langle STMT\right\rangle \texttt{else:} \left\langle STMT\right\rangle \\
 & & \mid \texttt{print(} \left\langle EXPR\right\rangle \texttt{)} \mid \left\langle FNDEF\right\rangle \mid \left\langle EXPR\right\rangle \\
\left\langle FNDEF\right\rangle & \rightarrow & \texttt{def} \left\langle ID\right\rangle \texttt{(} \left\langle IDS+\right\rangle \texttt{)} \texttt{:} \left\langle STMT\right\rangle \\
\left\langle EXPR\right\rangle & \rightarrow & \left\langle ID\right\rangle \mid \left\langle NUM\right\rangle \mid \left\langle EXPR\right\rangle \texttt{==} \left\langle EXPR\right\rangle \mid \left\langle EXPR\right\rangle \texttt{*} \left\langle EXPR\right\rangle \\
 & & \mid \left\langle EXPR\right\rangle \texttt{+} \left\langle EXPR\right\rangle \mid \left\langle LAM\right\rangle \mid \left\langle ID\right\rangle \texttt{(} \left\langle EXPRS\right\rangle \texttt{)} \\
\left\langle LAM\right\rangle & \rightarrow & \texttt{lambda} \left\langle IDS+\right\rangle \texttt{:} \left\langle EXPR\right\rangle \\
\left\langle EXPRS\right\rangle & \rightarrow & \left\langle EXPRS+\right\rangle \mid \varepsilon \\
\left\langle EXPRS+\right\rangle & \rightarrow & \left\langle EXPR\right\rangle \texttt{,} \left\langle EXPRS+\right\rangle \mid \left\langle EXPR\right\rangle \\
\left\langle NUM\right\rangle & \rightarrow & 0\mid 1\mid 2\mid 3\mid 4\mid 5\mid 6\mid 7\mid 8\mid 9 \\
\left\langle IDS+\right\rangle & \rightarrow & \left\langle ID\right\rangle \texttt{,} \left\langle IDS+\right\rangle \mid \left\langle ID\right\rangle \\
\left\langle ID\right\rangle & \rightarrow & \texttt{f}\mid\texttt{g}\mid\texttt{x}\mid\texttt{y}
\end{array}
(Yes, real-world languages are much more complicated than textbook examples.)
The variables (nonterminals) of the CFG are all the names enclosed in angle brackets, e.g., \left\langle EXPR\right\rangle is a variable name. All other symbols used in the rules are terminals (ignoring whitespace). You may also treat a multi-symbol terminal, e.g., \texttt{if:}, as a single terminal.
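As an illustration of how rules rewrite variables one step at a time, the following Python sketch encodes a small fragment of the rules and carries out a short leftmost derivation of x + 1. The dictionary RULES, the helper names apply_leftmost and derive, and the restriction to leftmost rewriting are assumptions made for this example only. The grammar itself does not require leftmost derivations.

```python
# A fragment of LIKEPY's rules (only what this example needs).
# Variables are written in angle brackets; every other token is a terminal.
RULES = {
    "<EXPR>": [["<ID>"], ["<NUM>"], ["<EXPR>", "+", "<EXPR>"]],
    "<NUM>": [["0"], ["1"]],
    "<ID>": [["x"], ["y"]],
}


def apply_leftmost(form, rhs):
    """Replace the leftmost variable in `form` with `rhs`, checking the rule exists."""
    for i, sym in enumerate(form):
        if sym in RULES:
            if rhs not in RULES[sym]:
                raise ValueError(f"no rule {sym} -> {' '.join(rhs)}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no variable left to rewrite")


def derive(start, choices):
    """Return every sentential form visited, starting from [start]."""
    forms = [[start]]
    for rhs in choices:
        forms.append(apply_leftmost(forms[-1], rhs))
    return forms


steps = derive("<EXPR>", [
    ["<EXPR>", "+", "<EXPR>"],
    ["<ID>"],
    ["x"],
    ["<NUM>"],
    ["1"],
])
print(" => ".join(" ".join(form) for form in steps))
```

Each element of steps is one sentential form, so this derivation has five steps (one fewer than the number of forms).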
- Give two strings in \mathit{LIKEPY}’s language by showing their derivation steps. Each given string must have at least six derivation steps.
- Give a formal description of the grammar \mathit{LIKEPY}. You may assume that the given rules are in a set called \mathit{PYRULES}.
- Give parse trees for the following strings in the language of \mathit{LIKEPY}:
  - \texttt{def f(x): x + 1}
  - \texttt{lambda x: x + 1}
  - \texttt{f = lambda x,y: x + y; def g(x,y): x*y; f(2,3)==g(2,3);}
4 More Induction Practice
Prove that the following statement is true for string derivations using some CFG G=\left\langle V,\Sigma,R,S\right\rangle:
If \alpha\Rightarrow_G^*\beta then \gamma_1\alpha\gamma_2\Rightarrow_G^*\gamma_1\beta\gamma_2
where \alpha,\beta,\gamma_1,\gamma_2\in (V\cup\Sigma)^*.
In other words, if a grammar can derive a string, then it can derive the same string as a substring in a larger string.
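A single concrete instance of the statement can be checked mechanically. The Python sketch below uses a toy grammar S \rightarrow aSb \mid \varepsilon, chosen here purely for illustration, along with hypothetical helper names yields and is_derivation. It checks one example and is not a proof.

```python
# Toy grammar (an assumption for illustration): S -> aSb | epsilon
RULES = [("S", "aSb"), ("S", "")]


def yields(u: str, v: str) -> bool:
    """True if u => v in one step: u = pAq and v = pwq for some rule A -> w."""
    for lhs, rhs in RULES:
        for i in range(len(u)):
            if u[i] == lhs and u[:i] + rhs + u[i + 1:] == v:
                return True
    return False


def is_derivation(forms) -> bool:
    """True if each form yields the next in one step (zero steps is allowed)."""
    return all(yields(a, b) for a, b in zip(forms, forms[1:]))


derivation = ["S", "aSb", "aaSbb", "aabb"]  # alpha =>* beta
g1, g2 = "ab", "S"                          # gamma_1 and gamma_2
wrapped = [g1 + form + g2 for form in derivation]

print(is_derivation(derivation), is_derivation(wrapped))
```

Here wrapped places the same sentential forms between g1 and g2, and the check confirms that each wrapped form still yields the next one in a single step.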
Since \Rightarrow^* is a recursive definition, the proof must also be recursive, i.e., you must use proof by induction. The proof must clearly state:
what the induction is "on",
base case(s),
and inductive case(s) (where each includes an inductive hypothesis)
In addition, the proof of each case should be clearly explained with a Statements and Justifications table, as described in class.