Installation Guide for Homework Environment

Prerequisites:

Ensure that you're using Python version 3.12. If you have a different version, you might experience GUI-related issues. Check your Python version by running:

    python --version
    

Installing Miniconda:

Windows:

  1. Download the Miniconda installer for Windows from the official site.
  2. Double-click the .exe file to start the installation.
  3. Follow the installation prompts. When asked to add Miniconda to your PATH, choose "Yes."

Linux:

  1. Download the Miniconda installer for Linux from the official site.
  2. Navigate to the directory where you downloaded the installer and run:
  3. chmod +x Miniconda3-latest-Linux-x86_64.sh
    ./Miniconda3-latest-Linux-x86_64.sh
  4. Follow the installation prompts.

Mac:

  1. Download the Miniconda installer for Mac from the official site.
  2. Open the downloaded .pkg file to start the installation.
  3. Follow the installation prompts.

Setting Up the Homework Environment:

After installing Miniconda, set up your environment with the following commands:

conda create --name cs221 python=3.12
conda activate cs221

This homework does not require any additional packages, so feel free to reuse the cs221 environment you installed earlier for hw1 and hw2.


In this assignment, you will get some hands-on experience with logic. You'll see how logic can be used to represent the meaning of natural language sentences, and how it can be used to solve puzzles and prove theorems. Most of this assignment will be translating English into logical formulas, but in Problem 4, we will delve into the mechanics of logical inference. Finally, for the ethics question you will revisit the ethical frameworks and concepts you learned about this quarter through once again writing negative impact statements.

NOTE: There are mandatory written questions after the extra credit problem 5 - Problem 6 and 7. Do not forget to answer these questions! Also for this assignment only, there are no hidden test cases for the programming questions -- if you pass all the visible test cases, then you will get full credit!

We've created a LaTeX template here for you to use that contains the prompts for each question.

To get started, launch a Python shell and try typing the following commands to add logical expressions into the knowledge base.

from logic import *
Rain = Atom('Rain')           # Shortcut
Wet = Atom('Wet')             # Shortcut
kb = createResolutionKB()     # Create the knowledge base
kb.ask(Wet)                   # Prints "I don't know."
kb.ask(Not(Wet))              # Prints "I don't know."
kb.tell(Implies(Rain, Wet))   # Prints "I learned something."
kb.ask(Wet)                   # Prints "I don't know."
kb.tell(Rain)                 # Prints "I learned something."
kb.tell(Wet)                  # Prints "I already knew that."
kb.ask(Wet)                   # Prints "Yes."
kb.ask(Not(Wet))              # Prints "No."
kb.tell(Not(Wet))             # Prints "I don't buy that."
To print out the contents of the knowledge base, you can call kb.dump(). For the example above, you get:
==== Knowledge base [3 derivations] ===
* Or(Not(Rain),Wet)
* Rain
- Wet
In the output, '*' means the fact was explicitly added by the user, and '-' means that it was inferred. Here is a table that describes how logical formulas are represented in code. Use it as a reference guide:
Name Mathematical notation Code
Constant symbol $\text{stanford}$ Constant('stanford') (must be lowercase)
Variable symbol $x$ Variable('$x') (must be lowercase)
Atomic formula (atom) $\text{Rain}$

$\text{LocatedIn}(\text{stanford}, x)$
Atom('Rain') (predicate must start with uppercase)

Atom('LocatedIn', 'stanford', '$x') (arguments are symbols)
Negation $\neg \text{Rain}$ Not(Atom('Rain'))
Conjunction $\text{Rain} \wedge \text{Snow}$ And(Atom('Rain'), Atom('Snow'))
Disjunction $\text{Rain} \vee \text{Snow}$ Or(Atom('Rain'), Atom('Snow'))
Implication $\text{Rain} \to \text{Wet}$ Implies(Atom('Rain'), Atom('Wet'))
Equivalence $\text{Rain} \leftrightarrow \text{Wet}$ (syntactic sugar for $\text{Rain} \to \text{Wet} \wedge \text{Wet} \to \text{Rain}$) Equiv(Atom('Rain'), Atom('Wet'))
Existential quantification $\exists x . \text{LocatedIn}(\text{stanford}, x)$ Exists('$x', Atom('LocatedIn', 'stanford', '$x'))
Universal quantification $\forall x . \text{MadeOfAtoms}(x)$ Forall('$x', Atom('MadeOfAtoms', '$x'))

The operations And and Or only take two arguments. If we want to take a conjunction or disjunction of more than two, use AndList and OrList. For example: AndList([Atom('A'), Atom('B'), Atom('C')]) is equivalent to And(And(Atom('A'), Atom('B')), Atom('C')).

Problem 1: Propositional logic

Write a propositional logic formula for each of the following English sentences in the given function in submission.py. For example, if the sentence is "If it is raining, it is wet," then you would write Implies(Atom('Rain'), Atom('Wet')), which would be $\text{Rain} \to \text{Wet}$ in symbols (see examples.py). Note: Don't forget to return the constructed formula!

  1. "If it's summer and we're in California, then it doesn't rain."
  2. "It's wet if and only if it is raining or the sprinklers are on."
  3. "Either it's day or night (but not both)."
You can run the following command to test each formula:
    python grader.py 1a
    
If your formula is wrong, then the grader will provide a counterexample, which is a model that your formula and the correct formula don't agree on. For example, if you accidentally wrote And(Atom('Rain'), Atom('Wet')) for "If it is raining, it is wet,", then the grader would output the following:
    Your formula (And(Rain,Wet)) says the following model is FALSE, but it should be TRUE:
    * Rain = False
    * Wet = True
    * (other atoms if any) = False
    

In this model, it's not raining and it is wet, which satisfies the correct formula $\text{Rain} \to \text{Wet}$ (TRUE), but does not satisfy the incorrect formula $\text{Rain} \wedge \text{Wet}$ (FALSE). Use these counterexamples to guide you in the rest of the assignment.

Problem 2: First-order logic

Write a first-order logic formula for each of the following English sentences in the given function in submission.py. For example, if the sentence is "There is a light that shines," then you would write Exists('$x', And(Atom('Light', '$x'), Atom('Shines', '$x'))), which would be $\exists x . \text{Light}(x) \wedge \text{Shines}(x)$ in symbols (see examples.py).

Tips:

  1. "Every person has a parent."
  2. Note: You do NOT have to enforce that the parent is a "person".

  3. "At least one person has no children."
  4. Note: You do NOT have to enforce that the child is a "person".

  5. Create a formula which defines Father(x,y) in terms of Male(x) and Parent(x,y).
  6. Create a formula which defines Granddaughter(x,y) in terms of Female(x) and Child(x,y).
  7. Note: It is ok for a person to be their own child.

Problem 3: Liar puzzle

Someone crashed the server, and accusations are flying. For this problem, we will encode the evidence in first-order logic formulas to find out who crashed the server. You've narrowed it down to four suspects: John, Susan, Mark, and Nicole. You have the following information:
  1. Mark says: "It wasn't me!"
  2. John says: "It was Nicole!"
  3. Nicole says: "It wasn't me, it was Susan!"
  4. Susan says: "Nicole's a liar."
  5. You know that exactly one person is telling the truth.
  6. You also know exactly one person crashed the server.
  1. Fill out liar() to return a list of 6 formulas, one for each of the above facts.
    The grader is set up such that you could run individual parts 3a-0 to 3a-5 to debug each formula only if you implement them in the order specified.
You can test your code using the following commands:
    python grader.py 3a-0
    python grader.py 3a-1
    python grader.py 3a-2
    python grader.py 3a-3
    python grader.py 3a-4
    python grader.py 3a-5
    python grader.py 3a-all  # Tests the conjunction of all the formulas
    
To solve the puzzle and find the answer, tell the formulas to the knowledge base and ask the query CrashedServer('$x'), by running:
    python grader.py 3a-run
    

Problem 4: Odd and even integers

In this problem, we will see how to use logic to automatically prove mathematical theorems. We will focus on encoding the theorem and leave the proving part to the logical inference algorithm. Here is the theorem:

If the following constraints hold:
  1. Each number $x$ has exactly one successor, which is not equal to $x$.
  2. Each number is either odd or even, but not both.
  3. The successor of an even number is odd.
  4. The successor of an odd number is even.
  5. For every number $x$, the successor of $x$ is larger than $x$.
  6. Larger is a transitive property: if $x$ is larger than $y$ and $y$ is larger than $z$, then $x$ is larger than $z$.
Then we have the following consequence:

Note: in this problem, "larger than" is just an arbitrary relation, and you should not assume it has any prior meaning. In other words, don't assume things like "a number can't be larger than itself" unless explicitly stated.

  1. Fill out ints() to construct 6 formulas for each of the constraints. The consequence has been filled out for you (query in the code).
    The grader is set up such that you could run individual parts 4a-0 to 4a-5 to debug each formula only if you implement them in the order specified. You can test your code using the following commands:
        python grader.py 4a-0
        python grader.py 4a-1
        python grader.py 4a-2
        python grader.py 4a-3
        python grader.py 4a-4
        python grader.py 4a-5
        python grader.py 4a-all  # Tests the conjunction of all the formulas
        
    To finally prove the theorem, tell the formulas to the knowledge base and ask the query by running model checking (on a finite model):
        python grader.py 4a-run
        

Problem 5: Semantic parsing (extra credit)

Semantic parsing is the task of converting natural lanugage utterances into first-order logic formulas. We have created a small set of grammar rules in the code for you in createBaseEnglishGrammar(). In this problem, you will add additional grammar rules to handle a wider variety of sentences. Specifically, create a GrammarRule for each of the following sentence structures. Note that CAs will not be answering specific questions about extra credit; this part is on your own!

  1. Example: Every person likes some cat. General template:
    $Clause ← every $Noun $Verb some $Noun
  2. Example: There is some cat that every person likes. General template:
    $Clause ← there is some $Noun that every $Noun $Verb
  3. Example: If a person likes a cat then the former feeds the latter. General template:
    $Clause ← if a $Noun $Verb a $Noun then the former $Verb the latter
After implementing these functions, you should be able to try some simple queries using nli.py! For example:
    $ python nli.py
    
    > Every person likes some cat.
    
    >>>>> I learned something.
    ------------------------------
    > Every cat is a mammal.
    
    >>>>> I learned something.
    ------------------------------
    > Every person likes some mammal?
    
    >>>>> Yes.
    

Problem 6: Explainability of Logic-Based Systems

The notion that AI systems should be explainable and their decisions interpretable has gained traction. For instance, GDPR’s Article 15 requires that individuals be provided “meaningful information” about the logic of automated decisions. Independently of legal considerations, explainable AI protects the interests of both end users (the people employing AI to make or inform decisions) and to stakeholders (individuals affected by automated decisions). These are some of the considerations cited in the literature:

  1. Respect: when decisions that affect you are made intelligible to you, this is considered an expression of respect for your status as a human being whose interests are worthy of consideration.
  2. Assessing fairness of rules: knowing what rules are applied to reach a decision, we have the ability to evaluate whether the rules are fair and whether they are fairly applied to us.
  3. Contesting and correcting decisions: understanding a decision we disagree with, allows us to contest it.
  4. Ability to change user behavior: understanding a decision that has a negative impact on us allows us to adapt our behavior in order to obtain different results in the future.

Explanability has many different meanings (see this paper by Andrew Selbst and Solon Barocas, this paper by Finale Doshi Velez and Been Kim, and the AI Explainability and Interpretability module). Intuitively, logic-based systems should have more explanability power than machine learning-based “black-box” systems. In this problem, let us explore what kind of explanability logic-based systems might provide and which human interests are protected.

Consider the Liar’s Puzzle from Problem 3. Here, the first-order logic system takes in a set of facts and reasons over them to produce a conclusion. In this case, the conclusion judges the truthfulness of each person, which could carry with it real-world consequences. For example, someone might be fired if they are found guilty of crashing the server, whether they did it or not (AI is used widely for the allocation of healthcare resources, credit scoring, insurance, and content moderation, among other consequential domains).

  1. In the context of a general first-order logic system, like the one used in Problem 3’s Liar Puzzle, what does the system return (besides the final answer) that could be considered an explanation?
    1-2 sentences answering the question.
  2. Why might this explanation not be adequate?

    Hint: think about model misspecification or framing of the question posed to the logic-based system.

    1-2 sentences answering the question.
  3. Now, reflect on the four explainability considerations defined in this problem. Are any of these met by the explanations produced by a logic-based system?
    2-4 sentences answering the question.

Problem 7: Applications of Soundness and Completeness

Two important properties in logic-based systems are soundness and completeness. A system is sound if everything that is provable (derivable) is in fact true whereas a system is complete if everything that is true has a proof (derivation).

  1. GPT-3.5 is a generative LLM pre-trained on a large corpus of diverse texts from the internet. The model can be fine-tuned to perform well on a wide array of downstream tasks. However, LLMs like GPT-3.5 are prone to hallucinations - or responses that are presented as fact but actually include false or misleading information [1]. Consider anything that’s generated by the LLM to be what it derives. Applying the logic definitions of soundness and completeness, explain whether GPT-3.5 is sound and/or complete. Explain your reasoning by drawing on examples and the definitions above.
    2-3 sentences for each definition of soundness and completeness (4-6 sentences total). You may use either natural language or mathematical notation in your answer.
  2. Imagine you are choosing between two proof-based systems. System A is sound but not complete. System B is complete but not sound. Assume the only difference between System A and System B is the difference between soundness and completeness.

  3. You want to deploy one of these systems in a safety-critical setting, where acting on a false conclusion has severe consequences. Which system would you choose and why?
    2-3 sentences explaining your choice, tying in the definitions of soundness and completeness. You may use either natural language or mathematical notation in your answer.
  4. Imagine you are in a setting where you have a separate verifier system that can validate hypotheses (e.g., a human in the loop that inspects the hypotheses). You want to capture as much true information as possible in this setting. Which system would you choose and why?
    2-3 sentences explaining your choice, tying in the definitions of soundness and completeness. You may use either natural language or mathematical notation in your answer.
Submission

Submission is done on Gradescope.

Written: When submitting the written parts, make sure to select all the pages that contain part of your answer for that problem, or else you will not get credit. To double check after submission, you can click on each problem link on the right side, and it should show the pages that are selected for that problem.

Programming: After you submit, the autograder will take a few minutes to run. Check back after it runs to make sure that your submission succeeded. If your autograder crashes, you will receive a 0 on the programming part of the assignment. Note: the only file to be submitted to Gradescope is submission.py.

More details can be found in the Submission section on the course website.



[1] https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)/

[2] What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus https://arxiv.org/abs/2105.02732