projects
project #4 (due tuesday 4/29) -- prolog
NOTE: If you are registered for CSCI-4430, you may work in teams of 2 on this project;
if you are registered for CSCI-6969, complete this project individually.
Your Prolog file(s) must execute correctly using the SWI-Prolog environment.
Once you complete and test your Prolog file(s), tar and gzip them and submit via WebCT.
Your submitted file must be named <RCS_id>_proj4.tar.gz (e.g. goldsd3_proj4.tar.gz).
Be sure to put your name(s) as comments at the beginning of each Prolog file submitted.
Write the following relations in Prolog (see below for usage):
IMPORTANT: You are not allowed to use the variety of relations Prolog provides for you, including append, length, sort, msort, merge, etc.
Only use list-processing (dismantling) notation and other predicates and relations we've covered in class examples.
If naming conflicts occur, feel free to rename relations below (e.g. use mymerge instead of merge).
- Based on the count relation, write a countall relation that counts all occurrences of an atom in an arbitrarily deep list.
More specifically, define relation countall(Atom, List, N).
- Define max and min relations to determine the maximum and minimum numeric values on a list (ignoring non-numeric atoms).
More specifically, define relations max(List, Max) and min(List, Min).
Also provide a maxmin relation: maxmin(List, Max, Min).
- Define an ordered predicate to determine whether the given list is in numeric order (again ignoring non-numeric atoms).
More specifically, define predicate ordered(List).
- Define a sumlist relation to produce the sum of a given list of numbers (again ignoring non-numeric atoms).
More specifically, define relation sumlist(List, Sum).
- Define a sublist predicate to determine whether a given list L1 is a subset of another list L2.
More specifically, define predicate sublist(L1, L2).
- Define a merge relation that merges two sorted numeric lists together in sorted order (discarding non-numeric atoms).
More specifically, define relation merge(L1, L2, MergedList).
- Define a mergesort relation that uses merge to sort two numeric lists together using the mergesort algorithm (again discarding non-numeric atoms).
More specifically, define relation mergesort(L1, SortedList).
ADDITIONAL CLARIFICATIONS:
--- If a list consists of more than 9 elements, Prolog will display
as [2, 2, 2, 3, 3, 4, 4, 4, 4|...]. This is acceptable output
for your project (e.g. see the mergesort example below).
--- For the sublist relation, the first parameter is considered a sublist
only if it appears in the exact order in the second parameter, which
should likely make implementation easier. So, list [1,2,3] is a
sublist of [1,2,1,3,1,2,3,3,2], but not a sublist of [2,1,2,4,3,3].
> In Project 4, do we have to come up with only one answer,
> or the first correct is allowed? What I mean is that the
> first answer may be correct, but if you don't use '.' and
> use ';' instead, you receive more answers as the recursion
> continues.
>
To match the expected output, your recursion must stop and
you must produce the correct result without expecting the
user to stop the recursion. So, please use cut ! appropriately.
> For the ordered operation (and other predicates), is it
> supposed to respond with No and Yes, or true and false?
>
Either one is fine -- i.e. whatever the Prolog environment
produces.
> Are we allowed to use "number(N)"? You never reviewed the
> method in class, yet a few functions asks for it. For example,
> in something like "merge([1, 2, 3], [1, [2], 3], L).", it
> should technically fail from what you've described, and the
> only way to do that is to use "number()", or other variants.
>
Yes, sorry for any confusion -- feel free to use the
very basic predicates (e.g. number and atom).
> I wasn't sure what you meant when you said in class that
> we were allowed to use flatten in the count all function.
> Are we allowed to use the flatten function built in prolog?
> Or did you mean the flatten that we did in class?
>
You are allowed to write your own flatten relation, but not
allowed to use the built-in one (so you could create a
myflatten relation and use that).
> For the mergesort, does the algorithm have to divide the
> list in the typical fashion
> ([a, b, c, d, e, f] --> [a, b, c] , [d, e, f])
> or can it divide it in a different fashion that still
> does an O(nlog(n))
> sort ([a, b, c, d, e, f] --> [a, c, e] , [b, d, f]) ?
>
This sounds okay -- as long as the only change to the
traditional mergesort algorithm is how the elements are split.
TEST CASES AND REQUIRED BEHAVIOR: The transcript below shows the required behavior of each of the above relations, as well as the test cases used to grade this project:
?- countall(a, [a,b,c], N).
N = 1
?- countall(a, [b,c,d], N).
N = 0
?- countall(a, [b,a,c,[d,c,b,a,a],a,d], N).
N = 4
?- countall(a, [a,b,[b,a,[a,b,a],a],c], N).
N = 5
?- max([5,7,9,4,5,2,8], Max).
Max = 9
?- max([4,a,8,3,c,7,2], Max).
Max = 8
?- min([4,2,1,2], Min).
Min = 1
?- min([2,3,2,4], Min).
Min = 2
?- maxmin([5,4,8,2,a,3,b,3,9,2,4], Max, Min).
Max = 9,
Min = 2
?- ordered([2,4,6,8]).
Yes
?- ordered([3,5,d,5,6]).
Yes
?- ordered([2,4,1,5,6]).
No
?- sumlist([1,5,2,4,2], Sum).
Sum = 14
?- sumlist([-2,0,2,a,1], Sum).
Sum = 1
?- sublist([5,1,5,4,2], [5,4]).
No
?- sublist([5,1,5,4,2], [6,5,5,1,5,4,2,3]).
Yes
?- sublist([], [a,b,c]).
Yes
?- sublist([a], [b,c,d]).
No
?- merge([3,3,4,6], [2,4,4,5,8], MergedList).
MergedList = [2, 3, 3, 4, 4, 4, 5, 6, 8]
?- merge([3,a,b,3,4], [a,4,4], MergedList).
MergedList = [3, 3, 4, 4, 4]
?- merge([], [a,4,5,8,b,9], MergedList).
MergedList = [4, 5, 8, 9]
?- mergesort([9,2,4,2,4,8,3,3,2,4,6,4], SortedList).
SortedList = [2, 2, 2, 3, 3, 4, 4, 4, 4|...]
?- mergesort([a,b,c], SortedList).
SortedList = []
?- mergesort([3,a,2,b,b,9,3,5,2,d], SortedList).
SortedList = [2, 2, 3, 3, 5, 9]
project #3 (due friday 4/18) -- scheme
NOTE: If you are registered for CSCI-4430, you may work in teams of 2 on this project;
if you are registered for CSCI-6969, complete this project individually.
Your Scheme functions must execute correctly using MIT/GNU Scheme (though I expect no problems if you use DrScheme instead).
Once you complete and test your Scheme file(s), tar and gzip them and submit via WebCT.
Your submitted file must be named <RCS_id>_proj3.tar.gz (e.g. goldsd3_proj3.tar.gz).
Be sure to put your name(s) as comments at the beginning of each Scheme file submitted.
Write the following functions in Scheme (be sure all functions react accordingly and do not produce errors with null or other invalid input):
- Write a Scheme function called palindrome? to determine whether a simple list is a palindrome or not.
Your function should return either #t or #f.
- Write a Scheme function called palindrome_all? to determine whether a nested list is a palindrome or not (i.e. the list and all sublists are palindromes, including the order).
Your function should return either #t or #f.
- Write a Scheme function called odd_count that takes a simple list of numbers and returns the count of the number of odd numbers in the list.
- Write a Scheme function called find_min_and_max that takes a simple list of numbers (e.g. (5 4 2 3 10 9)) as its input parameter and returns a new list with the minimum and maximum values, as in (2 10).
Note: You are not allowed to use min and max functions!
- Write a Scheme function called sort that takes a simple list of numbers (e.g. (5 4 2 3 10 9)) as its input parameter and returns a new list with the numbers sorted, as in (2 3 4 5 9 10).
- Write a Scheme function called sort_all that takes a nested list of numbers (e.g. (5 (4 2) 3 (10 9 7) 9)) as its input parameter and returns a new list with the numbers sorted in ascending order at each level.
Further, the first element of a sublist is used at the level above.
For the example list, the resulting list is ((2 4) 3 5 (7 9 10) 9).
TEST CASES AND REQUIRED BEHAVIOR: The transcript below shows the required behavior of each of the above functions, as well as the test cases used to grade this project:
> (palindrome? '())
#t
> (palindrome? '(a))
#t
> (palindrome? '(a b))
#f
> (palindrome? '(a b b a))
#t
> (palindrome? '(a b c b a))
#t
> (palindrome? '(m a d a m i m a d a m))
#t
> (palindrome_all? '())
#t
> (palindrome_all? '(a))
#t
> (palindrome_all? '(a b))
#f
> (palindrome_all? '(a b b a))
#t
> (palindrome_all? '(a b c b a))
#t
> (palindrome_all? '(a (b c) d e d (c b) a))
#t
> (palindrome_all? '(a (b c) d e d c b a))
#f
> (palindrome_all? '(a (b (c d)) d e d ((d c) b) a))
#t
> (odd_count '())
0
> (odd_count '(10 20 30 40 50))
0
> (odd_count '(11 21 31 41 51))
5
> (odd_count '(10 11 12 13 14 17 20))
3
> (find_min_and_max '())
(0 0)
> (find_min_and_max '(5 4 2 3 10 9))
(2 10)
> (find_min_and_max '(2 2 2 2 2))
(2 2)
> (find_min_and_max '(3))
(3 3)
> (sort '(5 4 2 3 10 9))
(2 3 4 5 9 10)
> (sort '(1))
(1)
> (sort '())
()
> (sort '(10 9 8 7 6 5 4))
(4 5 6 7 8 9 10)
> (sort '(3 4 5 6))
(3 4 5 6)
> (sort_all '(5 4 2 3 10 9))
(2 3 4 5 9 10)
> (sort_all '(1))
(1)
> (sort_all '())
()
> (sort_all '(10 9 8 7 6 5 4))
(4 5 6 7 8 9 10)
> (sort_all '(3 4 5 6))
(3 4 5 6)
> (sort_all '(5 (4 2) 3 (10 9 7) 9))
((2 4) 3 5 (7 9 10) 9)
> (sort_all '(17 (7 9 (3 8 1) 12) 19 (4 6)))
(((1 3 8) 7 9 12) (4 6) 17 19)
optional project #2.5 (due friday 4/4) -- python
NOTE: Regardless of whether you are registered for CSCI-4430 or CSCI-6969, complete this project individually or in a team of 2.
This project is optional, but, if submitted, will be weighted equally with the other four projects.
See course Web site for grading options.
Your Python program must execute successfully under UNIX. Once completed and tested, tar and gzip your files (including a README) and submit via Blackboard. Your submitted file must be named <RCS_id>_proj2_5.tar.gz (e.g. goldsd3_proj2_5.tar.gz).
PART I: Using Python and urllib, create an interactive program that asks the user a series of questions.
Based on the user's answers, information is obtained from various Web pages through a technique called screen-scraping in which results are obtained by removing HTML tags and programmatically identifying information on existing Web pages.
Your text-based program must support at least two of the following commands: weather, lookup, search, and translate. More specifically, commands correspond to providing a weather report given locale information (i.e. city, state, country, zip code, etc.), dictionary definitions given English words and phrases, spoken/written language translation, and generic Yahoo search results (or try using Google).
Below is sample output, with user input shown in bold.
Your program input and output must be as similar as possible, dependent only on the variations in sourced Web pages.
bash$ python proj2_5_1.py
Welcome to the new (text-based) Web.
==> weather 12203
Albany, NY (12203)
temperature: 32.0 F (feels like 22.0 F)
tomorrow's forecast: 47.0 F (Sunny)
==> weather hartford, ct
Hartford, CT
temperature: 41.0 F (feels like 36.0 F)
tomorrow's forecast: 49.0 F (Sunny)
==> lookup python
python -- any of several Old World boa constrictors of the subfamily
Pythoninae, often growing to a length of more than 20 ft. (6 m): the
Indian python, Python molurus, is endangered.
==> translate money to german
English: money
German: Geld
==> search python
Keyword(s): python
Hits: 167,000,000
Top 5 URLs:
1. www.python.org
2. en.wikipedia.org/wiki/Python_programming_language
3. www.python.org/download
4. en.wikipedia.org/wiki/Python
5. opensource.nokia.com/projects/pythonfors60
==> quit
PART II: Using Python, develop a Sudoku puzzle solver that loads a Sudoku puzzle from a file.
Provide an interactive text-based interface showing the original puzzle.
Then show each iteration of the Sudoku solver (i.e. show the puzzle after each new symbol is placed).
Stop when the solution is obtained.
RULES: A Sudoku puzzle is a 9x9 grid with nine 3x3 subgrids.
Each row of a Sudoku puzzle has a symbol from a set of nine distinct symbols (e.g. digits 1-9) exactly once.
Likewise, each column has each symbol exactly once.
Further, each 3x3 subgrid within the 9x9 grid has each symbol exactly once.
Your Sudoku solver must find a solution or state that a solution cannot be found.
In other words, no infinite loops or infinite executions; your program must halt.
Further, backtracking is not allowed! Your program must only use logic rules and must not make any mistakes.
Your program should accept Sudoku puzzles in text format using a period to indicate a blank space.
For example:
.45....7.
8...296.3
.6.15...8
.2.6.59..
.57...84.
..37.4.1.
9...38.6.
7.254...1
.3....28.
or:
.DE....G.
H...BIF.C
.F.AE...H
.B.F.EI..
.EG...HD.
..CG.D.A.
I...CH.F.
G.BED...A
.C....BH.
Your program should validate the given input, ensuring the size of the puzzle is appropriate and contains a unique set of symbols.
Further, your program should identify the set of nine symbols for the given puzzle, using only those symbols throughout the solving process.
Test your program with the following Sudoku puzzles (puzzles 1-6 based on puzzles from the Albany Times Union, puzzles 7-8 based on puzzles from the Polytechnic):
- sudoku-01.txt
- sudoku-02.txt
- sudoku-03.txt
- sudoku-04.txt
- sudoku-05.txt
- sudoku-06.txt
- sudoku-07.txt
- sudoku-08.txt
CLARIFICATIONS: Please read below for answers to questions and other clarifications about Project #2.5.
(1) For full credit, your program must be able to solve
all eight example Sudoku puzzles without guessing
or backtracking.
(2)
project #2 (due tuesday 3/4) -- parse trees and operational semantics
NOTE: If you are registered for CSCI-4430, you may work in teams of 2 or 3 on this project;
if you are registered for CSCI-6969, complete this project individually or in a team of 2 and perform additional work described below.
Your C program must compile using gcc on UNIX. Once completed and tested, tar and gzip your files (including a README) and submit via WebCT. Your submitted file must be named <RCS_id>_proj2.tar.gz (e.g. goldsd3_proj2.tar.gz).
Please be sure to use the following naming convention for your C source file that contains your main() function:
<RCS_id>_proj2.c (e.g. goldsd3_proj2.c)
In addition, your program should process the first command-line argument (i.e. argv[1]) as the input program filename.
PART I: Using C (not C++) and the standard C library only, write a recursive-descent parser for the grammar below.
The required output (to stdout) is a trace, as shown in the lecture notes and the text.
Further, your program should generate a parse tree, which is used in Part II.
<program> --> begin <stmts> end
<stmts> --> <stmt> | <stmt> <stmts>
<stmt> --> <declare> ; | <assign> ;
<declare> --> <type> identifier
<type> --> integer
<assign> --> identifier := <expr>
| identifier , <assign>
<expr> --> <expr> + <term>
| <expr> - <term>
| <term>
<term> --> <term> * <factor>
| <term> / <factor>
| <factor>
<factor> --> ( <expr> )
| integer-literal
| identifier
The given grammar contains direct left recursion and does not pass the pairwise disjointness test.
Rewrite the grammar before implementation.
Feel free to use EBNF.
An identifier begins with a lowercase letter, which is optionally followed by lowercase letters and digits. The maximum length of an identifier is 32 characters (you can assume all variables are of valid length).
An integer-literal optionally begins with a negative sign (-), followed by a series of one or more digits.
Aside from 0, no other integer-literal begins with a 0.
Further, -0 is not valid.
For CSCI-6969, add the following rules to the above grammar (i.e. adding support for floating-point numbers):
<type> --> real
<factor> --> decimal-literal
A decimal-literal follows the same rules as an integer-literal; further, a decimal-literal may contain a single decimal point (.) and digits after the decimal point.
As reviewed in class, the following cases are considered invalid: "." and "-." (i.e. there must be at least one digit either before or after the decimal point).
PART II: Again using C and the standard C library only, use your parse tree from Part I to generate intermediate code based on the following operational semantics description, a subset of C:
int identifier;
identifier = identifier;
identifier = integer-literal;
identifier += identifier;
identifier += integer-literal;
identifier -= identifier;
identifier -= integer-literal;
identifier *= identifier;
identifier *= integer-literal;
identifier /= identifier;
identifier /= integer-literal;
Further, curly brackets ({ and }) may be used to specify a block of code with new variable scope.
As an example, PROGRAM #1 (see below) should generate the following intermediate code, with internal nodes of associated parse trees represented by hidden variables _a#.
Though exact results may vary, for full credit, be sure to use only the operations described above; further, your generated code should be valid C and produce the same values in all specified variables (so add main() as necessary).
int k;
int m;
int p;
k = 495;
m = -11;
{
int _a1;
_a1 = k;
_a1 += m;
p = _a1;
}
p *= m;
Output the operational semantics code to an output file called proj2_output.c and create the file in the local working directory.
Feel free to add main(), etc. for testing your code using gcc.
For CSCI-6969, add the following statements to the operational semantics description above (i.e. again adding support for real (floating-point) numbers).
Be sure to produce an error if a type mismatch occurs.
double identifier;
identifier = decimal-literal;
identifier += decimal-literal;
identifier -= decimal-literal;
identifier *= decimal-literal;
identifier /= decimal-literal;
INPUT FILES: Your program (for both Part I and Part II) must correctly parse and process the following input files (or produce an error message and exit, if necessary):
PROGRAM #1
begin
integer k;
integer m;
integer p;
k := 495;
m := -11;
p := (k + m) * m;
end
PROGRAM #2
begin
integer a;
integer b;
integer c;
integer d;
integer e;
a, b := 1;
c := a + b;
d := b + c;
e := (d * (d + 1)) / 2;
end
PROGRAM #3
begin
integer x;
integer y;
integer z;
x, y, z := 10;
z := z * 6 - 45 / y;
y := 1600 / z;
x := y * 2 * z;
end
PROGRAM #4
begin
integer f;
integer a;
a := 7;
f := (a * (a-1) * (a-2)) / ((a-4) * (a-5));
end
PROGRAM #5 (CONTAINS ERROR)
begin
integer a;
integer b;
a := 1;
b := a * + 100;
end
PROGRAM #6 (CONTAINS ERROR)
begin
integer a;
integer b;
a, b := 19 * 86;
a, b := 27 * * 10;
end
PROGRAM #7 (CONTAINS ERROR)
begin
integer a := 14;
integer b := a * 100;
end
PROGRAM #8 (FOR CSCI-6969)
begin
integer a;
real b;
real c;
a := 2;
b := -40.15;
c := b * a + 12.5;
end
PROGRAM #9 (FOR CSCI-6969)
begin
real x;
real y;
real z;
x, y := 127.95;
z := 0.08;
y := y * z;
z := x + y;
end
PROGRAM #10 (FOR CSCI-6969 -- CONTAINS ERROR)
begin
real x;
real y;
real z;
x := 127.95 - * 10;
y := x * 3;
z := x + y;
end
CLARIFICATIONS: Please read below for answers to questions and other clarifications about Project #2.
(1) Regarding whether whitespace may be used as a token
delimiter: no --- please use basic lexical analysis,
as done in multiple examples and Project #1 to obtain
the necessary lexemes. So, the sentence "x:=50+y;"
will be lexically analyzed to produce lexemes "x",
":=", "50", "+", "y", and ";".
(2) One of you asked about sample PROGRAM #4, in particular,
about the (a-1) and other similar terms. The question
is: is the latter part "-1" (i.e. integer-literal) or a
minus operator followed by integer literal "1"?
Answer: sample PROGRAM #4 is correct according to the
given grammar. Whitespace is not part of the language
and is therefore ignored. Here's a hint on how to process
this correctly: always treat the '-' character as
MINUS_OPERATOR at the lexical analysis stage; further,
accept integer (and decimal) literals as positive
numbers only. At the parsing stage, if you encounter a
MINUS_OPERATOR as part of the "<expr> --> <expr> - <term>"
rule, you're all set; otherwise, if you're expecting an
integer (or decimal) literal and encounter a
MINUS_OPERATOR, combine the MINUS_OPERATOR with the
positive integer (or decimal) literal that follows
to form a negative literal.
(3) You do NOT need to keep track of variables (e.g. using a
symbol table), so detecting an undeclared variable
is NOT necessary. As such, replace sample PROGRAM #5 with
the following:
PROGRAM #5 (CONTAINS ERROR)
begin
integer a;
integer b;
a := 1;
b := a * + 100;
end
(4) A few of you have noted the given grammar, even after revision,
is still problematic due to the following rules:
<assign> --> identifier := <expr> | identifier := <assign>
The problem is that <expr> eventually leads to identifier, as
does <assign>, so a one-token lookahead is not sufficient. Rather than
have you implement a multiple-token lookahead mechanism, let's
simplify the project by introducing a comma operator.
More specifically, replace the above rules with:
<assign> --> identifier := <expr> | identifier , <assign>
NOTE: If you have already implemented Project #2 using the initial
grammar, you may disregard clarification (4) above.
In particular, you may keep the following rule:
<assign> --> identifier := <expr> | identifier := <assign>
Other approaches to handling this grammar are to use a
two-token lookahead or to further revise the grammar to work
with a recursive-descent parser.
(5) Regarding the operational semantics code, the specification
states, "Output the code to an output file." Please call the
output file proj2_output.c and create the file in the
local working directory.
(6) For handling errors, displaying a meaningful error message
and exiting the program is fine. For example, you could show
the line number and the text of the line in which the error
occurred.
(7) For CSCI-6969, sample PROGRAM #10 has an error that requires
the data type of x to be known. Because we have omitted the
need for a symbol table (see (3) above), please use the revised
sample PROGRAM #10 shown here:
PROGRAM #10 (FOR CSCI-6969 -- CONTAINS ERROR)
begin
real x;
real y;
real z;
x := 127.95 - * 10;
y := x * 3;
z := x + y;
end
> It was asked, though never answered, if it was okay to use the example
> lexical analysis code (lex, getChar, addChar, etc.) in the book and in
> the notes as a starting point toward performing lexical analysis on this
> project. Is it in fact alright to do so, or must I find an entirely
> different way to do this? Thank you.
>
(8) Definitely feel free to use the lexical analysis code from the book
and in-class examples. I'd suggest you use that code as a starting
point so you don't get bogged down with re-inventing the wheel to
do lexical analysis.
> Can you clarify what is meant exactly by outputting a 'trace' for
> the first part of project 2.
>
(9) For the "trace" output for Part I, refer to your text (section 4.4),
as well as the lecture notes with example recursive-descent parser
trace output.
> What should the parse tree output look like?
>
(10) You only need to generate recursive-descent parser output -- i.e. a
"trace" as described above. For the parse tree, use whatever
internal representation you like, as long as you obtain proper
operational semantics output for Part II.
> Since we no longer worry about data types or symbol table (thank yoU!),
> how do we grad students handle output of temporary variables? Do we
> use "double" or will this not be sufficient?
>
(11) Good question. You may assume that all temporary variables are of
type "double" -- though in reality, you'd use the actual data type
from a symbol table, etc. For full credit, simply use "double" for
temporary variables.
> Are identifiers allowed to have the same names as
> the keywords ("begin", "end", and "integer")?
>
(12) Either way is fine. I'd suggest not allowing this.
project #1 (due thursday 2/7) -- interpreting basic using c
NOTE: If you are registered for CSCI-4430, you may work in teams of 2 on this project;
if you are registered for CSCI-6969, complete this project individually and perform additional work described below.
Your C program must compile using gcc on UNIX. Once completed and tested, tar and gzip your files (including a README) and submit via WebCT. Your submitted file must be named <RCS_id>_proj1.tar.gz (e.g. goldsd3_proj1.tar.gz).
Please be sure to use the following naming convention for your C source files:
<RCS_id>_part1.c (e.g. goldsd3_part1.c)
<RCS_id>_part2.c (e.g. goldsd3_part2.c)
Also see below for clarifications and answers to questions.
PART I: Using C (not C++) and the standard C library only, build a lexical analyzer for BASIC.
Do not perform syntax analysis yet!
Start with the partial BASIC grammar given in Chapter 3 Notes - Part II.
Your program reads a BASIC program as input, identifies the lexemes, and produces a reformatted BASIC program as output.
The output is a prettified version of the input, delimiting all lexemes with a single space character. Resist the temptation to remove comments (i.e. REMark statements).
Your program should ensure line numbers are in the proper order, reordering as necessary. Thus each line should begin with an integer literal (a string of one or more digit characters). If line numbers are not in order, display a warning message for each error.
For example, given the following sample BASIC program:
10 REM *** THIS PROGRAM DISPLAYS VALUES OF VARIABLES
20 LET X = 47
30LETY=X
40 REM *** ALSO DISPLAY A STRING
50 LET Z$="THIS IS REALLY BASIC."
60 PRINT Z$
35PRINTX
37PRINTY
70END
Your program should display the following warning messages:
*** WARNING: LINE 35 IS NOT IN THE PROPER ORDER
*** WARNING: LINE 37 IS NOT IN THE PROPER ORDER
And your program should output the following prettified format:
10 REM *** THIS PROGRAM DISPLAYS VALUES OF VARIABLES
20 LET X = 47
30 LET Y = X
35 PRINT X
37 PRINT Y
40 REM *** ALSO DISPLAY A STRING
50 LET Z$ = "THIS IS REALLY BASIC."
60 PRINT Z$
70 END
Use command-line arguments to specify the filename of the input BASIC program and the filename of the output BASIC program.
If there is not exactly two command-line arguments, or the command-line arguments do not identify valid files, the program should display the appropriate error message(s) and exit:
ERROR: error message
USAGE: program-name <BASIC-input-filename> <BASIC-output-filename>
Hint: program-name is argv[0].
PART II: Again using C and the standard C library only, build a BASIC interpreter that reads prettified BASIC programs produced by PART I above.
Program output should be sent to stdout and stderr.
As with Part I, start with the partial BASIC grammar given in Chapter 3 Notes - Part II.
Parse each line of BASIC code and execute the code within your C program.
Ignore comments (i.e. REMarks).
Determine line numbers as you go (e.g. 10, 20, 30, 35, 37, etc.).
Valid line numbers are in the range 0-65535 (i.e. an unsigned short).
Display an error message if an invalid line number is detected.
Add support for variables (e.g. X, Y, PI, Z$, Y$), integer literals (e.g. 47, 5000), and string literals (e.g. "THIS IS REALLY BASIC.").
Perform rudimentary type-checking, displaying error messages and halting program execution as necessary.
Hint: After the line number, the first lexeme is always a keyword (e.g. PRINT, INPUT, LET, END, etc.).
Once you identify the keyword, you know how to parse the rest of the line.
Extend your program to support the GOTO statement to jump to the specified line number.
Extend your program to support the + operator to add multiple operands in an assignment statement.
The additional BNF is as follows:
<assignment> --> <integer-variable> = <expression>
| <string-variable> = "string-literal"
<expression> --> integer-literal
| <integer-variable>
| <expression> + <expression>
Extend your program to support an IF statement, as follows:
<if-stmt> --> IF <boolean-expression> THEN <stmt-not-if>
<boolean-expression> --> <operand> <operator> <operand>
<operand> --> integer-literal | <integer-variable>
<operator> --> = | < | >
For CSCI-6969, add support for GOSUB and RETURN statements, which provide a makeshift subroutine, as in the following example code:
...
40 LET X = 100
50 GOSUB 1000
60 PRINT X
...
1000 LET X = X + 50
1010 LET Y = 10
1020 LET X = X + Y
1030 RETURN
...
Note that this requires jumping forward to code your BASIC interpreter might not have read yet.
One way to handle this is to read in the entire BASIC program into various data structures such that all lines are available.
At a minimum (i.e. for full credit), your program must be able to execute the following sample BASIC programs (use these as test cases) with sample input and output shown:
10 REM ****** PROGRAM #1
15 REM *** THIS PROGRAM SUMS TWO INTEGERS
20 PRINT "ENTER FIRST NUMBER:"
25 INPUT X
30 PRINT "ENTER SECOND NUMBER:"
35 INPUT Y
40 LET S = X + Y
50 PRINT "THE SUM IS:"
60 PRINT S
70 END
------ SAMPLE OUTPUT:
ENTER FIRST NUMBER:
? 19
ENTER SECOND NUMBER:
? 27
THE SUM IS:
46
10 REM ****** PROGRAM #2
15 REM *** THIS PROGRAM COMPARES TWO INTEGERS
20 PRINT "ENTER FIRST NUMBER:"
25 INPUT X
30 PRINT "ENTER SECOND NUMBER:"
35 INPUT Y
40 IF X > Y THEN PRINT "THE FIRST NUMBER IS GREATER"
50 IF X < Y THEN PRINT "THE SECOND NUMBER IS GREATER"
60 IF X = Y THEN PRINT "BOTH NUMBERS ARE THE SAME"
70 END
------ SAMPLE OUTPUT:
ENTER FIRST NUMBER:
? 19
ENTER SECOND NUMBER:
? 27
THE SECOND NUMBER IS GREATER
10 REM ****** PROGRAM #3
15 REM *** THIS PROGRAM SUMS INTEGERS UNTIL
20 REM *** SENTINEL VALUE 999 IS ENTERED
30 LET S = 0
40 PRINT "ENTER A NUMBER (999 TO END):"
50 INPUT X
60 IF X = 999 THEN GOTO 100
70 LET S = S + X
80 GOTO 40
100 PRINT "YOUR TOTAL IS:"
110 PRINT S
120 END
------ SAMPLE OUTPUT:
ENTER A NUMBER (999 TO END):
? 19
ENTER A NUMBER (999 TO END):
? 27
ENTER A NUMBER (999 TO END):
? 15
ENTER A NUMBER (999 TO END):
? 999
YOUR TOTAL IS:
61
As with Part I, use command-line arguments to specify the filename of the input BASIC program.
Your program should display appropriate error message(s) if used incorrectly:
ERROR: error message
USAGE: program-name <BASIC-input-filename>
For CSCI-6969, add an optional flag that provides a "validate-only" mode in which the BASIC program is not executed, but it is checked for errors.
If your program runs into a syntax error (e.g. unknown keyword), display an error message and keep executing the next line, if possible.
In other words, recover gracefully.
For example, given the following BASIC program (another test case):
10 REM ****** PROGRAM #4
15 REM *** THIS PROGRAM HAS DELIBERATE SYNTAX ERRORS
20 LET Q = 47
30 PRINR "WELCOME TO MY PROGRAM"
40 IF Q > 40 THEN PRINT "GOING"
50 GOTO 25
60 PRINT "GOING"
70 LET R = 52
75 LETR = 5RS
80 IF R > Q THEN PRINT "GONE"
90 END
------ EXPECTED OUTPUT:
*** SYNTAX ERROR ON LINE 30 (SKIPPED)
GOING
*** SYNTAX ERROR ON LINE 50 (CANNOT FIND LINE 25)
GOING
*** SYNTAX ERROR ON LINE 75 (SKIPPED)
GONE
Further, if a variable is used without being defined, show the following warning message and use a default value of 0 or the empty string:
10 REM ****** PROGRAM #5
15 REM *** THIS PROGRAM HAS DELIBERATE LOGIC ERROR
30 PRINT X
40 PRINT Z$
50 IF Y < 20 THEN PRINT "WOW!"
60 END
------ EXPECTED OUTPUT:
*** WARNING: UNINITIALIZED VARIABLE ON LINE 30
0
*** WARNING: UNINITIALIZED VARIABLE ON LINE 40
*** WARNING: UNINITIALIZED VARIABLE ON LINE 50
WOW!
CLARIFICATIONS: Please read below for answers to questions and other clarifications about Project #1.
> ...can we process the whole program first, and then execute it?
> The project page states that we should "determine line numbers as you go",
> but for the extra parts for the 6000 level class it says that it may be
> helpful to parse the entire file first.
>
(2) Yes, you can process the entire program first, then execute it. In reality,
this is not the best idea, since long programs would require much initial processing;
further, such long programs would likely have sections of code that are rarely or
never used. Nonetheless, feel free to read the entire program into memory in
internal data structures (which means we're not exactly building an interpreter).
> We just need to deal with the keywords in your PPT and project statement.
> Do not need to consider other,e.g. AND,right?
>
(3) Yes, your program is only required to support the grammar in the lecture
notes --and-- the additional keywords and functionality described in the
project description. Features such as AND, OR, and string concatenation
are not required to implement.
> For the command line arguments, if "....arguments do not identify valid files",
> prompt error. what do you mean by "valid files"? with wrong postfix or ..?
>
(4) When processing command-line arguments, if the input file is not a valid
filename (e.g. file not found), display an error message and exit the program
(since you have no input file to process). In short, handle any filesystem
errors here.
> For the part 1, I do not think we need any technology,e.g. regular expression,
> to do lexical analysis. Basically, we just need to take care the keywords
> and remove redundant spaces?
>
(5) Correct -- though regular expressions would potentially make the task easier,
you certainly don't need to use them. And, to be clear, regular expressions are
not allowed -- stick to the standard C library.
> you did not give us a complete example of GOSUB..RETURN.
> Do we need to come up with our own one?
>
(6) Yes, I'd suggest writing your own sample BASIC program(s) using GOSUB and RETURN.
To ensure your interpreter's correctness, I'd suggest writing a few additional
BASIC programs (e.g. the latest BASIC programs assigned as homework).
More clarifications:
(1) You can assume the maximum number of BASIC lines is 100.
Be sure to define this as a constant.
(2) You can assume the maximum line length is 80 characters.
Be sure to define this as a constant.
(3) In Part I, in addition to low-level lexical analysis, identify the
first instruction of each line. For example,
input:
30LETY=X
output:
30 LET Y = X
(4) Further, in Part I, if the first lexeme after the line number is
not a valid instruction, your output should be as follows:
input:
30LERX=Y
40 50 LET Z = 10
60 * PRINT Z
70 BLAH*INPUT ZZ$
output:
30 LERX = Y
40 50 LET Z = 10
60 * PRINT Z
70 BLAH * INPUT ZZ$
In addition, you could display an error message, or even bail out with
this error (i.e. stop execution). Displaying an error is optional.
(5) In Part I, lexemes consist of punctuation and groups of letters
and digits.
input:
30LETX,Y
output:
30 LET X , Y
As there are numerous cases to consider, use your best judgement in
how to delimit the lexemes. In particular, review the following
examples:
input:
40 LET ABCDEF=12
50 LET AB$CDE$EF$="BLAH"
60 LET A1B1C1=18
70 LET ABCDEF$="BLAH BLAH"
output:
40 LET ABCDEF = 12
50 LET AB$ CDE$ EF$ = "BLAH"
60 LET A1B1C1 = 18
70 LET ABCDEF$ = "BLAH BLAH"
(6) In Part I, do not change text inside double-quotes:
input:
30LETZ$="HELLO HELLO HELLO!"
output:
30 LET Z$ = "HELLO HELLO HELLO!"
(7) In Part I, if the closing double-quote is missing, rely on the
newline character and output the line as is. The syntax error should
be identified in Part II.
input:
30 PRINT "HELLO HELLO!
output:
30 PRINT "HELLO HELLO!
(8) Part I and Part II must be two separate programs and two
separate executables. Nonetheless, you can share code
(e.g. functions) between Part I and Part II.
(9) WebCT assignment dropbox should be available Friday;
further, grading criteria should be established and available
on Friday. For grading, test with BASIC programs from specs
for Part I; your output from Part I tested through Part II;
if Part I does not work, we will have our own (correct) input
for Part II. Also additional input to Part II (e.g. error
checking) will be used.
(10) For REMarks in Part I, just echo the rest of the line
to Part II (similar to a double-quoted string).
input:
30REM**** BLAH BLAH BLAH BLAH
output:
30 REM **** BLAH BLAH BLAH BLAH
(11) Both Part I and Part II must support the BASIC extensions
described in Part II -- i.e. the GOTO, IF-THEN, and (for CSCI-6969)
GOSUB-RETURN instructions.
Additional clarifications:
Hello folks, a few more questions (and answers) for you regarding Project #1:
> how should we name our source files?
>
Please be sure to use the following naming convention for your C source files:
<RCS_id>_part1.c (e.g. goldsd3_part1.c)
<RCS_id>_part2.c (e.g. goldsd3_part2.c)
Also, name your gzipped tar file as follows:
<RCS_id>_proj1.tar.gz (e.g. goldsd3_proj1.tar.gz)
> Do we have to error check to make sure that the argument given is
> actually a BASIC program? Or can we assume that whatever input is given
> will have the proper structure (a bunch of lines seperated by \n)
> Also, can we assume that every line in the input will start with a line
> number? Basically, do we have to be able to deal with unnumbered lines?
>
Unnumbered lines should produce an error in both Part I and Part II.
Feel free to exit the program with an error message. In general, Part I
should pass just about anything through, whereas Part II is a little more
particular (as described in the project description and other Q&A).
> And how do we handle the case when 2 (or more) lines have the same number?
>
Good question. If a duplicate line number is detected in either Part I or
Part II, feel free to bail out with an error. This will not be one of the
test cases used for grading.
GRADING CRITERIA:
PART I [40 points]
-- Successfully prettifies sample BASIC programs [25 points]
-- Command-line error checking [5 points]
-- Errors detected
(e.g. no line number, unrecognized instruction) [10 points]
PART II [60 points]
-- Successfully "runs" sample BASIC programs [30 points]
-- Command-line error checking [5 points]
-- Errors detected
(e.g. unknown instruction, uninitialized variable, syntax errors) [25 points]
The sample BASIC programs are the very same ones on the Project #1
description Web page (i.e. above) -- no more, no less. All programs in
the project description will be fed as input into both Part I and Part II.