CPPGM Programming Assignment 4 (macro)
Write a C++ application called
macro that accepts a C++ Source File on standard input and executes phases 1 through 6 and the tokenization part of phase 7.
The input file may contain:
The input file does NOT include any...
- preprocessing directives other than
- pre-defined macro names
- the pragma operator
Output the resulting tokens as per PA2
You should complete Programming Assignment 2 before starting this assignment.
The starter kit can be obtained from:
$ git clone git://git.cppgm.org/pa4.git
It contains a stub implementation of
macro, a compiled reference implementation and a test suite.
You will also want to reuse most of your code from PA2.
The C++ Source File shall be read from standard input
std::cin in UTF-8 format as per PA1
The tokens shall be output as per PA2
If an error occurs
return EXIT_FAILURE as per the behaviour of
pptoken in PA1.
For an input of:
#define a 3 #define f(x) x x f(a)
Produces an output of:
literal 3 int 03000000 literal 3 int 03000000 eof
As per PA1
Once you have executed phase 1-3 as per PA1
pptoken to produce a sequence of
preprocessing-tokens, you will divide them into preprocessing directives and
Preprocessing directives can be identified by their prefix, one of:
start-of-file # start-of-file whitespace-sequence # new-line # new-line whitespace-sequence #
start-of-file denotes the beggining of the token sequence.
Any of these patterns must start a preprocessing directive. The preprocessing directive is terminated by the next
The only preprocessing directives for PA4 will be macro defines and undefs:
// object-like macros # define identifier new-line # define identifier must-whitespace replacement-list new-line // function-like macros # define identifier no-whitespace ( ) replacement-list new-line # define identifier no-whitespace ( identifier-list ) replacement-list new-line # define identifier no-whitespace ( ... ) replacement-list new-line # define identifier no-whitespace ( identifier-list , ... ) replacement-list new-line // undefine macro # undef identifier new-line
no-whitespace means that there is no
whitespace-sequence between the two adjacent tokens.
must-whitespace means there must be a
whitespace-sequence between the two adjacent tokens. Otherwise
whitespace-sequence may optionally appear between any tokens.
replacement-list is a possibly empty sequence of any
preprocessing-tokens apart from
identifier-list is a comma-separated sequence of
#define directives define macros,
#undef directives undefine them.
text-sequence is a maximal sequence of tokens that is not part of a preprocessing directive.
text-sequence must be macro replaced using the currently defined set of macros at that point in the file. Once this has been done the macro replaced
preprocessing-tokens should be processed by phase 5-7 as per PA2
posttoken and output.
The rules for macro replacement are described in section 16.3 Macro Replacement of the standard.
If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
We shall course-define two invocations X and Y to be nested with respect to each other if the head macro name
identifier token of Y was part of the replaced tokens of the X invocation. This nestedness relationship shall not survive parameter substitution:
#define f(x) 1 g(x) #define g(x) 2 f(x) f(f(x))
The trace of the above is:
f(f(x)) f(1 g(x)) f(1 2 f(x)) 1 g(1 2 f(x)) 1 2 f(1 2 f(x))
However a macro name token that is not replaced due to nesting is never replaced again, even after substitution as:
#define z z #define f(x) 1 x f(z)
The trace is:
f(z) f(z) 1 z
z remains unavailable for invocation even after it is substituted in a parameter.
Finally we show:
#define f(x) 1 x #define g(x) 2 x g(f)(g)(3)
The trace is:
g(f)(g)(3) 2 f(g)(3) <-- f is nested with respect to g as the token is part of the previous g invocation 2 1 g(3) <-- g is nested with respect to f, therefore it is not replaced (g -> f -> g)
Note that this course-defined interpretation of the nesting semantics is not necessarily the same one used by all other preprocessors (in particular gcc).
Design Notes (optional)
As usual these design notes only describe one possible way to do this assignment, and are entirely optional.
First of all, it is recommended that you read 16.3 a couple of times before you continue reading these notes, and certainly before you start writing code. Some find the logic of macro replacement quite challenging to implement, so be prepared for this.
You can start by making a
PreprocessingToken class that packs up the output of
Perform a simple top-down predictive parse to get a stream of preprocessing directives and
Keep a table of currently defined macros.
When you encounter a
#define add the macro to the table. If it currently exists check to see that the
replacement-list is the same as the currently defined one. It is an error if they do not match.
When you encounter an
#undef remove the macro from the table.
When you encounter a
text-sequence you need to process any macro invocations it contains.
As a first step note that
new-lines are not significant once a
text-sequence has been identified, so you can collapse
whitespace-sequence at this point. After that you will also want to collapse each resulting contiguous sequences of
whitespace-sequences into a singular
As you find macro invocations you will invoke them and replace the invocation with a new sequence of tokens. These new tokens have to be rescanned.
Consider the following:
#define f(x) 1 x ( #define g(x) 2 x f(g)b)
text-sequence is as follows:
After the first invocation
f(g) is replaced by
1 g ( resulting in:
1 g (b)
This must be rescanned, and a new invocation is found. That of
g (b), which is replaced by
1 2 b
Given the nature of that operation, a good data structure to use is a stack, where the top of the stack is the start of the text sequence (ie token order is in reverse). That way when an invocation is replaced, its tokens can be popped off the top of the stack and the replacement list can be pushed onto it, and processing can continue from the top of the stack again.
After you identify a function-like macro invocation and its arguments, there are three possible transformations you may need to perform on each argument before they are used:
- the argument is used as is
- the argument is macro replaced before being used
- the argument is stringized
Which of the three options is applied depends on the usage of the parameter within the
replacement-list of the macro. If it follows a
#, the argument is stringized. If it is next to a
## the argument is used as is. If neither, than it is macro replaced before being used.
The rules for nesting and recursion are difficult to understand, and even more difficult to implement.
We suggest keeping with each identifier token, a blacklist of nested macro names and a noninvokable flag. When an identifier token is produced as part of a macro invocation its blacklist is assigned to be that of the head token of the macro that was just invoked, and also the name of the macro that was used is added.
Once you have identified a new potential macro invocation, check to see if the heads name is contained in its own blacklist, if it is flag it as noninvokable. If the head token is flagged as noninvokable, abort the invocation.