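Preface

This post covers the third assignment (PA3), which is about syntax analysis.

Syntax analysis checks whether the input conforms to the grammar of the language, much like checking whether a complete sentence has a subject, a verb and an object. In this assignment the parser also builds the abstract syntax tree (AST) of the program.

Background you will need: first, the flex and bison manuals (focus on chapter 3):

flex manual

bison manual

Then the COOL manual (focus on pages 16 and 17):

COOL manual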

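How to start

So how do you write a parser?

First, read page 16 of the COOL manual carefully: it gives the grammar of COOL. Then write the corresponding rules following the bison manual.

Focus on the first few productions:

program: a program consists of one or more classes, each followed by a semicolon
class: a class is defined as class TYPE, optionally followed by inherits TYPE; its body consists of zero or more features
feature: a feature is either a method, which takes a possibly empty list of formals (formal parameters) and has a return type and a body, or an attribute ID : TYPE with an optional initializer <- expr
formal: ID : TYPE

From this we can define class_list to represent [class;]+, so class_list is never empty.

feature_list represents [feature;]*, so it may be empty.

formal_list represents [formal [, formal]*]. The outer brackets make the whole list optional, so formal_list may be empty; only the inner part, formal [, formal]*, is guaranteed to contain at least one formal.

In addition there are two expression lists built in the same way: the comma-separated argument list [expr [, expr]*] used in dispatch, and the semicolon-terminated [expr;]+ used in blocks.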
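To make the difference between the terminator-based shapes concrete, here is a minimal C++ sketch (not part of the assignment code) of a recognizer for [x;]* and [x;]+. It assumes a hypothetical token stream of plain strings in which "x" stands for one complete item (a class, a feature or an expression); cool.y needs nothing like this, since the bison rules encode the same constraints.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token stream: "x" stands for one complete item
// (a class, a feature or an expression); ";" is the terminator.
using Tokens = std::vector<std::string>;

// [x;]*  (feature_list): zero or more items, each followed by ';'.
bool parseStarTerminated(const Tokens& t) {
    std::size_t i = 0;
    while (i < t.size()) {
        // every item must be an "x" immediately followed by ";"
        if (t[i] != "x" || i + 1 >= t.size() || t[i + 1] != ";") return false;
        i += 2;
    }
    return true;  // the empty input is accepted
}

// [x;]+  (class_list, expr_list2): same shape, but at least one item.
bool parsePlusTerminated(const Tokens& t) {
    return !t.empty() && parseStarTerminated(t);
}
```

The only difference between the two functions is whether the empty input is accepted, which is exactly the difference between feature_list and class_list (or expr_list2).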

bison code

Once the grammar has been worked out, you can start writing code.

The complete code is as follows:

/*
*  cool.y
*              Parser definition for the COOL language.
*
*/
%{
  #include <iostream>
  #include "cool-tree.h"
  #include "stringtab.h"
  #include "utilities.h"
  
  extern char *curr_filename;
  
  
  /* Locations */
  #define YYLTYPE int              /* the type of locations */
  #define cool_yylloc curr_lineno  /* use the curr_lineno from the lexer
                                      for the location of tokens */

  extern int node_lineno;          /* set before constructing a tree node
                                      to whatever you want the line number
                                      for the tree node to be */


  #define YYLLOC_DEFAULT(Current, Rhs, N)         \
    Current = Rhs[1];                             \
    node_lineno = Current;


  #define SET_NODELOC(Current)  \
    node_lineno = Current;

  /* IMPORTANT NOTE ON LINE NUMBERS
  *********************************
  * The above definitions and macros cause every terminal in your grammar to
  * have the line number supplied by the lexer. The only task you have to
  * implement for line numbers to work correctly, is to use SET_NODELOC()
  * before constructing any constructs from non-terminals in your grammar.
  * Example: Consider you are matching on the following very restrictive
  * (fictional) construct that matches a plus between two integer constants.
  * (SUCH A RULE SHOULD NOT BE PART OF YOUR PARSER):

  plus_consts  : INT_CONST '+' INT_CONST

  * where INT_CONST is a terminal for an integer constant. Now, a correct
  * action for this rule that attaches the correct line number to plus_const
  * would look like the following:

  plus_consts  : INT_CONST '+' INT_CONST
  {
    // Set the line number of the current non-terminal:
    // ***********************************************
    // You can access the line numbers of the i'th item with @i, just
    // like you access the value of the i'th expression with $i.
    //
    // Here, we choose the line number of the last INT_CONST (@3) as the
    // line number of the resulting expression (@$). You are free to pick
    // any reasonable line as the line number of non-terminals. If you
    // omit the statement @$=..., bison has default rules for deciding which
    // line number to use. Check the manual for details if you are interested.
    @$ = @3;


    // Observe that we call SET_NODELOC(@3); this will set the global variable
    // node_lineno to @3. Since the constructor call "plus" uses the value of
    // this global, the plus node will now have the correct line number.
    SET_NODELOC(@3);

    // construct the result node:
    $$ = plus(int_const($1), int_const($3));
  }

  */



  void yyerror(char *s);        /*  defined below; called for each parse error */
  extern int yylex();           /*  the entry point to the lexer  */

  /************************************************************************/
  /*                DONT CHANGE ANYTHING IN THIS SECTION                  */

  Program ast_root;             /* the result of the parse  */
  Classes parse_results;        /* for use in semantic analysis */
  int omerrs = 0;               /* number of errors in lexing and parsing */
%}
    
    /* A union of all the types that can be the result of parsing actions. */
    %union {
      Boolean boolean;
      Symbol symbol;
      Program program;
      Class_ class_;
      Classes classes;
      Feature feature;
      Features features;
      Formal formal;
      Formals formals;
      Case case_;
      Cases cases;
      Expression expression;
      Expressions expressions;
      char *error_msg;
    }
    
    /* 
    Declare the terminals; a few have types for associated lexemes.
    The token ERROR is never used in the parser; thus, it is a parse
    error when the lexer returns it.
    
    The integer following token declaration is the numeric constant used
    to represent that token internally.  Typically, Bison generates these
    on its own, but we give explicit numbers to prevent version parity
    problems (bison 1.25 and earlier start at 258, later versions -- at
    257)
    */
    %token CLASS 258 ELSE 259 FI 260 IF 261 IN 262 
    %token INHERITS 263 LET 264 LOOP 265 POOL 266 THEN 267 WHILE 268
    %token CASE 269 ESAC 270 OF 271 DARROW 272 NEW 273 ISVOID 274
    %token <symbol>  STR_CONST 275 INT_CONST 276 
    %token <boolean> BOOL_CONST 277
    %token <symbol>  TYPEID 278 OBJECTID 279 
    %token ASSIGN 280 NOT 281 LE 282 ERROR 283
    
    /*  DON'T CHANGE ANYTHING ABOVE THIS LINE, OR YOUR PARSER WONT WORK       */
    /**************************************************************************/
    
    /* Complete the nonterminal list below, giving a type for the semantic
    value of each non terminal. (See section 3.6 in the bison 
    documentation for details). */
    
    /* Declare types for the grammar's non-terminals. */
    %type <program> program
    %type <classes> class_list
    %type <class_> class
    
    /* You will want to change the following line. */
    %type <features> feature_list
    %type <feature> feature
    %type <formals> formal_list
    %type <formals> noempty_formal_list
    %type <formal> formal
    %type <expressions> expr_list1
    %type <expressions> noempty_expr_list1
    %type <expressions> expr_list2
    %type <expression> expr
    %type <expression> let
    %type <cases> case_list
    %type <case_> case
    
    /* Precedence declarations go here. */
    %right ASSIGN
    %left NOT
    %nonassoc LE '<' '='
    %left '+' '-'
    %left '*' '/'
    %left ISVOID
    %left '~'
    %left '@'
    %left '.'
    
    %%
    /* 
    Save the root of the abstract syntax tree in a global variable.
    */
    program  : class_list  { @$ = @1; ast_root = program($1); }
    ;
    
    class_list : class ';' {
        $$ = single_Classes($1);
        parse_results = $$;
    }
    | class ';' class_list {
        $$ = append_Classes(single_Classes($1), $3); 
        parse_results = $$;
    }
    ;

    class : CLASS TYPEID '{' feature_list '}' {
        /* If no parent is specified, the class inherits from the Object class.
           class_(Symbol name, Symbol parent, Features features, Symbol filename) */
        $$ = class_($2, idtable.add_string("Object"), $4, stringtable.add_string(curr_filename));
    }
    | CLASS TYPEID INHERITS TYPEID '{' feature_list '}' {
        $$ = class_($2, $4, $6, stringtable.add_string(curr_filename));
    }
    | error {}
    ;

    feature_list : {
        /* empty list */
        $$ = nil_Features();
    }
    | feature ';' feature_list {
        /* Features single_Features(Feature);
        Features append_Features(Features, Features); */
        $$ = append_Features(single_Features($1), $3);
    }
    ;

    feature : OBJECTID '(' formal_list ')' ':' TYPEID '{' expr '}' {
        /* method(Symbol name, Formals formals, Symbol return_type, Expression expr) */
        $$ = method($1, $3, $6, $8);
    }
    | OBJECTID ':' TYPEID ASSIGN expr {
        /* attr(Symbol name, Symbol type_decl, Expression init) */
        $$ = attr($1, $3, $5);
    }
    | OBJECTID ':' TYPEID {
        $$ = attr($1, $3, no_expr());
    }
    | error {}
    ;

    formal_list : {
        $$ = nil_Formals();
    }
    | noempty_formal_list {
        $$ = $1;
    }
    ;

    noempty_formal_list : formal {
        $$ = single_Formals($1);
    }
    | formal ',' noempty_formal_list {
        $$ = append_Formals(single_Formals($1), $3);
    }
    | error ',' {}
    ;

    formal : OBJECTID ':' TYPEID {
        /* formal(Symbol name, Symbol type_decl) */
        $$ = formal($1, $3);
    }
    | error {}
    ;

    expr_list1 : {
        $$ = nil_Expressions();
    }
    | noempty_expr_list1 {
        $$ = $1;
    }
    ;

    noempty_expr_list1 : expr {
        $$ = single_Expressions($1);
    }
    | expr ',' noempty_expr_list1 {
        $$ = append_Expressions(single_Expressions($1), $3);
    }
    | error ',' {}
    ;
    expr_list2 : expr ';' {
        $$ = single_Expressions($1);
    }
    | expr ';' expr_list2 {
        $$ = append_Expressions(single_Expressions($1), $3);
    }
    ;

    expr : OBJECTID ASSIGN expr {
        $$ = assign($1, $3);
    }
    | expr '.' OBJECTID '(' expr_list1 ')' {
        $$ = dispatch($1, $3, $5);
    }
    | expr '@' TYPEID '.' OBJECTID '(' expr_list1 ')' {
        $$ = static_dispatch($1, $3, $5, $7);
    }
    | OBJECTID '(' expr_list1 ')' {
        $$ = dispatch(object(idtable.add_string("self")), $1, $3);
    }
    | IF expr THEN expr ELSE expr FI {
        /* cond(Expression pred, Expression then_exp, Expression else_exp) */
        $$ = cond($2, $4, $6);
    }
    | WHILE expr LOOP expr POOL {
        /* loop(Expression pred, Expression body) */
        $$ = loop($2, $4);
    } 
    | WHILE expr LOOP error {}
    | '{' expr_list2 '}' {
        $$ = block($2);
    }
    | LET let {
        $$ = $2;
    }
    | CASE expr OF case_list ESAC {
        /* typcase(Expression expr, Cases cases) */
        $$ = typcase($2, $4);
    }
    | NEW TYPEID {
        $$ = new_($2);
    }
    | ISVOID expr {
        $$ = isvoid($2);
    }
    | expr '+' expr {
        $$ = plus($1, $3);
    }
    | expr '-' expr {
        $$ = sub($1, $3);
    }
    | expr '*' expr {
        $$ = mul($1, $3);
    }
    | expr '/' expr {
        $$ = divide($1, $3);
    }
    | '~' expr {
        $$ = neg($2);
    }
    | expr '<' expr {
        $$ = lt($1, $3);
    }
    | expr LE expr {
        $$ = leq($1, $3);
    }
    | expr '=' expr {
        $$ = eq($1, $3);
    }
    | NOT expr {
        $$ = comp($2);
    }
    | '(' expr ')' {
        $$ = $2;
    }
    | OBJECTID {
        $$ = object($1);
    }
    | INT_CONST {
        $$ = int_const($1);
    }
    | STR_CONST {
        $$ = string_const($1);
    }
    | BOOL_CONST {
        $$ = bool_const($1);
    }
    | error {}
    ;

    let : OBJECTID ':' TYPEID ASSIGN expr IN expr {
        $$ = let($1, $3, $5, $7);
    }
    | OBJECTID ':' TYPEID IN expr {
        $$ = let($1, $3, no_expr(), $5);
    }
    | OBJECTID ':' TYPEID ASSIGN expr ',' let {
        $$ = let($1, $3, $5, $7);
    }
    | OBJECTID ':' TYPEID ',' let {
        $$ = let($1, $3, no_expr(), $5);
    }
    | error {}
    ;

    /* case expr of [ID : TYPE => expr;]+ esac: at least one branch */
    case_list : case ';' {
        $$ = single_Cases($1);
    }
    | case ';' case_list {
        $$ = append_Cases(single_Cases($1), $3);
    }
    ;

    case : OBJECTID  ':' TYPEID DARROW expr {
        $$ = branch($1, $3, $5);
    }
    ;
    %%

    void yyerror(char *s)
    {
      extern int curr_lineno;
      
      cerr << "\"" << curr_filename << "\", line " << curr_lineno << ": " \
      << s << " at or near ";
      print_cool_token(yychar);
      cerr << endl;
      omerrs++;
      
      if(omerrs>50) {fprintf(stdout, "More than 50 errors\n"); exit(1);}
    }

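Below, each part of the code is pulled out and discussed separately.

Classes (class)

Here we define class and class_list; class_list represents a sequence of one or more classes: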

program  : class_list  { @$ = @1; ast_root = program($1); }
;

class_list : class ';' {
    $$ = single_Classes($1);
    parse_results = $$;
}
| class ';' class_list {
    $$ = append_Classes(single_Classes($1), $3); 
    parse_results = $$;
}
;

class : CLASS TYPEID '{' feature_list '}' {
    /* If no parent is specified, the class inherits from the Object class.
       class_(Symbol name, Symbol parent, Features features, Symbol filename) */
    $$ = class_($2, idtable.add_string("Object"), $4, stringtable.add_string(curr_filename));
}
| CLASS TYPEID INHERITS TYPEID '{' feature_list '}' {
    $$ = class_($2, $4, $6, stringtable.add_string(curr_filename));
}
| error {}
;

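Inheritance is optional, so when a class does not say which class it inherits from, Object is used as its parent.

Note that the functions used here, such as class_(), append_Classes() and single_Classes(), are declared in cool-tree.h and implemented in cool-tree.cc; the same holds for the functions used later.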
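The right-recursive class_list rule and the default parent can be mimicked in a small hand-written recursive-descent sketch. Everything here is hypothetical: ClassNode stands in for the node built by class_(), tokens are plain strings, and class bodies are assumed to be empty so that feature parsing can be left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the node built by class_(name, parent, ...).
struct ClassNode {
    std::string name;
    std::string parent;
};

using Tokens = std::vector<std::string>;

// class : CLASS TYPEID [INHERITS TYPEID] '{' '}'   (feature_list kept empty here)
// Returns false on a syntax error; pos is advanced past the class on success.
bool parseClass(const Tokens& t, std::size_t& pos, ClassNode& out) {
    auto expect = [&](const std::string& tok) {
        if (pos < t.size() && t[pos] == tok) { ++pos; return true; }
        return false;
    };
    if (!expect("class") || pos >= t.size()) return false;
    out.name = t[pos++];
    out.parent = "Object";  // default parent, like idtable.add_string("Object")
    if (expect("inherits")) {
        if (pos >= t.size()) return false;
        out.parent = t[pos++];
    }
    return expect("{") && expect("}");
}

// class_list : class ';' | class ';' class_list
// Parse one class, recurse on the rest, then prepend the head,
// mirroring append_Classes(single_Classes($1), $3).
bool parseClassList(const Tokens& t, std::size_t& pos, std::vector<ClassNode>& out) {
    ClassNode head;
    if (!parseClass(t, pos, head)) return false;
    if (pos >= t.size() || t[pos] != ";") return false;
    ++pos;
    std::vector<ClassNode> rest;
    // any remaining tokens must themselves form a class_list
    if (pos < t.size() && !parseClassList(t, pos, rest)) return false;
    out.clear();
    out.push_back(head);
    out.insert(out.end(), rest.begin(), rest.end());
    return true;
}
```

Because the head is prepended to the already-built tail, classes come out in source order, which is also what append_Classes(single_Classes($1), $3) achieves in the bison action.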

feature

Features describe the methods and attributes defined in a class.

feature_list : {
    /* empty list */
    $$ = nil_Features();
}
| feature ';' feature_list {
    /* Features single_Features(Feature);
       Features append_Features(Features, Features); */
    $$ = append_Features(single_Features($1), $3);
}
;

feature : OBJECTID '(' formal_list ')' ':' TYPEID '{' expr '}' {
    /* method(Symbol name, Formals formals, Symbol return_type, Expression expr) */
    $$ = method($1, $3, $6, $8);
}
| OBJECTID ':' TYPEID ASSIGN expr {
    /* attr(Symbol name, Symbol type_decl, Expression init) */
    $$ = attr($1, $3, $5);
}
| OBJECTID ':' TYPEID {
    $$ = attr($1, $3, no_expr());
}
| error {}
;

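A class may contain no methods or attributes at all; each feature is terminated by a semicolon.

A method can have formal parameters, and formal_list describes the parameter list.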
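Both feature forms start with an OBJECTID, and the token that follows decides which constructor is used. The sketch below, built around a hypothetical classifyFeature function over string tokens, only inspects that shape (formals and expressions are abbreviated); bison's LALR(1) parser makes the same decision through its parse table, so nothing like this is needed in cool.y.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Which AST constructor the feature rule would end up calling.
enum class FeatureKind { Method, AttrWithInit, AttrNoInit, Error };

// After the leading OBJECTID, one token of lookahead picks the production:
//   '(' -> method(...)   OBJECTID '(' formal_list ')' ':' TYPEID '{' expr '}'
//   ':' -> attr(...)     OBJECTID ':' TYPEID [ASSIGN expr]
FeatureKind classifyFeature(const Tokens& t) {
    if (t.size() < 2) return FeatureKind::Error;
    if (t[1] == "(") return FeatureKind::Method;
    if (t[1] == ":") {
        // attr(name, type, no_expr())
        if (t.size() == 3) return FeatureKind::AttrNoInit;
        // attr(name, type, init): "<-" must be followed by an expression
        if (t.size() > 4 && t[3] == "<-") return FeatureKind::AttrWithInit;
    }
    return FeatureKind::Error;
}
```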

Formal parameters (formal)

formal_list : {
    $$ = nil_Formals();
}
| noempty_formal_list {
    $$ = $1;
}
;

noempty_formal_list : formal {
    $$ = single_Formals($1);
}
| formal ',' noempty_formal_list {
    $$ = append_Formals(single_Formals($1), $3);
}
| error ',' {}
;

formal : OBJECTID ':' TYPEID {
    /* formal(Symbol name, Symbol type_decl) */
    $$ = formal($1, $3);
}
| error {}
;

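The formal parameter list may also be empty, i.e. a method with no parameters.

One point deserves attention here. Look at how the COOL manual writes the method production: ID( [formal [, formal]*] ) : TYPE { expr }.

As you can see, formals are separated by commas; the comma is a separator, not a terminator.

So a correct grammar always has exactly one more parameter than commas, as in (1,2,3). If, like me, you don't think this through at first, your parser may accept forms such as (1,2,3,) with a trailing comma.

The same applies to expr_list1 later on.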
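The separator-based shape can be sketched in the same way. The hypothetical parseSeparated below follows the formal_list / noempty_formal_list split: the empty case is handled once at the top, and inside the non-empty part every ',' must be followed by another item, so a trailing comma is rejected. As before, "x" stands for one complete item.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// noempty_formal_list : formal | formal ',' noempty_formal_list
bool parseNonEmptySeparated(const Tokens& t, std::size_t& pos) {
    if (pos >= t.size() || t[pos] != "x") return false;  // an item is required
    ++pos;
    if (pos < t.size() && t[pos] == ",") {
        ++pos;
        return parseNonEmptySeparated(t, pos);  // a ',' must be followed by another item
    }
    return true;
}

// formal_list : /* empty */ | noempty_formal_list
bool parseSeparated(const Tokens& t) {
    if (t.empty()) return true;  // the empty alternative
    std::size_t pos = 0;
    return parseNonEmptySeparated(t, pos) && pos == t.size();
}
```

Since every comma must be followed by one more item, an accepted list always has exactly one more item than commas.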

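expr_list

Two expression lists are defined here to describe two forms: the argument list of a dispatch, [expr [, expr]*], which may be empty and uses commas as separators, and the body of a block, { [expr;]+ }, which contains at least one expression, each terminated by a semicolon.

The code is similar to the earlier lists: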

expr_list1 : {
    $$ = nil_Expressions();
}
| noempty_expr_list1 {
    $$ = $1;
}
;

noempty_expr_list1 : expr {
    $$ = single_Expressions($1);
}
| expr ',' noempty_expr_list1 {
    $$ = append_Expressions(single_Expressions($1), $3);
}
| error ',' {}
;

expr_list2 : expr ';' {
    $$ = single_Expressions($1);
}
| expr ';' expr_list2 {
    $$ = append_Expressions(single_Expressions($1), $3);
}
;

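let list and case list

Two lists are defined for let and case. In the COOL manual they are let ID : TYPE [ <- expr ] [, ID : TYPE [ <- expr ]]* in expr and case expr of [ID : TYPE => expr;]+ esac. A let with several bindings is built as nested let nodes that each hold one binding; the body of each node is the let for the remaining bindings, and the innermost node has the expression after in as its body. A case must have at least one branch.

The code: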

let : OBJECTID ':' TYPEID ASSIGN expr IN expr {
    $$ = let($1, $3, $5, $7);
}
| OBJECTID ':' TYPEID IN expr {
    $$ = let($1, $3, no_expr(), $5);
}
| OBJECTID ':' TYPEID ASSIGN expr ',' let {
    $$ = let($1, $3, $5, $7);
}
| OBJECTID ':' TYPEID ',' let {
    $$ = let($1, $3, no_expr(), $5);
}
| error {}
;

/* case expr of [ID : TYPE => expr;]+ esac: at least one branch */
case_list : case ';' {
    $$ = single_Cases($1);
}
| case ';' case_list {
    $$ = append_Cases(single_Cases($1), $3);
}
;

case : OBJECTID ':' TYPEID DARROW expr {
    $$ = branch($1, $3, $5);
}
;
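The nesting produced by the let rules can be illustrated by rendering the resulting tree as text. In this sketch Binding and buildLet are hypothetical, an empty init string stands for no_expr(), and the bindings vector is assumed to be non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One binding of "let ID : TYPE [<- expr]".
struct Binding {
    std::string id;
    std::string type;
    std::string init;  // empty string stands for no_expr()
};

// Renders the nested let nodes built by the right-recursive `let` rule.
// The body of every binding except the last is the let for the remaining
// bindings, like let($1, $3, $5, $7) where $7 is the nested let.
// Precondition: i < bs.size().
std::string buildLet(const std::vector<Binding>& bs, std::size_t i, const std::string& body) {
    const Binding& b = bs[i];
    std::string init = b.init.empty() ? "no_expr()" : b.init;
    std::string inner = (i + 1 == bs.size()) ? body : buildLet(bs, i + 1, body);
    return "let(" + b.id + ", " + b.type + ", " + init + ", " + inner + ")";
}
```

For let a : Int <- 1, b : Int in a + b, the outer node binds a and its body is the let that binds b.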

expressions

With the groundwork done, the expressions are easy to write: just follow the COOL manual production by production:

expr : OBJECTID ASSIGN expr {
    $$ = assign($1, $3);
}
| expr '.' OBJECTID '(' expr_list1 ')' {
    $$ = dispatch($1, $3, $5);
}
| expr '@' TYPEID '.' OBJECTID '(' expr_list1 ')' {
    $$ = static_dispatch($1, $3, $5, $7);
}
| OBJECTID '(' expr_list1 ')' {
    $$ = dispatch(object(idtable.add_string("self")), $1, $3);
}
| IF expr THEN expr ELSE expr FI {
    /* cond(Expression pred, Expression then_exp, Expression else_exp) */
    $$ = cond($2, $4, $6);
}
| WHILE expr LOOP expr POOL {
    /* loop(Expression pred, Expression body) */
    $$ = loop($2, $4);
} 
| WHILE expr LOOP error {}
| '{' expr_list2 '}' {
    $$ = block($2);
}
| LET let {
    $$ = $2;
}
| CASE expr OF case_list ESAC {
    /* typcase(Expression expr, Cases cases) */
    $$ = typcase($2, $4);
}
| NEW TYPEID {
    $$ = new_($2);
}
| ISVOID expr {
    $$ = isvoid($2);
}
| expr '+' expr {
    $$ = plus($1, $3);
}
| expr '-' expr {
    $$ = sub($1, $3);
}
| expr '*' expr {
    $$ = mul($1, $3);
}
| expr '/' expr {
    $$ = divide($1, $3);
}
| '~' expr {
    $$ = neg($2);
}
| expr '<' expr {
    $$ = lt($1, $3);
}
| expr LE expr {
    $$ = leq($1, $3);
}
| expr '=' expr {
    $$ = eq($1, $3);
}
| NOT expr {
    $$ = comp($2);
}
| '(' expr ')' {
    $$ = $2;
}
| OBJECTID {
    $$ = object($1);
}
| INT_CONST {
    $$ = int_const($1);
}
| STR_CONST {
    $$ = string_const($1);
}
| BOOL_CONST {
    $$ = bool_const($1);
}
| error {}
;
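The three dispatch rules differ only in the receiver and in whether a static type is given. The string-building helpers below (mkObject, mkDispatch, mkStaticDispatch, mkSelfDispatch) are hypothetical stand-ins for object(), dispatch() and static_dispatch() from cool-tree.h; they only show the shape of the nodes, in particular that f(args) is treated as self.f(args).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical string-building stand-ins for the cool-tree constructors.
std::string mkObject(const std::string& id) { return "object(" + id + ")"; }

std::string mkArgs(const std::vector<std::string>& args) {
    std::string s = "[";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i > 0) s += ", ";
        s += args[i];
    }
    return s + "]";
}

// expr '.' OBJECTID '(' expr_list1 ')'
std::string mkDispatch(const std::string& recv, const std::string& name,
                       const std::vector<std::string>& args) {
    return "dispatch(" + recv + ", " + name + ", " + mkArgs(args) + ")";
}

// expr '@' TYPEID '.' OBJECTID '(' expr_list1 ')'
std::string mkStaticDispatch(const std::string& recv, const std::string& type,
                             const std::string& name, const std::vector<std::string>& args) {
    return "static_dispatch(" + recv + ", " + type + ", " + name + ", " + mkArgs(args) + ")";
}

// OBJECTID '(' expr_list1 ')': f(args) is shorthand for self.f(args).
std::string mkSelfDispatch(const std::string& name, const std::vector<std::string>& args) {
    return mkDispatch(mkObject("self"), name, args);
}
```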

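Operators

For operator precedence, see page 17 of the COOL manual, which lists the operators from highest to lowest precedence as: . (dispatch), @, ~, isvoid, * /, + -, <= < =, not, <- (assignment).

It also contains this sentence: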

All binary operations are left-associative, with the exception of assignment, which is right-associative, and the three comparison operations, which do not associate.

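In bison, a precedence declaration that appears later has higher precedence, so the operators are declared from lowest to highest, with %right for assignment, %nonassoc for the three comparisons and %left for the rest: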

%right ASSIGN
%left NOT
%nonassoc LE '<' '='
%left '+' '-'
%left '*' '/'
%left ISVOID
%left '~'
%left '@'
%left '.'
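Bison uses these declarations to resolve shift/reduce conflicts in its parse table. A different technique, precedence climbing, produces the same grouping for the binary operators and makes the effect of each declaration easy to see. The sketch below is hand-written C++, not what bison generates; it covers only the binary operators declared above (unary operators, '.' and '@' are left out), and the Climber class and the level numbers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Assoc { Left, Right, NonAssoc };
struct OpInfo { int prec; Assoc assoc; };

// Binary operators from the declarations above; a higher level binds tighter,
// just as a later bison declaration does.
const std::map<std::string, OpInfo> kOps = {
    {"<-", {1, Assoc::Right}},
    {"<=", {2, Assoc::NonAssoc}}, {"<", {2, Assoc::NonAssoc}}, {"=", {2, Assoc::NonAssoc}},
    {"+", {3, Assoc::Left}}, {"-", {3, Assoc::Left}},
    {"*", {4, Assoc::Left}}, {"/", {4, Assoc::Left}},
};

// Precedence climbing over operands and binary operators. parse() returns the
// fully parenthesized expression and throws std::runtime_error on a syntax
// error, including chained non-associative operators such as a < b < c.
class Climber {
public:
    explicit Climber(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    std::string parse() {
        std::string e = parseExpr(0);
        if (pos_ != toks_.size()) throw std::runtime_error("unexpected token");
        return e;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    std::string parseExpr(int minPrec) {
        if (pos_ >= toks_.size() || kOps.count(toks_[pos_]))
            throw std::runtime_error("expected an operand");
        std::string lhs = toks_[pos_++];
        int usedNonAssoc = -1;  // level of a non-associative operator already applied here
        while (pos_ < toks_.size()) {
            auto it = kOps.find(toks_[pos_]);
            if (it == kOps.end()) throw std::runtime_error("expected an operator");
            const OpInfo op = it->second;
            if (op.prec < minPrec) break;  // leave it to a caller with a lower minPrec
            if (op.assoc == Assoc::NonAssoc && op.prec == usedNonAssoc)
                throw std::runtime_error("non-associative operators chained");
            std::string name = toks_[pos_++];
            // Left and non-associative: the right operand may only use tighter
            // operators. Right: it may reuse the same level, so a <- b <- c nests right.
            int nextMin = (op.assoc == Assoc::Right) ? op.prec : op.prec + 1;
            std::string rhs = parseExpr(nextMin);
            lhs = "(" + lhs + " " + name + " " + rhs + ")";
            if (op.assoc == Assoc::NonAssoc) usedNonAssoc = op.prec;
        }
        return lhs;
    }
};
```

With these levels, a - b - c groups as ((a - b) - c), a <- b <- c as (a <- (b <- c)), and a < b < c is rejected, matching %left, %right and %nonassoc respectively.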

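Problems encountered

I didn't run into many problems; the main one was error handling.

For example, a single line may sometimes contain two errors, as in x(,1,2,).

There are obviously two extra commas here, but the reference parser reports only the error at the first comma; the problem with the second comma is reported only after the first one has been fixed and the file is parsed again.

My parser, however, reported both errors in a single pass, so I modified it specifically to match this behavior.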

References

http://doraemonzzz.com/2021/04/24/2021-4-24-Stanford-Compiler-PA3/

https://github.com/afterthat97/cool-compiler/tree/master/assignments/PA3

https://github.com/skyzluo/CS143-Compilers-Stanford/tree/master/PA3

https://github.com/dychen/compilers/blob/master/PA3/cool.y