
2  Metavariables for Transformations

The rulename portion of the metavariable declaration can specify properties of a rule such as its name, the names of the rules that it depends on, the isomorphisms to be used in processing the rule, and whether quantification over paths should be universal or existential. The optional annotation expression indicates that the pattern is to be considered as matching an expression, and thus can be used to avoid some parsing problems.

The metadecl portion of the metavariable declaration defines various types of metavariables that will be used for matching in the transformation section.

metavariables   ::=  @@ metadecl * @@
|@ rulename @ metadecl * @@
rulename   ::=  id [extends id] [depends on [scope] dep] [iso] [disable-iso] [exists] [rulekind]
scope   ::=  exists
|forall
dep   ::=  id
|!id
|!(dep)
|ever id
|never id
|dep && dep
|dep || dep
|file in string
|(dep)
iso   ::=  using string (, string) *
disable-iso   ::=  disable COMMA_LIST(id)
exists   ::=  exists
|forall
rulekind   ::=  expression
|identifier
|type
COMMA_LIST(elem)   ::=  elem (, elem) *

The keyword disable is normally used with the names of isomorphisms defined in standard.iso or whatever isomorphism file has been included. There are, however, some other isomorphisms that are built into the implementation of Coccinelle and that can be disabled as well. Their names are given below. In each case, the text describes the standard behavior. Using disable-iso with the given name disables this behavior.

The depends on clause indicates conditions under which a semantic patch rule should be applied. Most of these conditions relate to the success or failure of other rules, which may be virtual rules. Giving the name of a rule implies that the current rule is applied if the named rule has succeeded in matching in the current environment. Giving ever followed by a rule name implies that the current rule is applied if the named rule has succeeded in matching in any environment. Analogously, never means that the named rule should have succeeded in matching in no environment. The boolean and, or and negation operators combine these declarations in the usual way. The declaration file in checks that the code being processed comes from the mentioned file, or from a subdirectory of the directory to which Coccinelle was applied. In the latter case, the string is matched against the complete pathname. A trailing / is added to the specified subdirectory name, to ensure that a complete subdirectory name is matched. The declaration file in is only allowed on SmPL code-matching rules. Script rules are not applied to any code in particular, and thus it doesn’t make sense to check on the file being considered.

As metavariables are bound and inherited across rules, a tree of environments is built up. A rule is processed only once for all of the branches that have the same metavariable bindings for the set of variables that the rule depends on. Different branches, however, may be derived from the success or failure of different sets of rules. A depends on clause can further indicate whether the clause should be satisfied for all the branches (forall) or only for one (exists). exists is the default. These annotations can for example be useful when one rule binds a metavariable x, subsequent rules have the effect of testing good and bad properties of x, and a final rule may want to ensure that all occurrences of x have the good property (forall) or none have the bad property (exists). forall and exists are currently only supported at top level, not under conjunction and disjunction.
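As a minimal sketch of the depends on clause, consider the three rules below. The rule name has_init and the functions init_hw, old_api, new_api, and fallback_api are hypothetical, chosen only for illustration. The second rule is applied only where the first has matched, and the third only if the first has matched in no environment.

```
// Rule with no metavariables; it only records whether init_hw is called.
@has_init@
@@
init_hw(...)

// Applied only if has_init succeeded in the current environment.
@depends on has_init@
expression E;
@@
- old_api(E)
+ new_api(E)

// Applied only if has_init succeeded in no environment.
@depends on never has_init@
expression E;
@@
- old_api(E)
+ fallback_api(E)
```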

The possible types of metavariable declarations are defined by the grammar rule below. Metavariables should occur at least once in the transformation code immediately following their declaration. Fresh identifier metavariables must only be used in + code. These properties are not expressed in the grammar, but are checked by a subsequent analysis. The metavariables are designated according to the kind of terms they can match, such as a statement, an identifier, or an expression. An expression metavariable can be further constrained by its type. A declaration metavariable matches the declaration of one or more variables, all sharing the same type specification (e.g., int a,b,c=3;). A field metavariable does the same, but for structure fields. In the minus code, a statement list metavariable can only appear as a complete function body or as the complete body of a sequence statement. In the plus code, a statement list metavariable can occur anywhere a statement list is allowed, i.e., including as an element of another statement list.

metadecl   ::=  fresh identifier pmids_with_seed ;
|metavariable pmids_with_constraints ;
|identifier pmvids_with_constraints ;
|identifier list pmvids_with_constraints ;
|field [list] pmids_with_constraints ;
|parameter [list] pmids_with_constraints ;
|type pmids_with_constraints ;
|statement [list] pmids_with_constraints ;
|declaration pmids_with_constraints ;
|initialiser [list] pmids_with_constraints ;
|initializer [list] pmids_with_constraints ;
|[local | global] idexpression [ctype] pmids_with_constraints ;
|[local | global] idexpression [{ ctypes } * *] pmids_with_constraints ;
|[local | global] idexpression * + pmids_with_constraints ;
|expression list pmids_with_constraints ;
|expression [enum | struct | union] * * pmids_with_constraints ;
|ctype [[ ]] pmids_with_constraints ;
|{ ctypes } * * [[ ]] pmids_with_constraints ;
|constant [ctype] pmids_with_constraints ;
|constant [{ ctypes } * *] pmids_with_constraints ;
|format [list] pmids_with_constraints ;
|assignment operator COMMA_LIST(assignopdecl) ;
|binary operator COMMA_LIST(binopdecl) ;
|unary operator COMMA_LIST(unopdecl) ;
|position [any] pmids_with_constraints ;
|symbol pmids ;
|typedef pmids ;
|attribute name ids ;
|attribute ids ;
|declarer name ids ;
|declarer pmids_with_constraints ;
|iterator name ids ;
|iterator pmids_with_constraints ;
list   ::=  list
|list [ id ]
|list [ integer ]
assignopdecl   ::=  pmid [ = assignop_constraint]
assignop_constraint   ::=  {COMMA_LIST(assign_op)}
|assign_op
binopdecl   ::=  pmid [ = binop_constraint]
binop_constraint   ::=  {COMMA_LIST(bin_op)}
|bin_op
unopdecl   ::=  pmid [ = unop_constraint]
unop_constraint   ::=  {COMMA_LIST(unary_op)}
|unary_op

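A fresh identifier metavariable can only be used in + code and generates a new identifier according to the optionally given seed. As the seed nonterminal of the grammar indicates, the seed may be a string, a concatenation of strings and metavariables joined by ##, or a script.

Examples are found in demos/plusplus1.cocci and demos/plusplus2.cocci.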
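The following sketch illustrates a concatenation seed. The prefix safe_ is an arbitrary choice made for this illustration, and the rule is deliberately broad (it rewrites every call), so it should be read as a demonstration of the syntax rather than as a useful transformation.

```
@@
identifier f;
// New name built from a fixed prefix and the matched function name.
fresh identifier g = "safe_" ## f;
expression list es;
@@
- f(es)
+ g(es)
```

With this rule, a call read_reg(dev, 4) would be expected to become safe_read_reg(dev, 4).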

metavariable declares a metavariable for which the parser tries to figure out the metavariable type based on the usage context. Such a metavariable must be used consistently. These metavariables cannot be used in all contexts; specifically, they cannot be used in contexts that would make the parsing ambiguous. Some examples are the leftmost term of an expression, such as the left-hand side of an assignment, or the type in a variable declaration. These restrictions may seem somewhat arbitrary from the user’s point of view. Thus, it is better to declare metavariables with explicit metavariable types. If Coccinelle is given the argument --parse-cocci, it will print information about the type that is inferred for each metavariable.

An identifier is the name of a structure field, a macro, a function, or a variable. It is the name of something rather than an expression that has a value. But an identifier can be used in the position of an expression as well, where it represents a variable.

The list modifier makes it possible to match multiple consecutive elements of a given kind and store them as one metavariable. The length of the list can optionally be specified. If no length is provided, the list will be the longest possible. If an integer length is provided, only lists of the given length are matched. If an id is provided, it stores the length of the matched list. This id can be used to ensure that other lists have the same length, or can be manipulated in script code.
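The two length forms can be sketched as follows. The metavariable names are arbitrary, and the * annotation is used only to mark the matched calls.

```
@@
identifier f;
// n is bound to the number of arguments matched by es.
expression list[n] es;
@@
* f(es)

@@
identifier g;
// Only calls with exactly two arguments are matched.
expression list[2] es;
@@
* g(es)
```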

An identifier list is only used for the parameter list of a macro. It matches multiple identifiers in a row and stores them as one metavariable.

A field only matches an identifier that is a structure field.

A parameter matches a parameter declaration. Arguments (values given at function call) are not matched through this but using other kinds of metavariables (e.g. expression).

A type matches a type appearing in code, whether it is in the declaration of a function or a variable, in a cast, or anywhere else where it is explicitly a type. It also matches a type name defined by a typedef.

A statement matches anything that falls into the statement definition of the C99 standard.

A statement list can only match a complete sequence of statements between braces. Therefore, no size can be specified for it and no statement can contiguously surround it for context (it has to be absorbed).

A declaration matches the declaration of one or more variables sharing the same type specification.

An initialiser or initializer matches the right-hand side of a declaration.

An idexpression is a variable used as an expression. It is useful to restrict a match to be both an identifier and to have a particular type. A more complex description of a location, such as a->b, is considered to be an expression, not an idexpression. The optional local modifier restricts the matched variable to be a local variable. The optional global indicates that the matched variable is not a local one. If neither local nor global is specified, then any variable reference can be matched. It is possible to specify a ctype or a set of them and/or a pointer level using * to restrict the types of variables that can be matched.
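As a small sketch, the rule below restricts x to local variables of type int. The transformation itself is only an illustration and is not taken from the Coccinelle distribution.

```
@@
// Matches only local variables of type int; a->b would not match.
local idexpression int x;
expression E;
@@
- x = x + E;
+ x += E;
```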

An expression is any piece of code that falls into the expression definition of the C99 standard. Therefore, any combination of sequences of operators and operands that computes a value, designates an object or a function, or generates side effects is matched as an expression. It is possible to specify some type information using enum, struct, or union, and/or a pointer level using * to restrict the types of expressions that can be matched. It is possible to only match expressions of a specific ctype or a set of them with a pointer level using * by writing these instead of the expression designator pattern. One can also specify that the matched expression must be of array type by adding brackets after the initial type specification. The ctype and ctypes nonterminals are used by both the grammar of metavariable declarations and the grammar of transformations, and are defined elsewhere in this documentation.

A constant metavariable matches a constant in the code, such as 27. It also treats an uppercase identifier as a constant, because the names given to macros in Linux usually have this form.

When used, a format or format list metavariable must be enclosed by a pair of @s. A format metavariable matches the format descriptor part, i.e., 2x in %2x. A format list metavariable matches a sequence of format descriptors as well as the text between them. Any text around them is matched as well, if it is not matched by the surrounding text in the semantic patch. Such text is not partially matched. If the length of the format list is specified, that indicates the number of matched format descriptors. It is also possible to use ... in a format string, to match a sequence of text fragments and format descriptors. This only takes effect if the format string contains format descriptors. Note that this makes it impossible to require ... to match exactly in a string, if the semantic patch string contains format descriptors. If that is needed, some processing with a scripting language would be required. An example of the use of string format metavariables is found in demos/format.cocci.

Matching of various kinds of format strings within strings is supported. With the --ibm option, matching of decimal format declarations is supported, but the length and precision arguments are not interpreted. Thus it is not possible to match metavariables in these fields. Instead, the entire format is matched as a single string.

An assignment operator (resp. binary operator) metavariable matches any assignment (resp. binary) operator. The list of operators that can be matched can be restricted by adding an operator constraint, i.e. a list of accepted operators.
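The sketch below shows one constrained metavariable of each kind. The operator sets are arbitrary choices for illustration.

```
@@
expression E1, E2;
// op may only be bound to + or -.
binary operator op = {+, -};
@@
* E1 op E2

@@
expression E1, E2;
// aop may only be bound to += or -=.
assignment operator aop = {+=, -=};
@@
* E1 aop E2
```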

A position metavariable is used by attaching it using @ to any token, including another metavariable. Its value is the position (file, line number, etc.) of the code matched by the token. It is also possible to attach expression, declaration, type, initialiser, and statement metavariables in this manner. In that case, the metavariable is bound to the closest enclosing expression, declaration, etc. If such a metavariable is itself followed by a position metavariable, the position metavariable applies to the metavariable that it follows, and not to the attached token. This makes it possible to get, e.g., the starting and ending position of f(...), by writing f(...)@E@p, for expression metavariable E and position metavariable p. This attachment notation for metavariables of type other than position can also be expressed with a conjunction, but the @ notation may be more concise.
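A common use of position metavariables is to report where a match occurred. The sketch below assumes a Python-enabled build of Coccinelle; the rule name r and the choice of kfree as the matched function are illustrative only.

```
@r@
expression E;
position p;
@@
kfree@p(E)

@script:python@
p << r.p;
@@
# p is a list of positions; each one carries the file name and line number.
print("kfree call at %s:%s" % (p[0].file, p[0].line))
```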

Other kinds of metavariables can also be attached using @ to any token. In this case, the metavariable floats up to the enclosing appropriate expression. For example, 3 +@E 4, where E is an expression metavariable, binds E to 3 + 4. A particular case is Ps@Es, where Ps is a parameter list and Es is an expression list. This pattern matches a parameter list, and then matches Es to the list of expressions, i.e., a possible argument list, represented by the names of the parameters. Another particular case is E@S, where E is any expression and S is a statement metavariable. S matches the closest enclosing statement, which may be more than what is matched by the semantic match pattern itself.

A symbol declaration specifies that the provided identifiers should be considered to be C identifiers when encountered in the body of the rule. Identifiers in the body of the rule that are not declared explicitly are by default considered symbols, thus symbol declarations are optional. It is not required, but it will not cause a parse error, to redeclare a name as a symbol. A name declared as a symbol can, furthermore, be redeclared as another metavariable. It will be considered to be a metavariable in such rules, and will revert to being a symbol in subsequent rules. These conditions also apply to iterator names and declarer names.

A typedef declaration specifies that the provided identifiers should be considered as types when encountered in the code to be matched. Such a declaration is useful to ensure that spatch will properly match some identifiers as types when the declaration is not available in the processed code. It is not always necessary to specify that a type that has no declaration in the given code is a type, because spatch can sometimes infer that information from context. A declaration of a name as a typedef extends through the rest of the semantic patch. It is not required, but it will not cause a parse error, to redeclare a name as a typedef. A name declared as a typedef can, furthermore, be redeclared as another metavariable. It will be considered to be a metavariable in such rules, and will revert to being a typedef in subsequent rules.
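For instance, the following sketch declares u32 as a type name so that the declaration pattern parses even when the definition of u32 is not visible. The name u32 is used here only as a familiar example.

```
@@
// Tell the SmPL parser that u32 is a type name.
typedef u32;
identifier x;
@@
* u32 x;
```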

An attribute metavariable matches an attribute. Attribute metavariables are only allowed in context or minus code, and not in added code. Indeed, attributes in added code are not parsed, to allow them to be placed at places that go beyond what is supported by the SmPL parser.

An attribute name declaration indicates the given identifiers should be considered to be attributes.

A declarer is a macro call used at top level which generates a declaration. Such macros are used in the Linux kernel.

The name modifier specifies that instead of declaring a metavariable to match over some kind, the identifiers are to be considered as elements of that kind when they appear in the code.

An iterator is a macro call used in place of an iteration statement header (e.g. for (size_t i = 0; i < 10; ++i)) which generates it. Such macros are used in the Linux kernel.
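The name modifier can be sketched with two macros from the Linux kernel, list_for_each_entry and DEFINE_MUTEX. They are used here purely as illustrations, on the assumption that they behave as an iterator and a declarer respectively in the code being processed.

```
@@
// Treat list_for_each_entry as an iterator, not as a function call.
iterator name list_for_each_entry;
expression pos, head;
identifier member;
statement S;
@@
* list_for_each_entry(pos, head, member) S

@@
// Treat DEFINE_MUTEX as a macro that generates a declaration.
declarer name DEFINE_MUTEX;
identifier m;
@@
* DEFINE_MUTEX(m);
```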

Subsequently, we refer to arbitrary metavariables as metaidty, where ty indicates the metakind used in the declaration of the variable. For example, metaidType refers to a metavariable that was declared using type and stands for any type.

ids   ::=  COMMA_LIST(id)
pmids   ::=  COMMA_LIST(pmid)
pmids_with_constraints   ::=  COMMA_LIST(pmid [constraints])
pmvids_with_constraints   ::=  COMMA_LIST(pmvid [constraints])
pmids_with_seed   ::=  COMMA_LIST(pmid [seed])
pmvid   ::=  pmid
|virtual.id
pmid   ::=  id
|mid
mid   ::=  rulename_id.id
constraints   ::=  ANDAND_LIST(constraint)
constraint   ::=  compare_constraint
|regexp_constraint
|: script
compare_constraint   ::=  id_compare_constraint
|int_compare_constraint
id_compare_constraint   ::=  = pmid
|= { COMMA_LIST(pmid) }
|!= pmid
|!= { COMMA_LIST(pmid) }
int_compare_constraint   ::=  = integer
|= { COMMA_LIST(integer) }
|!= integer
|!= { COMMA_LIST(integer) }
regexp_constraint   ::=  =~ regexp
|!~ regexp
seed   ::=  = string
|= CONCAT_LIST(string pmid)
|= script
script   ::=  script:ocaml ( COMMA_LIST(mid) ) { expr }
|script:python ( COMMA_LIST(mid) ) { expr }
ANDAND_LIST(X)   ::=  X [&& ANDAND_LIST(X)]
CONCAT_LIST(X)   ::=  X [## CONCAT_LIST(X)]

A meta identifier with virtual as its “rule name” is given a value on the command line. For example, if a semantic patch contains a rule that declares an identifier metavariable with the name virtual.alloc, then the command line could contain -D alloc=kmalloc. There should not be space around the =. An example is in demos/vm.cocci and demos/vm.c.
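A minimal sketch of such a rule is shown below. The file names in the command line that follows are hypothetical.

```
@@
// The value of alloc is supplied on the command line with -D.
identifier virtual.alloc;
expression list es;
@@
* alloc(es)
```

Assuming the rule is saved as find_alloc.cocci, it could be applied with a command such as spatch --sp-file find_alloc.cocci -D alloc=kmalloc file.c, in which case the rule would mark calls to kmalloc.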

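Most metavariables can be given constraints to indicate authorized/forbidden values. As the grammar above indicates, these constraints fall into three categories: comparison constraints (= or != followed by a value or by a set of values in braces), regular expression constraints (=~ or !~ followed by a regular expression), and script constraints (introduced by :).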

Multiple constraints can be attached to a single metavariable by separating them using &&, and all the constraints must be met at the same time for their composition to be true. It is also possible to include inherited identifier metavariables among the constraints.
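The sketch below shows a regular expression constraint, a set constraint, and a conjunction of the two. The function names and the regular expression are arbitrary choices for illustration.

```
// Regular expression constraint: f must end in "alloc".
@@
identifier f =~ "alloc$";
expression list es;
@@
* f(es)

// Set constraint: g may be any name except the two listed.
@@
identifier g != {kmalloc, kzalloc};
expression list es;
@@
* g(es)

// Conjunction: h must satisfy both constraints at once.
@@
identifier h =~ "alloc$" && != kmalloc;
expression list es;
@@
* h(es)
```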

Metavariables can be associated with constraints implemented as OCaml or Python script code. The form of the code is somewhat restricted, due to the fact that it passes through the Coccinelle semantic patch lexer, before being converted back to a string to be passed to the scripting language interpreter. It is thus best to avoid complicated code in the constraint itself, and instead to define relevant functions in an initialize rule. The code must represent an expression that has type bool in the scripting language. The script code can be parameterized by any inherited metavariables. It is implicitly parameterized by the metavariable being declared. In the script, the inherited metavariable parameters are referred to by their variable names, without the associated rule name. The script code can also be parameterized by metavariables defined previously in the same rule. Such metavariables must always all be mentioned in the same “rule elem” as the metavariable to which the constraint applies. Such a rule elem must also not contain disjunctions, after disjunction lifting. The result of disjunction lifting can be observed using --parse-cocci. A rule elem is, e.g., an atomic statement, such as a return or an assignment, or a loop header, if header, etc. The variable being declared can also be referenced in the script code by its name. All parameters, except position variables, have their string representation. An example is in demos/poscon.cocci.

Script constraints may be executed more than once for a given metavariable binding. Executing the script constraint does not guarantee that the complete match will work out; the constraints are executed within the matching process.
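The advice to keep the constraint simple and to define helpers in an initialize rule can be sketched as follows. The helper is_long and its threshold are hypothetical, the parameter list is left empty because only the declared metavariable is used, and a Python-enabled build of Coccinelle is assumed.

```
@initialize:python@
@@
# Helper defined once, so that the constraint itself stays a simple call.
def is_long(name):
    return len(name) > 20

@@
// f is implicitly available to the script, as a string, under its own name.
identifier f : script:python() { is_long(f) };
expression list es;
@@
* f(es)
```

Since a script constraint may be executed more than once for the same binding, a helper of this kind is best kept free of side effects.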

Warning:

Each metavariable declaration causes the declared metavariables to be immediately usable, without any inheritance indication. Thus the following are correct:

@@
type r.T;
T x;
@@

[...] // some semantic patch code

@@
r.T x;
type r.T;
@@

[...] // some semantic patch code

But the following is not correct:

@@
type r.T;
r.T x;
@@

[...] // some semantic patch code

This applies to position variables, type metavariables, identifier metavariables that may be used in specifying a structure type, and metavariables used in the initialization of a fresh identifier. In the case of a structure type, any identifier metavariable indeed has to be declared as an identifier metavariable in advance. The syntax does not permit r.n as the name of a structure or union type in such a declaration.

