
2  Metavariables for transformations

The rulename portion of the metavariable declaration can specify properties of a rule such as its name, the names of the rules that it depends on, the isomorphisms to be used in processing the rule, and whether quantification over paths should be universal or existential. The optional annotation expression indicates that the pattern is to be considered as matching an expression, and thus can be used to avoid some parsing problems.

The metadecl portion of the metavariable declaration defines various types of metavariables that will be used for matching in the transformation section.

metavariables   ::=  @@ metadecl * @@
|@ rulename @ metadecl * @@
rulename   ::=  id [extends id] [depends on dep] [iso] [disable-iso] [exists] [expression]
dep   ::=  id
|ever id
|never id
|dep && dep
|dep || dep
|file in string
iso   ::=  using string (, string) *
disable-iso   ::=  disable COMMA_LIST(id)
exists   ::=  exists
COMMA_LIST(elem)   ::=  elem (, elem) *

The keyword disable is normally used with the names of isomorphisms defined in standard.iso or whatever isomorphism file has been included. There are, however, some other isomorphisms that are built into the implementation of Coccinelle and that can be disabled as well. Their names are given below. In each case, the text describes the standard behavior. Using disable-iso with the given name disables this behavior.

The depends on clause indicates conditions under which a semantic patch rule should be applied. Most of these conditions relate to the success or failure of other rules, which may be virtual rules. Giving the name of a rule implies that the current rule is applied if the named rule has succeeded in matching in the current environment. Giving ever followed by a rule name implies that the current rule is applied if the named rule has succeeded in matching in any environment. Analogously, never means that the named rule should have succeeded in matching in no environment. The boolean and, or and negation operators combine these declarations in the usual way. The declaration file in checks that the code being processed comes from the mentioned file, or from a subdirectory. The declaration file in is only allowed on SmPL code-matching rules. Script rules are not applied to any code in particular, and thus it doesn’t make sense to check on the file being considered.
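As a minimal sketch of the depends on clause, the following semantic patch applies its second rule only when the first rule has matched in the current environment. The rule name uses_slab and the functions old_alloc and new_alloc are hypothetical names chosen for illustration.

```
// First rule: succeeds if the file includes the given header.
@uses_slab@
@@
#include <linux/slab.h>

// Second rule: only applied if uses_slab has matched.
// old_alloc and new_alloc are hypothetical function names.
@depends on uses_slab@
expression E;
@@
- old_alloc(E)
+ new_alloc(E)
```

Writing never uses_slab in place of uses_slab would instead restrict the second rule to the case where the first rule has matched in no environment.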

The possible types of metavariable declarations are defined by the grammar rule below. Metavariables should occur at least once in the transformation code immediately following their declaration. Fresh identifier metavariables must only be used in + code. These properties are not expressed in the grammar, but are checked by a subsequent analysis. The metavariables are designated according to the kind of terms they can match, such as a statement, an identifier, or an expression. An expression metavariable can be further constrained by its type. A declaration metavariable matches the declaration of one or more variables, all sharing the same type specification (e.g., int a,b,c=3;). A field metavariable does the same, but for structure fields. In the minus code, a statement list metavariable can only appear as a complete function body or as the complete body of a sequence statement. In the plus code, a statement list metavariable can occur anywhere a statement list is allowed, i.e., including as an element of another statement list.

metadecl   ::=  metavariable ids ;
|fresh identifier ids ;
|identifier COMMA_LIST(pmid_with_regexp) ;
|identifier COMMA_LIST(pmid_with_virt_or_not_eq) ;
|parameter [list] ids ;
|parameter list [ id ] ids ;
|parameter list [ const ] ids ;
|identifier [list] ids ;
|identifier list [ id ] ids ;
|identifier list [ const ] ids ;
|type ids ;
|statement [list] ids ;
|declaration ids ;
|field [list] ids ;
|typedef ids ;
|attribute ids ;
|declarer name ids ;
|declarer COMMA_LIST(pmid_with_regexp) ;
|declarer COMMA_LIST(pmid_with_not_eq) ;
|iterator name ids ;
|iterator COMMA_LIST(pmid_with_regexp) ;
|iterator COMMA_LIST(pmid_with_not_eq) ;
|[local | global] idexpression [ctype] COMMA_LIST(pmid_with_not_eq) ;
|[local | global] idexpression [{ctypes} * *] COMMA_LIST(pmid_with_not_eq) ;
|[local | global] idexpression * + COMMA_LIST(pmid_with_not_eq) ;
|expression list ids ;
|expression * + COMMA_LIST(pmid_with_not_eq) ;
|expression enum * * COMMA_LIST(pmid_with_not_eq) ;
|expression struct * * COMMA_LIST(pmid_with_not_eq) ;
|expression union * * COMMA_LIST(pmid_with_not_eq) ;
|expression COMMA_LIST(pmid_with_not_ceq) ;
|expression list [ id ] ids ;
|expression list [ const ] ids ;
|ctype [ ] COMMA_LIST(pmid_with_not_eq) ;
|ctype COMMA_LIST(pmid_with_not_ceq) ;
|{ctypes} * * COMMA_LIST(pmid_with_not_ceq) ;
|{ctypes} * * [ ] COMMA_LIST(pmid_with_not_eq) ;
|constant [ctype] COMMA_LIST(pmid_with_not_eq) ;
|constant [{ctypes} * *] COMMA_LIST(pmid_with_not_eq) ;
|position [any] COMMA_LIST(pmid_with_not_eq_mid) ;
|symbol ids;
|format ids;
|format list [ id ] ids ;
|format list [ const ] ids ;
|assignment operator COMMA_LIST(assignopdecl) ;
|binary operator COMMA_LIST(binopdecl) ;
assignopdecl   ::=  id [ = assignop_constraint]
assignop_constraint   ::=  {COMMA_LIST(assign_op)}
binopdecl   ::=  id [ = binop_constraint]
binop_constraint   ::=  {COMMA_LIST(bin_op)}

A metavariable declaration local idexpression v means that v is restricted to be a local variable. If it should just be a variable, but not necessarily a local one, then drop local. A more complex description of a location, such as a->b is considered to be an expression, not an idexpression.
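A short sketch of the distinction, assuming a hypothetical wrapper function named wrap: the rule below rewrites assignments whose left-hand side is a local variable. An assignment such as a->b = E; would not be matched, because a->b is an expression rather than an idexpression.

```
@@
local idexpression x;
expression E;
@@
// wrap is a hypothetical function used only for illustration.
- x = E;
+ x = wrap(E);
```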

A constant metavariable matches constants, such as 27. It also treats an identifier consisting entirely of capital letters (possibly containing numbers) as a constant, because the names given to macros in Linux usually have this form.

An identifier is the name of a structure field, a macro, a function, or a variable. It is the name of something rather than an expression that has a value. But an identifier can be used in the position of an expression as well, where it represents a variable.

It is possible to specify that an expression list or a parameter list metavariable should match a specific number of expressions or parameters.
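For instance, the following sketch only matches calls that have exactly two arguments. The function names old_call and new_call are hypothetical.

```
@@
expression list[2] es;
@@
// Matches old_call(a, b) but not old_call(a) or old_call(a, b, c).
- old_call(es)
+ new_call(es)
```

According to the grammar, an identifier can be given between the brackets instead of a constant.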

An identifier list is only used for the parameter list of a macro. It is possible to specify its length.

It is possible to specify some information about the definition of a fresh identifier. See the wiki.

A symbol declaration specifies that the provided identifiers should be considered C identifiers when encountered in the body of the rule. Identifiers in the body of the rule that are not declared explicitly are by default considered symbols, so symbol declarations are optional. Redeclaring a name as a symbol is not required, but it does not cause a parse error. A name declared as a symbol can, however, be redeclared as another metavariable. It will be considered to be a metavariable in such rules, and will revert to being a symbol in subsequent rules. These conditions also apply to iterator names and declarer names.

An attribute declaration indicates a name that should be considered to be an attribute. It is not possible to match or remove an attribute, only to add one.

A position metavariable is used by attaching it using @ to any token, including another metavariable. Its value is the position (file, line number, etc.) of the code matched by the token. It is also possible to attach expression, declaration, type, initialiser, and statement metavariables in this manner. In that case, the metavariable is bound to the closest enclosing expression, declaration, etc. If such a metavariable is itself followed by a position metavariable, the position metavariable applies to the metavariable that it follows, and not to the attached token. This makes it possible to get, e.g., the starting and ending position of f(...), by writing f(...)@E@p, for expression metavariable E and position metavariable p. This attachment notation for metavariables of types other than position can also be expressed with a conjunction, but the @ notation may be more concise.
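The f(...)@E@p notation can be sketched as follows, assuming that Coccinelle has been built with Python scripting support. The rule name r is arbitrary, and f is treated as a symbol because it is not declared as a metavariable.

```
// Bind E to the whole call expression and p to its position.
@r@
expression E;
position p;
@@
f(...)@E@p

// Report each match. In a Python script rule, a position is a list
// of locations, and E is available as its string representation.
@script:python@
p << r.p;
E << r.E;
@@
print("%s:%s: %s" % (p[0].file, p[0].line, E))
```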

When used, a format or format list metavariable must be enclosed by a pair of @s. A format metavariable matches the format descriptor part, e.g., 2x in %2x. A format list metavariable matches a sequence of format descriptors as well as the text between them. Any text around them is matched as well, if it is not matched by the surrounding text in the semantic patch. Such text is not partially matched. If the length of the format list is specified, that indicates the number of matched format descriptors. It is also possible to use ... in a format string, to match a sequence of text fragments and format descriptors. This only takes effect if the format string contains format descriptors. Note that this makes it impossible to require ... to match exactly in a string, if the semantic patch string contains format descriptors. If that is needed, some processing with a scripting language would be required. An example of the use of string format metavariables is found in demos/format.cocci.
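As a hedged illustration (demos/format.cocci remains the reference example), the following match-only rule is intended to find printf calls whose format string contains exactly two format descriptors, together with their two arguments. The rule name r is arbitrary.

```
@r@
format list[2] fs;
expression list[2] es;
@@
// fs is enclosed in @s; text around the descriptors in the
// format string is absorbed by the format list.
printf("%@fs@", es)
```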

Assignment (resp. binary) operator metavariables match any assignment (resp. binary) operator. The list of operators that can be matched can be restricted by adding an operator constraint, i.e. a list of accepted operators.
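A small sketch of operator constraints follows. Each rule uses * to mark the matched line as being of interest, without transforming it. The metavariable names are arbitrary.

```
// Match comparisons that use < or <=.
@@
binary operator bop = {<, <=};
expression x, y;
@@
* x bop y

// Match compound assignments that use += or -=.
@@
assignment operator aop = {+=, -=};
expression x, y;
@@
* x aop y
```

Omitting the constraint, as in binary operator bop;, lets the metavariable match any binary operator.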

Other kinds of metavariables can also be attached using @ to any token. In this case, the metavariable floats up to the enclosing appropriate expression. For example, 3 +@E 4, where E is an expression metavariable, binds E to 3 + 4. A particular case is Ps@Es, where Ps is a parameter list and Es is an expression list. This pattern matches a parameter list, and then matches Es to the list of expressions, i.e., a possible argument list, represented by the names of the parameters. Another particular case is E@S, where E is any expression and S is a statement metavariable. S matches the closest enclosing statement, which may be more than what is matched by the semantic patch pattern itself.

Matching of various kinds of format strings within strings is supported. With the --ibm option, matching of decimal format declarations is supported, but the length and precision arguments are not interpreted. Thus it is not possible to match metavariables in these fields. Instead, the entire format is matched as a single string.

ids   ::=  COMMA_LIST(pmid)
pmid   ::=  id
mid   ::=
pmid_with_regexp   ::=  pmid =~ regexp
|pmid !~ regexp
pmid_with_not_eq   ::=  pmid [!= id_or_meta]
|pmid [!= { COMMA_LIST(id_or_meta) }]
pmid_with_virt_or_not_eq   ::=
pmid_with_not_ceq   ::=  pmid [!= id_or_cst]
|pmid [!= { COMMA_LIST(id_or_cst) }]
id_or_cst   ::=  id
id_or_meta   ::=  id
pmid_with_not_eq_mid   ::=  pmid [ANDAND_LIST(pos_constraint)]
pos_constraint   ::=  != mid
|!= { COMMA_LIST(mid) }
|: script:ocaml (COMMA_LIST( mid )) {expr }

Subsequently, we refer to arbitrary metavariables as metaidty, where ty indicates the metakind used in the declaration of the variable. For example, metaidType refers to a metavariable that was declared using type and stands for any type.

metavariable declares a metavariable for which the parser tries to figure out the metavariable type based on the usage context. Such a metavariable must be used consistently. These metavariables cannot be used in all contexts; specifically, they cannot be used in contexts that would make the parsing ambiguous. Some examples are the leftmost term of an expression, such as the left-hand side of an assignment, or the type in a variable declaration. These restrictions may seem somewhat arbitrary from the user’s point of view. Thus, it is better to declare metavariables with explicit metavariable types. If Coccinelle is given the argument -parse_cocci, it will print information about the type that is inferred for each metavariable.

The ctype and ctypes nonterminals are used by both the grammar of metavariable declarations and the grammar of transformations, and are defined elsewhere in this documentation.

An identifier metavariable with virtual as its “rule name” is given a value on the command line. For example, if a semantic patch contains a rule that declares an identifier metavariable with the name virtual.alloc, then the command line could contain -D alloc=kmalloc. There should be no spaces around the =. An example is in demos/vm.cocci and demos/vm.c.
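A minimal sketch of such a rule is shown below (demos/vm.cocci remains the reference example). The function checked_alloc is hypothetical, and the sketch assumes that the metavariable is referred to as alloc in the rule body.

```
@@
identifier virtual.alloc;
expression E1, E2;
@@
// alloc takes the value given with -D on the command line.
- alloc(E1, E2)
+ checked_alloc(E1, E2)
```

Assuming the rule is saved in a file named vm_example.cocci (a hypothetical file name), it could be run as spatch --sp-file vm_example.cocci -D alloc=kmalloc target.c, so that calls to kmalloc are the ones rewritten.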

It is possible to give an identifier metavariable a list of constraints that it should or should not be equal to. If the constraint is a list of (unquoted) strings, then the value of the metavariable should be the same as one of the strings, in the case of an equality constraint, or different from all of the strings, in the case of an inequality constraint. It is also possible to include inherited identifier metavariables among the constraints. In the case of a positive constraint, things work in the same way, but now with respect to the inherited value of the metavariable. An inequality constraint, on the other hand, does not work so well, because the only value available is the one in the current environment. If the proposed value is different from the one in the current environment, but perhaps the same as the one in some other environment, the match will still succeed.
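The constraint forms can be sketched as follows. The names in the constraint lists are examples only, and * marks the matched lines without transforming them.

```
// Equality constraint: f must be one of the listed names.
@@
identifier f = {kmalloc, kzalloc};
expression list es;
@@
* f(es)

// Inequality constraint: g must differ from all listed names.
@@
identifier g != {kfree, vfree};
expression E;
@@
* g(E)

// Regular-expression constraint, per pmid_with_regexp in the grammar;
// the pattern is intended to select names ending in alloc.
@@
identifier h =~ "alloc$";
expression list es;
@@
* h(es)
```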

Position metavariables can be associated with constraints implemented as OCaml script code. The code must have the form of a single C expression, typically a function call with a tuple of arguments. This expression must have type bool. The script code can be parameterized by any inherited metavariables. It is implicitly parameterized by the metavariable being declared. In the script, the inherited variable parameters are referred to by their variable names, without the associated rule name. The variable being declared is also referenced by its name. All parameters, except position variables, have their string representation. An example is in demos/poscon.cocci.

A declaration of a name as a typedef extends through the rest of the semantic patch. Redeclaring a name as a typedef is not required, but it does not cause a parse error. A name declared as a typedef can, however, be redeclared as another metavariable. It will be considered to be a metavariable in such rules, and will revert to being a typedef in subsequent rules.


Each metavariable declaration causes the declared metavariables to be immediately usable, without any inheritance indication. Thus the following are correct:

@@
type r.T;
T x;
@@

[...] // some semantic patch code

@@
r.T x;
type r.T;
@@

[...] // some semantic patch code

But the following is not correct:

@@
type r.T;
r.T x;
@@

[...] // some semantic patch code

This applies to position variables, type metavariables, identifier metavariables that may be used in specifying a structure type, and metavariables used in the initialization of a fresh identifier. In the case of a structure type, any identifier metavariable indeed has to be declared as an identifier metavariable in advance. The syntax does not permit r.n as the name of a structure or union type in such a declaration.
