Taint analysis with CPA
Taint analysis aims to detect data flows between taint sources and taint sinks. Configurable program analysis (CPA) is a formalism suitable for integrating multiple data flow analyses in one tool. Taints can be traced in a few simple steps.
Modeling the control flow
A control flow automaton (CFA) is a graph whose nodes are bytecode offsets and whose edges are the instructions or calls connecting them. You can create a CFA from the program class pool:
// Create the control flow automaton (CFA).
JvmCfa cfa = CfaUtil.createInterproceduralCfaFromClassPool(programClassPool);
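To make the node and edge terminology concrete, the following is a toy model of a CFA for a single method. It is only an illustration and not the JvmCfa implementation: the class ToyCfa, its methods, and the use of -1 as an exit node are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy control flow automaton for one method (hypothetical, not the ProGuardCORE JvmCfa). */
final class ToyCfa {

    /** Assumed marker for the method exit node. */
    static final int EXIT = -1;

    /** An edge labeled with the instruction that leads from one offset to the next. */
    static final class Edge {
        final int from;
        final int to;
        final String instruction;

        Edge(int from, int to, String instruction) {
            this.from = from;
            this.to = to;
            this.instruction = instruction;
        }
    }

    // Maps each node (a bytecode offset) to the edges leaving it.
    private final Map<Integer, List<Edge>> edgesByNode = new TreeMap<>();

    /** Adds an edge and registers both of its end nodes. */
    void addEdge(int from, int to, String instruction) {
        edgesByNode.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(from, to, instruction));
        edgesByNode.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Returns all known nodes in ascending order. */
    Set<Integer> nodes() {
        return edgesByNode.keySet();
    }

    /** Returns the edges leaving the given offset, or an empty list for unknown offsets. */
    List<Edge> leavingEdges(int offset) {
        return edgesByNode.getOrDefault(offset, Collections.emptyList());
    }
}
```

For a method consisting of an invokestatic at offset 0 followed by an areturn at offset 3, calling addEdge(0, 3, ...) and addEdge(3, ToyCfa.EXIT, ...) would yield three nodes and two edges.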
Defining taint sources
Every taint analysis data flow starts from a taint source. Any Java method can be a taint source, and a source can behave in several ways. A source may:
- taint the calling instance,
- return the taint,
- taint its actual parameters of nonprimitive types,
- taint static fields.
To create a taint source, you need its fully qualified name and the expected tainting pattern. Let us create a simple taint source returning a tainted string:
// Create a taint source.
TaintSource source = new TaintSource("LMain;source()Ljava/lang/String;", // the fully qualified name of a source method
                                     false,                              // whether the source taints the calling instance
                                     true,                               // whether the source taints its return
                                     Collections.emptySet(),             // a set of tainted arguments
                                     Collections.emptySet());            // a set of tainted global variables
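Judging from the example, the fully qualified name concatenates the internal type descriptor of the class, the method name, and the method descriptor. The helper below is a minimal sketch under that assumption. MethodNames and fullyQualifiedName are hypothetical names and not part of the library.

```java
/** Hypothetical helper, not part of ProGuardCORE. */
final class MethodNames {

    private MethodNames() {}

    /**
     * Builds a string of the form L<class>;<method><descriptor>,
     * for example LMain;source()Ljava/lang/String;
     * The layout is inferred from the examples in this text.
     *
     * @param internalClassName class name with '/' separators, e.g. "Main" or "java/lang/String"
     * @param methodName        simple method name, e.g. "source"
     * @param descriptor        JVM method descriptor, e.g. "()Ljava/lang/String;"
     */
    static String fullyQualifiedName(String internalClassName, String methodName, String descriptor) {
        return "L" + internalClassName + ";" + methodName + descriptor;
    }
}
```

With this sketch, MethodNames.fullyQualifiedName("Main", "source", "()Ljava/lang/String;") produces the string passed to the constructor above.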
Defining taint sinks
Taint sinks are the counterparts of taint sources: methods that are sensitive to a taint. A taint sink may be sensitive to
- the calling instance,
- actual parameters,
- static fields.
Given the fully qualified name and the sensitivity model, you can create a taint sink directly. The following sink is sensitive to its only argument:
// Create a taint sink.
JvmTaintSink sink = new JvmTaintSink("LMain;sink(Ljava/lang/String;)V", // the fully qualified name of a sink method
                                     false,                             // whether the sink is sensitive to the calling instance
                                     Collections.singleton(1),          // a set of sensitive arguments
                                     Collections.emptySet());           // a set of sensitive global variables
Note: The argument enumeration for both taint sources and taint sinks starts from one and does not depend on whether the method is static. The calling instance is handled by a separate boolean constructor parameter.
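To illustrate the one-based enumeration, the sketch below lists the parameter types of a JVM method descriptor and looks one up by its one-based index. DescriptorArguments and its methods are hypothetical helpers, not library API. The sketch counts declared parameters and assumes a well-formed descriptor. How the library numbers long and double parameters is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating one-based argument indices; not part of ProGuardCORE. */
final class DescriptorArguments {

    private DescriptorArguments() {}

    /** Returns the parameter types of a well-formed JVM method descriptor, in declaration order. */
    static List<String> parameterTypes(String descriptor) {
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        List<String> types = new ArrayList<>();
        while (i < end) {
            int start = i;
            // Skip array dimensions.
            while (descriptor.charAt(i) == '[') {
                i++;
            }
            // A class type runs up to its terminating ';', a primitive type is one character.
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);
            }
            i++;
            types.add(descriptor.substring(start, i));
        }
        return types;
    }

    /** Returns the type of the argument with the given one-based index. */
    static String argumentType(String descriptor, int oneBasedIndex) {
        return parameterTypes(descriptor).get(oneBasedIndex - 1);
    }
}
```

For the sink above, argument 1 of (Ljava/lang/String;)V is Ljava/lang/String;, regardless of whether sink is static.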
Setting up a CPA run
CPA runs encapsulate the initialization of CPA components and allow configuring the analysis.
The CPA run needs to know the method in which the analysis starts and the maximum call stack depth for the interprocedural analysis. All calls exceeding that depth, as well as all library methods, are approximated intraprocedurally as propagating the taint from their calling instance and arguments into the return value. You can create a CPA run for analyzing Main.main(String[] args) with an unlimited call stack as follows:
// Create the CPA run.
JvmTaintMemoryLocationBamCpaRun cpaRun = new JvmTaintMemoryLocationBamCpaRun(cfa,                                       // a CFA
                                                                             Collections.singleton(source),             // a set of taint sources
                                                                             new MethodSignature("Main",
                                                                                                 "main",
                                                                                                 "([Ljava/lang/String;)V"), // the signature of the main method
                                                                             -1,                                        // the maximum depth of the call stack analyzed interprocedurally.
                                                                                                                        // 0 means intra-procedural analysis.
                                                                                                                        // < 0 means unlimited depth.
                                                                             TaintAbstractState.bottom,                 // a cut-off threshold
                                                                             Collections.singleton(sink));              // a collection of taint sinks
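The call handling described above can be sketched as two small decisions. This is a simplified illustration and not the library implementation. The class CallHandlingSketch and its methods are hypothetical, and the sketch assumes that the depth counts the calls already entered from the main method.

```java
/** Hypothetical sketch of the call handling described in the text; not the library implementation. */
final class CallHandlingSketch {

    private CallHandlingSketch() {}

    /**
     * Decides whether a call is analyzed interprocedurally.
     * Assumption: currentDepth is the number of calls already entered from the main method.
     */
    static boolean followCall(int currentDepth, int maxCallStackDepth, boolean isLibraryMethod) {
        if (isLibraryMethod) {
            return false; // library methods are always approximated
        }
        if (maxCallStackDepth < 0) {
            return true; // negative depth means unlimited
        }
        return currentDepth < maxCallStackDepth; // 0 yields a purely intraprocedural analysis
    }

    /**
     * Approximates a call that is not followed: the return value is tainted
     * if the calling instance or any argument is tainted.
     */
    static boolean approximateReturnTaint(boolean instanceTainted, boolean... argumentsTainted) {
        if (instanceTainted) {
            return true;
        }
        for (boolean tainted : argumentsTainted) {
            if (tainted) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a maximum depth of 0 never follows a call, while -1 follows every call to a non-library method.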
Running the analysis and obtaining witness traces
Running the analysis and generating the witness traces takes a single line:
// Run the analysis and get witness traces.
Set<List<JvmMemoryLocation>> traces = cpaRun.extractLinearTraces();
Interpreting the analysis result
If a data flow is detected, the analysis returns a set of witness traces. A witness trace is a list of memory locations at specific program locations. For instance, the class below
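// A sample program with a data flow from source() to sink(String).
public class Main
{
    public static void main()
    {
        sink(callee());
    }

    public static String callee()
    {
        return source();
    }

    // Stub taint source, added here so that the sample compiles.
    public static String source()
    {
        return "tainted";
    }

    // Stub taint sink, added here so that the sample compiles.
    public static void sink(String s)
    {
    }
}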
would generate a witness trace consisting of two top stack locations, one after the taint source in callee() and another before the call to sink(String s):
[JvmStackLocation(0)@LMain;main()V:3, JvmStackLocation(0)@LMain;callee()Ljava/lang/String;:3]
Note that the traces returned by the CPA run go from the taint sink to the taint source. There are four types of memory locations:
- stack locations identified by their offsets from the operand stack top,
- local variable locations identified by their indices in the local variable array,
- static field locations identified by their fully qualified names,
- heap locations identified by their abstract references.
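Since the traces run from the taint sink to the taint source, it can help to reverse them before reading. The sketch below works on the printed form of a trace, as in the example above, rather than on JvmMemoryLocation objects. TraceReading and its methods are hypothetical helpers, and the location@signature:offset layout is inferred from that example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helpers for reading printed witness traces; not part of ProGuardCORE. */
final class TraceReading {

    private TraceReading() {}

    /** Returns a reversed copy of a sink-to-source trace, leaving the input untouched. */
    static List<String> sourceToSink(List<String> sinkToSourceTrace) {
        List<String> reversed = new ArrayList<>(sinkToSourceTrace);
        Collections.reverse(reversed);
        return reversed;
    }

    /**
     * Splits an entry of the assumed form location@signature:offset
     * into { location, signature, offset }.
     */
    static String[] split(String entry) {
        int at = entry.indexOf('@');
        int colon = entry.lastIndexOf(':'); // the offset follows the last colon
        return new String[] {
            entry.substring(0, at),
            entry.substring(at + 1, colon),
            entry.substring(colon + 1)
        };
    }
}
```

Applied to the trace above, the reversed list starts with the stack location in callee(), and splitting that entry gives JvmStackLocation(0), LMain;callee()Ljava/lang/String; and 3.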
Complete example: AnalyzeTaints.java