The Failure Trace Archive (FTA) is a centralized public repository of availability traces of distributed systems and of tools for their analysis.
The purpose of this archive is to facilitate the design, validation, and comparison of fault-tolerant models and algorithms.
In particular, the FTA contains the following:
availability traces of distributed systems, differing in scale, volatility, and usage
a standard format for failure traces
scripts and tools for analyzing these traces
The FTA allows the following:
the comparison and cross-validation of fault-tolerant models or algorithms on identical trace data sets
the evaluation of the generality of a model or algorithm across different types of resources (in terms of reliability or user base, for example)
the evaluation of the generality of a failure trace, i.e., determining whether its measurements are biased toward a particular platform or middleware
the determination of which trace data set is most interesting or applicable for a given algorithm or model
the analysis of the evolution of availability in different systems across long timescales
the integration of failure models with other types of models (such as workloads)
the incorporation of traces with a common format into fault simulators or emulators for model or algorithm evaluation