This document describes the Failure Trace Archive format.
The trace format is organized hierarchically as follows:
Platform -> Node -> Component -> Event Trace.
INSERT picture of schema.
We summarize the meaning of each table below. Table names are shown in bold.
A platform contains a set of nodes. Examples of a platform include SETI@home and the desktops at Microsoft.
A node contains a set of components, each of which is a software module or hardware resource of the node. A node can have several components (e.g. CPU speed, available memory, client availability), each with a corresponding trace.
A component describes attributes of a software module or hardware resource of a node.
component_perf is the performance of a component, as measured, for example, through benchmarks.
A creator is the person responsible for the trace data set. This table stores details about citations and copyright.
An event_trace is the trace of an event, with all of its corresponding timing information.
event_state is the state corresponding to an event_trace. For example, for CPU availability, the event_state could be the idleness of the CPU. For host availability, it could be the monitoring information associated with the event.
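The hierarchy summarized above can be sketched in Python as nested containers. This is only an illustration, not part of the format: the class names and the helper `count_events` are hypothetical, and each class carries just a few of the attributes described in the tables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventTrace:
    event_type: int          # 0 -> unavailability, 1 -> availability
    event_start_time: float  # UNIX epoch time
    event_end_time: float    # UNIX epoch time


@dataclass
class Component:
    component_id: int
    event_traces: List[EventTrace] = field(default_factory=list)


@dataclass
class Node:
    node_id: int
    components: List[Component] = field(default_factory=list)


@dataclass
class Platform:
    platform_id: int
    platform_name: str
    nodes: List[Node] = field(default_factory=list)


def count_events(platform: Platform) -> int:
    """Walk Platform -> Node -> Component -> Event Trace and count events."""
    return sum(
        len(component.event_traces)
        for node in platform.nodes
        for component in node.components
    )


# Hypothetical example data, not taken from any real trace.
cpu = Component(component_id=1, event_traces=[
    EventTrace(event_type=1, event_start_time=0.0, event_end_time=600.0),
    EventTrace(event_type=0, event_start_time=600.0, event_end_time=900.0),
])
platform = Platform(platform_id=1, platform_name="example_platform",
                    nodes=[Node(node_id=1, components=[cpu])])
print(count_events(platform))  # 2
```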
A description of the table attributes appears below.
Table: platform
platform_id | a unique number identifying this platform, which allows one to differentiate pools of nodes |
platform_name | name of the platform (e.g. "Berkeley_NOW_Lab_Fall_1998") |
platform_location | location name of the platform source (e.g. "Berkeley NOW Lab - Soda Hall 2nd Floor, USA, Planet Earth") |
platform_type | type of the platform (cluster, multicluster, grid, desktop_grid, or volunteer_computing) |
misc_notes | miscellaneous notes |
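As a rough sketch of how a platform record could be checked against the attributes above, the following Python snippet validates platform_type against the five listed values. The function name `check_platform`, the dictionary representation of a row, and the sample values are assumptions made for illustration only.

```python
# The allowed values are the ones listed for platform_type above.
PLATFORM_TYPES = {
    "cluster", "multicluster", "grid", "desktop_grid", "volunteer_computing",
}


def check_platform(row: dict) -> list:
    """Return a list of human-readable problems found in a platform row."""
    problems = []
    if not isinstance(row.get("platform_id"), int):
        problems.append("platform_id must be an integer")
    if not row.get("platform_name"):
        problems.append("platform_name is missing")
    if row.get("platform_type") not in PLATFORM_TYPES:
        problems.append("unknown platform_type: %r" % (row.get("platform_type"),))
    return problems


# Hypothetical row; the values are invented for this example.
row = {
    "platform_id": 1,
    "platform_name": "example_platform",
    "platform_location": "example location",
    "platform_type": "desktop_grid",
    "misc_notes": "",
}
print(check_platform(row))  # []
```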
Table: node
node_id | unique ID for this node |
platform_id | id of the platform containing node |
node_name | name of node |
node_ip | IP address |
node_location | location of the node (e.g. country, geographic coordinates) |
timezone | time zone of the resource (offset from GMT, in seconds) |
proc_model | processor name, model, version number |
os_name | name and version of the resource OS |
cores_per_proc | number of cores per processor |
num_procs | number of processors for this node |
mem_size | number of bytes of memory |
disk_size | number of bytes of disk space |
up_bw | number of bytes/sec of upload speed |
down_bw | number of bytes/sec of download speed |
metric_id | unique ID for performance metric (e.g. benchmark) |
notes | other notes related to this resource |
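Two quantities are easy to derive from a node record: the total core count (num_procs times cores_per_proc) and local time at the node, using the timezone offset. The sketch below assumes that a positive offset means east of GMT, which the description above does not state; the function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def total_cores(node: dict) -> int:
    """Total cores on the node: processors times cores per processor."""
    return node["num_procs"] * node["cores_per_proc"]


def to_local_time(epoch_seconds: float, node: dict) -> datetime:
    """Convert UNIX epoch time to the node's local time.

    Assumption: node["timezone"] is a signed offset in seconds,
    positive east of GMT.
    """
    tz = timezone(timedelta(seconds=node["timezone"]))
    return datetime.fromtimestamp(epoch_seconds, tz)


# Hypothetical node: 2 processors with 4 cores each, 8 hours behind GMT.
node = {"node_id": 1, "num_procs": 2, "cores_per_proc": 4, "timezone": -28800}
print(total_cores(node))                        # 8
print(to_local_time(86400, node).isoformat())   # 1970-01-01T16:00:00-08:00
```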
Table: component_perf
metric_id | unique ID for performance metric (benchmark) |
component_id | unique ID for the component |
node_id | unique ID for this node |
platform_id | ID of platform containing node |
sfpop_speed | maximum single precision floating point speed (ops/sec) |
dfpop_speed | maximum double precision floating point speed (ops/sec) |
iop_speed | integer operation speed (ops/sec) |
i_val | integer |
f_val | float |
s_val | string |
Table: component
component_id | unique ID for this component |
node_id | ID of the node containing this component |
platform_id | ID of platform containing this node |
node_name | Name of the node |
component_type | type of this component trace (0 -> host availability, network, CPU, client, memory, etc) |
trace_start | when the trace event first appeared (epoch time) |
trace_end | when the trace event last appeared (epoch time) |
resolution | resolution of the traces in seconds |
creator_id | ID of creator of this component trace data |
Table: creator
component_id | unique ID for this component trace data |
node_id | ID of the node corresponding to this trace |
platform_id | ID of platform containing node |
creator | name(s) of the person(s) who recorded the event traces |
cite | citation (bibtex, etc) for using the data from the event traces |
copyright | details of the copyright and rights reserved |
Table: event_trace
event_id | unique ID of event state |
component_id | unique ID for this component trace data |
node_id | unique ID for this node |
platform_id | ID of the platform that is the node's parent |
node_name | name of node |
event_type | type of event (0 -> unavailability, 1 -> availability). Event IDs up to 10,000 are reserved; the rest can be user-defined |
event_start_time | start of this event (UNIX epoch time) |
event_end_time | end of this event (UNIX epoch time) |
event_end_reason | reason the event type or state changed at the end of this trace (for example, reason that CPU became unavailable: 0=undefined, 1=miscellaneous, 2=mouse_activity, 3=keyboard_activity, 4=scheduled_downtime, 5=graceful_shutdown, 6=hard_shutdown) |
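To illustrate how the event_trace attributes fit together, the sketch below computes the fraction of traced time during which a component was available (event_type 1) and decodes event_end_reason using the example codes given above for CPU unavailability. The function name, the dictionary layout of a row, and the sample events are assumptions for illustration; other component types may use different reason codes.

```python
# Example reason codes from the event_end_reason description above.
END_REASONS = {
    0: "undefined",
    1: "miscellaneous",
    2: "mouse_activity",
    3: "keyboard_activity",
    4: "scheduled_downtime",
    5: "graceful_shutdown",
    6: "hard_shutdown",
}


def availability_fraction(events: list) -> float:
    """Fraction of total traced time spent in availability events."""
    total = sum(e["event_end_time"] - e["event_start_time"] for e in events)
    if total == 0:
        return 0.0
    available = sum(
        e["event_end_time"] - e["event_start_time"]
        for e in events
        if e["event_type"] == 1
    )
    return available / total


# Hypothetical events for one component (times are UNIX epoch seconds).
events = [
    {"event_type": 1, "event_start_time": 1000, "event_end_time": 1600,
     "event_end_reason": 2},
    {"event_type": 0, "event_start_time": 1600, "event_end_time": 1800,
     "event_end_reason": 0},
    {"event_type": 1, "event_start_time": 1800, "event_end_time": 2000,
     "event_end_reason": 6},
]
print(availability_fraction(events))                 # 0.8
print(END_REASONS[events[0]["event_end_reason"]])    # mouse_activity
```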
Table: event_state
event_id | unique ID of event state |
component_id | unique ID for this component trace data |
node_id | unique ID for this node |
platform_id | ID of the platform that is the node's parent |
i_val | integer |
f_val | float (for example, 0% - 100% for CPU availability) |
s_val | string |
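An event_state row shares its ID columns with event_trace, so the two can be matched up. The sketch below assumes that the combination (platform_id, node_id, component_id, event_id) identifies a row in both tables; the format description does not say this explicitly, and the function names and sample rows are hypothetical.

```python
def state_key(row: dict) -> tuple:
    """Assumed composite key shared by event_trace and event_state rows."""
    return (row["platform_id"], row["node_id"],
            row["component_id"], row["event_id"])


def attach_states(event_traces: list, event_states: list) -> list:
    """Pair each event_trace row with its event_state row (or None)."""
    by_key = {state_key(s): s for s in event_states}
    return [(t, by_key.get(state_key(t))) for t in event_traces]


# Hypothetical rows: f_val holds CPU availability as a percentage.
traces = [
    {"platform_id": 1, "node_id": 7, "component_id": 3, "event_id": 100,
     "event_type": 1},
    {"platform_id": 1, "node_id": 7, "component_id": 3, "event_id": 101,
     "event_type": 0},
]
states = [
    {"platform_id": 1, "node_id": 7, "component_id": 3, "event_id": 100,
     "i_val": None, "f_val": 87.5, "s_val": None},
]

for trace, state in attach_states(traces, states):
    f_val = state["f_val"] if state is not None else None
    print(trace["event_id"], f_val)
```

The second trace has no matching state row, so it is paired with `None` rather than dropped.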