Probable causes:
If you’re new to using TensorBoard, and want to find out how to add data and set up your event files, check out the README and perhaps the TensorBoard tutorial.
If you think TensorBoard is configured properly, please see the section of the README devoted to missing data problems and consider filing an issue on GitHub.
To lay out the dashboard, pass a Layout protocol buffer to the set_layout method. For example,
from tensorboard import summary as summary_lib
from tensorboard.plugins.custom_scalar import layout_pb2

...

# This does not have to be performed at every step, so it is not taken care of
# by an op in the graph. We only need to specify the layout once (instead of
# per step).
layout_summary = summary_lib.custom_scalar_pb(layout_pb2.Layout(
    category=[
        layout_pb2.Category(
            title='losses',
            chart=[
                layout_pb2.Chart(
                    title='losses',
                    multiline=layout_pb2.MultilineChartContent(
                        tag=[r'loss.*'],
                    )),
                layout_pb2.Chart(
                    title='baz',
                    margin=layout_pb2.MarginChartContent(
                        series=[
                            layout_pb2.MarginChartContent.Series(
                                value='loss/baz/scalar_summary',
                                lower='baz_lower/baz/scalar_summary',
                                upper='baz_upper/baz/scalar_summary'),
                        ],
                    )),
            ]),
        layout_pb2.Category(
            title='trig functions',
            chart=[
                layout_pb2.Chart(
                    title='wave trig functions',
                    multiline=layout_pb2.MultilineChartContent(
                        tag=[r'trigFunctions/cosine', r'trigFunctions/sine'],
                    )),
                # The range of tangent is different. Let's give it its own chart.
                layout_pb2.Chart(
                    title='tan',
                    multiline=layout_pb2.MultilineChartContent(
                        tag=[r'trigFunctions/tangent'],
                    )),
            ],
            # This category we care less about. Let's make it initially closed.
            closed=True),
    ]))
writer.add_summary(layout_summary)
import tensorflow as tf
from tensorflow.python import debug as tf_debug

sess = tf.Session()
sess = tf_debug.TensorBoardDebugWrapperSession(sess, "[[_host]]:[[_port]]")
sess.run(my_fetches)
import tensorflow as tf
from tensorflow.python import debug as tf_debug

hook = tf_debug.TensorBoardDebugHook("[[_host]]:[[_port]]")
my_estimator.fit(x=x_data, y=y_data, steps=1000, monitors=[hook])
import tensorflow as tf
from tensorflow.python import debug as tf_debug
import keras

keras.backend.set_session(
    tf_debug.TensorBoardDebugWrapperSession(tf.Session(), "[[_host]]:[[_port]]"))

# Define your Keras model, called "model".
model.fit(...)
Alerts are sorted from top to bottom by increasing timestamp.
No numeric alerts so far. That is likely good. Alerts indicate the presence of NaN or (+/-) Infinity values, which may be concerning.
To store a graph, create a tf.summary.FileWriter and pass the graph either via the constructor, or by calling its add_graph() method. You may want to check out the graph visualizer tutorial.
If you're new to using TensorBoard, and want to find out how to add data and set up your event files, check out the README and perhaps the TensorBoard tutorial.
No bookmarks yet. Upload a bookmarks file, or add a new bookmark by clicking the "+" below.
If you'd like to share your visualization with the world, follow these simple steps. See this tutorial for more.
Host tensors, metadata, sprite image, and bookmarks TSV files publicly on the web.
One option is using a GitHub gist. If you choose this approach, make sure to link directly to the raw file.
Nearest points in the original space:
{{metadataColumn}} labels (click to apply):
Run
For faster results, the data will be sampled down to [[getUmapSampleSizeText()]] points.
Learn more about UMAP.
Run Pause Perturb
Iteration: 0
For faster results, the data will be sampled down to [[getTsneSampleSizeText()]] points.
How to use t-SNE effectively.
PCA is approximate.
tf.train.Saver
saver.save(sess, os.path.join(LOG_DIR, "model.ckpt"), step)
Section 1: Summary of input-pipeline analysis
[[_summary_conclusion]]
Recommendation for next step: [[_summary_nextstep]]
Section 2: Device-side analysis details
Section 2.1: Device step time
Device step-time statistics (in ms)
Average: [[_steptime_ms_average]] ms (σ = [[_steptime_ms_stddev]] ms)
Range: [[_steptime_ms_minimum]] - [[_steptime_ms_maximum]] ms
training step number
Section 2.2: Range of device time waiting for input data across cores at each step
% of device step time waiting for input data (average over the maximum waiting time across cores at each step)
Average: [[_infeed_percent_average]] % (σ = [[_infeed_percent_stddev]] %)
Range: [[_infeed_percent_minimum]] - [[_infeed_percent_maximum]] %
% of device step time
Section 3: Host-side analysis details
What can be done to reduce the above components of the host input time:
Click the "Show" button below to see the source data of the breakdown. [[_toggle_button_text]]
Average step time (lower is better): [[_steptime_ms_average]] ms (standard deviation = [[_steptime_ms_stddev]] ms)
Host idle time (lower is better): [[_host_idle_time_percent]]
TPU idle time (lower is better): [[_device_idle_time_percent]]
Utilization of TPU Matrix Units (higher is better): [[_mxu_utilization_percent]]
Number of Hosts used: [[_host_count]]
TPU type: Cloud TPU
Number of TPU cores: [[_tpu_core_count]]
[[node.xla.expression]]
[[node.xla.provenance]]
Modifying your model's architecture or data dimensions, and improving the efficiency of CPU operations, may help the model reach the TPU's FLOPS potential.
"Idle" represents the portion of the total execution time on device that is idle.
[[node.shape]]
[[node.tfOpName]]
Modifying your model's architecture, batch size, and data dimensions may help reduce the memory footprint.
Data Transferred: [[_sizeMiB(node.dataSize)]] MiB
Latency: [[_format(node.durationUs)]] µs
BW: [[_bandwidth(node.dataSize, node.durationUs)]] GiB/s
Send Delay: [[_format(node.sendDelayUs)]] µs
HLO Names: "[[item]]"
"[[item]]"
Replica Groups {[[item.replicaIds]]}
{[[item.replicaIds]]}
[[item.label]]: [[_getStepBreakdownValue(node, item.key)]] µs [[_getStepBreakdownPct(node, item.key)]]
(x-axis: channel id, y-axis: time (µs))
(x-axis: short names for all-reduces ops (a#) or fusion (f#), y-axis: time (µs))
(x-axis: global chip id, core id, y-axis: time (µs))
Website traffic data by country, loaded from an external JSON resource in raw DataTable format.
Please see GitHub issue #1913 for more information.
If you have a model running on CPU, GPU, or Google Cloud TPU, you may be able to use the above button to capture a profile.
If you’re a CPU or GPU user, please use the IP address option. You may want to check out the tutorial on how to start a TensorFlow profiler server and profile a Keras model on a GPU.
If you're a TPU user, please use the TPU name option, and you may want to check out the tutorial on how to interpret the profiling results.
If you think profiling was done properly, please see the Google Cloud TPU Troubleshooting and FAQ page and consider filing an issue on GitHub.
Controls disabled: directory is not writable.
Beholder requires write access to the log directory in order to communicate visualization changes to the Beholder instance in your model.
Beholder
tf.trainable_variables()
b.update(arrays=[NP_ARRAYS])
b.update(frame=NP_ARRAY)
Note: Beholder currently only works well on local file systems.
beholder.update()
To use Beholder, import and instantiate the Beholder class, and call its update method with a Session argument after every train step:
from tensorboard.plugins.beholder import Beholder

beholder = Beholder(LOG_DIRECTORY)

# inside train loop
beholder.update(
    session=sess,
    arrays=list_of_np_arrays,  # optional argument
    frame=two_dimensional_np_array,  # optional argument
)
If using tf.train.MonitoredSession, you can use BeholderHook:
tf.train.MonitoredSession
BeholderHook
from tensorboard.plugins.beholder import BeholderHook

beholder_hook = BeholderHook(LOG_DIRECTORY)
with MonitoredSession(..., hooks=[beholder_hook]) as sess:
    sess.run(train_op)
If you think everything is set up properly, please see the README for more information and consider filing an issue on GitHub.
Are you sure you want to delete the selected datapoint?
Please select a session group to see its metric-graphs here.
Please enable some metrics to see content here.
This can occur if the TensorBoard backend is no longer running. Perhaps this page is cached?
If you think that you’ve fixed the problem, click the reload button in the top-right. We’ll try to reload every [[autoReloadIntervalSecs]] seconds as well.
Last reload: [[_lastReloadTime]]
Log directory: [[_dataLocation]]
Data location: [[_dataLocation]]
You can select a dashboard from the list above.