Design a System to Monitor the Health of a Cluster


🙋 Here are some questions to consider when answering this one:

What metrics will you track for health?

How will you collect those metrics? Will you use a heartbeat?

How will you log the status of each metric?

What will you do if a metric indicates a failure? Will you define an action for each metric? For example, if a cluster is not responding, will you trigger a restart?

How will you display the monitoring data to the user? Through a dashboard?
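
One way to make the heartbeat and per-metric action questions concrete is a small sketch like the one below. It is only an illustration under stated assumptions, not a reference design: the class name `HeartbeatMonitor`, the `timeout_s` threshold, the node names, and the `on_unresponsive` callback are all hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags nodes that go silent.

    All names here are hypothetical; timeout_s is an assumed threshold.
    """
    timeout_s: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        # Called whenever a heartbeat arrives from a node.
        self.last_seen[node] = now

    def status(self, now: float) -> Dict[str, str]:
        # A node is healthy if its last heartbeat is within the timeout.
        return {
            node: "healthy" if now - seen <= self.timeout_s else "unresponsive"
            for node, seen in self.last_seen.items()
        }

    def check(self, now: float, on_unresponsive: Callable[[str], None]) -> List[str]:
        # Run the configured action (for example, request a restart)
        # once for each unresponsive node, and return those nodes.
        down = sorted(n for n, s in self.status(now).items() if s == "unresponsive")
        for node in down:
            on_unresponsive(node)
        return down


monitor = HeartbeatMonitor(timeout_s=10.0)
monitor.record("node-a", now=0.0)
monitor.record("node-b", now=0.0)
monitor.record("node-a", now=8.0)   # node-b stays silent after t=0

restart_requests: List[str] = []    # stand-in for a real restart action
down = monitor.check(now=15.0, on_unresponsive=restart_requests.append)
print(monitor.status(now=15.0), down)
```

In a fuller answer, the `status` output could feed the log and the dashboard, while `check` would run on a schedule and call whatever remediation is chosen for each metric.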

