
The USE Method: Mac OS X Performance Checklist

October 14, 2013, baoz

This is my example USE Method-based performance checklist for the Apple Mac OS X operating system, for identifying common bottlenecks and errors. This draws upon both command line and graphical tools for coverage, focusing where possible on those that are provided with the OS by default, or by Apple (eg, Instruments). Further notes about tools are provided after this table.

Some of the metrics are easy to find in various GUIs or from the command line (eg, using Terminal; if you’ve never used Terminal before, follow my instructions at the top of this post). Many metrics require some math, inference, or quite a bit of digging. This will hopefully get easier in the future, as tools include a USE method wizard or the metrics required to follow this easily.
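
As one example of the math involved, the CPU saturation check in the table below compares the load averages reported by uptime to the CPU count. The following is a minimal sketch of that comparison, not a definitive test: the function name cpu_saturation and the 1.0 cutoff are illustrative assumptions, and it relies on Python's os.getloadavg(), which returns the 1, 5, and 15 minute load averages.

```python
import os


def cpu_saturation(load_averages, cpu_count):
    """Return the 1-minute load average divided by the CPU count.

    A ratio above 1.0 corresponds to the checklist's
    "load averages > CPU count" condition.
    """
    one_minute = load_averages[0]
    return one_minute / cpu_count


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1 CPU in that case.
    cpus = os.cpu_count() or 1
    ratio = cpu_saturation(os.getloadavg(), cpus)
    status = "possible saturation" if ratio > 1.0 else "ok"
    print(f"load/CPU ratio: {ratio:.2f} ({status})")
```

A ratio above 1.0 is only a hint to look closer, using the per-cpu and per-process tools listed in the table.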

Last Updated: 29-Sep-2013

Physical Resources, Standard

component | type | metric
CPU | utilization | system-wide: iostat 1, “us” + “sy”; per-cpu: DTrace [1]; Activity Monitor → CPU Usage or Floating CPU Window; per-process: top -o cpu, “%CPU”; Activity Monitor → Activity Monitor, “%CPU”; per-kernel-thread: DTrace profile stack()
CPU | saturation | system-wide: uptime, “load averages” > CPU count; latency, “SCHEDULER” and “INTERRUPTS”; per-cpu: dispqlen.d (DTT), non-zero “value”; runocc.d (DTT), non-zero “%runocc”; per-process: Instruments → Thread States, “On run queue”; DTrace [2]
CPU | errors | dmesg; /var/log/system.log; Instruments → Counters, for PMC and whatever error counters are supported (eg, thermal throttling)
Memory capacity | utilization | system-wide: vm_stat 1, main memory free = “free” + “inactive”, in units of pages; Activity Monitor → Activity Monitor → System Memory, “Free” for main memory; per-process: top -o rsize, “RSIZE” is resident main memory size, “VSIZE” is virtual memory size; ps -alx, “RSS” is resident set size, “SZ” is virtual memory size; ps aux similar (legacy format)
Memory capacity | saturation | system-wide: vm_stat 1, “pageout”; per-process: anonpgpid.d (DTT), DTrace vminfo:::anonpgin [3] (frequent anonpgin == pain); Instruments → Memory Monitor, high rate of “Page Ins” and “Page Outs”; sysctl vm.memory_pressure [4]
Memory capacity | errors | System Information → Hardware → Memory, “Status” for physical failures; DTrace failed malloc()s
Network Interfaces | utilization | system-wide: netstat -i 1, assume one very busy interface and use input/output “bytes” / known max (note: includes localhost traffic); per-interface: netstat -I interface 1, input/output “bytes” / known max; Activity Monitor → Activity Monitor → Network, “Data received/sec” “Data sent/sec” / known max (note: includes localhost traffic); atMonitor, interface percent
Network Interfaces | saturation | system-wide: netstat -s, for saturation related metrics, eg netstat -s | egrep 'retrans|overflow|full|out of space|no bufs'; per-interface: DTrace
Network Interfaces | errors | system-wide: netstat -s | grep bad, for various metrics; per-interface: netstat -i, “Ierrs”, “Oerrs” (eg, late collisions), “Colls” [5]
Storage device I/O | utilization | system-wide: iostat 1, “KB/t” and “tps” are rough usage stats [6]; DTrace could be used to calculate a percent busy, using io provider probes; atMonitor, “disk0” is percent busy; per-process: iosnoop (DTT), shows usage; iotop (DTT), has -P for percent I/O
Storage device I/O | saturation | system-wide: iopending (DTT)
Storage device I/O | errors | DTrace io:::done probe when /args[0]->b_error != 0/
Storage capacity | utilization | file systems: df -h; swap: sysctl vm.swapusage, for swap file usage; Activity Monitor → Activity Monitor → System Memory, “Swap used”
Storage capacity | saturation | not sure this one makes sense – once it’s full, ENOSPC
Storage capacity | errors | DTrace; /var/log/system.log file system full messages
  • [1] eg: dtrace -x aggsortkey -n 'profile-100 /!(curthread->state & 0x80)/ { @ = lquantize(cpu, 0, 1000, 1); } tick-1s { printa(@); clear(@); }'
  • [2] Until there are sched:::enqueue/dequeue probes, I suspect this could be done using fbt tracing of thread_*(). I haven’t tried yet. It might be worth seeing what Instruments uses for its “On run queue” thread state trace, and DTracing that.
  • [3] eg: dtrace -n 'vminfo:::anonpgin { printf("%Y %s", walltimestamp, execname); }'
  • [4] the kernel source under bsd/vm/vm_unix.c describes this as “Memory pressure indicator”, although I’ve yet to see this as non-zero.
  • [5] the netstat(1) man page reads: “BUGS: The notion of errors is ill-defined.”
  • [6] it would be great if Mac OS X iostat added a -x option to include utilization, saturation, and error columns, like Solaris “iostat -xnze 1”.
  • atMonitor is a 3rd party tool that provides various statistics; I’m running version 2.7b, although it crashes if you leave the “Top Window” open for more than 2 seconds.
  • Activity Monitor is a default Apple performance monitoring tool with a graphical interface.
  • Instruments is an Apple performance analysis product with a graphical interface. It is comprehensive, consuming performance data from multiple frameworks, including DTrace. Instruments also includes functionality that was provided by separate previous performance analysis products, like CHUD and Shark, making it a one stop shop. It’d be wonderful if it included latency heat maps as well :-).
  • Temperature Monitor: 3rd party software that can read various temperature probes.
  • PMC == Performance Monitor Counters, aka CPU Performance Counters (CPC), Performance Instrumentation Counters (PICs), and more. These are processor hardware counters that are read via programmable registers on each CPU.
  • DTT == DTraceToolkit scripts, many of which were ported by the Apple engineers and shipped by default with Mac OS X. ie, you should be able to run these immediately, eg, sudo runocc.d.
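
The memory capacity utilization row says main memory free is “free” + “inactive”, in units of pages. Below is a rough sketch of that arithmetic in Python, assuming the one-shot vm_stat output format (a page-size header followed by “Pages ...:” lines). The sample text and its numbers are made up for illustration, and the function name free_memory_bytes is hypothetical.

```python
import re

# Hypothetical sample in the style of one-shot vm_stat output; values are invented.
SAMPLE_VM_STAT = """\
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                          100000.
Pages active:                        300000.
Pages inactive:                       50000.
Pages wired down:                    150000.
"""


def free_memory_bytes(vm_stat_text):
    """Return ("free" + "inactive") pages converted to bytes."""
    # The page size is reported in the header line.
    page_size = int(re.search(r"page size of (\d+) bytes", vm_stat_text).group(1))
    # Collect every "Pages <name>: <count>." line into a dict.
    pages = {}
    for name, count in re.findall(
        r"^Pages ([\w ]+):\s+(\d+)\.", vm_stat_text, re.MULTILINE
    ):
        pages[name] = int(count)
    return (pages["free"] + pages["inactive"]) * page_size


if __name__ == "__main__":
    free_bytes = free_memory_bytes(SAMPLE_VM_STAT)
    print(f"main memory free: {free_bytes / 2**20:.0f} MiB")
```

To use it on a live system, the text could be captured from vm_stat (for example via subprocess) instead of the sample string. The interval form, vm_stat 1, prints columns and would need a different parser.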

Physical Resources, Advanced

component | type | metric
GPU | utilization | directly: DTrace [7]; atMonitor, “gpu”; indirect: Temperature Monitor; atMonitor, “gput”
GPU | saturation | DTrace [7]; Instruments → OpenGL Driver, “Client GLWait Time” (maybe)
GPU | errors | DTrace [7]
Storage controller | utilization | iostat 1, compare to known IOPS/tput limits per-card
Storage controller | saturation | DTrace and look for kernel queueing
Storage controller | errors | DTrace the driver
Network controller | utilization | system-wide: netstat -i 1, assume one busy controller and examine input/output “bytes” / known max (note: includes localhost traffic)
Network controller | saturation | see network interface saturation
Network controller | errors | see network interface errors
CPU interconnect | utilization | for multi-processor systems, try Instruments → Counters, and relevant PMCs for CPU interconnect port I/O, and measure throughput / max
CPU interconnect | saturation | Instruments → Counters, and relevant PMCs for stall cycles
CPU interconnect | errors | Instruments → Counters, and relevant PMCs for whatever is available
Memory interconnect | utilization | Instruments → Counters, and relevant PMCs for memory bus throughput / max, or, measure CPI and treat, say, 5+ as high utilization; Shark had “Processor bandwidth analysis” as a feature, which either was or included memory bus throughput, but I never used it
Memory interconnect | saturation | Instruments → Counters, and relevant PMCs for stall cycles
Memory interconnect | errors | Instruments → Counters, and relevant PMCs for whatever is available
I/O interconnect | utilization | Instruments → Counters, and relevant PMCs for tput / max if available; inference via known tput from iostat/…
I/O interconnect | saturation | Instruments → Counters, and relevant PMCs for stall cycles
I/O interconnect | errors | Instruments → Counters, and relevant PMCs for whatever is available
  • [7] I haven’t found a shipped tool to provide GPU statistics easily. I’d like a gpustat that behaved like mpstat, with at least the columns: utilization, saturation, errors. Until there is such a tool, you could trace GPU activity (at least the scheduling of activity) using DTrace on the graphics drivers. It won’t be easy. I imagine Instruments will at some point add a GPU instrument set (other than the OpenGL instruments), otherwise, 3rd party tools can be used, like atMonitor.
  • CPI == Cycles Per Instruction (others use IPC == Instructions Per Cycle).
  • I/O interconnect: this includes the CPU to I/O controller busses, the I/O controller(s), and device busses (eg, PCIe).
  • Using PMCs is typically a lot of work. This involves researching the processor manuals to see what counters are available and what they mean, and then collecting and interpreting them. I’ve used them on other OSes, but haven’t used them all under Instruments → Counters, so I don’t know if there’s a hitch with anything there. Good luck.
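
The memory interconnect row suggests measuring CPI and treating, say, 5+ as high utilization. A minimal sketch of that calculation follows, assuming you have already collected cycle and instruction counts from PMCs. The function names and the sample counts are hypothetical; the threshold of 5 comes from the table above.

```python
def cycles_per_instruction(cycles, instructions):
    """CPI == cycles / instructions (the inverse of IPC)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def memory_bus_looks_busy(cycles, instructions, threshold=5.0):
    """Apply the rough rule from the table: a CPI of 5+ suggests high utilization."""
    return cycles_per_instruction(cycles, instructions) >= threshold


if __name__ == "__main__":
    # Invented counts for illustration only.
    cycles, instructions = 6_000_000, 1_000_000
    cpi = cycles_per_instruction(cycles, instructions)
    print(f"CPI: {cpi:.1f}, high utilization: {memory_bus_looks_busy(cycles, instructions)}")
```

The threshold is a rule of thumb, so it is left as a parameter to be tuned per processor.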

Software Resources

component | type | metric
Kernel mutex | utilization | DTrace and lockstat provider for held times
Kernel mutex | saturation | DTrace and lockstat provider for contention times [8]
Kernel mutex | errors | DTrace and fbt provider for return probes and error status
User mutex | utilization | plockstat -H (held time); DTrace plockstat provider
User mutex | saturation | plockstat -C (contention); DTrace plockstat provider
User mutex | errors | DTrace plockstat and pid providers, for EDEADLK, EINVAL, … see pthread_mutex_lock(3C)
Process capacity | utilization | current/max using: ps -e | wc -l / sysctl kern.maxproc; top, “Processes:” also shows current
Process capacity | saturation | not sure this makes sense
Process capacity | errors | “can’t fork()” messages
File descriptors | utilization | system-wide: sysctl kern.num_files / sysctl kern.maxfiles; per-process: can figure out using lsof and ulimit -n
File descriptors | saturation | I don’t think this one makes sense, as if it can’t allocate or expand the array, it errors; see fdalloc()
File descriptors | errors | dtruss or custom DTrace to look for errno == EMFILE on syscalls returning fds (eg, open(), accept(), …)
  • [8] eg, showing adaptive lock block time totals (in nanoseconds) by calling function name: dtrace -n 'lockstat:::adaptive-block { @[caller] = sum(arg1); } END { printa("%40a%@16d ns\n", @); }'
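
The process capacity and file descriptor rows are both current/max ratios. Here is a small sketch of that arithmetic, assuming sysctl prints values as “name: value” lines. The helper names and the sample numbers are hypothetical, not output from a real system.

```python
def parse_sysctl_value(line):
    """Parse a 'name: value' line, as printed by sysctl, into an integer."""
    _name, _sep, value = line.partition(":")
    return int(value.strip())


def utilization_percent(current, maximum):
    """Return current/max as a percentage."""
    return 100.0 * current / maximum


if __name__ == "__main__":
    # Invented sample values for illustration.
    ps_line_count = 267          # from: ps -e | wc -l (includes the header line)
    current_procs = ps_line_count - 1
    max_procs = parse_sysctl_value("kern.maxproc: 1064")
    print(f"process capacity: {utilization_percent(current_procs, max_procs):.1f}%")

    open_files = parse_sysctl_value("kern.num_files: 3072")
    max_files = parse_sysctl_value("kern.maxfiles: 12288")
    print(f"file descriptors: {utilization_percent(open_files, max_files):.1f}%")
```

Note that ps prints a header line, so the sketch subtracts one from the wc -l count before dividing.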

Other Tools

I didn’t include fs_usage, sc_usage, sample, spindump, heap, vmmap, malloc_history, leaks, and other useful Mac OS X performance tools, as here I’m beginning with questions (the methodology) and only including tools that answer them. This is instead of the other way around: listing all the tools and trying to find a use for them. Those other tools are useful for other methodologies, which can be used after this one.

What’s Next

See the USE Method for the follow-up methodologies after identifying a possible bottleneck. If you complete this checklist but still have a performance issue, move on to other methodologies: drill-down analysis and latency analysis.

For more performance analysis, also see my earlier post on Top 10 DTrace Scripts for Mac OS X.

Acknowledgements

Resources used:

Filling in this checklist has required a lot of research, testing, and experimentation. Please reference back to this post if it helps you develop related material.

It’s quite possible I’ve missed something or included the wrong metric somewhere (sorry); I’ll update the post to fix these up as they are understood, and note at the top the update date.

Also see my USE method performance checklists for Solaris, SmartOS, Linux, and FreeBSD.
