“If quantitative metrics are inescapable, we suggest focusing on Service Level Objectives (SLOs) and cost of coordination data.”
Faced with exactly the same problem, I came up with the same solution: instead of focusing on the number of incidents (which is even more problematic than MTTR), we should focus on SLO transgressions. SLOs are set by the org itself and can be adjusted, which makes them an ongoing conversation about reliability that the org can, and needs to, have with itself.
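To make "SLO transgressions" concrete, here is a minimal sketch that counts how many measurement windows fell below a success-ratio target. The `Window` type, the weekly windows, the event counts and the 99.9% target are all hypothetical values chosen for illustration; a real setup would pull these numbers from whatever monitoring system the org already uses.

```python
from dataclasses import dataclass


@dataclass
class Window:
    # Hypothetical measurement window: good vs. total events (e.g. requests).
    label: str
    good_events: int
    total_events: int


def slo_transgressions(windows, target):
    """Return the labels of windows whose success ratio fell below the SLO target."""
    breached = []
    for w in windows:
        if w.total_events == 0:
            continue  # no traffic in this window, nothing to judge
        ratio = w.good_events / w.total_events
        if ratio < target:
            breached.append(w.label)
    return breached


# Illustrative data only.
windows = [
    Window("week 1", 999_500, 1_000_000),  # 99.95%
    Window("week 2", 998_000, 1_000_000),  # 99.80%
    Window("week 3", 999_100, 1_000_000),  # 99.91%
]
print(slo_transgressions(windows, target=0.999))  # ['week 2']
```

Because the target is a parameter, adjusting the SLO (the "ongoing conversation") changes the count without changing how it is measured.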
Secondly, if you need to report on incidents to the business, I think you can't go wrong with attaching a justified, order-of-magnitude cost to each incident ("this incident cost us thousands, tens of thousands, millions, etc.") together with how you calculated that number. It won't be very accurate, but again, the point is not accuracy; it is to build numerical muscle throughout the organization.
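A rough sketch of what such a justified estimate could look like follows. The cost model (lost revenue per minute of downtime plus engineer time at an hourly rate) and every number in it are assumptions for illustration, not a recommended formula; the point is that the inputs are written down, so the calculation can be questioned, and that the result is reported only as an order of magnitude.

```python
# Hypothetical labels for powers of ten; anything outside falls back to 10^n.
LABELS = {
    3: "thousands",
    4: "tens of thousands",
    5: "hundreds of thousands",
    6: "millions",
    7: "tens of millions",
}


def estimate_incident_cost(minutes_down, revenue_per_minute, engineer_hours, hourly_rate):
    """Assumed cost model: lost revenue during the outage plus responder time."""
    return minutes_down * revenue_per_minute + engineer_hours * hourly_rate


def cost_bucket(estimated_cost):
    """Report only the order of magnitude of a cost estimate (must be >= 1)."""
    if estimated_cost < 1:
        raise ValueError("cost estimate must be at least 1")
    # Count integer digits instead of using log10, to avoid float rounding at boundaries.
    exponent = len(str(int(estimated_cost))) - 1
    return LABELS.get(exponent, f"10^{exponent}")


# Illustrative incident: 90 minutes down, 200/minute in lost revenue,
# 12 engineer-hours at 100/hour -> 18,000 + 1,200 = 19,200.
cost = estimate_incident_cost(90, 200, 12, 100)
print(cost, cost_bucket(cost))  # 19200 tens of thousands
```

Reporting "tens of thousands" rather than 19,200 keeps the conversation on the reasoning behind the number instead of on false precision.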