I find it kind of appalling — and frankly a little sad — that the decision to write this article wasn’t a tough one. It should be self-evident that business metrics such as conversion rate and average order value are not reliable measures of the user’s experience of a digital product, yet I encounter this claim over and over again. Self-proclaimed experts say it at meetups, round tables, or summits; stakeholders say it in business meetings; and “UX companies” say it on landing pages for some product they want to sell. Once, I asked an applicant for a UX position: “What is user experience?” They replied: “When a user can complete a funnel and convert successfully.” Needless to say, I was screaming internally. This faulty understanding is partly fueled by a growing “conversion rate optimization industry” and the spreading misconception that optimizing conversion rates and optimizing user experience are more or less the same thing. I’ve written about this in more detail in my previous article “Growth Marketing Considered Harmful”.
Now, where to start? First of all, it’s important to note that due to my professional background, I’m talking about e-commerce here. Second, for the sake of clarity, when I talk about user experience (UX), I mean the UX of a digital product (e.g., a webshop or app) and refer to everything that can be influenced by a digital design team. That is, the experience that stems from interacting with the digital product, independent of factors such as merchandise, pricing, shipping, customer service, etc., which is often referred to as customer experience (CX) for a clear distinction. Hence, we’re looking at UX, not CX. Third, when I say UX metric, I rely on a definition by Nielsen Norman Group: “A UX metric is a piece of numerical data that tells us about some aspect of the user experience of a product or service.” 
Without a doubt, conversion rate (CR) and average order value (AOV) are important key performance indicators (KPIs) for every e-commerce business. Hell, they’re probably the most looked-at numbers in almost every single company — and for very good reasons. If too many users put nothing in their basket, or if they do but don’t complete the transaction, you won’t make money, you can’t pay your employees, and ultimately, your business will fail. So, yeah, let’s monitor CR and AOV very closely. But here’s the thing — what I just said already highlights the essence of the matter. They are business KPIs, and while good UX can have a great impact on revenue, CR and AOV can’t reliably reflect the UX of a digital product, for three reasons.
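For clarity, here’s how these two KPIs are typically computed. A minimal sketch in Python, with made-up numbers:

```python
def conversion_rate(orders: int, sessions: int) -> float:
    """Share of sessions that end in a completed order."""
    return orders / sessions

def average_order_value(revenue: float, orders: int) -> float:
    """Revenue per completed order."""
    return revenue / orders

# Illustrative figures: 1,200 orders from 48,000 sessions, 96,000 in revenue.
cr = conversion_rate(1200, 48000)        # 0.025, i.e. 2.5%
aov = average_order_value(96000, 1200)   # 80.0
```

Both are ratios of aggregate business outcomes, which is precisely why they say nothing about what any individual user experienced along the way.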
Primo, based on my extensive experience in e-commerce, I can say that (potential) customers care much, much more about whether they need or want a particular product, and about value for money, than about the “surrounding” UX. Exclusive merchandise, a voucher campaign, or better payment and delivery options can, in most cases, move a conversion rate much more than a digital design team ever could. Consider A/B tests, which usually quantify the effect of a change to the user interface. As a rule of thumb, most fail, and if they don’t, they improve key metrics (like CR and AOV) by 0.1‒1%. There are exceptions to this rule, but they are extremely rare. McDowell et al. investigated associations between website features and CR and found statistically significant but, by common standards, weak correlations, since none exceeded .29. That being said, UX is a deciding factor when you and your competitors offer the same products at similar prices, or when there are massive usability or information architecture barriers that prevent users from finding products or moving through your site (and especially your checkout). A digital design team can completely break the conversion rate if they don’t get their hygiene factors right (cf. Maslow’s hierarchy of needs). So, please don’t get me wrong here. Good design and UX are absolutely business-critical nowadays, but the point is: you can’t really tell that from CR and AOV, at least in the case of relatively mature online shops.
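To get a feeling for why uplifts in the 0.1‒1% range are so hard to pin down, here’s a quick back-of-the-envelope significance check using a standard two-proportion z-test. The traffic numbers are invented for illustration:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical test: baseline CR 2.5%, variant 2.525% (a 1% relative lift),
# 100,000 sessions per arm.
z = two_proportion_z(2500, 100_000, 2525, 100_000)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
# Even with 100,000 sessions per arm, this lift is nowhere near significant.
```

In other words, detecting the kind of effect a typical UI change produces requires enormous sample sizes, which is another reason these metrics make for a blunt instrument when judging design work.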
Secondo, there are plenty of examples in which decisions that lead to a worse UX at the same time boost CR and/or AOV. Take, for instance, the case of the “buy now” button that got relabelled “reserve and continue”, thus suggesting to users that they weren’t making a definite purchase decision yet; or the deceptive countdown timer that is supposed to invoke a scarcity effect where there’s de facto no scarcity. There’s a seemingly endless list of dark patterns employed in today’s e-commerce industry, and it doesn’t even have to be on purpose. When considering CR and AOV as “UX metrics” and looking neither left nor right, it’s extremely easy to accidentally introduce dark patterns that boost your business KPIs while at the same time worsening the user’s experience. A/B testing, which is often solely focused on those KPIs, plays a crucial part in the proliferation of such dark patterns [1,8].
Terzo, it can also be the other way ’round: improving UX while at the same time hurting CR and/or AOV. This can happen in particular when users are provided with more information about a product, or when there’s more transparent communication about issues with fulfillment. For instance, a certain product detail page might display a piece of information about compatibility that prevents a user from buying, while without the info they would have completed the transaction. In the latter case, we must of course assume that they would have returned the product, but this highlights an additional problem with CR and AOV — they’re short-term metrics. Therefore, it’s essential to monitor return rates as well.
It shall not go unmentioned that in many cases, UX improvements and optimizing CR/AOV do go hand in hand. However, that still doesn’t make them UX metrics. To quickly summarize at this point: CR and AOV are short-term business metrics on which changes in UX usually have very little impact, if any. In some cases, they can even be opposed to UX improvements [1,8,9]. The positive impact of good UX often becomes visible only in the long run and rather affects goals such as brand reputation.
As kind of a very long side note, while we’re at it, let’s take the opportunity to have a look at some other measures besides CR and AOV that are also commonly used as UX metrics. I’ll limit myself to four particularly popular ones here: task success, time on task, Net Promoter Score® (NPS), and the System Usability Scale (SUS).
Task success is relatively self-explanatory. It’s the user’s ability to successfully complete a task that’s important to them (and the company), such as finding a product they like, adding a product to the basket, choosing a payment method, or completing a checkout. Now, I think we can all agree that not being able to complete a task you want to complete is really frustrating and therefore makes for a suboptimal UX. Hence, according to the definition given above — yes — task success, or the task success rate, is “a piece of numerical data that tells us about some aspect of the user experience”, but that aspect is very limited. For one, task success is more of a usability than a UX measure. If users can’t complete an essential task, that tells you there’s probably a huge usability bug in your digital product, but it provides little insight into anything else UX-related. If task success is (close to) 100% and everything’s seemingly fine, that doesn’t say a lot about the experience of getting there. I immediately think of the countless times I had to fill in various insurance forms online — all of them successfully. Task success is a hygiene factor so basic, and UX a concept so complex, with so many variables, that measuring the latter shouldn’t rely mostly on the former. For another, who says there are no cases in which not completing a task can be a good experience? What about the user who fails on the first try, then seeks help in the support section, and succeeds only on the second attempt — but finding and getting exactly the right help at the right time was such a delightful and seamless experience that they simply don’t mind having had to try twice? It’s not for nothing that Nielsen and Budiu describe task success as the “simplest usability metric” and the “UX bottom line”.
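For what it’s worth, the metric itself is trivial to compute, which rather underlines that it’s a bottom line, not the full picture. A sketch with invented session data; the adjusted-Wald interval is the one commonly recommended for the small samples typical of usability tests:

```python
from math import sqrt
from statistics import NormalDist

def task_success_rate(attempts: list[bool]) -> float:
    """Share of attempts (True = task completed) that succeeded."""
    return sum(attempts) / len(attempts)

def adjusted_wald_ci(successes: int, n: int, confidence: float = 0.95):
    """Adjusted-Wald confidence interval for a binomial success rate."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 8 of 10 hypothetical participants completed the checkout task.
rate = task_success_rate([True] * 8 + [False] * 2)  # 0.8
low, high = adjusted_wald_ci(8, 10)  # roughly (0.48, 0.95)
```

Note how wide the interval is at n = 10: with samples that small, a single success or failure swings the observed rate by ten percentage points, which is yet another reason not to over-interpret this number.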
Time on task I find even trickier than task success. Technically, it, too, fits the definition of a UX metric, but it has more or less the same problems described above — and it’s even more difficult to interpret. Is lower better? Is higher better? The unsatisfying answer is that it depends, not only on the specific task at hand, but also on the type of user you’re dealing with (and — as a completely external factor — how much of a hurry they are in). Everyone who’s had the pleasure of working with proper e-commerce personas will agree that there’s almost always a transactional persona and one that actually likes passing time and browsing around (or at least doesn’t mind it). This, in turn, means that unless you have proper personas in place and can appropriately screen for the corresponding users, your measured times on task will be really difficult to interpret. And even then, it can be complicated. Take the case of an e-commerce checkout, for instance — a task for which, I think you’ll agree, “the quicker the better” should hold universally. Still, I remember a study my colleague Johanna Jagow conducted that tested a single-page vs. a multi-page checkout. Even though measured time on task was in fact higher for the single-page checkout, reported time on task (how long users felt they needed) was lower, and the overall experience was rated better. That being said, couldn’t it also be the case that your website is simply so enjoyable to use that (some) users actually want to spend more time on it?
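If you do report time on task, one practical detail: raw task times are usually right-skewed, so the geometric mean is often a more robust summary than the arithmetic mean. A sketch with invented timings:

```python
from math import exp, log

def geometric_mean(times: list[float]) -> float:
    """Geometric mean: far less distorted by a few very slow sessions."""
    return exp(sum(log(t) for t in times) / len(times))

times_s = [42, 51, 38, 47, 210]  # seconds; one participant got badly stuck
arithmetic = sum(times_s) / len(times_s)  # 77.6 s, dragged up by the outlier
geometric = geometric_mean(times_s)       # about 60.4 s
```

Of course, no summary statistic resolves the interpretation problem above; it only keeps a single stuck participant from dominating whatever number you end up debating.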
The Net Promoter Score® (NPS) is a business KPI that measures “customer loyalty, satisfaction, and enthusiasm with a company”. From this definition, it should be clear that NPS doesn’t count as a UX metric, but there’s an interesting detail: there exists a certain connection between UX and NPS. Bradner and Sauro report that “user experience variables such as ease-of-use, contribute between 32% and 40% to users’ likelihood to recommend a software product”. Still, NPS was not designed to directly measure (aspects of) UX; it can merely be influenced by changes in UX. Since at least 60% of what contributes to NPS are non-UX variables, an increase or decrease does not necessarily mean a corresponding change in UX. In this respect, NPS is similar to CR and AOV, with one important difference: it’s at least better at reflecting long-term effects and therefore, in comparison, better suited as a proxy measure for potential UX changes, since those are often not sufficiently captured by short-term metrics.
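For completeness, this is how the score itself is computed from the 0‒10 “how likely are you to recommend us” answers (standard NPS scoring; the responses below are made up):

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

responses = [10, 9, 9, 8, 7, 7, 6, 5, 3, 10]
score = nps(responses)  # 4 promoters, 3 detractors -> +10
```

Note that the 7s and 8s (“passives”) vanish from the score entirely, which is one more way the metric compresses away detail about what people actually experienced.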
Last but not least, let’s have a look at the System Usability Scale (SUS). The question of whether this is a UX metric looks relatively straightforward at first — after all, it’s not called the System User Experience Scale — but it comes down to something more fundamental: To what extent is usability an aspect of user experience? I’ve had discussions with many people throughout my career who thought usability and UX were simply synonymous. I don’t think I need to elaborate on why that’s obviously wrong. Others hold the opinion that usability is a “subset” of UX, which is probably closer to reality — but this reality turns out to be pretty complicated. Law et al. describe that usability “focuses primarily on user cognition and user performance in human-technology interactions” while UX “highlights non-utilitarian aspects of such interactions, shifting the focus to user affect, sensation, and the meaning as well as value of such interactions in everyday life”; and that “UX is seen as something desirable, though what exactly something means remains open and debatable”. In line with this, Hassenzahl found that UX has two dimensions, a pragmatic and a hedonic one, which people usually consider unrelated. The former supports “do-goals” (like functionally completing a task) and the latter “be-goals” (like “being special”). Usability is located in the pragmatic dimension, which means that by measuring usability with the SUS alone, you capture less than “half” of what UX comprises. On top of that, there are cases in which good usability is not a prerequisite for good UX. To be fair, that probably doesn’t apply in e-commerce. To get to the point: SUS does fulfill the definition of a UX metric, since it measures an aspect of UX. Yet, if you’re asking your users to fill out a questionnaire anyway, why not just use one that has been designed to capture UX as a whole, such as AttrakDiff or the UEQ?
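As a reference for how the SUS arrives at its number (this is the standard scoring; the ten answers below are invented): odd-numbered items contribute their answer minus 1, even-numbered items 5 minus the answer, and the sum is multiplied by 2.5 to yield a 0‒100 score.

```python
def sus_score(answers: list[int]) -> float:
    """Score one SUS questionnaire (10 answers on a 1-5 agreement scale)."""
    assert len(answers) == 10 and all(1 <= a <= 5 for a in answers)
    contributions = [
        a - 1 if i % 2 == 0 else 5 - a  # items 1,3,5,7,9 are positively worded
        for i, a in enumerate(answers)
    ]
    return sum(contributions) * 2.5

score = sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 3])  # -> 77.5
```

All ten items probe the pragmatic dimension (ease, complexity, confidence, learnability), so the single number this produces says nothing about the hedonic side discussed above.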
To be honest, this article became much longer than expected. My initial idea was to write a couple of sentences about why CR and AOV are not UX metrics, but somehow this has turned into a fully-fledged review of six commonly used measures and even partly gets to the heart of what UX really is. Originally, my plan even comprised defining heuristics for easily identifying valid measures that actually capture UX. However, I’ve decided to dedicate a different article to that.
To conclude, there are various metrics in use in industry that supposedly measure UX. Two particularly popular ones — especially, but not exclusively, in A/B testing — are CR and AOV. Those, however, cannot reliably reflect UX since they — just like NPS — do not directly measure aspects of UX but rather can, to a certain extent, be influenced by changes in it. Measures that do fulfill the considered definition of a UX metric are task success, time on task, and SUS, which are, however, prone to misinterpretation and very limited in the aspects of UX they capture. None of the six discussed metrics can reliably measure UX to its full extent. If one had to choose strictly from just those, a combination of task success (as the bottom line), SUS, and NPS might be the best choice. However, one should rather rely on an instrument specifically developed to measure UX in its entirety. More on that in the next article.