User Experience Guidelines Metrics

From IDESG Wiki

The contents of this page are meant to provide both practical examples of usability and guidance that can be adapted by participants of the ecosystem and systems administrators to fit their specific circumstances.

UX Metrics should enable measurement of the evolving baseline for participation in the Identity Ecosystem.

The contents of this page are based upon evolving requirements for IDESG participants.

What is the baseline?

Improving the security, privacy, usability, and interoperability of everyday online transactions

What benefits could the everyday consumer see if this baseline was established?

e.g., reduced account compromise through increased use of multifactor authentication; greater user control through notice and consent requirements; etc.


The following are some methodologies for measuring user experience to determine if the baseline requirements are met. Note that the first metrics are for overall usability and will indicate compliance with the terms of the IDESG requirements and the Framework. Obtaining quantitative metrics via surveys will reveal the quality of the end user's experience. Qualitative research will provide an understanding of why users are making choices and what they understand about a system and the choices presented. Heuristic evaluations will provide a better understanding of system errors and interfaces.

How are UX metrics obtained?

Metrics are quantitative measures that can be tracked over time. They come from questions presented to users and from the evaluation of direct observations of users on particular sites. The Wikipedia entry on User Experience Evaluation defines these terms: "User experience (UX) evaluation or User experience assessment (UXA) -- which refer to a collection of methods, skills and tools utilized to uncover how a person perceives a system (product, service, non-commercial item, or a combination of them) before, during and after interacting with it. It is non-trivial to assess user experience since user experience is subjective, context-dependent and dynamic over time."

Since the evaluation of the user is dependent on the specific UX presented to the user on a specific site, the broad measure of the success of the whole ecosystem will only be measurable when multiple sites supporting the IDESG ecosystem are widely available and questions about the overall experience are possible. In the meantime, specific implementations of web sites supporting the IDESG are encouraged. The metrics provided here should be able to act as a base set that allows comparison between different researchers' results. A common base set of metrics would allow the UXC to use independently generated research reports in the compilation of an ecosystem-wide report.

All measurements below (except the verbatim data) should have qualitative and quantitative ways to measure success from users. Follow-up measurements should be included over time.

Guidelines of Behavior for Designing Surveys and Evaluating Results

Usability researchers should adhere to Institutional Review Board (IRB) requirements for the treatment of users and user-collected information. Some of the guidance below is based upon the IRB language, which may be downloaded here: Institutional Review Board Guidebook. "Chapter 3, Section A: Risk/Benefit Analysis." pp. 1-10, Washington D.C.: U.S. Department of Health and Human Services, The Office of Human Research Protection, 1993.

Survey guidance

1. Consent to participate must be given voluntarily, and individuals must have the option to decline participation in the survey.
2. Individuals should be able to exit the survey at any time.
3. Survey questions should be comprehensible by a wide audience.
4. Surveys should be concise and easy to answer.
5. Surveys should be open to all users to ensure the widest possible range of individual respondents.
6. Surveys should include an introductory explanation of how the user's personal information and responses will be treated, what level of confidentiality they can expect, and links to the privacy policy, trust frameworks, and other policies that govern the collection of personal information.
7. Any results of surveys should be aggregated and depersonalized to protect the participants' privacy.
8. Results of the survey could be made available as long as the privacy of participants can be ensured by the system.
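
As an illustration of items 7 and 8, the following minimal sketch shows one way survey results might be aggregated so that only answer counts, and no identifying fields, leave the system. The record layout (`user_id`, `email`, `answers`) and the function name `aggregate_answers` are hypothetical and are not part of any IDESG specification.

```python
from collections import Counter


def aggregate_answers(responses, question):
    """Count the answers given to one question, discarding who gave them."""
    return Counter(
        r["answers"][question] for r in responses if question in r["answers"]
    )


# Hypothetical raw records; the identifying fields are never copied into the result.
raw_responses = [
    {"user_id": "u-001", "email": "a@example.org", "answers": {"q1": 4, "q2": 2}},
    {"user_id": "u-002", "email": "b@example.org", "answers": {"q1": 5}},
    {"user_id": "u-003", "email": "c@example.org", "answers": {"q1": 4, "q2": 1}},
]

summary = aggregate_answers(raw_responses, "q1")
print(dict(summary))
```

Counting alone may not be sufficient in every case: very small groups or verbatim comments can still identify a participant, so a real system would likely need additional review before results are published.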

Possible Quantitative Usability Methods and Suggestions for Evaluation of Results

Measurements (Quantitative)

Quantitative questions below are meant to show possible queries for evaluating the success of a system, followed by possible success metrics. NOTE: these queries may include reviews of system logs as well as direct user surveys. It is important to understand that this kind of review of what happened does not necessarily lead to understanding why a user chose to do one thing versus another. This sort of inquiry only allows for understanding what and possibly how things are happening in a system.

  1. Can the user accomplish the task they set out to accomplish? (A goal might be 90%, a minimum acceptable might be 60%)
  2. What is the System Usability Scale (SUS) score? (John Brooke's System Usability Scale)
  3. Is the Trustmark discoverable and self-describing? (90%, 70%)
  4. Does the user feel safer as a result of the appearance of the Trustmark? (99%, 80% of those answering in the affirmative above.)
  5. Does the user feel that the site is safe overall? (The metric is a comparison of the positives to the negatives.)
  6. Does the user understand the necessity for a strong identity for their providers?
  7. Does the user know whether the identity of the provider is strongly bound to a real-world entity?
  8. Collected verbatim responses used for site improvement.
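
The goal and minimum percentages above can be checked mechanically once outcomes are recorded. The sketch below is one possible way to do that, assuming each user's outcome is stored as a simple true/false value. The names `Threshold`, `success_rate`, and `classify` are illustrative, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    goal: float     # target rate, e.g. 0.90
    minimum: float  # lowest acceptable rate, e.g. 0.60


def success_rate(outcomes):
    """Fraction of truthy values in a list of per-user outcomes."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return sum(1 for o in outcomes if o) / len(outcomes)


def classify(rate, threshold):
    """Compare a measured rate with the goal and minimum for a question."""
    if rate >= threshold.goal:
        return "meets goal"
    if rate >= threshold.minimum:
        return "acceptable"
    return "below minimum"


# Invented sample: 8 of 10 users completed the task (question 1 above).
task_completion = [True] * 8 + [False] * 2
rate = success_rate(task_completion)
print(rate, classify(rate, Threshold(goal=0.90, minimum=0.60)))
```

Tracking the same rate across successive releases gives the follow-up measurement over time that this page asks for.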

The System Usability Scale (Survey)

The System Usability Scale (SUS) is a suggested 10-item questionnaire in which the typical 5-response option might be used to ask for feedback. For example, users could be asked to provide answers on a scale from 1 to 5, where 1 represents "success" or a "good" experience and 5 represents "failure" or a "poor" experience, with specific descriptions of these success and failure marks tuned to the system and the questions asked. Where the questions and problems are "supplied" to the user, we recommend that each question provide a large "share more information" box. This way, if users see a different set of problems than those being asked about, they will have an opportunity to share their perspective.

If that recommendation is followed, we recommend that open-ended answers be codified and categorized, and that surveys be adjusted to incorporate the user perspective as much as possible. Taking this idea one step further, "surveys" could be developed that ask less specific questions and instead allow users to submit problems without prompting, so that feedback relates directly to the problems users perceive rather than to those on which the company wants to focus. Additionally, providers could support a continual feedback system that allows users to submit whatever they want, whenever they want, and then categorize the responses and submit the user feedback to IDESG.

However, IDESG could also establish a constant set of survey questions for comparison of success metrics across vendors. The survey below could be used to create that common survey system for comparison.

Sample questions for surveys

  1. I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.
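
The ten statements above alternate between positive and negative wording, as in John Brooke's SUS. A minimal sketch of Brooke's conventional scoring follows. It assumes responses are recorded on an agreement scale from 1 (strongly disagree) to 5 (strongly agree), which differs from the success/failure example scale mentioned earlier on this page. The function name `sus_score` is illustrative.

```python
def sus_score(responses):
    """Return a 0-100 SUS score from ten responses, each an integer 1-5.

    Assumes 1 = strongly disagree and 5 = strongly agree.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded: contribution is r - 1.
        # Even-numbered items are negatively worded: contribution is 5 - r.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    # The summed contributions (0-40) are scaled to 0-100.
    return total * 2.5


# Invented example response set for one participant.
example = [4, 2, 5, 1, 4, 2, 5, 1, 4, 2]
print(sus_score(example))  # 85.0
```

The resulting score is not a percentage. It is most useful for comparing the same system over time or for comparing systems that used the same question set.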

Usability Heuristics

According to the Nielsen Norman Group, "Heuristic evaluation (Nielsen and Molich, 1990; Nielsen 1994) is a usability engineering method for finding the usability problems in a user interface design so that they can be attended to as part of an iterative design process. Heuristic evaluation involves having a small set of evaluators examine the interface and judge its compliance with recognized usability principles (the 'heuristics')." Nielsen states that "They are called 'heuristics' because they are broad rules of thumb and not specific usability guidelines."

In the context of website and application UX, this would mean the user interface of the identity system, but it might also include any cross-channel interaction and "experience" such as brand, trustmark, paper correspondence (such as the acknowledgements we sometimes get in the mail about online transactions), advertisements/solicitations, etc., and how well the interface integrates with provider and other external/3rd party systems.

Following are Jakob Nielsen's 10 general principles for interaction design:

  1. Visibility of system status: The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
  2. Match between system and the real world: The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
  3. User control and freedom: Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
  4. Consistency and standards: Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
  5. Error prevention: Even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.
  6. Recognition rather than recall: Minimize the user's memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
  7. Flexibility and efficiency of use: Accelerators -- unseen by the novice user -- may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
  8. Aesthetic and minimalist design: Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
  9. Help users recognize, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
  10. Help and documentation: Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.

Severity Ratings

Severity ratings can be used to determine where to allocate resources to fix the most serious problems discovered in the heuristic analysis and can also help estimate the need for additional usability efforts. Ratings that indicate a disastrous set of usability problems may indicate that the product should not be released; however, a product may be released if its problems are judged to be primarily cosmetic.

The severity of a usability problem is a combination of three factors:

  • The frequency with which the problem occurs: Is it common or rare?
  • The impact of the problem if it occurs: Will it be easy or difficult for the users to overcome?
  • The persistence of the problem: Is it a one-time problem that users can overcome once they know about it, or will users repeatedly be bothered by the problem?

A fourth consideration is the market impact of the problem: what effect does the problem have on the popularity of a product, even if it is "objectively" quite easy to overcome?

Even though severity has several components, it is common to combine all aspects of severity in a single severity rating as an overall assessment of each usability problem in order to facilitate prioritizing and decision-making.

The following 0 to 4 rating scale can be used to rate the severity of usability problems:

0 = I don't agree that this is a usability problem at all
1 = Cosmetic problem only: need not be fixed unless extra time is available on project
2 = Minor usability problem: fixing this should be given low priority
3 = Major usability problem: important to fix, so should be given high priority
4 = Usability catastrophe: imperative to fix this before product can be released

NOTE: some heuristic evaluation systems use different rating scales. We recommend using a consistent scale for all reviews and systems.
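
When several evaluators rate the same problems on the 0 to 4 scale, their ratings need to be combined before problems can be prioritized. The sketch below assumes the mean of the evaluators' ratings is used as the combined severity, which is a common practice but only one possible choice. The problem descriptions, the `prioritize` function, and the short labels are all illustrative.

```python
from statistics import mean

# Short labels for the 0-4 scale described above.
SEVERITY_LABELS = {
    0: "not a usability problem",
    1: "cosmetic",
    2: "minor",
    3: "major",
    4: "catastrophe",
}


def prioritize(ratings_by_problem):
    """Return (problem, mean rating, label) tuples, most severe first."""
    ranked = []
    for problem, ratings in ratings_by_problem.items():
        if not ratings or any(r not in SEVERITY_LABELS for r in ratings):
            raise ValueError(f"invalid ratings for {problem!r}")
        avg = mean(ratings)
        # Round half up to the nearest whole rating to pick a label.
        label = SEVERITY_LABELS[int(avg + 0.5)]
        ranked.append((problem, avg, label))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Invented ratings from three evaluators per problem.
ratings = {
    "Trustmark not visible on login page": [3, 3, 4],
    "Inconsistent button labels": [1, 2, 1],
    "No confirmation before deleting account": [4, 4, 3],
}

for problem, avg, label in prioritize(ratings):
    print(f"{avg:.2f}  {label:<12} {problem}")
```

Keeping the individual ratings alongside the mean makes it easier to spot problems on which evaluators disagree strongly.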

Guidelines of Behavior for Designing Qualitative Research

Qualitative Research can be conducted in person through interviews and task assessments, through journal entries, via group discussion, and through observations in personal settings. Qualitative Research can involve Ethnographic Research, which is generally longer-term research into cultural phenomena. Due to the personal nature of all these types of Qualitative Research, some guidelines are included below.

User Interviews (Qualitative Research Including Ethnographic Studies)

Qualitative Research generally involves direct in-person interviews with human subjects, and as such requires care with respect to personal information and user stories shared. Some organizations use formal academic IRB standards and certification, and generally erring on the side of personal information privacy and confidentiality is preferred when conducting direct interviews with subjects or collecting other personal information.

Qualitative Research is conducted in order to gather in-depth information about the reasons that subjects choose one path or method over another, how they understand systems one way versus another, etc. It is conducted in an effort to get at the deeper issues in a system that cannot be understood just by reviewing usage logs or survey data, which rarely explain why or how people understand a system. For example, subjects often develop work-arounds for systems that don't work, and it is through watching them work that the original problem the work-around is meant to solve becomes apparent. This understanding, which often isn't quite clear to the subjects themselves, is often discovered through interviews.

Sample User Research Study

A qualitative study of three users could be designed as follows:

  • Identify three users who are target testing subjects, using criteria developed to meet the goals for understanding the system.
  • Conduct an initial survey of 10 questions, either newly developed or taken from standard quantitative work.
  • Design three tasks around use of the system, with questions about why users choose elements of the system and how they understand the specific tasks.
  • Invite each user in to meet with a tester, along with a video system or a note taker.
  • Write up and review an analysis of the results and, if video is taken, develop a report including video clips.

Other Resources

For guidance on how to evaluate and identify issues, please see our Identity Design Patterns.