User Experience Guidelines Metrics

From IDESG Wiki
Revision as of 21:50, 14 March 2015 by Mary Hodder (talk | contribs) (formatting fix)

Introduction

UX Metrics should enable measurement of the evolving baseline for participation in the Identity Ecosystem. NOTE: prior work on this document can be found here: User_Experience_Trust_Metrics

The requirements for this page are based on a TFTM presentation of October 2014 and are intended as input to the TFTM process. The following questions are taken from that presentation.

What is the baseline?

Improving the security, privacy, usability, and interoperability of everyday online transactions

What benefits could the everyday consumer see if this baseline was established?

e.g., reduced account compromise through increased use of multifactor authentication; greater user control through notice and consent requirements; etc.

Content

The following are some methodologies for measuring user experience to determine whether the baseline requirements are met. Note that the first metrics cover overall usability, and the later ones move into the trust measurements that will indicate compliance of a particular implementation with the terms of the IDESG or of the Framework. Obtaining metrics in quantitative form will answer questions about what is happening and sometimes how it happens. Qualitative research will help explain why users are making choices and what they understand about a system and the choices presented.

How are UX metrics obtained

Metrics are quantitative measures that can be tracked over time. They come from questions presented to users and from the evaluation of direct observations of users on particular sites. The Wikipedia entry on User Experience Evaluation [1] defines these terms: "User experience (UX) evaluation or User experience assessment (UXA) -- which refer to a collection of methods, skills and tools utilized to uncover how a person perceives a system (product, service, non-commercial item, or a combination of them) before, during and after interacting with it. It is non-trivial to assess user experience since user experience is subjective, context-dependent and dynamic over time."

Since the evaluation of the user depends on the specific UX presented to the user on a specific site, the success of the whole ecosystem will only be measurable when multiple sites supporting the IDESG ecosystem are widely available and questions about the overall experience are possible. In the meantime, specific implementations of web sites supporting the IDESG are encouraged. The metrics provided here should be able to act as a base set that would allow comparison between different researchers' results. A common base set of metrics would allow the UXC to use independently generated research reports in the compilation of an ecosystem-wide report.

All measurements below (except the verbatim data) should have qualitative and quantitative ways to measure success from users. Follow-up measurements should be included over time.

Guidelines of Behavior for Designing Surveys and Evaluating Results

Usability researchers should adhere to Institutional Review Board requirements for the treatment of users and user-collected information. Some of the guidance below is based upon the IRB language, which may be downloaded here: The Office of Human Research Protection. Institutional Review Board Guidebook. "Chapter 3, Section A: Risk/Benefit Analysis." pp. 1-10, http://www.saylor.org/site/wp-content/uploads/2011/08/PSYCH202A-3.1.4-Institutional-Review-Board.pdf.

Survey guidance

1. Consent to participate must be given voluntarily, and individuals must have the option to decline participation in the survey.
2. Individuals should be able to exit the survey at any time.
3. Survey questions should be comprehensible by a wide audience.
4. Surveys should be concise and easy to answer.
5. Surveys should be open to all users to ensure the widest possible range of individual respondents.
6. Surveys should include an introductory explanation of how the user's personal information and responses will be treated, what level of confidentiality they can expect, and links to the privacy policy, trust frameworks, and other policies that govern collection of personal information.
7. Any results of surveys should be aggregated and depersonalized to protect the participants' privacy.
8. Results of the survey could be made available as long as the privacy of participants can be ensured by the system.
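
As a rough illustration of guidance items 7 and 8, the sketch below aggregates answers to one question, discards user identifiers, and withholds any answer chosen by only a few respondents. Small-group suppression is a technique swapped in here for illustration and is not prescribed by this page; the field names, the function name `aggregate_responses`, and the threshold `min_cell_size=5` are all hypothetical. It is not a complete privacy guarantee.

```python
from collections import Counter


def aggregate_responses(records, question, min_cell_size=5):
    """Count answers to one question without carrying over any identifiers.

    Answers given by fewer than `min_cell_size` respondents are not reported
    individually; only their combined total is returned.
    The threshold is an illustrative assumption, not an IDESG requirement.
    """
    # Only the answer value is read; user identifiers are never copied.
    counts = Counter(r[question] for r in records if question in r)
    reported = {answer: n for answer, n in counts.items() if n >= min_cell_size}
    suppressed_total = sum(n for n in counts.values() if n < min_cell_size)
    return reported, suppressed_total


# Hypothetical survey records: six "yes" answers and two "no" answers.
records = [{"user_id": i, "feels_safe": "yes"} for i in range(6)]
records += [{"user_id": 6 + i, "feels_safe": "no"} for i in range(2)]

reported, suppressed_total = aggregate_responses(records, "feels_safe")
print(reported, suppressed_total)  # {'yes': 6} 2
```

A published report would then contain only the aggregated counts, never the per-user records.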

Possible Quantitative Usability Methods and Suggestions for Evaluation of Results

Measurements (Quantitative)

The quantitative questions below are meant to show possible queries for evaluating the success of a system, followed by possible success metrics. NOTE: these queries may include reviews of system logs as well as direct user surveys. It is important to understand that this kind of review of what happened does not necessarily lead to understanding why a user chose to do one thing versus another. This sort of inquiry only allows for understanding what is happening in a system, and possibly how.

  1. Can the user accomplish the task they set out to accomplish? (A goal might be 90%, a minimum acceptable might be 60%)
  2. What is the System Usability Scale (John Brooke's SUS) score?
  3. Is the Trustmark discoverable and self-describing? (90%, 70%)
  4. Does the user feel safer as a result of the appearance of the Trustmark? (99%, 80% of those answering in the affirmative above.)
  5. Does the user feel that the site is safe overall? (The metric is a comparison of the positives to the negatives.)
  6. Does the user understand the necessity for a strong identity for their providers?
  7. Does the user know whether the identity of the provider is strongly bound to a real-world entity?
  8. Collected verbatim responses used for site improvement.
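
The goal and minimum percentages attached to the questions above can be checked mechanically once the counts are known. The following is a minimal sketch, assuming each percentage pair is read as (goal, minimum acceptable) as in question 1; the function name `classify_rate`, the status labels, and the sample counts are hypothetical.

```python
def classify_rate(successes: int, attempts: int, goal: float, minimum: float) -> tuple[float, str]:
    """Return the observed rate and how it compares with the goal and minimum."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    rate = successes / attempts
    if rate >= goal:
        status = "meets goal"
    elif rate >= minimum:
        status = "acceptable"
    else:
        status = "below minimum"
    return rate, status


# Question 1 (task completion): goal 90%, minimum acceptable 60%.
# The counts are made-up sample data.
rate, status = classify_rate(successes=42, attempts=50, goal=0.90, minimum=0.60)
print(f"task completion: {rate:.0%} ({status})")  # task completion: 84% (acceptable)
```

The same function could be applied to the Trustmark questions by passing their own goal and minimum values, with counts drawn from either survey answers or system logs.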

The System Usability Scale (Survey)

The SUS is a suggested 10-item questionnaire in which the typical 5-response option might be used to ask for feedback. For example, users could be asked to provide answers on a scale from 1 to 5, where 1 represents "success" or a "good" experience and 5 represents "failure" or a "poor" experience, with specific descriptions of these success and failure marks tuned to the system and the questions asked. We recommend that in surveys where the questions and problems are "supplied" to the user, each question be accompanied by a large "share more information" box. This is so that if users see a different set of problems than those being asked about, they will have an opportunity to share their perspective.

If that recommendation is followed, we would recommend that open-ended answers be codified and categorized, and that surveys be adjusted to incorporate the user perspective as much as possible. Taking this idea one step further, "surveys" could be developed that ask less specific questions and instead allow the user to submit problems without prompting, so that feedback relates directly to the problems users perceive rather than to those on which the company wants to focus its surveys. Additionally, providers could support a continual feedback system that allows users to submit whatever they want, whenever they want, and then categorize the responses and submit the user feedback to IDESG.

However, IDESG could also establish a constant set of survey questions for comparison of success metrics across vendors. The survey below could be used to create that common survey system for comparison.

Sample questions for surveys

  1. I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.
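
The ten statements above alternate between positive wording (odd-numbered items) and negative wording (even-numbered items). The sketch below shows Brooke's standard SUS scoring, assuming each answer is recorded on an agreement scale where 1 means "strongly disagree" and 5 means "strongly agree". That assumption matters: a survey using a different scale, such as 1 for a good experience and 5 for a poor one, would need the scoring adapted. The function name `sus_score` is hypothetical.

```python
def sus_score(responses: list[int]) -> float:
    """Score one respondent's ten SUS answers on a 0-100 scale.

    Assumes 1 = strongly disagree and 5 = strongly agree for every item.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 (positive wording): contribution is response - 1.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 (negative wording): contribution is 5 - response.
            total += 5 - response
    # Each item contributes 0-4, so the sum is 0-40; scale it to 0-100.
    return total * 2.5


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
print(sus_score([3] * 10))                        # 50.0
```

Scores from individual respondents can then be averaged per site, which gives a single number that is comparable across vendors using the same questions.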

Guidelines of Behavior for Designing Qualitative Research

Qualitative Research can be conducted in person through interviews and task assessments, through journal entries, via group discussion, and through observations in personal settings. Qualitative research can involve Ethnographic Research, which is generally longer-term research into cultural phenomena. Due to the personal nature of all these types of Qualitative Research, some guidelines are included below:

Additional links to help research:

NOTE: these links didn't resolve to anything.
http://abledata.com/abledata.cfm?CFID=83403419&CFTOKEN=2eec2aeffb602fc3-C88E7ADB-9730-C534-AC45C274BDDACBCC
http://abledata.com/abledata.cfm?pageid=113709&ksectionid=19327

User Interviews (Qualitative Research Including Ethnographic Studies)

Qualitative Research generally involves direct in-person interviews with human subjects, and as such requires care with respect to the personal information and user stories shared. Some organizations use formal academic IRB standards and certification. In general, erring on the side of privacy and confidentiality of personal information is preferred when conducting direct interviews with subjects or collecting other personal information.

Qualitative Research is conducted in order to gather in-depth information about why subjects choose one path or method over another, why they understand a system one way versus another, and so on. It is an effort to get at the deeper issues in a system that cannot be understood just by reviewing usage logs or survey data, which rarely explain why or how people understand a system. Additionally, subjects often develop work-arounds for systems that don't work, and it is through watching them work that the original problem the work-around is meant to solve becomes apparent. This understanding, which often isn't quite clear to the subjects themselves, is frequently discovered through interviews.

Sample User Research Study

A qualitative study of three users could be designed as follows:

  • Identify three users who are target test subjects, using criteria developed to meet the goals for understanding the system.
  • Administer an initial survey of 10 questions, either newly developed or taken from standard quantitative work.
  • Design three tasks around use of the system, with questions about why users are choosing elements of the system and how they understand the specific tasks.
  • Invite each user in to meet with a tester, accompanied by a video system or another note taker.
  • Write up and review an analysis of the results and, if video is taken, develop a report that includes video clips.