Artificial Intelligence (AI) has become the new electricity, a "must-have" for every business. The increasing uptake of AI technologies is unlocking its true potential and delivering efficiencies in many domains, not just in the cutting-edge applications we hear about in the press, but in our everyday lives: on devices in our homes, on our phones, and in the workplace.
AI can be loosely defined as applying acquired knowledge to make decisions, in contrast to using explicit logic. This presents both opportunities and problems for the field of software and systems quality. AI can greatly enhance test automation, but many organisations are grappling with the emerging and novel risks of integrating AI components. One of the most challenging aspects is that results are imperfect, and difficult to reproduce and explain.
I became interested in this topic a few years ago, when my company, Dragonfly, started building a product. We built natural language processing and machine learning into neuro, and I started looking at it from a number of perspectives: how do we test AI? How do we build a trustworthy personality? People want AI systems to be trustworthy, dependable and reliable, and when you get to the bottom of what trustworthy really means, it's mostly about quality.
Artificial intelligence includes both symbolic, rule-based expert knowledge systems and sub-symbolic machine learning systems. Machine learning is the most common AI method, and it is also the hardest to specify quality requirements for and to work out how to test. It's not just QA specialists who think so, either: research in Japan involving 278 machine learning engineers identified decision-making with customers and testing/quality assurance as the biggest new challenges they face when integrating machine learning. Further, they identified the lack of a test oracle, and the imperfection of the systems, as the top causes. [1]
I identified ten quality problems that QA and testing specialists need to think about when they’re working with AI:
AI, particularly machine learning, can also be defined as automation with less specification, or none at all. If less specification is required to produce a system, how will we define tests? If we aren't going to explicitly say what the software is supposed to do, how are we going to know whether it's right?
If we don’t know the answer, and we are writing the program to find out the answer, how will we test it? This is the test oracle problem; one pragmatic response, metamorphic testing, is sketched after this list.
If this program has complex real-world sensors, how can we predict and synthesise the full range of inputs? How can we measure test coverage in a complex, data-driven machine learning system? In many safety-conscious sectors it is necessary to ensure a high level of coverage, but here coverage isn't as simple as counting lines of code; a toy input-space measure is sketched after this list.
If a system optimises itself, how will testing it change it? Managing the interactions and data that a system has been exposed to is as important as managing your test environment; in fact, it is part of your test environment.
If this system is intended to mimic human capabilities, how are those human capabilities specified? If you can’t define human capabilities, and if you can’t define broad concepts like human intellect, you’re going to have trouble validating them.
Any two general-purpose algorithms are equivalent when their efficiency and accuracy are assessed across all possible problems. The wider the problem space, the harder it is for contemporary, narrow AI to perform, and the broader the test scenarios need to be.
All ML models are biased by the data used to train them, and these biases are quality issues first and foremost. AI systems usually learn from a training dataset provided by humans, and there will always be an intrinsic bias in that data (mostly because it was provided by humans). Diversified datasets, and diversity in development and testing teams, help to reduce selection and confirmation bias; a simple pre-training check is sketched after this list.
The correlation between inputs and outputs changes over time in the real world; some systems adjust for this and some don't. How can testing evaluate this through validation processes that involve real-world use? Even after you take an algorithm live, you need to constantly re-evaluate it and constantly feed new observations into your test process; a minimal drift check is sketched after this list.
Defining ethical quality requirements is incredibly hard, especially globally. However, there are real risks, particularly around privacy: when you combine machine learning with personal data, significant and unique risks can manifest.
What if AI systems are trusted too much? If humans accept every recommendation blindly, can this reduce the quality-in-use of the system?
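To make the oracle problem concrete, here is a minimal sketch of metamorphic testing, one widely used response: rather than asserting what the correct output is, we assert a relation that should hold between outputs. The predict_sentiment function below is a hypothetical stand-in for whatever model is under test, and the invariance relation chosen is an assumption about the system's requirements, not a universal rule.

```python
# Metamorphic testing sketch: we cannot say what the "correct" sentiment score
# for a given review is (no oracle), but we can assert a relation between
# outputs: adding an irrelevant, neutral sentence should not flip the result.

def predict_sentiment(text: str) -> str:
    """Hypothetical model under test; returns 'positive' or 'negative'."""
    positive_words = {"good", "great", "excellent", "love"}
    score = sum(word in positive_words for word in text.lower().split())
    return "positive" if score > 0 else "negative"

def test_neutral_suffix_does_not_flip_sentiment():
    source_input = "The battery life is great and I love the screen."
    follow_up_input = source_input + " It was delivered on a Tuesday."
    # Metamorphic relation: prediction is invariant under neutral additions.
    assert predict_sentiment(source_input) == predict_sentiment(follow_up_input)

test_neutral_suffix_does_not_flip_sentiment()
print("Metamorphic relation held.")
```

The value lies in the relation, not the toy model: the same test can run against any sentiment model without ever knowing the "right" answer for a single review.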
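For coverage, a toy sketch of one way to measure something more meaningful than lines of code: partition each input dimension into bins and report the fraction of input-space cells the test data actually exercises. The two features, their ranges, and the bin count are invented purely for illustration.

```python
import random

# Toy input-space coverage: split each input dimension into bins and report
# what fraction of the resulting grid cells the test set actually reaches.

BINS_PER_DIM = 10
RANGES = [(0.0, 120.0), (-40.0, 60.0)]  # e.g. speed (km/h), temperature (C)

def cell_of(point):
    """Map an input point to the grid cell it falls in."""
    cell = []
    for value, (lo, hi) in zip(point, RANGES):
        index = int((value - lo) / (hi - lo) * BINS_PER_DIM)
        cell.append(min(max(index, 0), BINS_PER_DIM - 1))
    return tuple(cell)

random.seed(0)
test_points = [(random.uniform(0, 120), random.uniform(-40, 60))
               for _ in range(300)]

covered = {cell_of(p) for p in test_points}
total_cells = BINS_PER_DIM ** len(RANGES)
print(f"Input-space coverage: {len(covered)}/{total_cells} cells "
      f"({100 * len(covered) / total_cells:.1f}%)")
```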
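For training-data bias, a minimal sketch of a pre-training check, assuming labelled records carry a sensitive "group" attribute: compare positive-label rates across groups and flag large gaps for human investigation. The records and the 0.2 threshold are illustrative assumptions, not a standard.

```python
from collections import Counter

# Pre-training bias check: compare the rate of positive labels across groups
# defined by a sensitive attribute. The records below are invented; in
# practice this would run over the real training set.

training_data = [
    {"group": "A", "label": 1}, {"group": "A", "label": 1},
    {"group": "A", "label": 0}, {"group": "A", "label": 1},
    {"group": "B", "label": 0}, {"group": "B", "label": 0},
    {"group": "B", "label": 1}, {"group": "B", "label": 0},
]

counts = Counter((r["group"], r["label"]) for r in training_data)
rates = {}
for g in sorted({r["group"] for r in training_data}):
    total = counts[(g, 0)] + counts[(g, 1)]
    rates[g] = counts[(g, 1)] / total
    print(f"Group {g}: {total} records, positive-label rate {rates[g]:.2f}")

# A large gap is a prompt to investigate possible selection bias, not a verdict.
if max(rates.values()) - min(rates.values()) > 0.2:
    print("Warning: positive-label rates differ substantially across groups.")
```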
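For drift, a minimal sketch that compares the distribution of a single input feature at training time against live traffic, using SciPy's two-sample Kolmogorov-Smirnov test. The samples are synthetic, and the live sample is deliberately shifted so the alert fires.

```python
import random
from scipy.stats import ks_2samp  # two-sample Kolmogorov-Smirnov test

# Drift check: has the live distribution of a feature moved away from the
# distribution the model was trained (and tested) on?

random.seed(1)
training_sample = [random.gauss(50.0, 10.0) for _ in range(1000)]
live_sample = [random.gauss(58.0, 10.0) for _ in range(1000)]  # shifted mean

statistic, p_value = ks_2samp(training_sample, live_sample)
print(f"KS statistic={statistic:.3f}, p-value={p_value:.3g}")

if p_value < 0.01:
    print("Drift detected: schedule retraining and re-run the test suite.")
```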
Even more problems arise when you touch on physical actuation in the world, and on safety, as with semi-autonomous vehicles and robots. There is a significant need in industry for new guidance and best practice, ranging from how we specify acceptance criteria through to how we generate test data for machine learning. Of course, the field is still developing rapidly, so many of the answers about how to manage these issues are still evolving in parallel. DIN, the German national standards body, released a new standard for an AI Quality Meta Model in April 2019 that starts to address some of the new quality characteristics of AI systems. There is also foundation-level training for testers available from A4Q and iSQI, and working groups in standardisation bodies such as ISO/IEC are producing further reports and standards on quality and testing in the context of AI.
This is a fascinating field to watch develop, as it is rare that an emerging technology appears set to disrupt verification and validation techniques so much.
Adam Leon Smith is CTO of Dragonfly, and a quality and testing specialist. He is also Chair of the British Computer Society’s Special Interest Group in Software Testing, and is leading the development of the first ISO/IEC technical report on Bias in AI systems and AI aided decision making.
Dragonfly – www.wearedragonfly.co
British Computer Society's Special Interest Group in Software Testing – https://www.bcs.org/membership/member-communities/software-testing-specialist-group/
ISO/IEC technical report – https://www.iso.org/standard/77607.html
Twitter: @adamleonsmith
[1] ISHIKAWA, Fuyuki and YOSHIOKA, Nobukazu, 2019. How Do Engineers Perceive Difficulties in Engineering of Machine-Learning Systems? – Questionnaire Survey. In: 2019 IEEE/ACM Joint 7th International Workshop on Conducting Empirical Studies in Industry (CESI) and 6th International Workshop on Software Engineering Research and Industrial Practice (SER&IP). Montreal, QC, Canada: IEEE, May 2019, p. 29. ISBN 978-1-72812-264-9. [Accessed 1 February 2020]. Available from: https://ieeexplore.ieee.org/document/8836142/