Generative Artificial Intelligence (GenAI) has created both opportunities and challenges for assessment design and academic integrity. The AI Assessment Scale (AIAS), developed by Perkins, Furze, Roe, and MacVaugh, provides a practical framework to guide educators in making purposeful, evidence-based decisions about appropriate AI use in assessments. Rather than treating AI as a threat to be policed through detection tools, the AIAS reframes the conversation around assessment validity and purposeful design.
This resource is designed for academic staff who wish to systematically audit their existing assessment tasks and redesign them to maintain validity in an AI-abundant educational environment. The AIAS is not simply a labelling system to be retrofitted onto existing assessments; it is fundamentally an assessment design tool that requires educators to reconsider what, how, and under what conditions they assess student learning. The framework comprises five non-hierarchical levels:
- No AI (Level 1),
- AI Planning (Level 2),
- AI Collaboration (Level 3),
- Full AI (Level 4), and
- AI Exploration (Level 5).
Before working through the steps outlined in this guide, it is critical to understand that validity must precede security concerns. As Dawson et al. (2024) argue, validity matters more than cheating: if an assessment does not accurately measure what students know and can do, concerns about academic misconduct become secondary to the fundamental failure of the assessment itself. This guide therefore begins with validity auditing before addressing AI vulnerabilities and redesign strategies.
Step 1: Validate the Quality and Appropriateness of Your Current Assessments
Before considering the implications of AI on assessment, you must ensure that your existing assessments are valid—that they accurately measure what you intend them to measure and are fit for purpose. Many assessments suffer from validity problems that predate GenAI, including poor alignment between learning outcomes and assessment tasks, inappropriate assessment modes, and over-reliance on single high-stakes summative tasks.
This is why, as noted in the introduction, Dawson et al. (2024) insist that validity matters more than cheating: an assessment that does not capture what students know and can do has already failed before any question of misconduct arises. The audit process must therefore begin with validation as the foundation before addressing AI vulnerabilities.
Key Validation Criteria for Higher Education Assessments
Drawing on the quality assurance frameworks used in higher education, such as those of the Tertiary Education Quality and Standards Agency (TEQSA), consider the following dimensions when auditing your assessments:
- Alignment with Learning Outcomes: Do assessment tasks directly align with the subject’s intended learning outcomes? Every assessment criterion should map explicitly to one or more learning outcomes. If you are assessing material that was not explicitly taught, or if core learning outcomes remain unassessed, your assessment lacks content validity.
- Appropriateness of Design: Are tasks underpinned by suitable expectations for the year level, weighting, and subject content? Does the assessment format align with the cognitive level and complexity expected at this stage of the degree program?
- Construct Validity: Does the assessment use the most appropriate method to measure the intended knowledge or skills? For example, if you aim to assess critical thinking, would a multiple-choice exam, an essay, an oral defence, or an applied problem-solving task best capture that construct?
- Clarity and Accuracy: Are assessment instructions, criteria, and rubrics free from ambiguities, grammatical errors, and spelling mistakes? Poorly worded task sheets create confusion, disadvantage students, and undermine the validity of results.
- Feasibility: Are timeframes set for all assessments reasonable and achievable for students? Unrealistic timelines compromise student wellbeing and can force students to rely excessively on shortcuts, including AI tools, to meet deadlines.
- Transparency in Marking: Are marking criteria within rubrics clear and explicitly linked to grading standards? Students should be able to understand what constitutes excellent, satisfactory, or unsatisfactory performance.
- Equity: Are assessment conditions substantially the same for all students, regardless of delivery mode, location, or individual circumstances? If some students have access to premium AI tools while others rely on free versions, the assessment may create unfair advantages.
Practical Actions for Step 1
- Gather all assessment materials for your subject, including task sheets, rubrics, marking criteria, and submission guidelines.
- Use a formal validation checklist. Institutional assessment validation frameworks provide systematic approaches to quality assurance. Complete the validation checklist for each assessment task, addressing each criterion systematically.
- Map each assessment task to specific learning outcomes. Create an alignment matrix with learning outcomes in one column and assessment tasks in another. Identify any gaps: outcomes that are not assessed, or assessments that do not align with stated outcomes. (One simple way to record this matrix is sketched after this list.)
- Evaluate the appropriateness of assessment modes. For each learning outcome, ask: Is this the best way to assess this skill or knowledge? Could an alternative format provide more valid evidence?
- Identify over-reliance on single assessment points. Valid assessment requires gathering evidence over time and through multiple methods. Multiple lower-stakes assessment points provide more accurate evidence of student learning and reduce the incentive for academic misconduct.
- Conduct a “red flag” audit. Furze (2024) identifies specific red flags in assessment design: tasks completed over extended unsupervised periods without interim checkpoints; rubric criteria that claim to assess skills not actually demonstrated through the task format (e.g., assessing “editing” when only final drafts are submitted); and tasks that attempt to measure too many learning outcomes simultaneously.
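For teams that want to keep the alignment matrix in a reusable, checkable form rather than a one-off spreadsheet, the brief Python sketch below shows one possible way to record the mapping and flag gaps automatically. The outcome codes, task names, and mappings are hypothetical placeholders, not drawn from any particular subject.

```python
# Minimal sketch of a learning outcome / assessment task alignment matrix.
# All outcome codes, task names, and mappings below are illustrative only.

learning_outcomes = ["LO1", "LO2", "LO3", "LO4"]

# Which learning outcomes each assessment task claims to assess (hypothetical).
assessment_map = {
    "In-class test": ["LO1", "LO2"],
    "Research essay": ["LO2", "LO3"],
    "Oral presentation": ["LO3"],
}

# Print the matrix: one row per outcome, one column per task.
print(f"{'Outcome':<10}" + "".join(f"{task:<20}" for task in assessment_map))
for lo in learning_outcomes:
    cells = "".join(
        f"{'X' if lo in mapped else '-':<20}" for mapped in assessment_map.values()
    )
    print(f"{lo:<10}{cells}")

# Flag gaps: outcomes no task assesses, and assessed outcomes never declared.
assessed = {lo for mapped in assessment_map.values() for lo in mapped}
print("Unassessed outcomes:", [lo for lo in learning_outcomes if lo not in assessed] or "none")
print("Assessed but not declared:", sorted(assessed - set(learning_outcomes)) or "none")
```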
Key Reference
Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education, 49(7), 1005–1016. https://doi.org/10.1080/02602938.2024.2386662
Step 2: Identify AI Vulnerabilities in Your Assessments
Once you have established that your assessments are fundamentally valid, the next step is to identify where and how they are vulnerable to GenAI completion or misuse. An assessment is vulnerable to AI when students can use GenAI tools to complete significant portions of the task without demonstrating their own knowledge, understanding, or skills.
Indicators of High AI Vulnerability
Research by Roe, Perkins, and Giray (2025), as well as practical guidance from Furze (2024), identifies several characteristics that make assessments particularly vulnerable:
- Unsupervised completion: Tasks completed entirely outside of class or supervision, with no interim checkpoints or authentication measures.
- Text-heavy outputs: Essays, reports, and written reflections that can be generated end-to-end by tools like ChatGPT, Claude, or Gemini.
- Generic or formulaic tasks: Assessments that follow predictable structures (e.g., standard five-paragraph essays, SWOT analyses, literature reviews) are easier for AI to complete convincingly.
- Limited opportunities for process evidence: When only final products are submitted, with no drafts, notes, or evidence of thinking processes, it becomes impossible to verify student authorship.
- Low authenticity: Tasks that do not reflect real-world applications or require contextualized, discipline-specific knowledge are more susceptible to generic AI outputs.
Practical Actions for Step 2
- Test your assessments with AI. Input your assessment prompts into ChatGPT, Claude, or another GenAI tool. Evaluate the quality of the output against your rubric. If the AI-generated response would score well, note this as a high-vulnerability task.
- Identify unsupervised segments. Highlight any portions of your assessment that students complete without supervision, authentication, or interim evidence submission.
- Evaluate task authenticity and specificity. Generic tasks are more vulnerable than tasks requiring integration of course-specific materials, real-world data, or contextualized application.
- Consider equity in AI access. If students are permitted to use AI, do all students have equivalent access to the same quality of tools? Unequal access to premium AI models creates fairness issues.
- Document your findings. Create a vulnerability matrix that lists each assessment component, its vulnerability level (low, moderate, high), and the reasons for that rating.
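To keep the audit consistent across a teaching team, the vulnerability matrix described above can also be stored as simple structured data. The sketch below shows one possible format in Python; the components, ratings, and reasons are hypothetical examples rather than prescribed categories.

```python
# One possible format for the vulnerability matrix described in Step 2.
# The components, ratings, and reasons below are hypothetical examples.

vulnerability_matrix = [
    {
        "component": "Research essay (final submission)",
        "vulnerability": "high",
        "reasons": [
            "completed unsupervised over four weeks",
            "AI-generated response scored well against the rubric",
        ],
    },
    {
        "component": "In-class problem-solving task",
        "vulnerability": "low",
        "reasons": ["supervised, no device access"],
    },
    {
        "component": "Reflective journal",
        "vulnerability": "moderate",
        "reasons": ["unsupervised, but requires course-specific incidents"],
    },
]

# Summarise the audit so that high-vulnerability components surface first.
order = {"high": 0, "moderate": 1, "low": 2}
for entry in sorted(vulnerability_matrix, key=lambda e: order[e["vulnerability"]]):
    print(f"[{entry['vulnerability'].upper():<8}] {entry['component']}")
    for reason in entry["reasons"]:
        print(f"    - {reason}")
```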
Key Reference
Roe, J., Perkins, M., & Giray, L. (2025). Assessment twins: A protocol for AI-vulnerable summative assessment [Preprint]. https://doi.org/10.48550/arXiv.2510.02929
Step 3: Determine the Appropriate AIAS Level for Each Assessment Component
The AIAS provides five levels of AI engagement, each suited to different pedagogical goals and assessment contexts:
Level 1 – No AI: Assessment completed entirely without AI assistance in a controlled environment. This level is appropriate when you need to verify that students possess foundational knowledge or skills independently of AI tools. Examples include in-class discussions, supervised exams, handwritten drafts, oral presentations, and practical demonstrations.
Level 2 – AI Planning: AI may be used for brainstorming, structuring, outlining, initial research, or generating questions. Students must demonstrate how they have developed and refined AI-generated ideas independently. This level is suitable for research tasks, exploratory phases of projects, and early-stage conceptual development.
Level 3 – AI Collaboration: AI may be used to assist with drafting, editing, feedback, and refinement. Students must critically evaluate and modify AI-generated content, demonstrating their own voice, judgment, and understanding. This level is appropriate when AI functions as a writing assistant or feedback tool, but the final work must reflect the student’s knowledge and critical engagement.
Level 4 – Full AI: AI may be used extensively throughout the assessment, with students directing AI to achieve specific goals. Assessments at this level focus on the student’s ability to prompt, evaluate, and orchestrate AI outputs rather than on unaided performance. This level is suitable for tasks that mirror authentic professional practices where AI is a standard tool.
Level 5 – AI Exploration: AI is used creatively and experimentally to enhance problem-solving, generate novel insights, or co-design innovative solutions. This level encourages students to explore the boundaries and possibilities of AI within their discipline.
Guiding Principles for Selecting AIAS Levels
Perkins et al. (2024; 2025) emphasize that AIAS levels should be chosen based on learning outcomes and conditions, not arbitrarily assigned. Key principles include:
- Outcome alignment: Select the level that best supports the demonstration of specific learning outcomes. If the outcome is “demonstrate independent critical analysis,” a Level 1 or Level 2 task may be more appropriate than Level 4.
- Avoid retrofitting: Do not simply label an existing assessment with an AIAS level without redesigning the task, rubric, and student guidance to match that level. Retrofitting undermines validity and creates confusion.
- Build evidence over time: Use different AIAS levels across multiple assessment points within a module to create a “Swiss Cheese” model of assessment security. By layering assessments at different levels, you reduce overall vulnerability while allowing appropriate AI use where pedagogically valuable.
- Ensure equity: If you permit AI use (Levels 2-5), ensure all students have access to the required tools, or provide institutional access to specific platforms.
- Communicate transparently: Clearly explain to students why a particular level has been chosen and what it means for their learning and assessment.
Practical Actions for Step 3
- Break assessments into discrete tasks. Rather than assigning a single AIAS level to an entire assessment, segment it into distinct phases (e.g., planning, drafting, final submission, reflection) and assign appropriate levels to each phase (one way to record these phases is sketched after this list).
- Match levels to learning outcomes. For each learning outcome, ask: Does this outcome require students to demonstrate independent mastery (Level 1), or is it appropriate for them to use AI as a tool (Levels 2-5)?
- Sequence tasks strategically. Furze (2024) recommends beginning with Level 1 tasks to establish baseline student capability, followed by Level 2 or Level 3 tasks that allow supported development, and culminating in authentic Level 4 or Level 5 tasks where appropriate.
- Consult disciplinary norms and professional standards. Some disciplines and professions have specific expectations about AI use. For example, in English as a Foreign Language (EFL) contexts, Roe et al. (2025) recommend prioritising Level 1 for core language production tasks to ensure students demonstrate independent linguistic competence.
- Pilot test your level assignments. Before full implementation, pilot your redesigned assessments with a small group of students or colleagues to identify any ambiguities or unintended consequences.
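To record the segmentation suggested in the first action above, a redesigned assessment can be captured as a sequence of phases, each carrying its own AIAS level and the outcomes it evidences. The Python sketch below shows one way to store that structure and run a basic sanity check (a supervised Level 1 baseline appearing before heavily AI-assisted phases). The phase names, level assignments, and the check itself are illustrative assumptions, not part of the published AIAS framework.

```python
# Illustrative record of a segmented assessment, one AIAS level per phase.
# Phase names, levels, and outcome codes are examples only.

phases = [
    {"phase": "Planning and question selection", "aias_level": 2, "outcomes": ["LO2"]},
    {"phase": "In-class draft introduction", "aias_level": 1, "outcomes": ["LO1"]},
    {"phase": "Full draft with AI editing support", "aias_level": 3, "outcomes": ["LO2", "LO3"]},
    {"phase": "Metacognitive reflection on AI use", "aias_level": 4, "outcomes": ["LO4"]},
]

# Sanity check: at least one supervised (Level 1) phase should appear before
# any phase at Level 3 or above, so a baseline of independent work exists.
first_level1 = next((i for i, p in enumerate(phases) if p["aias_level"] == 1), None)
first_high = next((i for i, p in enumerate(phases) if p["aias_level"] >= 3), None)

if first_high is not None and (first_level1 is None or first_level1 > first_high):
    print("Warning: no Level 1 baseline before heavily AI-assisted phases.")
else:
    print("Baseline check passed.")

for p in phases:
    print(f"Level {p['aias_level']}: {p['phase']} -> {', '.join(p['outcomes'])}")
```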
Key Reference
Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The Artificial Intelligence Assessment Scale (AIAS): A framework for ethical integration of generative AI in educational assessment. Journal of University Teaching and Learning Practice, 21(06), Article 06. https://doi.org/10.53761/q3azde36
Step 4: Redesign Assessment Tasks, Rubrics, and Student Guidance
Once you have determined the appropriate AIAS level for each assessment component, you must redesign the task itself, along with the rubric, instructions, and any supporting materials. Simply declaring a level without structural changes is what Corbin et al. (2025) call a “discursive approach,” which is insufficient for maintaining validity.
Redesigning for Level 1 (No AI)
When assessment must be completed without AI, ensure that the task is conducted under conditions where AI use is not possible or practical:
- In-class tasks: Conduct the assessment during scheduled class time without access to digital devices.
- Oral components: Use discussions, presentations, or verbal defences that require real-time articulation of ideas.
- Supervised written tasks: If written work is required, have students complete drafts or notes by hand in class.
- Practical demonstrations: For disciplines involving physical skills or lab work, observe students performing tasks in controlled environments.
Important: Level 1 does not mean “exam only.” Inclusive and valid assessment requires varied formats, as traditional exams may disadvantage students with disabilities, neurodivergence, or other challenges.
Redesigning for Level 2 (AI Planning)
At Level 2, students may use AI for planning, research, and ideation, but must demonstrate how they have developed these ideas independently:
- Require evidence of AI interaction: Ask students to submit conversation logs, annotated AI outputs, or reflections on how AI suggestions influenced their thinking.
- Create interim checkpoints: Collect planning documents or research summaries at specific intervals to verify student engagement with the planning process.
- Assess critical evaluation: Rubrics should include criteria for how well students have evaluated, selected, and refined AI-generated ideas.
Example Task Redesign: Instead of “Submit a 2000-word research essay,” redesign as: “Phase 1 (Level 2): Use AI to generate three potential research questions. Submit a 300-word reflection explaining which question you selected and why, with reference to course readings. Phase 2 (Level 1): Draft your introduction in class without AI. Phase 3 (Level 3): Complete your essay using AI for editing support.”
Redesigning for Level 3 (AI Collaboration)
At Level 3, students may use AI for drafting and editing, but must demonstrate critical engagement and ownership of the final product:
- Require revision trails: Ask students to submit both an AI-assisted draft and a final version with tracked changes or commentary explaining their modifications.
- Assess voice and judgment: Rubric criteria should evaluate the student’s voice, critical evaluation of AI content, and integration of course-specific knowledge.
- Link to earlier work: Pair Level 3 tasks with Level 1 or Level 2 tasks that establish a baseline of the student’s independent capabilities, so that significant divergences in quality can be identified.
Redesigning for Level 4 (Full AI) and Level 5 (AI Exploration)
At these levels, AI use is unrestricted, and assessment focuses on the student’s ability to direct, evaluate, and synthesize AI outputs:
- Assess orchestration and judgment: Rubrics should evaluate how effectively students prompt AI, evaluate outputs, integrate multiple sources (AI and human), and make reasoned decisions.
- Reflect authentic practice: Use Level 4 and 5 tasks when AI use mirrors real-world professional contexts, such as business report writing, data analysis, or creative design.
- Require metacognitive reflection: Ask students to explain their AI use strategies, what worked, what didn’t, and how they ensured quality and accuracy.
Updating Rubrics and Criteria
Your assessment rubrics must align with the chosen AIAS level. For example:
- Level 1 rubrics should focus entirely on the student’s independent demonstration of knowledge and skills.
- Level 2 and 3 rubrics should include explicit criteria for evaluating how students engaged with, modified, and refined AI outputs.
- Level 4 and 5 rubrics should assess students’ AI literacy, including their ability to prompt effectively, critically evaluate outputs, and integrate AI into a broader workflow.
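If several colleagues are revising rubrics at once, it can help to hold these emphases in a small shared reference that draft rubrics are checked against. The sketch below is a hypothetical example of such a reference in Python; the criterion wording is illustrative and should be rewritten for the discipline, and the keyword check is deliberately crude.

```python
# Hypothetical reference mapping AIAS levels to the rubric emphases described above.
# Criterion wording is illustrative and should be adapted to the discipline.

rubric_emphases = {
    1: ["independent demonstration of knowledge and skills"],
    2: ["selection and refinement of AI-generated ideas",
        "independent development beyond the AI-assisted plan"],
    3: ["critical evaluation and modification of AI-drafted content",
        "student voice and integration of course-specific knowledge"],
    4: ["effective prompting and orchestration of AI outputs",
        "reasoned decisions about accuracy and quality"],
    5: ["creative and experimental use of AI",
        "insight into AI's possibilities and limits within the discipline"],
}

def missing_emphases(level: int, draft_criteria: list[str]) -> list[str]:
    """Return expected emphases not obviously present in a draft rubric (crude keyword check)."""
    text = " ".join(draft_criteria).lower()
    return [e for e in rubric_emphases[level]
            if not any(word in text for word in e.lower().split()[:2])]

# Example: a draft Level 3 rubric that forgets to assess critical evaluation.
draft = ["Student voice is evident throughout", "Course concepts are well integrated"]
print(missing_emphases(3, draft))  # -> ['critical evaluation and modification of AI-drafted content']
```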
Practical Actions for Step 4
- Rewrite task instructions to specify the AIAS level and provide explicit guidance on what AI use is permitted and what is not.
- Incorporate AI Use Statements. La Trobe University (2025) provides model AI Use Statements that clearly communicate expectations to students. Include these statements in all assessment briefs.
- Redesign rubrics to assess the skills and processes relevant to the assigned AIAS level.
- Introduce AI Acknowledgement Templates. Require students to complete acknowledgement forms that declare how they used AI in their work. Templates should be specific to the AIAS level (e.g., Planning Acknowledgement for Level 2, Collaboration Acknowledgement for Level 3).
- Provide exemplars. Show students examples of work completed at different AIAS levels, with annotations explaining what constitutes appropriate AI use.
Step 5: Implement, Communicate, and Support Students
Successful AIAS implementation requires more than redesigning tasks; it requires building a culture of transparency, academic integrity, and AI literacy.
Communicate Clearly and Early
Students need to understand not only what level of AI use is permitted, but why that level has been chosen and how it supports their learning:
- Include AIAS levels in module handbooks and learning management systems. Make AI use expectations visible from the start of the module.
- Explain the rationale. In your first class or in assessment briefings, explain why different tasks are set at different levels and how this scaffolds their learning.
- Address equity concerns. If you permit AI use, inform students of any institutional resources or free tools they can access to ensure equity.
Teach AI Literacy Explicitly
Do not assume students know how to use AI effectively or ethically. Explicitly teach:
- How to prompt AI effectively to generate useful, relevant outputs.
- How to critically evaluate AI outputs for accuracy, bias, relevance, and alignment with academic standards.
- How to integrate AI outputs with human knowledge, avoiding over-reliance or uncritical acceptance.
Roe, Furze, and Perkins (2025) advocate for embedding critical AI literacy into curricula, teaching students to understand AI’s capabilities, limitations, biases, and ethical implications.
Provide Formative Feedback
Use formative assessments (ungraded or low-stakes tasks) at different AIAS levels to help students practice appropriate AI use before summative assessments:
- Feedback on AI use: When students submit Level 2 or Level 3 tasks, provide feedback not only on content but also on the quality and appropriateness of their AI engagement.
- Model good practice: Share examples of your own AI use in research, teaching preparation, or professional tasks to normalize transparent and ethical AI engagement.
Monitor and Adjust
AIAS implementation is iterative. After the first assessment cycle:
- Gather student feedback on clarity of AI use expectations and any challenges they encountered.
- Review academic integrity cases to identify any patterns of confusion or misuse.
- Adjust task design and rubrics based on observed outcomes.
Collaborate with Colleagues
Assessment redesign is most effective when conducted collaboratively within disciplinary teams or faculties:
- Lead from the middle: Perkins et al. (2024) recommend working at the faculty or departmental level, where disciplinary norms and shared understanding can guide implementation.
- Share exemplars and rubrics: Develop a repository of redesigned assessments that colleagues can adapt.
- Provide professional development: Offer workshops on AIAS principles, validity-focused assessment design, and critical AI literacy.
Practical Actions for Step 5
- Update all course documentation (syllabi, module handbooks, LMS pages) to include AIAS levels and AI Use Statements.
- Create student-facing resources that explain the AIAS framework in accessible language, with examples relevant to your discipline.
- Integrate AI literacy into teaching. Dedicate at least one class session to discussing AI use, demonstrating effective prompting, and critically evaluating AI outputs.
- Use AI Acknowledgement Templates for all assessments at Levels 2-5 and make submission of these templates a requirement.
- Monitor implementation and iterate. After the first cycle, convene with colleagues to review what worked, what didn’t, and how to refine your approach.
Key Reference
Furze, L., Perkins, M., Roe, J., & MacVaugh, J. (2024). The AI Assessment Scale (AIAS) in action: A pilot implementation of GenAI-supported assessment. Australasian Journal of Educational Technology. https://doi.org/10.14742/ajet.9434
Conclusion: Toward Validity-Centred Assessment Design
The AIAS is not a silver bullet for addressing GenAI in assessment, nor is it a one-size-fits-all solution. Rather, it is a starting point for meaningful conversations about validity, authenticity, equity, and the role of AI in learning. By following the five-step process outlined in this guide—auditing validity, identifying vulnerabilities, assigning levels, redesigning tasks, and implementing with transparency—academics can create assessments that are robust, fair, and pedagogically sound in an AI-saturated world.
Ultimately, the goal is to integrate AI thoughtfully in ways that enhance rather than undermine student learning. As Dawson et al. (2024) remind us, validity matters more than cheating. If our assessments are valid—if they genuinely measure what students know and can do—then we can be confident in the integrity of our educational outcomes, regardless of the tools students use.
References
Corbin, T., Dawson, P., & Liu, D. (2025). Talk is cheap: Why structural assessment changes are needed for a time of GenAI. Assessment & Evaluation in Higher Education, 50, 1–11. https://doi.org/10.1080/02602938.2025.2503964
Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education, 49(7), 1005–1016. https://doi.org/10.1080/02602938.2024.2386662
Furze, L., Perkins, M., Roe, J., & MacVaugh, J. (2024). The AI Assessment Scale (AIAS) in action: A pilot implementation of GenAI-supported assessment. Australasian Journal of Educational Technology. https://doi.org/10.14742/ajet.9434
Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The Artificial Intelligence Assessment Scale (AIAS): A framework for ethical integration of generative AI in educational assessment. Journal of University Teaching and Learning Practice, 21(06), Article 06. https://doi.org/10.53761/q3azde36
Perkins, M., Furze, L., & Roe, J. (2025). Reimagining the Artificial Intelligence Assessment Scale: A refined framework for educational assessment. Journal of University Teaching and Learning Practice. https://doi.org/10.53761/rrm4y757
Roe, J., Furze, L., & Perkins, M. (2025a). Digital plastic: A metaphorical framework for critical AI literacy in the multiliteracies era. Pedagogies: An International Journal, 1–15. https://doi.org/10.1080/1554480X.2025.2557491
Roe, J., Furze, L., & Perkins, M. (2025b). Reflecting reality, amplifying bias? Using metaphors to teach critical AI literacy. Journal of Interactive Media in Education, 1–15. https://doi.org/10.5334/jime.961
Roe, J., Perkins, M., & Giray, L. (2025). Assessment twins: A protocol for AI-vulnerable summative assessment [Preprint]. https://doi.org/10.48550/arXiv.2510.02929
AI Use Acknowledgement
This resource, “Applying the AI Assessment Scale (AIAS) in Your Classroom: A Step-by-Step Guide for Auditing and Updating Assessment Tasks”, was created with the assistance of Perplexity AI, an AI-powered research and writing tool.
How AI Was Used
Research and Information Gathering (AIAS Level 2 – AI Planning & Level 3 – AI Collaboration):
- Perplexity AI was used to search, retrieve, and synthesise information from uploaded academic documents, including research articles by Perkins et al., Roe et al., and Furze, and institutional guidance from La Trobe University.
- The AI tool assisted in identifying relevant sections across multiple PDF documents, extracting key concepts, and organising information thematically.
- Web searches were conducted using Perplexity AI to locate additional scholarly sources and current implementation examples of the AIAS framework.
Content Structuring and Drafting (AIAS Level 3 – AI Collaboration):
- Perplexity AI generated the initial draft structure based on the user’s request for a 5-page, step-by-step resource for academics.
- The AI synthesised information from multiple sources to create coherent sections aligned with the five-step implementation process.
- All citations were formatted in APA 7th edition style by the AI tool based on source materials provided.
Refinement and Adaptation (Human Oversight):
- The author reviewed all AI-generated content for accuracy, relevance, and alignment with the intended audience (academic staff auditing assessments).
- The structure was refined based on the author’s pedagogical expertise and understanding of assessment design principles.
- All references were verified against original sources to ensure citation accuracy and appropriate attribution.
AI Tools Used
- Perplexity AI (Advanced Research Model, October 2025)
Transparency Statement
This work represents a collaboration between human expertise and AI assistance. The AI functioned as a research assistant and drafting tool, while the author maintained responsibility for content validity, pedagogical appropriateness, and final editorial decisions. All information is grounded in peer-reviewed literature and institutional policies, with appropriate citations provided throughout.
This acknowledgement statement itself reflects the principles outlined in the resource: transparent declaration of AI use, critical evaluation of AI outputs, and human accountability for the final work product.
