Reflections on GenAI Coding Tools in Creative Coding Curriculums

TL;DR: As part of my PgCert Academic Practice project, I made a customised chatbot in VS Code and a checklist for citing AI coding tools in coding assignments, with the aim of lowering technical barriers to AI literacy and supporting fair assessment. I end with some practical considerations on how to actually embrace AI tools in coding assessments.

As a researcher and creative coder in AI and music technologies, I have made AI coding tools like GitHub Copilot part of my everyday programming workflow.

However, as a lecturer, the explosive arrival of AI tools has brought me more challenges than opportunities. The shift was particularly pronounced in the last academic year: students are no longer using AI only for writing text and essays, but for core technical practices: coding, debugging, setting up environments, explaining error messages, and so on.

Trying to articulate the problem space

The debate on accepting or resisting AI (in particular, coding tools like Copilot) in technical curriculums is polarising:

  • (a) The ability to use AI coding tools is likely to be valued in future industry workplaces. With employability in mind, some hope to encourage the use of AI tools in students’ programming workflows (Shukla et al., 2025), and
  • (b) The misuse of AI coding tools (e.g., shortcut learning, misconduct) has to be avoided, so that the learning outcomes regarding technical programming skills can still be accurately assessed.

Most critically, in the context of technical coding environments, the concept of AI literacy can embed inequalities and assumptions (Prabhudesai et al., 2025). The ability to curate and review AI output assumes a level of technical coding skill in the first place, and not all learners are confident enough to question the output of AI, especially when overwhelmed by the sheer volume of code Copilot tends to produce, plus all the media hype.

Reading the university-wide guidelines in the Student guide to generative AI (and several other policy documents; see more in my blog post), a recurring issue stands out: most AI guidance is simply not tailored to writing code. Asking students to “keep a log of AI use” sounds reasonable until you ask what that actually means in practice. When should I log something? What level of detail is appropriate? Is fixing a syntax error worth reporting? What does a good log look like? It becomes clear that generic guidance alone is not enough.

Two interventions in a test unit

During the PgCert Action Research Project cycle, I led the BSc unit Critical 1: Mathematics and Statistics for Data Science. I took it as a test unit for two interventions around the use of GenAI for coding:

1. A “Rubber Duck” Custom Agent in GitHub Copilot

A customised agent (previously called a “chat mode” in 2025) is a feature in Copilot that allows programmers to tailor the behaviour of Copilot to fit specialised roles. To quote the VS Code documentation: “a planning agent could instruct the AI to collect project context and generate a detailed implementation plan”.

Inspired by rubber duck debugging, in which programmers think aloud to debug: what if we could create a customised chatbot that acts as the rubber duck? That is, a chat mode in Copilot that encourages reflection, breaks tasks down, negotiates plans, works step by step, and prompts users to review and revise.
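In practice, such a mode is defined as a file of natural-language instructions that Copilot loads alongside the chat. As a rough illustration only (the actual prototype and its full instructions are in the blog write-up): at the time of writing, VS Code reads custom chat modes from .chatmode.md files with YAML front matter, and a minimal rubber-duck sketch might look like the following. The file name, tool list, and wording here are my simplified assumptions, not the shipped prototype:

```markdown
---
# .github/chatmodes/rubber-duck.chatmode.md (illustrative sketch)
description: 'Rubber Duck: a debugging companion that asks before it answers.'
tools: ['codebase', 'search']
---
You are a rubber duck, not an autocomplete. For every request:
1. First ask the user to explain, in their own words, what the code should do and where it breaks.
2. Break the task into small steps and negotiate a plan before writing any code.
3. Work one step at a time; never produce a full solution in a single reply.
4. After each step, prompt the user to review the change and explain why it works (or doesn't).
5. Prefer questions and hints over finished code.
```

The front matter simply registers the mode in the chat dropdown; the behavioural change comes from the instruction body, which nudges the agent away from dumping complete solutions.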

A detailed write-up of the prototype can be found in my blog post. Here’s a side-by-side of the same prompt and context: on the left is the default Copilot agent, on the right is the rubber duck:

A side-by-side comparison of interactions with the default chat mode (left) and with the Rubber Duck chat mode I made (right), with annotations.

2. A GenAI Checklist (Chat Log Template)

Students who chose to use AI coding tools were asked to submit a structured reflection alongside their code.

The checklist is tailored from the UAL Student guide to generative AI. It includes detailed guidance on how to use AI coding tools in an academic context, and on what “keeping track of processes” might look like in coding and programming with AI. Practically, it gives a set of template questions:

  • What did you ask the AI to do?
  • What did it attempt or add?
  • What did you keep, change, discard — and what did you learn?

The intent was not to document everything, but to create a pause: a moment where students step back from the AI interaction and articulate their own judgement.
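To make this concrete, here is a hypothetical filled-in entry following those three questions. The scenario is invented for illustration, assuming a Python/pandas exercise; it is not taken from a student submission:

```
Prompt: Asked Copilot to vectorise my loop that standardises the columns of a pandas DataFrame.
AI output: It suggested replacing the loop with (df - df.mean()) / df.std().
Kept / changed / discarded: Kept the vectorised line, but changed std() to std(ddof=0) to match
the population standard deviation used in the lecture notes. Learned that pandas defaults to the
sample standard deviation (ddof=1), which I hadn't noticed before.
```

Even a short entry like this demonstrates judgement: what was asked, what came back, and what the learner checked before keeping it.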

Rollout of the interventions and some discussion with students

Both tools were introduced through a one-hour GenAI workshop in class time, combining discussion, instructions, and some hands-on exercises on working with Copilot.

The GenAI workshop with students in the Math&Stats unit.

Miro responses to the question “What did you use AI for? (What was the model? What questions/prompts did you ask? How did it go?)”

Miro responses to the question “What excites you about using AI in the university? What worries you about using AI in the university?”

A note on research ethics: The project was conducted in keeping with the PgCert Ethical Action Plan. Students who chose to participate anonymously in the project were properly briefed and debriefed, with a participant information sheet and a consent form.

Reflections on the results

A summary of findings that stood out:

  • First, a number of students opted not to use AI, and their work did not suffer. In fact, many demonstrated strong engagement with the course material. This reiterates the point: encouraging responsible use of AI should not penalise those who choose not to use it.
  • Second, students are worried about misconduct. This again suggests the need for transparency around assessment, i.e., clarity about how AI use will be interpreted and marked: a common ground between the marker and the learner.
  • Third, reflection is more valuable than completeness. The most effective AI citations were not the most detailed, but the most thoughtful. A “good” chat log should show how learners evaluate AI output, question why something worked (or didn’t), and link decisions back to learning outcomes. Conversely, asking students to log every trivial AI interaction is stressful and burdensome. Treat the log as an additional channel for students to demonstrate assessment criteria, not as a forensic tool.

On top of that, the perception of AI varies. Students’ resistance to AI, on account of its consequences and risks, should be acknowledged. In this respect, if tools like Copilot really are being used as productivity tools, embedded in technical practices and actually enhancing efficiency, how do we ensure fair assessment? That is, for those who opt out of using AI, how do we ensure that they don’t feel left out, and that they are marked equally? These aspects should be communicated in the guidelines.

Some thoughts on broader issues

I argue that a single checklist, guideline, or tailored intervention is not sufficient to reduce the barrier of AI literacy. Instead, such interventions should belong to a larger change. In keeping with the criticality expected across many other assessment criteria, the critical use of AI needs long-term practice. If AI literacy is a skill, it needs space to be practised, discussed, and revisited across units, the same as many other technical skills.

Some guiding questions that I raise, in the context of creative coding pedagogy:

  • Can we incorporate “coding with AI” as a technical skill that needs to be learned and practised?
  • Can we demonstrate what a good programmer is, in the era of coding with chatbots?
  • Can we adapt the assessment criteria to better accommodate the use of GenAI, such as assessing the skill of prompting, the skill of curating, and the skill of giving context to chatbots?

References

  • Prabhudesai, S., Kasi, A.P., Mansingh, A., Das Antar, A., Shen, H., Banovic, N., 2025. “Here the GPT made a choice, and every choice can be biased”: How Students Critically Engage with LLMs through End-User Auditing Activity, in: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3706598.3713714
  • Shukla, P., Naik, S., Obi, I., Backus, J., Rasche, N., Parsons, P., 2025. Rethinking Citation of AI Sources in Student-AI Collaboration within HCI Design Education, in: Proceedings of the 7th Annual Symposium on HCI Education, EduCHI ’25. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3742901.3742909