Episode 19 - Kari Weaver: What's the Line Between Research Integrity and Using AI as a Tool?
===

[00:00:00]

Priten: Welcome to Margin of Thought, where we make space for the questions that matter. I'm your host, Priten, and together we'll explore questions that help us preserve what matters while navigating what's coming.

We talk about how students and teachers should use AI, but one of the most pressing and least resolved questions is how we should document that use. Who gets credit, what gets disclosed, and how do we build a research and learning ecosystem we can actually trust? Today's guest is Kari Weaver, a librarian, educator, and program manager for the Artificial Intelligence and Machine Learning Initiative at the Ontario Council of University Libraries. She's also an adjunct faculty member at the University of Toronto and the creator of the AID Framework, a practical tool for helping students and researchers disclose their use of AI with clarity and [00:01:00] consistency. We're going to talk about why citation isn't enough, what a global AI disclosure standard might actually look like, and why the thorniest questions about authorship and integrity don't have clean answers yet, and why that's not a reason to stop asking them. This is about building the infrastructure of trust that AI in education desperately needs. Let's begin.

Kari: I am Kari Weaver. I am a hybrid librarian and educator, and currently I'm the program manager for the Artificial Intelligence and Machine Learning Initiative with the Ontario Council of University Libraries. And I am in this role on a secondment, or sort of on loan, from my day job as the learning, teaching, and instructional design librarian at the University of Waterloo Libraries.

The University of Waterloo, if it's not familiar to you, is a large public research university in [00:02:00] Canada. And if we think back into yesteryear, it was actually the university where BlackBerry, the precursor to the modern smartphone, was developed. So it's a very STEM-focused institution on the education side.

In addition to those roles, I'm also a sessional or adjunct faculty member at the University of Toronto in the Ontario Institute for Studies in Education, where I teach graduate students about teaching and learning in higher education and educational research. So I have a variety of different hats, but those different hats, or experiences, are really crucial to what I'm doing with AI, which is that currently I am not only training and educating and researching about AI, but I am also overseeing projects at the Ontario Council of University Libraries, which I'll call OCUL from here [00:03:00] on.

At OCUL we have several large-scale projects where we're looking at essentially already-automated workflows in library spaces, and the extent to which we can use artificial intelligence to help augment that work, to improve the services that we're able to offer to faculty, staff, and students, and hopefully ultimately the public.

But we're doing that in ethical ways, where we can protect people's privacy and make sure that we're not violating intellectual property: all of the many ethical concerns that libraries are concerned with overall. And certainly there's a component where, when we own it and we control it, we can also be much [00:04:00] more aware of the extent of the environmental footprint and what sustainability might look like in that context.

So, among the projects that we're working on right now where we're looking at augmenting these workflows, we've just finished one where we've looked at how effectively we can train Whisper to do transcription of existing audiovisual materials within library collections that simply aren't accessible, because the amount of lift and time that's required for human transcription is significant and beyond what staffing in libraries can really support.
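To make that concrete, here is a minimal sketch of the kind of batch transcription workflow Kari describes, using the open-source whisper package. The model size, directory layout, and file format here are illustrative assumptions, not details of OCUL's actual pipeline.

```python
# Minimal batch-transcription sketch (pip install openai-whisper).
# Paths, model size, and language choice are illustrative assumptions.
import whisper
from pathlib import Path

model = whisper.load_model("medium")  # larger models tend to handle French better

def transcribe_collection(audio_dir: str, language: str = "fr") -> None:
    """Transcribe every MP3 in a directory and save plain-text output beside it."""
    for audio_path in sorted(Path(audio_dir).glob("*.mp3")):
        result = model.transcribe(str(audio_path), language=language)
        audio_path.with_suffix(".txt").write_text(result["text"], encoding="utf-8")
        print(f"Transcribed {audio_path.name}")

transcribe_collection("collections/oral_histories", language="fr")
```

In practice a project like this would also need evaluation against human transcripts, which is where the French-versus-English capability question Kari raises next comes in.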

I think, as many people would know, funding and staffing in higher education is not great, and libraries certainly suffer from that as kind of a downstream impact. We've just concluded that project, and [00:05:00] it is also particularly interesting because at OCUL we do have collections and services that we're offering bilingually, in both French and English.

So when we're considering these things, we have to actually look at the capabilities not just in English, but also in French. And not just French, but Québécois French, which is particular. We've just finished that, actually quite successfully. We have another project where we're looking at whether or not a chatbot could be used both to help end users and to help the people who staff our consortium chat reference service.

We have another project where we're looking at whether we can improve the remediation of accessible books and book chapters that we provide as an existing service within Ontario and, more broadly, within Canada. And then we have [00:06:00] a fourth project where we're looking at whether we can do large-scale metadata extraction from a collection of about 50,000 historic Canadian government documents. Right now you can find that a document exists, but you cannot tell what information might be in there without looking at the document, and some of these things are 2,000-plus pages of records for an individual document. That's again another example of something where, with AI, we can explore doing this. Without AI, this work would never happen.
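As a rough illustration of that metadata-extraction idea, here is a sketch that asks a language model for structured fields from the opening of a long document. The OpenAI client, model name, and field list are assumptions for illustration; the episode doesn't say which tools OCUL's project actually uses.

```python
# Hypothetical metadata-extraction sketch; not OCUL's actual pipeline.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract metadata from this excerpt of a historic Canadian government "
    "document. Return JSON with keys: title, issuing_body, year, subjects."
)

def extract_metadata(document_text: str, max_chars: int = 12000) -> dict:
    """Ask the model for structured metadata from a document's opening pages."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": document_text[:max_chars]},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

For 2,000-page documents, a real pipeline would need chunking and reconciliation across excerpts; this only shows the shape of the extraction step.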

Priten: I wanna talk a little bit more about those various projects, but oftentimes when folks are thinking about the role of librarians in higher education right now, a lot of it is about research integrity. And I'd love to start there: in terms of how you're thinking about it, both in your classroom but also supporting peers and working directly with students, what has that conversation [00:07:00] looked like, and how has it evolved over the course of the last few years?

Kari: Well, research integrity is a really interesting space. What has been determined in the research integrity space is that you can use artificial intelligence for research. That is clear. But what isn't clear, and this is not necessarily the work I've been doing at OCUL, but one of those pieces that's at the nexus of all of the different work that I do with artificial intelligence, is that for research integrity, we need to have a way to capture what AI tools and technologies people are using in different facets of the research process, and how they used them. We need some specification about the model or the timing of that, because if we discover a year down the line [00:08:00] that a particular iteration of an AI model had significant bias, perhaps toward a particular population, that could certainly be something that impacts how we accept or reject that research in the long term within the scholarly record.

Unfortunately, the way that we traditionally have done this is we either incorporate this information into the methodology or we use citations as a core piece of how we communicate our positioning and connection to the existing scholarly record. And artificial intelligence is really interesting in that it's disconnecting the ideas from the place where those ideas originated. It's actually quite funny, not funny in a truly amusing way, but funny in a "you don't like it when it happens to you" sort of way. [00:09:00] That disconnection between the ideas, the place, and the originator is what we have done to traditional knowledge and lived knowledge in the way that we do knowledge production in a scholarly fashion.

And AI is actually now making that happen to people who produce scholarly work. From a research integrity perspective, what this means is we can't actually use our existing practices; we have to come up with a different system. So that's really moved in the direction of AI disclosure and generating some sort of structured AI disclosure statement.

Most journals now, if AI is allowed in the research, do require a disclosure statement. What's challenging right now is that the disclosure statement is sort [00:10:00] of semi-voluntary, and it's inconsistent. So often when you're publishing, you have the situation where you have a journal you'd prefer to publish in, but it might be a little bit of a stretch.

You do the work, you submit it anyway, and if it doesn't get accepted, at least the feedback will help you improve it when you submit elsewhere. From a research workflow and integrity standpoint, if I have to give different disclosures, or I have to make decisions about where I'm publishing based on what use of AI is allowed or on different ways of disclosing AI, that's not very helpful in building a research ecosystem where we have an understanding of how and in what ways people are using these tools.

And while a solution might be to just say, well, it's AI as a tool incorporated in your method, the reality [00:11:00] is that there are a lot of uses of artificial intelligence that might not be methodological in nature. I'll just use an example related to that French-English dichotomy: I had a recent question from a researcher about being able to use AI to translate some articles that they really wanted to read but that were in a language they weren't familiar with. It was a really critical scholar in a new area of scholarship that they wanted to be able to engage with. So while we were able to figure out a way, aligned with licensing and copyright restrictions, to allow them to do that, they're then going to want to use that translated information in their research. That's not a methods issue, but that is a use of artificial intelligence.

[00:12:00] And depending on the tool that they're using for that translation, that could certainly impact their interaction with, and thinking about, that information. So that's a very concrete example of a place where artificial intelligence use in research is happening, and it's actually happening in a really beneficial way, supporting the research and building better connections across people doing this work across the globe.

But we couldn't really integrate that into the methods section. So there are all these sort of nuanced challenges about this.

Priten: We've always struggled with the predominant types of advice that folks are giving students, which are to cite it or to include it in some sort of methodology section, especially for a scholarly publication, and neither of those really feels adequate. And I think you hit the actual tension, which is [00:13:00] that there are parts of the research process that are not about answering the actual research question, and nor is AI being used there as a source of authority on its own, which hopefully most folks in the research world are not doing. But when we think about that in terms of student usage: there's value to scholars having a standardized disclosure statement, and you pointed out, just in terms of the publication process, making sure that we're not making it super onerous on folks to apply to different journals and that it doesn't influence which journals folks are applying to. Those are all very concrete reasons why we ought to have a standardized disclosure statement. What we've noticed is that across educators there are very different standards for this. Do you think that the kinds of disclosure statements you're pushing for research purposes might also be pedagogically valuable?

Kari: Well, I think they are pedagogically valuable. And I say this because, in my work at the University of Waterloo, before I transitioned to my [00:14:00] current role at OCUL, I had occasion to develop a framework for disclosing the use of artificial intelligence. It's called the Artificial Intelligence Disclosure (AID) Framework, and that acronym is intentional, because it should be helping you. The AID Framework was really built specifically out of a need from graduate students: we had a number of graduate students who were in the process of finishing their thesis or dissertation and were at risk of their supervisor not signing off, because the supervisor didn't feel comfortable or confident in how they had used or disclosed or cited their use of artificial intelligence. And so I was tasked, not with inventing a framework, but with coming up with some sort of solution to that. And it ended with the AID Framework. [00:15:00] But in creating that, I did quite a lot of work with individual classes on how to implement it, and I think there is a need for some consistency, because the student experience is that they're taking three, four, sometimes five courses all at the same time, and they're experiencing such a range of policies and expectations.

The other thing is, if you are really trying to have an understanding of what students are doing with artificial intelligence, you need to give them some clear tools.

So one of the things that I did with the AID Framework was actually to generate a rubric. The framework has you give information on the AI tool that you're using, and then it has a taxonomy of different use cases for artificial intelligence, essentially in [00:16:00] learning or research contexts. And so the first part of the rubric is: did you use it for this purpose, yes or no? And then if it's yes, you can actually go in and describe it. I think the challenge right now is that getting students to disclose at the graduate level is really easy. If you ask them to, they will, because this is really impacting them quite heavily.
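To make the structure concrete, here is a minimal sketch of how a rubric-style disclosure like the one Kari describes might be represented and rendered. The use-case names and fields here are illustrative stand-ins, not the actual AID Framework taxonomy.

```python
# Illustrative disclosure rubric: tool information plus a yes/no per use
# case, with a free-text description for each "yes". Category names are
# hypothetical, not the real AID taxonomy.
from dataclasses import dataclass, field

USE_CASES = ["brainstorming", "literature search", "writing/editing",
             "code generation", "data analysis", "translation"]

@dataclass
class Disclosure:
    tool: str                # e.g., "ChatGPT"
    model_and_date: str      # e.g., "GPT-4o, accessed 2025-03-01"
    uses: dict[str, str] = field(default_factory=dict)

    def record(self, use_case: str, description: str) -> None:
        """Mark a use case as 'yes' and describe how the tool was used."""
        if use_case not in USE_CASES:
            raise ValueError(f"Unknown use case: {use_case}")
        self.uses[use_case] = description

    def statement(self) -> str:
        """Render a consistent, human-readable disclosure statement."""
        lines = [f"AI disclosure for {self.tool} ({self.model_and_date}):"]
        for case in USE_CASES:
            lines.append(f"- {case}: yes. {self.uses[case]}"
                         if case in self.uses else f"- {case}: no.")
        return "\n".join(lines)

d = Disclosure(tool="ChatGPT", model_and_date="GPT-4o, accessed 2025-03-01")
d.record("translation", "Translated two French abstracts for background reading.")
print(d.statement())
```

The point of the structure is consistency: the same yes/no taxonomy can travel with an assignment, a thesis, or a journal submission.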

At the undergraduate level, voluntary disclosures are a little bit more difficult. So one of the things that I've found in practice, both myself and working with colleagues and students, is to separate the disclosure from the assignment but have them submitted at the same time. So maybe you give them the rubric, or you ask them to generate the disclosure, and have them submit that to a dropbox for that assignment.

And that is used not only as an opportunity for a student to [00:17:00] practice that disclosure, which is an expectation in the research world and also, in many jurisdictions, increasingly a legal requirement in the corporate working world. But the disclosure is then there, and it can either be marked separately or just be available so that, if there are questions or something seems a little off with the assignment, the instructor or the TA can look at it and use that information to help guide the conversation with the student about what happened and whether we're really meeting the learning goals we were trying to achieve with that particular assignment. So those are a couple of options.

The other thing that I will sometimes recommend [00:18:00] when working with disclosure: if we're using a framework, we can also pre-select the uses that we want students to disclose, and I often recommend this especially with first-year students. Use a standard framework, but tell them what disclosure you're expecting.

And I think that's particularly valuable when we're thinking across disciplines. In the STEM disciplines, a lot of student use of AI is really around problem solving and problem iteration. And then in the humanities and social sciences, it's really a lot more sort of writing, editing, iteration on that side. These are very different uses of artificial intelligence even for students at the same level, just based on disciplinary differences and norms.

So thinking through that and giving them that concrete [00:19:00] guidance, I think, is helpful. And if it can be set within a standardized framework that's being used across the institution, all the better.

Priten: There's still a component of the disclosure statements that requires buy-in on the part of the person doing the disclosing, and that's true of the undergraduate students but also true of researchers. What is the conversation like about enforceability? And beyond that, how are you viewing the culture of research integrity? Because we've had problems with research integrity in the scholarly space, and we've had problems with integrity in assessments, but the scale of this is much larger, I think, than anything we've seen before.

And there are a lot more unknowns. I think the standards are a little bit less clear than we've had in the past with plagiarism, where there are bright lines. We don't have bright lines here. But what does the conversation look like when you're pushing for disclosure statements?

Kari: What I would say is, I don't have a solution, right? The solution is ultimately multi-pronged and systemic. But what I do [00:20:00] know in practice is that if you're clear with people about your expectations, and you're also clear about how they can meet those expectations, they're much more likely to conform to that. We're in a space where a lot of the move toward disclosure is this: if we all collectively agree, as educators and researchers, that disclosure is in fact the answer, not citation, not something else, and we can generally come to a collective agreement on how we want to do that,

then we're going to be in a space where, ultimately, it's a cultural change, and the expectation becomes normalized and consistent. It's the same thing with plagiarism. Are you [00:21:00] going to have students who plagiarize or violate academic integrity? Yes, there are going to be some students who will take the shortcut and do that.

As an educator, at some point I do my best, but that's kind of not my business. My business is the folks who, if given the structure and the motivation, would do it. Those are the people I'm trying to reach, and that's most of the people, right? When we think of the bell curve, we're trying to reach the folks in the middle.

I think as it becomes more normalized, it will become easier to get conformance or compliance. The other thing is that, certainly, on the research side we're looking toward agentic research workflows, and disclosure is really important when we're thinking about agentic [00:22:00] workflows. Not only are you, as a human, supposed to be checking in throughout those research workflows, but with a structured disclosure statement that is consistent, we can actually ask agentic workflows to produce a breakdown of that disclosure statement for us. That sounds funny, but it gives you a much better sense of how that automated workflow is responding to the tasks that you've given it. So it actually works fairly well. But the key is having some structure and standardization to it.
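One way to picture that is as a standing instruction attached to every agent task, so the workflow reports its own use breakdown against the same disclosure structure a human would use. The wording and field names below are illustrative assumptions, not drawn from any published standard.

```python
# Sketch: attach a consistent disclosure instruction to any agent task so
# the workflow emits its own structured use breakdown. Field names are
# hypothetical.
DISCLOSURE_INSTRUCTION = """
At the end of this task, append an AI-use disclosure with one entry per step:
- step: what was done
- tool/model: what was used
- use_case: one of [literature search, summarization, translation,
  code generation, data analysis]
- human_checked: yes/no
"""

def wrap_task(task: str) -> str:
    """Attach the standard disclosure instruction to an agent task prompt."""
    return task.strip() + "\n\n" + DISCLOSURE_INSTRUCTION.strip()

print(wrap_task("Summarize the three attached articles on research integrity."))
```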

Priten: What does that structure look like? Not even just for the agentic flow, but in general. When I think about the number of places where you encounter AI these days, even just in the last six months, I feel like it has changed quite a bit. What are the kinds of things you're considering when you think about what needs to be included in a disclosure versus [00:23:00] not?

And the particular examples I'm thinking of are things like: you're in Excel and you use it to generate formulas, or you're in Google and you choose some of the results that Google's AI summary provides as your starting points. They're very tiny touchpoints, but it becomes massive; it influences quite a bit how folks go about their research and writing.

And there's not a very clear line, for me at least, in terms of, okay, this has to be disclosed, and this we could probably get away with. There are obvious cases, but I'm curious about how you're navigating that middle ground.

Kari: So what I would say is, I don't have answers to that, but there is a very large-scale effort that I'm lucky enough to be collaboratively leading with Bert Segers, who's the president of the European Network of Research Integrity Offices and works at the Flemish Commission on Research Integrity, in collaboration with the International Science Council, the Committee on Publication Ethics, STM, and [00:24:00] several other large publishing, research, and research integrity organizations. The goal is to really dig in with people who have specific expertise on this, work through some of these issues, and have some of those conversations, to land at a place where we could have a global AI disclosure standard that gives some more guidance on some of these thorny issues.

So I hope it's something that we'll have a better answer to within the next year. We're doing this work through the World Conferences on Research Integrity, as their focus track for this year, because it does seem like we are far enough into our journey with integrating AI into research and education that now is the time to really dig in and do [00:25:00] that work. We have an incredible network of folks who are working on this and really thinking quite deeply about these issues. So, despite being someone very intimately connected with this and working on it, I don't have answers yet. But it is to say:

people are working on it. Some of the best minds are working to really answer those questions. I think we'll probably arrive at a place that everybody feels equally discontent about, which will probably be the correct answer.

Priten: When you think about the community response in general to this project, are you getting pushback in terms of whether this is the right way? I'm curious, because the two alternatives you already cited have been citations, or incorporating it in the methodology section.

Are you seeing folks advocate for those more strongly, as against just disclosure statements? And are there alternatives that I haven't thought of, or haven't heard about rather, that other folks are [00:26:00] suggesting?

Kari: That's a complex question. What I would say is, certainly people have different opinions, but I think there is a general understanding that citation is kind of not it. I appreciate that many of the large citation organizations have themselves tried to grapple with this.

They're not really able to overcome that disconnect between the ideas and the source, because the source, the thing, is really what they're concerned with when it comes to citation. So I think there is general agreement that citation is really not it, unless what you're citing is a discrete thing that you have produced with AI.

Like, if it is an AI graph that you have produced from a data set, then you could probably cite that graph, and that would make sense. But [00:27:00] for a lot of the uses, that disconnection just really prevents citation from being the answer. I think there is a faction that does very strongly feel we should just integrate it into the methodology, and that's the answer. But then when you get into a lot of these use cases, they can't articulate how you would manage to integrate that into a methodology. So I think it's more of a process of elimination by which we arrive at the disclosure statement as probably the thing that moves forward. And truthfully, I think that's okay. Realistically, if we already had something in the ecosystem that filled the need, I would be very supportive of making that adaptation and moving forward with it. But I think that divide on [00:28:00] the methodology often falls along disciplinary lines. People in the social sciences and humanities tend to be like, well, this doesn't fit in the method,

and people in STEM fields feel like, well, of course this fits in the method: I'm using these tools to write my code, to analyze this data set. Both of those perspectives are valid and correct, but if we're thinking about this as part of the whole research integrity and educational ecosystem, we have to meet all those needs, right?

So we have to meet philosophy and we have to meet theoretical physics. And both of those things have to be addressed by what we're doing.

Priten: You were talking about agentic flows. I come from a philosophy background, and I was trying to figure out what a valid use of that in philosophy might be, and that's much harder for me to imagine than in the hard sciences and probably the social sciences. I realize that none of [00:29:00] these answers are gonna come overnight. But one of the ways that I've been framing it for some educators is that this is like the classic philosophy problem: you have a wooden ship and you're replacing it plank by plank, and when is it really still the same ship?

And similarly with our writing and our research: at what point is it the AI's, as much as we can attribute possession to a piece of tech, and at what point do you lose ownership of your own research and writing? Are there concrete thoughts coming out?

What are some of the biggest questions, not just about disclosing, but about when it even makes sense to say this is yours?

Kari: The authorship question is so thorny. It always has been, though, in the sense of: how many folks have gone through some level of graduate study and felt like they made a contribution to something that their lab or their supervisor was doing, and weren't credited with authorship [00:30:00] in a publication or a presentation about it?

And so I just want to acknowledge that these tensions don't just exist for AI; they exist in other aspects of this ecosystem. I think what you have to do is really look at what the core, irreplaceable elements of that particular work or that particular method might be.

And I think you do need to really ask yourself, when doing this work: is this aspect of it something that I need to be doing? You know, I am by training a qualitative researcher. I do a lot of focus group and interview-based studies. I need to do the [00:31:00] coding for that. I am the research instrument. I don't want to outsource that to AI, because that is the work.

I might be able to use AI in that context if I'm having trouble naming an idea in coding; I can get some suggestions and so forth. But the actual work of doing that coding, that's the work. And honestly, that's the thing I like doing anyway, so I don't want to farm that out.

So I think these are the things. And for student development, one of the conversations that I'm often having with students is: where are your skills? What are the things that you are trying to upskill on? You know, if searching for literature is truly something that you've already upskilled [00:32:00] on and you wanna use AI-assisted search tools to help with that,

sure, fine. As a librarian, a professional librarian, that's totally fine with me. But if this is something that you haven't spent a lot of time on, and your skills are really not up to the level of your study or expectation, then that's an opportunity to spend some time sitting with it and really learning to do that part of whatever the work is that you're doing.

And I think a lot of it comes down to that. The challenge, especially on the research productivity side for those in tenure-track positions, is that there's a certain amount of production that's expected and required. There has [00:33:00] been continuing conversation about that, and movement across many institutions toward reconsidering those requirements in a way that really focuses on quality and impact over quantity. I think what we're probably going to see is a continued drive in that direction. That doesn't really answer the authorship question, though, which is to say I don't have an answer. I think you have to come to those places yourself as you are doing the work.

Priten: I'm thinking about the coding in particular, right? You've isolated that as an important part of the research process for you, something you need ownership over to feel good about your research. I've seen AI coding platforms that take survey data, that take interviews, and that will code them for researchers. And I don't think they're viewed as cheating tools or whatever, right? They're [00:34:00] a standard part of the research process. So do you think this will all end up being individualized, what you need in order to feel a sense of authorship, or do you think for some of these things we might come up with some sort of cultural or community-based standard?

Kari: I think we're going to see discipline-based standards around a lot of these things. I think that's really the place, and that is not happening right now. There is not consistency within discourse communities around what use is acceptable or what level of use is acceptable.

Those AI-assisted coding tools, I absolutely have experimented with them, and I can tell you that not only do they make me as a researcher feel more disconnected from my work, but I'm [00:35:00] often not happy with how they actually categorize things, and I don't feel, when I look at the output, that it represents the work.

The tools do continue to improve, so maybe my opinion will change. But I think there is certainly a difference depending on the kind of work that you do, and I do think we need to allow space for that. That doesn't prevent us from moving forward with disclosure, though.

And I do have to say, as a librarian and an interested party: what does that mean for interdisciplinary research contexts, which we're often being pushed more toward? I don't know. I think that's perhaps the next frontier of the really big cultural conversation that we need to have in the scholarly community.

Priten: I would love to share with the audience [00:36:00] how they can see more about the work that you're doing and stay updated on any of the results of the conversations that y'all are having. If they have other thoughts that they wanna share with you or the consortium, where might they go?

Kari: A few different places. You can find more about the work that we're doing at OCUL at ocul.on.ca, which is our website; we post regular updates there about our progress on our exploratory projects and the work that we're doing with AI. For information on the work to establish an international artificial intelligence disclosure standard,

the majority of that information is being hosted on the International Science Council website, and more can also be found on the World Conferences on Research Integrity website. We will be moving to the second round of consultation on that work, which will really focus on the content of disclosure, or what we need to disclose with AI, which is part of why we'll be getting [00:37:00] into some of those conversations in much more detail. And more information about my work on artificial intelligence disclosure can be [email protected], as well as a helpful statement builder. So if you're looking for a way to help students, especially in educational contexts, semi-automate their disclosure process, that statement builder is really there to support that work.

Priten: Awesome. Thank you so much.

I appreciate Kari for joining us and bringing a level of rigor and nuance to the AI disclosure conversation that is long overdue. Kari reminded us that we need to build systems that make honest, transparent use the path of least resistance. Her work on the AID Framework and the global disclosure standard is exactly the kind of structural thinking we need to match the pace of technology.

Keep listening as we continue exploring the ethics of [00:38:00] education technology, and pre-order my book for more on this at priten.org. Thanks for listening to Margin of Thought. If this episode gave you something to think about, subscribe, rate, and review us. Also, share it with someone who might be asking similar questions. You can find the show notes, transcripts, and my newsletter at priten.org. Until next time, keep making space for the questions that matter.