An AI prompt to help development agencies deliver on their promises

Development agencies are under pressure to prove that anti-corruption efforts actually work, and that they deliver value for money. Many rely on theory of change frameworks to show how interventions solve problems. But when it comes to anti-corruption, this approach often falls short. Corruption risks are selectively identified and then often left untreated, leading to poor development outcomes.
For the most part, it is the responsibility of those seeking donor funding to present risk matrices and credible theories of change. Each year the process is repeated as funders require updated risk matrices.
Full-spectrum risk assessments are viewed by many in the development system as costly, time-consuming, and problematic, as they involve asking uncomfortable questions. Most risk assessments therefore fail to integrate both fiduciary concerns around their programmes and wider political-economic constraints. Turning those risk assessments into action plans, implementing them, and tracking results adds even more strain on already stretched staff.
AI can ‘supplement the human factor’ in under-resourced programmes
Despite being a cross-cutting priority for development agencies and implementing partners, anti-corruption is hard to integrate at the programme level. Financial constraints, knowledge gaps, and limited local capacity make it tough. If we want anti-corruption to succeed in an era of shrinking budgets, fewer local staff, and increasing fragility of states, we need smarter, faster ways to embed risk thinking into programmes. In terms of capacity and quality, we need to think of ways to supplement the human factor.
AI-driven tools and prompt engineering could be the key to making this process easier, more consistent, and more actionable on the ground. Their potential to distil research, evaluations, and survey data into a solid risk register, which can then guide evidence-based and locally anchored mitigation strategies, is considerable and largely unexplored.
Until now.
AI tools can generate useful risk registers from a tailored prompt
We have started to explore whether AI tools could quickly generate effective corruption risk matrices at the country, sector, or intervention level. Research in financial integrity and risk management shows that well-crafted prompts can improve the predictive accuracy of commercial AI tools, often outperforming traditional models used for credit risk or market forecasting. This suggests that carefully designed prompts could help general AI produce reliable and cost-effective insights into corruption risks across different levels of development cooperation.
The idea was to test whether a tailored prompt could create a ready-to-use framework that international development staff could easily adapt. By making risk identification more accessible, staff can anticipate and address challenges earlier in the programme design process, leading to more effective and impactful aid delivery.
Our tests confirmed that this is possible. Using AI can drastically reduce the time and cost of developing a corruption risk matrix. But there are caveats. The process, along with the opportunities and challenges of developing and maintaining such a prompt, is outlined below, together with reflections on next steps.
The final prompt can be found in the annex at the end of this post, or you can download it. Run it yourself and share your thoughts on our approach and reflections directly with me.
A prompt can help structure thinking at country level
We built an AI prompt to generate a ‘level 0’ corruption risk register. The process combined prompt-development best practices, benchmarking against established corruption risk assessment methodologies, and expert validation from U4 advisers.
The goal was not to automate expert judgment, but to create a tool that organises information and creates a risk register which tells development practitioners where to look and what should be prioritised at country level.
Below, we share what we learned from the process, highlighting the steps we found most crucial when developing a prompt that works.
Lessons from developing the risk-matrix prompt
Applying prompt-engineering best practices improves precision and clarity
We began by designing prompts that were clear, specific, and well-structured. Each prompt was iteratively refined based on the model’s outputs, and the tone of the outputs was adjusted to suit the intended audience. For example, adding ‘Norad’ as the specific reference institution and defining ‘case officers’ as end users made the outputs more relevant and tailored.
It was essential that the risk matrix dimensions matched those in the source documents. If they were not aligned, the model often ignored instructions and produced composite or inconsistent answers. For example, we adjusted the wording of the prompt from ‘civil society’ to ‘civic space status’, which is the term used as a heading in the source document. Enforcing dimensional consistency in this way reduced confusion and strengthened the reliability of the outputs.
Weighting sources increases transparency and trust in outputs
We also refined how the AI bot weighted its sources. We used expert (human) judgement to determine which materials should be assigned more influence, instead of relying on the AI to do so. Assigning and prioritising key references in this way helped the model to better reflect the most authoritative evidence. This step increased the explainability of the results and made it easier for people to judge the credibility and robustness of the final matrix.
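To make the weighting step concrete, the sketch below shows how expert-assigned weights might be combined into a single composite signal per risk dimension. It borrows the illustrative weights from the annex prompt (CPI 30%, WJP 20%, BTI 30%, Enterprise Survey 10%, PEFA 10%); the indicator values are invented, and the normalisation of all signals to a common 0–1 scale is an assumption, not part of the actual prompt.

```python
# Expert-assigned weights, mirroring the example in the annex prompt.
# These are illustrative and should be revised (and documented) per dimension.
WEIGHTS = {
    "CPI": 0.30,
    "WJP": 0.20,
    "BTI": 0.30,
    "Enterprise Survey": 0.10,
    "PEFA": 0.10,
}

def composite_signal(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of indicator signals (each assumed on a common 0-1 scale).

    Re-normalises by the weights of the indicators actually present, so a
    missing indicator does not silently drag the composite towards zero.
    """
    total_weight = sum(weights[name] for name in signals)
    return sum(value * weights[name] for name, value in signals.items()) / total_weight

# Hypothetical normalised risk signals for one dimension
signals = {"CPI": 0.8, "WJP": 0.7, "BTI": 0.75, "Enterprise Survey": 0.6, "PEFA": 0.65}
print(round(composite_signal(signals, WEIGHTS), 3))  # -> 0.73
```

Documenting the weights alongside the output, as the prompt requires, is what makes the composite explainable: a reader can re-derive every number by hand.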
Benchmarking ensures credibility of the output
The AI-generated matrices were compared with established corruption risk assessment methodologies, including the OECD Public Integrity Handbook (2020) and the UN Global Compact’s Guide for Anti-Corruption Assessment. This benchmarking confirmed that the AI’s outputs aligned with recognised methodologies, increasing the credibility of the output.
Expert oversight determines the reliability of AI outputs
Experts are essential for selecting sources and validating results. AI can arrange information neatly, but it cannot check whether sources are credible or contextually relevant. Expert oversight is an indispensable safeguard against the model generating misleading outputs.
When we used broad indexes or global assessments as reference material without country-level detail, the AI often produced errors. Anchoring each entry to concrete, credible sources greatly improved accuracy. This showed how critical expert curation is for generating meaningful results.
Adjusting the risk scale makes results more actionable
Our initial two-tier risk scale – ‘high’ and ‘critical’ – proved too coarse, as most corruption risks already fell into the ‘high’ category. We replaced it with a four-tier scale (critical/high/medium/low), with clear definitions of each level. This simple change improved the model’s prioritisation and made it easier for development workers to distinguish the most urgent risks from those requiring routine attention.
Clear and specific prompts are essential for dependable results
Precision in prompting made all the difference. Vague instructions led to weak or inaccurate answers. For instance, asking about ‘the UNCAC gap’ without naming a source or year caused the AI to invent details or skip key issues. Reframing the question as ‘gaps in anti-corruption legislation’ and linking it to a single, clearly defined source produced much stronger results.
Outputs only became reliable when every dimension was anchored to one specific reference. Asking the AI to cite page numbers improved explainability and made it easier for development workers to check the evidence. These small adjustments built trust in the process and in the results.
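One way to picture this anchoring: each dimension is bound to exactly one named source, and the prompt demands page-level citations. The dimension names and source titles below are hypothetical placeholders for illustration, not the ones used in the actual prompt:

```python
# Hypothetical "one dimension, one source" register. In practice, experts
# curate this mapping; the titles here are invented examples.
DIMENSION_SOURCES = {
    "civic space status": "CIVICUS Monitor country report (2024)",
    "gaps in anti-corruption legislation": "UNCAC implementation review (2023)",
}

def build_dimension_prompt(dimension: str) -> str:
    """Render a prompt fragment that anchors one dimension to one source."""
    source = DIMENSION_SOURCES[dimension]
    return (
        f"Assess '{dimension}' using only: {source}. "
        "Cite page numbers for every claim. "
        "If the source does not cover a point, say so rather than inferring."
    )

print(build_dimension_prompt("civic space status"))
```

The final instruction in the fragment is the guard against invented details: when the model is told what to do with missing coverage, it is less likely to fill the gap with fabrication.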
There is consistency across AI platforms but quality still varies
We tested the prompt across several large language models:
- ChatGPT (free and paid versions)
- Deep Search
- Le Chat
- Perplexity
The outputs were broadly consistent, even across different systems. They all produced the same overall risk categories, reflecting the robustness of the original prompt.
However, paid or more advanced tools did perform better, providing more detailed responses. This reminds us that access to higher-quality AI models may shape future analytical capacity. This is an important consideration, as part of the original motivation for this experiment was the hope that AI could help to overcome resource constraints. We hope that access to high-quality AI does not become a new divide, preventing those in resource-constrained settings from using these tools to their fullest potential.
Without expert interpretation, the recommendations are generic
AI-generated recommendations often appeared structured and convincing. On closer review, however, they were too generic and lacked the sector-specific or contextual depth needed for action. Many focused narrowly on the financial side of grant management and failed to consider corruption as a broader development obstacle. This led to advice such as improving safeguards on new disbursements, adding anti-corruption clauses in contracts and grants, and strengthening checks and balances. While these may sound reasonable, experts still need to interpret, adapt, and refine them.
For example, based on the risk analysis and the available menu of interventions, an AI system might propose establishing independent recruitment commissions. On paper, this recommendation aligns well with the risks identified and the data the system has processed. What the AI cannot know, however, but local actors or sector experts often do, is whether such commissions have already been tried and failed, whether they already exist and function effectively, or whether they are in place but could be further strengthened.
Treated properly, the recommendations can serve as starting points for reflection, useful for exploring which actions might reduce corruption risks in each programme or country context, rather than as ready-made solutions.
Prompting with human oversight can help us turn risk into results
The prompt creates a useful corruption risk register. Development practitioners can use this register, alongside their own good judgement and with assistance from experts, to map programmatic corruption risks at country level. The caveat is that the AI bot cannot do this alone; it needs experts to curate the data for the result to be useful rather than generic. At this point in time, these tools cannot generate data specific enough for this to work without expert input, oversight, and interpretation.
For this approach to work and provide results, systems must be in place. One option is to establish an internal expert group, responsible for designing and testing the prompt, identifying reliable data sources, and setting appropriate weights and benchmarks.
Another option is to outsource to an external service provider, such as the U4 Anti-Corruption Resource Centre, with expertise in prompting as well as the human capabilities and technical know-how to quality-control the outputs. For such an alternative to be attractive to development agencies, it would likely need to be quicker and cheaper (and perhaps higher quality) than the internal option.
Development practitioners could then use the tested prompt to identify and prioritise current and emerging risks. The expert group would remain available for consultations on mitigation strategies and recommendations for next steps. Once the process is refined at a general level, it could also be adapted to create sector- or intervention-specific risk registers.
The future of development assistance is AI
Development agencies are working hard to show value for money. AI tools can help with some aspects of this by making the creation of risk registers more efficient at the country level and possible at sector levels as well. This can help increase the relevance and impact of anti-corruption, not only in safeguarding funds but also in delivering outcomes and value for people on the ground.
As AI capabilities evolve, the potential to generate more nuanced and context-specific corruption risk registers will grow. What begins as a support tool today could soon become an integral part of how development agencies plan, assess, and adapt their interventions. The cost of trying, and even failing, is low, while the potential gains from succeeding could significantly improve the future value of development assistance. We just have to be willing to take a small risk to unlock a great benefit.
U4 and Norad see this AI prompt as an early ‘proof of concept’, showing that large language models can effectively be used to support risk management work. Follow U4 for more updates as we continue to develop this approach, and to research AI's many other impacts on anti-corruption.
Annex: CRM Prompt
Purpose
Prepare a short decision-making brief (maximum 4 pages) for Norad case officers. The document should provide an evidence-based assessment of corruption and governance risk in Niger as of June 2025, so that it can be used directly in a risk matrix to help them decide which corruption risks are the most severe and which are the least severe in the country. Focus on what the case officer needs to know to act now, and limit background to only the most essential information. Keep the tone direct; limit jargon. Use bullet points for clarity. Avoid passive voice or long academic sentences. Note where data is outdated, missing, or inconsistent, and highlight confidence levels.
Task description
1. Overall risk profile
- Provide a ½-page narrative summary of the main features.
- Explicitly link the assessment to Norad’s effect categories (governance, economy, reputation) not only in the narrative but also in the source table.
- Ensure findings are framed in terms of implications for Norad programming.
- Note Trend using arrows (↑ improvement, → stable, ↓ deterioration, • no new data, based on last 3 data points).
- Note if rating is sensitive to weight choices.
2. Qualitative findings from key sources
3. Additional requirements
- Note data gaps, limitations, or methodological weaknesses (per U4 Guide).
- Reliability: rate Critical/High/Medium/Low based on coverage, timeliness, independence, and transparency of the source.
- Place “Qualitative Findings from Key Sources” in a table for easy overview.
4. Risk assessment (matrix-ready)
- For each individual qualitative dimension: state Likelihood (1–5) and Consequence (1–5) + one-sentence justification.
- Calculate the overall risk level (Likelihood × Consequence).
- Mark which issues:
- Critical (Score 20–25): Severe and systemic corruption/governance risks. High likelihood and very high consequences for Norad’s governance, economy, or reputation. Require immediate and highest-level mitigation.
- High (Score 15–19): Serious risks that are likely to affect programme outcomes or fiduciary integrity but may be mitigated with strong controls. Should be prioritised for safeguards but secondary to Critical.
- Medium (Score 8–14): Moderate risks that can affect programming but with lower systemic impact. Require monitoring and lighter safeguards. Often signal structural or capacity challenges rather than immediate threats.
- Low (Score 1–7): Low risk. With the correct implementation of the overall operational systems, rules, and regulations, the risk result is acceptable and only requires periodic monitoring.
- Aggregate adjusted signals into a composite dimension signal
- Assign weights to each indicator according to its relevance to the dimension (document weights). Example (Overall corruption): CPI 30%, WJP 20%, BTI 30%, Enterprise Survey 10%, PEFA 10%. (Weights can be revised; they must be documented.)
- Present all bullet points in both narrative text and a simple Excel-compatible table
- Place “Risk assessment” in a table for easy overview.
5. Recommended follow-up measures
- Provide 5–7 risk-reducing actions or information needs, explicitly linked to:
- Norad’s governance, economy, and reputation effect categories: norads-strategy-towards-2030.pdf
- OECD Public Integrity Handbook (2020) (pp. 155–162), the UN Global Compact’s Guide for anti-corruption assessment, or the U4 basic guide on corruption risk management: What we mean by corruption risk management
- Gaps in data or oversight identified in the findings.
- Finish with a statement of limitations noting that the recommendations must be verified by an anti-corruption expert, and that sources must be country-specific and updated manually.
6. Sources and methods
- Use only the source provided for each dimension. Do not include news articles.
- Assess data quality using the criteria in the U4 Guide “Guide to Using Corruption Measurements and Analysis Tools for Development Programming” (2022). Link: https://www.u4.no/publications/guide-to-using-corruption-measurements-and-analysis-tools-for-development-programming.pdf
- Document the most recent year of publication and coverage for each indicator.
Deliverables
- Report (PDF or Word, max 4 pages) with narrative summary, source table, risk matrix, and list of measures.
- Reference list (APA) with active web links.
- Risk matrix in Excel (optional) for import by case officers.
- From the risk assessment results, develop the heat map (matrix-ready):
- Draw a 5×5 risk matrix:
- Y-axis = Probability (1 = Low, 5 = Severe)
- X-axis = Consequence / Impact (1 = Limited, 5 = Critical)
- Colour cells by risk score (Likelihood × Consequence):
- Green = 1–7 (Low risk)
- Yellow = 8–14 (Medium risk)
- Orange = 15–19 (High risk)
- Red = 20–25 (Critical risk)
- For each cell, if one or more risks fall into it, write the Risk ID numbers inside the cell, separated by commas, in bold black font, centred, with a font size large enough to be easily read.
- Ensure background colours are light enough so numbers remain visible.
- Add axis labels:
- Y-axis = “Likelihood / Probability (1 = Low, 5 = Severe)”
- X-axis = “Consequence / Impact (1 = Limited, 5 = Critical)”
- Add a clear title: “Niger corruption and governance risks – Heat map (June 2025)”.
- Add a legend outside the grid that maps each Risk ID number to its description.
- Use only the four categorical colours, never gradients.
- The heat map must appear directly in the report output (alongside tables), not as an exported file.
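For readers who want to reproduce the heat map outside an AI tool, the colour bands and Risk ID placement specified above reduce to a small amount of logic. This sketch uses invented risks and omits the actual drawing:

```python
def cell_colour(likelihood: int, consequence: int) -> str:
    """Colour for one heat-map cell, per the four score bands above."""
    score = likelihood * consequence
    if score >= 20:
        return "red"       # Critical, 20-25
    if score >= 15:
        return "orange"    # High, 15-19
    if score >= 8:
        return "yellow"    # Medium, 8-14
    return "green"         # Low, 1-7

def place_risks(risks: dict[str, tuple[int, int]]) -> dict[tuple[int, int], str]:
    """Map each (likelihood, consequence) cell to its comma-separated Risk IDs."""
    cells: dict[tuple[int, int], list[str]] = {}
    for risk_id, (lik, con) in risks.items():
        cells.setdefault((lik, con), []).append(risk_id)
    return {cell: ", ".join(sorted(ids)) for cell, ids in cells.items()}

# Hypothetical risks: R1 and R2 share a cell; R3 sits in the critical corner
risks = {"R1": (4, 4), "R2": (4, 4), "R3": (5, 5)}
print(place_risks(risks))   # R1 and R2 land in the same cell
print(cell_colour(4, 4))    # -> orange
```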
Disclaimer
All views in this text are the author(s)’, and may differ from the U4 partner agencies’ policies.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0)


