Trusted Facts
After building the skill that helps language models answer in line with Australia’s livestock standards, I have been thinking about something that sits upstream of most of the digital debates we hear in agriculture. We spend a lot of time talking about data quality, but we almost never talk about facts. They are not the same thing.
Data quality is about fixing errors, formatting inconsistencies and missing values. It is important, but it is mostly housekeeping. Facts are different. Facts are the shared, validated, authoritative pieces of knowledge that industries rely on. They underpin decisions, models, regulations and the day-to-day tools that people use. They carry real economic weight.
Agriculture has plenty of data, some of it very good and some of it not. What we lack is a dependable, agreed set of facts that sits above the noise. As AI becomes more capable and we start feeding it larger volumes of context, the importance of these facts comes into much sharper focus. How do we know that the information guiding a model is accurate? How do we know it is current? How do we know it was produced with rigour rather than scraped from an unreliable source?
This is where RDCs and industry organisations come in. They already commission research, build capability and curate knowledge. They already hold much of the long-term evidence base for their sectors. They are the logical custodians for industry-level fact repositories. These would not be compliance databases or marketing portals. They would be carefully curated collections of agreed truths that the whole sector can rely on.
Facts must be visible, traceable and explainable. That means trials, lab studies and sustained research programs remain essential. Generative AI can turn research into tools more quickly than before, but it cannot manufacture truth. The quality of the fact determines the quality of the output. This is where many data-quality conversations fall short. You can have pristine data feeding a model, but if the underlying facts are wrong, the model will still mislead.
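To make "visible, traceable and explainable" a little more concrete, here is a minimal sketch of what a single entry in a fact repository might carry. It is purely illustrative: the FactRecord class, its field names and the example values are hypothetical, not a proposal for any specific RDC system.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical sketch of a traceable fact record.
# Field names and values are illustrative only, not a real RDC schema.
@dataclass
class FactRecord:
    statement: str          # the agreed fact, stated plainly
    source: str             # the trial, lab study or research program it came from
    method: str             # how it was established (visible and explainable)
    reviewed_on: date       # when the industry last validated it
    citations: list[str] = field(default_factory=list)  # traceable evidence trail

    def is_current(self, max_age_years: int = 5) -> bool:
        """A fact that has not been re-validated recently should be flagged, not assumed."""
        return (date.today() - self.reviewed_on).days <= max_age_years * 365


example = FactRecord(
    statement="Hypothetical agreed fact about paddock water access for cattle",
    source="Hypothetical industry research program",
    method="Field trials across multiple regions",
    reviewed_on=date(2024, 6, 1),
    citations=["hypothetical-report-id-001"],
)
print(example.is_current())
```

The point of the sketch is not the code; it is that every field beyond the statement itself exists to answer the questions above: where the fact came from, how it was established and when it was last checked.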
We also need clearer frameworks that allow industry facts to move through research, modelling and decision-making without becoming tangled in outdated restrictions or creating new risks. The boundary between industry facts and farmers’ operational data must stay clear. Farmers use facts to interpret their own data. They do not surrender their rights by doing so.
Research capability is another crucial pillar. High-quality facts do not appear by accident. They come from independent teams with the funding and freedom to investigate properly. When research becomes unstable, the facts that anchor the industry start to weaken.
There is also a growing external risk. Many of the places we have historically trusted for reference material are showing signs of strain. The internet is increasingly saturated with synthetic content written to game search engines. Wikipedia faces political and organisational pressure. Scientific publishing is dealing with an influx of fabricated papers. Even long-standing repositories like the Internet Archive cannot be assumed to last forever. If AI is going to become part of our cognitive infrastructure, we need to know that the material we feed into it is trustworthy.
Industry-run fact repositories give us a way to secure that trust. They create a stable foundation for training models, building advisory tools, validating analytics and benchmarking performance. They help farmers understand their own data through comparison with industry truths rather than guesses or outdated assumptions. They reduce the risk of feeding models with unreliable context. They make research more reusable and more valuable.
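As a hedged sketch of the "reliable context" point, assuming a repository along the lines described above: before anything reaches a model or an advisory tool, the repository can filter for facts that are current and name a source, rather than passing along whatever happens to be in a scraped corpus. The entries and field names below are hypothetical placeholders.

```python
from datetime import date

# Illustrative only: passing a model only facts that are current and traceable.
# The entries and field names are hypothetical, not drawn from any real repository.
facts = [
    {"statement": "Hypothetical agreed fact A", "source": "industry trial", "reviewed_on": date.today()},
    {"statement": "Hypothetical unverified claim B", "source": "", "reviewed_on": date(2015, 1, 1)},
]

def usable_as_context(fact: dict, max_age_days: int = 5 * 365) -> bool:
    """Keep a fact only if it names a source and has been validated recently."""
    return bool(fact["source"]) and (date.today() - fact["reviewed_on"]).days <= max_age_days

context = [f["statement"] for f in facts if usable_as_context(f)]
print(context)  # only the validated, current fact survives
```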
This is not a reinvention of data quality. It is a recognition that the sector needs a layer above data quality. A layer that defines what is true, what is agreed and what the industry can depend on when building tools, policies and models.
If we want AI to be genuinely useful in agriculture, and if we want confidence in the systems we build, we need to invest in these fact repositories now. RDCs and industry bodies are the natural stewards. They have the mandate, the reach and the long-term responsibility to do it well.
Facts do not attract much attention. They are not shiny. But they are the thing everything else rests on, and they are becoming more important with every model we train.