LLMs for Optimisation Problems
Can AI language models solve complex optimization problems? Exploring whether GPT and other transformers can generate constraint programming models that tackle real-world challenges.
What if you could describe a complex optimization problem in plain English, and an AI would automatically write the mathematical model to solve it? Not just understand what you’re asking, but translate it into working optimization code that finds actual solutions. That’s the provocative question driving this research.
The Challenge
Picture an operations manager facing a logistics puzzle: “We need to schedule 50 delivery trucks across 200 locations, minimizing fuel costs while respecting time windows, driver hours regulations, and vehicle capacities.” Or imagine a factory planner trying to organize production: “Assign 30 jobs to 10 machines, minimizing total completion time while balancing machine workload.”
These are optimization problems: challenges with countless possible solutions where you need to find the best one according to specific criteria. Solving them requires writing mathematical models in specialized modeling languages and solver frameworks such as MiniZinc, Gurobi, or CPLEX. And here’s the catch: it takes years of training to write those models correctly.
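To make this concrete, here is a minimal sketch, written for this article rather than taken from the research data, of one plausible MiniZinc formulation of the jobs-and-machines example above, embedded in a Python string and solved through the open-source MiniZinc Python bindings. The makespan objective and the invented durations are simplifying assumptions.

```python
# A hedged sketch: one plausible MiniZinc formulation of "assign 30 jobs to 10
# machines". Assumes the `minizinc` Python package and the Gecode solver are
# installed; the duration data and the makespan objective are illustrative.
from minizinc import Instance, Model, Solver

ASSIGNMENT_MODEL = """
int: n_jobs = 30;
int: n_machines = 10;
% Invented processing times; in practice these would come from a data file.
array[1..n_jobs] of int: duration = [5 + (j mod 7) | j in 1..n_jobs];

% Decision variables: which machine runs each job.
array[1..n_jobs] of var 1..n_machines: assign;

% Load of a machine = total duration of the jobs assigned to it.
array[1..n_machines] of var int: load = [
  sum(j in 1..n_jobs)(duration[j] * bool2int(assign[j] = m)) | m in 1..n_machines
];

% Minimize the completion time of the busiest machine (the makespan).
solve minimize max(load);
"""

model = Model()
model.add_string(ASSIGNMENT_MODEL)
instance = Instance(Solver.lookup("gecode"), model)
result = instance.solve()
print("makespan:", result.objective)
print("assignment:", result["assign"])
```

Even this toy model already contains the ingredients discussed below: decision variables (assign), an objective (the makespan), and constraints implied by the variable domains.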
The Traditional Bottleneck: Creating optimization models demands dual expertise—deep understanding of both the real-world problem domain and the formal mathematics of constraint programming. Most domain experts (the operations managers, factory planners, logistics coordinators) lack the programming skills. Most optimization specialists lack intimate knowledge of every industry’s unique constraints.
The Translation Barrier: Converting natural problem descriptions into precise mathematical formulations is error-prone. Miss a constraint, and your “optimal” solution violates real-world requirements. Formulate something incorrectly, and the solver either fails or produces nonsense.
The Specialist Shortage: Skilled optimization modelers are scarce and expensive. Organizations often can’t access this expertise when they need it, leaving optimization opportunities unexploited.
The Iteration Burden: Real-world problems evolve. Requirements change, new constraints emerge, business rules shift. Each change requires model revision by someone fluent in both the problem domain and the optimization language.
This research asks: Can large language models bridge this gap? Can GPT and similar transformers, trained on vast amounts of text and code, learn to generate valid optimization models from natural language descriptions?
The Insight: Language Models as Optimization Translators
The breakthrough that made this research possible came from an unexpected convergence: the same large language models that learned to write software code, translate between human languages, and explain complex concepts might also learn the specialized language of mathematical optimization.
Think about what GPT does when it writes Python code from your description. You say “write a function that sorts a list,” and it generates syntactically correct code that actually works. It learned this not through explicit programming of syntax rules, but through exposure to millions of code examples during training.
Could the same learning process work for optimization modeling? If a transformer model sees enough examples of natural problem descriptions paired with their corresponding MiniZinc formulations, might it learn the translation patterns?
The hypothesis was compelling because optimization modeling, while specialized, follows patterns:
Structured Problem Descriptions: Optimization problems have recognizable components—decision variables, objective functions, constraints. These patterns repeat across different domains.
Formalization Conventions: Constraint programming languages have consistent syntax for expressing common concepts like “for all items” or “minimize the sum.”
Domain-Independent Abstractions: The same mathematical structures appear across industries. A scheduling problem in healthcare shares structural similarities with one in manufacturing, even though the terminology differs.
Example Availability: Decades of optimization research and teaching materials provide extensive examples of problems described informally and modeled formally—exactly the kind of paired data that transformer models excel at learning from.
If GPT could learn to translate English to Python, German to Spanish, and technical concepts to explanations, perhaps it could also learn to translate problem descriptions to MiniZinc models.
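As a small illustration of these patterns, here is a toy description-and-model pair of the kind such a learning process would rely on. It is invented for this article, not drawn from the experimental dataset, and the knapsack data is arbitrary.

```python
# A toy (description, MiniZinc model) pair, invented for illustration. The model
# shows the recurring conventions mentioned above: comprehensions, a capacity
# constraint, and a "maximize the sum" objective.
TRAINING_PAIR = {
    "description": (
        "A hiker can carry at most 15 kg. Each of 5 items has a weight and a value. "
        "Choose which items to pack to maximize total value without exceeding the limit."
    ),
    "minizinc": """
int: n = 5;
int: capacity = 15;
array[1..n] of int: weight = [12, 2, 4, 1, 8];
array[1..n] of int: value  = [ 4, 2, 10, 1, 5];

array[1..n] of var 0..1: take;   % 1 if the item is packed, 0 otherwise

constraint sum(i in 1..n)(weight[i] * take[i]) <= capacity;

solve maximize sum(i in 1..n)(value[i] * take[i]);
""",
}
```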
Building the Research: From Hypothesis to Validation
This wasn’t about deploying GPT as-is and hoping it magically understood optimization. It required systematic investigation of whether and how transformers could generate valid, effective optimization models.
The Experimental Design: Testing the Limits
The research needed to answer specific questions: Can transformers generate syntactically correct models? Are those models semantically meaningful? Do they actually solve the intended problems? How does performance vary with problem complexity?
Prompt Engineering: We designed prompts that provided GPT with problem descriptions in natural language, sometimes including examples of similar problems and their models. The goal was to identify what information the model needed to generate correct formulations.
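The exact prompts used in the experiments are not reproduced here, but a hypothetical few-shot template in the same spirit might look like the following; the wording and helper names are illustrative rather than the exact ones used.

```python
# A hypothetical few-shot prompt template; the wording is illustrative,
# not the prompt used in the experiments.
PROMPT_TEMPLATE = """You are an expert in constraint programming.
Translate the problem below into a complete, valid MiniZinc model.

Example problem:
{example_description}

Example MiniZinc model:
{example_model}

Problem to model:
{problem_description}

MiniZinc model:"""

def build_prompt(problem_description: str,
                 example_description: str,
                 example_model: str) -> str:
    """Fill the template with one worked example plus the new problem."""
    return PROMPT_TEMPLATE.format(
        example_description=example_description,
        example_model=example_model,
        problem_description=problem_description,
    )
```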
Problem Diversity: Testing spanned different optimization problem types—scheduling, routing, resource allocation, packing problems. This diversity revealed whether the approach generalized or worked only for narrow cases.
Validation Framework: Every generated model was tested rigorously. Did it compile? Did it produce solutions? Were those solutions actually optimal (or at least feasible)? How did solution quality compare to human-written models?
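A rough sketch of what such an automated check can look like, assuming the standard minizinc command-line tool is installed; the helper name and return format are illustrative. Passing these checks only establishes that a model compiles and solves, not that it solves the right problem.

```python
# A rough sketch of an automated validation step, assuming the standard
# `minizinc` command-line tool is available; names and return format are
# illustrative. Semantic correctness still requires expert review.
import subprocess
import tempfile
from pathlib import Path

def validate_generated_model(minizinc_code: str, time_limit_ms: int = 10_000) -> dict:
    """Check whether generated MiniZinc code compiles and yields a solution."""
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "candidate.mzn"
        model_path.write_text(minizinc_code)

        # Step 1: does the model flatten (compile) at all?
        compiled = subprocess.run(
            ["minizinc", "--compile", str(model_path)],
            capture_output=True, text=True, cwd=tmp,
        )
        if compiled.returncode != 0:
            return {"compiles": False, "solved": False, "error": compiled.stderr}

        # Step 2: does a solver return a solution within the time limit?
        # MiniZinc prints "----------" after every solution it finds.
        solved = subprocess.run(
            ["minizinc", "--solver", "gecode",
             "--time-limit", str(time_limit_ms), str(model_path)],
            capture_output=True, text=True,
        )
        ok = solved.returncode == 0 and "----------" in solved.stdout
        return {"compiles": True, "solved": ok,
                "error": solved.stderr, "output": solved.stdout}
```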
Iteration and Refinement: When GPT generated incorrect models, we analyzed the errors. Were they syntax mistakes? Logical errors in constraint formulation? Misunderstandings of problem semantics? This analysis guided prompt refinement.
The Reality: Successes, Failures, and Insights
The results weren’t uniformly impressive, but they were revealing:
Clear Successes: For well-structured, relatively simple optimization problems with clear descriptions, GPT generated working models surprisingly often. Problems like basic scheduling, simple knapsack variants, and straightforward assignment tasks saw success rates that exceeded initial expectations.
Instructive Failures: Complex problems with intricate constraints or subtle requirements frequently produced models that were syntactically correct but semantically wrong. GPT would generate plausible-looking MiniZinc code that compiled but didn’t actually represent the intended problem correctly.
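To illustrate that failure class with an invented example (not one drawn from the benchmark): suppose the requirement is that no machine may be assigned more than five jobs. Both snippets below compile, but only the first says what was meant.

```python
# Invented illustration of the "compiles but semantically wrong" failure class.
# Intended requirement: no machine may be assigned more than five jobs.

# What was meant: count the jobs on each machine and cap that count.
CORRECT_CONSTRAINT = """
constraint forall(m in 1..n_machines)(
  sum(j in 1..n_jobs)(bool2int(assign[j] = m)) <= 5
);
"""

# Plausible-looking output that compiles yet encodes something else entirely:
# it caps the machine *index*, forcing every job onto machines 1 to 5.
WRONG_CONSTRAINT = """
constraint forall(j in 1..n_jobs)(assign[j] <= 5);
"""
```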
The Abstraction Challenge: Problems requiring creative problem reformulation or non-obvious modeling tricks proved difficult. GPT could reproduce patterns it had seen but struggled with genuine model design creativity.
Prompt Sensitivity: Small changes in how problems were described dramatically affected success rates. This suggested the model was pattern-matching rather than truly “understanding” optimization modeling.
Verification Necessity: Even successful models required careful validation. Trusting generated code blindly would be dangerous—the models needed expert review to confirm correctness.
The Practical Applications: Where This Actually Helps
Despite limitations, clear use cases emerged where LLM-generated models provide genuine value:
Rapid Prototyping: Optimization researchers can quickly generate initial model drafts, then refine them. What might take hours to write from scratch becomes minutes of generation plus review.
Educational Tools: Students learning optimization modeling can see examples generated from problem descriptions, helping them understand the translation from informal to formal. The LLM becomes an interactive teaching assistant.
Accessibility Bridge: Domain experts without deep MiniZinc knowledge can generate starting-point models for their problems, then collaborate with optimization specialists to refine them. The barrier to entry drops significantly.
Documentation and Explanation: Interestingly, the process works bidirectionally. Given an existing MiniZinc model, LLMs can generate natural language explanations of what the model does—helping with understanding and maintenance of optimization code.
The Research Journey: Publications and Open Science
This work exemplifies exploratory research—investigating what’s possible rather than optimizing known approaches.
Peer-Reviewed Investigation
The central publication, “Towards an Automatic Optimisation Model Generator Assisted with Generative Pre-trained Transformer” (arXiv, 2023), documented the investigation systematically. This wasn’t a claim that LLMs solve optimization outright; it was an honest assessment of capabilities and limitations.
The research presented:
- Methodology for testing LLM-generated optimization models
- Benchmark results across various problem types
- Analysis of failure modes and their patterns
- Discussion of practical implications and realistic use cases
- Honest acknowledgment of where the approach falls short
Transparent Data Sharing
All experimental data is publicly available through Figshare, including:
- Problem Descriptions: The natural language problem statements used as LLM prompts
- Generated Models: The complete MiniZinc code produced by the transformer models
- Validation Results: Whether each model compiled, solved correctly, and produced optimal solutions
- Comparative Analysis: How LLM-generated models compared to human-written versions
This transparency allows others to verify claims, build on the methodology, and explore improvements.
Accessible Explanation
“Can Large Language Models Solve Optimisation Problems?” provides a narrative exploration of the research accessible to broader audiences. It explains the concepts without assuming expertise in either transformers or constraint programming, making the work relevant to anyone curious about AI capabilities and limitations.
Why This Matters: The Bigger Picture
This research sits at the intersection of two transformative technologies: large language models and mathematical optimization.
We’re witnessing a moment where AI language models demonstrate remarkable capabilities—writing code, reasoning through problems, generating creative content. But understanding their genuine strengths versus hype requires careful investigation. This research contributes that understanding for a specific domain: can they actually help with optimization modeling?
The honest answer: partially. They’re not replacing optimization experts, but they’re creating new possibilities for accessibility and workflow acceleration. That’s valuable even if it’s not revolutionary.
There’s also methodological value. As LLMs become ubiquitous, we need rigorous frameworks for testing their capabilities in specialized domains. How do you validate AI-generated mathematical models? What verification steps are essential? What failure modes should users watch for? This research establishes patterns applicable beyond optimization.
The work also highlights a broader principle: AI tools are most effective when combined with human expertise, not as replacements for it. LLM-generated models serve as drafts requiring expert refinement. That human-AI collaboration model probably applies across many professional domains.
Looking Forward: Where the Research Goes Next
Every answer raises new questions worth investigating:
Fine-Tuning Experiments: Could transformers specifically fine-tuned on optimization modeling datasets improve performance? Would domain-specific training reduce error rates?
Interactive Model Refinement: What if the system could iteratively improve models based on validation feedback? Generate a model, test it, explain what’s wrong, generate a revised version.
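A speculative sketch of that loop, reusing the validation helper sketched earlier; generate_model is a placeholder for whatever LLM call sits behind the system.

```python
# A speculative sketch of the generate-validate-revise loop described above.
# `generate_model` is a stand-in for an LLM call; plug in your own client.

def generate_model(description: str, feedback: str) -> str:
    """Placeholder for the LLM call that returns MiniZinc code as text."""
    raise NotImplementedError("connect this to your preferred LLM API")

def refine_until_valid(description: str, max_rounds: int = 3) -> str | None:
    """Generate a model, test it, and feed errors back until it passes or we give up."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate_model(description, feedback)
        report = validate_generated_model(code)  # checker sketched earlier
        if report["compiles"] and report["solved"]:
            return code  # still needs expert review for semantic correctness
        feedback = report.get("error") or "The model compiled but produced no solution."
    return None
```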
Hybrid Approaches: Combining LLM model generation with automated verification tools could catch errors before human review. Integrate constraint checkers, solution validators, and formal verification into the generation pipeline.
Multi-Modal Problems: Can models learn to generate optimization models from diagrams, tables, or other non-textual problem descriptions? Many real-world problems are communicated visually.
Explainability Enhancement: Improving the system’s ability to explain why it formulated a model in a particular way would increase trust and enable better human-AI collaboration.
Broader Language Support: Extending beyond MiniZinc to other optimization frameworks (Pyomo, JuMP, Gurobi Python) would increase practical utility.
The investigation continues because the potential remains partially realized. We’ve established feasibility and identified clear limitations. The next phase involves systematically addressing those limitations.
Commitment to Open Science and Honest Assessment: This research embraces transparency about both successes and failures. The publication is openly accessible. Complete datasets are publicly available. Limitations are discussed as prominently as achievements.
If you’re exploring AI applications in specialized technical domains, developing educational tools for optimization, or researching transformer capabilities and limitations—this work is meant to help. Progress happens through honest evaluation of what works, what doesn’t, and why. Science advances when we share both our successes and our instructive failures.