The large language model (LLM) race is accelerating, with new architectures, fine-tunes, and specialized systems arriving before the last ones have even settled. With such intense dynamics, choosing the right model takes intention, speed, and constant re-evaluation.
Rather than committing to a single provider or architecture, we systematically benchmark models across a range of real-world tasks and domain-specific scenarios. By continuously integrating and testing the latest LLMs, we ensure that Hostinger Horizons, your all-in-one, no-code AI companion, is always powered by top technology to deliver the strongest performance, reliability, and value. Here's what our latest tests and experience reveal.
Who leads the race?
Out of the dozens of leading LLMs currently competing on the market – each with its own strengths and weaknesses – we always use a combination of at least several and stay up to date with the latest developments and releases. One such example was the launch of Gemini 3 by Google in mid-November last year. It generated quite a buzz, and our internal evaluation confirmed that Gemini 3 is indeed worth the hype.
Today, Gemini 3 powers parts of Hostinger Horizons, delivering more precise, higher-quality code than Gemini 2.5. It also fixes errors more reliably, with our autofix success rate jumping from 50% to 80%. Though some coding-oriented benchmarks still put Gemini 3 behind GPT-5 mini, GPT-5.1, and now also GPT-5.2, in our experience, Google's newest model truly delivers.
Expert comment
“Gemini 3 is quite capable, especially with more nuanced tasks. For example, while testing it, we were able to generate an intricate finance website with just one prompt. While accurate and powerful, Gemini 3 is fairly slow. That is why we don't use it for simpler edits where a faster model can deliver a similar solution.”
Gemini 3 is one of the LLMs powering Hostinger Horizons. It handles coding tasks and is paired with our communication agent – a new feature that allows the AI to ask clarifying questions whenever the prompt is unclear or vague. The communication agent helps Horizons understand what the user wants, which leads to more accurate code generation, an improved final result, and a smoother overall experience. Importantly, these clarifying messages are free – AI credits are only required for code changes.
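The gating idea behind the communication agent can be sketched as a simple check before generation. This is a hypothetical illustration only: the heuristic below is a stand-in for what would in practice be an LLM-based clarity judgment, and all function names are invented for the example.

```python
# Hypothetical sketch of an "ask before generating" flow, as described above.
# The clarity check here is a toy heuristic; a production system would use
# an LLM to judge whether the prompt is specific enough.

VAGUE_MARKERS = ("something", "nice", "stuff", "whatever")

def needs_clarification(prompt: str) -> bool:
    """Flag prompts that are too short or contain vague wording."""
    words = prompt.lower().split()
    return len(words) < 4 or any(marker in words for marker in VAGUE_MARKERS)

def handle_prompt(prompt: str) -> str:
    """Route a prompt either to a free clarifying question or to generation."""
    if needs_clarification(prompt):
        # Clarifying questions cost the user nothing (no AI credits).
        return "ask_clarifying_question"
    # Credits are only consumed once actual code changes are generated.
    return "generate_code"

print(handle_prompt("make something nice"))                         # too vague
print(handle_prompt("build a bakery landing page with a contact form"))
```

The key design point is that the cheap clarity check runs first, so the expensive (credit-consuming) generation step only fires on prompts specific enough to succeed.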
The newcomer: Opus 4.5
Just days after Google released Gemini 3, Anthropic launched Claude Opus 4.5. In our internal quality ranking for landing page generation, this newcomer ranks among the top-performing models – right up there with the latest GPT models, as well as Gemini 3.
However, Opus 4.5 uses more tokens to achieve the same result as the older Claude Sonnet 4.5.
“For initial prompts, we're still primarily using Sonnet 4.5, which has proven reliable for most generation tasks. But we're investigating Opus 4.5 as an alternative. It follows instructions very well, doesn't make mistakes, and produces beautiful websites. Technically, it's a very powerful model,” said Dainius Kavoliūnas, Head of Hostinger Horizons.
The real capabilities of Opus 4.5 shine when you push the model to its limits – such as by asking it to generate a complete planning app with advanced color palettes, numerous buttons, gradients, and animations in a single shot. This is supported by many benchmark scores indicating that Opus 4.5 outperforms Sonnet 4.5 in areas such as novel problem-solving and advanced reasoning. On SWE-bench Verified, a benchmark used to assess model performance on coding tasks, Opus 4.5 slightly edges out the recent GPT-5.2 Thinking (80.9% vs. 80%) and beats Gemini 3 (76.2%) by a wider margin.
Finding the balance
By mixing and matching various AI models, we've reduced the total response time of Hostinger Horizons by 25%. The background error check after coding now takes only 12 seconds, compared to 40 seconds a month ago.
“In the end, it all comes down to using the right model for the right task in the right context. So far, we have found that Sonnet 4.5 takes the lead in the initial prompting stage, while Gemini 3 is optimal for subsequent fixes and adjustments, with other models invoked depending on the situation. There's clearly no single formula, and top scores on benchmarks don't guarantee the best results when LLMs are used in real-life products. Therefore, we constantly work on testing, improving, and finding the right balance to bring the best experience to our clients,” said Kavoliūnas.
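The "right model for the right task" approach amounts to a routing table in front of the models. The sketch below is purely illustrative: the task labels, model identifiers, and default choice are assumptions based on the preferences described in this article, not Hostinger's actual configuration.

```python
# Hypothetical sketch of task-based model routing.
# Task labels and model names are illustrative, not a real config.

ROUTING_TABLE = {
    "initial_generation": "claude-sonnet-4.5",  # reliable for first prompts
    "fix_or_adjustment": "gemini-3",            # strong autofix performance
    "quick_edit": "fast-small-model",           # speed over depth for minor edits
}

DEFAULT_MODEL = "claude-sonnet-4.5"

def pick_model(task_type: str) -> str:
    """Return the model configured for a task type, with a safe default."""
    return ROUTING_TABLE.get(task_type, DEFAULT_MODEL)

print(pick_model("fix_or_adjustment"))
print(pick_model("some_new_task"))  # falls back to the default
```

The advantage of an explicit table is that routing decisions can be re-evaluated and swapped as new models are benchmarked, without touching the rest of the pipeline.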
Whether the current leaders will keep their positions or be displaced by competitors remains to be seen. But one thing is certain: we're intent on staying ahead by continuously testing, evaluating, and optimizing. Our goal remains the same: making website creation and management as simple as possible.








