Large Language Models – The current state of play and the future possibilities

May 6, 2024 | Technology

A quick note before we start: In a rather predictable move, we started out by drafting this content using ChatGPT – currently the world’s most famous large language model (LLM) tool.

It seemed remiss for us to create content about the ability of LLMs to create content while, in fact, NOT using an LLM to create said content. Does your head hurt yet?

The draft generated by ChatGPT was well written and informative, but it came with the caveat that the information was only current as of the model’s last update in 2022. If Conversis were new to the subject, if this were purely a knowledge-gathering exercise, and if LLMs were moving at a slower pace than they currently are, it would have been perfect. However, our aim here is not to create content for content’s sake, but to add to the conversation and produce something new and meaningful for our readers (no pressure!). So, with that in mind, here is our completely unassisted-by-LLM discussion of the role of LLMs now and going forward. Where does progress stand, what’s next, and what does that potentially mean for all of us?

What are LLMs?

First of all, it’s always helpful to start by defining your terms. In this case, what is an LLM? LLMs, or Large Language Models, are a form of generative AI (i.e. AI that is capable of creating content) based on text. One of the earliest natural language programs, ELIZA, was produced at MIT back in 1966, so the concept of machines working with language is not new. But the sophistication and (most notably) the scale have increased dramatically since then, with today’s models being trained on huge datasets – hence the first “L” in LLM.

These LLMs are built on neural networks, which loosely imitate the human brain: they ingest information, process and infer relationships within that information, and use what they have learned to create new content. Essentially, an LLM learns how language works by reading vast amounts of it and figuring out the connections, and then creates content by predicting the next word in a sentence based on when and where it has seen that word used before.

If you’re looking for a practical example, this New York Times article offers a simple illustration of how a basic LLM works.
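To make the same point in code, here is a deliberately tiny sketch in Python of the “predict the next word” idea. It uses simple word-pair counts rather than a neural network, and the toy corpus and function names are ours, purely for illustration; real LLMs do this at a vastly larger scale and with far richer context.

```python
import random
from collections import Counter, defaultdict

# A deliberately tiny illustration of the core idea behind LLMs:
# count which words tend to follow which, then "generate" text by
# repeatedly predicting a likely next word. Real LLMs use neural
# networks with billions of parameters, not raw counts, but the
# predict-the-next-token loop is the same in spirit.

corpus = (
    "large language models learn patterns in text . "
    "language models predict the next word in a sentence . "
    "models learn by reading vast amounts of text ."
)

# Build a table of next-word counts for every word we have seen.
next_words = defaultdict(Counter)
tokens = corpus.split()
for current, following in zip(tokens, tokens[1:]):
    next_words[current][following] += 1

def generate(start: str, length: int = 8) -> str:
    """Generate text by repeatedly sampling a likely next word."""
    word, output = start, [start]
    for _ in range(length):
        candidates = next_words.get(word)
        if not candidates:
            break
        # Sample in proportion to how often each word followed before.
        choices, weights = zip(*candidates.items())
        word = random.choices(choices, weights=weights, k=1)[0]
        output.append(word)
    return " ".join(output)

print(generate("language"))
# e.g. "language models predict the next word in a sentence ."
```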

What are LLMs currently used for?

The main use of LLMs currently is in the area of content creation, in one form or another. They are used to create text on a given subject, to translate content, and to summarize or rewrite it. They can also be used to classify content, detect sentiment, and hold conversations (e.g. in the case of an AI chatbot). Beyond that, LLMs can perform other language or language-adjacent tasks, like writing code or analyzing protein structures.
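As a hedged illustration of a few of these tasks, the sketch below uses the open-source Hugging Face transformers library (an assumption on our part – any comparable toolkit would do); the default models it downloads are small, general-purpose ones rather than anything tuned for specialist content.

```python
# A quick sketch of typical LLM-style tasks using the open-source
# Hugging Face `transformers` library (assumed installed via
# `pip install transformers`); default models download on first use.
from transformers import pipeline

text = (
    "Large language models are used to draft, translate, summarize "
    "and classify text, and to power conversational chatbots."
)

# Summarization: condense a longer passage into a shorter one.
summarizer = pipeline("summarization")
print(summarizer(text, max_length=30, min_length=5)[0]["summary_text"])

# Sentiment detection: classify the tone of a piece of text.
sentiment = pipeline("sentiment-analysis")
print(sentiment("We are delighted with the time savings.")[0])

# Translation: English to French with a default pretrained model.
translator = pipeline("translation_en_to_fr")
print(translator("The model translates content automatically.")[0]["translation_text"])
```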

The benefits are well known – huge time savings on rote tasks, and the all-important cost savings once a model is up and running – and you can find discussions of them across corporate content everywhere at the moment. There has, however, been an equal amount of conversation on the other side of the fence, primarily about the costs and resources associated with setting up an LLM, and about the (much-debated) quality of the content LLMs produce.

No new innovation is completely without issue or controversy, and AI is certainly receiving more than its fair share of scrutiny. With that in mind, we thought we’d try to take a balanced look at the current LLM landscape, the realities of current performance, and what future innovations could really look like. So, here goes…

Current landscape: Huge language models; Questions around costs and resources

The received wisdom until recently, when it comes to LLMs, has been that more is more. The more data, the better; the more parameters, the better; the more computing power, the better. This logic remains sound to an extent – the wider the dataset, the more varied and complex the relationships the model is exposed to and can learn from. However, critics have raised some valid concerns about this approach. Namely, training LLMs on datasets of this size requires huge server farms – essentially supercomputers – that cost a lot of money and – more worryingly – create a significant carbon footprint. On top of that, these LLMs, given their sheer size, are usually proprietary, meaning innovations and customizations can be slow and – again – expensive, potentially limiting meaningful LLM usage to businesses over a certain size.

The associated innovations: Smaller, open-source models; Sparse expert models; and Self-training

To combat the issues relating to compute power, cost, and environmental impact, smaller, open-source models are becoming increasingly popular. There is a good argument that models trained on smaller datasets can perform just as well as, if not better than, their much larger counterparts – it’s the quality of the data used to train the model that counts. This is particularly true in niche areas and subject matters (such as Life Sciences or clinical trials). Companies like Meta are making smaller, open-source models available, such as their Llama series, which allow users to spend less, consume less, and adapt the models easily for specific uses. Open-source models can also be brought in-house, allowing for greater control over data, tighter information security, and even customization for individual end-clients.
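As a rough sketch of what bringing a model in-house can look like, the Python example below loads a small, openly available model locally with the Hugging Face transformers library; the model name and prompt are placeholders of ours, and a Llama-class checkpoint could be substituted subject to its license.

```python
# A minimal sketch of running a small open-source model entirely in-house
# with the Hugging Face `transformers` library. The model name below is
# illustrative; you could substitute any locally downloaded checkpoint,
# such as one of Meta's Llama releases (subject to their license).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "distilgpt2"  # small, openly available; stands in for a Llama-class model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Clinical trial recruitment can be improved by"
inputs = tokenizer(prompt, return_tensors="pt")

# Generation happens locally: no data leaves your own infrastructure.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```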

Another developing – but, thus far, less common – solution lies in sparse expert models. The majority of models currently in popular use are dense, meaning that every time they receive a prompt, they run through every single parameter they have (generally numbering in the billions at least). This requires a lot of compute power. Sparse expert models, on the other hand, only activate the parameters that are relevant to a specific query. This means they can encompass just as much data as a dense model, or even more, while requiring less time and power to respond to a prompt. As this Forbes article succinctly explains it: “sparse models can be thought of as consisting of a collection of ‘sub-models’ that serve as experts on different topics. Depending on the prompt presented to the model, the most relevant experts within the model are activated while the other experts remain inactive.”

An added advantage of these models is that they are more interpretable than dense models, i.e. it is easier for a human to understand what the model has done and why, primarily because one can see which parts of the model were activated. Most of the LLMs currently in use make it nigh on impossible to understand the logic behind their decisions, which is a major obstacle to usage in the likes of clinical settings, where checking and verification by human experts is essential. As such, making LLMs’ inner workings easier to interpret opens up the potential scope of application significantly.
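To make the routing idea a little more tangible, here is a minimal, purely illustrative Python (NumPy) sketch of top-k expert selection; the expert count, dimensions, and gating scheme are our own simplifications and do not describe any particular production model.

```python
# A toy sketch of the "sparse expert" routing idea: a gating function
# scores every expert for a given input, but only the top-k experts are
# actually run. Everything here (sizes, expert count, k) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_model, top_k = 8, 16, 2

# Each "expert" is just a small weight matrix here; in a real sparse
# model it would be a full feed-forward sub-network.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate_weights = rng.normal(size=(d_model, n_experts))

def sparse_forward(x: np.ndarray) -> np.ndarray:
    # Score all experts, but only activate the k most relevant ones.
    logits = x @ gate_weights                      # one score per expert
    top = np.argsort(logits)[-top_k:]              # indices of chosen experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                       # softmax over the chosen experts
    # Only top_k of the n_experts do any work for this input.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.normal(size=d_model)
y = sparse_forward(x)
print(y.shape)  # (16,) – same output size, a fraction of the compute
```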

Another potential impact of the size of current LLMs is that they could actually run out of data with which to train themselves in the not-too-distant future. Some estimates put the world’s total stock of usable text data at somewhere between 4.6 and 17.2 trillion tokens (a token being a unit of text) – that’s the world’s entire stock of books, articles, quality online content, and so on. Given that some LLMs have already been trained on upwards of 1.4 trillion tokens, the lower end of that estimate doesn’t leave much runway for further iterations. A possible innovation currently being researched is enabling LLMs to continue their own training, in much the same way a human would pursue their own further learning. Put simply, a trained LLM would take the knowledge it has already accumulated and use it to create new content, making new connections in the process, and then – in turn – use that new content and those connections to train itself further. This would, at the very least, slow the pace at which LLMs require new data to continue evolving.
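Conceptually, that self-training loop looks something like the sketch below; all three helper functions are hypothetical placeholders of ours, standing in for real generation, quality-filtering, and fine-tuning steps rather than any published method.

```python
# A purely conceptual sketch of the self-training loop described above.
# The three helper functions are hypothetical placeholders, not a real
# API: the point is only the shape of the loop, in which a model
# generates new text and that text is folded back into its training data.

def generate_synthetic_text(model, n_samples: int) -> list[str]:
    """Hypothetical: have the trained model produce new content."""
    return [f"synthetic document {i} written by {model}" for i in range(n_samples)]

def filter_for_quality(texts: list[str]) -> list[str]:
    """Hypothetical: keep only the generated text that passes quality checks."""
    return [t for t in texts if len(t) > 0]

def fine_tune(model, texts: list[str]):
    """Hypothetical: continue training the model on the new text."""
    print(f"fine-tuning {model} on {len(texts)} new documents")
    return model

model = "base-llm"  # stands in for a model that has exhausted fresh human text
for round_number in range(3):
    new_texts = generate_synthetic_text(model, n_samples=100)
    curated = filter_for_quality(new_texts)
    model = fine_tune(model, curated)
```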

Current landscape: The hotly debated quality of LLM output

As we have discussed elsewhere, though they do a very good impression of it, LLMs do not actually think for themselves. Whether this is a positive or a negative tends to vary depending on who you’re talking to. What it definitely does mean is that the quality you get out is only as good as what you put in, and that it’s necessary to conduct a level of due diligence commensurate with the intended use. This VentureBeat author describes it well: “These models are trained to generate text that is plausible, not grounded in real facts. This is why they can make up stuff that never happened.”

As we mentioned in our introduction, content produced by the likes of ChatGPT tends to read really well. According to detractors, the very fact that this content looks like real thought is what makes it potentially quite dangerous. If it is taken at face value and replicated, they say – especially given the speed and ease with which this content can be produced by non-experts – this could “leverage misinformation spread at an unprecedented scale”, creating an “AI-driven infodemic”.[1] That’s at the macro level. But, by this logic, the possible effects are just as harmful at the micro level, where one could take information from an LLM which is “a reduction of the original knowledge, possibly misinterpreted, possibly remixed to confusion”[2] and apply it to high-stakes (e.g. clinical) situations, with potentially life-threatening results.

So far, so scary. But is it true? Most in the know would argue that ethical, responsible usage is important in the implementation of any tool, and that the greatest impact will be on good-faith users being able to produce more and better content. There are also plenty of innovations being created in this space to limit the impact of any bad-faith or lazy usage that might occur.

The associated innovation: Built-in fact-checking

For instance, a number of LLM creators have recently put out versions of their models – like WebGPT from OpenAI and Sparrow from DeepMind – that search for and pull relevant information from external sources. This allows the LLM to produce more up-to-date content, complete with source citations, that can be referenced and fact-checked. Searching for corroborating information from outside sources essentially allows the LLM to check its own output, and – as with the sparse expert models described above – citations give more insight into the foundations and sources of an LLM’s decisions. Innovations like this, coupled with agreed best practice, should make it much easier to head off the more doom-and-gloom predictions above, and enable LLM users to produce meaningful, verifiable information quickly and reliably.
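Stripped of any specific vendor’s implementation, the general retrieve-then-cite pattern looks roughly like the sketch below; both helper functions are hypothetical placeholders, standing in for a real search API and a real model call.

```python
# A rough sketch of the retrieval-plus-citation pattern that systems like
# WebGPT and Sparrow popularised. Both `search_external_sources` and
# `call_llm` are hypothetical placeholders standing in for a real search
# API and a real model call; the point is how retrieved snippets and
# their URLs are fed into the prompt so the answer can cite its sources.

def search_external_sources(query: str) -> list[dict]:
    """Hypothetical: return relevant snippets with their source URLs."""
    return [
        {"url": "https://example.org/article-1", "snippet": "Relevant fact A."},
        {"url": "https://example.org/article-2", "snippet": "Relevant fact B."},
    ]

def call_llm(prompt: str) -> str:
    """Hypothetical: send the prompt to whichever LLM you are using."""
    return "Answer grounded in the sources above, with [1] and [2] citations."

def answer_with_citations(question: str) -> str:
    sources = search_external_sources(question)
    context = "\n".join(
        f"[{i + 1}] {s['url']}: {s['snippet']}" for i, s in enumerate(sources)
    )
    prompt = (
        "Answer the question using ONLY the numbered sources below, "
        "and cite them as [1], [2], ... after each claim.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)

print(answer_with_citations("What are sparse expert models?"))
```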

While it may seem like they appeared from nowhere around the end of 2022, large language models have been around for a while, and they are likely here to stay. The benefits they can bring at all professional levels are too significant to ignore but, by the same token (pun fully intended), the concerns some people have deserve to be aired and addressed and, where possible, alleviated through innovation.

Luckily, the creators of LLMs are nothing if not proactive problem-solvers, and we can trace a direct line from current concerns to the innovations addressing them. The trend in the near-to-medium term therefore seems likely to be towards smaller, more niche models and sparse expert models, as well as LLMs that can self-train and self-check. No doubt, as the use of LLMs becomes more widespread (partly as a result of these solutions), other use cases will become apparent and trigger a whole new set of innovative responses. One thing is certain – the pace of change is not slowing any time soon, so keep watching this space!

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10166793/

[2] https://stackoverflow.blog/2023/07/03/do-large-language-models-know-what-they-are-talking-about/