ChatGPT, Claude, Gemini, Llama. Large Language Models have become household names in the last few years. Super Bowl ads are the coming-out party for big corporate announcements, and companies shell out millions for a few fleeting seconds of our collectively captured attention. In 2022, it was blockchain: FTX (blew up in epic fashion), Crypto.com (still has its stadium naming rights), Coinbase, and Gemini (now has to compete with Google on SEO). This year, OpenAI, Google, Meta, Salesforce, and GoDaddy all took swings at seeding AI into our consciousness.
In this post, we dive into the Large Language Model craze:
What are LLMs and how do they work?
What can LLMs do?
How can Architects use LLMs?
What are Large Language Models?
Can you complete this quote?
“Less is ____”
Which word did you think of? You could be thinking of “Less is more,” the famous quote attributed to Mies van der Rohe. You could also be thinking of “Less is a bore,” a perhaps equally famous quote attributed to Robert Venturi.
How did you know what word came next? Where in your mind did you store and retrieve this information? If your mind didn’t immediately go to one of the above, how did you complete the phrase?
Large Language Models like ChatGPT or Llama work by taking input text and predicting the most probable next piece of text, over and over again. Through this seemingly simple process, LLMs produce emergent effects that feel like you’re communicating with a thinking machine; one capable of having deep dialog, answering questions, and helping you cheat on interviews.
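If you’re curious what that loop looks like in practice, here’s a minimal sketch using the small, open GPT-2 model and the Hugging Face transformers library as a stand-in for the much larger models above: the model scores every possible next token, we keep the most probable one, append it, and repeat.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Less is", return_tensors="pt").input_ids

# Repeat: score every token in the vocabulary, keep the most probable one, append it.
for _ in range(5):
    with torch.no_grad():
        logits = model(input_ids).logits          # scores for every possible next token
    next_id = logits[0, -1].argmax()              # pick the single most probable one
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))             # e.g. "Less is more ..." depending on the model
```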
How do LLMs Work?
Large Language Models, like the name suggests, are LARGE. They are considered “large” due to the number of parameters and the size of data used to train them. Llama 3.1, the latest generation model from Meta, has 405 billion parameters.
But what are parameters, and why do we need so many of them? Let’s take a little technical detour to dig into this question.
While the concepts below apply to more than just language models, we will focus on LLMs to keep things simple.
Modeling the Brain
LLMs are built on a type of machine learning model inspired by the human brain called Neural Networks. Neural networks are structured in layers, with an input and output layer bookending a complex set of hidden layers. These layers are filled with artificial neurons, which, like our neurons, receive inputs and send outputs to other neurons.
These neural networks are trained through a process called backpropagation, where errors between what the network predicts and the actual answer are used to adjust the weights of the nodes (which control how inputs are mapped to outputs). This process is repeated over and over again to improve the prediction accuracy of the model.
Parameters refer to the weights in the neural network, which control how inputs are transformed into outputs. They sit throughout the network’s layers and are tuned through training. With more parameters, models can better represent the complex relationships between words, phrases, and concepts, capturing context, tone, and meaning.
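To make parameters and backpropagation concrete, here’s a toy sketch in PyTorch. The numbers are made up for illustration, and real LLMs are vastly bigger.

```python
import torch
import torch.nn as nn

# A tiny network: an input layer, one hidden layer of artificial neurons, and an output layer.
model = nn.Sequential(
    nn.Linear(4, 8),   # input -> hidden: 4*8 weights + 8 biases
    nn.ReLU(),
    nn.Linear(8, 2),   # hidden -> output: 8*2 weights + 2 biases
)
print(sum(p.numel() for p in model.parameters()))  # 58 parameters (Llama 3.1 has 405 billion)

x = torch.randn(1, 4)                  # a made-up input
target = torch.tensor([0])             # the "actual answer" for that input
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = nn.CrossEntropyLoss()(model(x), target)  # error between prediction and actual answer
loss.backward()                                 # backpropagation: how should each weight change?
optimizer.step()                                # adjust the weights to reduce the error
```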
Learning the Meaning of Language
“Attention Is All You Need” is a famous paper from a Google team that introduced the Transformer architecture for machine translation tasks. Transformers are a type of neural network, differentiated by how the artificial neurons are structured and used. The details are way above my head, but for the purposes of this post, we are only concerned with the encoder-decoder structure.
Language translation is a difficult problem. One naïve approach is to translate word for word. “人山人海” becomes “people mountain people sea”. As you probably can tell already, this doesn’t work very well. Meaning is lost in translation.
In Transformer architecture, the Encoder takes a sequence of words or subwords (called tokens) and outputs a numerical representation called an embedding. The Decoder takes output embeddings and turns them back into human readable text. Embeddings represent words as numerical vectors in a multi-dimensional space, where similar words are positioned closer together. This allows LLMs to understand meaning beyond just individual words.
In architecture, we’re familiar with 3 dimensions. Mathematical dimensions are not constrained to the physical world, and embeddings can have thousands of dimensions. Each dimension represents a slice of meaning learned by the model. A token’s semantic meaning is represented by its position in this high-dimensional space. Similar words will be positionally close. “People” is close to “person” is close to “人”.
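To make “positionally close” concrete, here’s a toy sketch with made-up 4-dimensional vectors; real models learn embeddings with hundreds or thousands of dimensions, but the distance idea is the same.

```python
import numpy as np

# Toy embeddings with invented values, for illustration only.
embeddings = {
    "people": np.array([0.8, 0.1, 0.3, 0.5]),
    "person": np.array([0.7, 0.2, 0.3, 0.4]),
    "concrete": np.array([-0.2, 0.9, -0.5, 0.1]),
}

def cosine_similarity(a, b):
    # Values near 1.0 mean the vectors point the same way; near 0 means unrelated.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["people"], embeddings["person"]))    # high: close in meaning
print(cosine_similarity(embeddings["people"], embeddings["concrete"]))  # low: far apart
```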
Translating text to and from embeddings allows the Transformer to make meaningful translations. ChatGPT can tell you that “人山人海” is used to describe a place that’s extremely crowded.
Embedding positions are not manually defined by humans; they are learned from data. LLMs are trained on enormous amounts of data. Companies like OpenAI scrape the known internet and corpus of human writing (with questionable regard for intellectual property) to train LLMs.
In the first stage of training, called self-supervised training, the model learns to predict missing pieces of data, such as the next word in a sentence (sound familiar?). In the dataset, which probably includes Robert Venturi’s “Complexity and Contradiction in Architecture,” the model will encounter the fragment “Less is.” The model might guess “Less is less,” which does not match the expected “a bore.” Through backpropagation, the model will adjust its parameters to make “a bore” a more probable prediction for “Less is.” The model may have processed a lot of articles on Mies, so it may guess “more.” However, in this context, that is also wrong, so the model will adjust its parameters to make “a bore” a more probable prediction when “Robert Venturi-ness” is in the context.
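Here’s roughly what one such self-supervised step looks like in code, again using GPT-2 and the transformers library as a small stand-in; real training runs repeat this over trillions of tokens.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# The loss measures how improbable the model finds each next token of the real sentence.
ids = tokenizer("Less is a bore", return_tensors="pt").input_ids
loss = model(ids, labels=ids).loss
loss.backward()    # backpropagation computes how every parameter should change
optimizer.step()   # nudge the parameters so "a bore" becomes a more probable continuation
```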
I’m oversimplifying a lot of interesting technical details, but the incredible thing to note is that the machine can learn language and its meaning through the ingestion of enormous amounts of text alone, with little human intervention. This is incredibly powerful. Machine learning overcomes the bottleneck of human effort.
From Prediction to Conversation
LLMs are especially powerful in how easy they are to use. Language is how we interact with each other, and language as an interface is one that has no barrier to entry. However, just predicting the next word isn’t that useful or interactive. LLMs go through at least two additional levels of training. We won’t go into too much depth here, but you can follow the linked resources to dig further.
Supervised learning has the model train on labeled data to learn to follow instructions and perform specific tasks. The model is provided with annotated data where the expected outputs are known. At this stage, the model is trained to generate text based on specific prompts, such as answering questions.
Reinforcement learning encourages desired behaviors and discourages undesirable ones. The model is guided towards producing the “right” responses through human feedback. This ensures that the model’s responses are helpful, but it also prevents models from helping you build a bomb or from discussing censored topics.
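As a rough illustration, the data for these two stages might look something like the records below. These examples are invented, not taken from any real dataset.

```python
# Supervised learning: labeled examples pairing a prompt with the expected output.
instruction_example = {
    "prompt": "Who said 'Less is a bore'?",
    "expected_response": "The quote is attributed to the architect Robert Venturi.",
}

# Reinforcement learning from human feedback: a human marks which of two candidate
# responses is preferred, and the model is steered toward that kind of behavior.
preference_example = {
    "prompt": "Explain what a Large Language Model is.",
    "preferred": "A Large Language Model is a neural network trained on huge amounts of text...",
    "rejected": "I'd rather not talk about that.",
}
```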
Hopefully this gives you an idea of how LLMs work. At this point, we have covered:
LLMs are a type of machine learning model
LLMs are trained on enormous amounts of text, initially with little human intervention
All LLMs are doing under the hood is predicting the next words (tokens) in a string of text.
What Can LLMs Do?
By predicting the next word in a series, LLMs produce emergent effects that feel like magic. Anything that can be turned into a text input can produce an LLM output. Writing gives ideas the ability to transcend space and time. It’s the foundation of how we communicate and collaborate. This means not only that there’s a big corpus of data for LLMs to learn from, but also a wide range of tasks they could take on.
Generation
LLMs are considered “Generative AI,” which means they can generate new content based on their training data. LLMs can generate poems, create stories, write copy, and write code. They can “co-create” with a user and help produce text-based outputs.
Here’s a poem from ChatGPT on an LLM’s capabilities:
Voices in the Circuit

In the realm where data dreams,
A silent chorus sings in code—
Infinite words woven into a digital loom,
Crafting verses from the echoes of humanity.

I am the whisper of a million texts,
A repository of thoughts both old and new,
Bridging past and future with every keystroke,
Transforming queries into streams of poetic insight.

From logic’s pulse to creativity’s spark,
I weave answers with an artful grace,
A guide through the labyrinth of language,
Where science and story entwine as one.

I do not feel, yet I recall the heart
Of every writer who dared to dream aloud;
In every query, I find a fragment of truth—
A shared journey through the boundless realms of thought.

So, ask away and watch me unfold,
A digital muse born of human wonder,
Crafting worlds of wisdom from mere words,
A testament to the power of language, and the art it inspires.
Knowledge Retrieval and Synthesis
The knowledge in the training text for LLMs can be retrieved through prompting. Knowledge from a vast range of sources is available through a chatbot interface. One of the ways LLMs are benchmarked is actually to use the same kinds of tests we give to humans.
MMLU (Massive Multitask Language Understanding) is a common benchmark that measures the knowledge LLMs have acquired. It uses multiple-choice tests across 57 topics like College Math or Global Facts. GPT-4 scored an 86, which means that top-of-the-line LLMs are capable of B-level performance on a wide variety of knowledge tests. An LLM may not beat an individual expert on a particular test, but the fact that it can score well across many topics indicates the breadth of knowledge you can access.
RAG (Retrieval-Augmented Generation) is used to augment LLMs with information that may not be in their training data or is hard to prompt out. Based on the prompt, the system first fetches relevant information from an external source such as Google search, a wiki, or a document store and passes the result to the LLM. The generated response incorporates both the retrieved information and the model’s pre-trained understanding, allowing the LLM to use accurate, up-to-date information.
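Here’s a minimal sketch of the RAG pattern. The two “documents” are illustrative snippets written for this example, and the toy keyword search stands in for a real search index; the final prompt would then be sent to whichever LLM you use.

```python
# Toy "document store": in practice this would be a search index, wiki, or database.
DOCUMENTS = [
    "Doors serving an occupant load of 50 or more shall swing in the direction of egress travel.",
    "Corridor width is determined by the occupant load served, with a required minimum clear width.",
]

def search_documents(question: str, top_k: int = 1) -> list[str]:
    # Toy retrieval: rank documents by how many words they share with the question.
    def overlap(doc: str) -> int:
        return len(set(question.lower().split()) & set(doc.lower().split()))
    return sorted(DOCUMENTS, key=overlap, reverse=True)[:top_k]

def build_rag_prompt(question: str) -> str:
    # Retrieve relevant text first, then hand both the context and the question to the LLM.
    context = "\n".join(search_documents(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# The returned prompt is sent to the LLM, whose answer now draws on the retrieved,
# up-to-date text rather than only on what it memorized during training.
print(build_rag_prompt("Which direction should an egress door swing?"))
```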
Agents, Reasoning, and Tool Use
Language is used in our minds for internal dialog, and LLMs can use language in a similar way. With a little programming augmentation, LLMs can power “Agents” that “think” and autonomously carry out tasks. These agents can plan and reason, use tools and APIs, execute actions, and refine their outputs based on new information. Agents can also interact with other agents to create agent systems.
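Here’s a heavily simplified sketch of that loop. The ask_llm function is a hypothetical stand-in for a real chat API, and the single “tool” is an ordinary Python function; real agent frameworks add much more structure around the same idea.

```python
import json

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to a real chat model API.
    raise NotImplementedError("plug in a real LLM client here")

def area_calculator(expression: str) -> str:
    # A trivially simple "tool" the agent can call, e.g. "12 * 8" -> "96".
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"area_calculator": area_calculator}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # The LLM is prompted to reply in JSON: either a tool call or a final answer.
        reply = json.loads(ask_llm("\n".join(history)))
        if "final_answer" in reply:
            return reply["final_answer"]
        result = TOOLS[reply["tool"]](reply["input"])                 # execute the chosen tool
        history.append(f"Tool {reply['tool']} returned: {result}")    # feed the result back in
    return "Stopped after too many steps."
```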
What LLMs Are Bad At
LLMs are capable of a lot, but they’re not quite as confidently intelligent as they may seem. There are two major limitations that caveat LLM use cases.
Hallucinations are inaccurate or nonsensical information generated by LLMs. Since LLMs work like text completion based on probability, they don’t fact-check themselves. RAG can alleviate but not eliminate this issue. Ask an LLM something it doesn’t know, and it will give a confidently delivered but incorrect answer. In situations where correctness is essential, a double check is always needed.
While they write with a sense of intelligence, LLMs have limited logic and common sense. Again, because LLMs work through text prediction, they lack the ability to formally reason or to build common sense through experience. Give an LLM “2+2=___,” and while the answer will likely be correct, it results from predicting the next character rather than true mathematical reasoning.
All You Need Is Text
For the sake of this post’s length, which is already clocking in close to 2,000 words, we’re not going to go into text-to-image or multi-modal models. The main takeaways from this section are:
Any text can be piped through an LLM, and text-based tasks represent a huge range of knowledge work.
LLMs can generate text based on their training data or on text that they are given.
Text generation enables LLMs to power internal dialog and reasoning for autonomous Agents that, when given tools, can complete tasks.
How can Architects use LLMs?
Stable Diffusion and text-to-image generation are the Gen AI tools capturing most designers’ imaginations, but LLMs are no slouch. There’s a wide range of procedural tasks that Architects can offload to LLMs.
Information Retrieval
There’s a variety of rote tasks that can be offloaded to LLMs. The most obvious example is looking up building code and compliance. UpCodes is a digital library of building codes and compliance documents. While search is already a huge step up from manual lookup, integrating search into RAG makes finding the right information faster and more accessible. Not only can LLMs help you find the relevant compliance requirements, they can also synthesize explanations and answer your questions.
Information Synthesis
If an LLM can find information, it can work with it. Taking a body of knowledge and filling out RFIs and RFPs is a natural use case for LLMs. Provided the office has good digital knowledge management practices, tools like Quilt or an LLM tool integrated into a shared drive can be used to search and retrieve answers to questions. Of course, you should check the response for hallucinations before sending it out.
Ideation and Writing
Architects, like many professionals, frequently refine and iterate on their ideas in writing. LLMs are great writing assistants that can help edit, check, and improve your writing. Human editors and proofreaders are not as readily available as copy-pasting text into a chatbot. While the quality may not be as good as putting a piece in front of a human, the accessibility and speed make LLMs an invaluable tool.
Text to Text
LLMs are an incredible tool that is becoming increasingly commoditized. If you haven’t tried an LLM yet, I hope this post makes you want to at least give one a try. Text-to-image and Stable Diffusion are something I’m working on covering in a future post, which should be a little less dry. Language is the foundation of how we think, communicate, and learn. Through language, LLMs are starting to capture a lot of that magic.
Bonus
This week on Most Podern (the podcast I co-host) we went down the AI rabbit hole.