Welcome to our first artificial intelligence (AI) performance test on the portal. These tests will continue to evolve over time thanks to constant changes and improvements in hardware acceleration of various AI models. This time, we'll dive into a specific usage type: Stable Diffusion.
Since this is our first series of AI hardware tests, we have prepared a short guide for readers new to the world of artificial intelligence. We understand that some common concepts and terms in AI applications may be unfamiliar, so we want to provide clarity. In future tests, we hope to describe the AI inference performance of video cards in terms as familiar as “horsepower.”
For now, let's explore the first part of the guide, where we will use AI (ChatGPT and others) to explain concepts in a simple way, accessible to any reader.
Information and concepts: What is AI?
Artificial intelligence (AI) can be seen as a digital tool that imitates human intelligence. Instead of following specific programmed instructions, AI can learn from data and experiences to perform tasks and make decisions.
Imagine a program that helps you write emails by suggesting words based on your writing history. That's a basic example of AI. It adapts to your style and preferences as you interact with it.
Now, let's take this to a more advanced level with virtual assistants like Siri or Google Assistant. These use AI to understand your questions, learn from your past requests, and improve over time. For example, if you ask it about traffic, it will learn when and how you prefer to receive that information.
In short, AI is like a digital assistant that evolves and improves as you interact with it, using information to do tasks more efficiently.
What is generative AI?
Generative artificial intelligence is an evolution that goes beyond simply answering questions or following instructions. Rather than being limited to specific tasks, this form of AI has the ability to autonomously create original content.
We can compare this to a predictive writing program, but taken to the next level. Instead of suggesting words or phrases, generative AI can compose entire texts in a style that mimics the data pattern it has been trained on.
A concrete example of this is GPT-3, a language model developed by OpenAI. This model can receive a fragment of text and generate story continuations, write poetry or other coherent and contextually relevant texts.
In short, generative AI not only responds but also creates new content based on pattern learning. It's like having a digital writer that can produce original texts based on the style it has captured from the data it has been fed.
Deep learning and generating images from text
Deep learning, in simple terms, is a branch of artificial intelligence that mimics the way the human brain processes information to learn and make decisions. It's like having a digital assistant that not only follows basic instructions, but can also understand complex patterns and learn from past experiences.
Now, when we talk about text-to-image models based on diffusion techniques, we are entering even more fascinating territory. Imagine that you can ask your digital assistant to create an image from a description you give it. Text-to-image models use deep learning to understand the words in that description and then apply diffusion techniques.
Let's talk about that. Diffusion is like mixing colors intelligently. It is as if your assistant, instead of simply painting, took those colors and details that you provide in the description and “diffused” them to create something completely new and original. It's like having a digital artist who understands how to combine elements in a unique and creative way.
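A well-known earlier example of deep learning applied to images is DeepDream, which interprets and amplifies specific patterns in an existing image, creating visually striking results. DeepDream itself does not use diffusion; Stable Diffusion, described in the next section, is an example of a text-to-image model that does.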
In short, deep learning and text-to-image models based on diffusion techniques allow us to have a digital assistant that not only understands our words, but can also create original and amazing images from those words. It's as if technology becomes a true creative collaborator.
What is Stable Diffusion?
Stable Diffusion is an artificial intelligence model that creates images based on text prompts. It works similarly to other generative artificial intelligence models such as ChatGPT. When given a text prompt, Stable Diffusion generates images based on its training data.
For example, the prompt “apple” would produce an image of an apple. It can also handle more complicated prompts, such as creating an image of an apple in a specific art style. In addition to generating images, it can replace parts of an existing image and extend an image beyond its original borders. Adding or replacing elements within an image is called “inpainting,” and extending an image to make it larger is called “outpainting.” These processes can modify any image, whether the original image was created with artificial intelligence or not.
Stable Diffusion uses something called a latent diffusion model (LDM). It starts with random noise that resembles static from an analog television. From that initial static, it goes through many steps to remove noise from the image until it matches the text prompt. This is possible because the model was trained by adding noise to existing images, so generation essentially reverses that process. Stable Diffusion was trained with many images from the internet, mainly from websites like Pinterest, DeviantArt, and Flickr. Each image was accompanied by descriptive text, so the model knows what different things look like, can reproduce various art styles, and can take a text prompt and turn it into an image.
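To make the "add noise, then reverse it" idea more concrete, here is a toy sketch in Python. It is not Stable Diffusion's code: the real model works on a compressed (latent) version of the image and uses a neural network to predict the noise over many steps. The function names and the `alpha_bar` parameter are our own illustrative choices, and the example cheats by reusing the true noise instead of a model's prediction.

```python
import math
import random

def add_noise(pixels, alpha_bar, rng):
    """Blend clean pixel values with Gaussian noise.

    alpha_bar close to 1.0 keeps the image almost intact;
    close to 0.0 leaves almost pure noise.
    Returns the noisy pixels and the noise that was mixed in.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise

def remove_noise(noisy, predicted_noise, alpha_bar):
    """Undo add_noise, given a prediction of the noise that was added."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for x, n in zip(noisy, predicted_noise)
    ]

rng = random.Random(0)
image = [0.2, 0.5, 0.9]  # toy "image" made of three pixel values
noisy, noise = add_noise(image, alpha_bar=0.5, rng=rng)

# A trained model would have to *predict* the noise from the noisy image
# and the text prompt. Here we simply pass the true noise (an assumption
# made only to keep the sketch self-contained).
restored = remove_noise(noisy, noise, alpha_bar=0.5)
```

With a perfect noise prediction, `restored` matches the original `image` up to rounding error. The quality of a real model depends on how well it learns that prediction.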
Stable Diffusion can create photorealistic images that are difficult to distinguish from reality and also images that are difficult to distinguish from hand-drawn or painted works of art. However, one way to identify AI-generated art is to look at hands, as Stable Diffusion and other models struggle in that area.
Important terms related to AI (Glossary)
Prompt: A prompt is a short instruction or stimulus that is provided to elicit a response, whether in the form of text, image, or other information. In the context of artificial intelligence, a prompt can be the input given to a model to generate a specific output.
txt2img: txt2img is a term often used to refer to the process of converting text prompts into images using artificial intelligence.
Stable Diffusion Benchmark – Ranking of the best video cards (GPUs)
We do not have a dedicated Stable Diffusion benchmark tool comparable to ready-made benchmarks such as 3DMark and its Time Spy test. Therefore, we have had to develop our own testing methodology using Stable Diffusion, controlling the parameters so that the video card is the only variable.
After several days of testing, we are now able to apply a standardized assessment to establish a hierarchy of video cards and measure the performance differences between them in this type of task.
There are several versions of Stable Diffusion available to the general public, the most used being 1.5 and SDXL 1.0.
For our testing, we opted for SDXL 1.0, as we have been impressed with the results obtained using simple instructions, which makes it ideal for users who are experimenting with this image generation tool for the first time.
As usual, we will share the testbed configuration that we will use in the following evaluations.
Test bench (GPU Benchmarks – Artificial Intelligence – 2023)
In our test bench, we have selected the highest performing processor in our inventory, the Intel Core i9-13900K. Although the processor does not play a crucial role in generating images (inference) with Stable Diffusion, we have chosen the best one available to avoid potential bottlenecks.
The central focus of our testing is on getting the full performance out of each video card in AI image generation and on evaluating the performance differences between video cards.
For these tests, we are using Windows 11 and have disabled VBS (Virtualization-Based Security).
CPU: Intel Core i9-13900K (Power Limiters Disabled) (https://amzn.to/3X53WQS)
Board: Z790 AORUS ELITE AX (BIOS F6) (https://amzn.to/3ClPWde)
RAM: G.Skill Flare X5 Series (AMD Expo) 32GB (2 x 16GB) DDR5 6000 CL36-36-96 (https://amzn.to/3Z8g45y)
Video card (what we are testing): Various
Operating system: Windows 11 Home Edition 22H2 – VBS OFF
Liquid cooling: Lian Li Galahad 360 (https://amzn.to/3jMvNXO)
SSD: Samsung 980 Pro 1TB + TeamGroup MP34 4TB SSD (https://amzn.to/3PuIAvX)
Driver: NVIDIA GeForce GameReady 545.84
Power supply: Seasonic Prime Gold 1300W (https://amzn.to/3Qd102w)
As noted above, our test uses the SDXL 1.0 (base) variant. Regarding image resolution, we chose a final resolution higher than 512×512, using the quality preset that implies a greater number of “samples”. The task consisted of generating six images from the following prompt:
“A gamer pig, humanoid.”
During the process, we recorded the time needed to generate the six images; the shorter the time, the better. We also recorded the GPU power consumption during the execution of this task.
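For readers who want to try a similar measurement, the timing part can be sketched in a few lines of Python. This is only an outline of the idea, not the exact tooling we used: `time_batch` is a hypothetical helper, and `fake_generate` is a stand-in that would have to be replaced by a real Stable Diffusion call in whatever interface you use.

```python
import time

def time_batch(generate_image, prompt, n_images=6):
    """Return the total seconds spent generating n_images from one prompt."""
    start = time.perf_counter()
    for _ in range(n_images):
        generate_image(prompt)
    return time.perf_counter() - start

def fake_generate(prompt):
    # Stand-in for a real image generation call; it only waits briefly.
    time.sleep(0.01)

elapsed = time_batch(fake_generate, "A gamer pig, humanoid.")
print(f"6 images in {elapsed:.2f} s")
```

Keeping the prompt, resolution, and number of images fixed across runs is what lets the video card be the only variable.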
Stable Diffusion Benchmark - Performance - GPUs
Measured in time (seconds). Less is better.
The NVIDIA GeForce RTX 4090 is the best video card for this type of task. The results scale as expected, except for the GeForce GTX 1660 Super, which we discuss in the analysis of results.
Stable Diffusion Benchmark - Consumption - GPUs
Watts. GPU Power only
The power consumption of video cards not using TensorRT is quite similar to the average consumption we observe on a video card while gaming. Compared to the GeForce RTX 30 series, the RTX 40 series is much more efficient for this type of task.
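One way to put a number on efficiency is to combine both measurements, average power and time, into energy per image. The sketch below is our own illustration of that arithmetic, not a metric reported in the charts, and the values in the example are invented placeholders rather than measured results.

```python
def energy_per_image_wh(avg_power_w, total_time_s, n_images=6):
    """Energy used per image in watt-hours: power x time / images."""
    total_energy_wh = avg_power_w * total_time_s / 3600.0
    return total_energy_wh / n_images

# Hypothetical example: 360 W average over 60 s for six images.
example = energy_per_image_wh(360.0, 60.0, n_images=6)
```

A card that draws more power can still be more efficient if it finishes the task proportionally faster.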
Analysis of results
The test parameters are customized, anticipating the evolution of AI image generation models in the coming months and years. Technological advancement has been exponential; just a year ago, generating 512x512 images was a demanding task. Thanks to advances in hardware and software optimizations, image generation at this resolution has now become considerably easier, even for laptop GPUs.
The focus is on posing a more challenging task to measure generational changes when next-generation video cards are released. Although the GeForce GTX 1660 Super manages to complete the task, it takes substantially longer, because VRAM usage has to be reduced to run the configuration we used.
This provides useful guidelines when selecting a video card:
-6GB video cards can acceptably handle tasks in traditional configurations such as 512x512, but for larger images, an 8GB video card is recommended.
-Beyond having 8GB of VRAM, it is crucial to check a card's actual performance, as it can vary considerably depending on the card model.
Stable Diffusion Benchmark - Relative Performance - GeForce RTX 4060
Relative performance - Base video card (100%) - GeForce RTX 4060
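Because the raw results are times (lower is better), a relative score is typically obtained by dividing the baseline card's time by each card's time. The sketch below assumes that formula; the helper name and the timings are hypothetical and invented for illustration only, not taken from our measurements.

```python
def relative_performance(times_s, baseline):
    """Express each card's speed as a percentage of the baseline card."""
    base_time = times_s[baseline]
    return {card: 100.0 * base_time / t for card, t in times_s.items()}

# Hypothetical timings in seconds (not real results).
example_times = {
    "baseline card": 60.0,
    "faster card": 30.0,
    "slower card": 120.0,
}
scores = relative_performance(example_times, "baseline card")
```

Under this formula, a card that finishes in half the baseline time scores 200%, and one that takes twice as long scores 50%.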
We would recommend, at a minimum, opting for a GeForce RTX 3060 with 8GB or 12GB if you plan to start generating images with AI. The jump from a GeForce RTX 4060 to a 4060 Ti is notable, and choosing any GPU above this will depend on your budget or on the return on investment (ROI) you expect from this kind of AI inference work.
Video cards with 4GB or 6GB of VRAM will work, but they can be very slow on demanding tasks.
It has been a fascinating experience to explore this technology for the first time and evaluate the performance differences between various GPUs. As we continue this work, we plan to review and update our test parameters. We hope this guide has been useful to you.