
ESRGAN vs. Real-ESRGAN - from theoretical to real-world super-resolution with AI

ESRGAN vs. Real-ESRGAN. Implementation, cost, and use-cases of both models. Which is the best model for upscaling and super-resolution?

Example super-resolution of existing images using Real-ESRGAN. Let's compare ESRGAN vs. Real-ESRGAN.

Ever stumbled upon an old, fuzzy snapshot of your ancestors and wished for it to be clearer? Or tried to zoom into a cherished picture from your buddy's Facebook collection, only to meet with unsightly pixelation?

Don't despair. AI image super-resolution technology is here to infuse new vitality into your dull photographs. These models restore and enhance details in low-resolution pictures. They can even be used to clean up AI-generated images that have unsightly defects or artifacts.

In this guide, we'll compare and contrast two ground-breaking AI models - ESRGAN and Real-ESRGAN. Though they share similar names, they serve subtly different purposes. They both employ deep learning to augment resolution and correct degradation, but each excels in its own specific area:

ESRGAN: Offers outstanding results for ideal simulated degradation, such as bicubic downscaling. If you're dealing with pristine bicubic downscaled images or weird noise from AI generations, ESRGAN is still a good option for you, despite being older.
Real-ESRGAN: Tailored to tackle real-world corruptions like the compression common in social media. If your goal is to restore old photos or social media images affected by unknown blur or noise, Real-ESRGAN should be your pick. Real-ESRGAN also performs pretty well on synthetic images.

In this exploration, you'll understand how each model functions, its capabilities, limitations, and criteria for choosing one over the other.


So, if you have a collection of vintage family photos needing an upgrade, want to recapture the lost detail in your social media shares, or just want to clean up some Stable Diffusion generations, this guide will help you sift through the 'software magic' behind AI photo enhancement and match you with the right model.

How ESRGAN brought a huge leap in image quality

Back in 2018, a unique type of artificial intelligence model named ESRGAN revolutionized how we see and enhance images. It demonstrated the remarkable ability of AI to generate highly realistic textures and details for improving image resolution. This leap forward resulted in significantly better visual quality compared to older methods.

How Does ESRGAN Work?

At the core of ESRGAN, there's a friendly game of deception going on. One part of the system, called the 'generator,' tries to upgrade the resolution of a low-quality image. Meanwhile, another part named the 'discriminator' is trained to tell the difference between a high-quality image and one that's been artificially enhanced by the generator.

The fascinating part? They're competing with each other. This competition pushes both parts of the system to get better and better, resulting in images that look astonishingly photorealistic. You can read more about this implementation in the ESRGAN paper on arXiv.

Qualitative results of ESRGAN as compared to different models, from the paper.

The Building Blocks of ESRGAN

ESRGAN uses some cool techniques to produce the best possible results. Here's a quick look at them, in simpler terms:

Residual-in-residual dense blocks: These are the fundamental units of the 'generator' that help process and enhance features in the image.
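Relativistic GAN loss: Instead of asking whether an image is simply real or fake, the discriminator estimates whether a real image looks more realistic than a generated one. This relative judgment gives the generator a richer training signal and helps it produce sharper edges and more detailed textures.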
Perceptual loss: This technique helps ESRGAN to recover textures in the image by comparing high-level image features from a pre-trained network.
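To make the relativistic idea a little more concrete, here's a minimal sketch in plain Python. It assumes the discriminator has already produced raw "realness" scores for a batch of real and generated images (the numbers below are made up), and it follows the relativistic average formulation as I read it in the ESRGAN paper. The function names are my own, and a real training loop would use a deep learning framework rather than lists of floats.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def mean(values):
    return sum(values) / len(values)

def relativistic_losses(real_scores, fake_scores):
    """Return (discriminator_loss, generator_loss) from raw discriminator scores."""
    real_mean, fake_mean = mean(real_scores), mean(fake_scores)
    # Probability that each real image looks more realistic than the average fake.
    d_real = [sigmoid(r - fake_mean) for r in real_scores]
    # Probability that each fake looks more realistic than the average real image.
    d_fake = [sigmoid(f - real_mean) for f in fake_scores]
    eps = 1e-12  # keeps log() away from zero
    d_loss = (-mean([math.log(p + eps) for p in d_real])
              - mean([math.log(1.0 - p + eps) for p in d_fake]))
    # The generator loss is the mirror image: it wants fakes to beat reals.
    g_loss = (-mean([math.log(1.0 - p + eps) for p in d_real])
              - mean([math.log(p + eps) for p in d_fake]))
    return d_loss, g_loss

# Made-up scores where the discriminator is currently "winning".
d_loss, g_loss = relativistic_losses([2.0, 1.5], [-1.0, -0.5])
```

Notice that the generator's loss depends on the real images too, not just on its own outputs. That's the practical difference from a standard GAN loss.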

However, there's one caveat with ESRGAN. Its low-resolution training images are created by bicubic downsampling of high-resolution images. In layman's terms, this means it assumes a smooth and regular process of image degradation, which may not reflect the more random ways images can degrade in the real world.
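If "bicubic downsampling" sounds abstract, here's a small one-dimensional sketch of the idea. It uses the standard cubic convolution kernel (with a = -0.5) stretched by the scale factor so it also acts as an antialiasing filter. For images, the same kernel is applied along rows and then along columns. This is a simplified illustration with clamped borders and my own function names, not the exact resizing routine used to build the ESRGAN training set.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel, the basis of bicubic resampling."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def bicubic_downsample_1d(signal, factor):
    """Shrink a 1D signal by an integer factor using a stretched cubic kernel."""
    out = []
    for i in range(len(signal) // factor):
        # Position of this output sample in input coordinates.
        center = (i + 0.5) * factor - 0.5
        lo = math.floor(center - 2 * factor)
        hi = math.ceil(center + 2 * factor)
        total, weight_sum = 0.0, 0.0
        for j in range(lo, hi + 1):
            w = cubic_kernel((j - center) / factor)
            jj = min(max(j, 0), len(signal) - 1)  # clamp indices at the borders
            total += w * signal[jj]
            weight_sum += w
        out.append(total / weight_sum)  # normalize so brightness is preserved
    return out

ramp = [float(v) for v in range(32)]
small = bicubic_downsample_1d(ramp, 4)  # 32 samples become 8
```

The key point is that this degradation is completely deterministic: the same input always produces the same low-resolution output, with no noise, blur variation, or compression involved.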

Nevertheless, ESRGAN was able to deliver top-tier image quality on research datasets and pioneered the use of AI for super-resolution, showing us a glimpse of the future of image enhancement.

How Real-ESRGAN went from synthetic to real-world images

ESRGAN is a star performer when it comes to enhancing 'clean,' digitally downscaled images. However, real photographs encounter numerous, often unpredictable distortions that can be tricky to replicate or correct. These can include:

Blur: Brought on by factors such as camera misfocus, object motion, or atmospheric disturbances.
Noise: A result of imaging sensors or compression algorithms.
Artifacts: Unwanted effects from JPEG compression or resizing operations.
Multiple degradation types: A combination of the above distortions, among others.

To deal with these complex, real-world issues and enable super-resolution for actual photographs, Real-ESRGAN stepped up the game with some major enhancements:

Key Improvements in Real-ESRGAN

Unlike ESRGAN, Real-ESRGAN has a more complicated and robust system for restoring photos. Let's use some analogies to understand what each part of the algorithm is doing. The following descriptions are based on the Real-ESRGAN paper on arXiv, uploaded by the creator Xinntao. Note that you can also access the model on GitHub if you'd like to dive further into the details.

Comparison of super-resolution outputs by Real-ESRGAN vs ESRGAN - note the improvement in real-world images. From the paper.

The Diverse Degradation Model: A Skilled Detective

A good detective begins with understanding the problem. For Real-ESRGAN, the Diverse Degradation Model plays this detective role. Rather than assuming a single bicubic downscale, it degrades the training images with blur, resizing, noise, and JPEG compression, applied more than once and with randomized settings (the paper calls this a high-order degradation process). Because the network has already seen so many ways an image can go wrong during training, it can counteract the degradations it meets in real photos. This "detective work" allows Real-ESRGAN to tackle image restoration more effectively.
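Here's a toy sketch of what a degradation model looks like when it is used to manufacture training data: take a clean image, then blur, shrink, add noise, and compress it, and then do the whole thing a second time with different random settings. Everything in this snippet is a simplified stand-in of my own (a box blur instead of the paper's blur kernels, block averaging instead of random resizing, coarse quantization instead of real JPEG compression), so treat it as an illustration of the idea rather than the actual Real-ESRGAN pipeline.

```python
import random

def box_blur(img, radius):
    """Average each pixel with its neighbours (radius 0 leaves the image unchanged)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    acc += img[yy][xx]
                    count += 1
            out[y][x] = acc / count
    return out

def downsample(img, factor):
    """Average factor x factor blocks (a crude stand-in for a random resize)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]

def add_noise(img, sigma, rng):
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]

def quantize(img, step):
    """Coarse quantization, standing in for lossy JPEG compression."""
    return [[min(255.0, max(0.0, round(p / step) * step)) for p in row] for row in img]

def degrade_once(img, rng, factor=2):
    # Each call draws fresh random settings, so no two passes are alike.
    img = box_blur(img, rng.choice([0, 1, 2]))
    img = downsample(img, factor)
    img = add_noise(img, rng.uniform(1.0, 10.0), rng)
    return quantize(img, rng.choice([4, 8, 16]))

def second_order_degradation(img, seed=0):
    rng = random.Random(seed)
    return degrade_once(degrade_once(img, rng), rng)

# A 16x16 grayscale gradient becomes a noisy, blocky 4x4 training input.
clean = [[float((x + y) * 8) for x in range(16)] for y in range(16)]
degraded = second_order_degradation(clean, seed=1)
```

During training, the network sees the degraded version as input and the clean image as the target, so it learns to undo the whole chain.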

The Sinc Filter: An Expert Artisan

After understanding the problems, Real-ESRGAN also has to account for a subtler defect. Real-world images often show 'ringing' and overshoot artifacts, the faint ripples or halos that appear next to sharp edges after sharpening or compression. The Sinc Filter is used while generating training data to simulate these artifacts, so the model learns to remove them. It's as if Real-ESRGAN were an artisan who knows exactly which blemishes to polish away from your restored image.
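To see where ringing comes from, here's a small one-dimensional sketch. It passes a perfectly sharp edge through a truncated sinc (ideal low-pass) filter and measures how far the result overshoots. The paper uses a two-dimensional circular version of this filter. The 1D simplification, the cutoff, and the kernel size here are my own choices for illustration.

```python
import math

def sinc_kernel_1d(cutoff, radius):
    """Truncated ideal low-pass kernel; cutoff is in radians per sample."""
    taps = []
    for n in range(-radius, radius + 1):
        if n == 0:
            taps.append(cutoff / math.pi)
        else:
            taps.append(math.sin(cutoff * n) / (math.pi * n))
    total = sum(taps)
    return [t / total for t in taps]  # normalize so flat regions keep their level

def filter_1d(signal, kernel):
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), len(signal) - 1)  # replicate the borders
            acc += w * signal[j]
        out.append(acc)
    return out

edge = [0.0] * 32 + [1.0] * 32                     # a clean dark-to-bright edge
ringed = filter_1d(edge, sinc_kernel_1d(math.pi / 3, 10))
overshoot = max(ringed) - 1.0                      # how far the bright side overshoots
undershoot = min(ringed)                           # how far the dark side dips below zero
```

The filtered edge rises above 1.0 on the bright side and dips below 0.0 on the dark side. In an image, those ripples show up as halos around edges.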

The U-Net Discriminator: The Quality Control Inspector

Quality is key in any renovation. The U-Net Discriminator acts as Real-ESRGAN's internal quality control. Instead of giving a single verdict for the whole image, it produces a realness score for every pixel, which gives the generator detailed feedback about exactly where the result looks fake. This careful inspection helps ensure that the image restoration doesn't miss fine textures and details.

Spectral Normalization: The Project Manager

Finally, to ensure a smooth renovation process, Real-ESRGAN needs a project manager. The Spectral Normalization feature keeps the training process balanced and stable, just like a project manager ensures a renovation project stays on track. This stability is crucial in achieving consistent, high-quality results.
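For the curious, here's roughly what spectral normalization does to a single weight matrix: it estimates the matrix's largest singular value with power iteration and divides the weights by it, so the layer can't amplify its input by more than a factor of about one. This is a plain-Python sketch with helper names of my own. Deep learning frameworks ship their own implementations, which typically run only one iteration per training step and reuse the vector between steps.

```python
import math

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def spectral_norm(weight, iterations=50):
    """Estimate the largest singular value of a weight matrix by power iteration."""
    u = normalize([1.0] * len(weight))  # fixed start vector; frameworks use a random one
    weight_t = transpose(weight)
    for _ in range(iterations):
        v = normalize(matvec(weight_t, u))
        u = normalize(matvec(weight, v))
    return sum(a * b for a, b in zip(u, matvec(weight, v)))

def spectrally_normalize(weight, iterations=50):
    """Rescale the weights so their largest singular value is about 1."""
    sigma = spectral_norm(weight, iterations)
    return [[w / sigma for w in row] for row in weight]

weights = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(weights)           # largest singular value of this matrix
stable = spectrally_normalize(weights)   # same directions, bounded gain
```

Keeping the discriminator's layers bounded like this stops its feedback from swinging wildly, which is what makes training steadier.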

In essence, Real-ESRGAN combines these four elements to seamlessly transform low-quality, real-world images into high-resolution masterpieces.

The Upshot: A Game Changer for Real-world Images

These enhancements empower Real-ESRGAN to handle the challenging aspects of real-world image corruption with finesse. It greatly outshines ESRGAN when it comes to dealing with low-resolution images affected by unknown blur, noise, and JPEG compression artifacts. In other words, Real-ESRGAN is a game changer when you're trying to improve the quality of real-world images.

Choosing the Right AI Tool: When to Use ESRGAN vs. Real-ESRGAN

Artificial Intelligence is a powerful tool, but like any tool, its efficacy largely depends on the task at hand. Understanding when to employ ESRGAN versus Real-ESRGAN can make a significant difference in the quality of the outcome.

ESRGAN: The Master of Synthetic Images

ESRGAN is a virtuoso when dealing with artificially manipulated images. If you're working with images that have been changed by algorithms rather than natural processes, ESRGAN might be your best bet.

For instance, consider digital artworks or images generated by other AI models. These synthetic images often exhibit a certain type of degradation, like bicubic downscaling, which ESRGAN is specifically designed to handle. It's the perfect tool to refine the textures and details of these images and give them a photorealistic touch.

Real-ESRGAN: Conqueror of Real-World Corruptions

On the other hand, Real-ESRGAN is a powerhouse for real-world photographs. The unpredictable nature of real-world corruptions, such as blurs, noise, or compression artifacts, can be quite a challenge to rectify.

But with Real-ESRGAN's diverse degradation model and other advanced features, these challenges become manageable. Whether it's enhancing old family photos, improving social media images, or providing clarity to medical or satellite images, Real-ESRGAN is a real-world savior.

Which AI model is better?

The choice between ESRGAN and Real-ESRGAN is not a matter of which one is "better". Rather, it's about finding the best fit for your specific needs. While both models excel at enhancing images, they shine in different scenarios.

By understanding the unique strengths of ESRGAN and Real-ESRGAN, you can effectively leverage these powerful AI tools to maximize their potential. Whether you are working with synthetic images or real-world photos, the right AI can make all the difference.

Ideal Scenarios for Each Model

Bicubic Downsampling: If your images have undergone bicubic downsampling, ESRGAN still achieves excellent results. It's built to handle this kind of synthetic degradation effectively.
Real Photographs with Complex Distortions: For images distorted by real-world elements like camera blur, sensor noise, or compression artifacts, Real-ESRGAN is a much stronger choice. It's designed to handle these unpredictable corruptions that ESRGAN isn't ideally suited for.
Face Restoration: Another advantage of Real-ESRGAN is that it can be used with GFPGAN, which enhances its ability to restore faces in photographs. If you're trying to improve the clarity of real facial images, Real-ESRGAN has an edge.

Recognizing Their Limitations

While both models are impressive, they also have certain limitations that are important to acknowledge:

Extremely Low Resolutions or Heavy Degradations: If an image is of extremely low resolution or has been heavily degraded, both models may struggle to produce satisfactory results. The lack of details in the source image can limit what the models can recreate.
Video Super-Resolution: Neither model is ideally suited for video super-resolution due to flicker effects. As they enhance each frame individually, inconsistencies can occur, causing a flickering or jumpy appearance in the video output.
Hardware Requirements: Both ESRGAN and Real-ESRGAN are computationally heavy and run best on a capable GPU with plenty of memory. If your hardware isn't up to the task, expect slow processing or out-of-memory errors, especially on large images.

By keeping these guidelines in mind, you can make an informed decision about which model to use for your specific needs, and get the most out of these powerful AI tools.

The Best Upscaler AI Models: Exploring Alternatives

When selecting an AI model for your project, considering different options can help you make a more informed decision. Here are a few AI models that offer similar functionalities to the ESRGAN and Real-ESRGAN models:

GFPGAN by Tencentarc: This model specializes in face restoration for old photos or AI-generated faces. It's comparable to ESRGAN in its image enhancement capabilities but specifically targets facial images. It runs on the Replicate platform under the Image-to-Image category and costs $0.0033 per run.
CodeFormer by Sczhou: Similar to ESRGAN, CodeFormer provides a robust solution for face restoration in old photos or AI-generated faces. The main difference lies in the restoration algorithm it uses. Running on Replicate, it falls under the Image-to-Image category and costs $0.0055 per run. You can check out this guide to compare GFPGAN and CodeFormer.
SwinIR by Jingyunliang: This model provides image restoration capabilities using the Swin Transformer. While ESRGAN and Real-ESRGAN use GAN-based approaches, SwinIR relies on a transformer architecture for image enhancement. It runs on the Replicate platform under the Image-to-Image category and costs $0.0276 per run.
SCUNet by Cszn: A practical model for blind denoising, SCUNet takes a different approach again, combining a Swin-Conv-UNet architecture with data synthesis. It runs on Replicate, is classified as Image-to-Image, and costs $0.00275 per run. Further reading on this model is available here.

These models provide varying solutions for image restoration and enhancement tasks. Though they serve similar purposes, the differences in their design, methodology, and cost per run demonstrate the variety of options available for developers. When selecting a model, consider your project's specific requirements, your available budget, and the unique capabilities and methodologies of each model.
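If budget is one of your deciding factors, a few lines of arithmetic make the differences tangible. This small sketch uses only the per-run prices quoted above, which may well have changed since, so check current pricing before you plan a big batch. The function names are just for illustration.

```python
from decimal import Decimal

# Per-run prices as quoted in this guide; they may be out of date.
PRICE_PER_RUN = {
    "GFPGAN": Decimal("0.0033"),
    "CodeFormer": Decimal("0.0055"),
    "SwinIR": Decimal("0.0276"),
    "SCUNet": Decimal("0.00275"),
}

def batch_cost(model, runs):
    """Total cost in dollars for a number of runs of one model."""
    return PRICE_PER_RUN[model] * runs

def cheapest(runs):
    """Name of the model with the lowest total cost for this many runs."""
    return min(PRICE_PER_RUN, key=lambda name: batch_cost(name, runs))

# Cost of restoring a 1,000-photo archive with each model.
costs = {name: batch_cost(name, 1000) for name in PRICE_PER_RUN}
```

At these prices, a thousand runs range from under three dollars to nearly twenty-eight, so the gap matters more as your batch grows.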

Conclusion

Image enhancement has seen a significant evolution with the advent of AI models. They can breathe new life into your cherished but dull photos by enhancing details and correcting degradation. Both ESRGAN and Real-ESRGAN use AI to serve this purpose, but each has its own forte and is suited to different scenarios. ESRGAN shines when enhancing bicubic downscaled images, while Real-ESRGAN excels at managing real-world corruptions like blur, noise, and compression artifacts.

By comprehending the unique strengths and limitations of ESRGAN and Real-ESRGAN, you can effectively choose the right tool for your specific needs. Whether it's enhancing synthetic images or real-world photos, the right AI can be a game-changer. However, it's crucial to understand that both models have their limitations, particularly with extremely low-resolution images or those with heavy degradation, and require powerful hardware for optimal performance.

Remember, the choice between ESRGAN and Real-ESRGAN isn't about which one is superior overall, but which one is best suited for your unique needs. Thanks for reading and happy enhancing!


Additional Resources

Here are some useful resources to further explore the ESRGAN and Real-ESRGAN models, as well as their implementations:

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data: The academic paper presents the Real-ESRGAN model, discussing its design, methodology, and results in-depth.
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks: This academic paper outlines the conception, design, and capabilities of the ESRGAN model.
ESRGAN details: The detailed page for the ESRGAN model, including its description, specifications, capabilities, and cost per run.
Real-ESRGAN implementation by NightmareAI: A popular Real-ESRGAN implementation available on the Replicate platform, created by NightmareAI.
Real-ESRGAN implementation by Xinntao: The implementation of Real-ESRGAN by the original author, Xinntao, also available on the Replicate platform.

These resources provide comprehensive insights into the design and functionalities of the ESRGAN and Real-ESRGAN models, as well as their practical applications. They can be beneficial for anyone interested in these models, whether you're a developer looking to integrate them into a project or just someone interested in learning more about cutting-edge AI technology.
