Fixing an impossible 1-million-image speed-killing media library.

WordPress Mobile Speed


There are no affiliate links on PagePipe.

Case study:
Remediation of a dying pop-culture WordPress website.

Cause of imminent site death:
Complications due to an aging media library and lack of self-care.

Our client’s old website media library will soon cross the million-image mark. The client site is on self-destruct with the present media library strategy. There is no media management strategy. Zero discipline. Independent authors can upload anything. Unrestrained chaos.

The pop-culture blog began in 2011. The media library started to worsen in 2017 and got worse each successive year. More and fatter images were being uploaded. The worst year is 2021, with over 20 percent of the media library images in that folder. We don’t know the upper limit for WordPress. Can the database keep track of so many images? There is no limit on the number of WordPress posts or pages.

This site is huge because it has so many heavy image files. In its present condition, backups of the live site fail on restoration or migration. These are bad signs of site flakiness.

The media library isn’t managed strategically. It has grown organically. The site owner had no idea of the impending danger. The old theme created 11 thumbnails on the server for every original image uploaded. And many images were in huge PNG format. The average image weighed 1 megabyte.

Are we heroes or victims?
We’re creative problem-solvers.

We exhausted every known cleaner plugin from the WordPress plugin directory. On the test site, the plugins would hang, freeze, or fail to do anything. The library is too huge for culling the thumbnails online. It overruns the server resources. So we set up an offline emulator to run the same automation on our computer. That would also crash or freeze from overruns.

So we got creative.

When you upload an image to your Media Library, WordPress makes several copies of it. Different themes and plugins may also request a variety of image sizes. So, your Media Library can amass a lot of hidden files. These are not shown in the media library. The different-size duplicates are on the server. We check using the free WP File Manager Pro plugin.

This duplication and redundancy bogs down an entire website. All images take up space on your web server. Automatic backups become large and unwieldy. Then you may back up but not be able to restore from backup. If you can’t restore, what good is a backup? Worthless.

Depending on your web hosting plan, extra files cost you more dollars and speed each month. It’s important to make a copy of your current website before tackling a Media Library clean out. This process requires deleting some files for good. We had to download the media library via Filezilla, an FTP client.

REFERENCE: https://wpengine.com/resources/wordpress-media-library-clean-up/

REFERENCE: https://wpengine.com/resources/prevent-wordpress-image-size-generation/

REFERENCE: https://kinsta.com/blog/wordpress-image-sizes/

On the server, the images are in folders by year and month (2021/11). In our case, each month has up to 10,000 images in each folder. We don’t know what the upper limit is. But we suspect the site is maxing out WordPress and server capabilities. Everything is running super slow on the WordPress frontend and backend.

We invented an unconventional workaround. A miracle?

We used a desktop file-search program that lets us select and delete the thumbnails. The search criterion is the letter “x.” That character is in all thumbnail file names to specify the dimensions. It is not in the originals unless the original file name happens to contain an “x” character.

As a test, we located and removed those additional “x” images by hand for months 10 and 11 of 2021. Then we put them back after cleaning the two month folders. Thus far, those files with x’s have names like: executioners, NXT, EXCALIBER, and X-men. There may be others. Missing some will not cause fatal errors. A link checker, run after migration, can identify missing images so we can hunt them down. But we would like to reduce that workload. Even a 1 percent error on 1 million images is too much. That would mean relinking 10,000 images by hand. Ouch.

So we won’t use this “x” character search method. That might work for a smaller library.
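Here is a small sketch of why the bare “x” search over-matches. The file names are made up for illustration, modeled on the names mentioned above. Python is only a convenient way to show the idea. It is not the tool we used.

```python
# Hypothetical file names, modeled on the ones mentioned above.
filenames = [
    "wrestler-150x150.jpg",      # thumbnail
    "wrestler.jpg",              # original
    "NXT.jpg",                   # original with an "x" in its name
    "X-men-poster.png",          # original with an "x" in its name
    "X-men-poster-300x200.png",  # thumbnail
]

# The bare "x" search flags any name containing the letter, upper or lower case.
flagged = [name for name in filenames if "x" in name.lower()]


def looks_like_thumbnail(name):
    """True if the text after the last hyphen is WIDTHxHEIGHT (digits only)."""
    stem = name.rsplit(".", 1)[0]
    size = stem.rsplit("-", 1)[-1]
    width, _, height = size.partition("x")
    return width.isdigit() and height.isdigit()


# Originals caught by mistake: flagged names without a -WIDTHxHEIGHT suffix.
false_positives = [name for name in flagged if not looks_like_thumbnail(name)]
print(false_positives)
```

Every name in `false_positives` is an original that the “x” search would have deleted along with the thumbnails.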

Will the WordPress media library fall over with too many assets?

WordPress and technical blogs report, “No problem. No limitations.” But that isn’t true. You can’t back up or restore your site in an acceptable time. To download this 1-million-image media library took 48 hours of continuous download time. And that was on a 50/50 fiber-optic connection using a direct FTP Filezilla client.

For the database to keep track of this many images is a big slowdown for the server. Especially on the backend.

Using our offline desktop sorting method on a folder with 9,998 images, we estimate a reduction to around 500 final images. It’s crazy but it works without crashing – as long as we do one month at a time. Searching the 1 million images at once blows the computer’s brains. So we cleaned them using 10,000 image chunks.

This proved too tedious and we explored and discovered other methods.
We describe them soon. Keep reading.

After cleaning, the remaining original images are often in the wrong format and bloated. They need compression and resizing. We found an optimization computer program so we can do these tasks offline. Even with this number reduction, the total of the media library is too much data to process on-site. So we invented creative solutions.

Will the changed creation dates on media files during download cause link breakage? This turned out not to be a problem. But we tested some samples to be sure. Other developers have downloaded images via FTP and restored them with success. A large library is about 100,000 images. We’re working with 10X that amount.

The worst format to upload to your media library is large PNG photos.

REFERENCE: https://pagepipe.com/worst-image-speed-offense-png-24-bit-transparency-format/

The use of oversized PNGs on this site started to worsen in 2017. PNG format photographs can’t compress as efficiently with automation as JPEG format can. We converted all PNGs to JPEGs and resized them to 2000 pixels wide or less. We did this with plugins.

REFERENCE: https://pagepipe.com/optimize-images-for-mobile-speed-with-imsanity-plugin/

A cheap trick for sorting and bulk-deleting media files:

You can delete files in bulk using Catfish on a desktop. Catfish is an open-source Linux GUI tool that enables you to search your desktop for any kind of file. Here are the steps:

  • Put the folder of images in the home folder
  • Isolate the search to that folder
  • Search for x
  • CTRL+a – hold down
  • Right-click the mouse while still holding CTRL+a
  • Lift up on CTRL+a while holding the mouse button
  • After lifting up on CTRL+a, a menu appears in about 10 seconds
  • Select all and delete

Cleaning left non-existent ghost images in the media library.

We set the screen options to display 700 images. We then deleted 700 “ghosts” per page at a time from the media library. That was repeated 210 times, taking 1 to 2 minutes each. It takes 3 clicks to delete an empty image.
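To put that chore in numbers, here is a quick back-of-envelope calculation using only the figures quoted above. The variable names are ours. The result is an upper bound, since the last page may not have been full.

```python
GHOSTS_PER_PAGE = 700      # screen-options setting described above
PAGES = 210                # number of times the bulk delete was repeated
MINUTES_PER_PAGE = (1, 2)  # fastest and slowest time per page

max_ghosts = GHOSTS_PER_PAGE * PAGES
fastest_hours = PAGES * MINUTES_PER_PAGE[0] / 60
slowest_hours = PAGES * MINUTES_PER_PAGE[1] / 60

print(f"Up to {max_ghosts:,} ghost entries removed")
print(f"Between {fastest_hours} and {slowest_hours} hours of clicking")
```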

We upload the cleaned media library via FTP to the server folder. We then re-linked all images with the free Bulk Media Register plugin. Other plugins like “Add from Server” and “Media from FTP” plugins won’t work. They don’t have the capacity for that many images.

Our test took two hours to reattach two months of compressed and stripped folders. That reattachment created thumbnails. That’s 1,054 files plus 3 thumbs for each.

That worked.

We added “Find Posts Using Attachment” plugin. That broke the media library. Not good for a big job.

Recommended Image Sizes

Make sure you don’t upload images larger than the largest size you need. WordPress won’t use those large images, and they eat up space on your server unnecessarily.

Some tutorials suggest you input “0” for the default image sizes. We recommend against this. Even if it may help you save on space. Why?

If you change the default values to “0”, WordPress uses original images across all devices. Those are the biggest and heaviest images. This increases site bandwidth usage and slows down page load time, especially on mobile devices.

The default image sizes we recommend for our test site are as follows:

  • 150px square for thumbnails.
  • 300px width for medium images.
  • 768px max width for medium-large images.

For featured images in headers, we recommend uploading images at least 1600 pixels x 1600 pixels, but not larger than 2000 pixels x 2000 pixels. WordPress processes and formats your images. This allows the browser to choose the correct size based on page space and screen quality. Uploading a 2000 pixels full-width image is often wasteful. Most site viewers will see it in smaller dimensions.

Uploading compressed and properly sized images prevents long load times.

After changing the antiquated theme to modern GeneratePress theme for speed, we had a problem. All the featured images were not linked. They were missing on every post. And there were tons of posts. There are 117,376 posts. Does anyone want to fix those by hand? I didn’t think so.

Also all the embedded images on posts were hardwired for a small column width.

A workaround was bulk stripping all image ID codes so pages show the original-size image instead of the thumbnail image. WordPress then resizes the image to fit the screen using image swaps. This function has been included in WordPress for several years. WordPress uses onboard automation to select the right size image. It displays the stored thumbnail images based on the screen size.

It’s not a matter of replacing code but removing code. Much easier problem to solve.

We used a conditional logic search and replace plugin to solve this problem of too small images. Better Find and Replace plugin uses a formula – and can’t be removed afterwards.

How to cleanse a polluted media library.

First, use the FTP protocol to download the media library from the host server. It’s not possible to download an oversized library with plugin helpers like:

If your media library has bloated up to 40GB to 100GB+, you may have a successful site backup. But you can’t restore or download the site using conventional plugins or servers. You can’t restore via the WordPress dashboard or with backup plugins.

REFERENCE: https://pagepipe.com/efficient-plugins-strategies-for-optimizing-images/

For this case study, we use Filezilla freeware FTP utility. FTP clients are available for Windows, Linux, and macOS. Download it here: https://filezilla-project.org/

To access your server with Filezilla, you need a host, username, password, and port number. Find these credentials on your host server. If you need help, Google “FTP and hostname.” Most good hosts have a FAQ or help pages written on how to use Filezilla with their services.

Once logged in with Filezilla, you navigate to the WordPress folder listed by year.
wp-content > uploads > folder

You then drag and drop the folders to your desktop computer. There are tutorials online about how to use Filezilla.

This will take a long time. For example: in our case study, there are around 20,000 plus images in each month’s folder. Well over 1 million images in the entire media library. And many are large PNG files. Very heavy. This took 48 hours to download on a fiber connection.

After downloading, we remove the thumbnails created by WordPress and old theme settings. There were 11 thumbnails created for each original file uploaded. So 12 images total for each upload.

By removing the thumbnails, we reduce the media library to a tenth of the weight. In other words, a 100GB media library then becomes about 10GB. We still consider 10GB fat. But it can be backed up and restored.

Thumbnails are not stored in future backups. We use: Exclude Image Thumbnails From UpdraftPlus Backups plugin to not junk up the server and make restoration possible.

We rebuild thumbnails after restoration with:

Force Regenerate Thumbnails

Force Regenerate Thumbnails is the correct plugin for this big job. Not the other plugins that rebuild thumbnails. For some reason, the alternatives would choke on the amount of data processed.

How to strip all thumbnails from the downloaded media library.

We found free desktop software to use. XnView Multi-Platform is available for download and installation for Windows, Apple, and Linux operating systems.

With 1 million images, you can’t remove thumbnails with automation on a remote “rented” server. The host will shut down your site – or throttle the work to a crawl. There are free plugins to do this kind of work but not at this scale of bloat.

We have to strip the library offline on a desktop computer. The software for the job is XnView MP (multi-platform). XnView is an image organizer and general-purpose file manager. It’s used for viewing, converting, organizing, and editing raster images.

XnView MP sorts fast. Minutes at the most. But deleting thumbnails via the trash takes a long time, like 30 minutes to an hour. That is just to move them to the trash. Then you have to erase them. That takes time also.

REFERENCE: https://www.xnview.com/en/xnviewmp/

It’s free for private use. It operates on Windows, Mac, and Linux systems.

The trick is using Regular Expression (RegEx) syntax to sort or filter the contents of folders.

https://www.regular-expressions.info/refquick.html

We will use the \d (backslash-d-lowercase). That’s called “digits shorthand.”

Since certain character classes are used often, a series of shorthand character classes are available. \d is short for [0-9].

https://www.regular-expressions.info/shorthand.html

We want to locate all images with dimensions. These dimensions are in the file name as -XXXxXXX.

EXAMPLE file name:
3BC4276D-7543-4980-B816-3CEC333558AB-620x315.jpeg

The trailing -620x315 is the unique thumbnail identifier. The digits (0 to 9) and the number of them vary, from 2 to 4 digits per dimension. Original images don’t have these dimensions.

We repeat for emphasis: These are the thumbnails. Images without these dimensions are originals. We want to keep the originals in the same folders – and delete all the thumbs from inside the folders.

To do this we use regular expressions.

https://en.wikipedia.org/wiki/Regular_expression

A regular expression (shortened as regex or regexp; also referred to as rational expression) is a character sequence specifying a search pattern in text. These patterns are searching algorithms for “find” or “find and replace” operations on strings. It is a technique developed in theoretical computer science and formal language theory.

Regular expressions are used in search engines, in the search and replace dialogs of word processors and text editors, and in text processing utilities.

So here is what we need:

\d\d\dx\d\d\d\.|\d\dx\d\d\.|\d\d\dx\d\d\d\d\.|\d\d\d\dx\d\d\d\d\.

\d is short for digits 0 to 9.

vertical line | means “or”.

\. allows us to use the “dot” in the file name. Otherwise, a period (.) means “match any single character.”

We then have covered 4 different thumbnail formats. They work depending upon the thumbnail image sizes. So we flag 1 or 2 or 3 or 4 patterns. Those 4 cover all the sequences identifying our thumbnail dimensions.
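For readers who prefer to test the pattern before trusting it with deletions, here is a minimal sketch in Python. We used XnView MP, not Python, so treat this as a way to try the expression on sample names. The compact pattern is our own assumption and is slightly broader than the four alternatives (it also accepts mixes like 2 digits by 3 digits).

```python
import re

# The pattern from the text above, written as a raw string.
AUTHOR_PATTERN = re.compile(
    r"\d\d\dx\d\d\d\.|\d\dx\d\d\.|\d\d\dx\d\d\d\d\.|\d\d\d\dx\d\d\d\d\."
)

# Assumption: a shorter, slightly broader way to say "2 to 4 digits, x, 2 to 4 digits, dot".
COMPACT_PATTERN = re.compile(r"\d{2,4}x\d{2,4}\.")


def is_thumbnail(filename, pattern=AUTHOR_PATTERN):
    """Return True if the name contains WIDTHxHEIGHT right before a dot."""
    return pattern.search(filename) is not None


print(is_thumbnail("3BC4276D-7543-4980-B816-3CEC333558AB-620x315.jpeg"))
print(is_thumbnail("3BC4276D-7543-4980-B816-3CEC333558AB.jpeg"))
```

The first call reports a thumbnail and the second reports an original. Note that neither pattern is anchored to the hyphen, so a name like 1024x768 is caught by the three-digit alternative matching its tail.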

So we use XnView MP for sorting the files. We could do about 1 year as a batch. Any more than that would crash the computer. There were times we couldn’t empty the desktop trash. It wouldn’t be uncommon to have 80,000 plus images in the trash for erasure. We’d have to reboot and delete trash.
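A scripted alternative to the desktop trash is possible. What follows is a sketch only, not the method we used. It assumes thumbnails always end in -WIDTHxHEIGHT before the extension. The function names and the example path are hypothetical. Run it with the default dry run first and work on a copy, because the delete is permanent.

```python
import os
import re

# Assumption: thumbnails end in -WIDTHxHEIGHT before the extension, 2 to 4 digits each.
THUMBNAIL = re.compile(r"-\d{2,4}x\d{2,4}\.[A-Za-z0-9]+$")


def find_thumbnails(folder):
    """Yield paths of files under `folder` whose names look like thumbnails."""
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if THUMBNAIL.search(name):
                yield os.path.join(dirpath, name)


def strip_thumbnails(folder, dry_run=True):
    """Count thumbnails under `folder`; delete them only when dry_run is False."""
    count = 0
    for path in find_thumbnails(folder):
        if not dry_run:
            os.remove(path)  # permanent: bypasses the desktop trash
        count += 1
    return count
```

For example, `strip_thumbnails("uploads/2021")` would report how many files match in one year folder without touching them. Processing one year or one month folder per call keeps the batches small, in the same spirit as the chunking described above.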

After removing the thumbnails, there are around 120,000 images instead of 1 million.

Now we want to resize any image wider than 1130px. That is our largest column width on the website pages.
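The resize rule itself is simple arithmetic. This sketch shows how the target dimensions could be worked out while keeping the aspect ratio. The function name is ours, and the actual resizing was done with a desktop image program, not this code.

```python
MAX_WIDTH = 1130  # widest column on the site, from the text above


def target_size(width, height, max_width=MAX_WIDTH):
    """Return (width, height) scaled down to max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # never enlarge a smaller image
    # Integer arithmetic with rounding to the nearest pixel.
    new_height = (height * max_width + width // 2) // width
    return max_width, max(1, new_height)


print(target_size(2260, 1500))  # an oversized image gets scaled down
print(target_size(800, 600))    # a small image is left alone
```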

The original media library contained 874,643 items, totaling 106 gigabytes of files.

After stripping all thumbnails, the media library contained 119,692 items, totaling 70 gigabytes of files.

Convert PNG format to JPEG images.

Our next goal is converting all PNGs to JPEG image format. We explain why this is a necessity here:

REFERENCE: https://pagepipe.com/worst-image-speed-offense-png-24-bit-transparency-format/

The enormity of the media library affects processing time. Even offline, processing each folder of images (and there are 132 folders!) takes about 30 minutes. That is the worst case for the heavier folders. Attempting to process the entire library at once crashes the desktop computer. We used a Dell OptiPlex 7020 with a quad-core 64-bit CPU and 32 gigabytes of RAM. The clock speed is 3.9 gigahertz. Operating system: Linux Mint 20.1. Local storage: 2-terabyte solid-state drive. This isn’t a supercomputer. But it’s no flyweight either.

The fastest-processing folder was the oldest year. It took 6 minutes to resize.

For example: converting PNG to JPG took 19 seconds of processing and reduced an original 614MB PNG to 255MB.

Conversions for the entire media library:

Original: 105.4GB

Final: 23.2GB
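As a quick check on those two totals, the saving can be expressed as a percentage. This uses only the figures above. The helper name is ours.

```python
def percent_reduction(before_gb, after_gb):
    """Return how much smaller after_gb is than before_gb, as a percentage."""
    return (before_gb - after_gb) / before_gb * 100


# Totals quoted above for the converted and resized library.
conversion = percent_reduction(105.4, 23.2)
print(f"Converted library is about {conversion:.0f}% smaller")
```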

Uploading to the uploads folder via FTP takes about 1 second per image.

We estimate it took about 75 hours to complete the FTP transfer of 272,900 JPG images.
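That estimate follows directly from the per-image rate. Here is the arithmetic as a short sketch, assuming the rough 1 second per image holds for the whole transfer.

```python
SECONDS_PER_IMAGE = 1   # rough transfer rate observed above
IMAGE_COUNT = 272_900   # JPG images transferred, from the text above

total_seconds = IMAGE_COUNT * SECONDS_PER_IMAGE
hours = total_seconds / 3600
days = hours / 24

print(f"About {hours:.1f} hours, or {days:.1f} days, of continuous upload")
```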

Then we back up the media library without thumbnails.

You need to use Better Search and Replace plugin to change all .png extensions to .jpg images.

Using search-and-replace all .png extensions (image links in posts and pages) are renamed to .jpg. No PNGs are used anymore on this site. This renaming will introduce a few errors because some errant file names have the extension .png embedded inside the file name for a who-knows-why reason. But those are few.
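Those few errors come from replacing every “.png” wherever it appears. A stricter rule only touches “.png” when it ends a file name. This is a sketch of that idea in Python, not what the plugin does. The sample file name is hypothetical, and the list of characters allowed after the extension is our assumption about how links appear in post content.

```python
import re

# Match ".png" only when it ends a file name: followed by a quote, whitespace,
# a closing parenthesis, a query string, or the end of the text.
PNG_EXTENSION = re.compile(r"""\.png(?=["'\s)?]|$)""", re.IGNORECASE)


def png_links_to_jpg(html):
    """Rename trailing .png extensions to .jpg, leaving embedded ".png" alone."""
    return PNG_EXTENSION.sub(".jpg", html)


sample = '<img src="/wp-content/uploads/2021/11/logo.png.backup.png">'
print(png_links_to_jpg(sample))
```

In the sample, only the final extension changes. The “.png” embedded in the middle of the name is left as it was.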

But hallelujah, the media library is cleansed and purified.

The “uploads” (media library) are split into 79 bundles or packets for Updraft Plus backup. Each of those is about 400MB. That means the final library after cleaning will be about 32 gigabytes. To give you an idea of the size of the media library backup, most websites only have one or two upload bundles. This site is 80 times bigger.

The “upload for download” process takes about 9 hours of prep time. This is the time it takes to retrieve from remote storage. Slow. Not including actual restoring to the server.

We entered 3 thumbnail sizes: 150 square, 300 square, and 1360 square.

And then ran Force Regenerate Thumbnails plugin. It indicated needing to regenerate 1061 thumbnails. That’s the number of images in the media library from our December tests. So we need to reconnect the original images using Bulk Media Register plugin. It took twenty man-hours to re-link the entire media library to the server.

We can only reconnect one month at a time (about 1,000 images) because of server and plugin processing limitations. Boring! This takes about 10 to 15 minutes to reconnect one month’s images. No errors.

The processing will take around 20 to 25 work hours, guesstimating.

One chunk at a time, we’ll get there.

SERVER PROBLEMS: OVERRUNS

An inode is a Linux data structure used to keep information about the files, folders, emails, code, and everything else on your server. The number of inodes corresponds to the number of files and folders you have.

Our host, GreenGeeks, during our media library processing reported:

“Unfortunately, according to the Ecosite Premium hosting package, your inodes limit is 600,000. Your current inodes usage is 660,815.”

The server was shut down for 24 hours because of this resource overrun. After that, we could finish the last year of media library connections.
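You can estimate inode usage before the host does it for you. This is a rough sketch that counts one inode per file and per folder in a local copy of the site. Hosts may count differently (emails and hard links, for example), so treat the result as an approximation. The function name is ours.

```python
import os


def count_inodes(root):
    """Roughly estimate inode usage: one per file and one per directory under root."""
    total = 1  # the root directory itself
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


INODE_LIMIT = 600_000  # limit quoted by the host above
usage = 660_815        # usage quoted by the host above
print(f"Over the limit by {usage - INODE_LIMIT:,} inodes")
```

Each thumbnail is a file, so every extra thumbnail size multiplies the count against this limit.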

PROBLEM 2: Featured images didn’t migrate.

We can’t create new featured images for the entire media library database at once with free plugins. We tried a couple more. It swamps the WordPress database.

But this paid plugin costs 66 euros ($88.50 US):

https://www.quickfeaturedimages.com/

It can add featured images by category or other parameters (like author, date, etc). It means making custom featured images by grouping. It can also pull from the first image on the post.

That batch processes in chunks, thus freeing up the workload.

We’re running up a tab on this project. We’re into overtime and throwing money at solutions. That’s painful.

We’ve not successfully backed up the site after two attempts with Updraft Plus since uploading the reduced and resized media library.

When the images were “reattached” to the server using a plugin method, WordPress created more thumbnails. This was not expected. New behavior. We stripped these offline. Before we had 12 thumbnails. Now we have three. Better but not good enough apparently.

We’re pretty sure the UpdraftPlus backup plugin failure is due to the sheer enormity of the media library. Our goal is backing up before the thumbnails are created. WordPress doesn’t work like it used to. It forces the creation of thumbnails even if the setting dimensions are zero. It seems WordPress insists on having mobile images prepared in advance from originals.

The media library is smaller and cleaner now than it used to be – but still large. It was 110GB. We suspect it is now around 30 to 40GB. Still fat. We’ve worked with this size before with no luck backing up. Those clients had to move their sites to more expensive hosting – and just leave the media library in a tainted state.

There are two problems and potential solutions:

1. The original images are still very large and heavy. Individual images are around 500k. We don’t want to nuke those until after we’ve rebuilt and re-linked the featured images.

Then we can use Imsanity plugin (which still works with so many images) to nuke the large originals that aren’t used on pages and posts (unattached). That will take a long time (days) but is completely automated. This will reduce the size of the media library.

2. The featured images are built from sampling post content or using a fallback image. Then we cleaned the media library with Imsanity to free up the server space. This added days of processing time.

Splitting the batch of so many images is not ideal. But, unlike other plugins, Quick Featured Images Pro at least offers a solution.

It took 16 days to compress the original images library with ShortPixel.

It then took days to produce a backup and days to download it. The final download of the entire site using Updraft Plus with thumbnails excluded was 53 zip folders weighing 20.9 GIGABYTES. The original media library was 110 GIGABYTES alone. The original media library included a rogue file called gallery that had 2,136 items and weighed 424 megabytes. That unused junk was discarded.

The reattachment of featured images for 118 thousand posts was a daunting problem and task. Here’s what we discovered:

1. We tested all 15 free featured-image-replacement plugins in the WordPress directory. None would work. Those plugins sampled the entire database at once. The server would time out before completion from the overload. This is even after upgrading the staging area to VPS hosting for more resources.

2. To solve the timeout problem, we purchased a paid featured-images plugin. This allows us to find and reattach featured images to blog posts. The plugin allows setting a window date for sorting and rebuilding the database. We’ve found the plugin can only batch reattach 2 weeks of posts because of the intensity of the search calculations – and the size and number of images per post.

The plugin authors got involved in specifying the criteria to solve the problem.

The search criteria are set in 5 steps:

1. Set the selected image as a new featured image.

2. Set the first embedded image in the post as the featured image.

3. Take the first external post image, download it, and add it to the media library.

4. Consider only posts without any featured image: Posts with featured images are ignored.

5. Time Filter: Search by time specifications. This is a 6-step process of setting by year and month with starting day and ending day.

There is no way to save these settings. They must be reentered by hand after each filtering process of two-week data.

This is the only featured image plugin that can do sorting and reattachment work.

The time for filtering and processing 2 weeks of database and media library is 3 minutes to 5 minutes. At 5 minutes, the server times out. We then have to set a smaller sample time of 1 week – and reprocess. Trial and error.
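The trial and error above amounts to walking the calendar in two-week windows and halving any window that times out. Here is a sketch of that bookkeeping. The plugin settings still had to be entered by hand. The function names are ours, and this only generates the date ranges to type in.

```python
from datetime import date, timedelta


def date_windows(start, end, days=14):
    """Yield (first_day, last_day) windows covering start..end inclusive."""
    first = start
    while first <= end:
        last = min(first + timedelta(days=days - 1), end)
        yield first, last
        first = last + timedelta(days=1)


def split_window(first, last):
    """Split a window that timed out into two smaller windows."""
    if first >= last:
        raise ValueError("a one-day window cannot be split further")
    middle = first + timedelta(days=(last - first).days // 2)
    return (first, middle), (middle + timedelta(days=1), last)


for first, last in date_windows(date(2021, 1, 1), date(2021, 1, 31)):
    print(first, "to", last)
```

If the window starting January 1 times out, `split_window(date(2021, 1, 1), date(2021, 1, 14))` gives two one-week windows to retry.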

Because of the tediousness of this time-consuming process, we are only reattaching posts back to 2012. If no featured image is found, the header background is a default featured image with white-letter overprinting with the post title and publication information.

We have invested considerable time, money, and energy into solving the site problems. It’s been a technical challenge. The past problems were generated by poor web design practices over time by many people. It became worse for bloat over the past 5 years. That recent period is the worst for overloading with heavy oversized images. It got worse each successive year. Going forward, we will use the Imsanity plugin safeguards to prevent future site decay.

80 percent of the total media library count and weight was added in the last 5 years.

We figure it took about 1 week to 2 weeks to complete the backlog of featured images through 2012.

We successfully rebuilt the media library and reattached new featured images to posts within limitations. The limitation is if there are no images on a post, there is no featured image. In that case, we substitute a universal default featured image.

FINAL COMPARISON: BEFORE AND AFTER

Original library was 874,643 items, totaling 106.2GB

New size: 141,476 images in the media library after rebuilding featured images. About 20 to 30 gigabytes.

Removed unneeded plugins and optimized the database with a cleaner plugin.

Checking with WP File Manager Pro
There are now up to 7 thumbnails for each image in the media library.

After cleaning, we deleted Quick Featured Image Pro for speed. It’s not needed after maintenance.

Godspeed-

Steve Teare
performance engineer
December 2022

 
