# Ideal Document Archive Resolution

I started scanning physical documents at high resolution and found that each page came out to about 290-330 KB. A 3-page document could cost me close to 1 MB to store, which sounded higher than it needed to be. That adds up quickly when you have hundreds of documents to scan. Could I save some storage space by scanning at a lower resolution? Here are my current storage size estimates, assuming an average of 310 KB per page:

| Pages | Storage Size |
| --- | --- |
| 1 | 310 KB |
| 10 | 3.1 MB |
| 100 | 31 MB |
| 1,000 | 310 MB |
| 10,000 | 3.1 GB |
| 100,000 | 31 GB |

The fine details of most documents I’m scanning aren’t that important; all that matters is that the text is legible. For example, I like to keep a history of my car maintenance, and most shops only provide a physical copy. A smog check receipt is important to save, but the quality only needs to be good enough to verify some key text like the certificate number, location, and date. It’s not a document I’m going to refer back to regularly, and I may never need to look at it again, so it doesn’t need to have crisp text.

I decided to try out various resolutions for the smog check receipt, a standard letter-size document, to find the pixel density sweet spot: a density as low as possible with text that’s still easily readable. I calculated the PPI (pixels per inch) for a standard letter-size page by dividing the image diagonal in pixels by the page diagonal in inches, where W and H are the image width and height in pixels:

$$PPI = {Pixels \over Inches} = {\sqrt{W^2 + H^2} \over \sqrt{8.5^2 + 11^2}}$$
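As a rough illustration, the formula can be written as a small Python function. The function name and the 2550×3300 example dimensions are hypothetical, chosen only because they work out to a round number. The calculation assumes the page fills the whole image and has the same proportions as the image.

```python
import math

# Letter-size paper dimensions in inches, as used in the formula above.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0


def diagonal_ppi(width_px: int, height_px: int) -> float:
    """Pixels per inch, measured along the diagonal of a letter-size page.

    Assumes the page fills the entire image (no borders, no cropping).
    """
    diagonal_px = math.hypot(width_px, height_px)
    diagonal_in = math.hypot(PAGE_WIDTH_IN, PAGE_HEIGHT_IN)
    return diagonal_px / diagonal_in


# Hypothetical example: a 2550x3300 pixel image of a letter-size page
# works out to roughly 300 PPI.
example_ppi = diagonal_ppi(2550, 3300)
print(f"{example_ppi:.0f} PPI")
```

Because the diagonal is symmetric in width and height, it doesn’t matter whether the image is stored in portrait or landscape orientation.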

Here are the samples I took using my phone:

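| Sample | Camera Setting | PPI | Size (KB) | Size Reduction |
| --- | --- | --- | --- | --- |
| 1 | 12M (4048×3036) | 327 | 286 | 0% |
| 2 | 7M (3200×2400) | 250 | 214 | 25% |
| 3 | 5M (2592×1944) | 202 | 156 | 45% |
| 4 | 3M (2048×1536) | 168 | 118 | 59% |
| 5 | 2M (1600×1200) | 129 | 79 | 72% |
| 6 | 1M (1440×1080) | 115 | 63 | 78% |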
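The size reduction column is each sample’s file size compared against sample 1. A minimal sketch of that calculation, using the file sizes from the samples above (the variable and function names are my own):

```python
# File sizes in KB for samples 1 through 6, taken from the samples above.
sizes_kb = {1: 286, 2: 214, 3: 156, 4: 118, 5: 79, 6: 63}


def size_reduction_percent(baseline_kb: float, size_kb: float) -> int:
    """Percentage saved relative to the baseline, rounded to a whole percent."""
    return round((1 - size_kb / baseline_kb) * 100)


baseline = sizes_kb[1]
reductions = {
    sample: size_reduction_percent(baseline, kb)
    for sample, kb in sizes_kb.items()
}

for sample, percent in reductions.items():
    print(f"Sample {sample}: {percent}% smaller than sample 1")
```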

I’ve heard that the high quality PPI tipping point is 300, where increasing the density beyond that value doesn’t make a significant difference to the quality. Even the Library of Congress recommends 300 PPI for preserving digital resources. So I’m not surprised our first sample is sharp and very close to the original source.

Moving on to the next samples, 2 and 3 are both pretty easy to read. Sample 4 at 168 PPI is where it starts to become questionable, and samples 5 and 6 are too difficult to read. That makes our winner sample 3! So 200 PPI is about as low as you can go and still retain sharp enough text. Of course, this could vary by document depending on personal preferences, the size of the text, or other small details you are trying to preserve. Let’s revise our size estimate chart from the beginning using our new density:

| Pages | Storage Size | Reduced Storage Size |
| --- | --- | --- |
| 1 | 310 KB | 170 KB |
| 10 | 3.1 MB | 1.7 MB |
| 100 | 31 MB | 17 MB |
| 1,000 | 310 MB | 170 MB |
| 10,000 | 3.1 GB | 1.7 GB |
| 100,000 | 31 GB | 17 GB |
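The projection itself is simple arithmetic: a 45% reduction means each page keeps 55% of its original size. Here is a small sketch, assuming the 310 KB per page average from the first estimate and decimal units (1 MB = 1,000 KB). The function name and defaults are illustrative only.

```python
def projected_storage_kb(
    pages: int,
    kb_per_page: float = 310.0,  # assumed average page size at full resolution
    reduction: float = 0.45,     # fraction saved by scanning at about 200 PPI
) -> float:
    """Estimated total storage in KB after applying the size reduction."""
    return pages * kb_per_page * (1 - reduction)


for pages in (1, 10, 100, 1_000, 10_000, 100_000):
    original_mb = pages * 310.0 / 1000
    reduced_mb = projected_storage_kb(pages) / 1000
    print(f"{pages:>7,} pages: {original_mb:,.2f} MB -> {reduced_mb:,.2f} MB")
```

Passing a different `reduction` value makes it easy to see what one of the other samples would cost instead.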

I’m able to shave off about 45% of my initial estimated storage space without losing any of the text I actually need to read. That will add up to significant savings as I accumulate more and more scanned documents over time.
