Ideal Document Archive Resolution

My messy stack of papers.

I started scanning my physical documents at high resolution and found each page was coming out to about 290-330 KB. A 3-page document could cost me about 1 MB to store, which sounded higher than it needed to be. That adds up quickly when you have hundreds of documents to scan. Could I save some storage space by scanning at a lower resolution? Here are my current storage size estimates, based on an average of 310 KB per page:

| Pages | Storage Size |
|---|---|
| 1 | 310 KB |
| 10 | 3.1 MB |
| 100 | 31 MB |
| 1,000 | 310 MB |
| 10,000 | 3.1 GB |
| 100,000 | 31 GB |
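The estimates above are just the page count multiplied by an average page size. Here is a minimal sketch of that arithmetic, assuming the 310 KB per-page average and decimal units (1 MB = 1,000 KB). The names `AVG_PAGE_KB`, `format_size`, and `estimate_storage` are my own, for illustration only.

```python
AVG_PAGE_KB = 310  # assumed average size of one scanned page, in KB


def format_size(kb: float) -> str:
    """Format a size given in KB using decimal units (1 MB = 1,000 KB)."""
    units = ["KB", "MB", "GB", "TB"]
    size = float(kb)
    for unit in units:
        # Stop at the first unit where the number is below 1,000,
        # or at the largest unit we know about.
        if size < 1000 or unit == units[-1]:
            return f"{size:g} {unit}"
        size /= 1000


def estimate_storage(pages: int, kb_per_page: float = AVG_PAGE_KB) -> str:
    """Estimate the storage needed for a number of scanned pages."""
    return format_size(pages * kb_per_page)


if __name__ == "__main__":
    for pages in (1, 10, 100, 1_000, 10_000, 100_000):
        print(f"{pages:>7,} pages: {estimate_storage(pages)}")
```

Passing a different `kb_per_page` makes it easy to redo the estimate for any other average page size.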

The fine details of most documents I’m scanning aren’t that important. All that matters is that the text is legible. For example, I like to keep a history of my car maintenance, and most shops only provide a physical copy. A smog check receipt is important to save, but the quality only needs to be good enough to verify some key text like the certificate number, location, and date. It’s not a document I’m going to refer back to regularly, and I may never even need to look at it again, so it doesn’t need to have crisp text.

I decided to try out various resolutions for the smog check receipt, a standard letter-size document, to find the pixel density sweet spot: a density as low as possible but with text that’s still easily readable. I calculated the PPI (pixels per inch) for a standard letter-size page along its diagonal, where W and H are the width and height in pixels, with the formula:

$$PPI = {Pixels \over Inches} = {\sqrt{W^2 + H^2} \over \sqrt{8.5^2 + 11^2}}$$
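The formula translates directly into a few lines of code. This is a rough sketch, and I'm assuming that W and H are the pixel width and height of the page itself (for instance, after cropping the photo to the paper's edges) rather than the full camera frame. The function name `page_ppi` and the example dimensions are hypothetical.

```python
import math

LETTER_WIDTH_IN = 8.5    # US letter width in inches
LETTER_HEIGHT_IN = 11.0  # US letter height in inches


def page_ppi(width_px: float, height_px: float,
             width_in: float = LETTER_WIDTH_IN,
             height_in: float = LETTER_HEIGHT_IN) -> float:
    """Diagonal pixels divided by diagonal inches, as in the formula above."""
    diagonal_px = math.hypot(width_px, height_px)
    diagonal_in = math.hypot(width_in, height_in)
    return diagonal_px / diagonal_in


if __name__ == "__main__":
    # Hypothetical example: a letter page captured at 1700x2200 pixels.
    print(round(page_ppi(1700, 2200)))  # 200
```

Using the diagonal means the result does not depend on whether the page was photographed in portrait or landscape orientation.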

Here are the samples I took using my phone:

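| Sample | Camera Setting | PPI | Size (KB) | Size Reduction |
|---|---|---|---|---|
| 1 | 12M (4048×3036) | 327 | 286 | 0% |
| 2 | 7M (3200×2400) | 250 | 214 | 25% |
| 3 | 5M (2592×1944) | 202 | 156 | 45% |
| 4 | 3M (2048×1536) | 168 | 118 | 59% |
| 5 | 2M (1600×1200) | 129 | 79 | 72% |
| 6 | 1M (1440×1080) | 115 | 63 | 78% |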
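The Size Reduction column is each sample's file size compared against sample 1. A small sketch of that calculation, using the sizes from the table (the variable and function names are my own):

```python
# File size in KB for each sample, taken from the table above.
SAMPLE_SIZES_KB = {1: 286, 2: 214, 3: 156, 4: 118, 5: 79, 6: 63}


def size_reduction_percent(size_kb: float, baseline_kb: float) -> int:
    """Percentage saved relative to the baseline, rounded to a whole percent."""
    return round((1 - size_kb / baseline_kb) * 100)


if __name__ == "__main__":
    baseline = SAMPLE_SIZES_KB[1]  # sample 1 is the full-resolution reference
    for sample, size_kb in SAMPLE_SIZES_KB.items():
        saved = size_reduction_percent(size_kb, baseline)
        print(f"Sample {sample}: {size_kb} KB, {saved}% smaller")
```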

I’ve heard that the high quality PPI tipping point is 300, where increasing the density beyond that value doesn’t make a significant difference to the quality. Even the Library of Congress recommends 300 PPI for preserving digital resources. So I’m not surprised our first sample is sharp and very close to the original source.

Moving on to the next samples, 2 and 3 are both pretty easy to read. Sample 4 at 168 PPI is where it starts to become questionable, and samples 5 and 6 are too difficult to read. That makes sample 3 our winner! So about 200 PPI is as low as you can go and still retain sharp enough text. Of course, this could vary by document depending on personal preference, the size of the text, or other small details you are trying to preserve. Let’s revise our size estimate chart from the beginning using our new density:

| Pages | Storage Size | Reduced Storage Size |
|---|---|---|
| 1 | 310 KB | 170 KB |
| 10 | 3.1 MB | 1.7 MB |
| 100 | 31 MB | 17 MB |
| 1,000 | 310 MB | 170 MB |
| 10,000 | 3.1 GB | 1.7 GB |
| 100,000 | 31 GB | 17 GB |

I’m able to shave off about 45% of my initial estimated storage space while keeping the text legible. That will add up to significant savings as I accumulate more and more scanned documents over time.
