PostScript and PDF: The Printer Language That Became The Page

How PostScript revolutionized desktop publishing and birthed the PDF ecosystem.

Mar 03, 2026

Every PDF is the frozen output of a programming language called PostScript. PDF’s structure is PostScript’s structure with the execution layer removed. Every design decision that looks arbitrary in the spec traces back to a specific choice made in 1984.

This post is a technical dive. PDF is a deep rabbit hole and there will be several follow-up posts about different aspects like sub-formats, fonts, and encryption.

Adobe Was Founded on PostScript

John Warnock was working at Xerox PARC in the late 1970s, building Interpress, a page description language for laser printers. Xerox declined to commercialize it. Warnock and his manager Charles Geschke left, founded Adobe in December 1982, and built PostScript as a cleaner, licensable version of the same idea. The decision to publish the spec and license the language openly is what produced the ecosystem that eventually became PDF.

John Warnock. Credit: Marvalous - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17821121

The core problem Interpress and PostScript were both solving was that printers at the time were raster devices. The host computer did all the rendering work and sent the printer a finished bitmap. The output was locked to the resolution at which it was rendered, and every printer model needed its own driver. PostScript proposed that we send the printer a program that describes the page and let the printer’s own processor render it at native resolution, solving both the resolution lock and driver proliferation in one stroke.

PostScript Level 1 shipped in 1984. The Apple LaserWriter launched January 23, 1985 at $6,995, with a Motorola 68000 processor, AppleTalk networking for up to 16 Macs, and PostScript built in. Aldus PageMaker and Steve Jobs’s $2.5 million investment in Adobe helped it spread, and within three years 23 manufacturers had licensed PostScript.

PostScript succeeded with openness where Interpress didn’t. The Blue Book (tutorial and cookbook, 1985), the Green Book (program design, 1988), and the PostScript Language Reference Manual (the Red Book) were all publicly available. Any manufacturer could build a compatible implementation.

The key innovation was that any PostScript-compatible printer could generate page content dynamically at render time, scaling fonts and geometry cleanly to any size. We could print a proof on a lower-resolution printer and be confident that the file would print faithfully on a high-resolution commercial printer.

The PostScript Language

PostScript is a Turing-complete, dynamically-typed, stack-based programming language. That it is a real programming language and not just a drawing format is what shapes every architectural decision that follows.

Turing-complete - capable of computing anything a general-purpose computer can compute, given enough time and memory. PostScript can implement any algorithm, not just draw shapes.
Stack-based - the primary data structure is a stack: a last-in, first-out pile. Operations take inputs from the top of the stack and push results back.
Reverse Polish notation (RPN) - a way of writing expressions where the operator comes after the operands. Instead of 3 + 4, we write 3 4 add. The operands go onto the stack first and the operator pops them and pushes the result.
First-class objects - values that can be stored, passed around, and used like any other data. In PostScript, a block of code in curly braces is a value we can push onto the stack, store in a variable, or pass to an operator, the same way we would a number or a string.

In 1989, Don Hopkins at the University of Maryland built ps.ps, a PostScript interpreter written in PostScript, as part of the PSIBER (PostScript Interactive Bug Eradication Routines) project. The interpreter maintains its own meta-execution stack as a PostScript array, storing continuations as dictionaries, so its state does not interfere with the program being interpreted. Before each instruction, it calls a user-redefinable trace function, which is how it becomes a debugger. A related project, Glenn Reid’s Distillery, was a partial evaluator written in PostScript that optimized drawing programs by unrolling loops and pre-computing calculations at write time rather than render time. Both exist because PostScript is a complete programming language that can inspect and transform itself.

PostScript uses reverse Polish notation. Operands go on the stack before the operator that consumes them.

3 4 add - pushes 3, pushes 4, pops both, pushes 7
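
To make the push-and-pop sequence concrete, here is a minimal Python sketch of RPN evaluation on a single operand stack. It is a model of the idea, not a PostScript interpreter: the function name eval_rpn and the three operators it supports are assumptions made for illustration.

```python
# A sketch of RPN evaluation on one operand stack. The function name and the
# three supported operators are illustrative, not part of any PostScript API.
def eval_rpn(source):
    binary_ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    stack = []
    for token in source.split():
        if token in binary_ops:
            b = stack.pop()  # the top of the stack is the second operand
            a = stack.pop()
            stack.append(binary_ops[token](a, b))
        else:
            stack.append(int(token))  # this sketch only handles integers
    return stack


print(eval_rpn("3 4 add"))         # [7]
print(eval_rpn("10 3 4 add mul"))  # [70]
```

Note that the evaluator never looks ahead: each token is either pushed or executed the moment it is read.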

The interpreter maintains five distinct stacks:

Operand stack - holds data and intermediate results. Adobe’s reference manual lists 500 elements as a typical implementation limit.
Dictionary stack - variable bindings, scoped. When the interpreter looks up a name, it searches the dictionary stack top-to-bottom. begin and end push and pop dictionary frames.
Execution stack - suspended procedure contexts, analogous to a call stack. The typical limit is 250 levels.
Graphics state stack - current rendering parameters: color, line width, font, transformation matrix, and clipping path. gsave pushes, grestore pops.
Clipping path stack - saved clipping paths, added in LanguageLevel 3. clipsave pushes, cliprestore pops.
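
The top-to-bottom name lookup on the dictionary stack can be sketched in a few lines of Python. The class and method names here (DictStack, define, lookup) are hypothetical, and a single bottom frame stands in for the permanent dictionaries a real interpreter keeps at the base of the stack.

```python
# A sketch of scoped name lookup. DictStack and its methods are hypothetical
# names; a real interpreter keeps several permanent dictionaries at the bottom.
class DictStack:
    def __init__(self):
        self.frames = [{}]  # bottom frame, never popped in this sketch

    def begin(self, frame):
        """Push a dictionary; it becomes the first place names are looked up."""
        self.frames.append(frame)

    def end(self):
        """Pop the topmost dictionary, restoring the previous scope."""
        if len(self.frames) == 1:
            raise IndexError("cannot pop the bottom dictionary")
        self.frames.pop()

    def define(self, name, value):
        """Bind a name in the topmost dictionary, as def does."""
        self.frames[-1][name] = value

    def lookup(self, name):
        """Search from the top of the stack down to the bottom."""
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise KeyError(name)


scopes = DictStack()
scopes.define("size", 12)
scopes.begin({})
scopes.define("size", 24)     # shadows the outer binding
print(scopes.lookup("size"))  # 24
scopes.end()
print(scopes.lookup("size"))  # 12
```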

Procedures are first-class objects. Code in curly braces is pushed onto the operand stack as data and only executes when explicitly invoked:

/double { 2 mul } def
5 double   % result: 10

This is how PostScript implements control flow. if, for, loop, repeat are all operators that take procedure objects from the operand stack and execute them conditionally or iteratively. A PostScript file is a program.
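
A rough Python model shows how little machinery this needs. In the sketch below, a program is a pre-tokenized Python list, a nested list stands in for a procedure in curly braces, and a string starting with a slash is a literal name. The run function and this representation are assumptions for illustration; only def, mul, if, and repeat are modeled.

```python
# A sketch of procedures as data. Nested lists stand in for { ... } bodies;
# the run function and the token representation are illustrative assumptions.
def run(program, stack=None, names=None):
    stack = [] if stack is None else stack
    names = {} if names is None else names
    for obj in program:
        if isinstance(obj, list):
            stack.append(obj)              # a procedure is pushed, not run
        elif isinstance(obj, str) and obj.startswith("/"):
            stack.append(obj)              # literal name, pushed as data
        elif obj == "def":
            value = stack.pop()
            key = stack.pop()
            names[key[1:]] = value         # bind the name without its slash
        elif obj == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif obj == "if":
            proc = stack.pop()
            condition = stack.pop()
            if condition:
                run(proc, stack, names)
        elif obj == "repeat":
            proc = stack.pop()
            count = stack.pop()
            for _ in range(count):
                run(proc, stack, names)
        elif isinstance(obj, str):
            value = names[obj]             # executable name: look it up
            if isinstance(value, list):
                run(value, stack, names)   # a bound procedure executes
            else:
                stack.append(value)
        else:
            stack.append(obj)              # numbers and booleans
    return stack


print(run(["/double", [2, "mul"], "def", 5, "double"]))  # [10]
print(run([1, 3, [2, "mul"], "repeat"]))                 # [8]
print(run([5, False, [2, "mul"], "if"]))                 # [5]
```

The control operators contain no special syntax handling: they pop a procedure object and decide whether, or how many times, to run it.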

Standard infix notation requires lookahead: a parser reading 3 + needs to see what follows before it knows the expression is complete. With RPN, the operator can execute on arrival. The complete execution state lives in the stacks at every moment. This means the interpreter can pause between any two tokens, save the stacks, and resume when more input arrives. This is a design choice shaped by the constraints of an era when memory was scarce and networks were slow.

The PostScript scanner and interpreter are cleanly separated. The scanner tokenizes the byte stream into typed objects (numbers, names, strings, arrays, procedures). The interpreter dispatches on type: numbers push to the operand stack, names trigger dictionary lookup, and operators execute immediately. This separation means the interpreter doesn’t need the whole file before it starts. When it runs out of input mid-job, it signals “give me more” and hands back control instead of sitting idle, which is why PostScript worked over slow AppleTalk connections in 1985.
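
The pause-and-resume behavior can be sketched in Python as an interpreter that is fed input in arbitrary chunks. The class name StreamingInterpreter and its feed method are hypothetical; the sketch understands only integers and add, and treats whitespace as the sole token delimiter.

```python
# A sketch of incremental scanning. StreamingInterpreter and feed are
# hypothetical names; only integers and add are understood here.
class StreamingInterpreter:
    def __init__(self):
        self.pending = ""  # a partial token waiting for more input
        self.stack = []

    def feed(self, chunk):
        """Execute every token completed by this chunk; keep the remainder."""
        data = self.pending + chunk
        tokens = data.split()
        if tokens and not data[-1].isspace():
            self.pending = tokens.pop()  # last token may be cut off mid-word
        else:
            self.pending = ""
        for token in tokens:
            self._execute(token)

    def _execute(self, token):
        if token == "add":
            b = self.stack.pop()
            a = self.stack.pop()
            self.stack.append(a + b)
        else:
            self.stack.append(int(token))


interp = StreamingInterpreter()
interp.feed("3 4 a")   # the chunk ends in the middle of the operator name
print(interp.stack)    # [3, 4]
interp.feed("dd\n")    # the rest of the token arrives later
print(interp.stack)    # [7]
```

Between the two calls, the whole execution state is the stack plus one partial token, which is what makes suspending between chunks cheap.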

The scanning and dispatch model survives in PDF content streams. Content streams use the same tokenizer structure with typed objects, RPN operand-before-operator order, and dispatch on type. When a PDF renderer processes a content stream, it is doing what a PostScript interpreter does: scanning tokens, pushing operands, executing operators. The lineage is structural, not cosmetic.

Standard PDF uses a cross-reference table that records the byte offset of each object, so a reader can jump directly to any object without reading the entire file sequentially. This assumes we have the whole file, the reverse of PostScript’s stream design. However, large PDFs on the web would be painfully slow if we needed to download the whole file before rendering, so the linearized PDF sub-format reorganizes the file so that page 1’s objects come first, letting a browser render the first page while the rest downloads. The web is the reason for this bolt-on recovery of PostScript’s streaming property.
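
Random access by byte offset is easy to model. The Python sketch below builds a toy file body and one cross-reference entry per object, then fetches an object by seeking straight to its offset. The helper names build_file and read_object are assumptions, and the result is not a valid PDF: there is no trailer, no free-list entry for object 0, and the end of each object is found with a naive search.

```python
# A toy model of xref-based random access. build_file and read_object are
# hypothetical helpers; the output is deliberately not a complete PDF.
def build_file(objects):
    """Concatenate numbered objects and record each one's byte offset."""
    body = b"%PDF-1.4\n"
    xref = []
    for number, content in enumerate(objects, start=1):
        # Real xref entries are 20 bytes: a 10-digit offset, a 5-digit
        # generation number, an in-use flag, and a two-byte line ending.
        xref.append(b"%010d %05d n \n" % (len(body), 0))
        body += b"%d 0 obj\n%s\nendobj\n" % (number, content)
    return body, xref


def read_object(body, xref, number):
    """Jump to an object via its recorded offset, skipping everything else."""
    offset = int(xref[number - 1][:10])
    end = body.index(b"endobj", offset) + len(b"endobj")
    return body[offset:end]


body, xref = build_file([b"<< /Type /Catalog >>", b"(hello)"])
print(read_object(body, xref, 2))  # b'2 0 obj\n(hello)\nendobj'
```

The lookup only works once the table is available, which is why a reader needs access to the whole file unless the file has been linearized.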

Linearized PDF is one of many sub-formats we’ll cover in another post.

The Graphics Model

PDF inherited PostScript’s graphics model almost verbatim.

Everything is a path. We construct a path with moveto, lineto, curveto, and arc, then render it with stroke or fill. The path construction and rendering are separate steps: build the path, then decide what to do with it.

newpath
100 100 moveto
200 100 lineto
200 200 lineto
closepath
fill

Curves are cubic Béziers. The curveto operator takes three points, two control points and an endpoint, and produces a smooth curve that leaves the current point in the direction of the first control point and arrives at the endpoint from the direction of the second.
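
For readers who want the arithmetic, a cubic Bézier can be evaluated directly from its four points. This Python sketch uses the standard Bernstein form; the function name cubic_bezier is an assumption, p0 plays the role of the current point, and p1, p2, p3 are the three points that curveto takes.

```python
# A sketch of cubic Bezier evaluation in Bernstein form. The function name is
# illustrative; p0 is the current point and p1, p2, p3 are curveto's operands.
def cubic_bezier(p0, p1, p2, p3, t):
    """Return the point on the curve at parameter t, where 0 <= t <= 1."""
    u = 1 - t
    weights = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    points = (p0, p1, p2, p3)
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return (x, y)


start, control1, control2, end = (0, 0), (0, 1), (1, 1), (1, 0)
print(cubic_bezier(start, control1, control2, end, 0.0))  # (0.0, 0.0)
print(cubic_bezier(start, control1, control2, end, 0.5))  # (0.5, 0.75)
print(cubic_bezier(start, control1, control2, end, 1.0))  # (1.0, 0.0)
```

The curve passes through the first and last points only; the two control points pull it toward them without lying on it.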

Fill rules decide which regions inside a complex path to paint and which to leave empty. They matter when paths cross themselves or contain shapes inside shapes.

Winding number - for any point, count how many times the path winds around it, tracking direction. Counterclockwise adds 1, clockwise subtracts 1. A simple closed shape drawn counterclockwise has winding number 1 everywhere inside it; drawn clockwise, it has −1.
Non-zero winding rule - paint a region if its winding number is anything other than zero.
Even-odd rule - paint a region based purely on how many times the path boundary crosses a line drawn outward from that point. Odd crossings means paint; even crossings means don’t. Direction doesn’t matter, only count.

fill uses the non-zero winding rule: scan the path crossings and track direction; paint where the sum is non-zero. eofill uses even-odd: each crossing toggles on or off, regardless of direction. A ring shape, an outer circle drawn counterclockwise and an inner circle drawn clockwise, renders correctly with the winding rule because the inner circle’s winding contribution is −1, canceling the outer circle’s +1, giving 0 inside the hole.
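
The two rules can be compared with a short Python sketch that casts a ray toward +x from a test point and inspects each edge it crosses. To keep the geometry simple, squares stand in for the circles of the ring example, and the function names are assumptions. Points exactly on an edge are not handled carefully.

```python
# A sketch of both fill rules using ray casting. Squares replace the circles
# of the ring example; all function names here are illustrative.
def ray_crossings(point, polygons):
    """Return (winding_number, crossing_count) for a ray cast toward +x."""
    px, py = point
    winding = 0
    count = 0
    for polygon in polygons:
        for i in range(len(polygon)):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % len(polygon)]
            if (y1 <= py) == (y2 <= py):
                continue  # the edge does not straddle the ray's height
            x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_at > px:
                count += 1
                winding += 1 if y2 > y1 else -1  # upward edge: +1
    return winding, count


def nonzero_fill(point, polygons):
    return ray_crossings(point, polygons)[0] != 0


def evenodd_fill(point, polygons):
    return ray_crossings(point, polygons)[1] % 2 == 1


outer_ccw = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner_cw = [(3, 3), (3, 7), (7, 7), (7, 3)]
inner_ccw = [(3, 3), (7, 3), (7, 7), (3, 7)]
hole = (5, 5)

print(nonzero_fill(hole, [outer_ccw, inner_cw]))   # False
print(evenodd_fill(hole, [outer_ccw, inner_cw]))   # False
print(nonzero_fill(hole, [outer_ccw, inner_ccw]))  # True
print(evenodd_fill(hole, [outer_ccw, inner_ccw]))  # False
```

When the inner square runs in the same direction as the outer one, the winding number in the hole is 2, so the non-zero rule paints it while even-odd still leaves it empty. That is the case where the choice between fill and eofill changes the output.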

Clipping paths can only shrink. The clip operator intersects the current clipping path with the current path. There is no expand. The one-way behavior was intentional - a clipping path that can only shrink requires no undo mechanism. To recover a wider clip, save the graphics state before clipping with gsave and restore it with grestore. PDF carries this constraint forward unchanged.
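
The shrink-only behavior is easiest to see with rectangles. In this Python sketch the clipping region is an axis-aligned rectangle (x0, y0, x1, y1) rather than an arbitrary path, and the class and method names are hypothetical. Clipping intersects; only a saved state can bring a wider clip back.

```python
# A sketch of shrink-only clipping with rectangles standing in for paths.
# GraphicsState and clip_to are hypothetical names for illustration.
class GraphicsState:
    def __init__(self, page):
        self.clip = page  # (x0, y0, x1, y1), starts as the whole page
        self.saved = []

    def gsave(self):
        self.saved.append(self.clip)

    def grestore(self):
        self.clip = self.saved.pop()

    def clip_to(self, rect):
        """Intersect the current clip with rect; the result never grows."""
        x0, y0, x1, y1 = self.clip
        a0, b0, a1, b1 = rect
        self.clip = (max(x0, a0), max(y0, b0), min(x1, a1), min(y1, b1))


page = (0, 0, 612, 792)
gs = GraphicsState(page)
gs.gsave()
gs.clip_to((100, 100, 200, 200))
gs.clip_to(page)  # asking for the full page does not widen the clip
print(gs.clip)    # (100, 100, 200, 200)
gs.grestore()
print(gs.clip)    # (0, 0, 612, 792)
```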

User space and device space are two separate coordinate systems that the current transformation matrix (CTM) bridges. User space is the coordinate system we write in. The default unit is one typographic point, 1/72 of an inch, with the origin at the bottom-left of the page. When we write 100 100 moveto, those coordinates are in user space. Device space is the coordinate system of the output device - printer pixels or screen pixels, at whatever physical resolution that device runs. The same moveto call produces physically identical output on a 300 dpi LaserWriter and a 2,540 dpi imagesetter because the CTM maps user space to device space independently for each device.

The CTM is a 3×3 matrix (stored as six numbers [a b c d e f]) that lives in the graphics state. translate, rotate, and scale modify the CTM. Because the CTM is part of the graphics state, gsave and grestore save and restore transformations along with everything else. We are never moving pixels; we are changing how user space coordinates project onto device space.
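
The six-number form is enough to do all of the arithmetic. This Python sketch maps a user space point to device space as x' = a·x + c·y + e and y' = b·x + d·y + f, and concatenates transformations so that the newest one applies first. The helper names are assumptions, and the default matrix is simplified to a pure scale of dpi/72; real devices may also flip or offset the axes.

```python
import math

# A sketch of CTM arithmetic on (a, b, c, d, e, f) tuples. Helper names are
# illustrative, and default_ctm ignores axis flips that real devices may use.
def concat(m, ctm):
    """Return m x ctm: the new transformation m applies before the old CTM."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def translate(tx, ty):
    return (1, 0, 0, 1, tx, ty)


def scale(sx, sy):
    return (sx, 0, 0, sy, 0, 0)


def rotate(degrees):
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0, 0)


def to_device(ctm, x, y):
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


def default_ctm(dpi):
    return scale(dpi / 72, dpi / 72)  # 72 user space units per inch


for dpi in (300, 2540):
    ctm = concat(translate(72, 72), default_ctm(dpi))  # 72 72 translate
    x, y = to_device(ctm, 0, 0)
    print(dpi, round(x / dpi, 6), round(y / dpi, 6))   # 1.0 inch on both
```

The device coordinates differ by a factor of more than eight between the two printers, but dividing by the resolution gives the same physical position: one inch from the origin in each direction.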

Let It RIP

A PostScript printer contains a Raster Image Processor. The RIP does two things: interprets the PostScript program, then renders the resulting graphics model to a raster bitmap at the printer’s native resolution. A 300 dpi LaserWriter and a 2,540 dpi Linotronic imagesetter both accept the same PostScript file and produce output at their respective resolutions, because the file describes the page geometrically rather than as a fixed bitmap.

The RIP has its own processor. In 1985, the LaserWriter’s Motorola 68000 was more powerful than the Mac it was attached to. The computation happened in the printer. As desktop CPUs improved through the 1990s, this stopped being economically rational, and the case for onboard RIPs in consumer printers collapsed. Once the host computer was doing the rendering anyway, sending the printer a program to execute rather than a pre-rendered document stopped making sense. That shift is part of what made PDF viable.

PDF as PostScript’s Compiled Form

Adobe introduced PDF in 1993 and the relationship is direct. PDF takes PostScript’s graphics model and removes the programming language. A PDF file describes pages that have already been laid out, with embedded fonts, compressed image streams, and precise object offsets. The document is ready to rasterize: the layout a PostScript program would have computed inside the RIP is already fixed, and PDF carries the result as a finished document.

PDF kept the path model (moveto, lineto, curveto), the graphics state (CTM, gsave/grestore, fill rules, clipping), the font model (Type 1, CID-keyed fonts), and the imaging model (painting, color spaces) from PostScript. The PDF specification’s graphics section is PostScript’s graphics section with the execution layer stripped out.
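
The inheritance is visible at the operator level. PDF content streams keep the operand-first order but abbreviate the names: moveto becomes m, lineto becomes l, curveto becomes c, closepath becomes h, fill becomes f, and gsave/grestore become q/Q. The Python sketch below rewrites a straight-line PostScript path into that form. The function name to_content_stream is an assumption, and the translator is deliberately naive: it handles only these operators with numeric operands, and it drops newpath because a PDF path simply begins at the next m.

```python
# A naive sketch that renames PostScript path operators to their PDF content
# stream equivalents. to_content_stream is a hypothetical helper.
PS_TO_PDF = {
    "moveto": "m",
    "lineto": "l",
    "curveto": "c",
    "closepath": "h",
    "fill": "f",
    "eofill": "f*",
    "stroke": "S",
    "gsave": "q",
    "grestore": "Q",
}


def to_content_stream(postscript):
    lines = []
    operands = []
    for token in postscript.split():
        if token == "newpath":
            continue  # simplification: the next m starts a fresh path
        if token in PS_TO_PDF:
            lines.append(" ".join(operands + [PS_TO_PDF[token]]))
            operands = []
        else:
            float(token)  # raises ValueError on anything but a number
            operands.append(token)
    return "\n".join(lines)


triangle = """
newpath
100 100 moveto
200 100 lineto
200 200 lineto
closepath
fill
"""
print(to_content_stream(triangle))
# 100 100 m
# 200 100 l
# 200 200 l
# h
# f
```

Anything that needs the language itself, such as a loop or a procedure call, has no counterpart in the table, which is the sense in which PDF drops the execution layer.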

PDF dropped Turing completeness, dynamic execution, and the ability to generate page content programmatically at render time. A PDF viewer does not need a full PostScript interpreter. It reads a static object graph and renders it.

PostScript continued as the language that professional RIPs in printing presses speak internally. PostScript files themselves largely disappeared from workflows by the early 2000s, after Adobe shifted new imaging innovation into PDF starting with version 1.4. Today, Adobe’s PDF Print Engine powers over 200,000 commercial presses and proofers globally. PostScript still runs as invisible infrastructure inside commercial presses that render PDF.

What is PostScript May Never Die

PostScript didn’t disappear when PDF won; it receded into infrastructure. The Library of Congress maintains roughly 20,000 PostScript files and 350,000+ EPS files in its digital preservation collection. ArXiv still offers PostScript downloads for physics preprints. macOS Preview rendered PostScript files until Apple removed that support in macOS Ventura.

What looks arbitrary in the PDF spec is usually a decision made in 1984. In later posts, we’ll dig into all the pain this legacy has wrought on this ubiquitous format.
