-- Binary Data Project

When Printing the Web first began in 1991, the content of the Web was primarily hypertext.1 It made sense, then, that Printing the Web focused its efforts on capturing textual information. As technology has improved and become more ubiquitous, bringing multimedia applications to the ordinary desktop, so too has the World Wide Web adapted to delivering these new formats. In addition to character-encoded text markup languages, there are now many binary-encoded formats seeing wide use on the Internet. Some of the more popular formats include the MPEG-1 Layer III audio encoding format, more commonly known as MP3, and the MPEG-1 video encoding format. With the infusion of these new formats into the architecture of the World Wide Web, we have had to adapt our current archiving model to include them.

Multimedia came, in force, to the Web in 1998 with the release of Justin Frankel's shareware media player application, Winamp (Frankel would also go on to release Gnutella, a peer-to-peer file sharing system, in 2000). With multimedia capabilities in place, the Web grew hungry for content. In the fall of 1999, Shawn Fanning's peer-to-peer multimedia file sharing program, Napster, fed that hunger. Having been active in the World Wide Web community for nearly eight years already, Printing the Web anticipated this explosion of new content. In early 1997 a pilot project called the Binary Data Exploration Project was initiated to explore the impact that multimedia would have on Printing the Web. The technical working group that formed the core of this initiative comprised a number of experts in the field of representational knowledge. Early efforts resulted in a knowledge extraction application, drawing upon experimental linguistic techniques then being researched at the Université de Savoie.2 Feeding a binary multimedia file into this program produced a mots du média, that is, a textual representation of audio and video that captured the essence of each file. While this proved to be an interesting endeavor, we ultimately felt that the final product filtered out too much of the important information.

After much deliberation and consultation with some of our colleagues who had previously addressed the problem of representing arbitrary data in a text format,3 it was decided that the most elegant solution was simply to print the BASE64 encoded binary data directly. This fit in nicely with the ongoing efforts of Printing the Web; the Binary Data Project (BDP) received its first set of Star NX-1000C printers in April of 1998, just in time for the influx of MPEG encoded files.
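The BASE64 step described above can be sketched in a few lines. This is only an illustration, not the BDP's actual software: the function names to_printable_lines and from_printable_lines are hypothetical, and the default width of 76 characters is borrowed from the MIME line-length limit rather than from anything known about the printers.

```python
import base64


def to_printable_lines(data: bytes, width: int = 76) -> list[str]:
    """Encode binary data as BASE64 and split it into fixed-width lines.

    The width of 76 is an assumption taken from the MIME line limit;
    a real printer setup might call for a different value.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return [encoded[i:i + width] for i in range(0, len(encoded), width)]


def from_printable_lines(lines: list[str]) -> bytes:
    """Recover the original bytes from the printed lines."""
    return base64.b64decode("".join(lines))


if __name__ == "__main__":
    sample = bytes(range(256))  # stand-in for a small multimedia file
    page = to_printable_lines(sample)
    for line in page:
        print(line)
    # The printed text carries the full content of the original file.
    assert from_printable_lines(page) == sample
```

Because BASE64 maps every three bytes to four printable characters, the printed form is about a third larger than the file itself, but nothing is filtered out.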

The introduction of peer-to-peer file sharing networks has had a very important influence on the BDP. It has challenged our idea of what constitutes the WWW. These ad-hoc networks share thousands and thousands of binary data files, sometimes only for a very short time. Therefore, it is important for us to be able to take an accurate snapshot of content at brief intervals and extract the meaningful data.

We have created a number of tools in order to help us do our job. The ShareAlike program that our development team designed in early 2000 is an intelligent agent that patrols these transient networks, looking for new information. How does it know when it finds a new binary data file? Each file that we print is assigned a checksum, created by a very sophisticated algorithm tailored specifically for our purposes. The ShareAlike agent can rapidly calculate this checksum on each file it touches, and compare this with a centralized database housing checksums of files our warehouse already contains. In this way, we are able to capture as much information as possible with as little overhead as possible.
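The checksum comparison described above might look roughly like the following sketch. The BDP's own checksum algorithm is not described here, so SHA-256 from the Python standard library is swapped in as an assumption, and the ChecksumIndex class is a hypothetical in-memory stand-in for the centralized database.

```python
import hashlib


class ChecksumIndex:
    """Hypothetical stand-in for the centralized checksum database."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    @staticmethod
    def checksum(data: bytes) -> str:
        # SHA-256 is an assumption; the actual algorithm is unspecified.
        return hashlib.sha256(data).hexdigest()

    def is_new(self, data: bytes) -> bool:
        """Return True and record the file if it has not been seen before."""
        digest = self.checksum(data)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


if __name__ == "__main__":
    index = ChecksumIndex()
    files = [b"first song", b"second song", b"first song"]
    for data in files:
        status = "print" if index.is_new(data) else "skip"
        print(status, ChecksumIndex.checksum(data)[:12])
```

Only the short digest needs to be sent to the database for comparison, which is what keeps the overhead low: a file's full contents are fetched and printed only when its checksum is not already on record.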

While Printing the Web believes that all information should be free, we recognize that many countries have very strict laws regarding copyright. Inasmuch as is possible, the BDP endeavors to respect the rights of artists to profit from their labor. Our policies and practices are continually examined and evaluated by a team of legal experts. We feel confident that the conversion of binary data to the print medium will not unduly interfere with the artist's control over his or her intellectual property. Such practices can only encourage the generation of artistic and intellectual capital.

1. See for an example WWW Project Page from 1992
2. Département de la Connaissance à l'Université de Savoie
3. Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies," Internet Engineering Task Force, 1992. Article on-line. Available from