Digital Library Projects

A summary listing

This page gives information on tools, formats, and protocols that we support, or are considering supporting, in Penn's Digital Library.

Use this page to find out more about these technologies, how to use them, and where you can get help using them.

List of Tools, Formats, and Protocols

Details

Cataloging and Metadata

Voyager

  • What it is: Software managing our library acquisitions, cataloging, and circulation.
  • Software provided by: Endeavor Information Systems (see their Voyager Product Information page). Endeavor is a wholly owned subsidiary of Elsevier.
  • How we acquire it: We buy the software and support from Endeavor. We cannot legally redistribute the software, and cannot practically modify it, except through the customization options the software provides.
  • Local use and support: Used by many librarians, especially those in cataloging, collection development, and circulation.
  • Local documentation: See our local Voyager documentation.
  • Related technologies: Voyager maintains its data in an Oracle database. Cataloging information is in the MARC format. WebVoyage provides a Web-based front end to Voyager's cataloging search capabilities.
  • Local contact: Sandra Kerbel, Director, Public Services (skerbel@pobox.upenn.edu)

WebVoyage

  • What it is: Software that allows our users to search our catalog over the Web.
  • Software provided by: Endeavor Information Systems (see above).
  • How we acquire it: We buy the software and support from Endeavor. We cannot legally redistribute the software, and cannot practically modify it, except through the customization options the software provides.
  • Local use and support: This is the primary software our patrons use to access our catalog. (It's also possible to search our catalog via Z39.50 clients. Some local applications also have direct access to the database via SQL.)
  • Local documentation: See our local Franklin Help.
  • Related technologies: See the entry for Voyager above.
  • Local contact: Sandra Kerbel, Director, Public Services (skerbel@pobox.upenn.edu)

MARC format

  • What it is: The standard format for library catalog entries. Can also be used to store structured metadata similar to bibliographic data.
  • Software using this format: Voyager (and WebVoyage, the software that runs Franklin)
  • Format specified by: The Library of Congress (see their MARC Standards Page).
  • Local use and support: Used most extensively in the Information Processing Center (for cataloging). We are also considering using MARC records (with some new field definitions) for metadata for such items as digital images and electronic journals. Most librarians have some degree of knowledge and experience of MARC.
  • Local documentation: See our local Voyager documentation for some information on how we use MARC locally.
  • Local contact: Carton Rogers, Director, IPC (rogers@pobox.upenn.edu)

EAD

  • What it is: An SGML-based format for describing archival finding aids.
  • Format specified by: The Library of Congress and the Society of American Archivists (see their official EAD web site).
  • Local use and support: The Rare Books and Manuscripts division has prepared some finding aids using EAD, and has provided them to RLG, but currently we don't have software for processing EAD directly. (We have some translations of some EAD descriptions into HTML.) We hope to acquire SGML/XML software soon that will allow us to make better use of our EAD descriptions.
  • Local contact: Delphine Khanna, Digital Projects Librarian (delphine@pobox.upenn.edu)

Documents

HTML format

  • What it is: The standard format for Web documents.
  • Software using this format: All standard Web browsers, including Netscape and Internet Explorer.
  • Format specified by: The World Wide Web Consortium (see their official HTML Home Page).
  • Local use and support: Used throughout the Library Web by staff in all divisions. Overseen by our Web Manager and Web Advisory Group.
  • Local documentation: See our local Web Developer Pages
  • Notes on use:
    • Penn Library Web authors should follow these guidelines for HTML development.
    • We recommend that large bodies of information that follow a highly structured pattern, or that should be presented multiple ways, be maintained in a database, instead of being maintained as individual HTML documents. We have tools, such as Cold Fusion, that can automatically turn database information to HTML documents.
  • Local contact: Mike Winkler, Web Manager (winkler4@pobox.upenn.edu)
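
The database-to-HTML approach recommended above can be sketched as follows. This is only an illustration, not our production setup: it uses Python with an in-memory SQLite database as a stand-in for Cold Fusion and our real databases, and the table name, column names, and sample rows are all made up.

```python
import html
import sqlite3


def rows_to_html_table(conn, query, headers):
    """Render the rows returned by `query` as an HTML table string."""
    lines = ["<table>"]
    header_cells = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    lines.append(f"  <tr>{header_cells}</tr>")
    for row in conn.execute(query):
        # Escape every value so that characters like & and < display correctly.
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def build_demo_database():
    """Create a small in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE formats (name TEXT, purpose TEXT)")
    conn.executemany(
        "INSERT INTO formats VALUES (?, ?)",
        [
            ("TIFF", "archival images"),
            ("JPEG", "photographs on the Web"),
            ("GIF", "icons & line art"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_database()
    print(rows_to_html_table(
        demo,
        "SELECT name, purpose FROM formats ORDER BY name",
        ["Format", "Purpose"],
    ))
```

Because the information lives in the database, the same rows could be rendered a second way (sorted differently, or as a list instead of a table) by writing another small function, without touching the data.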

PDF format

  • What it is: A format for publishing documents, designed primarily for encoding their appearance on screen and on paper.
  • Software using this format: Adobe's Acrobat suite, various third party tools (see PDFZone for a list of PDF-aware tools)
  • Format specified by: Adobe. The specification is public, though rather complicated. It can be found from this page.
  • Local use and support: Used for our Oxford University Press history on-line books. Also used in some other projects, and for publicity materials.
  • Notes on use: This is the preferred format for documents where the exact "look" is important, since we expect it will be supported for a long time, and that if a new format replaces it, a migration path will be available. PDF is itself a "successor" to Postscript, and most Postscript documents can be migrated to PDF with little difficulty. (We don't recommend using Postscript for archival purposes.) PDF is not recommended at this point for highly structured documents; for those, use XML or some other format designed for structured data.
  • Local contact: John Mark Ockerbloom, Digital Library Architect and Planner (ockerblo@pobox.upenn.edu)

XML

  • What it is: An emerging standard format for representing structured documents and data.
  • Software using this format: Recent versions of Internet Explorer, and a large number of tools intended for programmers, document authors, and database managers.
  • Format specified by: The World Wide Web Consortium (see their official XML Home Page). The basic XML specification is now standardized; various formats related to XML, such as XML query, schema, and pointer formats, are still under development.
  • Local use and support: Not in production use in the Library yet, but we may use it as the basis for representing data for various projects we are now developing. We are acquiring a toolset (DLXS) from the University of Michigan for managing digital library repositories that uses XML extensively.
  • Local contact: Delphine Khanna, Digital Projects Librarian (delphine@pobox.upenn.edu)
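
To give a feel for what "structured documents and data" means in practice, here is a minimal sketch that reads a small XML document with Python's standard library. The element names (collection, item, title, format) and the sample records are invented for illustration; they do not correspond to any schema we use.

```python
import xml.etree.ElementTree as ET

# A made-up XML document describing two digital objects.
SAMPLE = """<?xml version="1.0"?>
<collection>
  <item id="img001">
    <title>Sample slide</title>
    <format>TIFF</format>
  </item>
  <item id="img002">
    <title>Sample journal issue</title>
    <format>PDF</format>
  </item>
</collection>
"""


def list_items(xml_text):
    """Return (id, title, format) for each <item> directly under the root."""
    root = ET.fromstring(xml_text)
    return [
        (item.get("id"), item.findtext("title"), item.findtext("format"))
        for item in root.findall("item")
    ]


if __name__ == "__main__":
    for item_id, title, fmt in list_items(SAMPLE):
        print(f"{item_id}: {title} ({fmt})")
```

The program never has to guess where a title starts or ends; the markup says so explicitly, which is what makes XML attractive for data that many tools need to share.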

SGML

  • What it is: An older standard format for representing structured documents and data, that was the predecessor to XML.
  • Format specified by: ISO 8879, but that's not the place for newcomers to start. For general overviews of SGML, plus links to more information, see SGML: Introductions and Overviews at Oasis.
  • Formats defined in SGML: include HTML, TEI, and EAD. However, in many cases XML versions of these formats are now available, or are under development.
  • Notes on use:
    • SGML is a more complex language than XML. This means that writers of SGML documents have more flexibility than writers of XML documents. Unfortunately, it also means that it can be a lot more complicated to parse and work with SGML documents. Therefore, SGML is gradually being supplanted by XML, a stricter form of markup for structured documents that is also easier to parse and interpret.
    • Since there still are many SGML documents out there that are not XML-compatible, we still need some SGML-enabled tools to work with them. However, new projects should use XML-compatible formats when feasible.
  • Local contact: Delphine Khanna, Digital Projects Librarian (delphine@pobox.upenn.edu)
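
One concrete example of the strictness difference noted above: SGML lets a document type permit omitted end tags, while XML requires every element to be closed explicitly. The sketch below uses Python's standard XML parser to show that a fragment written in the looser SGML style is rejected as XML. The fragments and the function name are made up for illustration, and the check covers well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# End tags for <item> omitted, as an SGML DTD might allow.
SGML_STYLE = "<list><item>First<item>Second</list>"

# The same content with every element explicitly closed, as XML requires.
XML_STYLE = "<list><item>First</item><item>Second</item></list>"

if __name__ == "__main__":
    print("SGML-style fragment well-formed as XML?", is_well_formed_xml(SGML_STYLE))
    print("XML-style fragment well-formed as XML? ", is_well_formed_xml(XML_STYLE))
```

Migrating an SGML document to XML is often largely a matter of making this kind of implicit structure explicit.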

TEI

  • What it is: An SGML-based format for representing text documents, designed primarily for encoding their logical structure.
  • Format specified by: The Text Encoding Initiative Consortium (see their web site)
  • Software using this format: includes DLXS, Panorama from SoftQuad, and other SGML-aware tools.
  • Notes on use:
    • TEI has been used to encode text transcriptions in many academic etext projects, including those at Virginia, Indiana, and UNC. Most of these projects also have to provide translation to HTML, since Web browsers typically do not support direct display of TEI documents.
    • The original "official" TEI is an SGML-based format. Because of its complexity, though, a subset known as TEI Lite was created, which is what many electronic text projects use. Efforts are underway to make XML versions of the TEI formats, but the XML versions are not yet official.
  • Local contact: Delphine Khanna, Digital Projects Librarian (delphine@pobox.upenn.edu)

Microsoft Word

  • What it is: A proprietary, but widely used format for word processing documents.
  • Software provided by: Microsoft
  • How we acquire it: Through a site license (possibly with a limited number of installations). We cannot modify or redistribute it.
  • Local use and support: Available on most Windows and Mac staff desktops. Supported by Systems and ISC.
  • Notes on use: While Word is often useful for internal communications, the format still cannot be read by many outside users, is not specified in a public document, and is subject to repeated change. Therefore, documents meant either for public consumption, or to be kept more than short-term, should be provided in more stable and portable formats.
  • Local contact: Library technical support, libtech@pobox.upenn.edu

Images

TIFF

  • What it is: A standard format for representing images, suitable for archival use.
  • Format specified by: Adobe. (They inherited it from Aldus, who specified it after consulting with a number of imaging vendors.) The specification for the latest standard version (6.0, standardized in 1992) can be found in this PDF document from Adobe. Adobe does not appear to be maintaining a full TIFF home page, but see www.libtiff.org for pointers to documentation and free software.
  • Software using this format: includes most full-featured graphics editors. Most Web browsers do not have built-in TIFF support, but instead spin off a viewer application to display TIFF images (such as xv on Unix, or Imaging for Windows). There are also scripts available for Web servers to convert TIFFs to GIFs or JPEGs on the fly.
  • Notes on use: TIFF is a broad enough format that it accommodates several different ways of encoding images. Some of these encodings may involve lossy compression or a limited color palette. When using TIFFs to archive images, one should make sure that one is not using a lossy or limited TIFF encoding.
  • Local contact: Delphine Khanna, Digital Projects Librarian (delphine@pobox.upenn.edu)
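
As a rough illustration of the archival check described above, the following sketch reads the Compression tag (tag 259) from the first image directory of a TIFF file, using only Python's standard library. It is a sketch under several assumptions: it inspects only the first directory, it treats codes 6 and 7 (the JPEG-in-TIFF encodings) as the lossy ones, and it does not look at the color palette at all. The function names are our own, not part of any TIFF library.

```python
import struct

COMPRESSION_TAG = 259   # TIFF tag number for "Compression"
LOSSY_CODES = {6, 7}    # JPEG-in-TIFF variants (assumed list, not exhaustive)


def tiff_compression(data):
    """Return the Compression code from the first IFD of TIFF file bytes."""
    if data[:2] == b"II":
        endian = "<"    # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"    # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        # Each directory entry is 12 bytes: tag, type, count, value/offset.
        start = ifd_offset + 2 + 12 * i
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits in the first 2 bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1  # the tag is optional; 1 (no compression) is the default


def uses_lossy_compression(data):
    """Return True if the first image uses one of the assumed lossy codes."""
    return tiff_compression(data) in LOSSY_CODES
```

To check a file, read it in binary mode and pass the bytes in, for example `uses_lossy_compression(open(path, "rb").read())`, where `path` names the TIFF in question. A full-featured imaging library would be a better choice for production checks.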

JPEG

  • What it is: A group that specifies a popular format for representing still images, using a gracefully degrading compression scheme.
  • Formats specified by: The Joint Photographic Experts Group (hence the acronym JPEG). See their website for official information about the JPEG formats.
  • Software using JPEG formats: Most full-featured graphics editors, and most graphical Web browsers, support JPEG's basic image format, JFIF (which is what most people think of as JPEG). Support may be more limited for other JPEG formats.
  • Notes on use:
    • JPEG (okay, JFIF) is particularly useful for displaying photographs and other images on the Web that don't use a limited color palette or sharply-defined boundaries. It uses a compression algorithm that can be optimized either for image quality or compactness. However, since this compression loses information, this format should not be used for archival storage.
    • JPEG is working on new standards, including JBIG2 and JPEG2000, that support lossless compression, and wavelet compression (a powerful compression technique also used by MrSID). These standards are not yet finalized, but may eventually become important image formats.
  • Local contact: Delphine Khanna, Digital Projects Librarian (delphine@pobox.upenn.edu)

GIF

  • What it is: A format for representing still and (simple) animated images, used widely in Web browsers especially for line drawings and diagrams.
  • Format specified by: CompuServe, last updated in 1990. CompuServe doesn't seem to maintain a Web site on GIF, but the specification can be found several places on the Net, including as a text file on the W3C site.
  • Software using GIF formats: Virtually all graphical Web browsers and graphical editing programs. Some freeware does not support GIF, due to patent concerns.
  • Local use and support: GIF remains the primary format for Web page icons and images (except for photographic images) on the local Library Web.
  • Notes on use:
    • GIFs support frame-by-frame animation, and transparent areas. However, no more than 256 colors can appear in a single GIF, making the format unsuitable for color photographs or other images that require fine color gradations. The format does work well for line art and simple icons.
    • Most GIFs are encoded using a compression algorithm that is patented by Unisys. There has been some controversy over Unisys' enforcement of the patent, which as of 1999 included a demand for licensing fees to be paid by Web sites that could not document that their GIFs all came from Unisys-licensed software. (The graphics programs that the Library purchases are licensed by Unisys.) PNG has been invented as a patent-free alternative to GIF, but to date has not caught on as widely as GIF has. The patent for the compression algorithm used by GIFs expires in June 2003.
  • Local contact: Mike Winkler, Web Manager (winkler4@pobox.upenn.edu)

PNG

  • What it is: An emerging format for representing still and (simple) animated images. More information to come.

MrSID

  • What it is: A format for representing highly compressed images, and a set of tools to display and manipulate them.
  • Format standard and software provided by: LizardTech
  • How we acquire it: Some of the software (like the plugin viewers for MrSID, and a low-volume image server that serves images to ordinary web browsers) is free; other components (like the encoder) are sold. The format specification is proprietary, and not published.
  • Local use and support: We are using the image server on an experimental basis, and hope to make it the basis for delivery of fine arts slide images.
  • Notes on use: Because MrSID is a proprietary, closed format, and involves lossy compression, it should not be used as an archival format for images. For the fine arts slide project, we are using TIFF as the archival version.
  • Local contact: Delphine Khanna, Digital Projects Librarian (delphine@pobox.upenn.edu)

Interactive Animation

Flash

  • What it is: A format for representing multimedia, interactive presentations. More information to come.

References

URL

  • What it is: The standard type of reference used in World Wide Web hyperlinks.
  • Software using this format: All standard Web browsers, including Netscape and Internet Explorer.
  • Format specified by: The World Wide Web Consortium (see their Web Addressing Home Page).
  • Local use and support: Used throughout the Library Web by web developers.
  • Notes on use:
    • URLs, once announced, are often copied onto many pages, which causes problems when they break. If local resources are referred to by URL, one should avoid changing the URLs unless absolutely necessary. For more persistent Web references, consider using Handles, or other persistent identifiers like PURLs, if they are available for the resource.
  • Local contact: Mike Winkler, Web Manager (winkler4@pobox.upenn.edu)

Handle

  • What it is: A persistent identifier and reference for electronic documents; more stable, and less fragile, than a URL.
  • Format specified by: The Corporation for National Research Initiatives (see their Handle System site).
  • Software using this format: We have a Handle Server, provided by CNRI, which will take Handles encoded as URLs and redirect browsers to the actual location of the resource referred to by the Handle. Major browsers do not directly support Handles at this time, but a plugin is available for direct resolution.
  • Local use and support: Although we do not yet support Handles in production use, we plan to use them first to track electronic journals, and then later on use them as identifiers for other digital resources we create. Rules for assigning Handles are in preparation; talk to the local contact below for more details.
  • Local contact: John Mark Ockerbloom, Digital Library Architect and Planner (ockerblo@pobox.upenn.edu)
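
The redirect behavior described above can be pictured with a small sketch. This is not the CNRI Handle Server or its protocol; it is a toy lookup table in Python showing the idea that the Handle stays fixed while the location it points to can change. The handle prefix, host names, and URLs are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical table mapping Handles (prefix/suffix) to current locations.
HANDLE_TABLE = {
    "12345/ejournal-001": "http://journals.example.org/current/issue.html",
}


def resolve(handle_url):
    """Return (HTTP status, location) for a Handle encoded as a URL."""
    handle = urlsplit(handle_url).path.lstrip("/")
    location = HANDLE_TABLE.get(handle)
    if location is None:
        return 404, None
    return 302, location  # 302 tells the browser to go to `location`


def move_resource(handle, new_location):
    """Record a resource's new home; the Handle itself does not change."""
    HANDLE_TABLE[handle] = new_location


if __name__ == "__main__":
    url = "http://hdl.example.org/12345/ejournal-001"
    print(resolve(url))
    move_resource("12345/ejournal-001", "http://archive.example.org/issue.html")
    print(resolve(url))  # same Handle URL, new destination
```

Pages that cite the Handle URL keep working after the move, because only the table entry had to be updated.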

Database Technology

Oracle

  • What it is: An extremely powerful, high-performance database system. More information to come.

Access

  • What it is: A widely used database system that can run on personal computers. Not as powerful or robust as Oracle. More information to come.

Cold Fusion

  • What it is: Software that supports the display and searching of database information on the World Wide Web. More information to come.

ODBC

  • What it is: A protocol used by software (rather than end users) to interact with databases. More information to come.

SQL

  • What it is: A language used by software to query and otherwise interact with databases. More information to come.

Structured data

XML

Searching

Verity

  • What it is: Software that supports full-text searching over collections of documents (in many formats). More information to come.

Cold Fusion

Z39.50

  • What it is: A protocol used to search databases, adopted as a standard in many library databases. More information to come.

Scripting and programming

General notes on the use of scripting and programming languages:
  • We strongly advise that information used by scripts or programs be maintained separately from the programs themselves, in a standard format, and not simply embedded in the program source code. Separation of information from tools makes the information much easier to maintain over the long term, and also allows the information to be reused in other contexts.
  • Home-grown programs can be difficult and costly to maintain. Consider whether an existing standard program can be used in place of writing your own program. If you do write your own, make sure you plan for its long-term maintenance (keeping in mind that other people may have to maintain it).
  • Except where noted, the Library as a whole does not officially support any of the following languages or programs (though individual departments might).
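
The first note above, on keeping information separate from programs, might look like this in practice. The sketch is in Python and reads its data from CSV, a standard format; the column names, sample rows, and fallback value are invented. To keep the example self-contained the CSV text appears as a string, but in real use it would live in its own file (say, a hypothetical contacts.csv) that non-programmers could edit.

```python
import csv
import io

# Stand-in for the contents of a separate data file such as contacts.csv.
CONTACTS_CSV = """topic,contact
HTML,Web Manager
EAD,Digital Projects Librarian
"""


def load_contacts(source):
    """Read topic-to-contact pairs from any file-like object holding CSV."""
    return {row["topic"]: row["contact"] for row in csv.DictReader(source)}


def contact_for(contacts, topic, default="Library technical support"):
    """Look up the contact for a topic, falling back to a default."""
    return contacts.get(topic, default)


if __name__ == "__main__":
    # With a real file this would be: open("contacts.csv", newline="")
    contacts = load_contacts(io.StringIO(CONTACTS_CSV))
    print(contact_for(contacts, "EAD"))
    print(contact_for(contacts, "Perl"))
```

Updating a contact then means editing one line of data rather than changing and retesting program code, and the same file can feed other tools.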

JavaScript

  • What it is: A scripting language designed for use on Web pages or servers.
  • Software using this language: Major graphical Web browsers will run JavaScript programs, unless users have turned off JavaScript features.
  • Language and tools provided by: Netscape (see JavaScript Developer Central). The language has been submitted to a standards body for further development. Microsoft has a competing product called JScript which also implements the basic JavaScript interpreter, but includes features that may not work on non-Microsoft browsers.
  • How we acquire the software: The JavaScript interpreter is built into Netscape and Internet Explorer.
  • Local use and support: Pages that use cascading stylesheets may depend on JavaScript for optimal appearance. Some Library web pages also have used JavaScript, but some of these have since dropped it in favor of server-side CGI scripts (which don't require any special browser configuration.)
  • Notes on use:
    • See general notes on the use of programming and scripting languages above.
    • Despite the name, JavaScript is a fundamentally different language from Java. It does, however, share some language constructs, and can be used to invoke Java programs.
    • Some of our web users don't run JavaScript, either because of physical or computer limitations, or because of security concerns (which still come up periodically). Web developers should try to accommodate non-JavaScript users, and not require the use of JavaScript when alternatives (using regular HTML or CGI scripts) are feasible.
  • Local contact: Mike Winkler, Web Manager (winkler4@pobox.upenn.edu)

Java

  • What it is: An object-oriented programming language designed to be secure, and portable across different operating systems.
  • Software using this language: Major graphical Web browsers will run Java programs, unless users have turned off Java features. Java programs can also be run standalone on any machine that has a Java Virtual Machine.
  • Language and tools provided by: Sun Microsystems (see the java.sun.com Web site), with additional tools provided by various third parties.
  • How we acquire the software: The main Java tools for Solaris and Windows NT are provided free of charge from Sun, from the site above. Apple provides a Macintosh version. The licenses may attempt to limit some rights of modification, redistribution, and commentary (!).
  • Local use and support: Some digital library software is implemented in Java, including our Handle server. We don't provide official support for Java, but if you have any questions, you can talk to the local contact below.
  • Notes on use:
    • See general notes on the use of programming and scripting languages above.
    • There is an ever-growing class library for Java that can be used in local programs. See java.sun.com for details.
    • Sun has canceled earlier plans of turning over control of Java to a standards body. Standards controlled by a single company can carry a higher risk of abrupt changes than those controlled by a standards body.
  • Local contact: John Mark Ockerbloom, Digital Library Architect and Planner (ockerblo@pobox.upenn.edu)

Perl

  • What it is: An interpreted programming language often used for Web scripts, text processing, and rapid prototyping.
  • Language and tools provided by: Larry Wall and the Perl Mongers (see the Perl Mongers Web site).
  • How we acquire the software: The main Perl tools are released as open-source software; we get them for free, and can modify or redistribute them (though we wouldn't want to modify the language interpreter). Many Perl library modules are released under the same terms as Perl itself.
  • Local use and support: Various people in the Systems group have used Perl for Web server scripts and rapid prototypes. We don't provide official support for Perl, but if you have any questions, you can talk to the local contact below.
  • Local documentation: This old lesson plan still has some useful information for people learning Perl.
  • Related technologies: When invoked by Web servers, Perl scripts are called via the CGI interface
  • Notes on use:
    • See general notes on the use of programming and scripting languages above.
    • On the plus side, Perl can be used to create rapid prototypes of programs very quickly, and is especially well-suited for programs that involve lots of manipulations of text strings. Well-written Perl programs can often run unmodified on all major operating systems. Perl is easy to learn for those already familiar with C and Unix programming, less easy for others. There's a large community of developers of open-source Perl software (see below).
    • On the minus side, it is easy to write Perl programs that are completely unreadable and unmaintainable, even by the original author. The module, object, and documentation features of Perl 5 make it possible to write and maintain larger programs than earlier versions of Perl allowed, but authors still need to take pains to ensure that their programs are written in a style that allows maintenance and reuse.
    • There is a large and growing collection of Perl modules at CPAN. For many functions, you can download and use one of these modules, instead of trying to write your own code to do the same thing. You can also contribute your own modules or improvements.
  • Local contact: John Mark Ockerbloom, Digital Library Architect and Planner (ockerblo@pobox.upenn.edu)

C

  • What it is: A versatile and efficient, but low-level, programming language.
  • Language defined by: ANSI. A standard reference for this language is The C Programming Language by Kernighan and Ritchie.
  • Tools provided by: A variety of suppliers. The Free Software Foundation provides a free compiler (gcc) and debugger (gdb) for C that is widely used. Commercial compilers and environments are also available.
  • How we acquire the software: There is no official support for this language in the Library, but the FSF tools mentioned above are open source and can be downloaded freely from their web site.
  • Local use: C is used in some Systems projects where efficiency or access to operating system-level structures is important.
  • Notes on use:
    • Many C environments introduce extra routines and constructs that might not be supported on all platforms. However, the definitions and standard library routines given in Kernighan and Ritchie (see above) should be supported on all ANSI compilers. (Gcc is ANSI-compliant; the default compiler on some systems, including Solaris, is not.)
    • C's low-level, close-to-the-machine programming model is both its great strength and its great weakness. Using C, you can write code that runs faster and leaner than virtually any other language, and that uses operating system features not available in higher-level languages. On the other hand, using low-level operating system features may lead to programs that are not easily portable. C's do-it-yourself approach to memory management makes it easy to write programs that crash by referencing memory that hasn't been properly allocated, or programs that get increasingly bloated as they run, requesting additional memory but not freeing memory that's no longer needed. It may require complex, time-consuming programming to do proper memory management, or support multiple threads of control, or do complex expression searching or exception handling, all features that are built in for other languages but not for C.
  • Local contact: John Mark Ockerbloom, Digital Library Architect and Planner (ockerblo@pobox.upenn.edu)

CGI Interface

  • What it is: An interface that Web servers use to invoke server scripts.
  • Software using this interface: All major Web servers can run CGI scripts. The scripts themselves can be written in any language that is supported on the machine on which the Web server resides.
  • Interface specified by: NCSA (see their Common Gateway Interface web page).
  • Notes on use:
    • See general notes on the use of programming and scripting languages above.
    • CGI and other "server-side" scripts, unlike scripts that run inside a user's browser, can typically be used by any Web browser. Some languages (Perl is one) have prewritten modules you can use for handling input and output for CGI scripts, so that you don't have to write your own.
    • CGI scripts, if not very carefully written, can be exploited by hackers to gain unauthorized access to our local computing resources. See this section of the World Wide Web Security FAQ for details. Contact our local Web manager if you have any doubts about the safety of a CGI script you plan to write or install.
  • Local contact: Mike Winkler, Web Manager (winkler4@pobox.upenn.edu)
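
For readers who have not seen one, here is a minimal sketch of a CGI script, written in Python. It relies on two CGI conventions: the Web server passes the query string in the QUERY_STRING environment variable, and the script writes headers, a blank line, and then the document to standard output. The parameter name "name" and the greeting are made up. Note that the user-supplied value is escaped before it is placed in the page; skipping that kind of step is one common way scripts become exploitable.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def build_response(query_string):
    """Return a complete CGI response: headers, blank line, HTML body."""
    params = parse_qs(query_string)
    name = params.get("name", ["world"])[0]
    # Never echo user input directly into a page; escape it first.
    safe_name = html.escape(name)
    body = f"<html><body><p>Hello, {safe_name}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server sets QUERY_STRING before invoking the script.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

Because all the work happens on the server, any browser that can display HTML can use the result. Escaping output is only one precaution; scripts that open files or run other programs based on user input need additional care.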

Remote access and services

HTTP

  • What it is: The standard protocol for requesting documents or operations via a Web browser.
  • Software using this protocol: All Web servers and browsers.
  • Protocol defined by: The World Wide Web Consortium (see their official HTTP home page).
  • Notes on use:
    • Because Web browsers and servers are so ubiquitous, HTTP has become the de-facto standard protocol used to request operations remotely using a Web browser.
    • The exact details of HTTP are invisible to most users of the Web, and authors of Web documents. However, if one is writing CGI scripts, or writing one's own Web-enabled servers or clients, it may be important to know how HTTP works.
  • Local contact: Mike Winkler, Web Manager (winkler4@pobox.upenn.edu)
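
For those who do need to know how HTTP works, the core of it is a plain-text exchange: the client sends a request line and headers, and the server answers with a status line, headers, a blank line, and the document. The Python sketch below builds a minimal request and picks apart a sample response. The host name, path, and sample response are invented, and real clients and servers handle many more details (such as multi-line bodies, redirects, and persistent connections).

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.0 GET request."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def parse_response_head(raw):
    """Split a raw HTTP response into (status code, reason, headers dict)."""
    head, _, _body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# A made-up response of the kind a Web server might send back.
SAMPLE_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html></html>"
)

if __name__ == "__main__":
    print(build_get_request("www.example.org", "/index.html"))
    print(parse_response_head(SAMPLE_RESPONSE))
```

The Content-Type header in a response is the same header a CGI script writes before its output, which is one reason CGI authors benefit from knowing the basics of the protocol.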

TOM

  • What it is: A system (including protocol and software) for managing diverse data formats and converting between them.
  • Software provided by: John Mark Ockerbloom wrote the basic tools; CMU also has a conversion service on the Web that uses TOM
  • How we acquire it: The internal "broker" software is open-source (freely available, modifiable, and distributable). We don't have much in the way of user interfaces for it yet (CMU's conversion service software is not available at this time).
  • Local use and support: We've received a grant from the Mellon Foundation to develop TOM applications for digital preservation and courseware in 2003 and 2004.
  • Local documentation: See this page
  • Local contact: John Mark Ockerbloom, Digital Library Architect and Planner (ockerblo@pobox.upenn.edu)

Last updated 30 January 2003 by John Mark Ockerbloom (ockerblo@pobox.upenn.edu)