Are you curious about the role OCR technology plays in the world of microfilm? We've put together the facts on why it matters when working with microforms today.
Microfilm and microfiche have long been the medium of choice for secure, long-term storage of everything from rare source materials to government documents of unique research value. Unfortunately, while the media themselves are safe and secure, microfilm and microfiche readers have a reputation for being tedious and time-consuming to work with. These readers nevertheless provided a way to view materials in a format that saved space and preserved the original content. Both microfilm and microfiche have allowed libraries and other repositories to preserve hundreds or even thousands of images on a single microform whose shelf life and durability far exceed those of the original document.
Today’s microfilm scanners embrace this concept and greatly improve on it. They allow scanning of documents from microfilm and microfiche into image files and other electronic files, including Portable Document Format (PDF) files, that meet today’s archival requirements. Moreover, current universal microform scanners can OCR the scanned content, greatly enhancing access to valuable records while promoting the digitization of stored records in a user-friendly way. Today’s microform scanners are low-cost and compact, fit on a desktop, scan both microfilm and microfiche at high speeds, and can OCR while scanning. It has been said that these products “make working with microfilm fun”.
So, what is OCR technology and how has it transformed the microform world? OCR, or optical character recognition, is a process that converts handwritten, printed, or typed text and symbols into an electronic format when a document is digitally scanned. Computer software uses algorithms that either match the pattern of the pixels in scanned content to pre-defined patterns or detect similarities through “intelligent” recognition. In recent years, software and hardware enhancements have led to important advancements: there are fewer errors in digitally transcribed content, live image processing saves the user steps and reduces the need for intervention, and improved speed produces results more quickly. Today OCR technology is fast and accurate, creates small files, and meets the stringent requirements for archival storage.
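To make the pattern-matching idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by any particular scanner or OCR product: the 3x3 `TEMPLATES` and the function names are hypothetical, and real OCR engines work with far larger images and much more sophisticated models.

```python
# Hypothetical 3x3 pixel templates ("1" = dark pixel, "0" = light pixel).
# Real OCR software uses far richer patterns; this only illustrates the idea.
TEMPLATES = {
    "I": ["010",
          "010",
          "010"],
    "L": ["100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010"],
}


def match_score(glyph, template):
    """Count the pixels where the scanned glyph agrees with the template."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character that agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


# A slightly noisy scan of a "T": one pixel in the bottom row is wrong.
noisy_t = ["111",
           "010",
           "011"]
print(recognize(noisy_t))  # prints "T"
```

Even with one bad pixel, the noisy glyph still agrees with the "T" template on more pixels than with any other template, which is why this kind of matching tolerates small scanning defects.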
When hardware and software work in tandem to “read” content properly, OCR allows for the creation of content that can be indexed, abstracted, edited, annotated and shared. The process of “reading” the scanned content occurs at the level of individual letters, numbers, or symbols that are then strung together into words or similar content. When coupled with powerful word search utilities and links to dictionaries and similar resources for retrieving additional information, the enhanced OCR technology allows the user to examine original content, search for keywords, link to external resources, and copy scanned materials and paste them into other documents with a single click. The OCR software used by e-ImageData on our current microfilm scanners even allows users to OCR while automatically scanning roll film and microfiche. This “on-the-fly” operation greatly increases work speed, while still maintaining accuracy. OCR programs can be expensive, and some OCR software programs are much better than others. If you are looking to purchase a microfilm scanner that incorporates OCR capability, it is important to evaluate its OCR software before making a purchase decision.
In the past, a number of trusted digital repositories ran OCR text extraction to ensure that content uploaded into digital archive collections was searchable. Built-in text extraction programs allowed digital archives to ingest large quantities of original content that went beyond the keywords and metadata entered. In many cases, staff or librarians reviewed extracted files to make sure that the OCR content accurately represented the text in the document or image and did so within an acceptable error rate. When necessary, they edited the text or created a supplemental text file. As more repositories developed digital initiatives and cross-institutional initiatives such as the Digital Public Library of America gained popularity, OCR technology has played an ever-increasing role in providing students, faculty, historians, genealogists, and other researchers access to the content they seek when working with microfilm and microfiche.
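One common way to put a number on an “acceptable error rate” is the character error rate: the edit distance between the OCR output and a human-verified transcription, divided by the length of that transcription. The sketch below assumes this metric; the function names and the 2% threshold are hypothetical and are not taken from any repository's actual policy.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text, reference_text):
    """Edit distance divided by the length of the human-verified reference text."""
    if not reference_text:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference_text) / len(reference_text)


ACCEPTABLE_ERROR_RATE = 0.02  # hypothetical threshold, for illustration only


def needs_review(ocr_text, reference_text, threshold=ACCEPTABLE_ERROR_RATE):
    """Flag a page for manual correction when its error rate exceeds the threshold."""
    return character_error_rate(ocr_text, reference_text) > threshold


# A classic OCR confusion: "m" misread as "rn".
print(character_error_rate("rnicrofilm reader", "microfilm reader"))  # prints 0.125
```

In this example, two edits are needed across a 16-character reference, so the page would be flagged for review under the assumed threshold.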
As valuable as this verification process has been, today’s OCR technology is becoming so accurate and reliable that this interim step in the archival process may soon be a procedure of the past. Ongoing technology advancements have allowed OCR to index and manage substantial text files by detecting content in a growing number of fonts, characters, and symbols, and multi-language support is becoming common. OCR technology continues to advance at an amazing pace and, coupled with advancements in microfilm scanners, brings volumes of stored information into today’s digital world.
To find out more about our ScanPro product line that offers OCR technology, visit: http://e-imagedata.com/products/