Before the advent of digital seismographs in the 1970s, scientists relied on analog seismographs to measure seismic waves. Millions of these aging seismograms are archived in observatories around the world, constituting a vast store of valuable scientific information. Until now, however, accessing this information has been problematic because modern analytical techniques were developed for use with digital seismographs and require discretized time series data.
Professor Miaki Ishii and I at the Harvard University Seismology Group have unlocked this previously inaccessible analog data by developing an interactive software tool that converts images of analog seismograms to time-series data. The DigitSeis software uses MATLAB® image processing algorithms to identify time marks and correct image distortions to establish the timing and amplitude of every signal. Our team is using DigitSeis to digitize seismograms from the 1930s through the 1950s archived at the Harvard-Adam Dziewoński Observatory (HRV). The software continues to be developed as we apply the technique to different styles of recordings. To date, about two dozen seismograms have been digitized.
One outcome of this research will be a larger, more complete catalog of earthquakes in tectonically quiet regions, such as the Northeastern U.S., where earthquakes are uncommon. By enabling earth scientists to study individual earthquakes and seismic events that occurred before the digital era, the expanded catalog will shed new light on seismological trends.
Furthermore, using DigitSeis to digitize records from other stations around the world, especially in regions with incomplete earthquake catalogs, may have an immediate practical application by improving seismic risk assessment, thus ensuring that building codes are based upon accurate data.
Scanning Seismograms and Preparing Images
Digitizing a seismogram is a multistep process involving both manual and automated steps. The first step is cleaning and scanning the original analog seismogram to create a high-resolution digital image. A typical seismogram in the HRV collection is about 14 inches by 36 inches, resulting in a JPG digital image file in the tens of megabytes.
To make large image files easier to work with, DigitSeis reduces the images from 24-bit color to 8-bit grayscale, which gives sufficient precision while enabling efficient processing. Then, using histogram correction algorithms developed in MATLAB, DigitSeis removes artifacts in the data that arose from factors such as exposure, long-term storage, and scanning procedures (Figure 2).
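The two preprocessing steps above can be illustrated with a short sketch. It is written in Python rather than MATLAB and is not the DigitSeis code: the luminance weights are the common ITU-R BT.601 values, and a simple percentile-based contrast stretch stands in for the histogram correction, whose details the article does not give. The function names and the tiny sample image are hypothetical.

```python
def rgb_to_gray8(pixel):
    """Convert one 24-bit (R, G, B) pixel to an 8-bit gray level (BT.601 weights)."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def stretch_contrast(gray, low_pct=2.0, high_pct=98.0):
    """Linearly rescale gray levels so the chosen percentiles map to 0 and 255."""
    flat = sorted(v for row in gray for v in row)
    lo = flat[int((len(flat) - 1) * low_pct / 100.0)]
    hi = flat[int((len(flat) - 1) * high_pct / 100.0)]
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    scale = 255.0 / (hi - lo)
    return [[min(255, max(0, int(round((v - lo) * scale)))) for v in row]
            for row in gray]


# Hypothetical low-contrast scan: gray levels only span 90 to 140.
rgb_image = [
    [(90, 90, 90), (100, 100, 100), (110, 110, 110)],
    [(120, 120, 120), (130, 130, 130), (140, 140, 140)],
]
gray_image = [[rgb_to_gray8(p) for p in row] for row in rgb_image]
enhanced = stretch_contrast(gray_image, low_pct=0.0, high_pct=100.0)
```

After the stretch, the narrow 90 to 140 range of the sample image occupies the full 0 to 255 range, which is the kind of contrast recovery that makes faded traces easier to separate from the background.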
While our goal was to automate the digitization as much as possible, users can modify the images and files before or after the automatic processing. For example, after DigitSeis performs the contrast enhancement, the user can crop the image, remove background noise, fine-tune contrast settings, and adjust the orientation of the image. At this stage, the user can also remove unwanted artifacts, such as handwritten notes or stains from the original paper. Using the “remove region” tool in DigitSeis, which is based on the roipoly() function in Image Processing Toolbox™, the user can select a region of the image to exclude from the digitization process (Figure 3).
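To show what excluding a polygonal region amounts to, here is a minimal Python sketch of a polygon mask. It does not reproduce roipoly() or the DigitSeis tool: the ray-casting test, the function names, the background value of 0, and the sample polygon are all assumptions made for illustration.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def remove_region(image, vertices, background=0):
    """Copy image, setting pixels whose centers fall inside the polygon to background."""
    return [
        [background if point_in_polygon(c + 0.5, r + 0.5, vertices) else v
         for c, v in enumerate(row)]
        for r, row in enumerate(image)
    ]


image = [[255] * 5 for _ in range(4)]
stain = [(1, 1), (4, 1), (4, 3), (1, 3)]  # hypothetical (x, y) outline of a stain
cleaned = remove_region(image, stain)
```

Pixels outside the outline keep their values, so only the selected stain or note is excluded from later steps.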
Identifying Traces and Time Marks
The next step is to classify objects in the preprocessed image into three categories:
Seismic traces. Seismic traces record ground movement and are the main features of a seismogram.
Time mark offsets. Each trace on a seismogram is interrupted once a minute by a time mark that is offset from the main trace. These offsets help scientists accurately determine the timing of events recorded on the seismogram.
Noise. This category includes any objects that should not be digitized, such as stains and notes that were not manually removed.
DigitSeis uses MATLAB object identification algorithms to locate and then highlight traces, time marks, and noise in white, green, and red, respectively (Figure 4). A colorblind-friendly scheme is also available.
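The classification step can be pictured as connected-component labeling followed by a rule applied to each object. The Python sketch below is only an illustration of that idea, not the DigitSeis algorithm: it assumes a binarized image, 8-connectivity, and a toy rule based on horizontal extent, with hypothetical thresholds.

```python
from collections import deque


def label_objects(binary):
    """Group foreground pixels (value 1) into 8-connected objects."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill from the seed pixel
                    pr, pc = queue.popleft()
                    pixels.append((pr, pc))
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            nr, nc = pr + dr, pc + dc
                            if (0 <= nr < rows and 0 <= nc < cols
                                    and binary[nr][nc] and not seen[nr][nc]):
                                seen[nr][nc] = True
                                queue.append((nr, nc))
                objects.append(pixels)
    return objects


def classify(pixels, trace_min_width=8, mark_min_width=3):
    """Toy rule (an assumption): classify an object by its width in pixels."""
    cols = [c for _, c in pixels]
    width = max(cols) - min(cols) + 1
    if width >= trace_min_width:
        return "trace"
    if width >= mark_min_width:
        return "time mark"
    return "noise"


binary = [
    [0] * 12,
    [1] * 10 + [0] * 2,                      # long object: a trace
    [0] * 12,
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0],    # short segment and an isolated speck
]
labels = [classify(obj) for obj in label_objects(binary)]
```

A real classifier would need more than width (for example position relative to neighboring traces), which is one reason the article leaves room for manual correction.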
At this stage, DigitSeis also invokes algorithms that we developed in MATLAB to quantify the image’s horizontal and vertical distortion. This distortion is corrected later in the digitization process to reduce inaccuracies in waveform timing.
Digitizing the Seismogram
The digitization algorithm uses intensity information to compute a single digital value for every point in each trace of the seismogram. DigitSeis then displays the results.
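One plausible way to turn intensity information into a single value per point is an intensity-weighted centroid computed column by column. The Python sketch below shows that approach under stated assumptions: a bright trace on a dark background, one trace per image strip, and amplitudes expressed as row positions in pixels. The article does not say that DigitSeis uses this exact formula, and the function name is hypothetical.

```python
def digitize_trace(strip):
    """Return one amplitude per column: the intensity-weighted mean row.

    strip is a list of rows of gray levels (bright trace, dark background).
    Columns that contain no signal yield None.
    """
    n_rows, n_cols = len(strip), len(strip[0])
    series = []
    for c in range(n_cols):
        total = sum(strip[r][c] for r in range(n_rows))
        if total == 0:
            series.append(None)  # gap in the trace
            continue
        centroid = sum(r * strip[r][c] for r in range(n_rows)) / total
        series.append(centroid)
    return series


strip = [
    [0,   0,   255, 0],
    [255, 128, 0,   0],
    [0,   128, 0,   0],
]
samples = digitize_trace(strip)
```

Because the centroid is a weighted average, a trace that is several pixels thick still yields one sub-pixel value per column, as in the second column of the sample strip.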
Although the digitization is automated, manual refinements are occasionally needed. For example, significant earthquakes can cause the traces to cross one another, making it difficult to distinguish the two signals algorithmically. For these cases, DigitSeis supports manual separation of the signals.
Next, DigitSeis corrects the time mark offsets, using fminbnd() from Optimization Toolbox™ to create a continuous waveform by realigning each time mark with its trace (Figure 5).
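fminbnd() minimizes a function of one variable over a fixed interval. The Python sketch below illustrates the same kind of bounded one-dimensional search with a plain golden-section method, applied to a hypothetical misfit that measures how far a time mark segment sits from the neighboring trace. The sample values, the misfit definition, and the search bounds are assumptions, not the DigitSeis objective.

```python
import math


def golden_section_minimize(f, lower, upper, tol=1e-6):
    """Minimize a unimodal scalar function f on [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Hypothetical samples in pixels: trace next to a time mark, and the offset segment.
trace_neighbors = [10.0, 10.2, 9.8, 10.0]
mark_segment = [15.1, 14.9, 15.0, 15.0]


def misfit(shift):
    """Squared distance between the shifted mark and the mean neighboring trace level."""
    level = sum(trace_neighbors) / len(trace_neighbors)
    return sum((y - shift - level) ** 2 for y in mark_segment)


best_shift = golden_section_minimize(misfit, 0.0, 10.0)
realigned = [y - best_shift for y in mark_segment]
```

For these sample values the search settles on a shift of about 5 pixels, which moves the offset segment back onto the level of the surrounding trace.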
Because each trace can be handled independently, this part of the process parallelizes easily. We have created a version of DigitSeis that uses Parallel Computing Toolbox™ to process multiple traces simultaneously on multicore processors.
Following the digitization process, DigitSeis saves the time series data to a .MAT file or Seismic Analysis Code (SAC) data files.
Using DigitSeis to Digitize the HRV Collection
Our initial work with the HRV archive is focusing on seismically active dates. For example, several large earthquakes were recorded at HRV from November 13th through November 15th, 1938 (Figure 6). These include a magnitude 6.9 earthquake in the Kuril Islands region (number 1), a magnitude 7.0 event in Japan (number 2), and an aftershock of the latter (number 3).
After digitizing this seismogram in DigitSeis, we generated a spectrogram using the resulting time series data. The spectrogram revealed additional earthquakes that were hardly discernible on the raw seismogram. The spectrogram also revealed distinctive noise levels (probably due to storms in the area on November 14th) with peaks at about 0.14 and 0.25 Hz. The frequencies of these peaks are consistent with those of noise recorded by modern instruments at the same location in 2014. This finding illustrates another potential use of old analog seismograms: understanding how storm activity has changed over time.
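For readers unfamiliar with spectrograms, the Python sketch below computes one from a time series using a Hann window and a direct discrete Fourier transform on overlapping segments. It is a minimal illustration, not the analysis applied to the HRV records: the signal is synthetic (a 0.25 Hz sinusoid sampled at 1 Hz), and the window length and hop size are arbitrary assumptions.

```python
import cmath
import math


def spectrogram(samples, sample_rate, window_len, hop):
    """Short-time power spectrum using a Hann window and a direct DFT.

    Returns (freqs, frames): freqs[k] is in Hz, frames[i][k] is the power
    at that frequency in the i-th window.
    """
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (window_len - 1))
            for n in range(window_len)]
    n_bins = window_len // 2 + 1
    freqs = [k * sample_rate / window_len for k in range(n_bins)]
    frames = []
    for start in range(0, len(samples) - window_len + 1, hop):
        seg = [samples[start + n] * hann[n] for n in range(window_len)]
        power = []
        for k in range(n_bins):
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                        for n in range(window_len))
            power.append(abs(coeff) ** 2)
        frames.append(power)
    return freqs, frames


# Synthetic signal (not HRV data): a 0.25 Hz sinusoid sampled once per second.
sample_rate = 1.0
signal = [math.sin(2.0 * math.pi * 0.25 * t) for t in range(256)]
freqs, frames = spectrogram(signal, sample_rate, window_len=64, hop=32)
peak_freqs = [freqs[max(range(len(p)), key=p.__getitem__)] for p in frames]
```

Each entry of peak_freqs is the strongest frequency in one time window. Tracking such peaks over time is how a persistent noise band, or a brief earthquake arrival, becomes visible even when the raw trace looks featureless.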
As we continue to process seismograms from the HRV archive, we are learning more about which steps in the digitization process can be simplified through improved automation. Once we have digitized a significant portion of the archive, we plan to make the results available either on the Harvard Seismology Group website or in the Incorporated Research Institutions for Seismology (IRIS) database.
We have made DigitSeis publicly available as open-source MATLAB code. Other observatories have already expressed interest in using the software to digitize their own seismogram archives.
The following people have been involved in testing DigitSeis and in the digitization of the Harvard collection: Hiromi Ishii, Isabella Lorrainy Altoé, Alexandra Karamitrou, Thomas Lee, George Liu, and Victor Salles. I would also like to acknowledge that this project was supported by U.S. Geological Survey Earthquake Hazards Program Award Nos. G14AP00016 and G16AP00021.