A Better Way to Measure and Understand Sound Quality in MEMS Microspeakers

While continuous sine waves are easy to measure, transducers can be excited by dynamic signals and measured in the time domain without settling time to gain more insight into their accuracy with musical signals. Unique features that cannot be easily extracted from impulse response and cumulative spectral decay (CSD) visualizations are effortlessly quantified with Total Dynamic Distortion analysis.

An ideal amplifier is a straight wire with gain, meaning the output signal is exactly the same as the input signal, only bigger. In practice this isn’t the case, of course, which is why we measure amplifiers. We compare the output to the input to characterize gain, frequency response, and total harmonic distortion (THD), among a few other critical factors. One of those other factors is slew rate, which is the ability of an amplifier to change its output quickly. Amplifiers with higher slew rates can reproduce very high frequencies at very high levels, which makes slew rate a rough measure of how “fast” an amplifier is.

Transducers are conceptually the same as amplifiers, but they convert electrical energy into mechanical energy instead of adding gain. The output acoustic signal should be the same as the input electrical signal. This is never true with transducers, just as with amplifiers, so we measure frequency response and THD for them as well. There is no equivalent to slew rate for transducers. The closest we get is measuring impulse response, which yields a lot of data, including decay time across the frequency response, but it hasn’t developed into a simple characterization measurement the way frequency response and THD have.

Audio engineers have used coil-based dynamic drivers (DDs) to turn electrical energy into acoustic energy for nearly 100 years. In that time we have learned a great deal and made advancements in physics, materials, and simulation to get closer to the ideal. However, fundamentally the concept has remained the same, using current through a coil in a magnetic field to create motion. Since the voice coil is an inductor, its impedance increases with frequency. There is also an amount of energy storage in the coil. This is a useful fact when it comes to understanding the electrical side of converting electrical energy into mechanical motion. It will become relevant later.

On the mechanical motion side, it’s useful to keep in mind that a loudspeaker can be modeled as a damped mass-spring system. A damped mass-spring system will eventually stop moving, and it will stop faster with a stiffer spring. However, the stiffer the spring, the less sensitive the system will be: it will take more energy to make it move in the first place. A loudspeaker cone that moves easily with minimal power usually comes at the cost of less control. This will also become relevant later.

The DD has remained the primary transducer technology into modern times. It dominates everything from home loudspeakers and concert sound to microspeakers for true wireless stereo (TWS) applications. The most common diaphragm material in microspeakers is polypropylene. It is chosen for its light weight (technically, low mass), low cost, and ease of integration into very small designs. However, these are not the only transducer materials and technologies available.

Another popular magnetic transducer is the balanced armature (BA). Like a dynamic driver, a BA uses a coil (inductor) to convert electrical energy into motion. The physical details are slightly different, but fundamentally BAs have the same general characteristics as DDs: they are inductive, which means they are driven by current, and their impedance increases with frequency. Because of their small, rectangular size, BAs are very popular in hearing aids, as tweeters in two-way TWS designs with a DD for the bass, and in multi-way in-ear monitors (IEMs), which can be hybrid with DDs or entirely populated by BAs. While other magnetic technologies (e.g., planar magnetic) are growing in popularity, DDs and BAs are still the market leaders for magnetic transducers in microspeakers.

By contrast with DDs and BAs, xMEMS' monolithic micro-electromechanical systems (MEMS) transducers are made from silicon diaphragms deposited with a thin-film piezoelectric material called PZT. When voltage is applied to the PZT electrodes, they contract and bend the silicon diaphragms. This is true for the xMEMS legacy monolithic MEMS transducers as well as for the new class of Sound From Ultrasound transducers, including the Cypress, for in-ear applications, and the Sycamore, the world's first MEMS nearfield loudspeaker. All xMEMS monolithic MEMS transducers are capacitive in nature. This means there is also some amount of energy storage in the PZT electrodes, just as there is energy storage in the coil of a DD or BA.

Transducers have been synonymous with inductive operation for 100 years. Everything from amplifier design to performance tests and system design rules of thumb derive from the inextricable link between transducers and induction. Because inductive transducers are current mode devices, the amplifier driving them primarily needs to source current at an adequate operating voltage for the load power. The impedance is lowest at low frequencies, where the speaker is least efficient. Amplifier stability at high frequency is eased by the high load impedance.

When transducers are capacitive, however, the differences change much of what you thought you knew about those rules, design techniques, and even tests. The amplifier driving capacitive loads primarily needs to source voltage. And piezoelectric materials, such as PZT, need fairly high voltage compared to inductive devices in the same applications. A typical DD or BA reaches maximum SPL at 1VRMS. With xMEMS transducers that maximum is about 10.6VRMS (30VPP). The impedance of capacitive loads is lowest at high frequencies, which is also where xMEMS transducers are most efficient. That is a beneficial relationship, because it helps minimize power consumption in xMEMS microspeakers. The flip side is that low impedance at high frequencies makes amplifier stability harder to achieve, not easier.

Despite audiophile lore, sound quality is fairly predictable regardless of transducer technology. Research has consistently shown that frequency response correlates highly with listener perception of sound quality. It’s the biggest part of the job when designing audio products. Get the frequency response right, and the sound quality will be acceptable at the very least.

THD, however, isn’t as simple. While lower THD is generally better than higher THD, there are many real-world details that influence listener perception of sound quality vis-a-vis THD. So while THD is expressed as a single number, much deeper insight can be gained by examining the signal in the frequency domain. For example, frequency masking, a well-known and thoroughly studied psychoacoustic effect, can make it difficult to hear low-order harmonics of low-frequency content at levels as high as 5%. At the same time, high-order harmonics can be easily discernible at 0.1%. THD measured at 1% but isolated to the second harmonic is typically not audible, but THD of 0.1% at the fifth harmonic likely sounds bad.
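To make the point about a single number hiding the harmonic distribution concrete, here is a minimal Python sketch. The function name `thd_percent` and both sets of harmonic amplitudes are hypothetical values chosen for illustration, not measurements from this article. THD is taken here as the RMS sum of the harmonic amplitudes relative to the fundamental.

```python
import math

def thd_percent(fundamental, harmonics):
    """THD in percent: RMS sum of harmonic amplitudes over the fundamental.

    `harmonics` maps harmonic order (2, 3, ...) to amplitude, in the same
    units as `fundamental`. All values used below are hypothetical.
    """
    return 100.0 * math.sqrt(sum(a * a for a in harmonics.values())) / fundamental

# Case 1: all distortion sits in the second harmonic (likely masked).
second_only = {2: 0.010}

# Case 2: part of the distortion sits in the fifth harmonic (likely audible).
with_fifth = {2: 0.008, 5: 0.006}

thd_a = thd_percent(1.0, second_only)
thd_b = thd_percent(1.0, with_fifth)
print(f"second harmonic only: {thd_a:.2f}%")
print(f"second plus fifth:    {thd_b:.2f}%")
```

Both cases come out at roughly the same THD figure, even though the second case carries a fifth harmonic well above the 0.1% level described above.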

The whole reason we have transducers is to make dynamic, musical signals, but then we test them with static signals, not dynamic ones. Frequency response and THD are static measurements made using continuous sine waves and captured after a prescribed settling time. They are definitionally not dynamic signals. So while THD is useful and insightful, it has limitations in describing some aspects of transducer performance that can be heard with subjective listening.

Impulse response is a dynamic signal that can help fill in some of the gaps left by static testing. An impulse response can help visualize decay times across the frequency band by using a cumulative spectral decay (CSD) plot, but there are still critical gaps.

Total Dynamic Distortion (TDD) is a new way to understand and measure the dynamic performance of transducers. All transducers are some form of damped energy storage device, whether they’re inductive transducers that dominate the state of the market today, or whether they’re emergent transducers (e.g., xMEMS’ piezo MEMS transducers), which are capacitive. They all require charge and discharge time with overshoot or undershoot. Eventually all transducers can settle enough to reproduce continuous sine waves, but TDD is a measure of what happens when they are excited by a dynamic signal to show the dynamic response before settling.

Total Dynamic Distortion: What It’s Made Of

TDD comprises three specific factors that show the dynamic capabilities of transducers. The first is Amplitude Accuracy, or how closely the dynamic amplitude response matches the continuous amplitude response. The second factor of TDD is Time Symmetry, or how closely the reproduced waves match the input in period for both positive and negative half waves. Finally, the third factor of TDD is Decay Time, effectively the time between when the input signal stops and when the reproduced SPL drops below a minimum threshold. This is similar in concept to reverberation time measurements in room acoustics.

Let’s look at each component of TDD in a little more depth to better understand how they contribute to what we hear, starting with Amplitude Accuracy. Amplitude Accuracy is the easiest to grasp conceptually. With a transducer’s sensitivity we can calculate the expected SPL output signal from an input signal voltage. Does the transducer actually reproduce that when given a dynamic signal, or is it too loud or too quiet?

In Figure 1, the “ideal” is in dotted black. If transducers were perfect, they’d reproduce a single wave of 10kHz at 1PaRMS. There are two transducers, xMEMS’ Cypress in red, and a dynamic driver (DD) in gold. As you can see, xMEMS’ Cypress is nearly exactly the same amplitude as the ideal for both the positive and negative halves of the wave, measuring 1.001PaRMS. The DD, on the other hand, is nearly double the amplitude of the ideal on the positive half of the wave and approaching triple the amplitude on the negative half, measuring a total of 2.307PaRMS.

Not only is the DD not particularly accurate, it isn’t even equally accurate in both directions of movement. It doesn’t take a lot of creativity to understand the limitations that this inaccuracy and asymmetry place on active noise cancellation (ANC) algorithms. If the transducer isn’t accurately reproducing the dynamic wave it’s meant to cancel, it won’t cancel it as effectively. And this difference is completely missed by static tests, as they explicitly wait for the continuous sine response to settle before measuring.

Now that we can see the significance of what TDD can reveal, let’s look at Time Symmetry in more detail. What we’re looking for in Time Symmetry is whether the period of the reproduced wave is the same amount of time as the period of the input signal, and whether it is the same for both halves of the wave.

Because a single diagram would be too congested, these plots are separated into two. In Figure 2 and Figure 3, the ideal is in dotted black. It is a perfect single wave of 10kHz, which means it is 50µs in the first half and 50µs in the second half, for a full period of 100µs. This time we’re comparing the Cypress (in red) to a high-end BA driver from a $1,500 set of IEMs (in gold). Figure 2 shows what we’re measuring for Time Symmetry on the Cypress only. (Figure 3 is the same plot showing the measurement for the BA only.)

The first thing we notice is that while both transducers generate sound pressure level (SPL) starting around the same time, the Cypress reproduces one full waveform, while the BA reproduces an extra half waveform that is both considerably wider and greater in amplitude than the rest of the reproduction.

The Cypress is relatively accurate in Time Symmetry, reproducing the first half of the wave in 66.5µs, and the second half in 74.12µs. The BA, on the other hand, takes 73.3µs for the first half wave, and 84.5µs for the second half wave. That means the first half wave for the BA is equivalent to 6.8kHz and the second half is 5.9kHz. This is a little over half the intended frequency of 10kHz. Think about what it means when a transducer can’t keep up with the dynamics of a frequency that nearly everyone can hear for their entire lives.
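The equivalent frequencies quoted above follow from treating each half wave as half of a full period, so f = 1 / (2 × half-wave duration). A minimal Python sketch of that arithmetic, with a function name chosen here for illustration:

```python
def half_wave_equivalent_khz(half_wave_us):
    """Frequency (kHz) of a sine whose half period lasts `half_wave_us` microseconds."""
    full_period_us = 2.0 * half_wave_us
    return 1000.0 / full_period_us  # 1/us is MHz, so scale by 1000 for kHz

# Half-wave durations reported above for the BA driver.
ba_first_khz = half_wave_equivalent_khz(73.3)
ba_second_khz = half_wave_equivalent_khz(84.5)
print(f"BA first half wave:  {ba_first_khz:.1f} kHz")
print(f"BA second half wave: {ba_second_khz:.1f} kHz")
```

The same function can be applied to any measured half-wave duration; an ideal 50µs half wave returns 10kHz.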

The last factor of TDD is Decay Time. This is pretty easy to understand. If transducers were ideal, the output acoustic signal should be the same as the input electrical signal. When the signal starts, the transducer moves, and when the signal stops, so does the transducer. However, this is not the case, and it’s much more obvious when analyzed as a damped mass-spring system, as discussed earlier. There will be ringing, and it involves a design tradeoff one way or another.


Once again, we will use the comparison between the Cypress and a DD. 100mPaRMS limit lines are included to make the analysis easier to visualize; the discussion of this limit follows later. The first thing to notice in Figure 4 is how much longer the DD continues to ring than xMEMS’ Cypress. The Cypress only makes one full wave before beginning to decay, whereas the DD makes one and a half waves of extraneous acoustic output before its output falls off considerably. In addition, it takes the Cypress 425µs to decay below the limit, compared to over 1ms for the DD, roughly 2.5x longer. To put it in real-world terms, this is extra acoustic output that isn’t present in the input signal.

This extra acoustic output gets in the way of the next note in music, especially if it’s a dynamic signal like a piano attack or other fast percussive note. In static testing (e.g., conventional THD), we not only feed the transducer with a continuous signal that doesn’t have dynamic or time variation, but there is also settling time built in to allow this exact transient performance to occur before the distortion measurement even takes place!

One last thought on decay time, to pull a few different analysis angles together. In the electrical domain a DD is an inductor. It’s an energy storage device that resists change in electrical current. When the signal stops being fed into the voice coil, the coil doesn’t stop current flow instantaneously. There is residual current. At the same time, the DD is a damped mass spring system. It physically cannot stop moving just because the electromotive force stops. The combination of the inductance causing the current flow to tail off instead of stopping instantaneously along with the mass-spring physics means that some amount of ringing is unavoidable. The data shows that it is significantly longer for magnetic devices than for capacitive devices.

Total Dynamic Distortion Figure of Merit

If you’re paying attention, you’ve noticed that these are three different measurements that don’t even share the same units. So how do we distill them into one single measurement? We can use relative decibels (dBr) to combine them all into one single figure. There are two tasks here for each factor. The first is to identify the reference point and define the best possible measure, and the second is to identify the impact of deviation from the ideal. This is different for each of the factors identified in TDD.

As an example, with Amplitude Accuracy, the reference point is the ideal amplitude. In the case of our testing, this is 94dB SPL, or 1PaRMS. In terms of the impact of deviating from the ideal, it isn’t inherently better to exceed the ideal amplitude any more than it is worse to fall short. Whether the measured amplitude is twice the ideal amplitude or half the ideal amplitude, it is still wrong by a factor of two. This means the best measurement possible is 0dB (i.e., no deviation), and it is calculated using the deviation ratio of the larger amplitude over the smaller amplitude. That way, whether the measured amplitude is twice the ideal amplitude or half of the ideal amplitude, it still results in 6dBr in Amplitude Accuracy. The equation looks like this:
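One form consistent with the description above is sketched here. The symbols A_meas and A_ideal are labels chosen for this sketch, and the 20·log10 amplitude convention is inferred from the statement that a factor-of-two error results in 6dBr.

```latex
\mathrm{AA}_{\mathrm{dBr}} = 20 \log_{10}\!\left(
  \frac{\max(A_{\mathrm{meas}},\, A_{\mathrm{ideal}})}
       {\min(A_{\mathrm{meas}},\, A_{\mathrm{ideal}})}
\right)
```

With A_ideal = 1PaRMS, the Figure 1 values work out to about 0.01dBr for the Cypress at 1.001PaRMS and about 7.3dBr for the DD at 2.307PaRMS.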

Time Symmetry is similar to Amplitude Accuracy in that its measurement can be greater or lesser than the ideal signal, but neither is inherently better than the other. In this case, the ideal signal is exactly 50µs for the first half wave and 50µs for the second half wave. As is well known, the frequency and the period have an inverse relationship. This is another way of saying that half waves longer than ideal are effectively a lower frequency, and half waves shorter than the ideal are effectively a higher frequency. 

Again, there is nothing better or worse about the measured period being longer or shorter than the ideal 100µs period. Therefore, we want to calculate the ratio of the longer period over the shorter period. This way a perfect match results in 0dBr of TDD, and larger deviations increase TDD. However, the balance between the first half wave and the second is also relevant for Time Symmetry and for dynamic accuracy. A lack of symmetry between the two half waves means two different frequencies are being reproduced. As such, for Time Symmetry we also calculate the ratio of the longer half wave to the shorter half wave. The equation, then, is:
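A form consistent with the two ratios described above is sketched here. T_1 and T_2 are the measured durations of the first and second half waves, and T_ideal is the ideal full period. These symbol names, and the choice to add the two terms in decibels, are assumptions of this sketch.

```latex
\mathrm{TS}_{\mathrm{dBr}} =
  20 \log_{10}\!\left(
    \frac{\max(T_1 + T_2,\, T_{\mathrm{ideal}})}
         {\min(T_1 + T_2,\, T_{\mathrm{ideal}})}
  \right)
  + 20 \log_{10}\!\left(
    \frac{\max(T_1,\, T_2)}{\min(T_1,\, T_2)}
  \right)
```

As a check against the Cypress measurements of 66.5µs and 74.12µs with T_ideal = 100µs, the first term is about 2.96dB and the second about 0.94dB, for a total of roughly 3.9dBr.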

Finally, Decay Time is the easiest to measure. You must select a rational threshold and then measure the time it takes for the reproduced wave to fall below that threshold. An easy threshold to set is -20dB. If the ideal amplitude is 94dB SPL, -20dB is 100mPaRMS in absolute terms. This is also enough attenuation to evaluate decay time meaningfully without getting lost in the noise floor. The remaining task is to set a reference point short of which decay time is considered advantageous and after which decay time becomes a problem. 
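A minimal Python sketch of the threshold conversion and the decay measurement follows. It assumes the captured waveform is available as a list of pressure samples in pascals, that the sample index where the input stops is known, and that "below the threshold" means the instantaneous absolute pressure no longer exceeds the limit. The function names and the toy waveform are hypothetical.

```python
P_REF_PA = 20e-6  # standard reference pressure for dB SPL

def db_spl_to_pa(db_spl):
    """Convert dB SPL to RMS pressure in pascals."""
    return P_REF_PA * 10 ** (db_spl / 20.0)

def decay_time_us(samples_pa, sample_rate_hz, input_stop_index, threshold_pa):
    """Time from the end of the input until the output stays under the threshold.

    Finds the last sample whose magnitude exceeds `threshold_pa` and measures
    from `input_stop_index` to the sample after it.
    """
    last_above = None
    for i, p in enumerate(samples_pa):
        if abs(p) > threshold_pa:
            last_above = i
    if last_above is None or last_above < input_stop_index:
        return 0.0
    return (last_above + 1 - input_stop_index) * 1e6 / sample_rate_hz

# 94dB SPL minus 20dB is 74dB SPL, or roughly 100mPa.
threshold = db_spl_to_pa(94.0 - 20.0)

# Toy capture at 1MHz: input stops at index 10, then the output rings down.
toy = [1.0] * 10 + [0.5, 0.2, 0.05, 0.01]
toy_decay = decay_time_us(toy, 1_000_000, 10, 0.1)
print(f"threshold: {threshold * 1000:.1f} mPa, toy decay: {toy_decay} us")
```

A real capture would need alignment between input and output and some care around noise, which this sketch leaves out.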

A good figure is five times the period of the wave. So for 10kHz, the period is 100µs, and the reference point for decay time is 500µs. The Decay Time contribution to Total Dynamic Distortion is calculated from the ratio of measured decay time to 500µs. Decay Time can result in negative dBr, since it is better to have shorter decay time. To a certain extent the reference point is arbitrary. Changing it can drastically alter the final TDD calculation, but it only slightly alters the relative differences between transducers. Transducers that have low TDD compared to the rest of the test transducers remain low regardless of where the thresholds are set. The equation is:
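A form consistent with the description above is sketched here. The symbols t_decay and T_ideal are labels chosen for this sketch, and the 20·log10 convention is assumed to match the other two factors.

```latex
\mathrm{DT}_{\mathrm{dBr}} = 20 \log_{10}\!\left(
  \frac{t_{\mathrm{decay}}}{5\, T_{\mathrm{ideal}}}
\right)
```

For a 10kHz test wave the denominator is 500µs, so the Cypress decay time of 425µs gives 20·log10(0.85), or about -1.4dBr.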

At this point we have three different measurements, all expressed as relative decibels (dBr). The nice thing about dBr is that the different measures can be added together to reach a final Total Dynamic Distortion expressed in the same dBr units. For the case of xMEMS' Cypress transducer, this figure is a very low 2.5dBr. It comprises 0dBr in Amplitude Accuracy, 3.9dBr in Time Symmetry, and -1.4dBr in Decay Time. 0dBr + 3.9dBr - 1.4dBr = 2.5dBr. Compare that to a very high performance dynamic driver from a well-known TWS (the best DD we have measured), and you can start to see how relevant this comparison is. Shown in Table 1, the DD scores 2.3dBr in Amplitude Accuracy, 5.6dBr in Time Symmetry, and -0.2dBr in Decay Time for a very respectable TDD of 7.7dBr. The highest performing BA we measured scores 2.2dBr in Amplitude Accuracy, 3.9dBr in Time Symmetry, and 0.9dBr in Decay Time for an also very respectable TDD of 7.9dBr.
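The three factors and their sum can be sketched in a few lines of Python. This is an illustration under stated assumptions, not xMEMS' reference implementation: the function names are hypothetical, and the 20·log10 convention for every term is inferred from the 6dBr factor-of-two example and from the Cypress figures quoted in this article.

```python
import math

def ratio_db(a, b):
    """Deviation in dB as larger over smaller, so the result is never negative."""
    return 20.0 * math.log10(max(a, b) / min(a, b))

def amplitude_accuracy_dbr(measured_pa, ideal_pa=1.0):
    return ratio_db(measured_pa, ideal_pa)

def time_symmetry_dbr(first_half_us, second_half_us, ideal_period_us=100.0):
    # Term 1: measured full period against the ideal period.
    period_term = ratio_db(first_half_us + second_half_us, ideal_period_us)
    # Term 2: balance between the two half waves.
    balance_term = ratio_db(first_half_us, second_half_us)
    return period_term + balance_term

def decay_time_dbr(decay_us, ideal_period_us=100.0, periods_allowed=5):
    # Can be negative: decaying faster than the reference is rewarded.
    return 20.0 * math.log10(decay_us / (periods_allowed * ideal_period_us))

def tdd_dbr(measured_pa, first_half_us, second_half_us, decay_us):
    return (amplitude_accuracy_dbr(measured_pa)
            + time_symmetry_dbr(first_half_us, second_half_us)
            + decay_time_dbr(decay_us))

# Cypress measurements quoted in this article for a 10kHz, 1PaRMS test wave.
cypress_tdd = tdd_dbr(1.001, 66.5, 74.12, 425.0)
print(f"Cypress TDD: {cypress_tdd:.1f} dBr")
```

Run on the Cypress measurements from the figures, this reproduces the 2.5dBr total quoted above.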


Diagnosing Acoustic Design and DSP Implementation Problems with TDD

Total Dynamic Distortion isn't limited to measuring only the transducer itself. You can also do the same testing over Bluetooth to measure system-level performance and compare it back to transducer level performance. Testing over Bluetooth presents some minor obstacles in visual comparison. Since Bluetooth latency is not deterministic and can range (typically) from 30ms to 150ms, lining up signals to compare visually is prohibitively difficult. Fortunately, the metrics are all the same, whether wired or wireless, so the data still holds.

The real benefit of system level TDD testing is when component level testing has already been completed. When you know how the transducer performs on its own, you can compare it to the performance when EQ is applied in an acoustic housing. These measurements can be taken sequentially, first at component level, again in an acoustic housing, again with EQ applied, and so on. We have made such measurements on both our Cowell transducer and on two different commercial products that use the Cowell in a two-way design. These results shown in Table 2 are interestingly divergent from each other, even though they’re all based around the same transducer. It shows that when there is a relevant component level test as a basis, it can be scaled to yield meaningful system level conclusions.

A New Understanding of Sound Quality

It’s clear that static THD and frequency response testing cannot measure this. It’s also clear that impulse response testing viewed as a CSD reduces to a qualitative impression of performance without providing a unified number expressing all three of these specific factors. With Total Dynamic Distortion testing we can see these factors that impact the dynamic performance of transducers. And while xMEMS piezo MEMS transducers generally test better than other transducer technologies, the salient point is that with data we can begin to bridge the gap between subjective listening experience and measured performance.

TDD not only shows the difference between classes of transducer technology, but it also separates the good from the bad within a transducer class. It correlates with subjective listening perception of high-quality sound. In addition, TDD can be deployed to show the difference between component level performance and system level performance. 

This is useful to demonstrate the ways acoustic design and algorithms affect system level performance. While we know frequency response is a very good predictor of sound quality, and THD testing has utility beyond simple assessment of expected listener perception, Total Dynamic Distortion adds new understanding to how transducers and acoustic systems perform and why they sound good.

All of this is great in theory, but useless to most if it takes expensive lab equipment to measure. Fortunately, acoustic TDD can be measured using equipment that can be found in just about any reasonable acoustic lab. The heart of TDD measurement is the same as all audio test and measurement: a signal source and a way to capture it. To develop this testing at xMEMS we used an APx555B with a Brüel & Kjaer Type 5128 head-and-torso simulator (HATS), as shown in Figure 5, and we have validated the testing with an APx525 with Bandwidth Extension and a GRAS RA0404 ear simulator.


Since we are not only capturing audible frequencies, but also very fast rise times, the bare essentials for system performance are 20kHz bandwidth from the HATS or ear simulator, and a sampling frequency on the capture side of at least 768kHz. However, results improve with higher sampling frequency. The Bandwidth Extension for APx525 increases sampling frequency to 2.496MHz (1MHz signal bandwidth with anti-aliasing filters) with improved measurement accuracy. On the source side, while APx555B has an analog generator capable of producing burst waveforms of one cycle followed by silence, there are many ways to generate these signals, including the Burst Waveform Generator Utility from Audio Precision, which produces 24-bit, 192kHz wave files.
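As one example of generating the stimulus without a dedicated utility, the following Python sketch builds a single sine cycle followed by silence as a list of floating-point samples. The function name and defaults are hypothetical, the 192kHz rate simply mirrors the wave file format mentioned above, and writing the samples out as a 24-bit file is left out.

```python
import math

def single_cycle_burst(freq_hz=10_000.0, sample_rate_hz=192_000,
                       total_ms=2.0, amplitude=1.0):
    """Return one sine cycle at `freq_hz` followed by silence.

    Samples are floats in the range [-amplitude, +amplitude].
    """
    n_total = int(round(sample_rate_hz * total_ms / 1000.0))
    period_s = 1.0 / freq_hz
    samples = []
    for n in range(n_total):
        t = n / sample_rate_hz
        if t < period_s:
            samples.append(amplitude * math.sin(2.0 * math.pi * freq_hz * t))
        else:
            samples.append(0.0)  # silence after the single cycle
    return samples

burst = single_cycle_burst()
active = sum(1 for s in burst if s != 0.0)
print(f"{len(burst)} samples total, {active} non-zero")
```

At 192kHz a 10kHz cycle spans 19.2 samples, so the last active sample does not land exactly on a zero crossing. A higher generator sample rate, or a rate that is an integer multiple of the test frequency, reduces that truncation.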

Readers might have noticed the dependence on Audio Precision tools in developing these tests. This is due to these tools being on hand in our lab. However, there is nothing inherent to Audio Precision tools that makes them indispensable for this testing, and xMEMS is working to expand test functionality to other test and measurement ecosystems.

The ultimate goal, however, is for this testing to become one of the standard audio characterization tests along with frequency response and THD. This is an explicit call to the global community of audio engineers to experiment with the techniques and theory of TDD to further develop its utility in audio testing. This insightful testing needs to expand to all audio test ecosystems, including open source and DIY testing.

