“The Human Eye as a Camera”: A Review of the SSVEP-based BCI Vision Capturing Technique

Current BCI research is metamorphosing into a glorious playground, tapping into any field it can possibly reach. The joy of living in an era of simultaneous biotech and infotech revolutions is that the two combine to achieve results that once seemed impossible.

There’s one field that’s particularly interesting: using BCIs to observe and print vision.

At first glance, human vision seems independent, personal, and restricted to individual experience. But once we understand that experience itself is derived from chemical reactions and neurological processes (as some of you may recognize, vaguely quoting Yuval Noah Harari), monitoring the very brainwaves that govern our vision becomes easier than we might think.

This article aims to review the MannLab 2019 paper “The Human Eye as a Camera” to holistically understand a method of using BCIs to capture vision. In the paper, Professor Steve Mann et al. triggered and monitored SSVEPs (Steady State Visual Evoked Potentials) to capture vision.

First, the jargon. What are SSVEPs?

SSVEPs are neural responses in which the frequency of the evoked potential matches the frequency of the visual stimulus. Staring at a uniformly flickering light or image synchronizes neurons in the visual cortex, particularly in the occipital lobe, to fire with the same rhythm, hence producing an oscillating voltage signal of the same frequency, as shown in Figure 1. For instance, if a dot in the centre of a screen flickers black and white at 25 Hz, the EEG will capture brainwaves with a strong spectral spike at 25 Hz. Although staring at a saturated screen of high-contrast, flickering images is quite uncomfortable and unappealing to a general audience, it is wonderful nonetheless to reflect on how biology synchronizes with the nature and environment around it (biologically speaking, we are no different from any other animal or natural being). A small simulation sketch of this frequency matching follows Figure 1 below.

Figure 1: Image of the “SSVEP response to frequency-coded stimuli at the occipital region of the brain”. Zhang, Y., Xu, P., Huang, Y., Cheng, K., & Yao, D. (2013, September 9). SSVEP response is related to functional brain network topology entrained by the flickering stimulus. PloS one. https://pmc.ncbi.nlm.nih.gov/articles/PMC3767745/
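
To make the frequency matching concrete, here is a minimal simulation sketch, my own illustration rather than code from the paper: a synthetic EEG trace containing a 25 Hz component (standing in for the entrained occipital response) plus random noise. Its FFT power spectrum peaks at the same 25 Hz as the hypothetical stimulus.

```python
import numpy as np

fs = 250                          # sampling rate in Hz, typical for consumer EEG
t = np.arange(0, 4.0, 1.0 / fs)   # four seconds of data
stim_freq = 25.0                  # flicker frequency of the hypothetical stimulus

# Entrained SSVEP component plus broadband "background brain" noise
eeg = 1.0 * np.sin(2 * np.pi * stim_freq * t) + 0.8 * np.random.randn(t.size)

# Power spectrum via the FFT
spectrum = np.abs(np.fft.rfft(eeg)) ** 2
freqs = np.fft.rfftfreq(eeg.size, d=1.0 / fs)

# The strongest non-DC peak should sit at the stimulus frequency
peak = freqs[1:][np.argmax(spectrum[1:])]
print(f"Strongest spectral peak at ~{peak:.2f} Hz (stimulus flickered at {stim_freq} Hz)")
```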

And there is a pattern here. The amplitude, or strength, of the frequency peak corresponds to characteristics (such as brightness) of the portion of the image being viewed. Steve Mann et al. noticed this. But more on that later.

The process of capturing human vision can be described with the term “metavision”. The word is, in fact, repeated countless times throughout the paper to emphasize its particular relevance in the 21st century. Metavision is the act of reflecting on our own perception, or simply viewing the lens through which we observe the world: a vision of vision. As more sensors and monitors are deployed in the modern world, along with other connective, sensing technologies such as the Internet of Things, the concept of “meta” becomes ever more interesting and relevant to everyday human life.

In the paper, the concept of metavision is realized through the EyeTap Principle. The principle is essentially to non-invasively tap into the mind’s eye with a unique pair of eyeglasses. While the wearer’s left eye is unaffected, the device alters or replaces rays of light entering the right eye with synthetic rays, which arguably makes it a type of CMR (Computer Mediated Reality), as it “enhances” the user’s sensory experience. Note that CMR differs from standard Augmented Reality (AR) systems: where AR overlays digital information onto a view of the world seen through, typically, a pair of goggles, the EyeTap principle “taps” into and intercepts the light rays that define reality before they strike the user’s retina. This physical intervention grants the system control over perception: digital elements and the physical world become essentially collinear and indistinguishable at the point of retinal contact. Fascinating.

During the experiment, a low-cost EEG device, a Muse by Interaxon, was used, modified by adding one or more additional EEG electrodes over the occipital lobe. It is simply wonderful that even sophisticated BCI experiments can be conducted using accessible, commercial-grade hardware. The central electrode was placed over the Oz location, the area best suited for detecting SSVEPs. As explained before, the SSVEP is a specific class of evoked potential that is reliably generated by periodic, flickering visual stimuli. Different brainwave patterns are most prominent over different regions of the scalp, so the Oz location (with O1 and O2 considered as well) had to be tapped to monitor these SSVEPs.

SSVEPs are highly valuable because they provide involuntary and precisely quantifiable neural responses. When a visual stimulus (such as a cursor or pixel) is made to flicker at a known frequency (e.g. 12 Hz, as used in the paper), the brain’s visual cortex is driven to resonate at that exact frequency, producing a measurable signal. By mapping the brain’s response to a specific flickering stimulus, the system determines what the mind is attending to. So, unlike eye-tracking, which passively records where the physical eyeball is pointed, SSVEP records the brain’s internal, neurological engagement, which, in my opinion, is a very smart choice of study, as it embodies an involuntary honesty in perception.
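
To illustrate that attention-decoding idea, here is a generic frequency-tagging sketch of my own, not the paper’s exact pipeline: given an occipital EEG window and a set of candidate flicker frequencies, the attended stimulus is taken to be the one with the strongest narrowband spectral power (the spectral computation itself is explained in the next paragraph).

```python
import numpy as np

def attended_frequency(eeg_window, fs, candidate_freqs, half_bw=0.5):
    """Return the candidate flicker frequency with the strongest narrowband power."""
    spectrum = np.abs(np.fft.rfft(eeg_window)) ** 2
    freqs = np.fft.rfftfreq(eeg_window.size, d=1.0 / fs)
    powers = [spectrum[(freqs >= f - half_bw) & (freqs <= f + half_bw)].sum()
              for f in candidate_freqs]
    return candidate_freqs[int(np.argmax(powers))]

# Usage: a window entrained at 12 Hz should be attributed to the 12 Hz target.
fs = 250
t = np.arange(0, 3.0, 1.0 / fs)
window = np.sin(2 * np.pi * 12.0 * t) + 0.7 * np.random.randn(t.size)
print(attended_frequency(window, fs, candidate_freqs=[8.0, 10.0, 12.0, 15.0]))
```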

The power spectrum of the brainwave signal is then computed using the FFT (Fast Fourier Transform). The FFT is an algorithm that breaks a complex signal into its individual sinusoidal or frequency components (as illustrated in Figure 2), which lets the team determine the ratio of the 12 Hz power to the power of the rest of the frequency bands. Interestingly, the power of the 12 Hz frequency band is displayed explicitly by an LED light, which changes color from blue to red in proportion to the relative power of the 12 Hz brainwave frequency. This essentially creates a neurofeedback loop where the user gives the biotech data, and the biotech responds with information about the user themself (a sketch of this readout follows Figure 2 below).

Figure 2: Illustration of the FFT function, separating individual frequency bands from raw EEG time data. ScienceDirect.com. (n.d.). Fast fourier transform — an overview | sciencedirect topics. Fast Fourier Transform. https://www.sciencedirect.com/topics/engineering/fast-fourier-transform
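
As a sketch of that readout, with window length, bandwidth, and colour scaling chosen by me rather than taken from the paper, the relative 12 Hz power can be computed from a windowed FFT and mapped to an RGB colour running from blue (weak SSVEP) to red (strong SSVEP):

```python
import numpy as np

def relative_ssvep_power(eeg_window, fs, stim_freq=12.0, half_bw=0.5):
    """Ratio of FFT power near stim_freq to total power (DC and slow drift excluded)."""
    spectrum = np.abs(np.fft.rfft(eeg_window)) ** 2
    freqs = np.fft.rfftfreq(eeg_window.size, d=1.0 / fs)
    band = (freqs >= stim_freq - half_bw) & (freqs <= stim_freq + half_bw)
    return spectrum[band].sum() / spectrum[freqs > 0.5].sum()

def led_colour(ratio, gain=5.0):
    """Map the relative power to an (R, G, B) tuple running blue -> red."""
    x = min(1.0, ratio * gain)    # gain is an arbitrary scaling assumption
    return (int(255 * x), 0, int(255 * (1 - x)))

# Usage: a strong 12 Hz response pushes the LED toward red.
fs = 250
t = np.arange(0, 2.0, 1.0 / fs)
window = np.sin(2 * np.pi * 12.0 * t) + 0.5 * np.random.randn(t.size)
print(led_colour(relative_ssvep_power(window, fs)))
```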

The team was ultimately able to print images of adequate quality, yet doing so required precise methodology. To achieve spatial resolution (that is, to form an actual picture rather than a huge blur), the experiment employed a “raster-scan” technique. A sliding, flickering cursor or pixel was moved across the visual field in a uniform pattern while the coordinates of the image the user was observing at each moment were tracked with a Tobii desktop eye tracker. Note that in an improved version of the experiment, there was no need for the eye tracker, as a moving cursor was enough to guide and track where the person was looking. In addition, the improved setup has the user concentrate on a moving square that flashes bright yellow and blue at 15 Hz as it moves over non-black parts of the image (instead of black and white, for better visual contrast). At every location the eyes scanned through, a windowed FFT estimated the SSVEP magnitude (the relative power at the stimulus frequency compared to the rest of the spectrum). An interesting conclusion emerged: lighter areas of the image produced higher SSVEP magnitudes, and darker areas produced lower ones. In other words, the more flickering light that entered the eye at the fixated location, the stronger the SSVEP magnitude. Interesting.
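
A minimal reconstruction sketch, under simplifying assumptions of my own (a coarse grid and a stand-in dictionary of per-cell SSVEP magnitudes in place of the live EEG pipeline): as the flickering cursor visits each grid cell, one SSVEP magnitude is recorded for that cell, and normalising the magnitudes to 0–255 yields a grayscale image that is brighter where the SSVEP was stronger.

```python
import numpy as np

def reconstruct_image(ssvep_magnitude, grid_shape):
    """Map per-cell SSVEP magnitudes, keyed by (row, col), to an 8-bit grayscale image."""
    img = np.zeros(grid_shape)
    for (row, col), mag in ssvep_magnitude.items():
        img[row, col] = mag
    img -= img.min()
    if img.max() > 0:
        img /= img.max()
    return (img * 255).astype(np.uint8)

# Usage with fake measurements: stronger SSVEPs along the diagonal render as
# brighter pixels, mimicking a light diagonal stripe in the viewed image.
rows, cols = 8, 8
fake = {(r, c): (1.0 if r == c else 0.2) + 0.05 * np.random.rand()
        for r in range(rows) for c in range(cols)}
print(reconstruct_image(fake, (rows, cols)))
```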

But why so? How does the background of the image affect the SSVEPs? When the sliding pixel or cursor flickers over a light area of the background image, the amount of light energy hitting the retina at that precise spot is high. The flicker, therefore, is an intense visual input. The visual processing system is hence forced into a vigorous resonance, resulting in a high-magnitude SSVEP signal. Conversely, when the cursor passes over a dark area, the overall light energy hitting the retina at that point is significantly lower. The visual cortex still resonates, but the evoked neural activity is weaker, resulting in a low-magnitude SSVEP signal.

The research itself was also driven by a philosophical imperative. The paper pointed to the “hypocrisy” of “surveillance”: the act of being recorded by devices from above (or, in more radical framings, by authority figures) while the recorded individuals themselves are not allowed to record. In contrast, the paper invokes the term “sousveillance”, the act of “recording from below”, that is, recording activity as an ordinary member of the public, typically by wearing personal technologies. This distinction is ethical. It is argued that the increasing prevalence of institutional surveillance creates a “one-sided (sur)veillance game”. Wearable recording technology could therefore create “equiveillance”: a balance of power and of the right to observe. Citizens would then have the technology to capture and document their own story against potentially biased or hidden institutional monitoring, fostering a sense of transparency and accountability. The team’s passion for “sousveillance” ran so deep that they decided to print an image of the words “No Cameras” to argue against surveillance, as shown in Figure 3.

Figure 3: The recreated “No Cameras” image, reconstructed from a sliding square flashing yellow and blue. Mann, S., Mann, C., Lam, D., Mathewson, K. E., Stairs, J., Pierce, C., Hernandez, J., Kanaan, G., Piette, L., & Khokhar, H. (2019). The Human Eye as a Camera. https://doi.org/10.1109/healthcom46333.2019.9009592

Despite the team’s methodical approach to metavision, there were inherent limitations to the research. For instance, the LCD monitor used displayed irregular pixel onset times within each refresh cycle. The low-cost portable system also used dry electrodes, which may have degraded signal quality. At times, the user’s gaze did not linger at a location long enough to extract the SSVEP signal for that particular portion of the image. The team also noted the significant effect of other eye activity, such as blinking, on SSVEP detection quality.

While current BCI vision reconstruction methods, including this research, may only produce “lossy reconstructions” of images in contrast to the instant, high-resolution output of digital cameras, the scientific contribution is particularly valuable. The primary contribution is not high-fidelity imagery itself but the validation of the metavision principle: the ability to objectively and reliably read the brain’s visual state. This proves the concept that the mind’s eye is readable and quantifiable, providing evidence for the feasibility of Humanistic Intelligence, a framework that intertwines human and computer intelligence, where humans are integrated into the loop of iterative computation and feedback.

Bibliography:

Maklin, C. (2024, February 8). Fast Fourier Transform Explained | Built In. Builtin.com. https://builtin.com/articles/fast-fourier-transform

Mann, S., Mann, C., Lam, D., Mathewson, K. E., Stairs, J., Pierce, C., Hernandez, J., Kanaan, G., Piette, L., & Khokhar, H. (2019). The Human Eye as a Camera. https://doi.org/10.1109/healthcom46333.2019.9009592

Mann, S. (n.d.). Wearable computing: Towards humanistic intelligence. http://n1nlf-1.eecg.toronto.edu/ieeeis_intro.pdf

ScienceDirect.com. (n.d.). Fast fourier transform — an overview | sciencedirect topics. Fast Fourier Transform. https://www.sciencedirect.com/topics/engineering/fast-fourier-transform

The EyeTap Principle: Effectively Locating the Camera Inside the Eye as an Alternative to Wearable Camera Systems. (2001). Intelligent Image Processing, 64–102. https://doi.org/10.1002/0471221635.ch3

Vialatte, F.-B., Maurice, M., Dauwels, J., & Cichocki, A. (2010). Steady-state visually evoked potentials: Focus on essential paradigms and future perspectives. Progress in Neurobiology, 90(4), 418–438

Yang, H., Paller, K. A., & van Vugt, M. (2022). The steady state visual evoked potential (SSVEP) tracks “sticky” thinking, but not more general mind-wandering. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.892863

Zhang, Y., Xu, P., Huang, Y., Cheng, K., & Yao, D. (2013, September 9). SSVEP response is related to functional brain network topology entrained by the flickering stimulus. PloS one. https://pmc.ncbi.nlm.nih.gov/articles/PMC3767745/
