Our brains seem capable of effortlessly constructing connections between sound and image. The art of Foley in filmmaking is the best evidence of this: we are adept at establishing a correspondence between the scene before our eyes and simultaneous—yet merely plausible—sounds. This ability allows us to integrate information from different senses into a complete perceptual experience.
If field recording and synchronous sound—acts of presence and embodiment—reinforce the unity of the on-site audio-visual experience, then in the era of "schizophonia," when sound is severed from its original environment, we are often compelled to perform a kind of subjective mental reconstruction (whether passive or active) of the relationship between what we hear and what we see.
As technological progress makes audio post-production increasingly easy, and as audio tracks can be effortlessly dragged and edited on a timeline, has sonic artifice become a habit, while "authentic listening" has become a luxury?
When we walk down the street wearing headphones, the sounds inside and outside the headphones mix together to form a new soundscape. Which part constitutes the real sound event, the true sonic site?
When watching online videos, frame rate issues sometimes cause audio-visual desynchronization. With a slight sense of dissonance, our brains begin to fill that tiny temporal gap until the discrepancy between sound and image is ignored and eliminated.
In all the above cases, our brains are eager to construct relationships between sound and image based on experience and imagination—yet the truth and accuracy of these relationships remain open to question.
I am not here to advocate for a pursuit of "truth" or "accuracy."
Those of us fortunate enough to both hear and see the world face two distinct audio-visual relationships. When listening on-site, sound is concretized by what is seen; everything is precise, inevitable, and one-to-one. When listening to a recording, sound carries vast amounts of memory, imagination, and conjecture, and blurry images form in the mind.
Just as Foley in film supplements a fixed image with specious sound, if we supplement fixed sound with a specious image (let us tentatively call this act "Visual Foley" or "Imitative Imagery"), can we deceive our brains? Can we assign concrete visual symbols to sound, forcibly yet unobtrusively constructing a fictional audio-visual relationship?
In this project, I recorded a soundscape of New York’s Times Square and scraped YouTube for walking tour videos from city centers around the world to correspond with this audio. Every time the page is refreshed, a new video is grabbed to pair with these sounds.
Perhaps, in the context of globalization, the settings in which people live their daily lives are becoming increasingly alike. Consequently, our fictional audio-visual relationship approaches plausibility and may not even arouse the viewer's suspicion.
But at what degree of contradiction is this fictional relationship exposed? Do those moments of exposure, when the language, setting, or density of the sound differs distinctly from the image, reveal to us the unique soundscape of a specific place?
When the same segment of audio is endowed with different images, do the seemingly reasonable pairings slide by unnoticed, as if sound were tamely subservient to vision? Meanwhile, do the conflicting pairings leap abruptly from background to foreground, appearing bizarre, chaotic, and illogical, and allow our hearing to briefly wrest dominance away from vision?