I really enjoyed reading Golan Levin’s Computer Vision for Artists and Designers, and was grateful for his high-level theoretical summaries as well as his nods to creative applications of machine learning models. As of writing this post, it can feel like the majority of content circulating about machine learning / deep learning / A.I. quickly devolves into A.I. apocalypse fearmongering, the fight for robot rights, human extinction, and other A.I.-related doomsday prophecies. So although Golan wrote this in 2007, it was inspiring to revisit creative applications of machine learning that still explore ethics, but do so in a creative, playful, and experimental way.
In particular, I was struck by two points Golan made about the physical optimizations and natural-world phenomena to keep in mind when designing and applying ML and vision models. I plan to keep both in mind as I explore and create this semester.
Re physical / optical interventions, Golan says …
“It is essential to design physical conditions in tandem with the development of computer vision code, and/or to select software techniques which are best compatible with the available physical conditions. Some of the most powerful physical optimizations for machine vision can be made without intervening in the observed environment at all, through well-informed selections of the imaging system's camera, lens, and frame-grabber components. To take one example, the use of a "telecentric" lens can significantly improve the performance of certain kinds of shape-based or size-based object recognition algorithms. For this type of lens, which has an effectively infinite focal length, magnification is nearly independent of object distance. As one manufacturer describes it, "an object moved from far away to near the lens goes into and out of sharp focus, but its image size is constant."” - Golan Levin
Re using the natural world as a starting point for detection and semantic understanding, he states …
“In designing systems to "see for us," we must not only become freshly awakened to the many things about the world which make it visually intelligible to us, but also develop a keen intuition about their ease of computability. The sun is the brightest point in the sky, and by its height also indicates the time of day. The mouth cavity is easily segmentable as a dark region, and the circularity of its shape is also closely linked to vowel sound. The pupils of the eye emit an easy-to-track infrared retroreflection, and they also indicate a person's direction of gaze.” - Golan Levin
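As a rough illustration of what "ease of computability" can mean, here is a minimal sketch (not taken from Golan's essay) of two of the cues he mentions: finding the brightest point in a frame and segmenting a dark region by thresholding. The function names, the tiny hand-made 4x4 "image", and the threshold value are all assumptions made up for the example; a real frame would come from a camera and need a tuned threshold.

```python
def brightest_point(gray):
    """Return (row, col) of the brightest pixel in a 2D list of 0-255 values."""
    best = (0, 0)
    for r, row in enumerate(gray):
        for c, value in enumerate(row):
            if value > gray[best[0]][best[1]]:
                best = (r, c)
    return best


def dark_region_mask(gray, threshold=50):
    """Mark every pixel darker than `threshold` as True (hypothetical default)."""
    return [[value < threshold for value in row] for row in gray]


# A tiny hand-made grayscale "frame": one bright spot, one dark patch.
frame = [
    [10, 20, 30, 40],
    [15, 250, 35, 45],
    [12, 22, 5, 8],
    [14, 24, 6, 9],
]

sun = brightest_point(frame)                   # stand-in for "the sun"
mask = dark_region_mask(frame, threshold=10)   # stand-in for "the mouth cavity"
dark_pixel_count = sum(cell for row in mask for cell in row)
```

Both cues reduce to a single pass over the pixels, which is part of why they are such convenient starting points compared with features that need a trained model to find.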
Earlier this week I was browsing the p5.js shader examples repo, created by Adam Ferriss, and was inspired by several of his examples. I’m a sucker for aesthetics reminiscent of ’90s / VHS-ish fidelity, so I wanted to apply a similar effect to the ml5.js BodySegmentation Mask Person example.