Using Browsers for Artificial Intelligence

by Erick Engelke
October 11, 2024

AI and browser technology are being used for many new solutions, both good and bad. I’ll focus on the good and very doable ones here.

Gaining Credibility

I’ll start by showing a web page I made that works on most cell phones, tablets, computers and many IoT (internet of things) devices such as the $100ish Raspberry Pi.

It uses your device’s cameras to view the world, then interprets whether people or many common objects are visible and counts up to 20 of them.
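The counting step can be sketched in a few lines of JavaScript. This is a minimal illustration, not the page's actual code: it assumes each detection is an object with a class label and a confidence score (the shape is an assumption here, though object detection models commonly report something similar), and the names tallyDetections, minScore and maxObjects are made up for the example.

```javascript
// Tally detections by class label, keeping only the confident ones.
// Assumed (hypothetical) input shape: [{ class: "person", score: 0.91 }, ...]
function tallyDetections(detections, minScore = 0.5, maxObjects = 20) {
  const kept = detections
    .filter((d) => d.score >= minScore)   // drop weak guesses
    .sort((a, b) => b.score - a.score)    // most confident first
    .slice(0, maxObjects);                // cap the total at 20 by default

  const counts = new Map();
  for (const d of kept) {
    counts.set(d.class, (counts.get(d.class) || 0) + 1);
  }
  return Object.fromEntries(counts);
}

const sample = [
  { class: "person", score: 0.92 },
  { class: "person", score: 0.81 },
  { class: "dog", score: 0.66 },
  { class: "chair", score: 0.31 }, // below the threshold, ignored
];
console.log(tallyDetections(sample)); // { person: 2, dog: 1 }
```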

It will detect people whether they are facing toward or away from the camera. In contrast, its face detection does a superb job of detecting whether human faces are present, and how many. Face detection is particularly useful for seeing who is looking at the camera.

Set the per-second rate down to 0, and it will typically process 10 or more images per second, which is pretty darn fast for a web application. Setting a slower detection rate uses the battery more efficiently.
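One way to picture the rate setting is as a pause between detection passes. The sketch below assumes a rate of 0 means "no pause at all", which is my reading of the setting rather than a description of the real code; delayForRate, startLoop and detectOnce are hypothetical names.

```javascript
// Convert a "detections per second" setting into a pause between passes.
// Assumption: a rate of 0 (or anything invalid) means "do not throttle".
function delayForRate(ratePerSecond) {
  if (!Number.isFinite(ratePerSecond) || ratePerSecond <= 0) return 0;
  return Math.round(1000 / ratePerSecond);
}

// Run detectOnce over and over, pausing between passes.
// detectOnce is a hypothetical async function supplied by the caller.
function startLoop(detectOnce, ratePerSecond) {
  const delayMs = delayForRate(ratePerSecond);
  let running = true;
  (async () => {
    while (running) {
      await detectOnce();
      // Even a 0 ms timeout hands control back to the browser between passes.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  })();
  return () => { running = false; }; // call the returned function to stop
}
```

A slower rate leaves the processor idle for most of each second, which is where the battery savings come from.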

So What?

This solution is coded in JavaScript (using EWB specifically).

If you were to try to do this from scratch, you would find neural networks are slow even on fast computers in faster compiled languages.

Behind the scenes, my code calls a library called TensorFlowJS (tfjs), which runs the math faster using WebAssembly, a kind of assembly language for the web.

But the real factor is that many devices (phones, laptops, computers and IoT) now have some form of GPU (graphics processing unit) to speed up graphics. Most browsers allow standardized access to GPUs through a feature called WebGL. TensorFlowJS uses the GPU through WebGL to speed up its numerical computations for AI purposes.

The result is sometimes a thousand-fold performance increase over a bare CPU. And this speed makes my sample program possible.

Why This Matters

Python and C programmers have been able to produce similar capabilities for some time, but the resulting programs have very specific hardware requirements and configuration challenges. In research, GPUs costing $10,000 each are not uncommon, and mere gaming GPUs are often not supported.

In contrast to specialized GPUs, WebGL works almost everywhere immediately, including inexpensive IoT devices with their low-end GPUs. Not only does this reduce expense, it offers more options for deployment.

The camera does not need to be high quality. Internally much of the work is done with 240x240 images, and actually we have to downsize camera images to fit this small window.
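The downsizing arithmetic is simple enough to show. This sketch assumes the frame is shrunk to fit inside the 240x240 window while keeping its aspect ratio, with the leftover space split evenly as padding; whether the real code pads, crops or stretches is not something I am stating here, and fitInside is a hypothetical helper name.

```javascript
// Work out how to fit a camera frame inside a square window
// without distorting it. Returns the scaled size plus the offsets
// needed to center the result in the window.
function fitInside(srcWidth, srcHeight, target = 240) {
  // Use the tighter of the two ratios, and never enlarge a small frame.
  const scale = Math.min(target / srcWidth, target / srcHeight, 1);
  const width = Math.round(srcWidth * scale);
  const height = Math.round(srcHeight * scale);
  return {
    width,
    height,
    offsetX: Math.floor((target - width) / 2),
    offsetY: Math.floor((target - height) / 2),
  };
}

console.log(fitInside(1280, 720));
// { width: 240, height: 135, offsetX: 0, offsetY: 52 }
```

In a browser, these four numbers could be passed straight to a canvas context's drawImage(video, offsetX, offsetY, width, height) to produce the small image.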

It is also common to take a high-definition image, break it into smaller regions, and scan each of them separately, sort of like a multipage poster. This gives a wider-angle view and also a higher quality system.
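The "multipage poster" idea comes down to computing a grid of rectangles. Here is a rough sketch with made-up names (axisStarts, tileRegions) and an arbitrary 640-pixel tile size chosen only for illustration; the optional overlap is my addition so that an object sitting on a seam still appears whole in at least one tile.

```javascript
// Starting positions for tiles along one axis. The last tile is pushed
// back so it sits flush with the far edge instead of running past it.
function axisStarts(length, tileSize, overlap) {
  if (length <= tileSize) return [0];
  const step = tileSize - overlap;
  const starts = [];
  for (let pos = 0; pos + tileSize < length; pos += step) starts.push(pos);
  starts.push(length - tileSize);
  return starts;
}

// Split a frame into rectangles to be scanned one at a time.
function tileRegions(width, height, tileSize = 640, overlap = 0) {
  if (overlap < 0 || overlap >= tileSize) {
    throw new RangeError("overlap must be at least 0 and less than tileSize");
  }
  const regions = [];
  for (const y of axisStarts(height, tileSize, overlap)) {
    for (const x of axisStarts(width, tileSize, overlap)) {
      regions.push({
        x,
        y,
        width: Math.min(tileSize, width),
        height: Math.min(tileSize, height),
      });
    }
  }
  return regions;
}

console.log(tileRegions(1920, 1080).length); // 6
```

Each rectangle would then be downsized and scanned on its own. Where tiles overlap, the same object can be reported twice, so the counts need de-duplicating before they are added up.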

Processing and interpreting visual images in video feeds is particularly impressive, but many AI operations are done on text and other numeric data. In the end, all inputs are converted to numbers, and the same higher math is applied to achieve the results.

Applications

There are some professional AI-enhanced applications you know already, such as Zoom or Teams, which let you add bunny ears or blur the background pretty well.

Other simple applications using camera input, as we’ve done here, include:

  • counting people in a room or store - cheaper than hiring someone to keep track of a room’s usage

  • counting cars passing by (in daylight or under streetlights)

  • automatic doorbell - notify when somebody is present, such as at a service counter

  • automatic wildlife or people cameras - only record images when a subject is nearby, not when the wind sways the grass

  • it is possible to further train the system to detect new or specific objects (such as you or your family) and act differently when they appear or disappear as a sentry detection system.

  • interactive hall signs that react differently when someone is present and/or watching closely… could be spooky
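Several of these ideas (the doorbell, the wildlife camera, the hall sign) share one small piece of logic: turning a stream of per-frame counts into "someone arrived" and "they left" events. The sketch below is one hedged way to do that. The name createPresenceTracker and the rule of waiting for a few agreeing frames are my own illustration, chosen so that a single missed or spurious detection does not ring the bell.

```javascript
// Turn per-frame people counts into "arrived" / "left" events.
// The state only flips after framesToConfirm consecutive frames
// disagree with it, which filters out one-frame glitches.
function createPresenceTracker(framesToConfirm = 3) {
  let present = false; // current confirmed state
  let streak = 0;      // consecutive frames disagreeing with that state
  return function update(peopleCount) {
    const seenNow = peopleCount > 0;
    if (seenNow === present) {
      streak = 0;
      return null;
    }
    streak += 1;
    if (streak < framesToConfirm) return null;
    present = seenNow;
    streak = 0;
    return present ? "arrived" : "left";
  };
}

const update = createPresenceTracker(2);
const events = [0, 1, 1, 1, 0, 1, 0, 0].map((count) => update(count));
console.log(events.filter((e) => e !== null)); // [ 'arrived', 'left' ]
```

The caller would feed it the person count from each detection pass and act only when it returns something other than null.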

The ubiquity of access, right down to IoT devices, means the low cost opens new options. All the devices need is a web browser, and probably an Internet connection.

The web-based software also ensures updates are easily performed. Just reset the IoT device or send a web-based signal, and it picks up the newest version by refreshing the web page or rebooting. Simplicity is nice.

Further Thoughts

AI can overcome some classic challenges, such as camera-related security issues, because the only data transmitted or stored beyond the device is the number of people, not actual images… which are often an invasion of privacy.
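To make that concrete, here is a sketch of what a device might send. Everything in it is hypothetical (the buildReport name, the field names, the device label); the point is only that the report is built from the count and nothing else, so no pixels or bounding boxes ever leave the device.

```javascript
// Build the only thing that leaves the device: a count and a timestamp.
// Images, bounding boxes and scores are deliberately left out.
function buildReport(detections, deviceId, now = new Date()) {
  const people = detections.filter((d) => d.class === "person").length;
  return { deviceId, timestamp: now.toISOString(), people };
}

const detections = [
  { class: "person", score: 0.93, bbox: [12, 40, 80, 200] },
  { class: "person", score: 0.88, bbox: [150, 35, 75, 210] },
  { class: "chair", score: 0.71, bbox: [60, 120, 50, 90] },
];
const report = buildReport(detections, "lobby-cam", new Date(0));
console.log(JSON.stringify(report));
// {"deviceId":"lobby-cam","timestamp":"1970-01-01T00:00:00.000Z","people":2}
```

The resulting JSON could be posted to whatever server collects the counts.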