Build a Gesture-Based Math Solver using Python

Imagine solving math problems with just your hands: no keyboard, no mouse, no calculator. Sounds cool, right? That's exactly what we'll build in this project: a real-time hand gesture-controlled math solver.

This project uses your webcam to recognize hand gestures and builds math expressions (like 2 + 3 =) from the fingers you show. It's perfect for anyone learning Python, computer vision, and AI.



🚀 What is a Gesture-Based Math Solver?

A Gesture-Based Math Solver is an interactive Python app that:

  • Tracks your hand movements
  • Recognizes how many fingers you show
  • Converts those into numbers or operations
  • Evaluates the expression in real-time

All using just your webcam + hands + Python magic!


🎯 What You Will Learn

✅ Real-time webcam input using OpenCV
✅ Hand gesture recognition using MediaPipe
✅ Finger counting logic with Python
✅ Building and evaluating expressions
✅ Gesture commands like "delete", "clear", "exit"


🧰 Tech Stack Used

| Tool | Purpose |
| --- | --- |
| Python | Main programming language |
| OpenCV | Capture and display webcam input |
| MediaPipe | AI-powered hand tracking (21 keypoints per hand) |
| NumPy | Math calculations (like finger distance) |
| eval() | Python's built-in function to evaluate math expressions |

📷 How it Works – Behind the Scenes

Step-by-Step Process:

  1. Start Webcam
    OpenCV captures your video feed in real time.
  2. Detect Hands
    MediaPipe finds and tracks your hand landmarks (21 points per hand).
  3. Count Fingers
    Logic checks which fingers are up or down to detect:
    • Numbers (0–9)
    • Operators (+, −, ×, ÷)
    • Commands (=, delete, clear, exit)
  4. Build Math Expression
    Each gesture adds a part to the expression.
  5. Evaluate Expression
    Show the “equal” gesture to solve it.
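
The finger-counting idea in step 3 can be sketched without a webcam or MediaPipe at all. In normalized image coordinates y grows downward, so a raised fingertip has a smaller y than the joint below it. This is a minimal stand-in using made-up landmark tuples (the real code uses MediaPipe landmark objects); the thumb, which needs an x-axis check, is omitted here for simplicity:

```python
# Sketch of the finger-counting logic from step 3, using stand-in
# (x, y) tuples laid out like MediaPipe's 21-point hand (tips at 8, 12, 16, 20).
def count_raised_fingers(landmarks):
    tip_ids = [8, 12, 16, 20]  # index, middle, ring, pinky fingertips
    raised = 0
    for tip in tip_ids:
        tip_y = landmarks[tip][1]
        joint_y = landmarks[tip - 2][1]  # PIP joint two indices below the tip
        if tip_y < joint_y:              # tip above its joint => finger is up
            raised += 1
    return raised

# Fake hand: index and middle fingers up, ring and pinky folded.
hand = [(0.5, 0.5)] * 21
hand[8] = (0.5, 0.2)    # index tip above its joint
hand[12] = (0.5, 0.2)   # middle tip above its joint
hand[16] = (0.5, 0.6)   # ring tip below its joint
hand[20] = (0.5, 0.6)   # pinky tip below its joint

print(count_raised_fingers(hand))  # 2
```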

🧠 Hand Gesture Reference Guide

| Gesture (Fingers) | Action | Meaning |
| --- | --- | --- |
| 0 fingers (1 hand) | Add 0 | Digit |
| 1 to 5 fingers | Add 1 to 5 | Digit |
| 5 + 1 fingers | Add 6 | Digit |
| 5 + 2 fingers | Add 7 | Digit |
| 5 + 3 fingers | Add 8 | Digit |
| 5 + 4 fingers | Add 9 | Digit |
| 1 + 1 fingers (2 hands) | Add + | Addition |
| 1 + 2 fingers | Add − | Subtraction |
| 1 + 3 fingers | Add × | Multiplication |
| 1 + 4 fingers | Add ÷ | Division |
| 0 + 0 fingers | = | Evaluate |
| 2 + 2 fingers | del | Delete last |
| 5 + 5 fingers | clear | Clear input |
| Two index fingertips close together | exit | Quit app |
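
The two-hand combinations above can be captured in a small lookup table keyed by the sorted pair of finger counts, so (1, 4) and (4, 1) map to the same token. This is an illustrative sketch, not the project's `detectGesture` function; the exit gesture (also a 1 + 1 pose, distinguished by fingertip distance) is deliberately left out:

```python
# Sketch: map an order-independent pair of per-hand finger counts to a
# token, mirroring the gesture table above.
TWO_HAND_GESTURES = {
    (1, 1): "+",
    (1, 2): "-",
    (1, 3): "*",
    (1, 4): "/",
    (2, 2): "del",
    (1, 5): "6",
    (2, 5): "7",
    (3, 5): "8",
    (4, 5): "9",
    (0, 0): "=",
    (5, 5): "clear",
}

def lookup_gesture(f1, f2):
    """Return the token for a two-hand gesture, or None if unmapped."""
    return TWO_HAND_GESTURES.get(tuple(sorted((f1, f2))))

print(lookup_gesture(5, 3))  # 8
print(lookup_gesture(4, 1))  # /
```

A dict lookup like this keeps the mapping in one place, compared with the long `if/elif` chain in the full listing below, which spells the same pairs out explicitly.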

🔨 Step-by-Step Guide to Build the Project

1. Create a Project Folder

mkdir GestureMathSolver
cd GestureMathSolver
python -m venv venv

🧠 2. Activate Virtual Environment

.\venv\Scripts\activate     # For Windows
source venv/bin/activate    # For macOS/Linux

📦 3. Install Required Libraries

pip install opencv-python mediapipe numpy

📄 4. Create a Python File

Save as gesture_math_solver.py

Complete Python Code: gesture_math_solver.py

import cv2 as cv
import mediapipe as mp
import numpy as np
import time

# MediaPipe modules for hand detection
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.7,
                       min_tracking_confidence=0.7)

# Function to calculate Euclidean distance between two hand landmarks
def euclidean_distance(p1, p2):
    return np.sqrt((p1.x - p2.x) ** 2 + (p1.y - p2.y) ** 2)

# Function to count number of fingers raised
def count_fingers(hand_landmarks, label):
    tip_ids = [4, 8, 12, 16, 20]
    fingers = []

    # Thumb logic
    if label == "Left":
        fingers.append(1 if hand_landmarks.landmark[tip_ids[0]].x > hand_landmarks.landmark[tip_ids[0]-1].x else 0)
    else:
        fingers.append(1 if hand_landmarks.landmark[tip_ids[0]].x < hand_landmarks.landmark[tip_ids[0]-1].x else 0)

    # Other four fingers
    for i in range(1, 5):
        fingers.append(1 if hand_landmarks.landmark[tip_ids[i]].y < hand_landmarks.landmark[tip_ids[i]-2].y else 0)

    return fingers.count(1)

# Function to detect gestures using two hands
def detectGesture(hand1_data, hand2_data):
    (hand1, label1), (hand2, label2) = hand1_data, hand2_data
    f1 = count_fingers(hand1, label1)
    f2 = count_fingers(hand2, label2)
    dist = euclidean_distance(hand1.landmark[8], hand2.landmark[8])

    if f1 == 1 and f2 == 1:
        if dist < 0.06:
            return "exit"
        return "+"
    elif (f1 == 1 and f2 == 2) or (f1 == 2 and f2 == 1):
        return "-"
    elif (f1 == 1 and f2 == 3) or (f1 == 3 and f2 == 1):
        return "*"
    elif (f1 == 1 and f2 == 4) or (f1 == 4 and f2 == 1):
        return "/"
    elif f1 == 2 and f2 == 2:
        return "del"
    elif (f1 == 1 and f2 == 5) or (f1 == 5 and f2 == 1):
        return "6"
    elif (f1 == 2 and f2 == 5) or (f1 == 5 and f2 == 2):
        return "7"
    elif (f1 == 3 and f2 == 5) or (f1 == 5 and f2 == 3):
        return "8"
    elif (f1 == 4 and f2 == 5) or (f1 == 5 and f2 == 4):
        return "9"
    elif f1 == 0 and f2 == 0:
        return "="
    elif f1 == 5 and f2 == 5:
        return "clear"
    return None

# Initialize variables
last_update_time = 0
delay = 1.25
expression = ""
res = ""

# Open webcam
cap = cv.VideoCapture(0)

while True:
    success, image = cap.read()
    if not success:
        continue

    image = cv.flip(image, 1)
    img_rgb = cv.cvtColor(image, cv.COLOR_BGR2RGB)
    result = hands.process(img_rgb)
    current_time = time.time()
    hand_data = []

    # Detect hands and count fingers
    if result.multi_hand_landmarks and result.multi_handedness:
        for hand_landmarks, hand_handedness in zip(result.multi_hand_landmarks, result.multi_handedness):
            label = hand_handedness.classification[0].label
            hand_data.append((hand_landmarks, label))
            mp_drawing.draw_landmarks(image, hand_landmarks, mp_hands.HAND_CONNECTIONS)

        # Single hand input for digits 0–5
        if len(hand_data) == 1:
            hand_landmarks, label = hand_data[0]
            fingers_up = count_fingers(hand_landmarks, label)
            if fingers_up in [0, 1, 2, 3, 4, 5] and current_time - last_update_time > delay:
                expression += str(fingers_up)
                last_update_time = current_time

        # Two-hand gestures for operations/commands
        if len(hand_data) == 2:
            gesture = detectGesture(hand_data[0], hand_data[1])

            if gesture == "exit":
                break
            elif gesture == "clear":
                expression = ""
                res = ""
                last_update_time = current_time
            elif gesture == "del" and current_time - last_update_time > delay:
                expression = expression[:-1]
                last_update_time = current_time
            elif gesture == "=" and current_time - last_update_time > delay:
                try:
                    res = str(eval(expression))
                except Exception:
                    res = "Error"
                last_update_time = current_time
            elif gesture and current_time - last_update_time > delay:
                expression += gesture
                last_update_time = current_time

    # Display on screen
    cv.putText(image, f'Expr: {expression}', (10, 50), cv.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
    cv.putText(image, f'Result: {res}', (10, 100), cv.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 2)
    cv.imshow("Gesture Math Solver", image)

    key = cv.waitKey(1) & 0xFF
    if key == ord('q') or key == 27:
        break
    elif key == ord('c'):
        expression = ""
        res = ""

# Cleanup
cap.release()
cv.destroyAllWindows()
hands.close()

💡 Explanation of Key Modules

🧠 OpenCV

  • Captures video from the webcam
  • Flips the frame (mirror effect)
  • Displays text like Expression: and Result:

✋ MediaPipe

  • Detects your hand
  • Tracks finger tips and joints in real-time
  • Provides landmark points for every finger

๐Ÿ“ NumPy

  • Used to calculate distance between index fingers
  • Helps detect gestures like exit (when two index fingers are close)
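
The exit check is ordinary Euclidean distance on normalized landmark coordinates, the same formula as the `euclidean_distance` helper in the listing above. A self-contained sketch with made-up fingertip points (the 0.06 threshold comes from the project code):

```python
import numpy as np

# Euclidean distance between two points in normalized [0, 1] coordinates.
def euclidean_distance(p1, p2):
    return np.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)

close = euclidean_distance((0.50, 0.50), (0.53, 0.54))  # fingertips touching
far = euclidean_distance((0.20, 0.50), (0.80, 0.50))    # hands apart
print(close < 0.06, far < 0.06)  # True False
```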

🧮 Python eval()

  • Takes your expression like 2+3*4
  • Returns result: 14
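
Keep in mind that eval() will execute arbitrary Python, which is harmless here because the expression is built only from gestures. If you ever feed it other input, a defensive wrapper is easy: whitelist the expression's characters before evaluating. This is a sketch of that idea, not part of the original code:

```python
import re

# Defensive wrapper around eval(): only digits, the four operators,
# parentheses, dots, and spaces are allowed, so arbitrary Python
# (names, underscores, quotes) is rejected before eval() ever runs.
SAFE_EXPR = re.compile(r"^[0-9+\-*/(). ]+$")

def safe_eval(expression):
    if not expression or not SAFE_EXPR.match(expression):
        return "Error"
    try:
        return str(eval(expression))
    except Exception:  # e.g. SyntaxError on "2+", ZeroDivisionError on "1/0"
        return "Error"

print(safe_eval("2+3*4"))             # 14
print(safe_eval("10/0"))              # Error
print(safe_eval("__import__('os')"))  # Error (rejected by the whitelist)
```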

📌 Example: Real-Life Use Case

You do the following:

  1. ✌️ Show 2 fingers → Adds 2
  2. ☝️✌️ Show 1 + 2 fingers → Adds −
  3. 🤞 Show 3 fingers → Adds 3
  4. 👊👊 Closed fists → Triggers =

👉 Output:

Expression: 2-3
Result: -1

✅ Final Touch: Run the App

python gesture_math_solver.py

🎉 Your webcam opens. Try the gestures and watch the magic!


🔚 Final Thoughts

This project is a great example of how Computer Vision + AI + Python can create fun, hands-free applications.

Itโ€™s perfect for:

  • Students learning OpenCV/MediaPipe
  • Accessibility tools
  • AR/VR app development
  • Viral tech demo for Instagram Reels

📥 Bonus Download

You can grab the full code and run it locally:
🔗 Download from GitHub (replace with actual link)


📣 Let Me Know!

Tried this project?
Tag me on Instagram @uday.codes or comment below 👇
I'd love to see your version!
