🧠 Can AI Solve Math Puzzles? 🤯 Build a Smart OCR Solver 🚀 with DeepSeek-R1 + Gradio 🖥️

In this DeepSeek-R1 tutorial, you'll learn how to build a math puzzle solver app by integrating DeepSeek-R1 with EasyOCR and Gradio.

February 02, 2025

In this hands-on guide, I’ll use the DeepSeek-R1 model to build a math puzzle solver assistant integrated with EasyOCR and Gradio.

I’ll explain step by step how to build a functional web app that solves a wide range of mathematical puzzles and generates helpful solutions using the reasoning capabilities of the DeepSeek-R1 model.

DeepSeek-R1 Demo Project: Overview

To build our puzzle solver assistant, we’ll go over the following steps:

Set up the necessary prerequisites.
Set up access to the DeepSeek-R1 API.
Define the core functionality: extract the puzzle text with OCR and prompt the model to solve it.
Integrate the components into a user-friendly Gradio interface for easy interaction.

Step 1: Prerequisites

Before diving into the implementation, let’s ensure that we have the following tools and libraries installed:

Python 3.8+
PyTorch: For efficient deep learning model handling.
EasyOCR: A Python module for extracting text from images.
Gradio: To create a user-friendly web interface.

Run the following command to install the necessary dependencies:

!pip install torch gradio pillow easyocr -q

Once the dependencies are installed, run the following import statements:

import torch
from PIL import Image
import easyocr
import requests
import json
import gradio as gr
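
If one of these imports fails, a quick check like the following can show which packages are missing. This is only a sketch: the helper name missing_packages is hypothetical, and note that the import name for pillow is PIL.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# An empty list means every package is importable
print(missing_packages(["torch", "PIL", "easyocr", "requests", "gradio"]))
```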

Step 2: Setting Up the DeepSeek-R1 API

The following script demonstrates how to interact with the DeepSeek API to obtain responses based on user prompts. Note that DeepSeek's API is compatible with OpenAI's format and uses a base URL for API requests.

You can either pass in the API key directly (not recommended for security reasons) or, if you're using Google Colab like me, save the API key using the Secrets feature. Alternatively, you can store it in an environment variable.

# DeepSeek API configuration
DEEPSEEK_API_URL = "https://api.deepseek.com/v1/chat/completions"

# If you're using Colab and storing your key in the Secrets tab:
from google.colab import userdata
API_KEY = userdata.get('SECRET_KEY')

# If you are running this code elsewhere, uncomment the following line and replace 'YOUR_API_KEY' with your actual DeepSeek API key.
#API_KEY = 'YOUR_API_KEY'
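
For the environment-variable option, a minimal sketch could look like the following. The variable name DEEPSEEK_API_KEY and the helper load_api_key are assumptions made for illustration; use whatever name you exported in your own environment.

```python
import os


def load_api_key(env_var="DEEPSEEK_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    `env_var` is a hypothetical name chosen for this sketch, not one
    required by DeepSeek. `environ` can be overridden for testing.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the app."
        )
    return key
```

With this in place, API_KEY = load_api_key() would replace the hard-coded assignment, and a missing key fails early with a readable message.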

At the time of publishing this article, DeepSeek’s services are under heavy load, and their performance is degraded—I’ve also had major difficulties running the code for this project. Please check DeepSeek’s status page before attempting to run the code in this project.
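
Because the API may be overloaded, it can help to retry a failed call a few times before giving up. The sketch below is one way to do that and is not part of the original project: the helper call_with_retries, its parameters, and the doubling delay are all assumptions.

```python
import time


def call_with_retries(send, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call `send()` until it succeeds or `retries` attempts are used up.

    `send` is any zero-argument callable. The wait doubles after each
    failed attempt (base_delay, 2 * base_delay, ...).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return send()
        except Exception:  # in real code, narrow this to network errors
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

Here send could be a lambda that wraps the requests.post(...) call. Keep in mind that requests.post() does not raise on an HTTP error status by itself, so send would also need to call response.raise_for_status() for those failures to be retried.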

Step 3: Designing the Core Features

Now that the API is set up, we can work on the core features. In this section, we’ll process an image containing a logic puzzle, extract the puzzle text using OCR, refine the text, and send it to the DeepSeek API for solving. Let’s first see the code, and then I’ll explain it.

reader = easyocr.Reader(['en'])

def solve_puzzle(image):
    """Extracts the puzzle from the image and sends it to DeepSeek for solving."""
    try:
        # 1. Save the uploaded image temporarily; EasyOCR uses file paths
        image_path = "uploaded_image.png"
        image.save(image_path)

        # 2. Extract text from the image using EasyOCR
        results = reader.readtext(image_path)
        extracted_text = " ".join([res[1] for res in results])

        # Standardize the text to avoid misinterpretation of "??" as "2?"
        extracted_text = extracted_text.replace('??', '?')
      
        if "?" not in extracted_text:
            extracted_text += "?"

        print("Extracted Text:", extracted_text)  # Debugging output

        # 3. Refine the extracted text to standardize expressions
        refined_text = extracted_text.replace('x', '*').replace('X', '*').replace('=', ' = ').strip()
        print("Refined Text:", refined_text)  # Debugging output

        # 4. Compose the user message with concise instructions
        puzzle_prompt = (
            f"You are an AI specialized in solving puzzles. Analyze the following, identify hidden patterns or rules, and provide the missing value with step-by-step reasoning in text format. Do not return an answer in Latex."
            f"\nPuzzle:\n{refined_text}\n"
            "Format your response strictly as follows:\n"
            "1. **Given Equation**:\n   - (original equations)\n"
            "2. **Pattern Identified**:\n   (explain the hidden logic)\n"
            "3. **Step-by-step Calculation**:\n   - For (input values):\n     (calculation and result)\n"
            "4. **Final Answer**:\n     (Answer = X)"
        )

        messages = [
            {"role": "user", "content": puzzle_prompt}
        ]

        # 5. Optimized API request for faster response
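        data = {
            "model": "deepseek-reasoner",
            "messages": messages,
            "temperature": 0,  # request deterministic output
            "max_tokens": 100  # keeps responses short; raise this if the four-part answer gets cut off
        }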

        headers = {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        }

        # 6. Send the request to DeepSeek with a timeout
        response = requests.post(DEEPSEEK_API_URL, headers=headers, json=data, timeout=15)

        # 7. Check the result
        if response.status_code == 200:
            try:
                json_resp = response.json()
                return json_resp.get("choices", [{}])[0].get("message", {}).get("content", "Error: No response content.").strip()
            except ValueError:  # JSON decode errors from json and requests are ValueError subclasses
                return "Error: Invalid JSON response from DeepSeek API."
        else:
            return f"Error: DeepSeek API failed with status code {response.status_code}, Response: {response.text}"
    except requests.exceptions.Timeout:
        return "Error: DeepSeek API request timed out. Please try again."
    except Exception as e:
        return f"Error: {str(e)}"

The solve_puzzle() function processes an image containing a logic puzzle and solves it using OCR and the R1 model. It follows these steps:

Initialize EasyOCR: We start by initializing the EasyOCR reader in English.
Image processing: The uploaded image is saved temporarily and processed using EasyOCR to extract text.
Text refinement: The extracted text is standardized to ensure consistency and accuracy.
Query composition: A structured query is created, including the refined puzzle text and specific instructions for solving.
API interaction: The query is sent to the DeepSeek API, which analyzes and solves the puzzle. Use the deepseek-reasoner model to call DeepSeek-R1, or deepseek-chat if you want DeepSeek-V3. Check the pricing page for the most up-to-date costs.
Response handling: The API response is processed to extract and return the solution or appropriate error messages.
Error handling: The function also manages issues like timeouts or unexpected exceptions, ensuring robust operation.
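
The image-processing step relies on the shape of EasyOCR's output: reader.readtext() returns a list of (bounding box, text, confidence) entries, which is why the code picks res[1]. The sketch below uses made-up sample data in that shape to show the join. The confidence filter, the helper name join_ocr_text, and the 0.3 threshold are my own assumptions, not part of the original code.

```python
def join_ocr_text(results, min_confidence=0.0):
    """Join the text fields of EasyOCR-style results into one string."""
    kept = [text for _bbox, text, conf in results if conf >= min_confidence]
    return " ".join(kept)


# Made-up sample in the (bbox, text, confidence) shape
sample = [
    ([[0, 0], [60, 0], [60, 20], [0, 20]], "1 + 4 = 5", 0.98),
    ([[0, 30], [60, 30], [60, 50], [0, 50]], "2 + 5 = 12", 0.95),
    ([[0, 60], [60, 60], [60, 80], [0, 80]], "~", 0.10),
]

print(join_ocr_text(sample))                      # keeps the stray "~"
print(join_ocr_text(sample, min_confidence=0.3))  # drops the low-confidence entry
```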

This pipeline combines OCR for text extraction and the DeepSeek API for intelligent puzzle-solving.
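
One caveat with the text-refinement step: replacing every "x" also rewrites letters inside ordinary words (for example, "next" becomes "ne*t"). A possible alternative, sketched below under the assumption that a multiplication sign only appears between two digits, uses regular expressions. The helper name refine_text is hypothetical.

```python
import re


def refine_text(text):
    """Normalize OCR output for a math puzzle."""
    # Treat x/X as multiplication only when it sits between two digits
    text = re.sub(r"(?<=\d)\s*[xX]\s*(?=\d)", " * ", text)
    # Put single spaces around '='
    text = re.sub(r"\s*=\s*", " = ", text)
    # Collapse repeated whitespace
    return re.sub(r"\s+", " ", text).strip()


print(refine_text("3x4=12"))          # 3 * 4 = 12
print(refine_text("next: 2 X 5 =?"))  # next: 2 * 5 = ?
```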

Step 4: Creating the Gradio Interface

Gradio allows us to create an interactive web interface for our application. The following code snippet creates a Gradio web interface for the solve_puzzle() function: it takes the uploaded image, passes it to solve_puzzle(), and displays the returned text.

interface = gr.Interface(
    fn=solve_puzzle,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="Logic Puzzle Solver with EasyOCR & DeepSeek",
    description="Upload an image of a logic puzzle, and the model will solve it for you."
)
interface.launch(debug=True)

The above setup includes three components:

Input: A gr.Image component where users can upload their images.
Output: A text component for displaying the answer from DeepSeek-R1.
Interface: The gr.Interface() function ties the input and output together, launching a web app for user interaction.

Step 5: Test the App

Let’s test our app with a puzzle that involves math and logic.

If you look at the first row, you’ll see 1 + 4 = 5, and you may say this is a simple addition. But on the second row we have 2 + 5 = 12, and then 3 + 6 = 21. Can you figure out the pattern and solve 8 + 11 = ?
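
As a sanity check that is independent of the app, you can test a candidate rule by hand. One rule that fits all three rows is a + a × b (1 + 1×4 = 5, 2 + 2×5 = 12, 3 + 3×6 = 21). The sketch below verifies that and applies it to the last row. Puzzles like this often admit more than one consistent rule, so this is not necessarily the answer the app returns.

```python
def rule(a, b):
    """Candidate pattern: a + a * b."""
    return a + a * b


rows = [(1, 4, 5), (2, 5, 12), (3, 6, 21)]

# The candidate rule must reproduce every given row
assert all(rule(a, b) == result for a, b, result in rows)

print(rule(8, 11))  # 96
```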

If you look at the right side of the Gradio interface, you’ll see that the Puzzle Solver app has identified the pattern.

Conclusion

In this tutorial, we built a math puzzle solver assistant by combining DeepSeek-R1 with EasyOCR and Gradio.