Samad Shaikh is a professional Software Engineer & AI Specialist based in Bandra, Mumbai, India. He builds high-performance web applications and SaaS platforms and integrates agentic LLM workflows.

What does Samad Shaikh do?

Samad Shaikh is a Software Engineer specializing in full-stack web architectures (React, Next.js, Node.js) and generative AI application design (FastAPI, Python, agentic pipelines).

Where does Samad Shaikh live and work?

Samad Shaikh lives and works in Bandra, Mumbai, Maharashtra, India.

What is Samad Shaikh's email address?

You can contact Samad Shaikh via email at sxmxd.1825@gmail.com.

Where did Samad Shaikh graduate from?

Samad Shaikh graduated with a Bachelor of Science (B.Sc.) in Computer Science from M.P.S.P.S College, University of Mumbai.

What was Samad Shaikh's CGPA in university?

Samad Shaikh achieved a cumulative GPA score of 8.25/10 during his B.Sc. Computer Science degree.

Where did Samad Shaikh complete his high school education?

Samad Shaikh completed his high school education in the science stream at RD National College.

What is Samad Shaikh's portfolio website?

Samad Shaikh's primary portfolio site is https://www.samadshaikh.dev.

What is Samad Shaikh's resume website?

Samad Shaikh's official online resume portal is https://samadshaikh.me.

What products has Samad Shaikh developed?

Samad Shaikh has developed MockMate AI (interview platform), PriMaX Hub (SaaS dashboard), Planora (social scheduler), WebLens (SEO/performance auditor), and LegalEase (legal contract simplifier).

MockMate AI is an interactive AI-driven interview practice application built by Samad Shaikh featuring real-time speech analytics and voice feedback loops.

PriMaX Hub is a high-performance, multi-module business SaaS application created by Samad Shaikh using Supabase and optimized databases.

What is Samad Shaikh's LinkedIn URL?

Samad Shaikh's primary LinkedIn profile is https://www.linkedin.com/in/samad-ai.

What is Samad Shaikh's GitHub profile?

Samad Shaikh's GitHub developer page is https://github.com/The-Syntax-Slayer.

What is Samad Shaikh's Instagram handle?

Samad Shaikh's Instagram handle is @x0.sammmm (https://www.instagram.com/x0.sammmm/).

What is Samad Shaikh's about.me page?

Samad Shaikh's about.me profile is located at https://about.me/samad_shaikh.

What is Samad Shaikh's age?

Samad Shaikh was born on December 18, 2004, which makes him 21 years old for most of 2026; he turns 22 on December 18, 2026.

Is Samad Shaikh a freelancer?

Yes, Samad Shaikh operates as an independent Software Engineer and consultant, offering freelance full-stack development and AI integration services to global clients.

What is the official Samad Shaikh developer portfolio?

The official developer portfolio of Samad Shaikh is hosted at https://www.samadshaikh.dev where you can view his case studies, technical blog, and contact details.

What does Samad Shaikh specialize in?

Samad Shaikh specializes in full-stack web development (React, Next.js, Node.js) and Generative AI engineering (FastAPI, Python, agentic LLM pipelines) in Mumbai, India.

What programming languages does Samad Shaikh know?

Samad Shaikh is proficient in TypeScript, JavaScript, Python, SQL, and HTML/CSS, which he uses to build robust full-stack applications.

What certifications does Samad Shaikh hold?

Samad Shaikh holds 10 professional certifications, including Google Data Analytics, IBM AI Developer, AWS Cloud Practitioner, Meta Front-End Developer, and Microsoft Azure AI Engineer.

Is Samad Shaikh a React developer?

Yes, Samad Shaikh is a highly skilled React developer with deep expertise in React 19, Next.js, state management (Zustand, Redux), and frontend optimization.

Does Samad Shaikh have experience with FastAPI?

Yes, Samad Shaikh builds backend APIs and microservices using FastAPI and Tornado in Python for asynchronous performance and scalability.

Does Samad Shaikh develop Next.js applications?

Yes, Samad Shaikh builds SEO-friendly, performance-oriented web applications using Next.js, Tailwind CSS, and Server Components.

What topics does Samad Shaikh write about on his blog?

Samad Shaikh's blog focuses on advanced topics like Agentic RAG, real-time speech analytics, asynchronous concurrency in Python, React 19 production features, and web application security.

How can I contact Samad Shaikh for business or recruitment?

You can contact Samad Shaikh directly by email at sxmxd.1825@gmail.com or via his LinkedIn profile at https://www.linkedin.com/in/samad-ai.

What are Samad Shaikh's main GitHub projects?

Some of Samad Shaikh's primary open-source projects on GitHub include MockMate AI, PriMaX Hub, Planora, WebLens, and LegalEase, hosted under his username The-Syntax-Slayer.

What features does MockMate AI by Samad Shaikh offer?

MockMate AI features real-time audio and speech analytics using the Web Audio API and Google Gemini multimodal reasoning to grade candidate interviews.

What is the tech stack of PriMaX Hub built by Samad Shaikh?

PriMaX Hub uses React, TypeScript, Node.js, MongoDB, and the Google Gemini API to build a multi-module productivity SaaS.

Does Samad Shaikh have a computer science degree?

Yes, Samad Shaikh completed a Bachelor of Science (B.Sc.) in Computer Science from M.P.S.P.S College, University of Mumbai, with a CGPA of 8.25/10.

Who is Samad Shaikh?

Samad Shaikh is a professional Software Engineer and AI Specialist who lives in Bandra, Mumbai, India. He builds high-performance web applications and agentic AI systems.

What work does Samad Shaikh do?

Samad Shaikh specializes in software engineering, full-stack web development (React, Next.js, Node.js), and generative AI application design (FastAPI, Python, agentic LLM workflows).

Where did Samad Shaikh go to college?

Samad Shaikh earned his B.Sc. in Computer Science from M.P.S.P.S College, University of Mumbai, with a CGPA of 8.25/10.

How can I contact Samad Shaikh?

You can reach Samad Shaikh by email at sxmxd.1825@gmail.com or send him a message on LinkedIn (linkedin.com/in/samad-ai).

What projects and products has Samad Shaikh built?

Samad Shaikh has built MockMate AI (interview platform), PriMaX Hub (SaaS application), Planora (social scheduler), WebLens (auditor), and LegalEase (simplifier).

How old is Samad Shaikh?

Samad Shaikh was born on 18 December 2004, so he is currently 21 years old.

Does Samad Shaikh do freelance work?

Yes, Samad Shaikh is an independent Software Engineer who provides freelance development and AI integration services.

Which certifications does Samad Shaikh hold?

Samad Shaikh holds 10 professional certifications from Google, IBM, Microsoft, AWS, and Meta.


Taming Asyncio: Handling 10k+ Concurrent LLM Requests with Tornado & FastAPI

Learn how to build asynchronous Python servers that manage high-concurrency LLM streaming requests without blocking the event loop.

Published: April 18, 2026 · 12 min read · Category: Backend

Tags: Python, FastAPI, Tornado, Asyncio, Concurrency, HTTPX

Introduction

Imagine you are running a busy restaurant. When a customer orders a meal, the waiter takes the order to the kitchen, stands next to the chef for ten minutes waiting for the food to cook, and only returns to serve other customers once the meal is ready. The restaurant would go out of business on its first day.

Unfortunately, this is how each worker in a traditional synchronous Python web server (such as a standard Flask or WSGI-based Django deployment) behaves.

When a client calls your backend API to stream a Large Language Model (LLM) response, which can mean 5 to 10 seconds of waiting on token generation over the network, the synchronous worker thread handling that request blocks. It cannot process any other request until the stream completes. If your server is configured with 20 worker threads, the 21st concurrent user waits in a queue and may hit a connection timeout, even though your server's CPU and memory usage are near zero.
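The queuing effect can be reproduced at a small scale. The sketch below is a simplified model, not a real WSGI server: a ThreadPoolExecutor with 2 workers stands in for the 20 worker threads, time.sleep() stands in for the blocking LLM stream, and blocking_llm_call and simulate_sync_server are hypothetical names. The third client cannot start until one of the two workers is released.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_llm_call(request_id: int) -> float:
    """Hypothetical synchronous handler: holds its worker thread for the whole stream."""
    started_at = time.perf_counter()
    time.sleep(0.2)  # stand-in for waiting on LLM tokens over the network
    return started_at


def simulate_sync_server(workers: int, clients: int) -> list[float]:
    """Return, per client, how many seconds passed before a worker picked up its request."""
    opened_at = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start_times = list(pool.map(blocking_llm_call, range(clients)))
    return [start - opened_at for start in start_times]


if __name__ == "__main__":
    # 2 workers, 3 clients: the third request queues behind the first two.
    for client, waited in enumerate(simulate_sync_server(workers=2, clients=3), start=1):
        print(f"client {client} waited {waited:.2f}s before being served")
```

Scaled up, the same model means the 21st user of a 20-thread server waits until one of the 20 in-flight streams finishes.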

To handle 10,000+ concurrent connections without buying expensive server clusters, we must use Asynchronous Python (asyncio) with frameworks like FastAPI or Tornado. Asynchronous servers act like smart waiters: they place an order with the kitchen, immediately go serve other tables, and return to collect the food only when the kitchen signals it is ready.

This guide details how to build non-blocking streaming servers in Python.

The Asynchronous Event Loop Mechanics

The following diagram illustrates how a single-threaded asyncio event loop interleaves multiple concurrent client connections, multiplexing non-blocking sockets so that no single request holds up the others:

   Client A Request       Client B Request       Client C Request
          │                      │                      │
          ▼                      ▼                      ▼
  +─────────────────────────────────────────────────────────────+
  |                   FastAPI / Tornado Server                  |
  |                                                             |
  |  +───────────────────────────────────────────────────────+  |
  |  |                 Asyncio Event Loop                    |  |
  |  |                                                       |  |
  |  |  +------------+  +------------+  +------------+       |  |
  |  |  | Task A     |  | Task B     |  | Task C     |       |  |
  |  |  | (Client A) |  | (Client B) |  | (Client C) |       |  |
  |  |  +----+-------+  +----+-------+  +----+-------+       |  |
  |  |       |               |               |               |  |
  |  +───────┼───────────────┼───────────────┼───────────────+  |
  +──────────┼───────────────┼───────────────┼──────────────────+
             │               │               │
             v (Await IO)    v (Await IO)    v (Await IO)
       +-----+-----+   +-----+-----+   +-----+-----+
       | Gemini    |   | Database  |   | Cache     |
       | Network   |   | Read      |   | Lookup    |
       +-----------+   +-----------+   +-----------+

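When Task A awaits network I/O from the Gemini API, the event loop suspends that coroutine, registers interest in its socket, and switches to Task B (or any other task that is ready to run). Once the socket has data, Task A is resumed where it left off. This cooperative multitasking allows a single thread to keep thousands of connections open at once.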
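A minimal sketch of this interleaving, assuming asyncio.sleep() as a stand-in for awaiting the Gemini, database, and cache sockets from the diagram (fake_io and serve_three_clients are hypothetical names). Three tasks wait concurrently on one thread, so the total time tracks the slowest wait rather than the sum of all three.

```python
import asyncio
import time


async def fake_io(name: str, seconds: float) -> str:
    # asyncio.sleep() suspends this task, just like awaiting a real socket would.
    await asyncio.sleep(seconds)
    return name


async def serve_three_clients() -> tuple[list[str], float]:
    started = time.perf_counter()
    # gather() runs the three coroutines as concurrent tasks on the same loop.
    results = await asyncio.gather(
        fake_io("Task A (Gemini)", 0.3),
        fake_io("Task B (Database)", 0.2),
        fake_io("Task C (Cache)", 0.1),
    )
    return list(results), time.perf_counter() - started


if __name__ == "__main__":
    names, elapsed = asyncio.run(serve_three_clients())
    print(names, f"{elapsed:.2f}s")
```

Run one after another, the three waits would take about 0.6 s; interleaved on the event loop they finish in roughly 0.3 s.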

Step-by-Step FastAPI Implementation

To keep an asynchronous server fast under load, we must maintain a shared connection pool using httpx.AsyncClient and stream responses back to the client chunk-by-chunk.

# filepath: src/server.py
import os
from contextlib import asynccontextmanager
from typing import AsyncGenerator

import httpx
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

# Configure a shared connection pool, reused by every request
# max_keepalive_connections: Number of idle connections to keep open
# max_connections: Maximum limit of concurrent sockets
connection_limits = httpx.Limits(max_keepalive_connections=200, max_connections=1000)
async_client = httpx.AsyncClient(limits=connection_limits, timeout=60.0)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup work would go before the yield
    yield
    # Clean up the shared connection pool during server shutdown
    await async_client.aclose()

# lifespan replaces the deprecated @app.on_event("shutdown") hook
app = FastAPI(title="High-Concurrency LLM Backend", lifespan=lifespan)

class GenerationRequest(BaseModel):
    prompt: str

async def gemini_stream_generator(prompt: str) -> AsyncGenerator[bytes, None]:
    '''
    Streams token responses directly from Google's Gemini API
    without blocking the application event loop.
    '''
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        yield b"Error: GEMINI_API_KEY environment variable is missing."
        return

    # alt=sse asks Gemini for server-sent events, matching the
    # text/event-stream media type we return to our own clients
    url = (
        "https://generativelanguage.googleapis.com/v1beta/models/"
        f"gemini-1.5-flash:streamGenerateContent?alt=sse&key={api_key}"
    )

    payload = {
        "contents": [{
            "parts": [{"text": prompt}]
        }]
    }

    try:
        # Stream the response from the external API over a pooled connection
        async with async_client.stream("POST", url, json=payload) as response:
            if response.status_code != 200:
                yield f"API Error (Status {response.status_code})".encode()
                return

            # Each await inside `async for` suspends this task until the next
            # chunk arrives, so other requests run in the meantime; no extra
            # asyncio.sleep() is needed to hand control back to the loop
            async for raw_chunk in response.aiter_bytes():
                yield raw_chunk

    except httpx.RequestError as exc:
        yield f"Network failure during LLM connection: {exc}".encode()
    except Exception as exc:
        yield f"Unexpected stream disruption: {exc}".encode()

@app.post("/api/v1/generate-stream")
async def generate_stream(request: GenerationRequest):
    if not request.prompt.strip():
        raise HTTPException(status_code=400, detail="Prompt cannot be empty.")

    return StreamingResponse(
        gemini_stream_generator(request.prompt),
        media_type="text/event-stream"
    )
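Clients usually want text rather than raw JSON. If the upstream request asks Gemini for server-sent events (the alt=sse query parameter), each event arrives as a `data: {...}` line whose JSON follows the GenerateContentResponse shape, with text under candidates[].content.parts[].text. The helper below is a hypothetical sketch under that assumption; the sample lines are hand-written, not captured API output.

```python
import json
from typing import Iterable, Iterator


def iter_gemini_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield text fragments from the `data:` lines of a Gemini SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators, comments, and other SSE fields
        event = json.loads(line[len("data:"):])
        for candidate in event.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]


if __name__ == "__main__":
    sample_stream = [
        'data: {"candidates": [{"content": {"parts": [{"text": "Hello"}], "role": "model"}}]}',
        "",
        'data: {"candidates": [{"content": {"parts": [{"text": ", world"}], "role": "model"}}]}',
    ]
    print("".join(iter_gemini_sse_text(sample_stream)))  # Hello, world
```

Inside gemini_stream_generator, the same per-line logic could be applied to each line from response.aiter_lines() instead of forwarding raw bytes, at the cost of decoding JSON on the event loop for every event.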

Alternative: Tornado Concurrency Handler

Tornado is a mature, high-performance asynchronous web framework that is well-suited for raw network socket handling. Here is how you implement a non-blocking streaming handler in Tornado:

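# filepath: src/tornado_server.py
import asyncio
import json
import os

import tornado.web
from tornado.httpclient import AsyncHTTPClient, HTTPClientError, HTTPRequest

# The default client allows only 10 simultaneous fetches (max_clients=10);
# raise it so outbound LLM calls are not queued behind each other
AsyncHTTPClient.configure(None, max_clients=1000)

class StreamHandler(tornado.web.RequestHandler):
    async def post(self):
        try:
            body = json.loads(self.request.body)
        except json.JSONDecodeError:
            raise tornado.web.HTTPError(400, reason="Body must be valid JSON") from None
        prompt = body.get("prompt", "")

        api_key = os.environ.get("GEMINI_API_KEY")
        if not api_key:
            raise tornado.web.HTTPError(500, reason="GEMINI_API_KEY is missing")

        url = (
            "https://generativelanguage.googleapis.com/v1beta/models/"
            f"gemini-1.5-flash:streamGenerateContent?alt=sse&key={api_key}"
        )

        payload = {
            "contents": [{"parts": [{"text": prompt}]}]
        }

        client = AsyncHTTPClient()
        self.set_header("Content-Type", "text/event-stream")

        def handle_chunk(chunk):
            # Write chunks directly to the response buffer as they are received.
            # flush() returns a Future that is not awaited here, so there is no backpressure.
            self.write(chunk)
            self.flush()

        request = HTTPRequest(
            url,
            method="POST",
            headers={"Content-Type": "application/json"},
            body=json.dumps(payload),
            streaming_callback=handle_chunk,
            request_timeout=60.0
        )

        try:
            await client.fetch(request)
        except HTTPClientError as exc:
            # Non-2xx upstream status or timeout (Tornado reports timeouts as code 599)
            self.write(f"API Error (Status {exc.code})")
        self.finish()

def make_app():
    return tornado.web.Application([
        (r"/api/v1/stream", StreamHandler),
    ])

async def main():
    app = make_app()
    app.listen(8080)
    # Keep the server running until the process is stopped
    await asyncio.Event().wait()

if __name__ == "__main__":
    asyncio.run(main())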
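The FastAPI server caps outbound sockets through httpx.Limits, but you may also want an explicit cap on how many LLM calls are in flight at once, for example to stay under a provider quota. A minimal sketch, assuming asyncio.sleep() stands in for the upstream request and using hypothetical names (call_llm, fan_out): an asyncio.Semaphore makes excess requests wait on the event loop instead of failing. Because Tornado 5+ runs on asyncio, the same primitive can be used inside either server's handlers.

```python
import asyncio


async def call_llm(prompt: str, gate: asyncio.Semaphore, stats: dict[str, int]) -> str:
    """Hypothetical upstream call guarded by a shared semaphore."""
    async with gate:  # suspends here while max_in_flight calls are already running
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            await asyncio.sleep(0.05)  # stand-in for the streaming LLM request
            return prompt.upper()
        finally:
            stats["in_flight"] -= 1


async def fan_out(prompts: list[str], max_in_flight: int) -> tuple[list[str], int]:
    """Run every prompt concurrently, but never more than max_in_flight at once."""
    gate = asyncio.Semaphore(max_in_flight)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(call_llm(p, gate, stats) for p in prompts))
    return list(results), stats["peak"]


if __name__ == "__main__":
    answers, peak = asyncio.run(fan_out([f"prompt {i}" for i in range(20)], max_in_flight=5))
    print(len(answers), "answers, peak concurrency", peak)  # 20 answers, peak concurrency 5
```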

Technical Deep Dive: Event Loops & Concurrency Tuning

1. The Cardinal Sin: Blocking the Event Loop

Because an asynchronous server runs on a single execution thread, any blocking call halts all concurrent requests.

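Do not use time.sleep(): it blocks the thread. Use await asyncio.sleep() instead.
Do not use synchronous database or HTTP clients: libraries like requests or psycopg2 block the loop. Use async clients like httpx or asyncpg, or move an unavoidable synchronous call onto a worker thread with asyncio.to_thread().
Identifying Blocking Calls: While one task holds the thread, every other connection on the loop waits. asyncio's debug mode logs any callback or task step that runs longer than loop.slow_callback_duration (100 ms by default). Enable it in your development environment with asyncio.run(..., debug=True) or by setting the PYTHONASYNCIODEBUG=1 environment variable:

  import asyncio
  import logging

  logging.basicConfig(level=logging.DEBUG)
  asyncio.run(main(), debug=True)  # main() is your application's entry coroutine
  # Logs warning messages like: "Executing <Task...> took 0.150 seconds"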
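To see the difference between a blocking call and an offloaded one, the sketch below captures asyncio's debug-mode warnings. The helper names are hypothetical, and time.sleep() stands in for any synchronous library call such as a requests or psycopg2 query. The blocking handler trips the slow-callback warning; the same call moved onto a worker thread with asyncio.to_thread() does not.

```python
import asyncio
import logging
import time


class SlowCallbackCollector(logging.Handler):
    """Collects asyncio's debug-mode 'Executing ... took N seconds' warnings."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        if message.startswith("Executing"):
            self.messages.append(message)


async def blocking_handler() -> None:
    # BAD: time.sleep() freezes the whole event loop for 0.3 s.
    time.sleep(0.3)


async def offloaded_handler() -> None:
    # GOOD: the same blocking call runs on a worker thread; the loop stays free.
    await asyncio.to_thread(time.sleep, 0.3)


def count_slow_callbacks(handler) -> int:
    collector = SlowCallbackCollector()
    asyncio_logger = logging.getLogger("asyncio")
    asyncio_logger.addHandler(collector)
    try:
        async def main() -> None:
            # Debug mode warns about any task step slower than this threshold.
            asyncio.get_running_loop().slow_callback_duration = 0.1
            await handler()

        asyncio.run(main(), debug=True)
    finally:
        asyncio_logger.removeHandler(collector)
    return len(collector.messages)


if __name__ == "__main__":
    print("blocking:", count_slow_callbacks(blocking_handler))
    print("offloaded:", count_slow_callbacks(offloaded_handler))
```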

2. Offloading CPU-Bound Operations

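If your server needs to run CPU-heavy operations (such as resizing images, parsing large JSON documents, or running local machine learning models), offload them to an executor so they do not stall the event loop. For pure-Python work, prefer a process pool: the GIL stops threads from executing Python bytecode in parallel, so a thread pool helps only when the heavy lifting happens in a C extension that releases the GIL.

import asyncio
from concurrent.futures import ProcessPoolExecutor

process_pool = ProcessPoolExecutor(max_workers=4)

def heavy_image_processing(image_bytes: bytes) -> bytes:
    # Synchronous, CPU-heavy work; must be a top-level function so it can be pickled
    # Placeholder transformation: substitute your real work (e.g. a Pillow resize)
    processed_bytes = bytes(reversed(image_bytes))
    return processed_bytes

async def process_image_route(raw_image: bytes) -> bytes:
    loop = asyncio.get_running_loop()
    # Offload the work to a separate process so it can run on another CPU core
    result = await loop.run_in_executor(
        process_pool,
        heavy_image_processing,
        raw_image
    )
    return result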

3. Tuning the Linux Kernel and Host Environment

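File Descriptor Limits: The operating system treats every open socket as a file descriptor, and a proxied LLM stream typically holds two of them (the client connection and the upstream API connection). Many Linux distributions set a default soft limit of 1,024 file descriptors per process. Raise this value in your deployment scripts (or with LimitNOFILE= in a systemd unit) to support thousands of concurrent connections; without root privileges, ulimit can raise the soft limit only up to the hard limit:

  ulimit -n 65536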
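If you cannot change the launch script, a process can also raise its own soft limit at startup. A minimal sketch for Linux and other Unix systems using the standard-library resource module (raise_fd_soft_limit is a hypothetical helper): an unprivileged process may raise its soft limit only as far as the hard limit, so the target is capped there.

```python
import resource


def raise_fd_soft_limit(target: int = 65536) -> tuple[int, int]:
    """Hypothetical helper: raise RLIMIT_NOFILE's soft limit toward `target`, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; nothing to raise
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # the OS may enforce a lower ceiling than the reported hard limit; keep the current value
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("open-file limits (soft, hard):", raise_fd_soft_limit())
```

Call it once, before the server starts accepting connections.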

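Using uvloop: In production, replace the default asyncio event loop with uvloop, a drop-in replacement written in Cython on top of libuv; its authors report it running 2-4x faster than the default loop in their benchmarks. If you serve FastAPI with uvicorn, installing uvloop is enough, because uvicorn's default --loop auto setting selects it when available. For a standalone asyncio program:

  import uvloop

  # uvloop >= 0.18; older releases used
  # asyncio.set_event_loop_policy(uvloop.EventLoopPolicy()) before starting the loop
  uvloop.run(main())  # main() is your application's entry coroutine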

Cross-Reading Recommendations

For details on managing stateful channels or configuring database backends, check out these articles:

Scaling Stateful WebSockets: Event-Driven Real-Time Sync with FastAPI & Redis: Learn how to scale persistent WebSocket channels across multiple server instances.
Architecting Agentic RAG: Production AI Knowledge Systems with Gemini & PostgreSQL: Learn how to query vector databases asynchronously without blocking your application event loop.


Feedback & Collaboration

Designing high-concurrency backends in Python requires careful attention to the event loop. What tools do you use to monitor event loop blocking in production? Have you migrated legacy synchronous codebases to async?

I would love to learn about your experiences. Share your thoughts on my Resume Portal or write a note on my Portfolio Portal's Connect tab.