A deep dive into architecting a sophisticated PDF analysis system developed by CodeLab Davis for Goodnotes
Have you ever tried to analyze a complex PDF document and wished you could get insights from multiple AI models at once? Most document analysis tools lock you into a single AI provider, and frankly, they're not very good at understanding document structure.
This challenge became the foundation for a project our team at CodeLab Davis was tasked with solving for our industry client, Goodnotes. During our 6-week development cycle, we found ourselves constantly switching between ChatGPT, Claude, and Gemini to get different perspectives on the same document. Each model had its strengths, but there was no way to compare their analyses side by side or merge their insights intelligently.
That's when we decided to build GoodnotesLM, a proof-of-concept testing platform for document analysis and outline generation that aimed to change how people interact with documents. However, our initial implementation faced significant technical challenges and performance issues. I took on the challenge of completely rewriting the platform from scratch, focusing primarily on researching and implementing the hybrid storage system, multi-model comparison algorithms, efficient OCR inference, and critical UI optimizations.
For a comprehensive overview of the entire product lifecycle, design process, and team collaboration, check out our detailed project overview.
Working for our client Goodnotes, our vision at CodeLab Davis was ambitious: create a platform that could simultaneously analyze PDFs using OpenAI's GPT-4o, Google's Gemini 2.0 Flash, and Anthropic's Claude models, then intelligently merge their insights.
But it wasn't just about running multiple models. We wanted to build something that could compare model analyses side by side, merge their insights into a single unified result, make static PDFs interactive through AI-extracted outlines, and keep every document encrypted and secure.
The catch? The existing codebase couldn't support these requirements, so I took on the challenge of rebuilding everything from scratch while collaborating with my team on design and requirements.
The first major hurdle was creating a unified interface for completely different AI providers. Each has its own SDK, authentication methods, and response formats.
Working closely with my team's requirements, I designed a custom Model class that abstracts away these differences:
export class Model implements ModelInterface {
  versionedId: string;
  name: string;
  provider: ModelProvider;
  description?: string;

  toRegistryModelString(): ModelId {
    return `${this.provider.id}:${this.versionedId}` as ModelId;
  }
}
This seemingly simple abstraction enabled something powerful: the platform could now treat all AI models the same way in code, while still leveraging their unique capabilities.
The real magic happened in the provider registry, where I used the Vercel AI SDK to create a unified interface that could dynamically load any model with proper fallback mechanisms.
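Here's a minimal sketch of what that registry can look like with the AI SDK's createProviderRegistry (assuming a current SDK version); the fallback logic and default model are illustrative assumptions rather than the production code:

import { createProviderRegistry } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { google } from "@ai-sdk/google";

// Each provider is registered under the id that Model.toRegistryModelString() emits
const registry = createProviderRegistry({ openai, anthropic, google });

type RegistryModelId = Parameters<typeof registry.languageModel>[0];

// Resolve a string like "openai:gpt-4o" to a concrete language model,
// falling back to a default when the requested model can't be loaded
function loadModel(id: RegistryModelId, fallbackId: RegistryModelId = "openai:gpt-4o") {
  try {
    return registry.languageModel(id);
  } catch {
    return registry.languageModel(fallbackId);
  }
}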
Here's where things got really interesting. PDFs are notoriously difficult to work with — they're essentially images with some text metadata. To make documents truly interactive, the platform needed to extract and visualize every text block.
After our team's research comparing different OCR solutions, I focused on building efficient infrastructure around Tesseract.js. My primary contribution wasn't researching OCR models themselves, but rather creating a high-performance processing pipeline that could handle the computational overhead efficiently.
import { createScheduler, createWorker } from "tesseract.js";

const getOCRResults = async ({ document }: GetOCRResultsParams) => {
  const scheduler = createScheduler();
  const numWorkers = Math.min(3, document.length);

  // Create workers and add them to the scheduler
  await Promise.all(
    Array.from({ length: numWorkers }, async () => {
      const worker = await createWorker("eng");
      scheduler.addWorker(worker);
    })
  );

  // Process all pages in parallel; the scheduler balances jobs across workers
  const results = await Promise.all(
    Array.from({ length: document.length }, async (_, i) => {
      const page = await document.getPage(i + 1);
      const {
        data: { text, blocks },
      } = await scheduler.addJob("recognize", page);
      return { page: i + 1, text, blocks };
    })
  );

  // Terminate the scheduler (and its workers) to free memory
  await scheduler.terminate();
  return results;
};
The breakthrough was implementing worker pools that could process multiple pages simultaneously without overwhelming the browser. Each worker operated independently, but a scheduler coordinated them and balanced the load intelligently, bringing processing times for multi-page PDFs down to as little as 15 seconds.
The real innovation was reducing the overhead of working with segmented documents and making them more usable. Instead of just extracting text, I built infrastructure that allowed users to interact directly with the PDF text layer. This meant developing a system that could draw bounding boxes directly onto the PDF, enabling users to see and interact with detected text regions visually.
My contribution focused on the technical infrastructure: creating efficient coordinate transformation pipelines, optimizing memory usage during processing, and building a seamless user experience that made complex OCR operations feel instantaneous. The system handles PDF.js for page-to-image conversion, manages Tesseract worker coordination, implements visual overlay rendering with Jimp, and reconstructs annotated PDFs using PDF-lib.
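To give a sense of the coordinate work involved, here is a simplified sketch (field names assumed) of mapping a Tesseract pixel bounding box, which uses a top-left origin, into PDF user space, which uses a bottom-left origin:

// Tesseract reports pixel coordinates with y growing downward
type OCRBox = { x0: number; y0: number; x1: number; y1: number };

// pageHeightPts: page height in PDF points; scale: pixels per point used when rasterizing
function toPdfRect(box: OCRBox, pageHeightPts: number, scale: number) {
  const x = box.x0 / scale;
  const y = pageHeightPts - box.y1 / scale; // flip the y-axis
  const width = (box.x1 - box.x0) / scale;
  const height = (box.y1 - box.y0) / scale;
  return { x, y, width, height };
}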
Working with multiple AI models presented an interesting challenge: how do you leverage their different strengths effectively? I implemented two distinct systems to address this.
First, I built a comparison interface that allows users to analyze documents with 2-3 AI models simultaneously and see their results side by side. The CompareModelsModal component runs parallel analysis requests and displays the results in a tabbed interface with both analysis and chat capabilities.
// Parallel analysis execution with proper error handling
const analysisResults = await Promise.allSettled(
  models.map((model) =>
    analyzeModel(model, currentAbortController.signal)
  )
);
This comparison system lets users see how different models interpret the same document — GPT-4o might excel at extracting technical details, while Claude might provide better structural analysis, and Gemini might catch nuances others miss. Each model maintains its own chat thread, so users can ask follow-up questions to specific models based on their analysis strengths.
The second approach tackles a different problem: what if you want the best insights from multiple models combined into a single result? I developed a sophisticated fusion algorithm that intelligently merges analysis results:
export function selectUnion(responses: DocumentAnalysisSchema[]) {
  const unionSummary = extractTopSentences(
    responses.map((r) => r.summary ?? "")
  );

  const unionTags: string[] = [];
  responses.forEach((resp) => {
    resp.content_tags.forEach((tag) => {
      if (!unionTags.some((existing) => areTagsSimilar(existing, tag))) {
        unionTags.push(tag);
      }
    });
  });

  return {
    ...responses[0],
    summary: unionSummary,
    content_tags: unionTags,
    layout_evidence: {
      ...responses[0].layout_evidence,
      outline_points: mergeOutlinePoints(responses),
    },
  };
}
The key innovation was using Levenshtein distance to identify semantic similarities between results. Instead of naively combining everything, the algorithm intelligently deduplicates while preserving the most informative content from each model.
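Since areTagsSimilar isn't shown above, here is a sketch of how a Levenshtein-based similarity check can work; the normalization and the 0.3 threshold are illustrative assumptions:

// Classic dynamic-programming edit distance
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Tags count as similar when their normalized edit distance is below a threshold
function areTagsSimilar(a: string, b: string, threshold = 0.3): boolean {
  const x = a.toLowerCase().trim();
  const y = b.toLowerCase().trim();
  const maxLen = Math.max(x.length, y.length);
  return maxLen === 0 || levenshtein(x, y) / maxLen <= threshold;
}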
My implementation showed that different similarity thresholds worked optimally for different content types.
For summaries, I implemented sentence extraction and ranking by information density, creating coherent merged summaries that capture insights from all analyzed perspectives. Together, these comparison and fusion systems provide users with both granular control and intelligent automation when working with multiple AI models.
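As a rough illustration of the sentence-ranking idea (the production scoring differs), a stand-in for extractTopSentences might rank sentences by their count of unique words:

function extractTopSentences(summaries: string[], topN = 5): string {
  // Split each summary into sentences and drop empties
  const sentences = summaries
    .flatMap((s) => s.split(/(?<=[.!?])\s+/))
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  // Score by unique-word count as a crude proxy for information density
  const scored = sentences.map((sentence) => ({
    sentence,
    score: new Set(sentence.toLowerCase().match(/[a-z0-9]+/g) ?? []).size,
  }));

  // Deduplicate exact repeats across models, then keep the top-ranked sentences
  const seen = new Set<string>();
  return scored
    .filter(({ sentence }) => !seen.has(sentence) && !!seen.add(sentence))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((s) => s.sentence)
    .join(" ");
}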
Traditional PDF viewers are static. You scroll through pages linearly, hoping to find what you need. I wanted to change that completely.
I built a smart navigation system that can jump to any section mentioned in the AI-extracted outline:
const scrollToSection = useCallback(async (sectionText: string) => {
  for (let pageNum = 1; pageNum <= pdfDocument.numPages; pageNum++) {
    const page = await pdfDocument.getPage(pageNum);
    const textContent = await page.getTextContent();
    const matchingItems = textContent.items.filter((item: any) =>
      item.str && item.str.includes(sectionText)
    );
    // Handle multi-span text detection
    // Complex DOM manipulation and scroll logic
  }
}, [pdfDocument]);
The trickiest part was handling section headings that span multiple PDF text elements. A single heading like "Chapter 3: Advanced Techniques" might be split across several text spans with different fonts and positions.
I solved this by analyzing spatial relationships, font sizes, and positioning to logically group related text elements into complete headings.
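A simplified sketch of that grouping step, assuming PDF.js text items whose transform array carries the x position at index 4 and the y position at index 5; items with baselines within a small tolerance merge into one logical heading line before matching:

type TextItem = { str: string; transform: number[] };

function groupIntoLines(items: TextItem[], yTolerance = 2): string[] {
  const lines = new Map<number, TextItem[]>();
  for (const item of items) {
    const y = item.transform[5];
    // Reuse an existing line whose baseline sits within tolerance of this item
    const key = [...lines.keys()].find((k) => Math.abs(k - y) <= yTolerance) ?? y;
    const line = lines.get(key) ?? [];
    line.push(item);
    lines.set(key, line);
  }
  // Order each line's spans left to right and join them into one heading string
  return [...lines.values()].map((line) =>
    line
      .sort((a, b) => a.transform[4] - b.transform[4])
      .map((i) => i.str)
      .join("")
  );
}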
Document security was non-negotiable, and this became one of my primary implementation areas. I built a hybrid storage system combining local-first architecture with secure cloud storage using AES-256-GCM authenticated encryption.
The system stores document metadata locally using Dexie.js for rapid access (achieving query speeds as low as 0.37ms), while encrypted document content lives in Supabase cloud storage. Encryption keys are managed through HTTP-only cookies, making them inaccessible to client-side JavaScript.
import { createCipheriv, randomBytes } from "node:crypto";

const ALGORITHM = "aes-256-gcm";

const encryptFile = async (file: Blob, encryptionKey: string): Promise<Blob> => {
  const buffer = await file.arrayBuffer();
  const iv = randomBytes(16); // fresh random IV per file
  const cipher = createCipheriv(ALGORITHM, Buffer.from(encryptionKey, "hex"), iv);
  const encryptedData = Buffer.concat([cipher.update(Buffer.from(buffer)), cipher.final()]);
  const authTag = cipher.getAuthTag(); // GCM tag used to verify integrity on decryption

  // Concatenate IV + authTag + encryptedData so decryption can recover each part
  const result = new Uint8Array(iv.length + authTag.length + encryptedData.length);
  result.set(iv, 0);
  result.set(authTag, iv.length);
  result.set(encryptedData, iv.length + authTag.length);

  return new Blob([result], { type: "application/octet-stream" });
};
Documents are encrypted before storage, with keys managed through HTTP-only cookies. Even if someone gained access to the storage system, they'd only find encrypted blobs without the keys to decrypt them.
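For completeness, a decryption counterpart can mirror that byte layout; this sketch isn't from the original codebase, so treat the details as assumptions:

import { createDecipheriv } from "node:crypto";

const decryptFile = async (blob: Blob, encryptionKey: string): Promise<Blob> => {
  const data = Buffer.from(await blob.arrayBuffer());

  // Layout mirrors encryptFile: 16-byte IV, 16-byte auth tag, then ciphertext
  const iv = data.subarray(0, 16);
  const authTag = data.subarray(16, 32);
  const ciphertext = data.subarray(32);

  const decipher = createDecipheriv("aes-256-gcm", Buffer.from(encryptionKey, "hex"), iv);
  decipher.setAuthTag(authTag); // final() throws if the data was tampered with
  const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);

  return new Blob([plaintext]);
};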
This hybrid approach solved the challenge of providing fast, responsive interactions (through local metadata storage) while maintaining security and cross-device access (through encrypted cloud storage). My implementation of optimal database indexing and query patterns became crucial for maintaining performance as document collections grew.
The chat component of the comparison system was technically challenging. Each model in the side-by-side view maintains its own conversation thread, but they all need to respond to synchronized user inputs.
The challenge was managing conversation state across multiple models with different response times and failure scenarios. I built a system where each ModelChatBox component maintains independent conversation state while sharing user inputs through a global store:
import { useChat } from "ai/react";
import { useEffect } from "react";

const ModelChatBox = ({ model, analysis }: ModelChatBoxProps) => {
  const { messages, isLoading, append } = useChat({
    api: `/api/chat/${model.versionedId}`,
    body: {
      fileContent: analysis?.content || "",
      contentTags: analysis?.contentTags || [],
      documentType: analysis?.classification || "Unknown",
    },
  });

  // Synchronized message handling across all model instances;
  // userMessages comes from the shared global store (see the store sketch below)
  useEffect(() => {
    if (userMessages.length > 0) {
      const lastUserMessage = userMessages[userMessages.length - 1];
      if (!lastUserMessage.appended && !isProcessingMessageRef.current) {
        append({ role: "user", content: lastUserMessage.content });
      }
    }
  }, [userMessages]);
};
Each chat instance receives full document context including extracted content, content tags, and document classification. This enables contextually aware responses that reference specific parts of the analyzed document. The system handles model failures gracefully — if one model's analysis fails, its chat becomes unavailable, but others continue functioning normally.
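The global store itself isn't shown above; here is a minimal sketch of the idea using zustand, which stands in as an assumption for whatever store the app actually uses:

import { create } from "zustand";

interface SharedChatState {
  userMessages: { content: string; appended: boolean }[];
  sendToAll: (content: string) => void;
}

// Every ModelChatBox subscribes to userMessages; pushing one entry here
// fans the message out to all model-specific chat threads
const useSharedChat = create<SharedChatState>((set) => ({
  userMessages: [],
  sendToAll: (content) =>
    set((state) => ({
      userMessages: [...state.userMessages, { content, appended: false }],
    })),
}));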
With all these features, performance became critical. My focus on UI optimizations led to implementing several key improvements:
Parallel Processing: AI model analyses run concurrently with proper abort controllers for cleanup
Memory Management: Worker pools for OCR processing prevent memory leaks during extended sessions
Virtualized Rendering: Intersection observers enable smooth scrolling through large documents without rendering all pages simultaneously (see the sketch after this list)
Debounced Search: Real-time fuzzy search using Fuse.js without performance degradation
Smart Caching: Intelligent caching of analysis results and document metadata
Database Performance: Optimized Dexie.js queries with proper indexing achieving sub-millisecond retrieval times
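To illustrate the virtualized rendering item above: renderPage and releasePage below are hypothetical stand-ins for the actual PDF.js render and cleanup calls.

// Hypothetical hooks into the real PDF.js rendering pipeline
declare function renderPage(pageNum: number): void;
declare function releasePage(pageNum: number): void;

// Render a page only when its placeholder nears the viewport,
// and free it again once it scrolls far enough away
const observer = new IntersectionObserver(
  (entries) => {
    for (const entry of entries) {
      const pageNum = Number((entry.target as HTMLElement).dataset.page);
      if (entry.isIntersecting) {
        renderPage(pageNum);
      } else {
        releasePage(pageNum);
      }
    }
  },
  { rootMargin: "200% 0px" } // begin rendering two viewport-heights early
);

document.querySelectorAll<HTMLElement>("[data-page]").forEach((el) => observer.observe(el));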
The result was a platform that could handle documents of any size while maintaining responsive interactions — a significant improvement over our initial implementation that struggled with performance bottlenecks.
I chose a local-first approach using IndexedDB through Dexie.js. This meant building a database system with indexed metadata tables, optimized query patterns, and integrated fuzzy search.
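A sketch of what such a Dexie schema can look like; the table and field names are assumptions for illustration:

import Dexie, { type Table } from "dexie";

interface DocumentMeta {
  id?: number;
  title: string;
  classification: string;
  contentTags: string[];
  updatedAt: number;
}

class GoodnotesLMDB extends Dexie {
  documents!: Table<DocumentMeta, number>;

  constructor() {
    super("goodnoteslm");
    // ++id auto-increments; *contentTags builds a multi-entry index over the array
    this.version(1).stores({
      documents: "++id, title, classification, updatedAt, *contentTags",
    });
  }
}

const db = new GoodnotesLMDB();

// Example: indexed lookup of every document carrying a given tag
const tagged = await db.documents.where("contentTags").equals("research").toArray();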
The search implementation deserves special mention. I integrated Fuse.js for fuzzy search across document titles, classifications, and content tags, with intelligent result ranking and semantic similarity matching.
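Here is a sketch of that Fuse.js setup; the keys, weights, and threshold shown are illustrative rather than the production values:

import Fuse from "fuse.js";

interface DocMeta {
  title: string;
  classification: string;
  contentTags: string[];
}

const documents: DocMeta[] = []; // populated from the local metadata store

const fuse = new Fuse(documents, {
  keys: [
    { name: "title", weight: 0.5 },
    { name: "contentTags", weight: 0.3 },
    { name: "classification", weight: 0.2 },
  ],
  threshold: 0.4,     // lower values demand closer matches
  includeScore: true, // expose match quality for result ranking
});

const results = fuse.search("neural networks").map((r) => r.item);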
Building GoodnotesLM for our client while collaborating within our CodeLab Davis team taught us several important lessons:
OCR Infrastructure Complexity: Building efficient processing pipelines around existing OCR solutions proved more challenging than expected — optimizing worker coordination and memory management became critical for performance.
Multi-Model Fusion Engineering: Different AI models excel at different aspects of document analysis — developing algorithms to merge their strengths proved more valuable than relying on any single provider.
Hybrid Storage Architecture: Balancing local performance with cloud security required careful implementation of encryption patterns and database optimization strategies.
UI Performance at Scale: Memory management and rendering optimization became critical when processing large documents with multiple AI models simultaneously.
Team Collaboration Drives Innovation: Working closely with my CodeLab Davis teammates enabled solutions that none of us could have achieved individually, while the complete rewrite approach allowed implementing optimal solutions without legacy constraints.
The final platform successfully processes documents of various types and sizes, provides accurate AI-powered analysis across multiple models, and delivers an intuitive user experience for document exploration.
Users can upload a PDF, get instant analysis from multiple AI models, navigate through documents using AI-extracted outlines, chat with their documents using any supported AI model, and compare analysis results side by side.
This project for our client Goodnotes pushed the boundaries of what's possible with document analysis while maintaining excellent performance and user experience. The complete rewrite approach enabled implementing optimal solutions without the constraints of our initial broken implementation.
The platform demonstrates that with careful architecture, focused research, and attention to performance, it's possible to create sophisticated AI-powered applications that feel fast and responsive, even when doing complex processing behind the scenes.
Building GoodnotesLM from scratch was an incredible learning experience that combined cutting-edge AI integration, complex document processing, real-time systems, and thoughtful user experience design. While I implemented all the features during the rewrite, my primary contributions focused on building efficient infrastructure around OCR processing, developing multi-model comparison algorithms, implementing the hybrid storage system, and optimizing UI performance that made the platform viable.
The collaboration within our CodeLab Davis team brought together diverse technical perspectives that made the final product far stronger than any individual effort could have achieved. The future of document analysis is multi-modal, intelligent, and interactive — this project represents just the beginning of what's possible when academic research teams tackle real-world industry challenges.
This project was developed by CodeLab Davis for our industry client Goodnotes. For the complete product story including user research, design process, and team collaboration, read our comprehensive project overview. Want to dive deeper into any of these technical challenges? The intersection of academic research and practical client work continues to drive innovation in document analysis.