What would you ask W.E.B. Du Bois if you could talk to his writing?
Enter the Archive
Not a chatbot. Not a guess. An answer grounded in his own words, with citations you can check.
Type your question and press Enter
From your question to a citation-backed answer — every step is transparent.
Four simple steps: You ask → It searches → It finds passages → It answers with quotes. Click each step below to learn more.
You type a question
Archive searches Du Bois's works
Embedding + Vector Search
Finds matching passages
Top-K Passage Retrieval
Writes an answer with quotes
LLM Synthesis + Citation
You type a question in plain English. "What did Du Bois think about the color line?" The system takes your words and prepares them for search.
Technically, your question is converted into a vector embedding — a list of numbers that captures the meaning of your question. Two questions with similar meaning (like "What about the color line?" and "How did Du Bois describe racial division?") will have similar embeddings, even though they use different words.
The system doesn't search for keywords — it searches for meaning. Your question's embedding is compared to every passage in the archive to find the most relevant matches.
The archive uses pgvector (a PostgreSQL extension) for similarity search. Each passage in the corpus is an 800-token chunk with 200-token overlap. The embedding model converts both your question and each passage into vectors, then finds the closest matches using cosine similarity. We retrieve the top results whose similarity score is 0.7 or higher.
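In production this ranking happens inside PostgreSQL via pgvector, but the logic it performs can be sketched in a few lines. The sketch below uses tiny toy vectors (real embeddings have hundreds of dimensions) and the 0.7 threshold mentioned above; the function names are illustrative, not the archive's actual API.

```typescript
// Minimal sketch of the ranking step pgvector performs in SQL.
// Embeddings here are toy 2-D vectors; real ones have hundreds of dimensions.

type Passage = { id: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank passages by similarity to the question embedding,
// keeping only those at or above the threshold.
function rankPassages(question: number[], passages: Passage[], threshold = 0.7) {
  return passages
    .map((p) => ({ id: p.id, score: cosineSimilarity(question, p.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score);
}
```

Cosine similarity measures the angle between vectors, not their length, which is why two differently worded questions about the same idea can score as near-identical.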
The archive pulls the most relevant passages from Du Bois's works. These are the raw source materials — his actual words.
Retrieval quality depends on chunk size and overlap. Chunks that are too small lose context; chunks that are too large dilute the signal with irrelevant text. The 800-token chunk with 200-token overlap was validated across the full corpus. The system returns the top matches, but always applies .slice(0, topK) as a guard: never trust the result count reported by the vector database.
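The chunking scheme described above can be sketched as follows. This is an illustrative approximation: a real pipeline counts model tokens with a tokenizer, while here whitespace-separated words stand in for tokens. The `.slice(0, topK)` guard from the text is also shown.

```typescript
// Sketch of overlapping chunking (800-token chunks, 200-token overlap).
// Real systems count model tokens; words stand in for tokens here.

function chunkText(text: string, chunkSize = 800, overlap = 200): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = chunkSize - overlap; // each chunk starts 600 tokens after the last
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) break; // final chunk reached the end
  }
  return chunks;
}

// Defensive cap on the result count, as the text notes:
// never trust the vector database to honor topK.
function capResults<T>(results: T[], topK: number): T[] {
  return results.slice(0, topK);
}
```

The 200-token overlap means a sentence falling at a chunk boundary still appears intact in at least one chunk, so it remains retrievable.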
A language model reads the retrieved passages and composes a coherent answer. The key: it must cite which passage each claim comes from. No citation = no trust.
The LLM (Claude) receives the retrieved passages as context, along with instructions to ground every claim in the source material and provide citations. This is the "generation" in Retrieval Augmented Generation. The model doesn't use its training data about Du Bois — it synthesizes answers only from the retrieved passages. This constrains hallucination but doesn't eliminate it entirely, which is why citation verification matters.
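The grounding step described above amounts to careful prompt assembly. The sketch below shows the general shape, not the archive's actual prompt (which is not reproduced in this text): numbered passages go in as context, and the model is instructed to cite them and to refuse when they are insufficient.

```typescript
// Hypothetical sketch of how retrieved passages become LLM context.
// The real system's instructions differ; this shows the RAG grounding pattern.

type Retrieved = { work: string; location: string; text: string };

function buildPrompt(question: string, passages: Retrieved[]): string {
  // Number each passage so the model can cite it as [1], [2], ...
  const context = passages
    .map((p, i) => `[${i + 1}] ${p.work}, ${p.location}:\n${p.text}`)
    .join("\n\n");
  return [
    "Answer ONLY from the passages below. Cite each claim as [n].",
    "If the passages do not answer the question, say so instead of guessing.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because the passages arrive numbered, a downstream check can verify that every [n] in the answer points at a passage that actually supports the claim.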
Not all AI responses are created equal. Can you tell which one to trust?
Your task: Three responses to the same question. One has real citations, one has no citations, and one has fake citations. Click the one you trust most.
Hallucination in RAG systems: Even with retrieval augmentation, LLMs can generate plausible but unsourced claims. They may also attribute real quotes to wrong works or fabricate citations entirely. The minimum standard for trust is a verifiable citation — a passage you can look up in the original text.
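The "verifiable citation" standard above can be enforced mechanically. A minimal sketch, assuming quotes and passages are plain strings: a quoted claim counts as verifiable only if it actually occurs in one of the retrieved source passages. (Real systems may use fuzzier matching; exact substring matching after normalization is the simplest version.)

```typescript
// Minimal check behind "no citation = no trust": a quote is verifiable
// only if it actually occurs in a retrieved source passage.

function normalize(s: string): string {
  return s.toLowerCase().replace(/[^\w\s]/g, "").replace(/\s+/g, " ").trim();
}

function quoteIsVerifiable(quote: string, passages: string[]): boolean {
  const q = normalize(quote);
  return passages.some((p) => normalize(p).includes(q));
}
```

This is exactly the check the quiz above invites you to run by hand: the fake-citation response fails it, because its quote appears nowhere in the cited work.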
Question: "What did Du Bois say about education?"
Du Bois argued that education was the key to racial uplift. He wrote: "The Talented Tenth of the Negro race must be made leaders of thought and missionaries of culture among their people."
Source: "The Talented Tenth" (1903), paragraph 3
Has citation
Du Bois believed deeply in the power of education and thought that every Black person should have access to university-level learning. He spent his entire life fighting for educational equality across the country.
No citation
Du Bois stated: "Education is the great equalizer of the conditions of men, the balance wheel of the social machinery."
Source: The Souls of Black Folk (1903), Chapter 12
Fake citation
Five questions. Five real archive responses. His words, his works, verifiable.
Think about it: Du Bois wrote this in 1903. Was he right? Is the color line still a problem in the 21st century? What evidence would you use to argue your position?
Retrieval analysis: This passage is one of the most frequently retrieved chunks in the archive because it appears in multiple works, so many related queries land near it in embedding space. A good RAG system should retrieve this passage for many related queries. But does high retrieval frequency mean it's always the best passage for a given question? Consider: if you asked "How did Du Bois's views on race change over time?", would this same passage be the right answer?
What stands out: Du Bois calls the church the "social centre" — not just a place of worship. What institutions in your community serve a similar role today?
Methodology note: The Philadelphia Negro (1899) was the first major sociological case study of an African American community. Du Bois combined door-to-door surveys, census data, and ethnographic observation. The archive retrieves this passage from a work that pioneered American sociology — a fact that enriches the citation beyond the quote itself.
Debate it: Du Bois vs. Washington — should education focus on practical skills or on developing leaders and thinkers? Is this still a relevant debate today?
Retrieval note: The Du Bois-Washington debate is well-represented in the corpus across multiple works. A nuanced question about this debate should retrieve passages from both The Souls of Black Folk (Chapter III: "Of Mr. Booker T. Washington and Others") and "The Talented Tenth." Evaluate: does the system retrieve from multiple relevant works, or does it over-index on the most famous passage?
Amazing fact: Du Bois and his students drew 63 data visualizations by hand — no computers, no software. These plates are now in the Library of Congress and are considered masterpieces of data visualization.
Archive limitation: The Paris Exposition plates are visual artifacts, not text. A text-based RAG system can retrieve Du Bois's writing about the exposition but not the plates themselves. This highlights a structural limitation of text-only archives when the primary sources are visual. A multimodal archive would retrieve the actual plates as evidence.
Think about it: Du Bois believed that if you showed people the data, they would have to change their minds. Did it work? Does data change minds today?
Evaluation exercise: This response draws from multiple sources in the corpus. Rate the retrieval: did the system find the best passages to answer this question, or just the most frequently matched ones? What passage would you have retrieved if you were building the search?
The most honest thing an archive can do is tell you what it doesn't know.
What happens when you ask a question the archive can't answer from its sources? That's not a failure — that's honesty. A system that admits its limits is more trustworthy than one that always has an answer.
Corpus limitations as a feature, not a bug. The archive contains 27 works from 1896–1920. The decision about what to include — and what to exclude — is a curatorial choice that shapes every answer the system can give. What's IN the archive is a statement about what matters.
The archive's answer: It shouldn't answer this confidently. Du Bois died in 1963. Social media didn't exist. A responsible RAG system should flag this as speculative and refuse to generate a definitive answer from a corpus that predates the question by a century.
The problem: An unconstrained LLM would happily generate "Du Bois would have viewed social media as a tool for..." — plausible speculation that looks like a sourced answer. RAG should prevent this, but only if the system is designed to say "I cannot answer this from the available sources" when retrieval confidence is low.
The archive's answer: The 27-work corpus (1896–1920) captures Du Bois's early career. His relationship with communism evolved over decades — he joined the Communist Party in 1961 at age 93. The archive can speak to his early socialist leanings but cannot address his full political evolution.
Temporal bias: A corpus limited to 1896–1920 will systematically underrepresent Du Bois's later radicalization. Any answer the system gives about his political beliefs will be incomplete by design. Users must understand the temporal boundaries of the archive to evaluate answers correctly.
The archive's answer: Du Bois's autobiography is not in the corpus — The Autobiography of W.E.B. Du Bois (published posthumously, 1968) is likely still under copyright. Some childhood references appear in Darkwater (1920), but the archive cannot provide a comprehensive answer about his early life.
Copyright as curation: The decision to exclude The Autobiography and Dusk of Dawn (1940) is a copyright decision that has intellectual consequences. A user asking about Du Bois's life gets a corpus shaped by legal constraints, not by scholarly completeness. This is true of every digital archive — the legal landscape shapes what knowledge is accessible.
Now it's your turn. Craft a question, predict the answer, and evaluate what a good response looks like.
You've thought critically about what makes an archive response trustworthy. A good answer cites specific works, provides quotable passages, and acknowledges what it doesn't know. You're ready to use the Living Archive like a scholar.
You've mapped the full pipeline from question to evaluation. This is how archivists and AI researchers think about retrieval systems — not just "does it answer?" but "does it answer well, from the right sources, with verifiable claims?" You're ready to contribute to the conversation about how we build ethical AI archives.
"The true college will ever have one goal — not to earn meat, but to know the end and aim of that life which meat nourishes."— W.E.B. Du Bois, The Souls of Black Folk (1903)