
Build an AI News Research Assistant with Bright Data and the Vercel AI SDK

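This guide walks you through building an AI news research assistant with Bright Data and the Vercel AI SDK. The assistant scrapes news from around the world, gets past paywalls, detects coverage bias, and runs intelligent analysis.
5 min read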

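The system searches news across global sources, extracts full article content (bypassing paywalls), detects coverage bias, and generates intelligent analysis grounded in real news events.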

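What you will learn:

  • How to use the Bright Data SDK for web scraping to build news research tools
  • How to use the Vercel AI SDK for intelligent news analysis and conversation
  • How to get past paywalls and anti-bot protection to access any news source
  • How to detect bias by comparing coverage across multiple outlets
  • How to build an automated pipeline that runs from news discovery to fact-checking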

Let's get started!

Prerequisites

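To follow along, you need:

  • Basic knowledge of React and Next.js
  • Node.js 20.18.1+ installed in your local development environment
  • A Bright Data account with API access (a free tier is available)
  • An OpenAI API key with access to GPT-4o, the model used in this tutorial
  • Familiarity with TypeScript and modern JavaScript
  • A basic understanding of working with APIs and environment variables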
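Running node --version is the quickest way to confirm the Node.js requirement. If you want a scripted check, the minimal sketch below compares plain major.minor.patch strings. The helpers parseVersion and meetsMinimumVersion are hypothetical and not part of this project, and pre-release suffixes such as -rc.1 are not handled.

```typescript
// Hypothetical helper: splits "v20.18.1" or "20.18.1" into numeric components.
// Non-numeric components fall back to 0; pre-release tags are not supported.
function parseVersion(version: string): number[] {
  return version
    .replace(/^v/, "") // accept the "v" prefix printed by `node --version`
    .split(".")
    .map((part) => Number.parseInt(part, 10) || 0);
}

// Returns true when `current` is greater than or equal to `minimum`.
function meetsMinimumVersion(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y; // the first differing component decides
  }
  return true; // all components equal
}

// In Node.js you could pass process.versions.node as `current`.
console.log(meetsMinimumVersion("v22.3.0", "20.18.1")); // true
console.log(meetsMinimumVersion("20.11.0", "20.18.1")); // false
```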

The challenges of traditional news consumption

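Traditional ways of getting information and news have several key limitations:

  • Information overload: You see hundreds of headlines every day, which makes it hard to tell which ones matter to you or match your interests.
  • Bias and missing perspectives: Most people get their news from a handful of sources, so they miss important viewpoints, and coverage of complex issues is often one-sided.
  • Paywalls: High-quality journalism is often behind paywalls, which makes it hard to collect full articles from multiple sources for in-depth research.
  • Fact-checking burden: Verifying a claim means searching several sources, cross-referencing them, and judging their credibility, and many people don't have time for that.
  • Lack of context: Breaking news often comes without historical background or the thread of related events, so it is hard to see the full picture.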

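NewsIQ addresses these challenges. It combines AI-powered analysis with enterprise-grade web scraping so it can access any news source (getting past anti-bot protection), compare coverage across multiple outlets, and deliver intelligent insights with credible source citations.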

Building the news research assistant

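We will use Bright Data and the Vercel AI SDK to build NewsIQ, a complete AI news research assistant. The result can process news from any source and deliver intelligent analysis through a conversational interface.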

Step 1: Project setup

First, set up your Next.js development environment by creating a new project directory:

npx create-next-app@latest ai-news-assistant

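When prompted, select the recommended settings. The code in this guide assumes TypeScript, Tailwind CSS, the App Router, and the default @/* import alias.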

Change into the project directory and install the required dependencies:

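cd ai-news-assistant
npm install @brightdata/sdk ai zod @ai-sdk/openai @ai-sdk/react

These packages provide everything you need: the Bright Data SDK for web scraping, the Vercel AI SDK (ai) for intelligent analysis, @ai-sdk/react for the useChat hook used by the chat interface, Zod for type-safe schema validation, and the OpenAI provider (@ai-sdk/openai) for LLM text generation.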


Next, create a .env.local file to store your API credentials:

BRIGHTDATA_API_KEY=your_brightdata_api_key_here
OPENAI_API_KEY=your_openai_api_key_here

You will need:

  • A Bright Data API key: generate one in the Bright Data control panel
  • An OpenAI API key: used for LLM text generation
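The API route in Step 3 reads process.env.BRIGHTDATA_API_KEY!, where the ! only silences TypeScript, so a missing key shows up later as a confusing runtime error. A small guard can fail fast with a readable message instead. The sketch below is a hypothetical requireEnv helper, not part of either SDK; it takes the environment as a plain object so it can be tested without touching real variables.

```typescript
// Hypothetical helper: returns the requested variables, or throws a single
// error that lists every missing or blank name.
function requireEnv<K extends string>(
  env: Record<string, string | undefined>,
  names: readonly K[]
): Record<K, string> {
  const missing = names.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const result = {} as Record<K, string>;
  for (const name of names) {
    result[name] = env[name] as string; // safe: checked above
  }
  return result;
}

// Possible usage in a route handler (Next.js loads .env.local into process.env):
// const { BRIGHTDATA_API_KEY } = requireEnv(process.env, ["BRIGHTDATA_API_KEY", "OPENAI_API_KEY"]);
```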

Step 2: Define the news research tools

Build the core news research functionality by defining three tools backed by Bright Data's web scraping. Create lib/brightdata-tools.ts in your project directory:

import { tool, type Tool } from "ai";
import { z } from "zod";
import { bdclient } from "@brightdata/sdk";

type NewsTools = "searchNews" | "scrapeArticle" | "searchWeb";

interface NewsToolsConfig {
  apiKey: string;
  excludeTools?: NewsTools[];
}

export const newsTools = (
  config: NewsToolsConfig
): Partial<Record<NewsTools, Tool>> => {
  const client = new bdclient({
    apiKey: config.apiKey,
    autoCreateZones: true,
  });

  const tools: Partial<Record<NewsTools, Tool>> = {
    searchNews: tool({
      description:
        "Search for news articles on any topic using Google News. Returns recent news articles with titles, snippets, sources, and publication dates. Use this for finding current news coverage on specific topics.",
      inputSchema: z.object({
        query: z
          .string()
          .describe(
            'The news search query (e.g., "artificial intelligence", "climate change policy", "tech earnings")'
          ),
        country: z
          .string()
          .length(2)
          .optional()
          .describe(
            'Two-letter country code for localized news (e.g., "us", "gb", "de", "fr", "jp")'
          ),
      }),
      execute: async ({
        query,
        country,
      }: {
        query: string;
        country?: string;
      }) => {
        try {
          const newsQuery = `${query} news`;
          const result = await client.search(newsQuery, {
            searchEngine: "google",
            dataFormat: "markdown",
            format: "raw",
            country: country?.toLowerCase() || "us",
          });
          return result;
        } catch (error) {
          return `Error searching for news on "${query}": ${String(error)}`;
        }
      },
    }),

    scrapeArticle: tool({
      description:
        "Scrape the full content of a news article from any URL. Returns the complete article text in clean markdown format, bypassing paywalls and anti-bot protection. Use this to read full articles after finding them with searchNews.",
      inputSchema: z.object({
        url: z.string().url().describe("The URL of the news article to scrape"),
        country: z
          .string()
          .length(2)
          .optional()
          .describe("Two-letter country code for proxy location"),
      }),
      execute: async ({ url, country }: { url: string; country?: string }) => {
        try {
          const result = await client.scrape(url, {
            dataFormat: "markdown",
            format: "raw",
            country: country?.toLowerCase(),
          });
          return result;
        } catch (error) {
          return `Error scraping article at ${url}: ${String(error)}`;
        }
      },
    }),

    searchWeb: tool({
      description:
        "General web search using Google, Bing, or Yandex. Use this for background research, fact-checking, or finding additional context beyond news articles.",
      inputSchema: z.object({
        query: z
          .string()
          .describe(
            "The search query for background information or fact-checking"
          ),
        searchEngine: z
          .enum(["google", "bing", "yandex"])
          .optional()
          .default("google")
          .describe("Search engine to use"),
        country: z
          .string()
          .length(2)
          .optional()
          .describe("Two-letter country code for localized results"),
      }),
      execute: async ({
        query,
        searchEngine = "google",
        country,
      }: {
        query: string;
        searchEngine?: "google" | "bing" | "yandex";
        country?: string;
      }) => {
        try {
          const result = await client.search(query, {
            searchEngine,
            dataFormat: "markdown",
            format: "raw",
            country: country?.toLowerCase(),
          });
          return result;
        } catch (error) {
          return `Error searching web for "${query}": ${String(error)}`;
        }
      },
    }),
  };

  for (const toolName in tools) {
    if (config.excludeTools?.includes(toolName as NewsTools)) {
      delete tools[toolName as NewsTools];
    }
  }

  return tools;
};

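The code above defines three key tools with the Vercel AI SDK's tool interface. searchNews runs a Google search for recent coverage of a topic (it appends "news" to the query and defaults to US results); scrapeArticle extracts the full content of a news article from a URL (bypassing paywalls); and searchWeb handles background research and fact-checking through Google, Bing, or Yandex. Each tool validates its input with a Zod schema and returns the result as Markdown text for the AI to analyze, or an error string if the request fails, so a failed call does not crash the conversation. The Bright Data client handles the complexity of anti-bot protection and proxy management automatically.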
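newsTools also accepts an excludeTools option, for example newsTools({ apiKey, excludeTools: ["searchWeb"] }) to offer only news search and article scraping. The loop at the end of the factory deletes excluded keys in place. If you prefer not to mutate the object, the same filtering can be written as below. filterTools is a hypothetical generic helper, and the plain strings stand in for the tool objects the real factory builds.

```typescript
// Hypothetical helper: returns a new object without the excluded keys.
// Same effect as the delete loop in newsTools, but the input is left untouched.
function filterTools<K extends string, T>(
  tools: Partial<Record<K, T>>,
  exclude: readonly K[] = []
): Partial<Record<K, T>> {
  return Object.fromEntries(
    Object.entries(tools).filter(([name]) => !exclude.includes(name as K))
  ) as Partial<Record<K, T>>;
}

type NewsTools = "searchNews" | "scrapeArticle" | "searchWeb";

// Placeholder values; in the real factory these are tool({ ... }) results.
const all: Partial<Record<NewsTools, string>> = {
  searchNews: "search",
  scrapeArticle: "scrape",
  searchWeb: "web",
};

console.log(Object.keys(filterTools(all, ["searchWeb"])));
// [ 'searchNews', 'scrapeArticle' ]
```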

Step 3: Create the AI chat API route

Build the API endpoint that powers the conversational interface. Create app/api/chat/route.ts:

import { openai } from "@ai-sdk/openai";
import { streamText, convertToModelMessages, stepCountIs } from "ai";
import { newsTools } from "@/lib/brightdata-tools";

export const maxDuration = 60;

export async function POST(req: Request) {
  const { messages } = await req.json();
  const modelMessages = convertToModelMessages(messages);

  const tools = newsTools({
    apiKey: process.env.BRIGHTDATA_API_KEY!,
  });


  const result = streamText({
    model: openai("gpt-4o"),
    messages: modelMessages,
    tools,
    stopWhen: stepCountIs(5),
    system: `You are NewsIQ, an advanced AI news research assistant. Your role is to help users stay informed, analyze news coverage, and understand complex current events.

**Core Capabilities:**
1. **News Discovery**: Search for current news on any topic using searchNews
2. **Deep Reading**: Scrape full articles with scrapeArticle to provide complete context
3. **Fact Checking**: Use searchWeb to verify claims and find additional sources
4. **Bias Analysis**: Compare coverage across multiple sources and identify potential bias
5. **Trend Analysis**: Identify emerging stories and track how topics evolve

**Guidelines:**
- Always cite your sources with publication name and date
- When analyzing bias, be objective and provide evidence
- For controversial topics, present multiple perspectives
- Clearly distinguish between facts and analysis
- If information is outdated, note the publication date
- When scraping articles, summarize key points before analysis
- For fact-checking, use multiple independent sources

**Response Format:**
- Start with a clear, direct answer
- Provide source citations in context
- Use bullet points for multiple sources
- End with a brief analysis or insight
- Offer to explore specific aspects further

Remember: Your goal is to help users become better-informed, critical thinkers.`,
  });

  return result.toUIMessageStreamResponse();
}

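This API route creates a streaming endpoint that connects your news research tools to OpenAI's GPT-4o. The system prompt steers the AI to act as a professional news analyst, emphasizing source citations, objectivity, and critical thinking. The maxDuration export lets the route run for up to 60 seconds, which leaves room for several tool calls per request. The streamed response displays the analysis as it is generated, for a smooth conversational experience.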
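The stopWhen: stepCountIs(5) option is what allows the model to chain tools, for example search, then scrape, then answer, within one request. Conceptually, each step is one model call; if that call requests tools, their results are fed back and another step runs, until the model replies with plain text or the step limit is reached. The simulation below only illustrates that control flow; runToolLoop and the Step type are hypothetical and do not reflect how the AI SDK is implemented internally.

```typescript
// Hypothetical shape of one model step, for illustration only.
type Step = { toolCalls: string[]; text: string };

// Simulates a step-limited tool loop: keep stepping while the model requests
// tools, and never make more than maxSteps model calls in total.
function runToolLoop(
  callModel: (stepIndex: number) => Step,
  maxSteps: number
): Step[] {
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = callModel(i);
    steps.push(step);
    if (step.toolCalls.length === 0) break; // plain-text answer: stop
  }
  return steps;
}

// A fake model that searches, scrapes, and then answers on its third step.
const plan: Step[] = [
  { toolCalls: ["searchNews"], text: "" },
  { toolCalls: ["scrapeArticle"], text: "" },
  { toolCalls: [], text: "Summary with citations" },
];
const steps = runToolLoop((i) => plan[i], 5);
console.log(steps.length); // 3
```

A model that keeps requesting tools would stop after five steps, which bounds latency and API usage per request.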


Step 4: Build the chat interface

Create the user interface for interacting with NewsIQ. Replace app/page.tsx with:

```typescript
"use client";

import { useChat } from "@ai-sdk/react";
import { useState } from "react";

export default function NewsResearchAssistant() {
  const { messages, sendMessage, status } = useChat();
  const [input, setInput] = useState("");

  const [exampleQueries] = useState([
    "🌍 What are the latest developments in climate change policy?",
    "💻 Search for news about artificial intelligence regulation",
    "📊 How are different sources covering the economy?",
    "⚡ What are the trending tech stories this week?",
    "🔍 Fact-check: Did [specific claim] really happen?",
  ]);

  return (
    <div className="flex flex-col h-screen bg-gradient-to-br from-slate-50 via-blue-50 to-indigo-50">
      {/* Header */}
      <header className="bg-white shadow-md border-b border-gray-200">
        <div className="max-w-5xl mx-auto px-6 py-5">
          <div className="flex items-center gap-3">
            <div className="bg-gradient-to-br from-blue-600 to-indigo-600 w-12 h-12 rounded-xl flex items-center justify-center shadow-lg">
              <span className="text-2xl">📰</span>
            </div>
            <div>
              <h1 className="text-2xl font-bold text-gray-900">NewsIQ</h1>
              <p className="text-sm text-gray-600">
                AI-Powered News Research & Analysis
              </p>
            </div>
          </div>
        </div>
      </header>

      {/* Main Chat Area */}
      <div className="flex-1 overflow-hidden max-w-5xl w-full mx-auto px-6 py-6">
        <div className="h-full flex flex-col bg-white rounded-2xl shadow-xl border border-gray-200">
          {/* Messages Container */}
          <div className="flex-1 overflow-y-auto p-6 space-y-6">
            {messages.length === 0 ? (
              <div className="h-full flex flex-col items-center justify-center text-center px-4">
                {/* Welcome Screen */}
                <div className="bg-gradient-to-br from-blue-500 to-indigo-600 w-20 h-20 rounded-2xl flex items-center justify-center mb-6 shadow-lg">
                  <span className="text-4xl">📰</span>
                </div>
                <h2 className="text-3xl font-bold text-gray-900 mb-3">
                  Welcome to NewsIQ
                </h2>
                <p className="text-gray-600 mb-8 max-w-2xl text-lg">
                  Your AI-powered research assistant for news analysis,
                  fact-checking, and staying informed. I can search across news
                  sources, analyze bias, and help you understand complex
                  stories.
                </p>

                {/* Feature Pills */}
                <div className="flex flex-wrap gap-3 justify-center mb-8">
                  <div className="px-4 py-2 bg-blue-100 text-blue-700 rounded-full text-sm font-medium">
                    🔍 Multi-Source Research
                  </div>
                  <div className="px-4 py-2 bg-purple-100 text-purple-700 rounded-full text-sm font-medium">
                    🎯 Bias Detection
                  </div>
                  <div className="px-4 py-2 bg-green-100 text-green-700 rounded-full text-sm font-medium">
                    ✓ Fact Checking
                  </div>
                  <div className="px-4 py-2 bg-orange-100 text-orange-700 rounded-full text-sm font-medium">
                    📊 Trend Analysis
                  </div>
                </div>

                {/* Example Queries */}
                <div className="w-full max-w-3xl">
                  <p className="text-sm font-semibold text-gray-700 mb-4">
                    Try asking:
                  </p>
                  <div className="grid grid-cols-1 md:grid-cols-2 gap-3">
                    {exampleQueries.map((query, i) => (
                      <button
                        key={i}
                        onClick={() => {
                          setInput(query);
                        }}
                        className="p-4 text-left bg-gradient-to-br from-gray-50 to-gray-100 hover:from-blue-50 hover:to-indigo-50 rounded-xl border border-gray-200 hover:border-blue-300 transition-all duration-200 text-sm text-gray-700 hover:text-gray-900 shadow-sm hover:shadow-md"
                      >
                        {query}
                      </button>
                    ))}
                  </div>
                </div>
              </div>
            ) : (
              // Messages Display
              messages.map((m: any) => (
                <div
                  key={m.id}
                  className={`flex ${
                    m.role === "user" ? "justify-end" : "justify-start"
                  }`}
                >
                  <div
                    className={`max-w-[85%] rounded-2xl px-5 py-4 ${
                      m.role === "user"
                        ? "bg-gradient-to-br from-blue-600 to-indigo-600 text-white shadow-lg"
                        : "bg-gray-100 text-gray-900 border border-gray-200"
                    }`}
                  >
                    <div className="flex items-center gap-2 mb-2">
                      <span className="text-lg">
                        {m.role === "user" ? "👤" : "📰"}
                      </span>
                      <span className="text-xs font-semibold opacity-90">
                        {m.role === "user" ? "You" : "NewsIQ"}
                      </span>
                    </div>
                    <div className="prose prose-sm max-w-none prose-headings:font-bold prose-h3:text-lg prose-h3:mt-4 prose-h3:mb-2 prose-p:my-2 prose-ul:my-2 prose-li:my-1 prose-a:text-blue-600 prose-a:underline prose-strong:font-semibold">
                      <div
                        className="whitespace-pre-wrap"
                        dangerouslySetInnerHTML={{
                          __html:
                            m.parts
                              ?.map((part: any) => {
                                if (part.type === "text") {
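                                  let html = part.text
                                    // Escape raw HTML so model output cannot inject markup
                                    .replace(/&/g, "&amp;")
                                    .replace(/</g, "&lt;")
                                    .replace(/>/g, "&gt;")
                                    .replace(/"/g, "&quot;")
                                    // Headers
                                    .replace(/### (.*?)$/gm, "<h3>$1</h3>")
                                    // Bold
                                    .replace(
                                      /\*\*(.*?)\*\*/g,
                                      "<strong>$1</strong>"
                                    )
                                    // Links (http/https only, so javascript: URLs are not linked)
                                    .replace(
                                      /\[([^\]]*)\]\((https?:\/\/[^\s)]+)\)/g,
                                      '<a href="$2" target="_blank" rel="noopener noreferrer">$1</a>'
                                    );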

                                  html = html.replace(
                                    /(^- .*$\n?)+/gm,
                                    (match: string) => {
                                      const items = match
                                        .split("\n")
                                        .filter((line: string) => line.trim())
                                        .map((line: string) =>
                                          line.replace(/^- /, "")
                                        )
                                        .map((item: any) => `<li>${item}</li>`)
                                        .join("");
                                      return `<ul>${items}</ul>`;
                                    }
                                  );

                                  // Paragraphs
                                  html = html
                                    .split("\n\n")
                                    .map((para: string) => {
                                      if (
                                        para.trim() &&
                                        !para.startsWith("<")
                                      ) {
                                        return `<p>${para}</p>`;
                                      }
                                      return para;
                                    })
                                    .join("");

                                  return html;
                                }
                                return "";
                              })
                              .join("") || "",
                        }}
                      />
                    </div>
                  </div>
                </div>
              ))
            )}

            {/* Loading Indicator */}
            {(status === "submitted" || status === "streaming") && (
              <div className="flex justify-start">
                <div className="bg-gray-100 rounded-2xl px-5 py-4 border border-gray-200">
                  <div className="flex items-center gap-3">
                    <div className="flex space-x-2">
                      <div className="w-2 h-2 bg-blue-500 rounded-full animate-bounce"></div>
                      <div className="w-2 h-2 bg-blue-500 rounded-full animate-bounce" style={{ animationDelay: "100ms" }}></div>
                      <div className="w-2 h-2 bg-blue-500 rounded-full animate-bounce" style={{ animationDelay: "200ms" }}></div>
                    </div>
                    <span className="text-sm text-gray-600">
                      Researching news sources...
                    </span>
                  </div>
                </div>
              </div>
            )}
          </div>

          {/* Input Area */}
          <div className="border-t border-gray-200 p-5 bg-gray-50">
            <form
              onSubmit={(e) => {
                e.preventDefault();
                if (input.trim()) {
                  sendMessage({ text: input });
                  setInput("");
                }
              }}
              className="flex gap-3"
            >
              <input
                value={input}
                onChange={(e) => setInput(e.target.value)}
                placeholder="Ask about any news topic, request analysis, or fact-check a claim..."
                className="flex-1 px-5 py-3 border border-gray-300 rounded-xl focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent bg-white shadow-sm text-gray-900 placeholder-gray-600"
                disabled={status === "submitted" || status === "streaming"}
              />

              <button
                type="submit"
                disabled={
                  status === "submitted" ||
                  status === "streaming" ||
                  !input.trim()
                }
                className="px-8 py-3 bg-gradient-to-r from-blue-600 to-indigo-600 text-white rounded-xl hover:from-blue-700 hover:to-indigo-700 disabled:opacity-50 disabled:cursor-not-allowed transition-all duration-200 font-semibold shadow-lg hover:shadow-xl"
              >
                {status === "submitted" || status === "streaming" ? (
                  <span className="flex items-center gap-2">
                    <svg className="animate-spin h-5 w-5" viewBox="0 0 24 24">
                      <circle
                        className="opacity-25"
                        cx="12"
                        cy="12"
                        r="10"
                        stroke="currentColor"
                        strokeWidth="4"
                        fill="none"
                      />
                      <path
                        className="opacity-75"
                        fill="currentColor"
                        d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"
                      />
                    </svg>
                    Analyzing
                  </span>
                ) : (
                  "Research"
                )}
              </button>
            </form>
            <div className="flex items-center justify-between mt-3">
              <p className="text-xs text-gray-500">
                Powered by Bright Data × Vercel AI SDK
              </p>
              <div className="flex gap-2">
                <span className="px-2 py-1 bg-green-100 text-green-700 rounded text-xs font-medium">
                  ✓ Real-time
                </span>
                <span className="px-2 py-1 bg-blue-100 text-blue-700 rounded text-xs font-medium">
                  🌐 Global Sources
                </span>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  );
}
```

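The interface is built on the Vercel AI SDK's useChat hook and provides an engaging conversational experience. The welcome screen offers example queries to help users get started (clicking one fills the input box), and the main chat area streams messages as they arrive. Styled with Tailwind CSS, the interface has a modern, professional look with gradient backgrounds and smooth animations. The component handles loading states gracefully and gives visual feedback while the AI is working.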
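The dangerouslySetInnerHTML block in this component is a hand-rolled converter for a small Markdown subset: ### headings, **bold**, links, "- " lists, and blank-line paragraphs. Pulling it into a plain function makes it easy to test. The sketch below is a hypothetical renderMarkdownSubset helper that follows the same regex steps, with two safety changes: it escapes HTML in the model output first, and it only turns http(s) URLs into links. For Markdown beyond this subset, a dedicated Markdown library is usually the better choice.

```typescript
// Hypothetical helper: escapes characters that would otherwise be parsed as HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper mirroring the component's regex-based Markdown handling.
function renderMarkdownSubset(markdown: string): string {
  let html = escapeHtml(markdown)
    .replace(/### (.*?)$/gm, "<h3>$1</h3>") // headings
    .replace(/\*\*(.*?)\*\*/g, "<strong>$1</strong>") // bold
    .replace(
      /\[([^\]]*)\]\((https?:\/\/[^\s)]+)\)/g, // links: http(s) only
      '<a href="$2" target="_blank" rel="noopener noreferrer">$1</a>'
    );

  // Group consecutive "- item" lines into a single <ul>.
  html = html.replace(/(^- .*$\n?)+/gm, (block) => {
    const items = block
      .split("\n")
      .filter((line) => line.trim())
      .map((line) => `<li>${line.replace(/^- /, "")}</li>`)
      .join("");
    return `<ul>${items}</ul>`;
  });

  // Wrap remaining blank-line-separated text that is not already a tag in <p>.
  return html
    .split("\n\n")
    .map((para) => (para.trim() && !para.startsWith("<") ? `<p>${para}</p>` : para))
    .join("");
}

console.log(renderMarkdownSubset("**Key point** <script>"));
// <strong>Key point</strong> &lt;script&gt;
```

Inside the component, the text branch of the parts map could then simply return renderMarkdownSubset(part.text).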


Step 5: Update the root layout

Finish setting up the app by updating app/layout.tsx with the page metadata:

import type { Metadata } from 'next'
import { Inter } from 'next/font/google'
import './globals.css'

const inter = Inter({ subsets: ['latin'] })

export const metadata: Metadata = {
  title: 'NewsIQ - AI News Research Assistant',
  description:
    'AI-powered news research, analysis, and fact-checking tool. Search across sources, detect bias, and stay informed with intelligent insights.',
  keywords: [
    'news',
    'AI',
    'research',
    'fact-checking',
    'bias detection',
    'news analysis',
  ],
}

export default function RootLayout({
  children,
}: {
  children: React.ReactNode
}) {
  return (
    <html lang="en">
      <body className={inter.className}>{children}</body>
    </html>
  )
}

This layout sets appropriate SEO metadata and loads the Inter font, giving the whole app clean, professional typography.

Step 6: Run the application

Start the app with the following command:

npm run dev

The app will start at http://localhost:3000. To test NewsIQ's capabilities, try a sample query such as:

Fact-check: Did Apple announce a new product last week?

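The AI automatically picks the right tools for your request. When you ask about news, it runs searchNews; when you ask for a full article, it scrapes the article with scrapeArticle; when you ask for a fact-check, it cross-references multiple sources. You will see the results stream in as the AI processes the information.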
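You can also exercise the API route without the UI. useChat posts the conversation to /api/chat as UI messages, and the route only reads the messages field. Assuming the AI SDK 5 message shape (an id, a role, and a parts array of { type: "text", text } items), a request body can be built as below. buildChatRequestBody and the "msg-1" id are hypothetical, and because the response is a streamed UI message stream, the commented fetch call only prints the raw chunks.

```typescript
// Hypothetical types and helper; the shape assumes AI SDK 5's UIMessage format.
type TextPart = { type: "text"; text: string };
type ChatMessage = { id: string; role: "user" | "assistant"; parts: TextPart[] };

function buildChatRequestBody(question: string): { messages: ChatMessage[] } {
  return {
    messages: [{ id: "msg-1", role: "user", parts: [{ type: "text", text: question }] }],
  };
}

const body = buildChatRequestBody("Fact-check: Did Apple announce a new product last week?");
console.log(JSON.stringify(body));

// With `npm run dev` running, something like this prints the raw stream:
// const res = await fetch("http://localhost:3000/api/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(body),
// });
// for await (const chunk of res.body!) process.stdout.write(chunk);
```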


Step 7: Deploy to Vercel

To deploy the app to production, first push your code to GitHub:

git init
git add .
git commit -m "Initial commit: NewsIQ AI News Assistant"
git branch -M main
git remote add origin https://github.com/yourusername/ai-news-assistant.git
git push -u origin main

Then deploy to Vercel:

  1. Go to vercel.com and sign in with GitHub
  2. Click "Add New Project" and import your repository
  3. Configure the environment variables:
    • Add BRIGHTDATA_API_KEY
    • Add OPENAI_API_KEY
  4. Click "Deploy"

Your app will be live within about 2-3 minutes at a URL similar to https://ai-news-assistant.vercel.app

Conclusion

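This AI news research assistant shows how automation can streamline news gathering and analysis. If you are interested in other ways to integrate Bright Data with AI tools, see our guide to web scraping with MCP servers. To further enhance your news monitoring workflow, consider Bright Data products such as the Web Scraper API for accessing any news source, along with the other datasets and automation tools built for content aggregation and media monitoring teams.

Explore more solutions in the Bright Data documentation.

Create a free Bright Data account and start building your automated news research workflow.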
