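In this guide, we walk through the following concepts related to making API calls in Python:
- What is HTTP?
- What is a REST API?
- How to make a GET request
- How to make a POST request
- How to use an SDK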
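What Is HTTP?
HTTP (Hypertext Transfer Protocol) is the standard way most data travels across the web. You may have heard that databases form the backend of most websites. That is true, but a lot happens between our client (a browser or a Python script) and the database. HTTP is the communication layer between the client and the backend server.
When using HTTP for web scraping and web API requests, these are the methods you will use most often:
- GET: The most frequently used method. Whenever you visit a website, your browser performs a GET request to fetch the HTML, then renders the page for you to view.
- POST: The second most common method. POST sends data in the request body, which suits larger payloads, usually to add something to a database. When you fill out a form or survey, or publish a post on social media, you are making a POST request.
- PUT: PUT requests update an existing entry in a database. When you edit a social media post, a PUT request is typically used under the hood.
- DELETE: If you want to delete a social media post (or anything else in the database), the browser sends a DELETE request to the server to remove it.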
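To make the four methods concrete, here is a toy sketch that involves no network at all. A dictionary stands in for the database, and the hypothetical `handle` function plays the role of the server. The status codes it returns (201, 204, 404, 405) follow common REST conventions, but real servers may choose differently.

```python
# Toy in-memory "server": a dict stands in for the database table.
posts = {}
next_id = 1


def handle(method, post_id=None, body=None):
    """Return (status_code, data) for a request, mimicking common REST conventions."""
    global next_id
    if method == "GET":
        # Read an entry without changing anything.
        return (200, posts[post_id]) if post_id in posts else (404, None)
    if method == "POST":
        # Create a new entry; the server picks the id.
        post_id = next_id
        posts[post_id] = body
        next_id += 1
        return 201, {"id": post_id}
    if method == "PUT":
        # Replace an existing entry.
        if post_id not in posts:
            return 404, None
        posts[post_id] = body
        return 200, body
    if method == "DELETE":
        # Remove an entry; 204 means "success, nothing to return".
        return (204, None) if posts.pop(post_id, None) is not None else (404, None)
    return 405, None  # method not allowed


status, created = handle("POST", body={"text": "Hello"})
print(status, created)
print(handle("PUT", created["id"], {"text": "Hello, edited"}))
print(handle("GET", created["id"]))
print(handle("DELETE", created["id"]))
print(handle("GET", created["id"]))
```

The last call returns a 404 because the entry was deleted by the call before it.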
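HTTP and Its Lack of a Return Standard
HTTP is simple, but it does not define a uniform format for returned data. Some servers return HTML by default, others return JSON, and some still return older formats such as XML or plain text.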
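One way to cope with mixed formats is to look at the response's Content-Type header before parsing. The sketch below uses only the standard library. `parse_body` is a hypothetical helper, and the set of media types it recognizes is an assumption rather than a complete list.

```python
import json
import xml.etree.ElementTree as ET


def parse_body(content_type, text):
    """Pick a parser from the Content-Type header value; fall back to raw text."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(text)
    if media_type in ("application/xml", "text/xml"):
        return ET.fromstring(text)
    # HTML and plain text are returned unchanged for the caller to handle.
    return text


print(parse_body("application/json; charset=utf-8", '{"name": "Jake"}'))
print(parse_body("text/xml", "<person><name>Jake</name></person>").find("name").text)
print(parse_body("text/html", "<h1>Quotes to Scrape</h1>"))
```

With the Requests library, the header value is available as `response.headers["Content-Type"]` and the body as `response.text`.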
First, let's make a basic GET request. If you haven't installed the Python Requests library yet, you can install it with pip.
pip install requests
Once it is installed, run the following code to make a simple GET request. Pay attention to the terminal output.
import requests
response = requests.get("https://quotes.toscrape.com")
print(response.text)
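After running this code, you will see that we received an HTML page. It looks good in a browser but is messy in a terminal. The output below is only a trimmed portion, but it gives you a sense of the raw page.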
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Quotes to Scrape</title>
<link rel="stylesheet" href="/static/bootstrap.min.css">
<link rel="stylesheet" href="/static/main.css">
</head>
<body>
<div class="container">
<div class="row header-box">
<div class="col-md-8">
<h1>
<a href="/" style="text-decoration: none">Quotes to Scrape</a>
</h1>
</div>
<div class="col-md-4">
<p>
<a href="/login">Login</a>
</p>
</div>
</div>
<div class="row">
<div class="col-md-8">
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world" / >
<a class="tag" href="/tag/change/page/1/">change</a>
<a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
<a class="tag" href="/tag/thinking/page/1/">thinking</a>
<a class="tag" href="/tag/world/page/1/">world</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>
<span>by <small class="author" itemprop="author">J.K. Rowling</small>
<a href="/author/J-K-Rowling">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="abilities,choices" / >
<a class="tag" href="/tag/abilities/page/1/">abilities</a>
<a class="tag" href="/tag/choices/page/1/">choices</a>
</div>
</div>
HTML pages are meant for browsers to read and render. They are not well suited to being parsed or integrated directly in code.
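To get a feel for the extra work HTML requires, here is a minimal sketch that pulls the quote text and author out of a trimmed copy of the page using the standard library's `html.parser`. `QuoteParser` is a hypothetical class written for this sample. It assumes the `class="text"` and `class="author"` attributes seen in the output above, and it would break if the markup changed.

```python
from html.parser import HTMLParser

# A trimmed sample of the HTML shown above.
SAMPLE_HTML = """
<div class="quote">
<span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>
<span>by <small class="author" itemprop="author">J.K. Rowling</small></span>
</div>
"""


class QuoteParser(HTMLParser):
    """Collects the text of elements whose class is "text" or "author"."""

    def __init__(self):
        super().__init__()
        self.current = None  # which field we are inside, if any
        self.quotes = []     # list of {"text": ..., "author": ...}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("text", "author"):
            self.current = css_class

    def handle_endtag(self, tag):
        # Leaving any element ends the field we were collecting.
        self.current = None

    def handle_data(self, data):
        if self.current == "text":
            self.quotes.append({"text": data.strip()})
        elif self.current == "author" and self.quotes:
            self.quotes[-1]["author"] = data.strip()


parser = QuoteParser()
parser.feed(SAMPLE_HTML)
print(parser.quotes)
```

Even for two fields, we had to track parser state by hand. A structured API response removes that step.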
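How REST (Representational State Transfer) Solves This Problem
REST APIs provide a design standard for data pipelines. JSON is the most popular response format for REST APIs: it is flexible and easy to read, and its clear syntax makes it simple to parse in a programming environment.
Below is what basic JSON looks like. Keep in mind that this is the kind of data structure we get from a REST API.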
{
"name": "Jake",
"age": 34,
"professions": ["writing", "coding"]
}
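As a quick illustration of how little effort that parsing takes, the sketch below loads the JSON above with Python's built-in `json` module. The variable names are ours, chosen for the example.

```python
import json

raw = '{"name": "Jake", "age": 34, "professions": ["writing", "coding"]}'

person = json.loads(raw)         # JSON text -> Python dict
print(person["name"])            # Jake
print(person["age"] + 1)         # 35 (numbers arrive as int, not str)
print(person["professions"][0])  # writing

# Going the other way: Python objects -> JSON text.
print(json.dumps(person, indent=4))
```

Objects become dictionaries, arrays become lists, and numbers keep their numeric type, so no manual string handling is needed.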
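REST APIs use endpoints, parameters, and HTTP methods to control what data is returned and in what format.
Making Your First API Request
Now that you know what a REST API is supposed to do, let's call one. Quotes to Scrape also offers a REST API. Instead of simply requesting the home page, we will request their API. Our communication with the server happens through endpoints.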
The full endpoint, /api/quotes, breaks down into two parts:
- /api: tells the server we want structured API data rather than an HTML page.
- /quotes: asks the API to return data from the quotes endpoint.
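The pieces of an endpoint URL can be assembled and taken apart with the standard library, as in the sketch below. `build_url` is a hypothetical helper, and the `page` query parameter is an assumption made for illustration. Check the API you are calling for the parameters it actually accepts.

```python
from urllib.parse import urlencode, urlsplit

BASE_URL = "https://quotes.toscrape.com"


def build_url(endpoint, **params):
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = BASE_URL + endpoint
    if params:
        url += "?" + urlencode(params)
    return url


url = build_url("/api/quotes", page=2)  # "page" is an assumed parameter name
print(url)

parts = urlsplit(url)
print(parts.path)   # the endpoint
print(parts.query)  # the parameters
```

With Requests, you would normally pass such parameters through the `params` argument of `requests.get` instead of building the query string yourself.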
Making the Request
As before, run the code below.
import requests
import json
response = requests.get("https://quotes.toscrape.com/api/quotes")
print(json.dumps(response.json(), indent=4))
This time, the returned data is clean and structured. It is easy to parse, which lets us do almost anything we want with it.
{
"has_next": true,
"page": 1,
"quotes": [
{
"author": {
"goodreads_link": "/author/show/9810.Albert_Einstein",
"name": "Albert Einstein",
"slug": "Albert-Einstein"
},
"tags": [
"change",
"deep-thoughts",
"thinking",
"world"
],
"text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/1077326.J_K_Rowling",
"name": "J.K. Rowling",
"slug": "J-K-Rowling"
},
"tags": [
"abilities",
"choices"
],
"text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/9810.Albert_Einstein",
"name": "Albert Einstein",
"slug": "Albert-Einstein"
},
"tags": [
"inspirational",
"life",
"live",
"miracle",
"miracles"
],
"text": "\u201cThere are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/1265.Jane_Austen",
"name": "Jane Austen",
"slug": "Jane-Austen"
},
"tags": [
"aliteracy",
"books",
"classic",
"humor"
],
"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/82952.Marilyn_Monroe",
"name": "Marilyn Monroe",
"slug": "Marilyn-Monroe"
},
"tags": [
"be-yourself",
"inspirational"
],
"text": "\u201cImperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/9810.Albert_Einstein",
"name": "Albert Einstein",
"slug": "Albert-Einstein"
},
"tags": [
"adulthood",
"success",
"value"
],
"text": "\u201cTry not to become a man of success. Rather become a man of value.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/7617.Andr_Gide",
"name": "Andr\u00e9 Gide",
"slug": "Andre-Gide"
},
"tags": [
"life",
"love"
],
"text": "\u201cIt is better to be hated for what you are than to be loved for what you are not.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/3091287.Thomas_A_Edison",
"name": "Thomas A. Edison",
"slug": "Thomas-A-Edison"
},
"tags": [
"edison",
"failure",
"inspirational",
"paraphrased"
],
"text": "\u201cI have not failed. I've just found 10,000 ways that won't work.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/44566.Eleanor_Roosevelt",
"name": "Eleanor Roosevelt",
"slug": "Eleanor-Roosevelt"
},
"tags": [
"misattributed-eleanor-roosevelt"
],
"text": "\u201cA woman is like a tea bag; you never know how strong it is until it's in hot water.\u201d"
},
{
"author": {
"goodreads_link": "/author/show/7103.Steve_Martin",
"name": "Steve Martin",
"slug": "Steve-Martin"
},
"tags": [
"humor",
"obvious",
"simile"
],
"text": "\u201cA day without sunshine is like, you know, night.\u201d"
}
],
"tag": null,
"top_ten_tags": [
[
"love",
14
],
[
"inspirational",
13
],
[
"life",
13
],
[
"humor",
12
],
[
"books",
11
],
[
"reading",
7
],
[
"friendship",
5
],
[
"friends",
4
],
[
"truth",
4
],
[
"simile",
3
]
]
}
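Because the response includes `has_next` and `page` fields, collecting every quote becomes a simple loop. The sketch below is one way to write it. `iter_quotes` and `FAKE_PAGES` are hypothetical names, and the fake pages stand in for the network so the example runs offline.

```python
def iter_quotes(fetch_page):
    """Yield (author, text) pairs from every page until has_next is false.

    fetch_page is any callable that takes a page number and returns the
    decoded JSON for that page.
    """
    page = 1
    while True:
        data = fetch_page(page)
        for quote in data["quotes"]:
            yield quote["author"]["name"], quote["text"]
        if not data["has_next"]:
            break
        page += 1


# Stand-in for the network: two tiny pages shaped like the response above.
FAKE_PAGES = {
    1: {"has_next": True, "page": 1, "quotes": [
        {"author": {"name": "Albert Einstein"}, "tags": ["value"],
         "text": "Try not to become a man of success. Rather become a man of value."}]},
    2: {"has_next": False, "page": 2, "quotes": [
        {"author": {"name": "Steve Martin"}, "tags": ["humor"],
         "text": "A day without sunshine is like, you know, night."}]},
}

for author, text in iter_quotes(lambda n: FAKE_PAGES[n]):
    print(f"{author}: {text}")
```

Against the live API, `fetch_page` could be something like `lambda n: requests.get("https://quotes.toscrape.com/api/quotes", params={"page": n}).json()`, assuming the API accepts a `page` query parameter.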
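Making Authenticated Requests
Now that we have seen how to request public data, let's look at APIs that require authentication. In many cases, you need your own API key to get data. Most API servers require the key to be sent in the request headers.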
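A common precaution is to keep the key out of your source code and read it from an environment variable. The sketch below builds Bearer-token headers that way. `build_auth_headers` and the `MY_API_KEY` variable name are hypothetical, and some APIs expect a different header or scheme, so check your provider's documentation.

```python
import os


def build_auth_headers(env=os.environ, var_name="MY_API_KEY"):
    """Build Bearer-token headers from an environment variable.

    MY_API_KEY is a hypothetical variable name; use whatever your setup defines.
    """
    api_key = env.get(var_name)
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Demo with a fake environment so the snippet runs without a real key.
headers = build_auth_headers(env={"MY_API_KEY": "demo-key"})
print(headers["Authorization"])
```

Failing early with a clear message when the key is missing is usually easier to debug than a 401 response from the server.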
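Making a basic GET request is simple. Now let's try a POST request. A POST request carries its data in the request body, which suits larger payloads. In the example code below, we use the Web Unlocker API to parse a web page and return Markdown.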
import requests

API_KEY = "your-api-key"
ZONE = "web_unlocker1"

url = "https://api.brightdata.com/request"

# The API key travels in the Authorization header.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# The payload describes the page to fetch and the output format.
payload = {
    "url": "https://quotes.toscrape.com/",
    "zone": ZONE,
    "format": "raw",
    "data_format": "markdown"
}

# json=payload serializes the dict and sends it as the request body.
response = requests.post(url, headers=headers, json=payload)
print(response.text)
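This time our request goes to https://api.brightdata.com/request. All the details are controlled by headers and payload.
Here are our headers:
- "Authorization": f"Bearer {API_KEY}": ties the request to your Bright Data account.
- "Content-Type": "application/json": tells the server we are sending JSON data.
Now let's look at the payload:
- "url": the target page we want Web Unlocker to access.
- "zone": the name of the zone configured for your Web Unlocker instance.
- "format": the response format we want (raw here).
- "data_format": set to "markdown" here, which tells Bright Data to convert the page to Markdown. Markdown is less structured than JSON, but it can be converted to JSON with a little parsing.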
Here is the terminal output after conversion to Markdown.
# [Quotes to Scrape](/)
[Login](/login)
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” by Albert Einstein [(about)](/author/Albert-Einstein)
Tags: [change](/tag/change/page/1/) [deep-thoughts](/tag/deep-thoughts/page/1/) [thinking](/tag/thinking/page/1/) [world](/tag/world/page/1/)
“It is our choices, Harry, that show what we truly are, far more than our abilities.” by J.K. Rowling [(about)](/author/J-K-Rowling)
Tags: [abilities](/tag/abilities/page/1/) [choices](/tag/choices/page/1/)
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.” by Albert Einstein [(about)](/author/Albert-Einstein)
Tags: [inspirational](/tag/inspirational/page/1/) [life](/tag/life/page/1/) [live](/tag/live/page/1/) [miracle](/tag/miracle/page/1/) [miracles](/tag/miracles/page/1/)
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.” by Jane Austen [(about)](/author/Jane-Austen)
Tags: [aliteracy](/tag/aliteracy/page/1/) [books](/tag/books/page/1/) [classic](/tag/classic/page/1/) [humor](/tag/humor/page/1/)
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.” by Marilyn Monroe [(about)](/author/Marilyn-Monroe)
Tags: [be-yourself](/tag/be-yourself/page/1/) [inspirational](/tag/inspirational/page/1/)
“Try not to become a man of success. Rather become a man of value.” by Albert Einstein [(about)](/author/Albert-Einstein)
Tags: [adulthood](/tag/adulthood/page/1/) [success](/tag/success/page/1/) [value](/tag/value/page/1/)
“It is better to be hated for what you are than to be loved for what you are not.” by André Gide [(about)](/author/Andre-Gide)
Tags: [life](/tag/life/page/1/) [love](/tag/love/page/1/)
“I have not failed. I've just found 10,000 ways that won't work.” by Thomas A. Edison [(about)](/author/Thomas-A-Edison)
Tags: [edison](/tag/edison/page/1/) [failure](/tag/failure/page/1/) [inspirational](/tag/inspirational/page/1/) [paraphrased](/tag/paraphrased/page/1/)
“A woman is like a tea bag; you never know how strong it is until it's in hot water.” by Eleanor Roosevelt [(about)](/author/Eleanor-Roosevelt)
Tags: [misattributed-eleanor-roosevelt](/tag/misattributed-eleanor-roosevelt/page/1/)
“A day without sunshine is like, you know, night.” by Steve Martin [(about)](/author/Steve-Martin)
Tags: [humor](/tag/humor/page/1/) [obvious](/tag/obvious/page/1/) [simile](/tag/simile/page/1/)
* [Next →](/page/2/)
## Top Ten tags
[love](/tag/love/) [inspirational](/tag/inspirational/) [life](/tag/life/) [humor](/tag/humor/) [books](/tag/books/) [reading](/tag/reading/) [friendship](/tag/friendship/) [friends](/tag/friends/) [truth](/tag/truth/) [simile](/tag/simile/)
Quotes by: [GoodReads.com](https://www.goodreads.com/quotes)
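As a rough illustration of turning this Markdown into JSON, the sketch below matches each quote line and the "Tags:" line after it with regular expressions. The patterns are assumptions tailored to the layout shown above, and `markdown_to_quotes` is a hypothetical helper. A different page layout would need different patterns.

```python
import json
import re

SAMPLE_MARKDOWN = """\
“Try not to become a man of success. Rather become a man of value.” by Albert Einstein [(about)](/author/Albert-Einstein)
Tags: [adulthood](/tag/adulthood/page/1/) [success](/tag/success/page/1/) [value](/tag/value/page/1/)
“A day without sunshine is like, you know, night.” by Steve Martin [(about)](/author/Steve-Martin)
Tags: [humor](/tag/humor/page/1/) [obvious](/tag/obvious/page/1/) [simile](/tag/simile/page/1/)
"""

# Quote line: “text” by Author [(about)](...)
QUOTE_LINE = re.compile(r"^“(?P<text>.+)” by (?P<author>.+?) \[\(about\)\]")
# Tag link: [name](/tag/...)
TAG_LINK = re.compile(r"\[([^\]]+)\]\(/tag/")


def markdown_to_quotes(markdown):
    """Turn quote/tag line pairs into a list of dicts."""
    quotes = []
    for line in markdown.splitlines():
        match = QUOTE_LINE.match(line)
        if match:
            quotes.append({"text": match["text"], "author": match["author"], "tags": []})
        elif line.startswith("Tags:") and quotes:
            quotes[-1]["tags"] = TAG_LINK.findall(line)
    return quotes


print(json.dumps(markdown_to_quotes(SAMPLE_MARKDOWN), indent=4, ensure_ascii=False))
```

Lines that match neither pattern, such as the page title or the "Top Ten tags" section, are simply skipped.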
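Authentication usually relies on a unique identifier, most often an API key. We used Web Unlocker here, but the principle is the same: whatever API service you use, the underlying logic is largely identical.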
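Handling the Response
Every response comes with a status code. Status codes communicate different things to the client. In an ideal world, you would always receive a 200 status code.
Unfortunately, reality doesn't always cooperate. If you receive an error status code instead, something went wrong:
- 400-499: usually a client-side error. Double-check your API key and the format of your request.
- 500-599: a server-side error. Your request was fine, but the server could not complete it.
There are many more status codes to learn about. To handle them in Python, see our tutorial on retry logic.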
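The ranges above translate directly into code. The sketch below classifies a status code and retries only on server errors, doubling the wait between attempts. `classify` and `call_with_retries` are hypothetical helpers, and the delay values are arbitrary. A production client would usually add a cap and some random jitter.

```python
import time


def classify(status_code):
    """Map a status code to a coarse category."""
    if 200 <= status_code < 300:
        return "success"
    if 400 <= status_code < 500:
        return "client_error"  # fix the request; retrying rarely helps
    if 500 <= status_code < 600:
        return "server_error"  # often temporary; a retry may succeed
    return "other"


def call_with_retries(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it stops returning a 5xx status or attempts run out.

    send is any zero-argument callable that returns a status code.
    """
    for attempt in range(1, max_attempts + 1):
        status = send()
        if classify(status) != "server_error" or attempt == max_attempts:
            return status
        sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff: 1s, 2s, 4s, ...


# Simulated server: fails twice, then succeeds. Sleeping is disabled for the demo.
responses = iter([503, 500, 200])
print(call_with_retries(lambda: next(responses), sleep=lambda seconds: None))
```

With Requests, `send` could wrap a real call and return `response.status_code`.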
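Skipping the Request Boilerplate with an SDK
An SDK (software development kit) lets us connect to a REST API while skipping much of the low-level error handling and retry logic. The OpenAI API, for example, provides a full REST API, which you can explore in its documentation.
To install their SDK and skip writing HTTP requests by hand, just run the following command.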
pip install openai
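Now we import the OpenAI SDK. First we fetch the plain HTML page, just as we did at the beginning. If you want to dig into parsing HTML by hand, look into using Requests with BeautifulSoup. Once we have the HTML page, we use the SDK to pass it to ChatGPT for parsing.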
from openai import OpenAI
import requests

OPENAI_API_KEY = "sk-your-openai-api-key"

# Fetch the raw HTML page, as in the first example.
response = requests.get("https://quotes.toscrape.com")
html_page = response.text

# The SDK client handles the HTTP details for us.
client = OpenAI(api_key=OPENAI_API_KEY)

chat = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": f"Parse the quotes from the following page. I want JSON only--zero commentary from you, here's the page: {html_page}",
        }
    ],
    model="gpt-4o-mini",
)

# The model's answer is the message content of the first choice.
reply = chat.choices[0].message.content
print(f"ChatGPT: {reply}")
This time the output is very different. We get JSON-structured data without writing any parsing code ourselves.
[
{
"text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
"author": "Albert Einstein",
"tags": ["change", "deep-thoughts", "thinking", "world"]
},
{
"text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
"author": "J.K. Rowling",
"tags": ["abilities", "choices"]
},
{
"text": "There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.",
"author": "Albert Einstein",
"tags": ["inspirational", "life", "live", "miracle", "miracles"]
},
{
"text": "The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.",
"author": "Jane Austen",
"tags": ["aliteracy", "books", "classic", "humor"]
},
{
"text": "Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.",
"author": "Marilyn Monroe",
"tags": ["be-yourself", "inspirational"]
},
{
"text": "Try not to become a man of success. Rather become a man of value.",
"author": "Albert Einstein",
"tags": ["adulthood", "success", "value"]
},
{
"text": "It is better to be hated for what you are than to be loved for what you are not.",
"author": "André Gide",
"tags": ["life", "love"]
},
{
"text": "I have not failed. I've just found 10,000 ways that won't work.",
"author": "Thomas A. Edison",
"tags": ["edison", "failure", "inspirational", "paraphrased"]
},
{
"text": "A woman is like a tea bag; you never know how strong it is until it's in hot water.",
"author": "Eleanor Roosevelt",
"tags": ["misattributed-eleanor-roosevelt"]
},
{
"text": "A day without sunshine is like, you know, night.",
"author": "Steve Martin",
"tags": ["humor", "obvious", "simile"]
}
]
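Model replies are plain text, so it is worth validating them before treating them as data. The sketch below is a defensive parser that assumes the reply is either bare JSON or JSON wrapped in a Markdown code fence, which models sometimes add. `extract_json` is a hypothetical helper, not part of the OpenAI SDK.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker


def extract_json(reply):
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag).
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else ""
        # Drop the closing fence, if present.
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)  # raises json.JSONDecodeError if the reply is not valid JSON


fenced_reply = FENCE + 'json\n[{"author": "Steve Martin", "tags": ["humor"]}]\n' + FENCE
print(extract_json(fenced_reply))
print(extract_json('[{"author": "Jane Austen"}]'))
```

If parsing fails, the exception tells you the reply was not clean JSON, and you can retry the request or adjust the prompt.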
An SDK gives us the full power of the REST API without having to manage HTTP by hand. If you're interested in scraping with AI, see our tutorials on Claude and DeepSeek.
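To show what an SDK does under the hood, here is a deliberately tiny, hypothetical wrapper around the quotes API used earlier. `QuotesClient`, its `transport` argument, and the `page` parameter are all assumptions made for illustration. Real SDKs are far more complete and ship their own HTTP layer.

```python
import json


class QuotesClient:
    """Hypothetical SDK-style wrapper around the quotes API used earlier.

    transport is any callable (method, url, params) -> (status_code, body_text).
    A real SDK would ship its own HTTP transport.
    """

    BASE_URL = "https://quotes.toscrape.com/api"

    def __init__(self, transport):
        self._transport = transport

    def list_quotes(self, page=1):
        # The caller never sees URLs, status codes, or JSON decoding.
        status, body = self._transport("GET", f"{self.BASE_URL}/quotes", {"page": page})
        if status != 200:
            raise RuntimeError(f"quotes request failed with status {status}")
        return json.loads(body)["quotes"]


def fake_transport(method, url, params):
    """Stand-in for the network so the sketch runs offline."""
    return 200, json.dumps({"quotes": [{"text": "A day without sunshine is like, you know, night."}]})


client = QuotesClient(fake_transport)
print(client.list_quotes()[0]["text"])
```

The calling code reads like ordinary Python, while the URL building, status checks, and decoding live in one place.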
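Conclusion
Now that you know how to make basic API requests with Python, you can move on to bigger projects. You can use APIs to interact with all kinds of services and retrieve data, or use an SDK to automate parsing that data. In this tutorial we used Web Unlocker, and Bright Data offers many other products to meet your data needs:
- Residential Proxies: route your HTTP traffic through real devices with residential IP addresses.
- Web Scraper API: fully automate your scraping and download the results straight into your programming environment.
- Scraping Browser: bypass CAPTCHAs and control a real headless browser from your Python script.
Sign up for a free trial account and get started today!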