📝Translating docs to Simplified Chinese (#2705)
* 📝Translating docs to Simplified Chinese

* update files

* 📝Translating docs to Simplified Chinese

* 📝Translating docs to Simplified Chinese

* 📝Translating docs to Simplified Chinese

* update files

* 📝Translating docs to Simplified Chinese

* 📝Translating docs to Simplified Chinese

* update files

* translate 'hf_file_system.md'

* update files
miaowumiaomiaowu authored Dec 13, 2024
1 parent ca3f674 commit 6be2b3e
Showing 3 changed files with 253 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/source/cn/_toctree.yml
@@ -16,6 +16,10 @@
title: Collections
- local: guides/community
title: Community
- local: guides/overview
title: Overview
- local: guides/hf_file_system
title: Hugging Face File System
- title: "concepts"
sections:
- local: concepts/git_vs_http
119 changes: 119 additions & 0 deletions docs/source/cn/guides/hf_file_system.md
@@ -0,0 +1,119 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
-->

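# Interact with the Hub through the Filesystem API

In addition to the [`HfApi`], the `huggingface_hub` library provides [`HfFileSystem`], a Pythonic, [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)-compatible file interface to the Hugging Face Hub. [`HfFileSystem`] builds on top of [`HfApi`] and offers typical filesystem operations such as `cp`, `mv`, `ls`, `du`, `glob`, `get_file`, and `put_file`.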

<Tip warning={true}>

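[`HfFileSystem`] provides fsspec compatibility, which is useful for libraries that require it (for example, reading Hugging Face datasets directly with `pandas`). However, this compatibility layer introduces additional overhead. For better performance and reliability, prefer the [`HfApi`] methods whenever possible.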


</Tip>

## Usage

```python
>>> from huggingface_hub import HfFileSystem
>>> fs = HfFileSystem()

>>> # List all files in a directory
>>> fs.ls("datasets/my-username/my-dataset-repo/data", detail=False)
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # List all ".csv" files in a repo
>>> fs.glob("datasets/my-username/my-dataset-repo/**/*.csv")
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # Read a remote file line by line (text mode)
>>> with fs.open("datasets/my-username/my-dataset-repo/data/train.csv", "r") as f:
...     train_data = f.readlines()

>>> # Read the content of a remote file as a string
>>> train_data = fs.read_text("datasets/my-username/my-dataset-repo/data/train.csv", revision="dev")

>>> # Write a remote file (each row ends with a newline so the CSV rows stay separate)
>>> with fs.open("datasets/my-username/my-dataset-repo/data/validation.csv", "w") as f:
...     f.write("text,label\n")
...     f.write("Fantastic movie!,good\n")
```

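The optional `revision` argument can be passed to run an operation against a specific revision: a branch, a tag name, or a commit hash.

Unlike Python's built-in `open`, `fsspec`'s `open` defaults to binary mode, `"rb"`. This means you must explicitly set the mode to `"r"` to read in text mode, or `"w"` to write in text mode. Appending to a file (modes `"a"` and `"ab"`) is not supported yet.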

## Integrations

[`HfFileSystem`] can be used with any library that integrates `fsspec`, provided the URL follows this scheme:

```
hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/huggingface_hub/hf_urls.png"/>
</div>

The `repo_type_prefix` is `datasets/` for datasets and `spaces/` for Spaces. Models do not need a prefix in the URL.
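To make the URL scheme above concrete, here is a minimal sketch that splits an `hf://` URL into its parts using only the standard library. The names `split_hf_url` and `HfUrlParts` are hypothetical helpers written for this illustration, not part of `huggingface_hub`, and the library's own path resolution may behave differently. The sketch assumes that `repo_id` always has the form `namespace/name` and that the revision contains no `/`.

```python
import re
from typing import NamedTuple, Optional


class HfUrlParts(NamedTuple):
    """Hypothetical container for the pieces of an hf:// URL."""
    repo_type: str            # "model", "dataset" or "space"
    repo_id: str              # assumed to look like "namespace/name"
    revision: Optional[str]   # None when the URL has no "@<revision>"
    path_in_repo: str         # "" when the URL points at the repo root


# hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>
_HF_URL = re.compile(
    r"^hf://"
    r"(?P<prefix>datasets/|spaces/)?"   # optional repo type prefix
    r"(?P<repo_id>[^/@]+/[^/@]+)"       # assumption: namespace/name
    r"(?:@(?P<revision>[^/]+))?"        # assumption: no "/" in the revision
    r"(?:/(?P<path>.*))?$"
)


def split_hf_url(url: str) -> HfUrlParts:
    """Split an hf:// URL according to the scheme shown above."""
    match = _HF_URL.match(url)
    if match is None:
        raise ValueError(f"Not a recognized hf:// URL: {url!r}")
    # Models carry no prefix, so a missing prefix maps to "model".
    repo_type = {"datasets/": "dataset", "spaces/": "space"}.get(
        match.group("prefix"), "model"
    )
    return HfUrlParts(
        repo_type=repo_type,
        repo_id=match.group("repo_id"),
        revision=match.group("revision"),
        path_in_repo=match.group("path") or "",
    )


print(split_hf_url("hf://datasets/my-username/my-dataset-repo@dev/data/train.csv"))
```

Because of the `namespace/name` assumption, this sketch would misread a repository that has no namespace; it is meant only to show how the prefix, the revision and the in-repo path fit together.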

Below are some interesting integrations where [`HfFileSystem`] simplifies interacting with the Hub:

* Reading/writing a [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-writing-remote-files) DataFrame from/to a Hub repository:

```python
>>> import pandas as pd

>>> # Read a remote CSV file into a DataFrame
>>> df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")

>>> # Write a DataFrame to a remote CSV file
>>> df.to_csv("hf://datasets/my-username/my-dataset-repo/test.csv")
```

The same workflow can also be used for [Dask](https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html) and [Polars](https://pola-rs.github.io/polars/py-polars/html/reference/io.html) DataFrames.

* Querying (remote) Hub files with [DuckDB](https://duckdb.org/docs/guides/python/filesystems):

```python
>>> from huggingface_hub import HfFileSystem
>>> import duckdb

>>> fs = HfFileSystem()
>>> duckdb.register_filesystem(fs)
>>> # Query a remote file and get the result back as a DataFrame
>>> fs_query_file = "hf://datasets/my-username/my-dataset-repo/data_dir/data.parquet"
>>> df = duckdb.query(f"SELECT * FROM '{fs_query_file}' LIMIT 10").df()
```

* Using the Hub as an array store with [Zarr](https://zarr.readthedocs.io/en/stable/tutorial.html#io-with-fsspec):

```python
>>> import numpy as np
>>> import zarr

>>> embeddings = np.random.randn(50000, 1000).astype("float32")

>>> # Write an array to a repo
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="w") as root:
...     foo = root.create_group("embeddings")
...     foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4')
...     foobar[:] = embeddings

>>> # Read an array from a repo
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="r") as root:
...     first_row = root["embeddings/experiment_0"][0]
```

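## Authentication

In many cases, you must be logged in with a Hugging Face account to interact with the Hub. Refer to the [Authentication](../quick-start#authentication) section of the documentation to learn more about authentication methods on the Hub.

It is also possible to log in programmatically by passing your `token` as an argument to [`HfFileSystem`]: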

```python
>>> from huggingface_hub import HfFileSystem
>>> fs = HfFileSystem(token=token)
```

If you log in this way, be careful not to accidentally leak the token when sharing your source code!
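One common way to keep the token out of shared source code is to read it from an environment variable at run time. The sketch below assumes the token is stored in `HF_TOKEN`; the helper name `token_from_env` is hypothetical and only serves this illustration.

```python
import os
from typing import Optional


def token_from_env(var_name: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in an environment variable, or None.

    An unset or empty variable yields None, so no secret ever needs
    to be written into the source file itself.
    """
    token = os.environ.get(var_name, "").strip()
    return token or None


token = token_from_env()
# Assumed usage, left as a comment so the sketch runs without network access:
# from huggingface_hub import HfFileSystem
# fs = HfFileSystem(token=token)
```

With this approach the file can be committed or shared as-is, while the secret stays in the local environment.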
130 changes: 130 additions & 0 deletions docs/source/cn/guides/overview.md
@@ -0,0 +1,130 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
-->

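# How-to guides

In this section, you will find practical guides to help you achieve a specific goal.
Take a look at these guides to learn how to use `huggingface_hub` to solve real-world problems: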

<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./repository">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Repository
</div><p class="text-gray-700">
How to create a repository on the Hub? How to configure it? How to interact with it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./download">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Download files
</div><p class="text-gray-700">
How to download a file from the Hub? How to download a repository?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./upload">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Upload files
</div><p class="text-gray-700">
How to upload a file or a folder? How to make changes to an existing repository on the Hub?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./search">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Search
</div><p class="text-gray-700">
How to efficiently search through the 200k+ public models, datasets and Spaces?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./hf_file_system">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
HfFileSystem
</div><p class="text-gray-700">
How to interact with the Hub through a convenient interface that mimics Python's file interface?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./inference">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Inference
</div><p class="text-gray-700">
How to make predictions using the accelerated Inference API?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./community">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Community
</div><p class="text-gray-700">
How to interact with the community (Discussions and Pull Requests)?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./collections">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Collections
</div><p class="text-gray-700">
How to build collections programmatically?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./manage-cache">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Cache
</div><p class="text-gray-700">
How does the cache system work? How to benefit from it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./model-cards">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Model Cards
</div><p class="text-gray-700">
How to create and share Model Cards?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./manage-spaces">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Manage your Space
</div><p class="text-gray-700">
How to manage your Space's hardware and configuration?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./integrations">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Integrate a library
</div><p class="text-gray-700">
What does it mean to integrate a library with the Hub? And how to do it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./webhooks_server">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Webhooks server
</div><p class="text-gray-700">
How to create a server that receives Webhooks and deploy it as a Space?
</p>
</a>

</div>
</div>
