From 3a9f5ee2b2f8ad57a6c3c915ea6a0fa3ce4f3220 Mon Sep 17 00:00:00 2001
From: liuwenli
Date: Sat, 3 Aug 2019 09:56:50 +0800
Subject: [PATCH 1/5] =?UTF-8?q?=E7=94=A8=E6=88=B7=E6=8C=87=E5=8D=97?=
 =?UTF-8?q?=E8=BE=93=E5=85=A5=E8=BE=93=E5=87=BAAPI=E9=83=A8=E5=88=86?=
 =?UTF-8?q?=E7=BF=BB=E8=AF=91?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 docs/docs/user_guide/io.md | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/docs/docs/user_guide/io.md b/docs/docs/user_guide/io.md
index c32064d..2aaaf48 100644
--- a/docs/docs/user_guide/io.md
+++ b/docs/docs/user_guide/io.md
@@ -1,10 +1,6 @@
 # IO tools (text, CSV, HDF5, …)

-The pandas I/O API is a set of top level ``reader`` functions accessed like
-[``pandas.read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) that generally return a pandas object. The corresponding
-``writer`` functions are object methods that are accessed like
-[``DataFrame.to_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv). Below is a table containing available ``readers`` and
-``writers``.
+The pandas I/O API is a set of top-level ``reader`` functions, such as
+[``pandas.read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv), that generally return a pandas object. The corresponding
+``writer`` functions are object methods that are accessed like
+[``DataFrame.to_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv). Below is a table of the available ``readers`` and ``writers``.

Format Type | Data Description | Reader | Writer
---|---|---|---
@@ -12,8 +8,8 @@ text | [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) | [read_csv](
text | [JSON](https://www.json.org/) | [read_json](#io-json-reader) | [to_json](#io-json-writer)
text | [HTML](https://en.wikipedia.org/wiki/HTML) | [read_html](#io-read-html) | [to_html](#io-html)
text | Local clipboard | [read_clipboard](#io-clipboard) | [to_clipboard](#io-clipboard)
-binary | [MS Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) | [[read_excel](#io-ods)](#io-excel-reader) | [to_excel](#io-excel-writer)
-binary | [OpenDocument](http://www.opendocumentformat.org) | read_excel |  
+binary | [MS Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) | [read_excel](#io-excel-reader) | [to_excel](#io-excel-writer)
+binary | [OpenDocument](http://www.opendocumentformat.org) | [read_excel](#io-ods) |  
binary | [HDF5 Format](https://support.hdfgroup.org/HDF5/whatishdf5.html) | [read_hdf](#io-hdf5) | [to_hdf](#io-hdf5)
binary | [Feather Format](https://github.com/wesm/feather) | [read_feather](#io-feather) | [to_feather](#io-feather)
binary | [Parquet Format](https://parquet.apache.org/) | [read_parquet](#io-parquet) | [to_parquet](#io-parquet)
@@ -26,24 +22,22 @@ SQL | [Google Big Query](https://en.wikipedia.org/wiki/BigQuery) | [read_gbq](#i

[Here](#io-perf) is an informal performance comparison for some of these IO methods.

-::: tip Note
+::: tip Note

-For examples that use the ``StringIO`` class, make sure you import it
-according to your Python version, i.e. 
``from StringIO import StringIO`` for
-Python 2 and ``from io import StringIO`` for Python 3.
+For examples that use the ``StringIO`` class, make sure you import it according to your Python version: ``from StringIO import StringIO`` for Python 2 and ``from io import StringIO`` for Python 3.

:::

-## CSV & text files
+## CSV & text files

-The workhorse function for reading text files (a.k.a. flat files) is
-[``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv). See the [cookbook](cookbook.html#cookbook-csv) for some advanced strategies.
+The workhorse function for reading text files (a.k.a. flat files) is
+[``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv). See the [cookbook](cookbook.html#cookbook-csv) for some more advanced strategies.

-### Parsing options
+### Parsing options

-[``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) accepts the following common arguments:
+[``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) accepts the following common arguments:

-#### Basic
+#### Basic

filepath_or_buffer : *various*

From 389e012439d11dd88978250e05341d8fdc86d6f9 Mon Sep 17 00:00:00 2001
From: liuwenli
Date: Sat, 3 Aug 2019 11:28:46 +0800
Subject: [PATCH 2/5] =?UTF-8?q?=E8=8B=B1=E6=96=87=E7=89=88=E6=9C=AC?=
 =?UTF-8?q?=E7=94=A8=E6=88=B7=E6=8C=87=E5=8D=97=E8=BE=93=E5=85=A5=E8=BE=93?=
 =?UTF-8?q?=E5=87=BA=E9=83=A8=E5=88=86bug=E4=BF=AE=E6=94=B9?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 docs/en/docs/user_guide/io.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/docs/user_guide/io.md b/docs/en/docs/user_guide/io.md
index e2480a0..41db05a 100644
--- a/docs/en/docs/user_guide/io.md
+++ b/docs/en/docs/user_guide/io.md
@@ -13,8 +13,8 @@ text | [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) | [read_csv](
 text | [JSON](https://www.json.org/) | [read_json](#io-json-reader) | [to_json](#io-json-writer)
 text 
| [HTML](https://en.wikipedia.org/wiki/HTML) | [read_html](#io-read-html) | [to_html](#io-html)
 text | Local clipboard | [read_clipboard](#io-clipboard) | [to_clipboard](#io-clipboard)
-binary | [MS Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) | [[read_excel](#io-ods)](#io-excel-reader) | [to_excel](#io-excel-writer)
-binary | [OpenDocument](http://www.opendocumentformat.org) | read_excel |  
+binary | [MS Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) | [read_excel](#io-excel-reader) | [to_excel](#io-excel-writer)
+binary | [OpenDocument](http://www.opendocumentformat.org) | [read_excel](#io-ods) |  
 binary | [HDF5 Format](https://support.hdfgroup.org/HDF5/whatishdf5.html) | [read_hdf](#io-hdf5) | [to_hdf](#io-hdf5)
 binary | [Feather Format](https://github.com/wesm/feather) | [read_feather](#io-feather) | [to_feather](#io-feather)
 binary | [Parquet Format](https://parquet.apache.org/) | [read_parquet](#io-parquet) | [to_parquet](#io-parquet)

From fa44bba2aeb467052d1f36153c7be5868d397a78 Mon Sep 17 00:00:00 2001
From: liuwenli
Date: Fri, 9 Aug 2019 14:33:51 +0800
Subject: [PATCH 3/5] =?UTF-8?q?=E7=94=A8=E6=88=B7=E6=8C=87=E5=8D=97io?=
 =?UTF-8?q?=E9=83=A8=E5=88=86=E7=BF=BB=E8=AF=91?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 docs/docs/user_guide/io.md | 96 +++++++++++++------------------------
 1 file changed, 33 insertions(+), 63 deletions(-)

diff --git a/docs/docs/user_guide/io.md b/docs/docs/user_guide/io.md
index 2aaaf48..d54773f 100644
--- a/docs/docs/user_guide/io.md
+++ b/docs/docs/user_guide/io.md
@@ -41,85 +41,57 @@ SQL | [Google Big Query](https://en.wikipedia.org/wiki/BigQuery) | [read_gbq](#i

filepath_or_buffer : *various*

-- Either a path to a file (a [``str``](https://docs.python.org/3/library/stdtypes.html#str), [``pathlib.Path``](https://docs.python.org/3/library/pathlib.html#pathlib.Path),
+- Either a path to a file (a [``str``](https://docs.python.org/3/library/stdtypes.html#str), 
[``pathlib.Path``](https://docs.python.org/3/library/pathlib.html#pathlib.Path), or ``py._path.local.LocalPath``), URL (including http, ftp, and S3
-locations), or any object with a ``read()`` method (such as an open file or
+locations), or any object with a ``read()`` method (such as an open file or
[``StringIO``](https://docs.python.org/3/library/io.html#io.StringIO)).

-sep : *str, defaults to ``','`` for [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv), ``\t`` for [``read_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html#pandas.read_table)*
+sep : *str, defaults to ``','`` for [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) and ``\t`` for [``read_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html#pandas.read_table)*

-- Delimiter to use. If sep is ``None``, the C engine cannot automatically detect
-the separator, but the Python parsing engine can, meaning the latter will be
-used and automatically detect the separator by Python’s builtin sniffer tool,
-[``csv.Sniffer``](https://docs.python.org/3/library/csv.html#csv.Sniffer). In addition, separators longer than 1 character and
-different from ``'s+'`` will be interpreted as regular expressions and
-will also force the use of the Python parsing engine. Note that regex
-delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``.
+- Delimiter to use. If sep is ``None``, the C engine cannot automatically detect
+the separator, but the Python parsing engine can, meaning the latter will be
+used and the separator detected automatically by Python’s builtin sniffer tool,
+[``csv.Sniffer``](https://docs.python.org/3/library/csv.html#csv.Sniffer). In addition, separators longer than one character and
+different from ``'\s+'`` will be interpreted as regular expressions and
+will also force the use of the Python parsing engine. Note that regex
+delimiters are prone to ignoring quoted data (mind the escaping). Regex example: ``'\\r\\t'``.

delimiter : *str, default ``None``*

-- Alternative argument name for sep.
+- Alternative argument name for sep.

delim_whitespace : *boolean, default False*

-- Specifies whether or not whitespace (e.g. 
``' '`` or ``'\t'``)
-will be used as the delimiter. Equivalent to setting ``sep='\s+'``.
-If this option is set to ``True``, nothing should be passed in for the
-``delimiter`` parameter.
+- Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be used as the delimiter.
+Equivalent to setting ``sep='\s+'``.
+If this option is set to ``True``, nothing should be passed in for the
+``delimiter`` parameter.

-*New in version 0.18.1:* support for the Python parser.
+*New in version 0.18.1:* support for the Python parser.

-#### Column and index locations and names
+#### Column and index locations and names

header : *int or list of ints, default ``'infer'``*

-- Row number(s) to use as the column names, and the start of the
-data. Default behavior is to infer the column names: if no names are
-passed the behavior is identical to ``header=0`` and column names
-are inferred from the first line of the file, if column names are
-passed explicitly then the behavior is identical to
-``header=None``. Explicitly pass ``header=0`` to be able to replace
-existing names.
+- Row number(s) to use as the column names, and the start of the data. The default behavior is to infer the column names: if no names are passed, the behavior is identical to ``header=0`` and the column names are inferred from the first line of the file; if column names are passed explicitly, then the behavior is identical to ``header=None``. Explicitly pass ``header=0`` to be able to replace existing names.
+
+
+- The header can be a list of ints that specify row locations for a MultiIndex on the columns, e.g. ``[0,1,3]``. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if ``skip_blank_lines=True``, so ``header=0`` denotes the first line of data rather than the first line of the file.

names : *array-like, default ``None``*

-- List of column names to use. If file contains no header row, then you should
-explicitly pass ``header=None``. Duplicates in this list are not allowed.
+- List of column names to use. If the file contains no header row, then you should explicitly pass ``header=None``. Duplicates in this list are not allowed.

index_col : *int, str, sequence of int / str, or False, default ``None``*

-- Column(s) to use as the row labels of the ``DataFrame``, either given as
-string name or column index. 
If a sequence of int / str is given, a
-MultiIndex is used.
+- Column(s) to use as the row labels of the ``DataFrame``, either given as a string name or a column index. If a sequence of int / str is given, a MultiIndex is used.

-- Note: ``index_col=False`` can be used to force pandas to not use the first
-column as the index, e.g. when you have a malformed file with delimiters at
-the end of each line.
+- Note: ``index_col=False`` can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.

usecols : *list-like or callable, default ``None``*

-- Return a subset of the columns. If list-like, all elements must either
-be positional (i.e. integer indices into the document columns) or strings
-that correspond to column names provided either by the user in *names* or
-inferred from the document header row(s). For example, a valid list-like
-*usecols* parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``.
+- Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in *names* or inferred from the document header row(s). For example, a valid list-like
+*usecols* parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``.

-- Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``. To
-instantiate a DataFrame from ``data`` with element order preserved use
-``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns
-in ``['foo', 'bar']`` order or
-``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]`` for
-``['bar', 'foo']`` order.
+- Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``. To instantiate a DataFrame from ``data`` with element order preserved, use ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns in ``['foo', 'bar']`` order, or ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]`` for ``['bar', 'foo']`` order.

-- If callable, the callable function will be evaluated against the column names,
-returning names where the callable function evaluates to True:
+- If callable, the callable function will be evaluated against the column names,
+returning names where the callable function evaluates to True:

``` python
In [1]: from io import StringIO, BytesIO

Out[4]: 
```

-Using this parameter results in much faster parsing time and lower memory usage. 
+Using this parameter results in much faster parsing time and lower memory usage.

squeeze : *boolean, default ``False``*

-- If the parsed data only contains one column then return a ``Series``.
+- If the parsed data only contains one column, then a ``Series`` is returned.

prefix : *str, default ``None``*

-- Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …
+- Prefix to add to column numbers when there is no header, e.g. ‘X’ for X0, X1, …

mangle_dupe_cols : *boolean, default ``True``*

-- Duplicate columns will be specified as ‘X’, ‘X.1’…’X.N’, rather than ‘X’…’X’. Passing in ``False`` will cause data to be overwritten if there are duplicate names in the columns.
+- Duplicate columns will be specified as ‘X’, ‘X.1’…’X.N’, rather than ‘X’…’X’. Passing in ``False`` will cause data to be overwritten if there are duplicate names in the columns.

-#### General parsing configuration
+#### General parsing configuration

dtype : *Type name or dict of column -> type, default ``None``*

-- Data type for data or columns. E.g. ``{'a': np.float64, 'b': np.int32}``
-(unsupported with ``engine='python'``). Use *str* or *object* together
-with suitable ``na_values`` settings to preserve and
-not interpret dtype.
+- Data type for data or columns, e.g. ``{'a': np.float64, 'b': np.int32}``
+(unsupported with ``engine='python'``). Use *str* or *object* together with suitable ``na_values`` settings to preserve and not interpret dtype.

-- *New in version 0.20.0:* support for the Python parser.
+- *New in version 0.20.0:* support for the Python parser.

engine : *{``'c'``, ``'python'``}*

-- Parser engine to use. The C engine is faster while the Python engine is currently more feature-complete. 
+- Parser engine to use. The C engine is faster, while the Python engine is currently more feature-complete.

converters : *dict, default ``None``*

From 90d7d6b15114cbb10caa31d666f5239929125f7a Mon Sep 17 00:00:00 2001
From: liuwenli
Date: Fri, 9 Aug 2019 16:09:21 +0800
Subject: [PATCH 4/5] =?UTF-8?q?=E7=94=A8=E6=88=B7=E6=8C=87=E5=8D=97io?=
 =?UTF-8?q?=E9=83=A8=E5=88=86=E7=BF=BB=E8=AF=91?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 docs/docs/user_guide/io.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs/user_guide/io.md b/docs/docs/user_guide/io.md
index d54773f..d0c7133 100644
--- a/docs/docs/user_guide/io.md
+++ b/docs/docs/user_guide/io.md
@@ -57,7 +57,7 @@ delimiter : *str, default ``None``*
 
 delim_whitespace : *boolean, default False*
 
-- Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be used as the delimiter.
+- Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be treated as the delimiter.
 Equivalent to setting ``sep='\s+'``.
 If this option is set to ``True``, nothing should be passed in for the
 ``delimiter`` parameter.

From b1d2ec58048d681f54cd731e336abad0ba270996 Mon Sep 17 00:00:00 2001
From: liuwenli
Date: Sat, 24 Aug 2019 12:03:31 +0800
Subject: [PATCH 5/5] =?UTF-8?q?20190824io.md=E9=83=A8=E5=88=86=E7=BF=BB?=
 =?UTF-8?q?=E8=AF=91?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 docs/docs/user_guide/io.md | 108 +++++++++++++----------------------
 1 file changed, 42 insertions(+), 66 deletions(-)

diff --git a/docs/docs/user_guide/io.md b/docs/docs/user_guide/io.md
index d0c7133..041c8d3 100644
--- a/docs/docs/user_guide/io.md
+++ b/docs/docs/user_guide/io.md
@@ -49,7 +49,7 @@ locations), or any object with a ``read()`` method (such as an open file or

sep : *str, defaults to ``','`` for [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) and ``\t`` for [``read_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html#pandas.read_table)*

- Delimiter to use. 
If sep is ``None``, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and the separator detected automatically by Python’s builtin sniffer tool,
-[``csv.Sniffer``](https://docs.python.org/3/library/csv.html#csv.Sniffer). In addition, separators longer than one character and different from ``'s+'`` will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data (mind the escaping). Regex example: ``'\\r\\t'``.
+[``csv.Sniffer``](https://docs.python.org/3/library/csv.html#csv.Sniffer). In addition, separators longer than one character and different from ``'\s+'`` will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data (mind the escaping). Regex example: ``'\\r\\t'``.

delimiter : *str, default ``None``*

- Alternative argument name for sep.

delim_whitespace : *boolean, default False*

-- Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be treated as the delimiter.
+- Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be used as the delimiter.
 Equivalent to setting ``sep='\s+'``.
 If this option is set to ``True``, nothing should be passed in for the
 ``delimiter`` parameter.
@@ -147,7 +147,7 @@ engine : *{``'c'``, ``'python'``}*

converters : *dict, default ``None``*

-- Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
+- Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

true_values : *list, default ``None``*

@@ -159,15 +159,13 @@ false_values : *list, default ``None``*

skipinitialspace : *boolean, default ``False``*

-- Skip spaces after delimiter.
+- Skip spaces after the delimiter.

skiprows : *list-like or integer, default ``None``*

-- Line numbers to skip (0-indexed) or number of lines to skip (int) at the start
-of the file.
+- Line numbers to skip (0-indexed), or the number of lines to skip (int) at the start of the file.

-- If callable, the callable function will be evaluated against the row
-indices, returning True if the row should be skipped and False otherwise:
+- If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise:

``` python
In [5]: data = ('col1,col2,col3\n'

Out[7]: 
```

@@ -192,26 +190,20 @@ skipfooter : *int, default ``0``*

-- Number of lines at bottom of file to skip (unsupported with engine=’c’).
+- Number of lines at the bottom of the file to skip (unsupported with engine=’c’).

nrows : *int, default ``None``*

-- Number of rows of file to read. Useful for reading pieces of large files.
+- Number of rows of the file to read. Useful for reading pieces of large files. 
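To make the interaction of the row-selection parameters above concrete, here is a minimal sketch; the sample data is illustrative, not taken from the pandas documentation:

``` python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n"

# skiprows drops physical line 2 ("4,5,6"); nrows then limits the
# result to the first two remaining data rows.
df = pd.read_csv(StringIO(data), skiprows=[2], nrows=2)
print(df["a"].tolist())  # -> [1, 7]
```

Note that ``skiprows`` counts physical lines of the file (including the header line), while ``nrows`` counts data rows after the header.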
low_memory : *boolean, default ``True``*

-- Internally process the file in chunks, resulting in lower memory use
-while parsing, but possibly mixed type inference. To ensure no mixed
-types either set ``False``, or specify the type with the ``dtype`` parameter.
-Note that the entire file is read into a single ``DataFrame`` regardless,
-use the ``chunksize`` or ``iterator`` parameter to return the data in chunks.
-(Only valid with C parser)
+- Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, either set ``False`` or specify the type with the ``dtype`` parameter. Note that the entire file is read into a single ``DataFrame`` regardless; use the ``chunksize`` or ``iterator`` parameter to return the data in chunks. (Only valid with the C parser.)
+

memory_map : *boolean, default False*

-- If a filepath is provided for ``filepath_or_buffer``, map the file object
-directly onto memory and access the data directly from there. Using this
-option can improve performance because there is no longer any I/O overhead.
+- If a filepath is provided for ``filepath_or_buffer``, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

#### NA and missing data handling

@@ -223,73 +215,58 @@ for a list of the values interpreted as NaN by default.

keep_default_na : *boolean, default ``True``*

-- Whether or not to include the default NaN values when parsing the data.
-Depending on whether *na_values* is passed in, the behavior is as follows:
-  - If *keep_default_na* is ``True``, and *na_values* are specified, *na_values*
-  is appended to the default NaN values used for parsing.
-  - If *keep_default_na* is ``True``, and *na_values* are not specified, only
-  the default NaN values are used for parsing.
-  - If *keep_default_na* is ``False``, and *na_values* are specified, only
-  the NaN values specified *na_values* are used for parsing.
-  - If *keep_default_na* is ``False``, and *na_values* are not specified, no
-  strings will be parsed as NaN. 
+- Whether or not to include the default NaN values when parsing the data. Depending on whether *na_values* is passed in, the behavior is as follows:
+  - If *keep_default_na* is ``True``, and *na_values* is specified, *na_values*
+  is appended to the default NaN values used for parsing.
+  - If *keep_default_na* is ``True``, and *na_values* is not specified, only the default NaN values are used for parsing.
+  - If *keep_default_na* is ``False``, and *na_values* is specified, only the NaN values specified in *na_values* are used for parsing.
+  - If *keep_default_na* is ``False``, and *na_values* is not specified, no strings will be parsed as NaN.

- Note that if *na_filter* is passed in as ``False``, the *keep_default_na* and *na_values* parameters will be ignored.
+ Note that if *na_filter* is passed in as ``False``, the *keep_default_na* and *na_values* parameters will be ignored.

na_filter : *boolean, default ``True``*

-- Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing ``na_filter=False`` can improve the performance of reading a large file.
+- Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing ``na_filter=False`` can improve the performance of reading a large file.

verbose : *boolean, default ``False``*

-- Indicate number of NA values placed in non-numeric columns.
+- Indicate the number of NA values placed in non-numeric columns.

skip_blank_lines : *boolean, default ``True``*

-- If ``True``, skip over blank lines rather than interpreting as NaN values.
+- If ``True``, skip over blank lines rather than interpreting them as ``NaN`` values.

-#### Datetime handling
+#### Datetime handling

parse_dates : *boolean or list of ints or names or list of lists or dict, default ``False``.*

-- If ``True`` -> try parsing the index.
-- If ``[1, 2, 3]`` -> try parsing columns 1, 2, 3 each as a separate date
-column.
-- If ``[[1, 3]]`` -> combine columns 1 and 3 and parse as a single date
-column.
-- If ``{'foo': [1, 3]}`` -> parse columns 1, 3 as date and call result ‘foo’.
-A fast-path exists for iso8601-formatted dates.
+- If ``True`` -> try parsing the index.
+- If ``[1, 2, 3]`` -> try parsing columns 1, 2, 3 each as a separate date column.
+- If ``[[1, 3]]`` -> combine columns 1 and 3 and parse as a single date column.
+- If ``{'foo': [1, 3]}`` -> parse columns 1, 3 as a date and call the result ‘foo’.
+A fast-path exists for iso8601-formatted dates.

infer_datetime_format : *boolean, default ``False``*

-- If ``True`` and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing. 
+- If ``True`` and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing.

keep_date_col : *boolean, default ``False``*

-- If ``True`` and parse_dates specifies combining multiple columns then keep the original columns.
+- If ``True`` and parse_dates specifies combining multiple columns, then keep the original columns.

date_parser : *function, default ``None``*

-- Function to use for converting a sequence of string columns to an array of
-datetime instances. The default uses ``dateutil.parser.parser`` to do the
-conversion. pandas will try to call date_parser in three different ways,
-advancing to the next if an exception occurs: 1) Pass one or more arrays (as
-defined by parse_dates) as arguments; 2) concatenate (row-wise) the string
-values from the columns defined by parse_dates into a single array and pass
-that; and 3) call date_parser once for each row using one or more strings
-(corresponding to the columns defined by parse_dates) as arguments.
+- Function to use for converting a sequence of string columns to an array of datetime instances. The default uses ``dateutil.parser.parser`` to do the conversion. pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

dayfirst : *boolean, default ``False``*

-- DD/MM format dates, international and European format.
+- DD/MM format dates, international and European format.

cache_dates : *boolean, default True*

-- If True, use a cache of unique, converted dates to apply the datetime
-conversion. May produce significant speed-up when parsing duplicate
-date strings, especially ones with timezone offsets.
+- If True, use a cache of unique, converted dates to apply the datetime conversion. May produce a significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.

-*New in version 0.25.0.*
+*New in version 0.25.0.*

#### Iteration

chunksize : *int, default ``None``*

- Return TextFileReader object for iteration. See [iterating and chunking](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-chunking) below. 
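As a minimal sketch of the ``chunksize`` and ``parse_dates`` behavior described above (the sample data is illustrative):

``` python
from io import StringIO

import pandas as pd

data = "date,value\n2019-01-01,1\n2019-01-02,2\n2019-01-03,3\n"

# chunksize returns a TextFileReader that yields DataFrames of at most
# two rows each; parse_dates converts the "date" column to datetimes.
reader = pd.read_csv(StringIO(data), parse_dates=["date"], chunksize=2)
chunks = list(reader)
print([len(chunk) for chunk in chunks])  # -> [2, 1]
```

Each yielded chunk is an ordinary ``DataFrame``, so the datetime conversion is applied per chunk.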
-#### Quoting, compression, and file format
+#### Quoting, compression, and file format

compression : *{``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``}, default ``'infer'``*

-- For on-the-fly decompression of on-disk data. If ‘infer’, then use gzip,
-bz2, zip, or xz if filepath_or_buffer is a string ending in ‘.gz’, ‘.bz2’,
-‘.zip’, or ‘.xz’, respectively, and no decompression otherwise. If using ‘zip’,
-the ZIP file must contain only one data file to be read in.
-Set to ``None`` for no decompression.
+- For on-the-fly decompression of on-disk data. If ‘infer’, then use gzip,
+bz2, zip, or xz if filepath_or_buffer is a string ending in ‘.gz’, ‘.bz2’,
+‘.zip’, or ‘.xz’, respectively, and no decompression otherwise. If using ‘zip’,
+the ZIP file must contain only one data file to be read in.
+Set to ``None`` for no decompression.

-*New in version 0.18.1:* support for ‘zip’ and ‘xz’ compression.
+*New in version 0.18.1:* support for ‘zip’ and ‘xz’ compression.

-*Changed in version 0.24.0:* ‘infer’ option added and set to default.
+*Changed in version 0.24.0:* ‘infer’ option added and set to default.

thousands : *str, default ``None``*

-- Thousands separator.
+- Thousands separator.

decimal : *str, default ``'.'``*

-- Character to recognize as decimal point. E.g. use ',' for European data.
+- Character to recognize as the decimal point, e.g. use ',' for European data.

float_precision : *string, default None*
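A minimal sketch of the ``thousands`` and ``decimal`` options described above, using illustrative European-formatted sample data:

``` python
from io import StringIO

import pandas as pd

# European-style numbers: '.' as the thousands separator and ',' as
# the decimal point.
data = "name;price\nwidget;1.234,56\ngadget;7,5\n"

df = pd.read_csv(StringIO(data), sep=";", thousands=".", decimal=",")
print(df["price"].tolist())  # -> [1234.56, 7.5]
```

Without these options, the ``price`` column would be read as strings rather than floats.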