Merge branch 'release/0.2.0'

fgmacedo · Oct 3, 2013 · commit ebdf77f
2 parents b56d801 + 89b7186

Showing 33 changed files with 1,032 additions and 966 deletions.
diff --git a/.coveragerc b/.coveragerc
@@ -1,10 +1,12 @@
 [run]
 include=*raspador*
-omit=tasks.py
+omit=tasks.py,*ordereddict*
 
 [report]
 exclude_lines =
 
     raise NotImplementedError
 
-    if __name__ == '__main__':
+    if __name__ == '__main__':
+
+    except ImportError:
diff --git a/.gitignore b/.gitignore
@@ -5,6 +5,7 @@ MANIFEST
 
 # Virtualenvs
 env*
+.tox
 
 # Distribute
 build

diff --git a/.travis.yml b/.travis.yml
@@ -6,7 +6,6 @@ python:
   - "3.3"
   - "pypy"
 install:
-  - "pip install -r requirements_dev.txt --use-mirrors"
   # For Python 2.6 support
   - "pip install ordereddict --use-mirrors"
   - "pip install coveralls"

diff --git a/README.rst b/README.rst
@@ -15,90 +15,99 @@ raspador
         :target: https://crate.io/packages/raspador/
 
 
-Biblioteca para extração de dados em documentos semi-estruturados.
+Library to extract data from semi-structured text documents.
 
-A definição dos extratores é feita através de classes como modelos, de forma
-semelhante ao ORM do Django. Cada extrator procura por um padrão especificado
-por expressão regular, e a conversão para tipos primitidos é feita
-automaticamente a partir dos grupos capturados.
+It's best suited for data-processing in files that do not have a formal
+structure and are in plain text (or that are easy to convert). Structured files
+like XML, CSV and HTML doesn't fit a good use case for raspador, and have
+excellent alternatives to get data extracted, like lxml_, html5lib_,
+BeautifulSoup_, and PyQuery_.
 
+The extractors are defined through classes as models, something similar to the
+Django ORM. Each field searches for a pattern specified by the regular
+expression, and captured groups are converted automatically to primitives.
 
-O analisador é implementado como um gerador, onde cada item encontrado pode ser
-consumido antes do final da análise, caracterizando uma pipeline.
+The parser is implemented as a generator, where each item found can be consumed
+before the end of the analysis, featuring a pipeline.
 
+The analysis is forward-only, which makes it extremely quick, and thus any
+iterator that returns a string can be analyzed, including infinite streams.
 
-A análise é foward-only, o que o torna extremamente rápido, e deste modo
-qualquer iterador que retorne uma string pode ser analisado, incluindo streams
-infinitos.
+.. _lxml: http://lxml.de
+.. _html5lib: https://github.com/html5lib/html5lib-python
+.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
+.. _PyQuery: https://github.com/gawel/pyquery/
 
 
-Com uma base sólida e enxuta, é fácil construir seus próprios extratores.
+Install
+=======
 
-Além da utilidade da ferramenta, o raspador é um exemplo prático e simples da
-utilização de conceitos e recursos como iteradores, geradores, meta-programação
-e property-descriptors.
+raspador works on CPython 2.6+, CPython 3.2+ and PyPy. To install it, use::
 
+    pip install raspador
 
-Compatibilidade e dependências
-===============================
+or easy install::
 
-O raspador é compatível com Python 2.6, 2.7, 3.2, 3.3 e pypy.
+    easy_install raspador
 
-Desenvolvimento realizado em Python 2.7.5 e Python 3.2.3.
 
-Não há dependências externas.
+From source
+-----------
 
-.. note:: Python 2.6
+Download and install from source::
 
-    Em Python 2.6, a biblioteca `ordereddict
-    <https://pypi.python.org/pypi/ordereddict/>`_ é necessária.
+    git clone https://github.com/fgmacedo/raspador.git
+    cd raspador
+    python setup.py install
 
-    Você pode instalar com pip::
 
-        pip install ordereddict
+Dependencies
+------------
 
-Testes
-======
+There are no external dependencies.
 
-Os testes dependem de algumas bibliotecas externas:
-
-.. code-block:: text
+.. note:: Python 2.6
 
-    coverage==3.6
-    nose==1.3.0
-    flake8==2.0
-    invoke==0.5.0
+    With Python 2.6, you must install `ordereddict
+    <https://pypi.python.org/pypi/ordereddict/>`_.
 
+    You can install it with pip::
 
-Você pode executar os testes com ``nosetests``:
+        pip install ordereddict
 
-.. code-block:: bash
+Tests
+======
 
-    $ nosetests
+To automate tests with all supported Python versions at once, we use `tox
+<http://tox.readthedocs.org/en/latest/>`_.
 
-E adicionalmente, verificar a compatibilidade com o PEP8:
+Run all tests with:
 
 .. code-block:: bash
 
-    $ flake8 raspador testes
+    $ tox
 
-Ou por conveniência, executar os dois em sequência com invoke:
+Tests depend on several third party libraries, but these are installed by tox
+on each Python's virtualenv:
 
-.. code-block:: bash
+.. code-block:: text
 
-    $ invoke test
+    nose==1.3.0
+    coverage==3.6
+    flake8==2.0
 
 
-Exemplos
+Examples
 ========
 
-Extrator de dados em logs
--------------------------
+Extract data from logs
+----------------------
 
 .. code-block:: python
 
+    from __future__ import print_function
     import json
-    from raspador import Analizador, CampoString
+    from raspador import Parser, StringField
 
     out = """
     PART:/dev/sda1 UUID:423k34-3423lk423-sdfsd-43 TYPE:ext4
@@ -107,22 +116,23 @@ Extrator de dados em logs
     """
 
 
-    class AnalizadorDeLog(Analizador):
-        inicio = r'^PART.*'
-        fim = r'^PART.*'
-        PART = CampoString(r'PART:([^\s]+)')
-        UUID = CampoString(r'UUID:([^\s]+)')
-        TYPE = CampoString(r'TYPE:([^\s]+)')
+    class LogParser(Parser):
+        begin = r'^PART.*'
+        end = r'^PART.*'
+        PART = StringField(r'PART:([^\s]+)')
+        UUID = StringField(r'UUID:([^\s]+)')
+        TYPE = StringField(r'TYPE:([^\s]+)')
 
 
-    a = AnalizadorDeLog()
+    a = LogParser()
 
-    # res é um gerador
-    res = a.analizar(linha for linha in out.splitlines())
+    # res is a generator
+    res = a.parse(iter(out.splitlines()))
 
-    print (json.dumps(list(res), indent=2))
+    out_as_json = json.dumps(list(res), indent=2)
+    print (out_as_json)
 
-    # Saída:
+    # Output:
     """
     [
       {

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,17 +1,17 @@
-Documentação do raspador
-========================
+.. _topics-index:
+
+================================
+Raspador |version| documentation
+================================
 
-Conteúdo:
 
-.. toctree::
-   :maxdepth: 2
 
-   raspador
+.. toctree::
+    :hidden:
 
+   intro/overview
+   intro/install
+   intro/tutorial
 
-Índices e tabelas
-==================
 
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
+Looking for specific information? Try the :ref:`genindex` or :ref:`modindex`.
diff --git a/docs/source/intro/install.rst b/docs/source/intro/install.rst
@@ -0,0 +1,28 @@
+
+*******
+Install
+*******
+
+
+Package managers
+================
+
+You can install using pip or easy_install.
+
+PIP::
+
+    pip install raspador
+
+Easy install::
+
+    easy_install raspador
+
+
+From source
+===========
+
+Download and install from source::
+
+    git clone https://github.com/fgmacedo/raspador.git
+    cd raspador
+    python setup.py install
diff --git a/docs/source/raspador.rst b/docs/source/raspador.rst
@@ -1,30 +1,31 @@
 
+========
 raspador
 ========
 
 O módulo raspador fornece estrutura genérica para extração de dados a partir de
 arquivos texto semi-estruturados.
 
 
-Analizador
+Parser
 ----------
 
-.. automodule:: raspador.analizador
+.. automodule:: raspador.parser
     :members:
 
 
 Campos
 ------
 
-.. automodule:: raspador.campos
+.. automodule:: raspador.fields
     :members:
     :undoc-members:
 
 
-Coleções
---------
+Item
+----
 
-.. automodule:: raspador.colecoes
+.. automodule:: raspador.item
     :members:
     :undoc-members:
 
diff --git a/raspador/__init__.py b/raspador/__init__.py
@@ -1,9 +1,10 @@
 # flake8: noqa
 
-from .analizador import Analizador, Dicionario
-from .campos import CampoBase, CampoString, CampoNumerico, \
-    CampoInteiro, CampoData, CampoDataHora, CampoBooleano
+from .parser import Parser
+from .item import Dictionary
+from .fields import BaseField, StringField, FloatField, BRFloatField, \
+    IntegerField, DateField, DateTimeField, BooleanField
 
-from .decoradores import ProxyDeCampo, ProxyConcatenaAteRE
+from .decorators import FieldProxy, UnionUntilRegexProxy
 
 from .cache import Cache
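
Note on the forward-only, generator-based parsing described in the README changes above: the idea can be illustrated with the standard library alone. The sketch below is not raspador's implementation and does not use its API. The `parse_blocks` helper, its parameters, and the second sample line are hypothetical and exist only for illustration. It assumes a block starts at every line matching a `begin` pattern and that each field keeps the first captured group.

```python
import re


def parse_blocks(lines, begin, fields):
    """Yield one dict per block while reading `lines` exactly once.

    Hypothetical helper for illustration only; not part of raspador.
    """
    begin_re = re.compile(begin)
    field_res = dict((name, re.compile(pattern))
                     for name, pattern in fields.items())
    item = None
    for line in lines:
        if begin_re.search(line):
            # A new block starts: hand the finished one to the consumer.
            if item is not None:
                yield item
            item = {}
        if item is None:
            continue  # still before the first block
        for name, regex in field_res.items():
            match = regex.search(line)
            if match:
                item[name] = match.group(1)
    # Flush the last block once the input is exhausted.
    if item is not None:
        yield item


sample = [
    "PART:/dev/sda1 UUID:423k34-3423lk423-sdfsd-43 TYPE:ext4",
    # Hypothetical second line, made up for this sketch.
    "PART:/dev/sdb1 UUID:hypothetical-uuid TYPE:ext3",
]

fields = {
    "PART": r"PART:([^\s]+)",
    "UUID": r"UUID:([^\s]+)",
    "TYPE": r"TYPE:([^\s]+)",
}

# `items` is built lazily; list() drains the generator here.
items = list(parse_blocks(iter(sample), r"^PART", fields))
```

Because the function never looks back or buffers the whole input, any iterator of strings can be passed in, and each item becomes available as soon as the next block begins.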