=                                                                               
           _  _                                                                
          ( `   )_                                                             
         (    )    `)                                                          
       (_   (_ .  _) _)                                                        
                                      _                                        
                                     (  )                                      
      _ .                         ( `  ) . )                                   
    (  _ )_                      (_, _(  ,_)_)                                 
  (_  _(_ ,)                                                                   
                                                                               
           _  _               ___ _             _  ___                   _     
          ( `   )_           / __| |___ _  _ __| |/ __|_ _ _____ __ ____| |    
         (    )    `)       | (__| / _ \ || / _` | (__| '_/ _ \ V  V / _` |    
       (_   (_ .  _) _)      \___|_\___/\_,_\__,_|\___|_| \___/\_/\_/\__,_|    
                                                                               
                                                     _                         
                                                    (  )                       
                  _, _ .                         ( `  ) . )                    
                 ( (  _ )_                      (_, _(  ,_)_)                  
               (_(_  _(_ ,)                                                    
                                                                               
                                                                               
                                                                               
  ~ CloudCrowd ~

    * Parallel processing for the rest of us
    * Write your scripts in Ruby
    * Works with Amazon EC2 and S3
    * split -> process -> merge
    * As easy as `gem install cloud-crowd`
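
    To make "split -> process -> merge" concrete, here is a plain-Ruby sketch
    of the pattern. It does not use the gem: threads stand in for CloudCrowd's
    workers, and the helper names (split_input, process_unit, merge_results)
    are made up for illustration.

```ruby
# Plain-Ruby illustration of split -> process -> merge.
# Threads stand in for CloudCrowd workers; nothing here uses the gem.

# Split: break the input into independent work units (groups of lines).
def split_input(text, parts)
  lines = text.lines
  size = (lines.length / parts.to_f).ceil
  size = 1 if size < 1
  lines.each_slice(size).map(&:join)
end

# Process: the per-unit job, here counting the words in one chunk.
def process_unit(chunk)
  chunk.split.length
end

# Merge: combine the per-unit results into a single answer.
def merge_results(results)
  results.sum
end

def run(text, parts: 4)
  chunks = split_input(text, parts)
  threads = chunks.map { |chunk| Thread.new { process_unit(chunk) } }
  merge_results(threads.map(&:value))
end

sample = "the quick brown fox\njumps over\nthe lazy dog\n"
puts run(sample, parts: 2)  # prints 9
```

    Each unit is processed independently, so adding workers (or machines)
    only changes how many units run at once, not the merged result.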

    Well-suited for:
    
    * Generating or resizing images.
    * Encoding video.
    * Running text extraction or OCR on PDFs.
    * Migrating a large file set or database.
    * Web scraping.
    
    
  ~ Documentation ~
  
    Wiki: https://github.com/documentcloud/cloud-crowd/wiki
    Rdoc: http://www.rubydoc.info/github/documentcloud/cloud-crowd
  
  
  ~ Getting started ~
  
    # Install the gem.
    
      >> sudo gem install cloud-crowd
    
    # Install the CloudCrowd configuration files to a location of your choosing.
    
      >> crowd install ~/config/cloud-crowd
    
    # Now you can use the full set of `crowd` commands from inside this
    # configuration directory. To see the available commands:
    
      >> crowd --help
    
    # Edit the configuration files to your satisfaction (the commands below use
    # TextMate's `mate` launcher; any editor works), add your AWS credentials,
    # and then load the CloudCrowd schema into your configured database.
    
      >> cd ~/config/cloud-crowd
      >> mate config.yml
      >> mate database.yml
      >> [create the database you just configured...]
      >> crowd load_schema
    
    # Write your actions and install them into the 'actions' subdirectory.
    # CloudCrowd ships with a few default actions to use as examples.
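
    # A sketch of an action's shape, runnable without the gem. StandInAction
    # is a hypothetical base class invented here; the bundled example actions
    # subclass CloudCrowd::Action instead, so check the Rdoc for the hooks and
    # accessors it really provides. WordFrequency is likewise illustrative.

```ruby
# HYPOTHETICAL stand-in base class so this file runs without the gem.
# Real actions subclass CloudCrowd::Action; its accessors may differ.
class StandInAction
  attr_reader :input, :options

  def initialize(input, options = {})
    @input = input
    @options = options
  end
end

# Illustrative action: one class whose #process returns the result
# for a single unit of work.
class WordFrequency < StandInAction
  def process
    words = input.split
    words = words.map(&:downcase) if options[:ignore_case]
    words.tally # e.g. {"to" => 2, ...}
  end
end

result = WordFrequency.new("To be or not to be", { ignore_case: true }).process
puts result["to"]  # prints 2
```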
    
    # To launch the central server (make sure its location is set in
    # config.yml; nodes read it from there to find the server):
    
      >> crowd server
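
    # Remote nodes find the central server through the location set in
    # config.yml. A minimal sketch of reading it back with Ruby's standard
    # YAML library; the :central_server key and the sample value are
    # assumptions based on the example config, so compare them against the
    # file `crowd install` generated for you.

```ruby
require "yaml"

# HYPOTHETICAL excerpt of config.yml; key name assumed, check your own file.
CONFIG_TEXT = <<~CONFIG
  :central_server: http://localhost:9173
CONFIG

# Symbol keys such as ":central_server:" must be explicitly permitted
# when loading with YAML.safe_load.
def central_server_url(yaml_text)
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  url = config[:central_server]
  raise ArgumentError, "config.yml has no :central_server entry" unless url
  url
end

puts central_server_url(CONFIG_TEXT)  # prints http://localhost:9173
```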
    
    # The configuration folder also includes 'config.ru', which can be used by
    # any Rack-compliant webserver to run your central server.
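
    # "Rack-compliant" means the webserver loads config.ru, calls the app it
    # names with an environment hash, and expects a [status, headers, body]
    # triplet back. A gem-free sketch of that contract; HelloApp is
    # hypothetical and is not CloudCrowd's server.

```ruby
# Minimal Rack-style app: any object responding to #call(env) that
# returns [status, headers, body], where body responds to #each.
HelloApp = lambda do |env|
  body = "path: #{env['PATH_INFO']}\n"
  [200, { "content-type" => "text/plain" }, [body]]
end

# What a Rack server does on each request, reduced to one call.
status, headers, body = HelloApp.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/status" })
puts status      # prints 200
puts body.join   # prints path: /status
```

    In a config.ru, Rack's `run` directive is what hands such an app to the
    server.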
    
    # Then, to launch a node of workers:
    
      >> crowd node
    
    # To spin up remote nodes, install the 'cloud-crowd' gem on each remote
    # machine and copy over your configuration directory. Run `crowd node`
    # there, and each machine will register with the central server, becoming
    # available for processing.
    
    # At this point you can visit your Operations Center at localhost:9173 to 
    # view all of your nodes, ready for action.