Patrick Harrison edited this page Sep 2, 2014 · 42 revisions

Setup

This section will walk through a basic setup for Scumblr on a base Ubuntu 14.04 system. This guide assumes you have an Ubuntu system set up and ready to go.

Install Prerequisites

From the command line:

sudo apt-get update
sudo apt-get -y install git libxslt-dev libxml2-dev build-essential bison openssl zlib1g libxslt1.1 libssl-dev libxslt1-dev libxml2 libffi-dev libxslt-dev autoconf libc6-dev libreadline6-dev zlib1g-dev libtool libsqlite3-dev libcurl3 libmagickcore-dev ruby-build libmagickwand-dev imagemagick bundler

Install Rbenv/Ruby

From the command line:

cd ~
git clone git://github.com/sstephenson/rbenv.git .rbenv
echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(rbenv init -)"' >> ~/.bashrc
exec $SHELL

git clone git://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
echo 'export PATH="$HOME/.rbenv/plugins/ruby-build/bin:$PATH"' >> ~/.bashrc
exec $SHELL

rbenv install 2.0.0-p481
rbenv global 2.0.0-p481
ruby -v

Install Ruby on Rails

From the command line:

gem install bundler --no-ri --no-rdoc
rbenv rehash
gem install rails -v 4.0.9 

Install Application Dependencies

sudo apt-get install redis-server
gem install sidekiq
rbenv rehash

Setup Application

From the command line:

git clone https://github.com/Netflix/Scumblr.git
cd Scumblr
bundle install
rake db:create
rake db:schema:load

Create an Admin User

From the command line from the Scumblr root folder:

../.rbenv/versions/2.0.0-p481/bin/rails c

In the console:

user = User.new
user.email = "<Valid email address>"
user.password = "<Password>"
user.password_confirmation = "<Password>"
user.admin = true
user.save

Run Scumblr

From the command line from the Scumblr root folder:

redis-server &
../.rbenv/shims/bundle exec sidekiq -l log/sidekiq.log &
../.rbenv/shims/bundle exec rails s &

Now connect to your server on port 3000.

Additional Configuration

This section will discuss additional items that should be configured before using Scumblr in production.

Scumblr Configuration

Scumblr integrates with other services and APIs in order to find results and generate screenshots. Locations and API keys should be placed in config/initializers/scumblr.rb. A sample of this file is located at config/initializers/scumblr.rb.sample. In this file you can set:

  • The URL where Sketchy can be accessed (if using Sketchy to generate screenshots)
  • Keys, Secrets, and IDs for API authentication/authorization (Google, Apple Store, eBay, Twitter, etc.)

Examples for each configuration option for built-in search providers are located in the config/initializers/scumblr.rb.sample file. Simply rename this file to scumblr.rb and add the appropriate keys/values.

Secrets

First you should generate secrets for the Rails Application and Devise:

rake secret

Run this command twice. Put one secret on line 7 of config/initializers/secret_token.rb and the other on line 7 of config/initializers/devise.rb.

Email Notifications/Routing

If you plan to use email notifications, you should ensure the default URL options are set correctly. This can be done in the config/environments/* files. Placeholders are located at the end of the production.rb and test.rb files. Example:

Rails.application.routes.default_url_options[:host] = "scumblr.com"
Rails.application.routes.default_url_options[:protocol] = "https"

Automatic Syncing

In order to allow Scumblr to automatically run searches and send email notifications, you may want to set up cron jobs using the appropriate rake tasks:

rake sync_all

This task will run all the searches and import any new results. It will also generate screenshots of each result, if the integration with Sketchy is configured.

Other Setup

There are other items that should be considered before deploying Scumblr in a "production" environment. These include:

  • Choosing an appropriate web server to front the Rails application (We use Unicorn/Nginx)
  • Adjusting redis configuration to meet your needs
  • Reviewing and adjusting the Devise configuration if needed
  • Standard hardening for the Ubuntu host

You may also want to review the app/models/ability.rb file. This file specifies the authorization roles in place. At Netflix, we use a simple admin/normal user scheme. This may not be appropriate for all use cases.

Using Scumblr

This section covers basic use of Scumblr. We will assume that you've gotten Scumblr up and running, including Redis and Sidekiq, and have left the default search providers in place.

Searches

Searches are tasks that run in order to look for results to import into Scumblr. Searches rely on a Search Provider, a plugin module that knows how to take a set of options and find and return results. Scumblr includes a number of search providers by default. These include:

  • Google
  • YouTube
  • Facebook
  • Apple AppStore
  • Google Play Store
  • eBay
  • Twitter

For now we'll stick with using the built-in search providers, but adding a new search provider is relatively straightforward and will be discussed later. We will assume you have generated the appropriate API keys and added them to the configuration file as discussed in "Scumblr Configuration" above.

Creating a New Search

In order to create a new search, you will need to be logged in with an account with admin rights. If you'd like normal users to have the ability to create a new search, you can make the appropriate modifications to the /app/models/ability.rb file.

From any page in Scumblr:

  1. On the top menu click "Searches"
  2. Click "New Search"
  3. Give your search an identifiable name
  4. Add a query string. We're going to use "netflix scumblr"
  5. Select a Search Provider. We're going to use Google
  6. If additional options are available, they will appear inline. We'll leave the additional options blank for now.
  7. Add any tags you would like to be automatically applied to these results. We'll add one tag, "Scumblr"
  8. If you'd like, add a verbose description to the search
  9. When done click "Create Search"

Running a Search

In order to get results, the search needs to be run. This can be done in a number of ways:

  • An individual search can be run through the web interface
  • All searches can be run at the same time through the web interface
  • All searches can be run at the same time using a rake task

We'll discuss each of these methods in this section.

Running an Individual Search

  1. From anywhere in Scumblr, click "Searches" on the top menu
  2. Click on the Name of the Search you'd like to run
  3. Click "Run Now"

Your search should run and, once complete, you should see results on the results page (if any were found!).

Running All Searches

  1. From anywhere in Scumblr, click "Searches" on the top menu
  2. Click "Run All Searches"

All the searches you have configured should run. Once complete you should see identified results on the results page.

Running All Searches From a Rake Task

  1. From the command line at the Scumblr root path, run:

     rake sync_all
    

This will run all the searches you have configured. Once complete you should see results on the results page.

Results

Results are the core model in Scumblr. A result represents a URL that has been entered manually or imported using a Search Provider. This section will discuss how to view, inspect, and action results.

Result List

The result list is the main page for the application. This page shows a summary of all the results that have been identified and also allows searching/filtering, sorting, viewing basic details, and taking basic actions on the results.

The Result List page is the first page you'll arrive at after logging in. It can also be reached by clicking "Results" on the top menu.

Navigating the Result List

There are two main sections on the Result List page: the results list, and the filter/action panel.

Result List

The results list is the main part of this screen and consists of the table on the left side of the page. In this table, each row represents a result. From here you can view the title of the result, the status (if one is given/available), the domain, when the result was first identified, and when it was last seen in a search.

There is also a link that will take you to the URL represented by the result. Note: This opens a new tab and takes you outside of Scumblr. Always be careful when visiting random sites on the Internet.

On the right side of the result list is a "Show" button. Clicking this button will take you to the detail page for this result. This page will be discussed in the "Result Details" section below.

If a result has screenshots attached (either manually added or synced with Sketchy), a small monitor icon will appear to the left of the result's title. Hovering over this icon will show a thumbnail image of the first screenshot. Clicking the icon will allow seeing a larger gallery view of all the screenshots attached to the result.

Filter Panel

On the right side of the results list page is the filter/action panel. This section allows performing granular filtering for specific results.

From here it is possible to search based on:

  • URL
  • Title
  • Tags
  • Assignee
  • Status
  • Search
  • Workflow Flag
  • Workflow Stage

It is also possible to indicate whether "Closed" results should be included in the list. A closed result is a result whose status has been marked as a closed status. (More about Statuses in the relevant section below.) In order to perform a search, fill out the fields you're interested in and click "Search". Multiple filter attributes can be used and will be treated as "and" conditions. For example, if you search for "facebook" in the URL field and "Investigating" in the Status field, you will get all the results with facebook in the URL that are also currently in the "Investigating" status.

If multiple entries are searched in the multi-search boxes (Tags, Assignee, Status, Search, Workflow Flag, and Workflow Stage), these will be treated as an "or" condition. For example if you search for "John" and "Cindy" in the Assignee field (assuming these were users of the system), you will get a list of all results assigned to either John OR Cindy.
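
The "and"/"or" rules above can be sketched in plain Ruby. This is only an illustration of the matching semantics, not Scumblr's actual query code: the matches_filter? helper, the hash layout, and the sample data are all assumptions made for the example.

```ruby
# Illustrative only: models the filter semantics with plain hashes.
# Different fields are combined with AND; multiple values supplied
# for one multi-select field are combined with OR.
def matches_filter?(result, filter)
  filter.all? do |field, wanted|
    value = result[field]
    if wanted.is_a?(Array)
      # Multi-select field: any one of the wanted values may match.
      wanted.any? { |w| Array(value).include?(w) }
    else
      # Text field: case-insensitive substring match.
      value.to_s.downcase.include?(wanted.to_s.downcase)
    end
  end
end

results = [
  { url: "http://facebook.com/a", status: "Investigating", assignee: "John" },
  { url: "http://facebook.com/b", status: "New",           assignee: "Cindy" },
  { url: "http://example.com/c",  status: "Investigating", assignee: "Cindy" }
]

# "facebook" in the URL field AND "Investigating" in the Status field
and_filter = { url: "facebook", status: ["Investigating"] }
# Assignee is John OR Cindy
or_filter = { assignee: ["John", "Cindy"] }

and_matches = results.select { |r| matches_filter?(r, and_filter) }
or_matches  = results.select { |r| matches_filter?(r, or_filter) }

puts and_matches.map { |r| r[:url] }.inspect
puts or_matches.length
```

In this sketch only the first result satisfies both conditions of the "and" filter, while all three results satisfy the "or" filter on Assignee.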

Important: Filters will persist between requests. In other words, if you navigate away from the results list (into a result's detail page for example), when you return to the results list your filter will remain intact. If you want to remove your filter and see all results, click "Clear Search". When results are filtered this will be indicated in the result count displayed at the top of the result list table. For example, the result count may indicate "Displaying 1 result (1000 results filtered)". This would mean that 1 result meets your search criteria while another 1000 have been filtered from view.

Action Panel

The action panel allows performing certain actions on one or more results. The action panel appears at the right side of the screen, but is not visible until one or more results are selected with the checkboxes on the left side of the result. Actions that can be taken with the action panel include:

  • Changing the Status
  • Adding Tags
  • Setting the Assignee
  • Generating Screenshot (if Sketchy is enabled)

To use the action panel, first select one or more results. You can also use the checkbox in the header of the results list. This will select all results on the current page of results. If you'd like you can select all results that meet the current filter (on all pages). This is done by selecting the checkbox in the header of the results list and then clicking "Select all n results that match this filter."

Once the appropriate results are selected, simply select the options on the action panel that you'd like to change. You can perform multiple changes at one time (changing status, adding tags, setting assignee, generating screenshots). You can also add multiple tags at once by adding multiple tags to the tag field.

To generate screenshots, use the right side of the update button (area with the arrow) and select either "Update and Generate Screenshot" or "Update and Force Generate Screenshot". "Force Generate" will add a new screenshot for all selected results, even if a screenshot already exists. "Generate" will only add a new screenshot to selected results without an existing attachment.
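
The difference between the two options can be sketched as follows. This is a toy model, not Scumblr's code: the results_to_screenshot helper and the :attachments key are hypothetical names used only for illustration.

```ruby
# Illustrative only: decides which selected results would get a new
# screenshot under "Generate" (force = false) and "Force Generate"
# (force = true).
def results_to_screenshot(selected, force)
  return selected if force
  # Plain "Generate" skips results that already have an attachment.
  selected.select { |r| r[:attachments].empty? }
end

selected = [
  { title: "A", attachments: [] },
  { title: "B", attachments: ["shot1.png"] }
]

generate       = results_to_screenshot(selected, false).map { |r| r[:title] }
force_generate = results_to_screenshot(selected, true).map { |r| r[:title] }

puts generate.inspect        # ["A"]
puts force_generate.inspect  # ["A", "B"]
```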

Creating Results

Results can be manually created by clicking the "New Result" button on the bottom of the Result List page. To create a result you'll need to provide a name/title for the result, as well as a URL.

Result Details

The result details page contains additional information and actions that can be taken on individual results. From here one can:

  • View the details of the results
  • View/manually add screenshots
  • Change the status
  • Add comments
  • Change the assignee
  • Subscribe to updates
  • Add/Remove tags
  • Add/Update Workflows

This section will walk through using the result details page.

Status

At the top of the result view is the status bar. This indicates the current status of the result. A new status can be selected by clicking on the desired status. This is meant for a simple, high-level status that can be used to filter/search for results and indicate where in the process the result currently resides.

Details

Below the status bar are additional details about the results. These include the title, when the result was first identified, and a link to the original result page.

Searches

The searches section lists all the searches which have found the result page. If more than one search identifies the same URL, they will all be listed here. The information listed here includes the name of the search, the providers, the query used, and when the search first identified the result.

Attachments

The attachments section shows any files attached to the result and allows triggering screenshot generation (through Sketchy if available) or manually uploading a file. New screenshots can be uploaded by rolling over the large "+" placeholder and clicking "Upload". A screenshot can be requested from Sketchy by rolling over the large "+" placeholder and clicking "Generate".

Rolling over an existing screenshot will provide options to "View" the result full size or "Delete" the screenshot.

Comments

At the bottom of the result page is the comments section. Comments can be added using the comment form at the top, or existing comments can be replied to by hitting the "Reply" button under the comment. The small -/+ to the left of the commenter's name allows collapsing/expanding the comment thread.

Assignee

On the right side of the page, the result's assignee can be viewed/set. To change the assignee click the pencil icon and select a new user.

Note: a user must exist on the system to be used as an assignee.

Subscriptions

If Scumblr is configured to send email messages, it is possible to subscribe to a result to receive an email when updates occur. To do this, click the "Subscribe" link. Once subscribed you can unsubscribe by clicking "Unsubscribe".

The number to the right of the "Subscribe/Unsubscribe" link indicates how many users are currently subscribed. Clicking this number will show a list of subscribed users and allow adding a new subscriber (including someone besides yourself) or removing existing subscribers.

Tags

If the result has had tags applied, they will appear in the tags section. Tags can be removed by clicking the "X" inside the tag. New tags can be added by clicking the "+" and typing/selecting the tag you'd like to apply.

Workflow Flags

The workflow flags section of the page shows any workflows that have been added to the result. Results can be flagged for multiple workflows ("Investigate" and "Takedown" for example), however they can only utilize a single copy of an individual workflow (You cannot apply two "Investigate" workflows, for example). All workflows that have been applied to the current result will be listed and the current stage will be shown in the drop down.

To add a new workflow flag to the result, click the "+", choose a workflow flag, and click submit. If any options are required to add the workflow, a form will appear for you to fill out. Click "Add workflow" when complete.

To change the stage of an existing workflow, click the associated dropdown and select the new stage. If any options are required to add the workflow, a form will appear for you to fill out. Click "Add workflow" when complete.

Saved Filters

Saved filters allow saving a set of criteria used to filter results so it can easily be accessed and shared. Additionally, saved filters can have a list of subscribers, who can receive email updates when new matching results are identified.

Saved Filters are created from the results list page. To create a saved filter, perform a search as normal. You can click "Search" to preview the results if you'd like, but this is not necessary. When you're ready to save your filter, click the save button. This will allow you to fill in additional options about the search, including a name which will be used to refer to the saved filter, a list of subscribers to notify when new matching results are identified, and whether you want to share the filter with other users of the application. When you're happy with your saved filter, click "Create Saved Filter".

Once a saved filter has been saved, it can be accessed from the Saved Filters menu at the top of the page. Clicking on the filter name will take you to the current results for that saved filter.

To modify an existing filter, click "Manage" in the Saved Filters menu. Saved Filters can be modified by clicking the "Edit" button or deleted by clicking "Delete".

Public filters created by other users can be added to your Saved Filters menu. To do this, first click "Manage" under the "Saved Filters" menu at the top of the page. At the bottom of this page will be a list of public filters, if any exist. Clicking "Add" next to any of these filters will add it to your list (under Saved Filters, Public Filters).

Statuses

Statuses allow a flexible way of tracking the high-level state of a given result. Statuses can be created/edited from the "Admin>Statuses" menu at the top of the page. Creating/editing statuses requires admin privileges on the system.

Statuses have three fields:

  • Name
  • Closed
  • Invalid

The Closed flag indicates that results in this status should be considered closed. Results that have been moved into a closed status will be excluded from the results list by default.

The Invalid flag indicates the result was invalid--meaning it's not something you were looking for from the given search. For example, you may have a "False Positive" status. The invalid flag allows tracking which searches are producing high numbers of results that you're not actually interested in.
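
As a rough illustration of how the two flags might be used, the sketch below hides closed results from a default list and computes the share of invalid results per search. It is not Scumblr's implementation: the Status struct, the sample statuses (including the choice to mark "False Positive" as both closed and invalid), and the sample results are assumptions.

```ruby
# Illustrative only: a status with the three fields described above.
Status = Struct.new(:name, :closed, :invalid)

statuses = {
  "New"            => Status.new("New", false, false),
  "Resolved"       => Status.new("Resolved", true, false),
  # Assumption: a false positive is treated as both closed and invalid.
  "False Positive" => Status.new("False Positive", true, true)
}

results = [
  { title: "r1", search: "brand search", status: "New" },
  { title: "r2", search: "brand search", status: "False Positive" },
  { title: "r3", search: "brand search", status: "Resolved" },
  { title: "r4", search: "app search",   status: "False Positive" }
]

# Default results list: results in a closed status are excluded.
open_results = results.reject { |r| statuses[r[:status]].closed }

# Share of invalid results per search, to spot noisy searches.
invalid_rate = results.group_by { |r| r[:search] }.each_with_object({}) do |(search, rs), rates|
  invalid = rs.count { |r| statuses[r[:status]].invalid }
  rates[search] = invalid.to_f / rs.size
end

puts open_results.map { |r| r[:title] }.inspect
puts invalid_rate.inspect
```

Here "app search" produced only invalid results, which suggests its query may need tightening.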

Dashboard

Scumblr ships with a simple dashboard which is available from the top menu. This page shows the following information:

  • Time-series chart showing number of results identified per page
  • A breakdown of results by status
  • A breakdown of the number of results assigned to a workflow
  • For each search
    • The total number of results found
    • The number of results found in the last 24 hours
    • The number of results found in the last 7 days
    • A 30 day trend line
    • The number of results assigned to each workflow and/or to no workflow

Extending Scumblr

Scumblr can be extended in a variety of ways including adding new workflows and new search providers. These capabilities will be discussed in this section.

Workflow

Scumblr uses the Workflowable gem (www.github.com/netflix/workflowable) to allow defining flexible workflows for actioning results. This section will give a brief overview of workflows. For more information about setting up and defining workflows, see the Workflowable wiki.

In Scumblr, results can be assigned one or more workflows. This is done by adding a workflow flag from the Result view page (as discussed in the Workflow Flags section above). Once a workflow flag has been added, the result can be moved through the various phases from the result view page.

Workflow Concepts

This section will give a brief overview of the concepts used by the Workflowable gem.

Workflow: A process, generally with multiple, ordered stages

Stage: A state within the process

Initial stage: The state the workflow starts in

Action: A function that gets run as part of adding a workflow to an item or moving between stages

Before action: An action that gets run when transitioning into a specific stage

After action: An action that gets run when transitioning out of a specific stage

Global action: An action that gets run when moving between any stages, including moving into the initial stage (i.e. when a result is first flagged)
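
To make these terms concrete, here is a small model of which actions would fire on a transition. It does not use the Workflowable API: the Stage struct, the actions_for_transition helper, and the action names are hypothetical, and the ordering (after actions, then before actions, then global actions) is an assumption made for the sketch rather than documented behavior.

```ruby
# Hypothetical model, not the Workflowable API.
Stage = Struct.new(:name, :before_actions, :after_actions)

# Collects the actions that fire when moving from one stage to another.
# `from` is nil when a result is first flagged (entering the initial stage).
def actions_for_transition(from, to, global_actions)
  fired = []
  fired.concat(from.after_actions) if from  # leaving the current stage
  fired.concat(to.before_actions)           # entering the next stage
  fired.concat(global_actions)              # run on every transition
  fired
end

new_stage = Stage.new("New", ["notify_team"], ["log_triage"])
review    = Stage.new("Under Review", ["assign_reviewer"], [])
globals   = ["audit_log"]

initial = actions_for_transition(nil, new_stage, globals)
moved   = actions_for_transition(new_stage, review, globals)

puts initial.inspect  # ["notify_team", "audit_log"]
puts moved.inspect    # ["log_triage", "assign_reviewer", "audit_log"]
```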

Workflow Setup

In order to use workflows, we first need to create one. This can be done from the Workflowable admin page (available at /workflowable or from the Admin menu). Please see the workflowable wiki for detailed instructions on setting up a workflow.

Once the workflow has been set up from the Workflowable admin page, we also need to create a Workflow Flag that can be used in Scumblr. This can be done from the Flag admin page (in the admin menu).

Workflow Actions

Workflowable allows defining custom actions that can be run when flagging a result with a workflow and/or when moving between stages. These actions are developed as classes and need to conform to the API defined by Workflowable (see the Workflowable wiki). Action classes should be stored in the lib/workflowable/actions folder.

Creating a Workflow Flag

You will need to have already created the workflow through the Workflowable admin page. Once this is complete, go to the Flag admin page and click "New Flag". Here you can choose a name for the workflow (which will be used in Scumblr), a description for the workflow, and add any subscribers who, if Scumblr is set up to send email, will receive notifications when a result is flagged for this workflow. You will additionally need to choose the workflow to associate with the flag by choosing it from the dropdown box. This box is populated based on the workflows created in the Workflowable admin page.

Once the flag has been created it is ready to be used in Scumblr. See the "Workflow Flags" section above for instructions on how to assign and use workflows from the result view page.

Search Providers

It is possible to define new search providers that are capable of searching for and identifying results. This is done by implementing a class that conforms to the search provider API and placing the provider file in lib/providers. All search providers should be a subclass of SearchProvider::Provider.

Search Provider Methods

A search provider generally needs to implement 4 functions:

self.provider_name: This function allows Scumblr to retrieve and present a readable name for the provider

self.options: This function defines what options can/should be passed into the provider. These options are set when creating the Search

initialize(query, options={}): This function is often overridden in order to retrieve an access token from the configuration or otherwise setup the provider

run: This function performs the search and returns the results

Provider Name Function

Parameters: None

Return Value: String

This simple function is called when Scumblr needs the human-readable name for the provider. For example, SearchProvider::AppStore.provider_name will return "Apple Store Search".

Options Function

Parameters: None

Return Value: options (hash)

This function will return a hash that defines which options can/should be passed in when running a search. Options are defined when an individual search is created.

Options Hash

key: (symbol) A key used to identify the option value. Each option must have a unique key

value: (hash) This will contain information about the option:

value[:name] (string) The name of the option

value[:description] (string) A description of the option

value[:required] (boolean) Is the option required

Example
{
	:results=>{name: "Max results (200)", description: "Specify the number of results to retrieve", required: false},
	:require_all_terms=>{name: "Require all terms", description: "If set to \"true\" will ensure all search terms contained in result", required: false}
}

If self.options returns this hash, the provider will have two optional options that can be defined when creating a search: "Max results (200)" and "Require all terms"
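
One way the required flag could be used is to check the values supplied for a search against the option definitions. The sketch below is an assumption about how such a check might look, not code from Scumblr: missing_required_options is a hypothetical helper, and the :region option is invented to show a required entry.

```ruby
# Hypothetical helper: returns the keys of required options that were
# left blank in the supplied values.
def missing_required_options(definitions, supplied)
  definitions.select { |key, definition|
    definition[:required] && supplied[key].to_s.strip.empty?
  }.keys
end

definitions = {
  :results => { name: "Max results (200)", description: "Specify the number of results to retrieve", required: false },
  :require_all_terms => { name: "Require all terms", description: "If set to \"true\" will ensure all search terms contained in result", required: false },
  # Invented for this example to demonstrate a required option.
  :region => { name: "Region", description: "Hypothetical required option", required: true }
}

missing  = missing_required_options(definitions, { results: "50" })
complete = missing_required_options(definitions, { results: "50", region: "us" })

puts missing.inspect   # [:region]
puts complete.inspect  # []
```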

Initialize Function

Parameters: query (string), options (hash, optional)

Return Value: None

The default initializer will set the query to @query and the options to @options. These values (@query, @options) can be accessed from the run function. If the initializer is overridden, the overriding function can call "super" to set up these values, or the overriding function can set up the appropriate values for the run function itself.

This function is often used to pull API keys or other credentials from the config file. If using the convention used for the out of the box search providers, this data should be put in config/initializers/scumblr.rb and would be accessed using:

Rails.configuration.try(:<OPTION NAME>)

Run Function

Parameters: None

Return Value: results (array of results hashes)

The run function should perform the search and return any identified results. If using the default initializer, the query to search will be stored in @query and the options in @options. Options can be accessed using the same key as that used in the options hash. So, for example, if the options hash defined an option ":results", this value would be retrieved using @options[:results].
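
Option values may be blank or malformed, so a provider will usually want to normalize them before use. The helpers below are a sketch under two assumptions: that option values arrive as strings entered in the search form, and that 200 is a sensible default for ":results" (suggested only by the "Max results (200)" label in the example above). Neither helper is part of Scumblr.

```ruby
# Hypothetical helper: reads the :results option as an integer,
# falling back to a default when it is blank or not a number.
def max_results(options, default = 200)
  value = options[:results].to_s.strip
  return default if value.empty?
  Integer(value, 10)
rescue ArgumentError
  default
end

# Hypothetical helper: treats the option as enabled only when it is
# set to "true" (case-insensitive).
def require_all_terms?(options)
  options[:require_all_terms].to_s.strip.downcase == "true"
end

puts max_results({ results: "50" })                      # 50
puts max_results({})                                     # 200
puts max_results({ results: "abc" })                     # 200
puts require_all_terms?({ require_all_terms: "True" })   # true
puts require_all_terms?({})                              # false
```

Inside a provider these would be called with @options, for example max_results(@options).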

Results Hash

title: (string) The title of the result

url: (string) The url for the result

domain: (string) The result's domain

metadata: (hash) An (optional) hash of additional metadata to store with the result

Example
[ 
	{title: "Netflix", url: "http://www.netflix.com", domain: "www.netflix.com", metadata: {priority: 1}},
	{title: "Netflix Blog", url: "http://blog.netflix.com/page1", domain: "blog.netflix.com", metadata: {priority: 2}}
]

This example includes 2 results: one for the Netflix main site and one for the Netflix Blog. These results would be imported into Scumblr, if not previously identified.
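
Since the domain is normally just the host part of the URL, a provider can derive it rather than typing it by hand. The following sketch uses Ruby's standard URI library; build_result is a hypothetical helper name, not something Scumblr provides.

```ruby
require "uri"

# Hypothetical helper: builds one result hash in the shape described
# above, deriving the domain from the URL's host.
def build_result(title, url, metadata = {})
  { title: title, url: url, domain: URI.parse(url).host, metadata: metadata }
end

results = [
  build_result("Netflix", "http://www.netflix.com", { priority: 1 }),
  build_result("Netflix Blog", "http://blog.netflix.com/page1", { priority: 2 })
]

puts results.map { |r| r[:domain] }.inspect
# ["www.netflix.com", "blog.netflix.com"]
```

Deriving the domain this way keeps the url and domain fields from drifting apart.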

Sample Search Provider

Below is a simple sample of a search provider class. This class will not perform a search, but should provide guidance on what a Search Provider class should look like:

class SearchProvider::FakeSearch < SearchProvider::Provider
  # Human-readable name for the provider
  def self.provider_name
    "Fake Search"
  end

  # Options that can be set when creating a search with this provider
  def self.options
    {
      :max_result=>{name: "Max Results", description: "The maximum number of results to retrieve", required: false}
    }
  end

  def initialize(query, options={})
    # The default initializer sets @query and @options
    super
    @access_token = Rails.configuration.try(:simple_search_token)
  end

  def run
    if @access_token.blank?
      Rails.logger.error "Unable to run Fake Search. Please define an access key as simple_search_token in the Scumblr initializer."
      return
    end

    # RUN SEARCH HERE

    # Fake results
    results = [
      {title: "Netflix", url: "http://www.netflix.com", domain: "www.netflix.com", metadata: {priority: 1}},
      {title: "Netflix Blog", url: "http://blog.netflix.com/page1", domain: "blog.netflix.com", metadata: {priority: 2}}
    ]

    return results
  end
end
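
To experiment with the shape of a provider outside a running Scumblr instance, the following self-contained sketch may help. It defines a minimal stand-in for SearchProvider::Provider that mirrors only the default initializer behavior described above, so it runs without Rails. The stand-in base class, the EchoSearch provider, and its canned results are assumptions for illustration only. Inside Scumblr you would subclass the real base class instead of redefining it.

```ruby
# Stand-in for Scumblr's base class so this sketch runs without Rails.
module SearchProvider
  class Provider
    def initialize(query, options = {})
      @query = query
      @options = options
    end
  end

  # Hypothetical provider that returns canned results instead of
  # calling an external API.
  class EchoSearch < Provider
    def self.provider_name
      "Echo Search"
    end

    def self.options
      {
        :max_results => { name: "Max Results", description: "The maximum number of results to retrieve", required: false }
      }
    end

    def run
      # Fall back to 2 results when the option was not supplied.
      limit = (@options[:max_results] || 2).to_i
      candidates = [
        { title: "Result for #{@query}", url: "http://www.example.com/1", domain: "www.example.com", metadata: { priority: 1 } },
        { title: "Another result for #{@query}", url: "http://www.example.com/2", domain: "www.example.com", metadata: { priority: 2 } }
      ]
      candidates.first(limit)
    end
  end
end

provider = SearchProvider::EchoSearch.new("netflix scumblr", { max_results: 1 })
found = provider.run

puts SearchProvider::EchoSearch.provider_name
puts found.length
puts found.first[:title]
```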