Home
This section will walk through a basic setup for Scumblr on a base Ubuntu 14.04 system. This guide assumes you have an Ubuntu system set up and ready to go.
From the command line:
sudo apt-get update
sudo apt-get -y install git libxslt-dev libxml2-dev build-essential bison openssl zlib1g libxslt1.1 libssl-dev libxslt1-dev libxml2 libffi-dev libxslt-dev autoconf libc6-dev libreadline6-dev zlib1g-dev libtool libsqlite3-dev libcurl3 libmagickcore-dev ruby-build libmagickwand-dev imagemagick bundler
From the command line:
cd ~
git clone https://github.com/sstephenson/rbenv.git .rbenv
echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(rbenv init -)"' >> ~/.bashrc
exec $SHELL
git clone https://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
echo 'export PATH="$HOME/.rbenv/plugins/ruby-build/bin:$PATH"' >> ~/.bashrc
exec $SHELL
rbenv install 2.0.0-p481
rbenv global 2.0.0-p481
ruby -v
From the command line:
gem install bundler --no-ri --no-rdoc
rbenv rehash
gem install rails -v 4.0.9
sudo apt-get install redis-server
gem install sidekiq
rbenv rehash
From the command line:
git clone https://github.com/Netflix/Scumblr.git
cd Scumblr
bundle install
rake db:create
rake db:schema:load
From the command line from the Scumblr root folder:
../.rbenv/versions/2.0.0-p481/bin/rails c
In the console:
user = User.new
user.email = "<Valid email address>"
user.password = "<Password>"
user.password_confirmation = "<Password>"
user.admin = true
user.save
From the command line from the Scumblr root folder:
redis-server &
../.rbenv/shims/bundle exec sidekiq -l log/sidekiq.log &
../.rbenv/shims/bundle exec rails s &
Now connect to your server on port 3000
This section will discuss additional items that should be configured before using Scumblr in production.
Scumblr integrates with other services and APIs in order to find results and generate screenshots. Locations and API keys should be placed in config/initializers/scumblr.rb. A sample of this file is located at config/initializers/scumblr.rb.sample. In this file you can set:
- The URL where Sketchy can be accessed (if using Sketchy to generate screenshots)
- Keys, Secrets, and IDs for API authentication/authorization (Google, Apple Store, eBay, Twitter, etc.)
Examples for each configuration option for built-in search providers are located in the config/initializers/scumblr.rb.sample file. Simply rename this file to scumblr.rb and add the appropriate keys/values.
First you should generate secrets for the Rails Application and Devise:
rake secret
Run this command twice. Put one secret on line 7 of config/initializers/secret_token.rb. Put the other on line 7 of config/initializers/devise.rb
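For a sense of what the two values look like: in Rails 4, rake secret is understood to print a random 128-character hexadecimal string. The plain-Ruby sketch below generates two comparable values with SecureRandom. The variable names are illustrative only, and rake secret remains the documented way to produce the secrets.

```ruby
require "securerandom"

# Generate two independent 64-byte secrets, hex-encoded (128 characters each).
# Variable names are hypothetical: one value is meant for secret_token.rb,
# the other for devise.rb.
rails_secret  = SecureRandom.hex(64)
devise_secret = SecureRandom.hex(64)

puts "Rails secret:  #{rails_secret}"
puts "Devise secret: #{devise_secret}"
```

Use a different value in each file rather than reusing one secret for both.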
If you plan to use email notifications, you should ensure the default URL options are set correctly. This can be done in the config/environments/* files. Placeholders are located at the end of the production.rb and test.rb files. Example:
Rails.application.routes.default_url_options[:host] = "scumblr.com"
Rails.application.routes.default_url_options[:protocol] = "https"
In order to allow Scumblr to automatically run searches and send email notifications, you may want to set up cron jobs using the appropriate rake tasks:
rake sync_all
This task will run all the searches and import any new results. It will also generate screenshots of each result, if the integration with Sketchy is configured.
There are other items that should be considered before deploying Scumblr in a "production" environment. These include:
- Choosing an appropriate web server to front the Rails application (We use Unicorn/Nginx)
- Adjusting redis configuration to meet your needs
- Reviewing and adjusting the Devise configuration if needed
- Standard hardening for the Ubuntu host
You may also want to review the app/models/ability.rb file. This file specifies the authorization roles in place. At Netflix, we use a simple admin/normal user scheme. This may not be appropriate for all use cases.
This section will discuss basic use of Scumblr. We will assume that you've gotten Scumblr up and running, including Redis and Sidekiq, and have left the default search providers in place.
Searches are tasks that run in order to look for results to import into Scumblr. Searches rely on a Search Provider--a plugin module that knows how to take a set of options and find and return results. Scumblr includes a number of search providers by default. These include:
- YouTube
- Apple AppStore
- Google Play Store
- eBay
For now we'll stick with using the built-in search providers, but adding a new search provider is relatively straightforward and will be discussed later. We will assume you have generated the appropriate API keys and added them to the configuration file as discussed in "Scumblr Configuration" above.
In order to create a new search, you will need to be logged in with an account with admin rights. If you'd like normal users to have the ability to create a new search, you can make the appropriate modifications to the /app/models/ability.rb file.
From any page in Scumblr:
- On the top menu click "Searches"
- Click "New Search"
- Give your search an identifiable name
- Add a query string. We're going to use "netflix scumblr"
- Select a Search Provider. We're going to use Google
- If additional options are available, they will appear inline. We'll leave the additional options blank for now.
- Add any tags you would like to be automatically applied to these results. We'll add one tag, "Scumblr"
- If you'd like, add a verbose description to the search
- When done click "Create Search"
In order to get results, the search needs to be run. This can be done in a number of ways:
- An individual search can be run through the web interface
- All searches can be run at the same time through the web interface
- All searches can be run at the same time using a rake task
We'll discuss each of these methods in this section.
- From anywhere in Scumblr, click "Searches" on the top menu
- Click on the Name of the Search you'd like to run
- Click "Run Now"
Your search should run and, once complete, you should see results on the results page (if any were found!)
- From anywhere in Scumblr, click "Searches" on the top menu
- Click "Run All Searches"
All the searches you have configured should run. Once complete you should see identified results on the results page.
From the command line at the Scumblr root path, run:
rake sync_all
This will run all the searches you have configured. Once complete you should see results on the results page.
Results are the core model in Scumblr. A result represents a URL that has been entered manually or imported using a Search Provider. This section will discuss how to view, inspect, and action results.
The result list is the main page for the application. This page shows a summary of all the results that have been identified and also allows searching/filtering, sorting, viewing basic details, and taking basic actions on the results.
The Result List page is the first page you'll arrive at after logging in. It can also be reached by clicking "Results" on the top menu.
There are two main sections on the Result List page: the results list, and the filter/action panel.
The results list is the main part of this screen and consists of the table on the left side of the page. In this table, each row represents a result. From here you can view the title of the result, the status (if one is given/available), the domain, when the result was first identified, and when it was last seen in a search.
There is also a link that will take you to the URL represented by the result. Note: This will open a new tab and take you outside of Scumblr. Always be careful when visiting random sites on the Internet.
On the right side of the result list is a "Show" button. Clicking this button will take you to the detail page for this result. This page will be discussed in the "Viewing Results" section below.
If a result has screenshots attached (either manually added or synced with Sketchy), a small monitor icon will appear to the left of the result's title. Hovering over this icon will show a thumbnail image of the first screenshot. Clicking the icon will allow seeing a larger gallery view of all the screenshots attached to the result.
On the right side of the results list page is the filter/action panel. This section allows performing granular filtering for specific results.
From here it is possible to search based on:
- URL
- Title
- Tags
- Assignee
- Status
- Search
- Workflow Flag
- Workflow Stage
It is also possible to indicate whether "Closed" results should be included in the list. A closed result is a result whose status has been marked as a closed status. (More about Statuses in the relevant section below.) In order to perform a search, fill out the fields you're interested in and click "Search". Multiple filter attributes can be used and will be treated as "and" conditions. For example, if you search for "facebook" in the URL field and "Investigating" in the Status field, you will get all the results with facebook in the URL that are also currently in the "Investigating" status.
If multiple entries are searched in the multi-search boxes (Tags, Assignee, Status, Search, Workflow Flag, and Workflow Stage), these will be treated as an "or" condition. For example if you search for "John" and "Cindy" in the Assignee field (assuming these were users of the system), you will get a list of all results assigned to either John OR Cindy.
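The "and"/"or" behavior described above can be sketched in plain Ruby. This is only an illustration of the semantics, not Scumblr's implementation: the sample data, the field names, and the filter_results helper are all hypothetical.

```ruby
# Hypothetical in-memory results; field names are illustrative only.
results = [
  { url: "http://facebook.com/a", status: "Investigating", assignee: "John" },
  { url: "http://facebook.com/b", status: "New",           assignee: "Cindy" },
  { url: "http://example.com/c",  status: "Investigating", assignee: "Cindy" },
  { url: "http://example.com/d",  status: "New",           assignee: "Alice" }
]

# Text fields match on substring. Multi-select fields match if ANY selected
# value equals the result's value ("or"). Every supplied field must match ("and").
def filter_results(results, url: nil, status: [], assignee: [])
  results.select do |r|
    (url.nil? || r[:url].include?(url)) &&
      (status.empty? || status.include?(r[:status])) &&
      (assignee.empty? || assignee.include?(r[:assignee]))
  end
end

# "facebook" in URL AND status "Investigating"
and_match = filter_results(results, url: "facebook", status: ["Investigating"])
# assignee is "John" OR "Cindy"
or_match  = filter_results(results, assignee: ["John", "Cindy"])

puts and_match.map { |r| r[:url] }.inspect
puts or_match.map { |r| r[:url] }.inspect
```

Leaving a field empty in this sketch means "do not filter on this field", which mirrors filling out only the fields you are interested in.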
Important: Filters will persist between requests. In other words, if you navigate away from the results list (into a result's detail page, for example), your filter will remain intact when you return to the results list. If you want to remove your filter and see all results, click "Clear Search". When results are filtered, this will be indicated in the result count displayed at the top of the result list table. For example, the result count may indicate "Displaying 1 result (1000 results filtered)". This would mean that 1 result meets your search criteria while another 1000 have been filtered from view.
The action panel allows performing certain actions on one or more results. The action panel appears at the right side of the screen, but is not visible until one or more results are selected with the checkboxes on the left side of the result. Actions that can be taken with the action panel include:
- Changing the Status
- Adding Tags
- Setting the Assignee
- Generating Screenshot (if Sketchy is enabled)
To use the action panel, first select one or more results. You can also use the checkbox in the header of the results list. This will select all results on the current page of results. If you'd like you can select all results that meet the current filter (on all pages). This is done by selecting the checkbox in the header of the results list and then clicking "Select all n results that match this filter."
Once the appropriate results are selected, simply select the options on the action panel that you'd like to change. You can perform multiple changes at one time (changing status, adding tags, setting assignee, generating screenshots). You can also add multiple tags at once by adding multiple tags to the tag field.
To generate screenshots, use the right side of the update button (area with the arrow) and select either "Update and Generate Screenshot" or "Update and Force Generate Screenshot". "Force Generate" will add a new screenshot for all selected results, even if a screenshot already exists. "Generate" will only add a new screenshot to selected results without an existing attachment.
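The difference between the two options can be sketched as a simple selection rule. This is an illustration only: the data shape and the needs_screenshot helper are hypothetical and do not reflect how Scumblr or Sketchy implement the feature.

```ruby
# Hypothetical selected results; :screenshots counts existing attachments.
selected = [
  { title: "Result A", screenshots: 0 },
  { title: "Result B", screenshots: 2 }
]

# Returns the results that would receive a new screenshot.
# force: false models "Generate"; force: true models "Force Generate".
def needs_screenshot(results, force: false)
  results.select { |r| force || r[:screenshots].zero? }
end

generate       = needs_screenshot(selected).map { |r| r[:title] }
force_generate = needs_screenshot(selected, force: true).map { |r| r[:title] }

puts generate.inspect
puts force_generate.inspect
```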
Results can be manually created by clicking the "New Result" button on the bottom of the Result List page. To create a result you'll need to provide a name/title for the result, as well as a URL.
The result details page contains additional information and actions that can be taken on individual results. From here one can:
- View the details of the results
- View/manually add screenshots
- Change the status
- Add comments
- Change the assignee
- Subscribe to updates
- Add/Remove tags
- Add/Update Workflows
This section will walk through using the results detail page.
At the top of the result view is the status bar. This indicates the current status of the result. A new status can be selected by clicking on the desired status. This is meant for a simple, high-level status that can be used to filter/search for results and indicate where in the process the result currently resides.
Below the status bar are additional details about the results. These include the title, when the result was first identified, and a link to the original result page.
The searches section lists all the searches which have found the result page. If more than one search identifies the same URL, they will all be listed here. The information listed here includes the name of the search, the providers, the query used, and when the search first identified the result.
The attachments section shows any files attached to the result and allows triggering screenshot generation (through Sketchy if available) or manually uploading a file. New screenshots can be uploaded by rolling over the large "+" placeholder and clicking "Upload". A screenshot can be requested from Sketchy by rolling over the large "+" placeholder and clicking "Generate".
Rolling over an existing screenshot will provide options to "View" the result full size or "Delete" the screenshot.
At the bottom of the result page is the comments section. Comments can be added using the comment form at the top, or existing comments can be replied to by hitting the "Reply" button under the comment. The small -/+ to the left of the commenter's name allows collapsing/expanding the comment thread.
On the right side of the page, the result's assignee can be viewed/set. To change the assignee click the pencil icon and select a new user.
Note: a user must exist on the system to be used as an assignee.
If Scumblr is configured to send email messages, it is possible to subscribe to a result to receive an email when updates occur. To do this, click the "Subscribe" link. Once subscribed you can unsubscribe by clicking "Unsubscribe".
The number to the right of the "Subscribe/Unsubscribe" link indicates how many users are currently subscribed. Clicking this number will show a list of subscribed users and allow adding a new subscriber (including someone besides yourself) or removing existing subscribers.
If the result has had tags applied, they will appear in the tags section. Tags can be removed by clicking the "X" inside the tag. New tags can be added by clicking the "+" and typing/selecting the tag you'd like to apply.
The workflow flags section of the page shows any workflows that have been added to the result. Results can be flagged for multiple workflows ("Investigate" and "Takedown" for example), however they can only utilize a single copy of an individual workflow (You cannot apply two "Investigate" workflows, for example). All workflows that have been applied to the current result will be listed and the current stage will be shown in the drop down.
To add a new workflow flag to the result, click the "+", choose a workflow flag, and click submit. If any options are required to add the workflow, a form will appear for you to fill out. Click "Add workflow" when complete.
To change the stage of an existing workflow, click the associated dropdown and select the new stage. If any options are required to add the workflow, a form will appear for you to fill out. Click "Add workflow" when complete.
Saved filters allow saving a set of criteria used to filter results so it can easily be accessed and shared. Additionally, saved filters can have a list of subscribers, who can receive email updates when new matching results are identified.
Saved Filters are created from the results list page. To create a saved filter, perform a search as normal. You can click "Search" to preview the results if you'd like, but this is not necessary. When you're ready to save your filter, click the save button. This will allow you to fill in additional options about the search, including a name which will be used to refer to the saved filter, a list of subscribers to notify when new matching results are identified, and whether you want to share the filter with other users of the application. When you're happy with your saved filter, click "Create Saved Filter".
Once a saved filter has been saved, it can be accessed from the Saved Filters menu at the top of the page. Clicking on the filter name will take you to the current results for that saved filter.
To modify an existing filter, click "Manage" in the Saved Filters menu. Saved Filters can be modified by clicking the "Edit" button or deleted by clicking "Delete".
Public filters created by other users can be added to your Saved Filters menu. To do this, first click "Manage" under the "Saved Filters" menu at the top of the page. At the bottom of this page will be a list of public filters, if any exist. Clicking "Add" next to any of these filters will add it to your list (under Saved Filters, Public Filters).
Statuses allow a flexible way of tracking the high-level state of a given result. Statuses can be created/edited from the "Admin>Statuses" menu at the top of the page. Creating/editing statuses requires admin privileges on the system.
Statuses have three fields:
- Name
- Closed
- Invalid
The Closed flag indicates that results in this status should be considered closed. Results that have been moved into a closed status will be excluded from the results list by default.
The Invalid flag indicates the result was invalid--meaning it's not something you were looking for from the given search. For example, you may have a "False Positive" status. The invalid flag allows tracking which searches are producing high numbers of results that you're not actually interested in.
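As a rough illustration of how the two flags can be used, the plain-Ruby sketch below hides closed results from a default list and counts invalid results per search. The status names, the flag values chosen for each status (for example, treating "False Positive" as both closed and invalid), and the sample results are all assumptions made for the example.

```ruby
# Hypothetical statuses with the Closed and Invalid flags described above.
statuses = {
  "New"            => { closed: false, invalid: false },
  "Investigating"  => { closed: false, invalid: false },
  "Resolved"       => { closed: true,  invalid: false },
  "False Positive" => { closed: true,  invalid: true }
}

# Hypothetical results, each tied to a status and the search that found it.
results = [
  { title: "A", status: "New",            search: "brand search" },
  { title: "B", status: "False Positive", search: "brand search" },
  { title: "C", status: "Resolved",       search: "app search" },
  { title: "D", status: "False Positive", search: "brand search" }
]

# Default results list: drop anything whose status is flagged closed.
open_results = results.reject { |r| statuses[r[:status]][:closed] }

# Count invalid results per search to spot searches that produce noise.
invalid_per_search = Hash.new(0)
results.each do |r|
  invalid_per_search[r[:search]] += 1 if statuses[r[:status]][:invalid]
end

puts open_results.map { |r| r[:title] }.inspect
puts invalid_per_search.inspect
```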
Scumblr ships with a simple dashboard which is available from the top menu. This page shows the following information:
- Time-series chart showing number of results identified per page
- A breakdown of results by status
- A breakdown of the number of results assigned to a workflow
- For each search:
  - The total number of results found
  - The number of results found in the last 24 hours
  - The number of results found in the last 7 days
  - A 30 day trend line
  - The number of results assigned to each workflow and/or to no workflow
Scumblr can be extended in a variety of ways including adding new workflows and new search providers. These capabilities will be discussed in this section.
Scumblr uses the Workflowable gem (www.github.com/netflix/workflowable) to allow defining flexible workflows for actioning results. This section will give a brief overview of workflows. For more information about setting up and defining workflows, see the Workflowable wiki.
In Scumblr, results can be assigned one or more workflows. This is done by adding a workflow flag from the Result view page (as discussed in the Workflow Flags section above). Once a workflow flag has been added, the result can be moved through the various phases from the result view page.
This section will give a brief overview of the concepts used by the Workflowable gem.
Workflow: A process, generally with multiple, ordered stages
Stage: A state within the process
Initial stage: The state the workflow starts in
Action: A function that gets run as part of adding a workflow to an item or moving between stages
Before action: An action that gets run when transitioning into a specific stage
After action: An action that gets run when transitioning out of a specific stage
Global action: An action that gets run when moving between any stages, including moving into the initial stage (i.e. when a result is first flagged)
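To make these terms concrete, the sketch below models a workflow as a plain Ruby hash and lists which actions a given transition would trigger. It does not use the Workflowable API: the stage names, action names, and the actions_for helper are hypothetical, and the sketch makes no claim about the order in which Workflowable actually runs the actions.

```ruby
# Hypothetical workflow definition; stage and action names are made up.
workflow = {
  initial_stage: "New",
  global_actions: ["notify_subscribers"],
  stages: {
    "New"           => { before: ["record_flagged_at"], after: [] },
    "Investigating" => { before: ["assign_analyst"],    after: ["archive_notes"] },
    "Closed"        => { before: ["send_summary"],      after: [] }
  }
}

# Lists the actions triggered by a transition, grouped by type.
# from is nil when the workflow is first added to a result.
def actions_for(workflow, from, to)
  {
    after:  from ? workflow[:stages][from][:after] : [],  # leaving the old stage
    global: workflow[:global_actions],                    # every transition
    before: workflow[:stages][to][:before]                # entering the new stage
  }
end

flagged = actions_for(workflow, nil, workflow[:initial_stage])
moved   = actions_for(workflow, "Investigating", "Closed")

puts flagged.inspect
puts moved.inspect
```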
In order to use workflows, we first need to create one. This can be done from the Workflowable admin page (available at /workflowable or from the Admin menu). Please see the workflowable wiki for detailed instructions on setting up a workflow.
Once the workflow has been set up from the Workflowable admin page, we also need to create a Workflow Flag that can be used in Scumblr. This can be done from the Flag admin page (in the admin menu).
Workflowable allows defining custom actions that can be run when flagging a result with a workflow and/or when moving between stages. These actions are developed as classes and need to conform to the API defined by workflowable (see the Workflowable wiki). Action classes should be stored in the lib/workflowable/actions folder.
You will need to have already created the workflow through the workflowable admin page. Once this is complete, go to the Flag admin page and click "New Flag". Here you can choose a name for the workflow (which will be used in Scumblr), a description for the workflow, and add any subscribers who, if Scumblr is set up to send email, will receive notifications when a result is flagged for this workflow. You will additionally need to choose the workflow to associate with the flag by choosing it from the dropdown box. This box is populated based on the workflows created in the workflowable admin page.
Once the flag has been created it is ready to be used in Scumblr. See the "Workflow Flags" section above for instructions on how to assign and use workflows from the result view page.
It is possible to define new search providers that are capable of searching for and identifying results. This is done by implementing a class that conforms to the search provider API and placing the provider file in lib/providers. All search providers should be a subclass of SearchProvider::Provider.
A search provider generally needs to implement 4 functions:
self.provider_name: This function allows Scumblr to retrieve and present a readable name for the provider
self.options: This function defines what options can/should be passed into the provider. These options are defined when creating the Search
initialize(query, options={}): This function is often overridden in order to retrieve an access token from the configuration or otherwise setup the provider
run: This function performs the search and returns the results
Parameters: None
Return Value: String
This simple function returns the provider name. It is called when Scumblr needs to get the human-readable name for the provider. For example, SearchProvider::AppStore.provider_name will return "Apple Store Search".
Parameters: None
Return Value: options (hash)
This function will return a hash that defines which options can/should be passed in when running a search. Options are defined when an individual search is created.
key: (symbol) A key used to identify the option value. Each option must have a unique key
value: (hash) This will contain information about the option:
value[:name] (string) The name of the option
value[:description] (string) A description of the option
value[:required] (boolean) Is the option required
{
:results=>{name: "Max results (200)", description: "Specify the number of results to retrieve", required: false},
:require_all_terms=>{name: "Require all terms", description: "If set to \"true\" will ensure all search terms contained in result", required: false}
}
If self.options returns this hash, the provider will have two optional options that can be defined when creating a search: "Max results (200)" and "Require all terms"
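One way to see how the required flag might be consumed is the plain-Ruby sketch below, which checks user-supplied values against an options hash of this shape. The missing_required helper and the :api_region option are hypothetical and are not part of Scumblr; only the hash layout follows the description above.

```ruby
# Option definitions in the shape returned by self.options. The first entry
# follows the example above; :api_region is a made-up required option.
definitions = {
  results:    { name: "Max results (200)", description: "Specify the number of results to retrieve", required: false },
  api_region: { name: "API region", description: "Hypothetical required option", required: true }
}

# Returns the display names of required options that were not supplied.
def missing_required(definitions, supplied)
  definitions
    .select { |key, opt| opt[:required] && supplied[key].to_s.strip.empty? }
    .map { |_key, opt| opt[:name] }
end

incomplete = missing_required(definitions, { results: "50" })
complete   = missing_required(definitions, { results: "50", api_region: "us" })

puts incomplete.inspect
puts complete.inspect
```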
Parameters: query (string), options (hash, optional)
Return Value: None
The default initializer will set the query to @query and the options to @options. These values (@query, @options) can be accessed from the run function. If the initializer is overridden, the overriding function can call "super" to set up these values, or the overriding function can set up the appropriate values for the run function itself.
This function is often used to pull API keys or other credentials from the config file. If using the convention used for the out of the box search providers, this data should be put in config/initializers/scumblr.rb and would be accessed using:
Rails.configuration.try(:<OPTION NAME>)
Parameters: None
Return Value: results (array of results hashes)
The run function should perform the search and return any identified results. If using the default initializer, the query to search will be stored in @query and the options in @options. Options can be accessed using the same key as that used in the options hash. So, for example, if the options hash defined an option ":results", this value would be retrieved using @options[:results]
title: (string) The title of the result
url: (string) The url for the result
domain: (string) The result's domain
metadata: (hash) An (optional) hash of additional metadata to store with the result
[
{title: "Netflix", url: "http://www.netflix.com", domain: "www.netflix.com", metadata: {priority: 1}},
{title: "Netflix Blog", url: "http://blog.netflix.com/page1", domain: "blog.netflix.com", metadata: {priority: 2}}
]
This example includes 2 results: one for the Netflix main site and one for the Netflix Blog. These results would be imported into Scumblr, if not previously identified.
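When building result hashes inside a provider, the domain can be derived from the URL rather than typed by hand. The sketch below shows one way to do that with Ruby's standard URI library. The build_result helper is hypothetical and is not part of the Scumblr provider API.

```ruby
require "uri"

# Builds a result hash in the shape described above, deriving :domain
# from the URL's host. build_result is a hypothetical helper.
def build_result(title, url, metadata = {})
  { title: title, url: url, domain: URI.parse(url).host, metadata: metadata }
end

results = [
  build_result("Netflix", "http://www.netflix.com", { priority: 1 }),
  build_result("Netflix Blog", "http://blog.netflix.com/page1", { priority: 2 })
]

results.each { |r| puts "#{r[:title]} -> #{r[:domain]}" }
```

Deriving the domain this way keeps the url and domain fields from drifting apart when a URL is edited.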
Below is a simple sample of a search provider class. This class will not perform a search, but should provide guidance on what a Search Provider class should look like:
class SearchProvider::FakeSearch < SearchProvider::Provider
  # Human-readable name shown when creating a search
  def self.provider_name
    "Fake Search"
  end

  # Options that can be set when creating a search
  def self.options
    {
      :max_result=>{name: "Max Results", description: "The maximum number of results to retrieve", required: false},
    }
  end

  def initialize(query, options={})
    super  # sets @query and @options
    @access_token = Rails.configuration.try(:simple_search_token)
  end

  def run
    if(@access_token.blank?)
      Rails.logger.error "Unable to run Simple Search. Please define an access key as simple_search_token in the Scumblr initializer."
      return
    end
    # RUN SEARCH HERE
    # Fake results
    results = [
      {title: "Netflix", url: "http://www.netflix.com", domain: "www.netflix.com", metadata: {priority: 1}},
      {title: "Netflix Blog", url: "http://blog.netflix.com/page1", domain: "blog.netflix.com", metadata: {priority: 2}}
    ]
    return results
  end
end