Decoupled the downloader, data pipeline, and parser
zlzforever committed Sep 19, 2018
1 parent 0d27df1 commit e17955d
Showing 232 changed files with 1,782 additions and 2,578 deletions.
21 changes: 21 additions & 0 deletions Design.zh-CN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# DESIGN

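Before this major update, the downloader, selector, and data management were coupled together. After much thought I decided to decouple them so that users can freely choose the components they prefer. For example, the downloader can be the framework's built-in Downloader, WebClientApi, HttpHelper by 苏菲 (Sufei), and so on; the parser can be the framework's built-in Extraction, AngleSharp, and so on.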

### Downloader

Downloader is an independent module that helps users download data from a target website. Some details are worth noting:

1. There are two ways to set cookies. One is to call the AddCookie method on the downloader, which adds the cookie to the CookieContainer and therefore affects every request.
The other is to set a Cookie header on the request; the result combines your Cookie header with the cookies in the CookieContainer.
2. The CookieInjector in a downloader is invoked once and injects cookies into the CookieContainer.
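
The cookie rules above can be illustrated with a minimal sketch. It is not the framework's code: `CookieMergeSketch` and `BuildCookieHeader` are hypothetical names, and the merge order (request header first, container cookies second) is an assumption. Only `System.Net.CookieContainer` and its `Add` and `GetCookieHeader` methods are real .NET APIs.

```csharp
using System;
using System.Net;

public static class CookieMergeSketch
{
    // Hypothetical helper: combines a per-request Cookie header with the
    // cookies already stored in the container for the target URI.
    // The ordering of the two parts is an assumption made for illustration.
    public static string BuildCookieHeader(CookieContainer container, Uri uri, string requestCookieHeader)
    {
        string fromContainer = container.GetCookieHeader(uri);
        if (string.IsNullOrEmpty(requestCookieHeader)) return fromContainer;
        if (string.IsNullOrEmpty(fromContainer)) return requestCookieHeader;
        return requestCookieHeader + "; " + fromContainer;
    }

    public static void Main()
    {
        var uri = new Uri("http://example.com/");

        // Cookies in the container apply to every request sent to this site.
        var container = new CookieContainer();
        container.Add(uri, new Cookie("session", "abc"));

        // A request that also carries its own Cookie header sees both sets.
        Console.WriteLine(BuildCookieHeader(container, uri, "lang=en"));

        // A request without its own header sees only the container cookies.
        Console.WriteLine(BuildCookieHeader(container, uri, null));
    }
}
```

The container is the shared state here, so anything that writes to it (such as an injector that runs once at startup) changes what every later request sends.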

### Scheduler

#### Request hash

1. Requests with the same URL but different headers are different requests, so headers are a factor in the hash.
2. A request has a CycleRetryTimes property; if its values differ, the requests are different. The Depth property is not
a factor.
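
A minimal sketch of a request hash that follows these rules is shown below. It is an illustration under stated assumptions, not the scheduler's actual implementation: `SketchRequest`, its `Url` and `Headers` members, the canonical string layout, and the choice of SHA-256 are all hypothetical. Only the property names `CycleRetryTimes` and `Depth` come from the notes above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical request type used only for this sketch.
public class SketchRequest
{
    public string Url { get; set; }
    public Dictionary<string, string> Headers { get; } = new Dictionary<string, string>();
    public int CycleRetryTimes { get; set; }
    public int Depth { get; set; }

    // Hashes the URL, the headers (sorted by name so insertion order does
    // not matter), and CycleRetryTimes. Depth is deliberately left out.
    public string ComputeHash()
    {
        var sb = new StringBuilder();
        sb.Append(Url).Append('\n');
        foreach (var header in Headers.OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            sb.Append(header.Key).Append(':').Append(header.Value).Append('\n');
        }
        sb.Append(CycleRetryTimes);

        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}

public static class RequestHashDemo
{
    public static void Main()
    {
        var a = new SketchRequest { Url = "http://example.com/", Depth = 1 };
        var b = new SketchRequest { Url = "http://example.com/", Depth = 5 };
        Console.WriteLine(a.ComputeHash() == b.ComputeHash()); // True: Depth is ignored

        b.Headers["User-Agent"] = "sketch";
        Console.WriteLine(a.ComputeHash() == b.ComputeHash()); // False: headers differ
    }
}
```

Leaving Depth out means the same page reached through links at different depths is treated as a duplicate, while a change to CycleRetryTimes produces a new hash so a retried request is not filtered out.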


11 changes: 3 additions & 8 deletions DotnetSpider.sln
@@ -3,8 +3,6 @@ Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.27703.2042
MinimumVisualStudioVersion = 10.0.40219.1
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "DotnetSpider.Common", "src\DotnetSpider.Common\DotnetSpider.Common.csproj", "{F1C6C272-A72A-4A5B-95EE-846643A29A3A}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "DotnetSpider.Extraction", "src\DotnetSpider.Extraction\DotnetSpider.Extraction.csproj", "{C5A68E4D-E9B4-4B2D-B198-74FA88C8CA22}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "DotnetSpider.HtmlAgilityPack.Css", "src\DotnetSpider.HtmlAgilityPack.Css\DotnetSpider.HtmlAgilityPack.Css.csproj", "{38DFF949-761C-4DC1-ADC6-D3F535E84AEF}"
@@ -36,11 +34,12 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution
.editorconfig = .editorconfig
.gitignore = .gitignore
.travis.yml = .travis.yml
Design.md = Design.md
Design.zh-CN.md = Design.zh-CN.md
DistributeDesign.md = DistributeDesign.md
publishToNuget.bat = publishToNuget.bat
README.md = README.md
runtests.sh = runtests.sh
Design.md = Design.md
EndProjectSection
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "DotnetSpider.Worker", "src\DotnetSpider.Worker\DotnetSpider.Worker.csproj", "{C416B779-5018-42AF-A1A5-98186389CCED}"
@@ -49,18 +48,14 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "DotnetSpider.Migrator", "sr
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "DotnetSpider.Broker.Test", "src\DotnetSpider.Broker.Test\DotnetSpider.Broker.Test.csproj", "{6CAEECB0-0BD0-4A32-B057-99C7DADE3F4C}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "DotnetSpider.Broker", "src\DotnetSpider.Broker\DotnetSpider.Broker.csproj", "{AAD552D8-0D0A-43B0-9C5D-E542AA8998CE}"
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "DotnetSpider.Broker", "src\DotnetSpider.Broker\DotnetSpider.Broker.csproj", "{AAD552D8-0D0A-43B0-9C5D-E542AA8998CE}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{F1C6C272-A72A-4A5B-95EE-846643A29A3A}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{F1C6C272-A72A-4A5B-95EE-846643A29A3A}.Debug|Any CPU.Build.0 = Debug|Any CPU
{F1C6C272-A72A-4A5B-95EE-846643A29A3A}.Release|Any CPU.ActiveCfg = Release|Any CPU
{F1C6C272-A72A-4A5B-95EE-846643A29A3A}.Release|Any CPU.Build.0 = Release|Any CPU
{C5A68E4D-E9B4-4B2D-B198-74FA88C8CA22}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{C5A68E4D-E9B4-4B2D-B198-74FA88C8CA22}.Debug|Any CPU.Build.0 = Debug|Any CPU
{C5A68E4D-E9B4-4B2D-B198-74FA88C8CA22}.Release|Any CPU.ActiveCfg = Release|Any CPU
5 changes: 0 additions & 5 deletions src/DotnetSpider.Broker.Test/BaseTest.cs
@@ -1,15 +1,10 @@
using DotnetSpider.Broker.Data;
using DotnetSpider.Broker.Hubs;
using Microsoft.AspNetCore.Builder.Internal;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Text;
using DotnetSpider.Broker.Services;

namespace DotnetSpider.Broker.Test
3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker.Test/WorkerServiceTest.cs
@@ -1,10 +1,7 @@
using DotnetSpider.Broker.Data;
using DotnetSpider.Broker.Hubs;
using DotnetSpider.Broker.Services;
using Microsoft.Extensions.DependencyInjection;
using System;
using System.Collections.Generic;
using System.Text;
using Xunit;

namespace DotnetSpider.Broker.Test
3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker/ApiAuthorizeMiddleware.cs
@@ -1,7 +1,4 @@
using Microsoft.AspNetCore.Http;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

5 changes: 1 addition & 4 deletions src/DotnetSpider.Broker/BrokerOptions.cs
@@ -1,7 +1,4 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Collections.Generic;

namespace DotnetSpider.Broker
{
4 changes: 0 additions & 4 deletions src/DotnetSpider.Broker/Controllers/BrokerController.cs
@@ -1,9 +1,5 @@
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Logging;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Controllers
{
6 changes: 1 addition & 5 deletions src/DotnetSpider.Broker/Controllers/HomeController.cs
@@ -1,8 +1,4 @@
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using System.Diagnostics;
using Microsoft.AspNetCore.Mvc;
using DotnetSpider.Broker.Models;

3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker/Data/Block.cs
@@ -1,7 +1,4 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Data
{
5 changes: 1 addition & 4 deletions src/DotnetSpider.Broker/Data/BrokerDbContext.cs
@@ -1,7 +1,4 @@
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.AspNetCore.Identity.EntityFrameworkCore;
using Microsoft.AspNetCore.Identity.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore;

namespace DotnetSpider.Broker.Data
2 changes: 0 additions & 2 deletions src/DotnetSpider.Broker/Data/Entity.cs
@@ -1,8 +1,6 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Data
{
2 changes: 0 additions & 2 deletions src/DotnetSpider.Broker/Data/Job.cs
@@ -1,8 +1,6 @@
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Data
{
3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker/Data/JobProperty.cs
@@ -1,8 +1,5 @@
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Data
{
39 changes: 35 additions & 4 deletions src/DotnetSpider.Broker/Data/JobStatus.cs
@@ -1,11 +1,42 @@
using DotnetSpider.Common;
using Newtonsoft.Json;
using Newtonsoft.Json.Converters;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Data
{
/// <summary>
/// Spider status
/// </summary>
[System.Flags]
[JsonConverter(typeof(StringEnumConverter))]
public enum Status
{
/// <summary>
/// Initializing
/// </summary>
Init = 1,

/// <summary>
/// Running
/// </summary>
Running = 2,

/// <summary>
/// Paused
/// </summary>
Paused = 4,

/// <summary>
/// Finished
/// </summary>
Finished = 8,

/// <summary>
/// Exited
/// </summary>
Exited = 16
}

public class JobStatus : Entity<Guid>, IHasModificationTime
{
public virtual Guid Identity { get; set; }
3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker/Data/NodeStatus.cs
@@ -1,8 +1,5 @@
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Data
{
3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker/Data/Running.cs
@@ -1,7 +1,4 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Data
{
6 changes: 1 addition & 5 deletions src/DotnetSpider.Broker/Data/Worker.cs
@@ -1,8 +1,4 @@
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Threading.Tasks;
using System.ComponentModel.DataAnnotations;

namespace DotnetSpider.Broker.Data
{
1 change: 0 additions & 1 deletion src/DotnetSpider.Broker/DotnetSpider.Broker.csproj
@@ -17,7 +17,6 @@
</ItemGroup>

<ItemGroup>
<ProjectReference Include="..\DotnetSpider.Common\DotnetSpider.Common.csproj" />
<ProjectReference Include="..\DotnetSpider.Downloader\DotnetSpider.Downloader.csproj" />
</ItemGroup>

6 changes: 1 addition & 5 deletions src/DotnetSpider.Broker/Dtos/AddNodeStatusDto.cs
@@ -1,8 +1,4 @@
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Threading.Tasks;
using System.ComponentModel.DataAnnotations;

namespace DotnetSpider.Broker.Dtos
{
6 changes: 1 addition & 5 deletions src/DotnetSpider.Broker/Dtos/BlockDto.cs
@@ -1,9 +1,5 @@
using DotnetSpider.Common;
using DotnetSpider.Downloader;
using System;
using DotnetSpider.Downloader;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Dtos
{
4 changes: 0 additions & 4 deletions src/DotnetSpider.Broker/HttpGlobalExceptionFilter.cs
@@ -1,10 +1,6 @@
using Microsoft.AspNetCore.Mvc.Filters;
using Microsoft.Extensions.Logging;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

namespace DotnetSpider.Broker
{
3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker/Hubs/NodeHub.cs
@@ -2,10 +2,7 @@
using DotnetSpider.Broker.Dtos;
using DotnetSpider.Broker.Services;
using Microsoft.AspNetCore.SignalR;
using Microsoft.EntityFrameworkCore;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Hubs
6 changes: 1 addition & 5 deletions src/DotnetSpider.Broker/Hubs/WorkerHub.cs
@@ -1,10 +1,6 @@
using DotnetSpider.Broker.Data;
using DotnetSpider.Broker.Services;
using DotnetSpider.Broker.Services;
using Microsoft.AspNetCore.SignalR;
using Microsoft.EntityFrameworkCore;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Hubs
2 changes: 0 additions & 2 deletions src/DotnetSpider.Broker/Models/ErrorViewModel.cs
@@ -1,5 +1,3 @@
using System;

namespace DotnetSpider.Broker.Models
{
public class ErrorViewModel
4 changes: 0 additions & 4 deletions src/DotnetSpider.Broker/Program.cs
@@ -1,12 +1,8 @@
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using Serilog;
using Serilog.Events;

4 changes: 0 additions & 4 deletions src/DotnetSpider.Broker/ServiceCollectionExtensions.cs
@@ -1,9 +1,5 @@
using DotnetSpider.Broker.Services;
using Microsoft.Extensions.DependencyInjection;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker
{
5 changes: 1 addition & 4 deletions src/DotnetSpider.Broker/Services/INodeService.cs
@@ -1,7 +1,4 @@
using DotnetSpider.Broker.Data;
using System;
using System.Collections.Generic;
using System.Linq;
using System;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Services
3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker/Services/INodeStatusService.cs
@@ -1,7 +1,4 @@
using DotnetSpider.Broker.Data;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Services
5 changes: 1 addition & 4 deletions src/DotnetSpider.Broker/Services/IWorkerService.cs
@@ -1,7 +1,4 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Services
{
2 changes: 0 additions & 2 deletions src/DotnetSpider.Broker/Services/NodeService.cs
@@ -1,8 +1,6 @@
using DotnetSpider.Broker.Data;
using Microsoft.EntityFrameworkCore;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Services
5 changes: 1 addition & 4 deletions src/DotnetSpider.Broker/Services/NodeStatusService.cs
@@ -1,7 +1,4 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks;
using DotnetSpider.Broker.Data;

namespace DotnetSpider.Broker.Services
3 changes: 0 additions & 3 deletions src/DotnetSpider.Broker/Services/WorkerService.cs
@@ -1,8 +1,5 @@
using DotnetSpider.Broker.Data;
using Microsoft.EntityFrameworkCore;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace DotnetSpider.Broker.Services