
Commit 4db7a74

committed
2.0.0.0
Added Instagram downloading, filter by site, channels groups, change folder function, imgur compatibility, special folders, deleting with keeping data, Reddit saved posts downloading
Fixed limited twitter downloading, suspended profiles
Updated download algo
Concat sites editors into a single form
Updated Reddit downloading algo
Fixed saved function in video downloader
Some improvements
1 parent 5f2c447 commit 4db7a74


47 files changed

+4680
-2103
lines changed

Changelog.md

+17
@@ -1,3 +1,20 @@
+# 2.0.0.0
+
+- Added
+  - **Instagram**
+  - Filter by site
+  - Group for regular channels in the main window
+  - Ability to change user/collection path
+  - Imgur albums downloading
+  - NSFW Imgur content bypass (requires 'CilentID')
+  - Special user folder
+  - Remove user while keeping data
+  - Disabled overriding user preferences when creating a new user if it already exists in the destination (in case of deleting a user with saving data).
+  - **Saved Reddit posts downloading**
+- Fixed
+  - Suspended profiles do not change status if the profile is no longer suspended
+  - Limited download for Twitter not implemented
+
 # 1.0.1.0
 
 - Added

README.md

+24 -15
@@ -1,8 +1,8 @@
 # Social networks crawler
 
-Program for downloading photo and video from Reddit and Twitter
+Program for downloading photo and video from Reddit, Twitter and Instagram
 
-Enjoying the tool? Considering adding to my coffee fund :)
+Do you like this program? Consider adding to my coffee fund by making a donation to show your support. :)
 
 [![ko-fi](https://www.ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/andyprogram)
 
@@ -12,9 +12,10 @@ Enjoying the tool? Considering adding to my coffee fund :)
 - Reddit galleries of images;
 - Redgifs hosted videos (https://www.redgifs.com/);
 - Reddit hosted videos (downloading Reddit hosted video is going through ffmpeg);
-- Twitter images;
-- Twitter videos.
-- Parse channel and view data.
+- Twitter images and videos;
+- Instagram images and videos.
+- Parse [channel and view data](https://github.com/AAndyProgram/SCrawler/wiki/Channels).
+- Download [saved Reddit posts](https://github.com/AAndyProgram/SCrawler/wiki/Home#saved-posts).
 - Add users from parsed channel.
 - Labeling users.
 - Filter exists users by label or group.
@@ -24,41 +25,49 @@ Enjoying the tool? Considering adding to my coffee fund :)
 
 ## Reddit
 
-The program parsing all user's posts, gathering pictures' MD5 hash and compare with existing for remove duplicates. Then media will be downloaded.
+The program parses all user posts, obtain MD5 images hash and compares them with existing ones to remove duplicates. Then the media will be downloaded.
 
-## Twitter
+## Twitter and Instagram
 
-The program parsing all user's posts and compare file names with existing for remove duplicates. Then media will be downloaded.
+The program parses all user posts and compares file names with existing ones to remove duplicates. Then the media will be downloaded.
+
+You can read about Instagram restrictions [here](https://github.com/AAndyProgram/SCrawler/wiki/Settings#instagram-limits)
 
 # Requirements:
 
 - Windows 7, 8, 9, 10, 11 with NET Framework 4.6.1 or higher
 - Authorization cookies and tokens for Twitter (if you want to download data from Twitter)
-- ffmpeg library for download Reddit hosted videos (you can download it from the [official repo](https://github.com/GyanD/codexffmpeg/releases/tag/2021-01-12-git-ca21cb1e36) or [from my first release](https://github.com/AAndyProgram/SCrawler/releases/download/1.0.0.0/ffmpeg.zip))
+- Authorization cookies Instagram (if you want to download data from Instagram)
+- ffmpeg library for downloading videos hosted on Reddit (you can download it from the [official repo](https://github.com/GyanD/codexffmpeg/releases/tag/2021-01-12-git-ca21cb1e36) or [from my first release](https://github.com/AAndyProgram/SCrawler/releases/download/1.0.0.0/ffmpeg.zip))
 - Don't put program in the ```Program Files``` system folder (this is portable program and program settings are stored in the program folder)
-- Just unpack program archive in any folder you want, copy ```ffmpeg.exe``` into and enjoy. :-)
+- Just unzip the program archive to any folder, copy the file ```ffmpeg.exe``` into it and enjoy. :)
+
+# Updating
+
+Just download [latest](https://github.com/AAndyProgram/SCrawler/releases/latest) version and unpack it into the program folder. Before starting a new version, I recommend making a backup copy of the program settings folder.
 
 # Settings and usage
 
 The program has an intuitive interface.
 
-Just add user profile and press ```Start downloading``` button.
+Just add a user profile and click the ```Start downloading``` button.
 
-Users can be added by patterns:
+You can add users by patterns:
+- https://www.instagram.com/SomeUserName
 - https://twitter.com/SomeUserName
 - https://reddit.com/user/SomeUserName
 - https://reddit.com/r/SomeSubredditName
 - u/SomeUserName
 - r/SomeSubredditName
-- SomeUserName (in this case you must to choose user site)
+- SomeUserName (in this case, you need to select the user's site)
 - SomeSubredditName
 
-More about users and subreddits adding [here](https://github.com/AAndyProgram/SCrawler/wiki/Users)
+Read more about adding users and subreddits [here](https://github.com/AAndyProgram/SCrawler/wiki/Users)
 
 **Full guide you can find [here](https://github.com/AAndyProgram/SCrawler/wiki)**
 
 ## Using program as just video downloader
 
-Create a shortcut for the program. Open shortcut properties. On the ```Shortcut``` tab in ```Target``` field just add ```v``` at the end through the space.
+Create a shortcut for the program. Open shortcut properties. In the ```Shortcut``` tab, in the ```Target``` field, just add the letter ```v``` at the end across the space.
 
 Example: ```D:\Programs\SCrawler\SCrawler.exe v```
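
Note on the README diff: it describes two duplicate checks, an MD5 comparison for Reddit images and a file-name comparison for Twitter and Instagram media. The Python sketch below only illustrates those two ideas; it is not the program's own code (which is VB.NET), and the names `md5_of`, `drop_duplicate_images` and `drop_known_names` are hypothetical.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a blob (an image's bytes, for example)."""
    return hashlib.md5(data).hexdigest()


def drop_duplicate_images(images, known_hashes):
    """Keep only images whose MD5 is not already known.

    images: iterable of (name, data) pairs.
    known_hashes: set of hex digests; updated in place with new digests.
    Returns the names of the images that are new.
    """
    fresh = []
    for name, data in images:
        digest = md5_of(data)
        if digest in known_hashes:
            continue  # same content was seen before, skip it
        known_hashes.add(digest)
        fresh.append(name)
    return fresh


def drop_known_names(names, existing_names):
    """Keep only file names that are not already present (name-based check)."""
    existing = set(existing_names)
    return [n for n in names if n not in existing]
```

The hash check catches the same picture posted under different names, while the name check is cheaper because it needs no file contents.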

SCrawler.sln

+2
@@ -11,6 +11,8 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution
     ProjectSection(SolutionItems) = preProject
         .gitignore = .gitignore
         Changelog.md = Changelog.md
+        Info\InstaAlgo.txt = Info\InstaAlgo.txt
+        Info\InstagramInfo.txt = Info\InstagramInfo.txt
         README.md = README.md
         Info\RedditUrlsInfo.txt = Info\RedditUrlsInfo.txt
         ToDo.txt = ToDo.txt

SCrawler/API/Base/Declarations.vb

+13
@@ -0,0 +1,13 @@
+' Copyright (C) 2022 Andy
+' This program is free software: you can redistribute it and/or modify
+' it under the terms of the GNU General Public License as published by
+' the Free Software Foundation, either version 3 of the License, or
+' (at your option) any later version.
+'
+' This program is distributed in the hope that it will be useful,
+' but WITHOUT ANY WARRANTY
+Namespace API.Base
+    Friend Module Declarations
+        Friend ReadOnly LNC As New ListAddParams(LAP.NotContainsOnly)
+    End Module
+End Namespace

SCrawler/API/Base/SiteSettings.vb

+137 -26
@@ -25,49 +25,160 @@ Namespace API.Base
                 _Path.Value = NewFile
             End Set
         End Property
+        Friend ReadOnly Property InstaHash As XMLValue(Of String)
+        Friend ReadOnly Property InstaHashUpdateRequired As XMLValue(Of Boolean)
+        Friend ReadOnly Property InstagramDownloadingErrorDate As XMLValue(Of Date)
+        Friend Property InstagramLastApplyingValue As Integer? = Nothing
+        Friend ReadOnly Property InstagramReadyForDownload As Boolean
+            Get
+                With InstagramDownloadingErrorDate
+                    If .ValueF.Exists Then
+                        Return .ValueF.Value.AddMinutes(If(InstagramLastApplyingValue, 10)) < Now
+                    Else
+                        Return True
+                    End If
+                End With
+            End Get
+        End Property
+        Friend Property InstagramTooManyRequestsReadyForCatch As Boolean = True
+        Friend Sub InstagramTooManyRequests(ByVal Catched As Boolean)
+            With InstagramDownloadingErrorDate
+                If Catched Then
+                    If Not .ValueF.Exists Then
+                        .Value = Now
+                        If InstagramTooManyRequestsReadyForCatch Then
+                            InstagramLastApplyingValue = If(InstagramLastApplyingValue, 0) + 10
+                            InstagramTooManyRequestsReadyForCatch = False
+                            MyMainLOG = $"Instagram downloading error: too many requests. Try again after {If(InstagramLastApplyingValue, 10)} minutes..."
+                        End If
+                    End If
+                Else
+                    .ValueF = Nothing
+                    InstagramLastApplyingValue = Nothing
+                End If
+            End With
+        End Sub
+        Friend ReadOnly Property Temporary As XMLValue(Of Boolean)
+        Friend ReadOnly Property DownloadImages As XMLValue(Of Boolean)
+        Friend ReadOnly Property DownloadVideos As XMLValue(Of Boolean)
+        Friend ReadOnly Property GetUserMediaOnly As XMLValue(Of Boolean)
+        Friend ReadOnly Property SavedPostsUserName As XMLValue(Of String)
         Private ReadOnly SettingsFile As SFile
-        Friend Sub New(ByVal s As Sites, ByRef _XML As XmlFile, ByVal GlobalPath As SFile)
+        Friend Sub New(ByVal s As Sites, ByRef _XML As XmlFile, ByVal GlobalPath As SFile,
+                       ByRef _Temp As XMLValue(Of Boolean), ByRef _Imgs As XMLValue(Of Boolean), ByRef _Vids As XMLValue(Of Boolean))
             Site = s
             SettingsFile = $"{SettingsFolderName}\Responser_{s}.xml"
             Responser = New WEB.Response(SettingsFile)
 
             If SettingsFile.Exists Then
                 Responser.LoadSettings()
             Else
-                If Site = Sites.Twitter Then
-                    With Responser
-                        .ContentType = "application/json"
-                        .Accept = "*/*"
-                        .CookiesDomain = "twitter.com"
-                        .Decoders.Add(SymbolsConverter.Converters.Unicode)
-                        With .Headers
-                            .Add("sec-ch-ua", " Not;A Brand" & Chr(34) & ";v=" & Chr(34) & "99" & Chr(34) & ", " & Chr(34) &
-                                 "Google Chrome" & Chr(34) & ";v=" & Chr(34) & "91" & Chr(34) & ", " & Chr(34) & "Chromium" &
-                                 Chr(34) & ";v=" & Chr(34) & "91" & Chr(34))
-                            .Add("sec-ch-ua-mobile", "?0")
-                            .Add("sec-fetch-dest", "empty")
-                            .Add("sec-fetch-mode", "cors")
-                            .Add("sec-fetch-site", "same-origin")
-                            .Add(Header_Twitter_Token, String.Empty)
-                            .Add("x-twitter-active-user", "yes")
-                            .Add("x-twitter-auth-type", "OAuth2Session")
-                            .Add(Header_Twitter_Authorization, String.Empty)
+                Select Case Site
+                    Case Sites.Twitter
+                        With Responser
+                            .ContentType = "application/json"
+                            .Accept = "*/*"
+                            .CookiesDomain = "twitter.com"
+                            .Decoders.Add(SymbolsConverter.Converters.Unicode)
+                            With .Headers
+                                .Add("sec-ch-ua", " Not;A Brand" & Chr(34) & ";v=" & Chr(34) & "99" & Chr(34) & ", " & Chr(34) &
+                                     "Google Chrome" & Chr(34) & ";v=" & Chr(34) & "91" & Chr(34) & ", " & Chr(34) & "Chromium" &
+                                     Chr(34) & ";v=" & Chr(34) & "91" & Chr(34))
+                                .Add("sec-ch-ua-mobile", "?0")
+                                .Add("sec-fetch-dest", "empty")
+                                .Add("sec-fetch-mode", "cors")
+                                .Add("sec-fetch-site", "same-origin")
+                                .Add(Header_Twitter_Token, String.Empty)
+                                .Add("x-twitter-active-user", "yes")
+                                .Add("x-twitter-auth-type", "OAuth2Session")
+                                .Add(Header_Twitter_Authorization, String.Empty)
+                            End With
                         End With
-                    End With
-                ElseIf Site = Sites.Reddit Then
-                    Responser.CookiesDomain = "reddit.com"
-                    Responser.Decoders.Add(SymbolsConverter.Converters.Unicode)
-                End If
+                    Case Sites.Reddit
+                        Responser.CookiesDomain = "reddit.com"
+                        Responser.Decoders.Add(SymbolsConverter.Converters.Unicode)
+                    Case Sites.Instagram : Responser.CookiesDomain = "instagram.com"
+                End Select
                 Responser.SaveSettings()
             End If
-            _Path = New XMLValue(Of SFile)("Path", SFile.GetPath($"{GlobalPath.PathWithSeparator}{Site}"),
-                                           _XML, {SettingsCLS.Name_Node_Sites, Site.ToString}, XMLValue(Of SFile).ToFilePath)
+
+            Dim n() As String = {SettingsCLS.Name_Node_Sites, Site.ToString}
+            _Path = New XMLValue(Of SFile)("Path", SFile.GetPath($"{GlobalPath.PathWithSeparator}{Site}"), _XML, n, XMLValue(Of SFile).ToFilePath)
             _Path.ReplaceByValue("Path", {Site.ToString})
             _XML.Remove(Site.ToString)
+
+            Temporary = New XMLValue(Of Boolean)
+            Temporary.SetExtended("Temporary", False, _XML, n)
+            Temporary.SetDefault(_Temp)
+
+            DownloadImages = New XMLValue(Of Boolean)
+            DownloadImages.SetExtended("DownloadImages", True, _XML, n)
+            DownloadImages.SetDefault(_Imgs)
+
+            DownloadVideos = New XMLValue(Of Boolean)
+            DownloadVideos.SetExtended("DownloadVideos", True, _XML, n)
+            DownloadVideos.SetDefault(_Vids)
+
+            If Site = Sites.Twitter Then
+                GetUserMediaOnly = New XMLValue(Of Boolean)("GetUserMediaOnly", True, _XML, n)
+                GetUserMediaOnly.ReplaceByValue("TwitterDefaultGetUserMedia", n)
+            Else
+                GetUserMediaOnly = New XMLValue(Of Boolean)
+            End If
+
+            If Site = Sites.Instagram Then
+                InstaHash = New XMLValue(Of String)("InstaHash", String.Empty, _XML, n)
+                InstaHashUpdateRequired = New XMLValue(Of Boolean)("InstaHashUpdateRequired", True, _XML, n)
+                If (InstaHash.IsEmptyString Or InstaHashUpdateRequired) And Responser.Cookies.ListExists Then GatherInstaHash()
+                InstagramDownloadingErrorDate = New XMLValue(Of Date) With {.ToStringFunction = Function(ss, vv) AConvert(Of String)(vv, Nothing)}
+                InstagramDownloadingErrorDate.SetExtended("InstagramDownloadingErrorDate", Now.AddYears(-10), _XML, n)
+            Else
+                InstaHash = New XMLValue(Of String)
+                InstaHashUpdateRequired = New XMLValue(Of Boolean)
+            End If
+            If Site = Sites.Reddit Then
+                SavedPostsUserName = New XMLValue(Of String)("SavedPostsUserName", String.Empty, _XML, n)
+            Else
+                SavedPostsUserName = New XMLValue(Of String)
+            End If
         End Sub
         Friend Sub Update()
             Responser.SaveSettings()
         End Sub
+        Friend Function GatherInstaHash() As Boolean
+            Try
+                Dim rs As New RegexStructure("=" & Chr(34) & "([^" & Chr(34) & "]+?ConsumerLibCommons[^" & Chr(34) & "]+?.js)" & Chr(34), 1) With {
+                    .UseTimeOut = True,
+                    .MatchTimeOutSeconds = 10
+                }
+                Dim r$ = Responser.GetResponse("https://instagram.com",, EDP.ThrowException)
+                If Not r.IsEmptyString Then
+                    Dim hStr$ = RegexReplace(r, rs)
+                    If Not hStr.IsEmptyString Then
+                        Do While Left(hStr, 1) = "/" : hStr = Right(hStr, hStr.Length - 1) : Loop
+                        hStr = $"https://instagram.com/{hStr}"
+                        r = Responser.GetResponse(hStr,, EDP.ThrowException)
+                        If Not r.IsEmptyString Then
+                            rs = New RegexStructure("generatePaginationActionCreators.+?.profilePosts.byUserId.get.+?queryId:.([\d\w\S]+?)" & Chr(34), 1) With {
+                                .UseTimeOut = True,
+                                .MatchTimeOutSeconds = 10
+                            }
+                            Dim h$ = RegexReplace(r, rs)
+                            If Not h.IsEmptyString Then
+                                InstaHash.Value = h
+                                InstaHashUpdateRequired.Value = False
+                                Return True
+                            End If
+                        End If
+                    End If
+                End If
+                Return False
+            Catch ex As Exception
+                InstaHashUpdateRequired.Value = True
+                InstaHash.Value = String.Empty
+                Return ErrorsDescriber.Execute(EDP.SendInLog + EDP.ReturnValue, ex, "[SiteSettings.GaterInstaHash]", False)
+            End Try
+        End Function
 #Region "IDisposable Support"
         Private disposedValue As Boolean = False
         Protected Overridable Overloads Sub Dispose(ByVal disposing As Boolean)
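
Note on the `SiteSettings.vb` diff: `InstagramTooManyRequests` and `InstagramReadyForDownload` together form a simple cool-down after a "too many requests" error. The Python sketch below paraphrases only the logic visible in this diff. The class and attribute names are hypothetical, the XML persistence and logging are left out, and the current time is passed in explicitly so the behavior is easy to check.

```python
from datetime import datetime, timedelta
from typing import Optional


class InstagramBackoff:
    """Hypothetical stand-in for the cool-down state kept in SiteSettings."""

    def __init__(self) -> None:
        self.error_date: Optional[datetime] = None  # InstagramDownloadingErrorDate
        self.wait_minutes: Optional[int] = None     # InstagramLastApplyingValue
        self.ready_for_catch = True                 # InstagramTooManyRequestsReadyForCatch

    def ready_for_download(self, now: datetime) -> bool:
        # No recorded error means downloading is allowed.
        if self.error_date is None:
            return True
        # Otherwise wait the stored number of minutes (10 if none is stored).
        return self.error_date + timedelta(minutes=self.wait_minutes or 10) < now

    def too_many_requests(self, caught: bool, now: datetime) -> None:
        if caught:
            # Record only the first error; later ones do not move the date.
            if self.error_date is None:
                self.error_date = now
                if self.ready_for_catch:
                    # Each newly caught error adds 10 minutes to the wait.
                    self.wait_minutes = (self.wait_minutes or 0) + 10
                    self.ready_for_catch = False
        else:
            # A successful run clears the cool-down.
            self.error_date = None
            self.wait_minutes = None
```

For example, after `too_many_requests(True, t0)` the object reports not ready at `t0 + 5 min` and ready again at `t0 + 11 min`. Nothing in the displayed hunk sets the ready-for-catch flag back to `True`, so the sketch does not reset it either.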
