What is Octoparse

JiashanCai 440 views 10 slides Aug 15, 2016

About This Presentation

Octoparse is an easy-to-use but powerful web scraping tool that helps you collect any data from the web with high speed and productivity. Being a Windows client-side application, Octoparse works well for both static and dynamic websites, including scraping data with pagination, extracting data behin...


Slide Content

What is Octoparse


Get to know Octoparse!

Navigation Panel

[Screenshot: Octoparse home screen showing the Wizard Mode and Advanced Mode panels, each with the task types Single Page, List or Table, List_Detail and URL List]

Operation Panel

Home

Wizard Mode

The Wizard Mode is ideal for beginners. You just follow the instructions in the wizard and don't need to set up any extraction rules. However, the Wizard Mode can only be applied in the following cases:

1. [illegible in the slide]
2. [partly illegible] - pagination supported
3. Extract detailed web page data when there is a list of links to click into - pagination supported

Single Page, List or Table, List_Detail

Advanced Mode

Users need to configure extraction rules with simple point-and-click actions. With proper extraction rules, Octoparse is capable of extracting data from any web page.

Single Page, List or Table, List_Detail, URL List

Operation Panel

[Screenshot: Harvard University web page, "Harvard at a Glance"]

[Screenshot: web page with a PRODUCTS / SERVICES / DEALS menu]

RegEx Tool

Tabs: Source Text | Auto Generate | Reference Library | Samples

Source text: (1353 Reviews)

Options: Start with "(", Include Start, End with " Reviews", Include End, Contains, Once

Generate

Matches regular expression:

(?<=\()(.+?)(?= Reviews)

Match All | Match | Apply
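To see what a pattern like the one in the RegEx Tool does outside Octoparse, here is a minimal Python sketch. It assumes the generated expression reads `(?<=\()(.+?)(?= Reviews)` (the slide text is garbled, so this is a reconstruction) and uses the sample source text "(1353 Reviews)" from the slide. The function name `extract_review_count` is hypothetical and not part of Octoparse.

```python
import re

# Assumed reconstruction of the pattern on the slide:
# a lookbehind for "(" and a lookahead for " Reviews" keep both
# delimiters out of the match, so only the text between them is captured.
REVIEW_COUNT = re.compile(r"(?<=\()(.+?)(?= Reviews)")


def extract_review_count(text):
    """Return the text between "(" and " Reviews", or None if absent."""
    match = REVIEW_COUNT.search(text)
    return match.group(1) if match else None


print(extract_review_count("(1353 Reviews)"))  # prints 1353
```

The lookarounds mirror the "Start with" and "End with" fields of the tool when the start and end markers are meant to be excluded from the extracted value.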

XPath Tool

URL: http://stackexchange.com/ | Go | Auto Generate | Reference Library | Samples

[Page preview: Stack Exchange question list, e.g. "When should a pilot use the word 'takeoff?'" and "Why did the original Game Boy have four colours?"]

Options: Item Tag Name (span), Item Id, Item Name, Item Text Contains, Item Text Start With

Tag: SPAN | Text: next | Parent | Previous | Next | Generate

XPath:

//SPAN[text()='next']

1 item(s) matched
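The XPath generated in the slide selects a `span` element by its text. A rough equivalent can be tried with Python's standard library, as sketched below. Two assumptions apply: the sample markup is invented for illustration and is not the Stack Exchange page, and `xml.etree.ElementTree` supports only a subset of XPath, so the predicate is written as `[.='next']` instead of `[text()='next']`. Tag names are also case-sensitive there, hence lowercase `span`.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, made up for this example.
SAMPLE = """<div>
  <a href="/page/1"><span>prev</span></a>
  <a href="/page/3"><span>next</span></a>
</div>"""


def find_spans_with_text(markup, text):
    """Return all <span> elements whose complete text equals `text`."""
    root = ET.fromstring(markup)
    # ElementTree's limited XPath: [.='...'] compares the element's full text.
    return root.findall(f".//span[.='{text}']")


matches = find_spans_with_text(SAMPLE, "next")
print(len(matches), "item(s) matched")
```

Selecting the "next" link by its text rather than by position is what lets a pagination rule keep working when the number of page links changes.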