


Considerations about Web Data

Dr. Mine Dogucu


Do you need all that data at that speed?

Sampling rather than scraping all of the data may be an option.
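
A minimal sketch of sampling, assuming the URLs of all pages have already been collected in a character vector. The name `movie_urls` and the example.com addresses are hypothetical placeholders.

```r
# Hypothetical vector of page URLs (placeholders, not real pages)
movie_urls <- paste0("https://example.com/movie/", 1:1000)

# Fix the seed so the same sample is drawn on every run
set.seed(123)

# Draw 100 URLs at random, without replacement
sampled_urls <- sample(movie_urls, size = 100)

length(sampled_urls)
```

Only the sampled URLs would then be passed to the scraping function, which reduces the number of requests by a factor of ten in this example.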

You may end up with HTTP Error 429 (Too Many Requests). In this case, you may want to reduce the number of requests you send per time interval.

scrape_movie <- function(movie_url) {
  Sys.sleep(runif(1))
  #### Remaining code of the function
}

Before scraping each movie's page, this makes the system sleep for a random duration between 0 and 1 second.
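
If a 429 still comes back, one possible extension is to wait longer and try again. The sketch below is an assumption, not part of the original function: `fetch_with_backoff`, its parameters, and the `fetch` argument (any function that takes a URL and returns a list with a `status` element) are all hypothetical names.

```r
# Hypothetical helper: call fetch(url) and retry with growing pauses
# for as long as the returned status code is 429.
# `fetch` is assumed to return a list with a `status` element.
fetch_with_backoff <- function(url, fetch, max_tries = 5, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    response <- fetch(url)
    if (response$status != 429) {
      return(response)
    }
    if (attempt < max_tries) {
      # Wait base_wait, 2 * base_wait, 4 * base_wait, ... seconds
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  stop("Still receiving HTTP 429 after ", max_tries, " tries: ", url)
}
```

Doubling the pause after each failed attempt gives the server progressively more time to recover, while `max_tries` keeps the function from waiting forever.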


Write your data (if possible)

  • Data online are not static.

  • Web pages change their structure.

  • The only way to reproduce the same results may be from the CSV files that you write.
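
A minimal sketch of writing scraped data to a CSV file. The data frame `movies`, its values, and the file name pattern are made up for illustration; `tempdir()` is used here only so that the example runs anywhere, and a project folder would be the more likely choice in practice.

```r
# Hypothetical scraped results (made-up values, for illustration only)
movies <- data.frame(
  title = c("Movie A", "Movie B"),
  rating = c(7.5, 8.1)
)

# Include the scrape date in the file name to record when the data were collected
csv_path <- file.path(tempdir(), paste0("movies_", Sys.Date(), ".csv"))

write.csv(movies, csv_path, row.names = FALSE)

# Later analyses can read the saved file instead of scraping again
movies_saved <- read.csv(csv_path)
```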


Optional

Make use of beepr::beep() so that you are notified when your code finishes running.
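
A minimal sketch of where the call might go, assuming the beepr package is installed. The loop is a hypothetical stand-in for a long scraping job.

```r
# Stand-in for a long-running scraping job (hypothetical; sleeps instead of scraping)
results <- lapply(1:3, function(i) {
  Sys.sleep(0.1)
  i
})

# Play a sound once the loop above has finished.
# The guard skips the sound if the beepr package is not installed.
if (requireNamespace("beepr", quietly = TRUE)) {
  beepr::beep()
}
```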
