
WP Content Crawler - Get content from almost any site, automatically!


Get content from almost any site to your WordPress blog, automatically!

WHAT IT CAN BE USED FOR

  • Create a personal site which collects news, posts, etc. from your favorite sites to see them in one place
  • Use it with WooCommerce to collect products from shopping sites
  • Collect products from affiliate programs to make money
  • Collect posts to create a test environment for your plugin/theme
  • Collect plugins, themes, apps, images from other sites to create a collection of them
  • Keep track of competitors
  • You can imagine anything. The internet is full of content :)


HOW IT WORKS

It’s all about CSS selectors and you can learn how to use them in minutes by watching the introduction tutorial.
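To show what a CSS selector does when it "finds" content, here is a minimal Python sketch using only the standard library. It is not the plugin's code; the plugin runs on PHP. The sketch understands only the simplest selector forms, `tag`, `.class` and `tag.class`. It assumes well-formed HTML and returns the text of every matching element. A nested match inside an outer match is folded into the outer one.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SimpleSelector(HTMLParser):
    """Collects the text of elements matching 'tag', '.class' or 'tag.class'."""

    def __init__(self, selector: str):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag = tag or None   # None means "any tag"
        self.cls = cls or None   # None means "no class required"
        self.depth = 0           # nesting depth inside the current match
        self.buffer = []
        self.results = []

    def _matches(self, tag, attrs):
        if self.tag and tag != self.tag:
            return False
        if self.cls:
            classes = (dict(attrs).get("class") or "").split()
            return self.cls in classes
        return True

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1          # a child of the matched element
        elif self._matches(tag, attrs):
            self.depth = 1           # a new match starts here
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:          # the matched element just closed
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def select_text(html: str, selector: str) -> list[str]:
    parser = SimpleSelector(selector)
    parser.feed(html)
    parser.close()
    return parser.results
```

For example, `select_text('<h1 class="entry-title">Hello <b>world</b></h1>', "h1.entry-title")` gives `["Hello world"]`. Real selector engines also support descendants, attributes and pseudo-classes. This sketch does not.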


MAIN FEATURES

Save every post detail
Title, excerpt, content, tags, categories, slug, date, custom meta, taxonomies, meta keywords, meta description, featured image, post images, status… Just everything.
Visual selector (visual inspector)
Just click an element to find its CSS selector. You can also get alternative CSS selectors that might interest you. There is no need to leave your admin panel anymore.
 
Crawl (scrape, grab, save) posts
After the settings are configured, the plugin finds URLs of the posts and crawls them automatically in the background.

Recrawl (update) posts
Recrawl posts automatically to keep them up to date. You can limit how many times a post can be updated, set the update interval, and ignore old posts.
 
Delete posts
You want to delete old crawled posts? The plugin can delete them automatically.




Control scheduling
You can set how many times the URL collection and post crawling events should run for a site each time they are triggered. For instance, you can save 3 posts every minute, or run URL collection 5 times every 2 minutes.
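The scheduling example works out to simple per-hour arithmetic. The sketch below assumes that each post-crawling run saves exactly one post and that the event fires exactly on schedule. WordPress's built-in CRON only fires when the site receives a request, so real throughput can be lower.

```python
def runs_per_hour(runs_per_trigger: int, interval_minutes: float) -> float:
    """Total runs in one hour when an event fires every `interval_minutes`
    and performs `runs_per_trigger` runs each time it fires."""
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    return runs_per_trigger * (60 / interval_minutes)


# 3 post-crawling runs every minute: 180 runs (about 180 posts) per hour.
print(runs_per_hour(3, 1))  # 180.0
# 5 URL-collection runs every 2 minutes: 150 runs per hour.
print(runs_per_hour(5, 2))  # 150.0
```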
 

Save categories
The target category does not exist on your site? No problem. The plugin can create the target categories for you. Just define the CSS selectors that find the category names. They can even be created as subcategories.
Save slugs (permalink)
You can define the permalink of the posts. You can get the permalink from the target site, enter custom text, and even create templates for the slugs by using short codes.
 
Save taxonomies
Save taxonomy values by retrieving them from the target site or entering manually. Saving details of custom post types is easier than ever.



Save posts into custom categories
A custom post type has custom categories? No problem. You can define custom category taxonomies used by the custom post type and select those categories when defining the categories of the post. The plugin can also create custom categories for you.
 
Custom post meta
Save anything as custom post meta. You can use a CSS selector or just type the value.


Content templates
Prepare post content, title, excerpt, list item and gallery item templates using short codes. Moreover, you can define templates for values of each CSS selector using the options box.
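Short-code templates boil down to substituting named placeholders. The following Python sketch illustrates the idea and does not reproduce the plugin's template engine. `[wcc-main-content]` comes from the changelog, while `[source-author]` in the usage example is a hypothetical custom short code. Substitution happens in a single pass, so a value that itself contains a short code is not expanded again.

```python
import re

# A short code is a lowercase name in square brackets, e.g. [wcc-main-content].
SHORT_CODE = re.compile(r"\[([a-z0-9_-]+)\]")


def render_template(template: str, values: dict, is_main_template: bool = False) -> str:
    """Replace every known [short-code] in one pass; unknown ones stay as written."""
    if is_main_template and not template.strip():
        # Per the v1.6.0 changelog, an empty main template acts like [wcc-main-content].
        template = "[wcc-main-content]"
    return SHORT_CODE.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```

With `values = {"wcc-main-content": "<p>Body</p>", "source-author": "Jane"}`, the template `<p>By [source-author]</p>[wcc-main-content]` renders as `<p>By Jane</p><p>Body</p>`.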
 
Alternative selectors
You can write alternative selectors to get the data even if the target site's post pages are designed differently from each other.




Find and replace anything
You can use plain text or regular expressions to find and replace anything. You can even modify the HTML of the page, create your own HTML elements and write selectors to use them. You can even change image URLs. You have the power.
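A list of find-replace rules can be applied sequentially, each rule working on the output of the previous one. Because of this, the order of the rules matters. This Python sketch is illustrative only, and the `Rule` type is hypothetical. Note that Python writes regex backreferences in the replacement as `\1`, whereas PHP also accepts `$1`.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    find: str
    replace: str
    regex: bool = False  # False: plain-text match; True: regular expression


def apply_rules(text: str, rules: list[Rule]) -> str:
    # Rules run in order, each one on the output of the previous one.
    for rule in rules:
        if rule.regex:
            text = re.sub(rule.find, rule.replace, text)
        else:
            text = text.replace(rule.find, rule.replace)
    return text


rules = [
    Rule("http://old.example.com", "https://cdn.example.com"),   # change image URLs
    Rule(r'<div class="ad">.*?</div>', "", regex=True),          # drop an ad block
    Rule(r"(\d+) USD", r"$\1", regex=True),                      # "10 USD" -> "$10"
]
```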
 
Paginated posts
Target post has more than one page? No worries. You can save paginated posts as well.

List type posts
Some sites create posts with a list inside. You can extract the list from the post, create a template that should be applied to each list item and even reverse the list.
 

Remove unnecessary elements
Sometimes you need to get rid of some elements, such as advertisements, comments, you name it. Just write its CSS selector and it is removed.

Automatically insert category URLs
Target site has hundreds of categories? Piece of cake. Just write the CSS selector and the plugin will insert the category URLs for you.
 

Post types
Set post type. It can be a post, a page, a product, or any other post type available in your WordPress installation.

Remove links
You can remove links from the post. Just check the checkbox and the links are gone. That easy.
 
Password protection
You can set a password for the posts to show them only to the users who have the password.
Notes
You can add notes to remind yourself of things about the site: CSS selectors, a TODO list, anything.
 


Test everything on the fly
Test post crawling, URL collection, CSS selectors, regular expressions, find and replace options and proxies on the fly. You can also enable caching to conduct the tests much faster and reduce the requests sent to the target site.
Test all the settings of a site at once
Using the tester, you can test all options you configured in the site settings to make sure everything works as you want before enabling automatic crawling.
 
Tools
Using the tools, you can save posts manually with their URL, recrawl posts with their ID or delete already-saved URLs.
Custom general settings for each site
You can provide custom general settings for each site to override the global general settings and make them suitable for that site.
 
Post status
You can directly publish the saved posts or keep them as draft to check them before publishing.
Save all images in post content
Saving all images in the content of the post is as easy as checking a single checkbox.
 



Save images as gallery
You can save the images in the target page as gallery and provide a template for each image to make it suitable for the gallery library that you use on frontend. You can also save the images as WooCommerce gallery by just checking one checkbox.
Any data as short code
Get anything from the target page as a short code and use the short codes in the plugin’s templates to place any data anywhere you want.
 
Proxy
Use a proxy or proxies to get content from the sites to which your IP does not have access.

Cookies
Attach cookies, such as session cookies, to each request. This way, for example, you can crawl the target site as if you were logged in.
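Cookies are attached to a request as a single `Cookie` header made of `name=value` pairs separated by `; `. The Python sketch below only builds the request and does not send it. It uses the standard `urllib.request.Request` class. The cookie names, URL and user agent are hypothetical placeholders.

```python
from urllib.request import Request


def build_request(url: str, cookies: dict, user_agent: str) -> Request:
    """Prepare (but do not send) a request that carries the given cookies."""
    # All cookies travel in one "Cookie" header: name=value pairs joined by "; ".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": cookie_header, "User-Agent": user_agent})


req = build_request("https://example.com/members",
                    {"session_id": "abc123", "lang": "en"}, "Mozilla/5.0")
```

Passing `req` to `urllib.request.urlopen()` would send it with the cookies attached.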
 
Crawl as many posts as you want
You can set how many times the post crawling or URL collection CRON events should run. This way, you can, e.g., save 100 posts every minute. Just be careful and consider your server’s capacity.
Email notifications
Set CSS selectors whose values should not be empty for category and post pages. When an empty value is found using those selectors, you can get an email notification.
 
Get data from JSON
When you enable JSON parsing for a CSS selector, you can get the values from the JSON easily.
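A common case is a CSS selector that finds a `<script>` element whose text is JSON. Once that text is parsed, a path picks out the value. The dot-separated path notation below (`offers.1.price`) is a hypothetical convention for this Python sketch and not necessarily the syntax the plugin uses. Numeric path segments index into arrays.

```python
import json
from typing import Any


def json_value(raw: str, path: str) -> Any:
    """Follow a dot-separated path such as 'offers.0.price' through parsed JSON."""
    node: Any = json.loads(raw)
    for key in path.split("."):
        if isinstance(node, list):
            node = node[int(key)]   # numeric segment: array index
        else:
            node = node[key]        # otherwise: object key
    return node


raw = '{"name": "Mug", "offers": [{"price": 9.5}, {"price": 12}]}'
print(json_value(raw, "offers.1.price"))  # 12
```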


Advanced HTML manipulations
Find-replace in response HTML, find and replace in element attributes, exchange element attributes, remove element attributes, manipulate HTML of an element, remove HTML elements…
 


Automatic translation
Use the artificial intelligence of Google Cloud Translation API or Microsoft Translator Text API to automatically translate the posts. Note that these are paid services.
Actions and filters
If you are a developer, you can use actions and filters to extend and modify the plugin.
 


Duplicate post check
Check for duplicate posts by URL, post title and/or post content. If you are using WooCommerce, products whose SKU already exists are considered duplicates and will not be added to your site.
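One way to check for duplicates is to keep a fingerprint (hash) of each enabled field for every saved post. This Python sketch is an illustration under two assumptions that the text does not state. First, a post counts as a duplicate when any enabled field matches. Second, titles and content are compared ignoring case and extra whitespace, while URLs keep their case because URL paths can be case-sensitive.

```python
import hashlib


def _normalize(field: str, value: str) -> str:
    if field == "url":
        return value.strip()                 # URL paths can be case-sensitive
    return " ".join(value.lower().split())   # ignore case and extra whitespace


class DuplicateChecker:
    """Remembers fingerprints of saved posts for each enabled field."""

    def __init__(self, fields=("url", "title")):
        self.fields = tuple(fields)
        self.seen = {field: set() for field in self.fields}

    def is_duplicate(self, post: dict) -> bool:
        hashes = {
            field: hashlib.sha256(_normalize(field, post[field]).encode("utf-8")).hexdigest()
            for field in self.fields
        }
        # A match on any enabled field marks the post as a duplicate.
        if any(digest in self.seen[field] for field, digest in hashes.items()):
            return True
        # Only posts that will actually be saved are remembered.
        for field, digest in hashes.items():
            self.seen[field].add(digest)
        return False
```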
Scheduled posts
You can add minutes to or subtract minutes from the post date. This way, you can schedule post publishing.
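Scheduling works because WordPress treats a post that is set to "publish" but dated in the future as a scheduled ("future") post. The Python sketch below shows the date shift and that status rule. It illustrates WordPress's general behavior and is not the plugin's code. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def scheduled_date(source_date: datetime, offset_minutes: int) -> datetime:
    """Shift the crawled post date; a positive offset moves it into the future."""
    return source_date + timedelta(minutes=offset_minutes)


def wordpress_status(post_date: datetime, now: datetime, requested: str = "publish") -> str:
    # WordPress stores a "publish" post dated in the future as "future" (scheduled).
    if requested == "publish" and post_date > now:
        return "future"
    return requested


now = datetime(2019, 1, 1, 12, 0)
print(scheduled_date(now, 90))  # 2019-01-01 13:30:00
```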
 

Save WooCommerce products
Save price, inventory, shipping, attributes, and advanced options. You can save the product as a simple or an external product. You can also set downloadable file options and define the product as virtual. The options are available for WooCommerce versions greater than or equal to 3.3.
Options box
You have the control! Define many options for the values found by a CSS selector. The options include find-replace, calculation, template, and JSON parsing settings. You can easily import/export the options defined in the options boxes as well.
 
Handle files like a pro
Rename, copy, and move saved files easily. You can also define title, description, caption, and alt texts for the saved media files using templates in which you can use any short code. It is also possible to give random names to the saved files.

Handle iframes and scripts like a pro
WordPress does not allow showing iframes and scripts since they pose a security risk. You can turn iframe and script HTML elements into short codes by just checking a checkbox. The short code will show iframes and scripts from the allowed source domains defined by you.
 
Quick save
With the quick save button, you can save the settings much more quickly. No need to wait for the page to reload.

Regular expressions
Define regular expressions in find-replace options to find-replace anything. You can also use delimiters and modifiers to match more precisely.
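A delimited expression such as `/price:\s*(\d+)/i` packs the pattern between two delimiter characters and puts the modifiers after the closing one. This is the PCRE style used by PHP. The Python sketch below splits such an expression and maps the common modifiers `i`, `m`, `s`, `x` and `u` onto Python flags. It is an illustration, not the plugin's parser. It rejects any other modifier, and it does not handle PHP's paired bracket delimiters such as `{...}`.

```python
import re

# PHP (PCRE) modifiers that map directly onto Python flags.
MODIFIERS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
    "u": re.UNICODE,  # already the default for str patterns in Python 3
}


def compile_delimited(expr: str) -> "re.Pattern[str]":
    """Compile '/pattern/modifiers' with any non-alphanumeric delimiter."""
    if len(expr) < 2:
        raise ValueError("expression is too short")
    delimiter = expr[0]
    if delimiter.isalnum() or delimiter.isspace() or delimiter == "\\":
        raise ValueError("delimiter must not be alphanumeric, whitespace or a backslash")
    end = expr.rfind(delimiter)   # modifiers are letters, so this is the closing delimiter
    if end == 0:
        raise ValueError("missing closing delimiter")
    pattern, modifiers = expr[1:end], expr[end + 1:]
    flags = 0
    for modifier in modifiers:
        if modifier not in MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        flags |= MODIFIERS[modifier]
    return re.compile(pattern, flags)
```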
 

Save “srcset” attributes
When alternative sizes of the saved images are available, the plugin assigns them to the srcset attribute of img elements so that your pages load faster on different screen sizes.
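A srcset value lists each image file with its width descriptor (`300w`), separated by commas, so the browser can pick the size that fits the screen. The Python sketch below builds that string from a hypothetical mapping of widths to URLs. Within WordPress, the core function `wp_get_attachment_image_srcset()` produces this value for attachments. Whether the plugin calls it is not stated here.

```python
def build_srcset(sizes: dict[int, str]) -> str:
    """Turn {width_in_px: url} into an img srcset value, smallest width first."""
    return ", ".join(f"{url} {width}w" for width, url in sorted(sizes.items()))


print(build_srcset({1024: "a-1024.jpg", 300: "a-300.jpg", 768: "a-768.jpg"}))
# a-300.jpg 300w, a-768.jpg 768w, a-1024.jpg 1024w
```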


Save “alt” and “title” attributes
When you save images, their “alt” and “title” attributes are automatically retrieved from the target site and assigned to the saved media. You can also define templates for them to apply your SEO strategies.
 
Warnings
Learn when there is a problem. The plugin will show you the details of the error so that you can fix it right away.


Handle character encoding problems
The plugin is able to handle different character encodings, even if the target site contains mixed encodings. You can convert the encoding by checking a single checkbox.
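Converting a page to UTF-8 means decoding its bytes with the correct source encoding and then re-encoding them. The Python sketch below tries the declared charset first, then UTF-8, then Windows-1252, and finally Latin-1, which accepts every byte. It is an illustration, not the plugin's algorithm. It treats the whole page as a single encoding, so pages with mixed encodings would need per-fragment handling that this sketch does not attempt.

```python
from typing import Optional


def to_utf8(raw: bytes, declared_charset: Optional[str] = None) -> bytes:
    """Re-encode page bytes as UTF-8, trying the declared charset first."""
    for charset in (declared_charset, "utf-8", "cp1252"):
        if not charset:
            continue
        try:
            return raw.decode(charset).encode("utf-8")
        except (LookupError, UnicodeDecodeError):
            # LookupError: unknown charset name; UnicodeDecodeError: wrong guess.
            continue
    # Latin-1 maps every byte to a character, so this last fallback never fails.
    return raw.decode("latin-1").encode("utf-8")
```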
 
Navigate between settings easily
Fix navigation to the top! The plugin stores where you were before switching to a new tab and restores your previous location when you activate that tab again. No more getting lost among the settings.

Manual crawling tool
With the manual crawling tool, you can save multiple posts by entering their URLs. You can also enter category URLs so that the tool can get post URLs from there. Moreover, you can set it to crawl multiple posts at the same time.
 




Add URLs to the database
The plugin collects URLs automatically. However, if you want it to crawl only certain URLs, you can add them to the database manually using the manual crawling tool. This way, the specified URLs will be crawled automatically using your scheduling options.
Enable/disable automatic crawling for a specific site
You can enable or disable automatic crawling for each site individually.
 

Import/export
You can import and export site settings easily. Just copy and paste the code created by the plugin.
Unlimited
Add unlimited sites and activate as many of them as you want.
 


Detailed dashboard
See what’s going on in the background: active sites, number of posts crawled, number of posts updated, last crawled and updated posts, last added URLs, last and next runs of CRON events, posts and URLs currently being saved…
Get updates from your admin panel
You can update the plugin with just one click whenever an update is ready. Just go to your updates page in your admin panel.
 
Use the most secure PHP
The plugin supports the latest versions of PHP.
Use the most modern browsers
The plugin supports Chrome, Firefox, Safari, Opera, and Edge.
 
Online documentation
You can check the online documentation whenever you feel a need.

Quick guides right next to the settings
Each setting in the plugin has a quick guide that will help you understand what each setting does.
 
Video tutorials
Watch video tutorials to easily learn how to use the plugin.
Ready to translate
You can translate the plugin into your own language using Poedit.


Requirements: PHP >= 5.6, mbstring
Tested with WP versions: 5.0.3, 4.9.4, 4.8.2, 4.7.2, 4.6.1, 4.5.3, 4.4.2, 4.3.3, 4.2.7, 4.1.10, 4.0.10, 3.9.11
Tested with WooCommerce versions: 3.5.3, 3.4.7, 3.3.5
Languages: English, Türkçe, Français (partial)



CHANGELOG

v1.8.0 - 1 January 2019
* New: Save WooCommerce product details much more easily using the options specifically defined for WooCommerce products. To see the options under Post tab of site settings, just select the post type as "product" either in General Settings or by defining custom general settings. The options are available for WooCommerce versions 3.3, 3.4, and the latest one, 3.5.
* New: Save categories.
* New: Save post slugs.
* New: Save taxonomy values.
* New: Save the posts into custom post categories. You can define custom post category taxonomies in general settings so that you can select the custom post categories when saving a post.
* New: Options Box. For the settings that have Options Box button, you can define several settings for each item found by given CSS selectors. Options Box contains find-replace, calculation, and templating options. You can also take notes. It also allows you to use JSON values in calculations and templates.
* New: Rename, copy, and move saved files. You can also define title, description, caption, and alt texts for the saved media files using templates in which you can use any short code. It is also possible to give random names to the saved files.
* New: Recent tests in Site Tester page. You can now repeat your previous tests easily.
* New: Caching responses of test URLs in site settings page. You can now configure the settings faster and send fewer requests to the target site.
* New: Replace iframe and script HTML elements with short codes by just checking a checkbox. The options are available under Templates tab.
* New: Quick save button in the site settings. Now you can save the site settings faster (much faster).
* New: You can use delimiters and modifiers for regular expressions in find-replace options.
* New: Sets srcset attribute values of img elements of the saved image files in the templates when different sizes of the images are available.
* New: Warnings. When there is an error, you will get a warning showing the details of the warning/error.
* New: Saves "alt" and "title" values of media items when they are saved as attachments.
* New: Adds "wpcc/post/settings/meta-key-defaults" filter that you can use to set default values of site settings.
* New: Convert character encoding to UTF8 when target page's HTML has a different encoding. You can enable the option under General Settings > Advanced.
* New: You can now navigate between tabs and settings much more easily when you activate fixing tabs and content navigation under Main tab of site settings.
* New: Adds "find and replace in raw response HTML" option for post and category settings. Using this, you can fix HTML errors that prevent the plugin from being able to parse the HTML code.
* New: Adds "wpcc/bot/response-content" filter that can be used to manipulate raw response content.
* Improvement: Manual crawling tool has been redesigned. Now, you can manually crawl multiple URLs or insert post URLs to the database so that they can be crawled later. You can also perform parallel crawling. Moreover, you can recrawl the posts directly from the manual crawling tool.
* Improvement: When testing your settings in the site settings page, all manipulations defined in your settings will be applied. By this way, you can conduct more robust tests and figure out the cause of a misbehavior more easily.
* Improvement: Short code buttons now contain custom short codes defined by you.
* Improvement: The files that are saved when testing are now deleted from the file system after the test.
* Improvement: Shows all types of saved posts, including custom post types, in the dashboard.
* Improvement: Uses the HTTP user agent defined in the settings when saving media.
* Fixes: You can now enter cookies without decoding them. Just copy and paste the values retrieved from your browser.
* Fixes: In "Find and replace in custom meta" option, only one replacement was applied to each meta key. Now, all replacements will be applied sequentially.
* Fixes: Images having "&" symbol in their URLs are not saved properly.
* Fixes: Invalid chars coming after the file's extension in the file's URL (such as png:s) cause the files not to be saved with the right extension.
* Fixes: When there were no API keys for translation services and the translation was active, a fatal error was shown. Now, it is handled silently.
* Fixes: Scroll animation does not work.
* Fixes: When testing find and replace settings for custom short codes, all test data options are required although one of them is enough to perform the test.
* Fixes: The plugin causes the text editor on the "Edit Page" page to appear twice.
* Fixes: Relative URLs should be resolved automatically.
* Tested and works in PHP 5.6, 7.0, 7.1, and 7.2 and in Chrome, Firefox, Opera, Safari, and Edge.
* Updates limits and API versions of Google's and Microsoft's translation services.
* Updates third party libraries.
* Other small fixes and improvements.
* No longer supports Internet Explorer.

v1.7.0 - 22 October 2017
* New: Translate posts automatically by using Google Cloud Translation API or Microsoft Translator Text API.
* New: Randomize proxies. By checking this option, you can make the plugin randomly order the proxies you entered.
* New: Over 50 filters and actions are added. If you are a developer, you can now use these to extend the plugin however you like.
* Fixed: The proxies were only used when there was an error getting the target page's source code. Now, they are always used, even when testing.
* Fixed: Plugin's pages were not shown properly with PHP 7.1.
* UI and UX improvements.

v1.6.0 - 4 March 2017
* New: Date selectors.
* New: Add/remove minutes to/from the post date. You can schedule post publishing by this way.
* New: Scheduled post delete.
* New: Duplicate post checking via URL, title and/or content.
* New: More HTML manipulation options: exchange element attributes, remove element attributes, find and replace in element attributes, manipulate HTML of an element.
* New: Find and replace in custom short code and custom post meta content
* Improvement: More counts are shown in site listing.
* Improvement: Save all images in the post content by checking a single checkbox.
* Improvement: Reorder settings that can have multiple values.
* Improvement: If the main template is empty, it will be treated as if it contains the [wcc-main-content] shortcode.
* Improvement: An option to always use UTF8 encoding.
* Improvement: Load general settings with a button when you are overwriting them for a site.
* Improvement: Settings are grouped and reordered for better navigation.
* Improvement: Auto refresh the dashboard every few seconds.
* Improvement: Track CRON events and the next sites that will be processed by the CRON events in the dashboard.
* Improvement: Better notifications for the required settings when performing a test.
* Improvement: Auto find for next page URL, post date and post title in DEV tools.
* Improvement: Remove elements using a CSS selector in DEV tools. This can be used to remove blocking elements to better select the elements you want.
* Fix: Sometimes thumbnail images and post URLs did not match when category pages were crawled.
* Fix: When importing site settings, form validation should not be performed.
* Small bug fixes and improvements.

v1.5.1 - 7 February 2017
* New: Dashboard. See what's going on behind the scenes.
* Bug fixes and improvements.

v1.4.1 - 27 January 2017
* Fixed: URLs in the queue should be saved uniformly according to their categories.

v1.4.0 - 26 January 2017
* New: Post recrawling. Recrawl posts to update them regularly.
* New: Proxy tester. Test if your proxies work correctly.
* New: Cookies. Attach cookies to every request that is made to the target site.
* Removes Lodash.
* Small bug fixes and improvements.

v1.3.0 - 14 January 2017
* New: Visual inspector
* Fixed: Assets are not loaded on Windows servers.
* Fixed: "General settings" link on plugins page does not work.
* Fixed: Plugin does not crawl all active sites when there are more than 10 active sites.

v1.2.0 - 30 August 2016
* New: You can now use proxy.
* New: Set connection timeout in seconds.
* New: Post title and excerpt templates in which you can use custom short codes.
* New: Find and replace in custom short code data.
* New: Maximum number of categories that can be added automatically via CSS selectors to the category map increased.
* New: Add custom post meta without a selector.
* New: You can set how many times URL collection and post crawling events should run each time for a site. For instance, you can save 3 posts every minute, or run URL collection 5 times every 2 minutes.
* New: You can collect post URLs in reverse order for each category page.
* New: Remove links from all short code data. This will not touch the links manually added to the templates.
* New: Notifications. You can now set CSS selectors whose values should not be empty for category and post pages. When an empty value is found using those selectors, you can get an email notification.
* Fixed: Downloaded file's name does not have a proper file extension if the file on the target site is generated dynamically.
* Fixed: Crawling stops if there is a request exception.
* Fixed: Crawling stops if target page's HTML could not be retrieved.