Support for WP Content Crawler - Get content from almost any site, automatically!

turgutsaricam

turgutsaricam supports this item


This author's response time can be up to 5 business days.

Popular questions for this item

The plugin does not show its pages. I see empty pages after 1.6.x update.

The plugin requires PHP 5.6 starting from v1.6.0. Please change your PHP version to at least 5.6.

The plugin does not work.

If you cannot find a solution in the FAQ section, please send me an email through my profile page. Include a description of the problem and admin login credentials for your site. I may ask you to send me the debug.log file. If you are not familiar with this file, you first need to enable WordPress’ debug mode. You can learn how to do it from here. Please make sure you enable both WP_DEBUG and WP_DEBUG_LOG. To do so, add these two lines to the wp-config.php file in your main WordPress installation directory:

define( 'WP_DEBUG', true );
define( 'WP_DEBUG_LOG', true );

If these two constants are already defined elsewhere in the file, please remove the other definitions. Each constant must be defined only once.

After you enable debug mode, please try to reproduce the error so that it is written to the debug.log file. Then, please send me the wp-content/debug.log file. If there is no debug.log file in the wp-content directory, there may be an error_log file in the wp-content directory or in the main WordPress installation directory. If so, please send me that error_log file.
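If you want to confirm that each constant is defined exactly once before you reproduce the error, a small script can scan the file for you. The sketch below is only an illustration in Python and is not part of the plugin. The function names are hypothetical, and the pattern only recognizes plain define( ... ); lines like the two shown above.

```python
import re

# Matches lines such as: define( 'WP_DEBUG', true );
# Lines that start with a comment marker (// or #) are not matched.
DEFINE_RE = re.compile(
    r"""^\s*define\(\s*['"](WP_DEBUG|WP_DEBUG_LOG)['"]\s*,\s*([^)]+?)\s*\)\s*;""",
    re.MULTILINE,
)


def find_debug_defines(config_text):
    """Return {constant name: [values found]} for the two debug constants."""
    found = {"WP_DEBUG": [], "WP_DEBUG_LOG": []}
    for name, value in DEFINE_RE.findall(config_text):
        found[name].append(value.lower())
    return found


def debug_mode_ok(config_text):
    """True when each constant is defined exactly once and set to true."""
    defines = find_debug_defines(config_text)
    return all(values == ["true"] for values in defines.values())
```

To use it, read your wp-config.php into a string and pass it to debug_mode_ok. If it returns False, find_debug_defines shows which constant is missing, duplicated, or set to something other than true.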

The plugin does not show its pages and it gives an error like “Parse error: syntax error, unexpected ‘finally’ (T_STRING), expecting catch (T_CATCH) in …” or “Parse error: syntax error, unexpected ‘.’ ...”

Please make sure your PHP version is 5.6 or later and that the mbstring extension is enabled.

When I save general settings, an empty page is shown and settings are not updated.

Your server may have a request filter that checks the content of the request, and it may find a few settings in the general settings page unsafe. You can delete the values of “HTTP User Agent” and “HTTP Accept” settings to fix the issue. You don’t need to worry about these settings, since the default values will be used when they are empty.

I installed the plugin on a test site and I want to use the plugin with the same license code on another site. Can I do it?

First, you need to disable the plugin on your test site. Then, you are good to go. You can use your license code on another domain.

I configured settings of a site properly. The tester seems OK. I can also save a post manually. However, automatic crawling does not work.

Please make sure that there are no empty URL inputs in the category map settings.

Automatic crawling works only when someone browses my site.

The plugin uses WordPress CRON, which runs only when a page of your site is loaded. You can replace WP CRON with a real CRON job. You can find several tutorials explaining this. For instance, you can check this one.
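One common way to do this is to let the system scheduler request wp-cron.php at a fixed interval, usually together with define( 'DISABLE_WP_CRON', true ); in wp-config.php so that page loads no longer trigger it. The sketch below shows the request part in Python. The function names and the example.com address are placeholders, not something the plugin provides.

```python
import urllib.request


def build_cron_url(site_url):
    """Return the wp-cron.php URL for a WordPress site's base URL."""
    return site_url.rstrip("/") + "/wp-cron.php?doing_wp_cron"


def trigger_wp_cron(site_url, timeout=30):
    """Request wp-cron.php so that due CRON events run; return the HTTP status."""
    with urllib.request.urlopen(build_cron_url(site_url), timeout=timeout) as response:
        return response.status
```

A real CRON job on the server would then call trigger_wp_cron("https://example.com") every few minutes, with your own site address in place of the placeholder.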

The CSS selectors for post content work when I test them. However, I cannot see the content when I use the tester to test a post page.

Please make sure you included [wcc-main-content] in Templates > Main Post Template.

I want to remove all of the links in post content.

You can achieve this using regular expressions. You can try this one. Please note that this regex may not work for you. If this is the case, you can use an online regex tester to create a regex and test it for your case.
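As an illustration of the idea, here is a minimal Python sketch. The pattern is an assumption of mine and not necessarily the regex linked above. Note also that Python writes the first captured group as \1 in the replacement, while the plugin's replace box uses $1.

```python
import re

# Assumed pattern: an <a ...> element and everything up to its closing tag.
# \b keeps the pattern from matching other tags that start with "a", such as <abbr>.
LINK_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def remove_links(html):
    """Drop every <a ...>...</a> wrapper but keep what was inside it."""
    return LINK_RE.sub(r"\1", html)
```

Under these assumptions, the matching find-and-replace rule would have the pattern in the find box, $1 in the replace box, and the regex option checked.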

I want to remove links from the images.

You can achieve this using regular expressions. Here is an example. The regex at the top can be used to remove links from the images. $1 in the replace box represents the image element. So, if there is a link with an image inside, the image will stay, while the link is removed. Please note that this regex may not work for you. If this is the case, you can use an online regex tester to create a regex and test it for your case.
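The following Python sketch shows the same idea on a small input. The pattern is my own assumption and may differ from the linked example. It only matches links whose sole content is an image, and Python's \1 plays the role of $1 in the replace box.

```python
import re

# Assumed pattern: a link that contains nothing but a single <img> element.
LINKED_IMAGE_RE = re.compile(r"<a\b[^>]*>\s*(<img\b[^>]*>)\s*</a>", re.IGNORECASE)


def unlink_images(html):
    """Replace each link-wrapped image with the bare image element."""
    return LINKED_IMAGE_RE.sub(r"\1", html)
```

Text links are left alone, because the pattern requires an img element directly inside the link.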

Settings tabs’ contents are not shown in site settings page.

This issue is probably caused by another plugin or theme you use. Some plugins and themes do not restrict their CSS/JS code to their own pages, so it can also affect the pages of other plugins. If this is the case, you can find the responsible plugin or theme by deactivating your plugins and themes one by one and checking the site settings page after each deactivation.

Contents of settings tabs and other plugin pages are not shown.

Please make sure you are using at least PHP 5.6 and that the mbstring extension is enabled. In addition, please check the permissions (CHMOD) of the following directories. If they are set to 755, please change them to 777 and try again. If they are set to anything else, please try 755 first and then 777.

/wp-content/plugins/wp-content-crawler/app/storage
/wp-content/plugins/wp-content-crawler/app/storage/cache
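
If you have shell access and prefer to read the current permissions with a script, the Python sketch below prints them in the same octal form used above. It is only an illustration. The function names are hypothetical and the WordPress root path you pass in depends on your server.

```python
import os
import stat


def permission_octal(path):
    """Return the permission bits of path as an octal string such as '755'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")


def report_permissions(wordpress_root, relative_dirs):
    """Map each directory (relative to the WordPress root) to its permissions."""
    return {
        rel: permission_octal(os.path.join(wordpress_root, rel.lstrip("/")))
        for rel in relative_dirs
    }
```

For example, calling report_permissions with your WordPress root and the two paths listed above returns a dictionary with one entry per directory.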

How to change PHP version and enable mbstring extension?

If you use cPanel, you can change your PHP version from there. The cPanel menus may differ depending on the hosting company, but you can search for “PHP” in cPanel. There may be a “PHP Configuration” or “Select PHP Version” menu. If either exists, you can use it to change the PHP version and enable the mbstring extension. If not, you can search the internet to learn how to configure PHP for your hosting company. If you cannot find out how to do it, you can contact your hosting company and ask them to explain it or to change the configuration for you.

Characters are not shown properly. I think there is a character encoding problem.

First, please make sure your server’s language settings are configured properly. If the problem is still not fixed, the site you are trying to crawl may have a meta tag that defines a charset other than UTF-8. For instance, there may be a meta tag like this one in the page’s HTML code:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />

As you can see, the charset value is iso-8859-9, although we need it to be UTF-8. If this is the case, you can use the “Post > Find and replace in HTML at first load” setting to replace the charset with UTF-8.
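The replacement itself is a plain text substitution. As a rough illustration, this Python sketch rewrites any declared charset to UTF-8. The pattern and function name are assumptions of mine. In the plugin, a simple non-regex rule that finds charset=iso-8859-9 and replaces it with charset=UTF-8 should have the same effect for the tag above.

```python
import re

# Assumed pattern: "charset=" with an optional quote, followed by the charset name.
CHARSET_RE = re.compile(r"""(charset\s*=\s*["']?)[\w-]+""", re.IGNORECASE)


def force_utf8_charset(html):
    """Rewrite every declared charset value in the HTML source to UTF-8."""
    return CHARSET_RE.sub(r"\g<1>UTF-8", html)
```

The function changes only the declaration in the HTML text. It does not convert the bytes of the page.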

There are duplicate posts. (The plugin saves a post twice, or more than once)

The plugin uses the target post’s URL to make sure the post has not been saved before. Please make sure there are no variables in the target post’s URL. If there are no variables in the post URL, the URL might have been changed by the target site.

Let’s take this URL as an example: “http://domain.com/post.html?rank=10”. Here, “rank=10” is a variable, because it may change in the future. For instance, it may become “rank=11”. In that case the URL is different, but the post is the same, and the plugin cannot tell that both URLs point to the same post. So, you need to remove the variable part from the URL, that is, the “?rank=10” part. The URL then becomes “http://domain.com/post.html”. Now, even if the target site changes the variable part of the URL, the plugin knows that it is the same post. This way, you can avoid duplicate posts.
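The normalization described above can be sketched in Python as follows. This only illustrates the idea and is not how the plugin is implemented. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Both “http://domain.com/post.html?rank=10” and “http://domain.com/post.html?rank=11” are reduced to “http://domain.com/post.html”. Keep in mind that this is only safe when the variable part does not identify the post. On a site that uses URLs like “?p=123”, removing it would make different posts look identical.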

Another case might be that the target post’s URL is changed, but its content is the same as before. For example, the URL might become “http://domain.com/post-2.html”. In this case, there is no way for the plugin to know that this URL points to the post that was saved via “http://domain.com/post.html”. So, this is directly related to the target site.

If your case is different from the cases above, there might be a problem with your server, because the plugin makes sure that the same URL is not crawled more than once.

Embed, script and iframe codes are not shown properly, or not shown at all.

WordPress strips HTML elements such as embed, script and iframe from the post content before displaying it. This is a security measure and is not related to the plugin. If you want to show these elements, you can use the “find and replace” options to replace them with suitable shortcodes that are defined on your site. For example, if you want to keep an embed code for, say, YouTube, you can replace the iframe code with an embed shortcode using a regular expression.

Below are some examples you can use:

To replace all iframe HTML elements with [iframe] shortcode:

First, install the iframe shortcode plugin, which will recognize our [iframe] shortcode. Then, you can use the find-and-replace options below:
Regex: Checked
Find: <iframe\s([^>]*)><\/iframe>
Replace: [iframe $1]

Example iframe code: <iframe width="500" height="281" src="http://..." frameborder="0" allowfullscreen></iframe>
Example result: [iframe width="500" height="281" src="http://..." frameborder="0" allowfullscreen]

To replace all iframe HTML elements with [embed] shortcode (using “src” attribute of the iframe):

Regex: Checked
Find: <iframe.*?src=["']?([^"']+).*?><\/iframe>
Replace: [embed]$1[/embed]

Example iframe code: <iframe width="500" height="281" src="http://..." frameborder="0" allowfullscreen></iframe>
Example result: [embed]http://...[/embed]
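
To try the two rules above outside WordPress, you can run them with Python's re module, as in the sketch below. The find patterns are copied from the options above. The replacements use \1 because that is how Python writes what the plugin's replace box calls $1. The helper and constant names are hypothetical, and Python's regex engine may differ from the plugin's in edge cases.

```python
import re

# (find, replace) pairs taken from the two examples above.
IFRAME_TO_SHORTCODE = (r"<iframe\s([^>]*)><\/iframe>", r"[iframe \1]")
IFRAME_TO_EMBED = (r"""<iframe.*?src=["']?([^"']+).*?><\/iframe>""", r"[embed]\1[/embed]")


def apply_rule(rule, html):
    """Apply one (find, replace) regex rule to the given HTML."""
    find, replace = rule
    return re.sub(find, replace, html)


SAMPLE = ('<iframe width="500" height="281" src="http://..." '
          'frameborder="0" allowfullscreen></iframe>')
```

Calling apply_rule(IFRAME_TO_SHORTCODE, SAMPLE) and apply_rule(IFRAME_TO_EMBED, SAMPLE) reproduces the two example results shown above.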

I cannot crawl sites using SSL (https).

This is directly related to cURL. You need to supply a valid SSL certificate for the cURL installed on your server. You can ask your server’s support team about this.

My PHP version is 7.1 and the plugin does not work properly.

Please downgrade your PHP to 5.6 or 7.0.

Where is my license key (purchase code)? How can I find it?

Please see here.

I get a cURL error and CRON does not work properly.

A few customers have reported an issue like this. It is not a problem related to the plugin. The customers got back to me with a solution: apparently, everything works OK if you enable WordPress’ alternate CRON mode. To do so, add the code below to your wp-config.php file:

define( 'ALTERNATE_WP_CRON', true );

I’m using Newspaper 8 (or later) theme and CRON jobs (automated crawling) do not work. How can I fix this?

A lot of customers face this issue. The problem is caused by Newspaper 8 (or later).

1. The solution below is suggested by casualbabies.

In the theme’s “includes/wp_booster/td_cake.php” file, at around line 549, replace the line below:

function _schedule_modify_add_three_days() {

to

function _schedule_modify_add_three_days($schedules) {

2. You can try to increase PHP’s memory limit, save the general settings of the plugin and check the Dashboard page for the next CRON events. If the next CRON event times look OK (e.g. “x minutes later”), then your problem is fixed. If that does not solve it, you can try to increase the memory even more and repeat the process.

3. If increasing the memory does not solve your issue, according to a few customers, downgrading your theme to a version below 8 might solve it. Downgrade your theme, save the general settings of the plugin and check the Dashboard page for the next CRON events. If the next CRON event times look OK (e.g. “x minutes later”), then your problem is fixed.

4. If downgrading your theme version does not solve it either, please temporarily activate one of the official WordPress themes, save the general settings of the plugin and check the Dashboard page for the next CRON events. If the next CRON event times look OK (e.g. “x minutes later”), then your problem is fixed and the plugin works properly. The problem was caused by your theme.

How can I save featured image stored in a “meta” tag or any other tag?

Let’s say the target meta tag is the following:

<meta property="og:image" content="http://...">

The image selectors look at the value of the “src” attribute. Hence, you can use the “exchange element attributes” option to move the value of the “content” attribute into the “src” attribute.

Go to “Post > Manipulate HTML > Exchange element attributes” and enter the following:

Selector: meta[property="og:image"]
Attribute 1: src
Attribute 2: content

This will change the following code
<meta property="og:image" content="http://...">
to this
<meta property="og:image" content="" src="http://...">

Now, you can enter meta[property="og:image"] for “featured image selectors” or “image URL selectors”, or any other image selectors, if they exist.

Here, you can modify the meta[property="og:image"] selector according to the meta tag you are interested in. If the target HTML tag is not “meta”, then, again, you can modify the selector accordingly. The key thing is to have the image URL in the “src” attribute of the target element.
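
To make the effect of the exchange easier to picture, the Python sketch below models an element's attributes as a dictionary and swaps two of them. It is a simplified illustration and not the plugin's code. The function name is hypothetical, and treating a missing attribute as an empty value is an assumption based on the example result above.

```python
def exchange_attributes(attributes, first, second):
    """Return a copy of the attributes with the values of first and second swapped.

    A missing attribute is treated as an empty string (assumption).
    """
    swapped = dict(attributes)
    swapped[first] = attributes.get(second, "")
    swapped[second] = attributes.get(first, "")
    return swapped
```

For the meta tag above, exchanging “src” and “content” leaves “content” empty and puts the image URL into “src”, which is where the image selectors look.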


Contact the author

This author provides limited support for this item through this item's comments.

Item support includes:

  • Availability of the author to answer questions
  • Answering technical questions about item’s features
  • Assistance with reported bugs and issues
  • Help with included 3rd party assets

However, item support does not include:

  • Customization services
  • Installation services

