2655 comments found.
It’s a shame about the Cloudflare block… Unfortunately, I can’t find any proxy to use with the plugin to continue crawling websites 
Hi,
I can’t seem to crawl this page – https://eurohockey.org/news#news
Can you check to see if this is possible, please, and point me in the right direction?
Many thanks
Hi,
You can see this page to learn how to tell whether the content of a site can be crawled. If you want to go the JavaScript rendering route, you need a proxy service that can render JavaScript and return the rendered page as static HTML. Once you find such a proxy, you can make the plugin use it by configuring the proxy settings.
Hi, I have two questions that I cannot figure out.
Date – How do I set the post date to the original date of the crawled content, not the current date? When crawling, I receive multiple posts from the same site at the top of my blog, in no particular date order.
Second question – Can it be set so that it only crawls and posts the last 4 articles (if new)? I have set it to crawl 1 page, but it’s still posting lots of posts from the site, not only the new ones.
Hi,
If you do not set any date selectors, the publish date of the post will be the date and time when the post is crawled. So, you can simply skip the date selector setup to use the crawling date as the publish date of the post.
If you already set it to crawl only the first page, you can configure the post URL selectors in such a way that only the URLs of the first 4 articles of the category page are found. For example, you can use the `:nth-child(-n+4)` pseudo-class to select the first 4 elements, e.g. `ul > li:nth-child(-n+4) > h3 > a`.
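To see why `:nth-child(-n+4)` matches only the first four items, here is a minimal Python sketch of the `an+b` matching rule. The function name and the sample URLs are made up for illustration; this is not the plugin's code, only the arithmetic behind the selector.

```python
def matches_nth_child(a: int, b: int, position: int) -> bool:
    """Return True if a 1-based sibling position matches :nth-child(an+b).

    A position matches when position == a*n + b for some integer n >= 0.
    """
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0


# Hypothetical article URLs in the order they appear on a category page.
article_urls = [f"https://example.com/article-{i}" for i in range(1, 8)]

# :nth-child(-n+4) means a = -1 and b = 4, so n = 0, 1, 2, 3 gives
# positions 4, 3, 2, 1 and nothing beyond the fourth item.
first_four = [
    url
    for position, url in enumerate(article_urls, start=1)
    if matches_nth_child(-1, 4, position)
]
print(first_four)
```

Changing `4` to another number changes how many leading items are matched, which is the only part you would normally need to adjust in the selector.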
The database is becoming overloaded when we have more than 200k posts. Is there any way to optimize this?
Hi,
If you are looking for an option in the plugin, there is no such option to improve your database performance, unfortunately.
Hi,
I want the domain currently registered to my license key to be removed.
My license key is
Thanks
Hi,
Please send your license key by using the contact form on my profile page.
Hello,
I’m contacting you to request assistance with the Content Crawler plugin for WordPress.
Hi,
I replied to your email.
Hello Support Team,
We are currently managing a large-scale aggregation project and have identified two critical logic behaviors in WP Content Crawler that are affecting data integrity. We are writing to report these as bugs/logic flaws and to inquire if a fix or “Strict Mode” is planned for an upcoming release.
Issue 1: The “Current Date” Fallback is Corrupting Archival Data
We have observed that when the plugin fails to parse a date selector (i.e., when it returns null or false), the system automatically defaults to current_time().
Current Behavior: If a configured selector like meta[itemprop=”datePublished”] finds no match on a page, the post is saved and published with today’s date and time.
Desired Behavior: We require a setting to SKIP the post or set status to Draft if the date selector fails to find a value.
Context: For archival projects, no data is better than incorrect data. Defaulting to “now” destroys the chronological accuracy of the archive.
Question: Is there a filter (e.g., wcc_post_date or similar) where we can return false to abort the save if the date is empty? Or do you plan to add a “Require Date” checkbox in the settings?
Issue 2: Soft 404s Being Scraped as Valid Posts
The plugin appears to process pages that return HTTP 200 OK but contain “Page Not Found” content (Soft 404s).
Scenario: A source URL is dead/removed. The target site redirects to a custom 404 page which returns a 200 OK status code.
Failure Mode: The crawler loads this 404 page. It fails to find the article content/date. However, because of the fallback logic mentioned in Issue 1, it assigns “Today’s Date” and saves the “Page Not Found” error message as a valid published post.
Question: How can we configure the crawler to strictly validate that specific selectors (like Date or Content) MUST exist? If they are missing, the crawl for that specific URL should be marked as “Failed” rather than attempting to save what it found.
Technical Request: We are avoiding sharing specific source URLs for privacy, but this behavior is reproducible on any WordPress site that:
Has a meta date tag on valid post pages.
Does not have that tag on 404 pages.
Returns 200 OK for 404 pages (common in many themes).
Could you please clarify if a patch for this fallback behavior is on your roadmap, or if you can provide a snippet to enforce strict date parsing?
Thank you for your assistance.
Hi,
You can set the status of the post via the filters conditionally. For example, you can check if the value of an element does not exist, and, if it does not exist, you can change the post status. You can also check if the text of the element contains/doesn’t contain a value and many more other things. Watching the introduction tutorial of the filters feature is a good starting point. You can also read the documentation of the filters feature here. Here is an example for your “publish date” case: https://ibb.co/V0WNs7pp Additionally, you can find the documentation for all the available filter commands here, which should give you an idea of what can be achieved with the filters.
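The conditional check described above can be illustrated outside the plugin. The following Python sketch is not the filters feature's syntax; it only shows the decision "if the date element does not exist, do not publish". The class and function names, the sample pages, and the choice of `draft` as the fallback status are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class PublishedDateFinder(HTMLParser):
    """Collects the content of the first <meta itemprop="datePublished"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.published: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            self.published is None
            and tag == "meta"
            and attributes.get("itemprop") == "datePublished"
        ):
            # An empty content attribute is treated the same as a missing tag.
            self.published = attributes.get("content") or None


def decide_post_status(html: str) -> str:
    """Return "publish" when a date is present, otherwise "draft"."""
    finder = PublishedDateFinder()
    finder.feed(html)
    return "publish" if finder.published else "draft"


# Hypothetical pages: a valid post and a soft 404 that returns 200 OK.
valid_page = '<html><head><meta itemprop="datePublished" content="2024-07-05"></head></html>'
soft_404_page = "<html><body><h1>Page Not Found</h1></body></html>"

print(decide_post_status(valid_page))
print(decide_post_status(soft_404_page))
```

The same idea covers the soft 404 case: a page that lacks the required element never reaches the "publish" branch, whatever HTTP status it returned.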
Hello, I am a very long-time customer of yours. There is an issue I am stuck on and cannot solve: I need to pull product data from a B2B dealer page, but because it is a password-protected area, I could not figure out the settings I need in order to log in to it. I tried adding cookies, but it did not work. I am not sure whether I have explained my problem fully. Could you help? Thank you very much.
Hello,
There is an explanation of how to add cookies here. By following the steps described there, you can add all the cookies of the target site to the site settings. Additionally, you may also need to do what is described in the “importing all request headers” section at the bottom of that page.
Message: The license could not be checked with the server. Please try saving your license settings again in a few minutes. If the error persists, please contact the developer.
The plugin was running. Suddenly stopped. Domain name, https://lyrics.arnlweb.com/
Hi,
Could you please send the rest of the error message as well? The plugin’s license server works without any issues.
The license key entered for Content Crawler is not valid or it could not be checked. You did not verify your license. The features are disabled. Please verify your license to continue using Content Crawler. (Current domain: lyrics.arnlweb.com)
Message: The license could not be checked with the server. Please try saving your license settings again in a few minutes. If the error persists, please contact the developer.
After enabling WP debug:
Notice: Function _load_textdomain_just_in_time was called incorrectly. Translation loading for the wp-content-crawler domain was triggered too early. This is usually an indicator for some code in the plugin or theme running too early. Translations should be loaded at the init action or later. Please see Debugging in WordPress for more information. (This message was added in version 6.7.0.) in /home1/axomhutb/public_html/lyrics.arnlweb.com/wp-includes/functions.php on line 6131
The debug message is not related to the license-checking issue. It is just a notice, meaning that it does not prevent any functionality from being executed correctly.
About the license-checking issue, the message says that your server cannot connect to the license server, which is hosted at wpcontentcrawler.com. Since the license server is up and running, and you are the only one having this issue, it is probably related to your server and/or WordPress installation, not the plugin. I suggest getting in touch with your server provider and asking them to check whether the server is capable of connecting to wpcontentcrawler.com. Sometimes, firewall mechanisms might prevent that connection. If the server provider confirms that there is no connectivity problem, the issue is probably related to another theme/plugin that prevents the connection. In that case, you can see this page to learn how to find that plugin/theme.
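If you or your server provider want a quick way to test the connection, a minimal Python sketch like the following could be run on the server itself, assuming Python is available there. It only checks whether a TCP connection to port 443 can be opened; it does not go through WordPress or PHP, so a success here does not rule out a problem caused by a theme or plugin. The function name and the `connect` parameter are illustrative.

```python
import socket
from typing import Callable


def can_reach(
    host: str,
    port: int = 443,
    timeout: float = 5.0,
    connect: Callable = socket.create_connection,
) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        connection = connect((host, port), timeout=timeout)
    except OSError:
        # DNS failures, refused connections, and timeouts all end up here.
        return False
    connection.close()
    return True


# Example call, to be run on the server that hosts the WordPress site:
# print(can_reach("wpcontentcrawler.com"))
```

A `False` result suggests a firewall or DNS problem on the server side, which is something the server provider can look into.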
Please check this, https://chatgpt.com/share/69576d28-30cc-8009-84dc-cc28528dd5a0
It says “nothing is broken”, which agrees with my message.
error: This license has reached its domain limit and is not valid for this domain. Registered domains: gpl.arnlweb.com
my new domain: lyrics.arnlweb.com
Please send your purchase code via the contact form on my profile page so that I can remove the domain registered to it.
My license doesn’t work on any other site. I need help resolving this issue. Please help me fix it and get the extension working again.
Hi,
The plugin’s license server is up and running without any issues. Also, you are the only one having this issue, which suggests that the issue is caused by your setup, not the plugin.
You can first make sure that your site can connect to wpcontentcrawler.com, to eliminate any connectivity issues. If your site can connect to wpcontentcrawler.com, the problem is likely caused by another third party software (your theme or other plugins). In that case, you can see this page to learn how to find out which software causes the issue.
I want to collect all articles from the website https://dienthoaivui.com.vn/. It uses loadmore. Can your product be collected?
Hi,
It is possible to configure the plugin to make subsequent requests to retrieve more information from other URLs, but it requires technical knowledge about how the websites and AJAX requests work. So, if you do not have such technical knowledge, it is hard to do, unfortunately.
I’m getting this error with PHP 8.1 and the latest WP:
Notice: Function _load_textdomain_just_in_time was called incorrectly. Translation loading for the wp-content-crawler domain was triggered too early. This is usually an indicator for some code in the plugin or theme running too early. Translations should be loaded at the init action or later. Please see Debugging in WordPress for more information. (This message was added in version 6.7.0.) in /home/xxx/xxx/xxx/public_html/wp-includes/functions.php on line 6121
The plugin stopped crawling and I can’t access the Dashboard either.
Hi,
Some of your other plugins/themes might be triggering something too early, which causes this problem, as the plugin has already been successfully tested in several WordPress and PHP versions, including PHP 8.1, before release. You can see this page to learn how to find out what third party software causes this: https://docs.wpcontentcrawler.com/troubleshooting/fixing-problems-caused-by-other-plugins-themes.html
Hello,
I have a licensed copy of WP Content Crawler Pro (codecanyon.net). However, dynamic content loaded with JavaScript (for example, the prices on beko.com.tr pages) is not fetched by the crawler.
Even though the “Test via: Rendered / Both” options and “Manipulations” are enabled, the price information does not make it into the HTML of the page. As a result, the data cannot be captured with selectors such as `script[type='application/ld+json']` or `.beko-price`.
Could you please help with the following:
Is the headless browser (JS rendering) feature active in my current version?
If not, how can JavaScript rendering or a puppeteer / chromium module be enabled?
Is a special setting, proxy, or add-on module required for dynamic price elements to appear in the “Rendered test” window?
Thank you,
I worked on this with ChatGPT; this is its message :)
Hello,
There is only one edition of the plugin, and it is sold only on CodeCanyon. The plugin does not have a “headless browser” integration. If you want to run JavaScript, you can use a proxy that is capable of running JavaScript.
Hello; I cannot get the regular price and the “olizli” price on this site. I also wrote about the photos earlier. If we can solve this problem, we will buy many licenses together with our dealers. Could you please give a practical solution for this, or an update? I have crawled many sites without any problems, but I cannot do it on sites like this one. Especially the Beko site.
Hello,
I already explained this in my email as follows:
Because the `src` attributes of the `img` elements are not defined on the site, the images cannot be loaded correctly. The site defines the `src` attribute with JavaScript after the page is loaded. Since the plugin cannot run JavaScript, you need to do this by using the plugin’s HTML manipulation options.
If you inspect the source code of the page, the image URLs are in the `data-srcset` attributes of the `img` elements. To set the first of these URLs as the value of the `src` attribute, I made a setting like this: https://ibb.co/1YtF1kzw As can be seen here ( https://ibb.co/MxZv6rjB ), the URLs of the images can be retrieved correctly.
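The idea of taking the first URL from a `data-srcset` value can be sketched in Python as follows. This is not the setting shown in the screenshot; the function name and the sample value are made up, and the sketch assumes the URLs themselves contain no commas.

```python
def first_srcset_url(srcset: str) -> str:
    """Return the URL of the first candidate in a srcset-style string.

    Candidates are separated by commas. Each candidate is a URL optionally
    followed by a descriptor such as "480w" or "2x".
    Assumption: the URLs do not contain commas themselves.
    """
    first_candidate = srcset.split(",")[0].strip()
    return first_candidate.split()[0] if first_candidate else ""


# Hypothetical data-srcset value, similar in shape to what such sites use.
data_srcset = "https://example.com/img-480.jpg 480w, https://example.com/img-960.jpg 960w"

# This value is what would be written into the img element's src attribute.
print(first_srcset_url(data_srcset))
```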
Since configuring the plugin is not within the scope of support, unfortunately I do not configure the settings like this when you purchase the plugin and cannot configure it yourself. I am sending this only as an example. If you send your future messages via the contact form on my CodeCanyon profile page, it will be easier for me to follow the process.
Critical PHP TypeError in WP Content Crawler v1.15.0 – Plugin Conflict with MyListing Theme
Dear WP Content Crawler Support Team, I’m experiencing a critical issue with your plugin that causes fatal errors and conflicts with my theme functionality. I would appreciate your assistance in resolving this problem.
When WP Content Crawler is active, I encounter a PHP TypeError that breaks the “Related Listing” dropdown functionality in MyListing theme. The dropdown shows “The Results could not be loaded” instead of displaying the listing options. When I deactivate WP Content Crawler, the functionality works perfectly.
Error Details: From my debug.log, I can see the following critical error:
PHP Fatal error: Uncaught TypeError: strtolower(): Argument #1 ($string) must be of type string, array given in /wp-content/plugins/wp-content-crawler/app/Utils.php:1176
Stack trace:
#0 /wp-content/plugins/wp-content-crawler/app/Utils.php(1176): strtolower()
#1 /wp-content/plugins/wp-content-crawler/app/RequirementValidator.php(58): WPCCrawler\Utils::isPluginPage()
#2 /wp-content/plugins/wp-content-crawler/app/WPCCrawler.php(58): WPCCrawler\RequirementValidator->validateAll()
Additional Notices
The plugin also triggers multiple early loading warnings: Function _load_textdomain_just_in_time was called incorrectly. Translation loading for the wp-content-crawler domain was triggered too early.
Environment Information
WordPress Version: 6.8.3
PHP Version: 8.4
WP Content Crawler Version: 1.15.0
Theme: MyListing (based on JobListing)
Steps to Reproduce
Activate WP Content Crawler plugin
Go to Listings → Edit any listing
Try to use the “Related Listing” dropdown field
Error occurs: “The Results could not be loaded”
Expected Behavior
The Related Listing dropdown should load and display available listings via AJAX request.
Temporary Workaround
Currently, I have to deactivate WP Content Crawler to use the Related Listing functionality.
Request
Could you please:
Investigate this TypeError in Utils.php line 1176
Check why the plugin triggers translations too early
Provide a fix that prevents this conflict with AJAX requests
I’m available to provide additional information, screenshots, or perform testing if needed. Thank you for your attention to this matter.
Best regards, Roman
https://disk.yandex.ru/d/YRE2oeTsF7jHtA
Hi,
Thanks for your report. To fix this, you can do the following. Open the `/wp-content/plugins/wp-content-crawler/app/Utils.php` file and find line 1176, which is:
`return isset($_GET['post_type']) && strtolower($_GET['post_type']) === strtolower(Environment::postType());`
Now, replace it with the following:
`return isset($_GET['post_type']) && is_string($_GET['post_type']) && strtolower($_GET['post_type']) === strtolower(Environment::postType());`
The error should be gone after that. I will include this fix in the next release of the plugin.
This works, thanks for the quick fix!
I am glad it worked. Thanks for letting me know!
Thank you for your help with “Related Listing”! One last question. I’m trying to parse a site where phone numbers are hidden behind a ‘Show’ button. Even with proper cookies/auth, the parser sees only ‘8 800 10… Show’ instead of the full number.
Current situation:
Main page: https://optlist.ru/company/uyutnosti/ (basic info)
API endpoint: https://optlist.ru/company/uyutnosti/phone_json (full phone)
Question: Is it possible to configure the crawler to:
Get main content from the primary URL
Simultaneously fetch specific data (like phone) from a secondary API endpoint?
Or simulate clicking the ‘Show’ button before parsing?
The phone JSON returns: {Phones 800 101-49-41”],Fax“}
Any solution for this multi-URL parsing scenario?
Best regards, Roman
You can use the `Make` command to make requests to other URLs and include their responses in the current page. This video explains how to use the command: https://www.youtube.com/watch?v=VR3rh6DK8_E
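Conceptually, this is a two-request flow: fetch the main page, fetch the secondary endpoint, and combine the results. The Python sketch below shows only that flow and is not the `Make` command's syntax. The function name, the `fetch` parameter, the example.com URLs, the placeholder phone number, and the assumed JSON shape (a `Phones` list) are all hypothetical.

```python
import json
from typing import Callable, Dict


def crawl_with_extra_request(page_url: str, fetch: Callable[[str], str]) -> Dict[str, object]:
    """Fetch a page and a secondary JSON endpoint, then merge the results."""
    main_html = fetch(page_url)
    # The secondary endpoint is assumed to live under the page URL.
    phone_payload = json.loads(fetch(page_url.rstrip("/") + "/phone_json"))
    return {"html": main_html, "phones": phone_payload.get("Phones", [])}


# Stand-in responses so the sketch runs without network access.
fake_responses = {
    "https://example.com/company/demo/": "<h1>Demo company</h1>",
    "https://example.com/company/demo/phone_json": '{"Phones": ["8 000 000-00-00"]}',
}

result = crawl_with_extra_request(
    "https://example.com/company/demo/", fake_responses.__getitem__
)
print(result["phones"])
```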
Taxonomy terms not splitting after Find & Replace
Hello!
I’m trying to import movie genres using WP Crawler from a site where genres are listed like this: Триллер, Фантастика, Приключение, Боевик
Here is my setup:
Selector: .stat.wborder li:nth-child(6) span.value
Attribute: text
Custom field: raw_movie_genre
Taxonomy: movie_genre
Delimiter: ,
Find & Replace: added to remove | or |, at the beginning (regex like ^\|\s*)
The issue: Even after Find & Replace, the genres are imported as one single taxonomy term instead of being split into separate terms.
The preview shows correct output (clean list of genres), but taxonomy splitting does not happen.
I’ve tried multiple combinations, including different selectors, delimiters, and regex variations — nothing helps.
My question: Does WP Crawler apply Find & Replace before splitting taxonomy terms? If not — is there any way to fix it? This functionality is critical for my use case.
Thanks in advance for your help!
Hi,
When you use the test button (the button with magnifier icon on it) of the taxonomy rule you added, do you get multiple results?
- If you get multiple results, and you want to add each of them, you need to tick the “Multiple” checkbox of the taxonomy rule.
- If you do not get multiple results, you need to separate the comma-separated list into different HTML elements, such as:
<span>Триллер</span> <span>Фантастика</span> <span>Приключение</span> <span>Боевик</span>
After that, you can use `.stat.wborder li:nth-child(6) span.value > span` CSS selector to find all of the items. Again, you need to check the “Multiple” checkbox to retrieve all the values.
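The transformation from a comma-separated text into separate elements can be sketched in Python like this. It is only an illustration of the target HTML shape, not a plugin setting; the function name is made up.

```python
from html import escape


def wrap_terms_in_spans(text: str, delimiter: str = ",") -> str:
    """Split a delimited list and wrap each non-empty term in a <span>."""
    terms = [term.strip() for term in text.split(delimiter)]
    return " ".join(f"<span>{escape(term)}</span>" for term in terms if term)


genres = "Триллер, Фантастика, Приключение, Боевик"
print(wrap_terms_in_spans(genres))
# → <span>Триллер</span> <span>Фантастика</span> <span>Приключение</span> <span>Боевик</span>
```

Once each genre sits in its own element, a selector ending in `> span` returns four separate values instead of one combined string.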
By the way, could you please elaborate on how you set the delimiter as a comma?
Hello!
I have two different site settings inside WP Content Crawler:
One crawler imports movies.
Another crawler imports collections (lists of movies).
The problem: WP Content Crawler saves all posts into the same post type, because the “Post Type” option is global for all crawlers.
Question: Is it possible to assign a different post type per site settings in the latest version? For example:
Crawler 1 → post type movie
Crawler 2 → post type collection
If this is not possible, is there any filter/hook I can use to change the post type dynamically based on the source URL?
Example: If the source URL contains “/spisok/”, I want to automatically change the created post to post type “collection”.
is the latest version 1.15.0?
Thank you!
Yes, you can use custom general settings for a site: https://docs.wpcontentcrawler.com/guides/using-custom-general-settings-for-a-site.html
No, the post type cannot be dynamically set, unfortunately.
Yes, the latest version is 1.15.0.
Hi!
We’re using Content Crawler WP to parse a website with nested categories, for example: For boys → Cars → BMW.
Categories and subcategories are created correctly, but there’s one issue — the main category (“For boys”) is not being saved as the primary category (checkbox not checked).
We’d like to configure this directly inside the plugin, without using custom hooks or additional PHP code.
Currently, in the Category settings we’re using this selector for breadcrumbs:
ul#breadcrumbs .b-item:nth-child(n+4) a
The following options are enabled:
✅ Add all found category names
✅ Add as subcategories
Everything else works perfectly — categories and subcategories are correct — but the main category is not marked as primary in WordPress.
Could you please tell us how to properly configure the plugin so that Content Crawler automatically sets the first found category as the primary one?
Thank you in advance!
Hi! Could you please tell me what you mean by a “primary category”?
By “primary category” I mean the main WordPress category assigned to the post — the top-level category in the hierarchy.
In our case the structure is like: For boys → Cars → BMW.
Content Crawler correctly creates and assigns Cars and BMW, but the top-level category (“For boys”) is not checked/assigned to the post, even though the subcategories belong to it.
We are not talking about SEO plugins or SEO primary category — only the standard WordPress category assignment (the checkbox in the Categories meta box).
Is there a way to configure Content Crawler so that the top-level category from breadcrumbs is also assigned to the post automatically?
Are you sure the top-level category is found by the CSS selector you defined? What do you see as the categories in the Tester page when you test a post? The plugin adds all the found category names hierarchically. It does not remove the top-level category.
Hi!
Yes, I’m sure the top-level category is found by the selector.
In the Tester page I can clearly see all breadcrumb levels, for example: For boys / Cars / BMW.
So the selector does find the top-level category correctly.
However, after the post is saved, only Cars and BMW are assigned to the post. The top-level category (For boys) is not assigned / not checked in the WordPress Categories meta box.
So the issue is not that the top-level category is missing from the selector results — it is found, but it is not applied to the post.
Is there any option or internal logic in Content Crawler that controls whether the top-level category from the found hierarchy is also assigned to the post, not only its child categories?
Could you please import your settings to the demo site so that I can check?
Hi, I created something called “test test test.”
I added a screenshot, so you can see what I mean about the categories. https://prnt.sc/QM4dEAfjEBBe
If that top category’s checkbox is checked, doesn’t it mean that the post belongs to all the categories that are hierarchically under that top-level category? In your screenshot, the top-level category that is marked with an arrow contains 5 categories, two of which are assigned to your post. Only if your post were assigned to all those 5 categories would the marked top-level category’s checkbox be checked, indicating that the post belongs to all the sub-categories of that top-level category. Additionally, the information that the selected categories belong to the marked top-level category is already there, as you can see that the selected categories are under that top-level category, indicated by indentation.
- Check `Post Tab > Category Section > Do not add the category defined in the category URLs?` setting’s checkbox. This will make the plugin not assign the top-level category selected in the `Category Tab > Category URLs` setting.
- Update `Post Tab > Category Section > Category Name Selectors` setting so that it finds the top-level category in the page.
This requires the top-level category information to be available in the target post page. If that is not available, you can add any element to the page by using the settings. For example, this screenshot shows a filter that is used to add an element inside the `body` element, which you can then select via the `#my-top-level-category` CSS selector.
Thanks for your help, the second option helped me
I am glad to hear that. Thanks for letting me know.
Hi, is there a way to scrape/parse from CSV files? I can’t find a plugin that can do that, and it would be the best of both worlds.
Especially if we could choose exactly which parts of the csv file that should be parsed and which parts that should be directly imported.
That would really help to get full description of the product or post during import from a csv…
Thanks!
Hi,
Sorry, no, the plugin is not designed to process CSV files, unfortunately.
Would you consider adding that function? We affiliate marketers would be very happy, because our biggest challenge is getting the original product description (which is rarely included in the CSV files) to use with an AI API to create new unique content, while at the same time using the CSV for convenient data updates etc. I believe there is an untapped market for that function.
Anyway, thanks for getting back to me!
I see. Isn’t there a WooCommerce plugin that can be used to update all the existing products by using data from a CSV file? If so, you can crawl the products via WP Content Crawler, with their SKUs, and then use such a plugin to update them. I am not too familiar with the plugins available to WooCommerce in that area, unfortunately.
Can I use it without openAI or any API?
Hi,
Yes, you can.
Hello! Your plugin has been working great for a whole year, but recently we started seeing a message saying the license couldn’t be verified. This may be due to partial internet blockages in Russia.
Please tell me what needs to be done to get it working again.
Screenshots of the issue: https://ibb.co/pr9stNK2 https://ibb.co/8gw1yWYP
Purchase information: LICENSE CERTIFICATE : Envato Market Item
This document certifies the purchase of: ONE REGULAR LICENSE as defined in the standard terms and conditions on Envato Market.
Licensor’s Author Username: turgutsaricam
Licensee: Denis Babasinov
Item Title: WP Content Crawler – Get content from almost any site, automatically!
Item URL: https://codecanyon.net/item/wp-content-crawler-get-content-from-almost-any-site-automatically/15983018
Item ID: 15983018
Item Purchase Code: 81f2b39c-(XXXXXXXXXXXX)86ed
Purchase Date: 2024-07-05 10:29:25 UTC
Hi,
I replied to your email.
Mr. Turgut, my license code has remained registered to the temporary Hostinger domain, and I am having trouble now that I have pointed my own domain to the site. Could you help?
Hello,
If you send an email via the contact form on my profile page, including your license key and stating that you want the registered domain removed, I will remove the domain registered to your license.