Discussion on WP Content Crawler - Get content from almost any site, automatically!

3,660 sales

turgutsaricam supports this item

Supported

This author's response time can be up to 5 business days.

2655 comments found.

How can I remove this code? Even after enabling the script remover option, it still shows up. Please help.

‘);var c=function(){cf.showAsyncAd(opts)};if(typeof window.cf !== ‘undefined’)c();else{cf_async=!0;var r=document.createElement(“script”),s=document.getElementsByTagName(“script”)[0];r.async=!0;r.src=”//srv.clickfuse.com/showads/showad.js”;r.readyState?r.onreadystatechange=function(){if(“loaded”==r.readyState||”complete”==r.readyState)r.onreadystatechange=null,c()}:r.onload=c;s.parentNode.insertBefore(r,s)}; })();

thanks

You can do it by using the find-replace options available in the options box of the Post Slug Selectors setting. You need to use regular expressions. You can find more information here.

As a note, you do not have active support. Support is provided only to the customers with active support.
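
As a rough illustration of the find-replace approach mentioned above, the sketch below strips a leftover inline ad snippet with a regular expression. It uses Python's `re` module for demonstration only. The pattern and the `strip_ad_residue` name are assumptions based on the snippet quoted in the question, not a rule taken from the plugin's documentation, and the plugin's own regex flavor may differ in details.

```python
import re

# Assumed pattern: matches the leftover ad-loader text from its opening
# "');var c=function(){cf.showAsyncAd" up to the first closing "})();".
# The character class accepts both straight and typographic quotes.
RESIDUE = re.compile(
    r"[‘’']\);var c=function\(\)\{cf\.showAsyncAd.*?\}\)\(\);",
    re.DOTALL,
)


def strip_ad_residue(html: str) -> str:
    """Remove the leftover ad snippet and trim surrounding whitespace."""
    return RESIDUE.sub("", html).strip()


# Shortened, hypothetical sample modeled on the snippet in the question.
sample = (
    "<p>Article text.</p>\n"
    "‘);var c=function(){cf.showAsyncAd(opts)};"
    "if(typeof window.cf !== ‘undefined’)c();else{cf_async=!0;}; })();\n"
    "<p>More text.</p>"
)

cleaned = strip_ad_residue(sample)
print(cleaned)
```

The lazy `.*?` stops at the first `})();`, so the rule should be tested against real pages before being relied on.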

Hello again. A client already has the plugin on his website, and he asked me to scrape the site “flowhot[dot]cc”. I have the whole configuration in place and it works correctly. The problem is that, for example, the “MP3” category currently has 189 pages (and counting), but I can’t make the plugin scrape all of them: it always stops after the third page. Each page has 250 posts. How can I scrape all 189 pages of a category?

screenshot1: https://i.imgur.com/kciXMU9.png

settings export: https://justpaste.it/77riy

site: flowhot[dot]cc

It’s true, it reports a 404, but the page does exist. Why would that be? https://i.imgur.com/Kab9cKk.png

The target site returns a 404 response along with the data. However, the plugin stops processing a request when its status code is 404. The response’s status code can be seen in the developer tools of the browser: https://ibb.co/j56QJCY It is not possible to retrieve the content in this case, because there is no setting to disable the behavior that stops crawling when the status code is 404.
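
To illustrate what "a 404 response along with the data" means at the HTTP level, here is a minimal sketch using only Python's standard library. It is not the plugin's code. The local test server, the handler class, and the function names are hypothetical stand-ins for a site that sends a full page with a 404 status.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener without proxies, so the local demo request stays on this machine.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status_and_body(url: str):
    """Return (status, body), keeping the body even for an error status."""
    try:
        with _OPENER.open(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError also behaves like a response: its body is still readable.
        return err.code, err.read()


class NotFoundWithContent(BaseHTTPRequestHandler):
    """Hypothetical site that answers 404 but still sends a real page."""

    def do_GET(self):
        body = b"<html><body><a href='/post-1'>Post 1</a></body></html>"
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Start the local server, fetch one page from it, and shut it down."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundWithContent)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch_status_and_body(f"http://127.0.0.1:{port}/category/page/4")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = demo()
    print(status, len(body))
```

A client that treats 404 as a hard stop, as described above, would discard this body even though it contains usable links.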

You are right, it is effectively an error on that site. I tried an online SEO checker with the link and it also reported a 404. I will use another, similar site to scrape instead. Thank you so much.

Hello, I have seen in some YouTube videos that Automatic can translate content through several languages in sequence, for example English -> Spanish -> Russian and then back to English, so that the text is no longer similar to the original. Can your plugin translate the content through several languages sequentially?

The translation should be configured per element. For example, there is no need to translate the words “Tags” or “Category”, or the name of an artist in a selected item, several times, but it may be necessary for the content text and the title. One idea: the translation section lists all the selected elements, and each element gets its own list of languages to translate through. Then only the chosen elements are translated, and each one is translated as many times as there are languages added for it. The option could be displayed as a matrix so that it is easy to read.

All right, I think I got it, thanks. I think adding this feature as a filter command is more suitable. It might not be added in the next version, but I plan to add it after that.

Yes, I think the same. It could be added to the filters, with the option to translate several (X) times or only once.
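
To make the idea discussed above concrete, here is a minimal sketch of a per-element translation chain. It is not an existing plugin feature. Every name here (`translate_chain`, `translate_elements`, `fake_translate`, the `chains` mapping) is hypothetical, and the translator is a stub that only tags the text so the order of the steps is visible.

```python
from typing import Callable, Dict, List

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def translate_chain(text: str, source: str, chain: List[str],
                    translate: Translator) -> str:
    """Translate text through each language in `chain`, one after another."""
    current = source
    for target in chain:
        text = translate(text, current, target)
        current = target
    return text


def translate_elements(elements: Dict[str, str], source: str,
                       chains: Dict[str, List[str]],
                       translate: Translator) -> Dict[str, str]:
    """Apply each element's own chain; elements without a chain stay as-is."""
    return {
        name: translate_chain(value, source, chains.get(name, []), translate)
        for name, value in elements.items()
    }


def fake_translate(text: str, src: str, dst: str) -> str:
    """Stub standing in for a real translation service."""
    return f"{text} [{src}->{dst}]"


# The "matrix": one row per element, one column per translation step.
chains = {"title": ["es", "en"], "content": ["es", "ru", "en"]}
post = {"title": "Hello", "content": "Body", "tags": "Tags"}

result = translate_elements(post, "en", chains, fake_translate)
print(result)
```

Here `tags` has no chain, so it is left untouched, which matches the point that labels such as “Tags” do not need repeated translation.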

Great, thanks for the attention. I’ll be waiting for that version.

Hi, is it possible to scrape the different seller prices on this page and to insert them in custom fields? https://www.hind.ee/p/apple-iphone-13/

Hi,

It is not possible to get each price with their seller information separately and save them as custom fields, unfortunately.

(1) Can your plugin skip downloading the images in the category/post tabs? (2) Can your plugin use the source image URL in place of the crawled image URL?

I want to use the images from the source. I don’t want to save them on my server.

Hi,

If you do not specifically configure the plugin to save the images, it does not save the images by default. However, because WordPress cannot show a featured image via its external URL, the featured images are always downloaded.

Please add DeepL for translation, and DeepL Write for “no duplicate content”.

The date is not certain yet. But, it will probably be released in a month.

perfect. I am very happy about this news

If you have use-case ideas about OpenAI GPT, please join the Discord server so that we can discuss the ideas there. I would like to know them so that I can design the feature to cover all the use-cases.

The plugin crawls too slowly (with the “save all content images” option unchecked). Is there any way to improve it? Thanks

Hi,

If you want to save more posts in a specific time interval, you can increase the values of General Settings > Scheduling Tab > Run count for URL collection event and General Settings > Scheduling Tab > Run count for post crawling event settings. If crawling a single post takes too much time, it is probably related to the speed of the target site.

Thanks. By the way, what value do you suggest for this option?

It depends on your needs and your server’s capacity. You can start by entering 2 and then observe if your server can handle it. Then you can gradually increase the value until you find the sweet spot.
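
As a back-of-the-envelope aid for choosing a value, the sketch below estimates throughput from the run count and the scheduling interval. It assumes that each run of the post crawling event saves one post and that the server finishes every run within the interval. Both are assumptions, and `estimated_posts_per_hour` is a hypothetical helper, not a plugin function.

```python
def estimated_posts_per_hour(run_count: int, interval_minutes: float) -> float:
    """Rough upper bound: runs per trigger times triggers per hour."""
    if run_count < 1 or interval_minutes <= 0:
        raise ValueError("run_count must be >= 1 and interval_minutes > 0")
    return run_count * (60 / interval_minutes)


# Starting point suggested above: a run count of 2, here with a 1-minute interval.
print(estimated_posts_per_hour(2, 1))
# A run count of 1 with a 5-minute interval, for comparison.
print(estimated_posts_per_hour(1, 5))
```

If the target site is slow, the real figure will be lower, because each run then takes longer than the estimate assumes.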

Hi, I can’t crawl with your tool.

Hi,

Could you please tell me what exactly the problem is?

I activated it a long time ago but nothing has happened. I can only crawl posts one at a time.

In that case, the settings under the Category tab might not be able to find post URLs. For the automatic crawling to work, the plugin needs to be able to find post URLs from category pages. How to configure and test the settings under the Category tab is explained here.

I moved my WP site to a subdomain, but the plugin stopped fetching data. What do I need to do?

Hello,

If you are getting an error saying that your license is on another domain, please send me your license key via the contact form on my profile page so that I can remove your old domain from your license. If there is another problem, you can try to find a solution by checking this page: https://docs.wpcontentcrawler.com/troubleshooting/index.html

I want to copy all the content of a WP site, but I think the items are pages, not posts. Is it possible?

I can’t fill in any category because the posts have no category. I have tried to use the demo without success.

Hi,

It is possible to create pages with the plugin. A page is a custom post type. How to save custom post types is explained here.

About the category, a category is a page that contains the links of the posts. So, the URL of any page that contains the links to the pages you want to save can be used as a category URL.
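
The point that any page listing the target links can serve as a "category" can be sketched as follows. This is a simplified illustration using Python's standard library. It collects every link on the page, whereas the plugin relies on the selectors configured under the Category tab. The HTML, the URLs, and the names `LinkCollector` and `collect_links` are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> elements."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the listing page's URL.
                self.links.append(urljoin(self.base_url, href))


def collect_links(html: str, base_url: str):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical listing page that links to the pages to be saved.
listing = (
    '<ul><li><a href="/about/">About</a></li>'
    '<li><a href="https://example.com/contact/">Contact</a></li></ul>'
)
urls = collect_links(listing, "https://example.com/all-pages/")
print(urls)
```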

Thank you.

You say I can crawl a post and turn it into a page on my website.

But I don’t need to crawl posts, I need to crawl pages. Is that possible? The pages have no category.

I want to crawl pages and copy them to my WP site.

I use the demo but I am not able to do it :(

If that is your concern, you can select a dummy category in the Category URLs setting, because the category does not matter for pages. So, yes, it is possible. You can follow the instructions on the page whose link I sent in my previous comment to create pages.

The content I crawled is minified into a single line. How can I make each paragraph (p tag) start on a new line? Thanks.

Hi,

It does not seem to be possible, unfortunately. However, after the post is created, you should be able to see the unminified version when editing the post, if that’s your purpose.

Hi,

Does the “SpinRewriter” function rewrite the content and title, or only the content?

Thank you for your feedback

Is it possible to exclude tags from spinning? For example, I have clean HTML content without any additional attributes like id, class, etc. But sometimes, after SpinRewriter runs, I end up with code that has missing tags.

Is there any solution to this problem?

It looks like it is a problem related to SpinRewriter. Maybe, in your post content selectors, you can retrieve the “text” of the target elements instead of “html”, which will make the content text-only so that SpinRewriter does not have an opportunity to break the HTML structure.
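
The difference between retrieving an element's “html” and its “text” can be sketched like this. It is a simplified illustration with Python's standard library, not the plugin's implementation, and `element_text` is a hypothetical helper.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def element_text(html: str) -> str:
    """Return the fragment's text with tags dropped and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


fragment = "<p>Hello <b>world</b>.</p>\n<p>Second paragraph.</p>"
print(element_text(fragment))
```

With no tags left in the input, a spinner has no markup to break. The trade-off is that paragraph and formatting structure is lost as well.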

Hello, kindly reset my domain name so I can attach a new domain to my license. I do not have access to the current domain shown on my license, since I was unable to renew it.

Kindly reset domain to blank so I can start afresh

Hi,

Please send your purchase code via the contact form on my profile page so that I can remove the domain registered to your license.

Does wp content crawler upload the image twice, if it is in two posts?

Hi,

Yes, it does.

Your plugin is really very complete. I did some tests on your server and it is very promising. Maybe you already have this option, but I didn’t see it: I would like to use a Custom Post Meta as the caption of the Featured Image. Is it possible? Perhaps there is a better way to do this, but an external site placed the caption of the photo in <figcaption></figcaption>, and it was not automatically used as the caption for the featured image.

Hi,

Thank you. The plugin does not retrieve the captions from figcaption elements, unfortunately. However, it automatically retrieves “alt” and “title” attributes’ values from the elements found by the image URL selectors. If you can use find-replace options to put the text of the figcaption element into the “alt” or “title” attribute of the image element, it should be automatically saved with the image.

If what you want to do is save the text of a figcaption element as the value of a custom post meta whose key is known by you, you can directly use Post Tab > Post Meta Section > Custom Meta Selectors setting to save it. However, this will not associate the value with the saved image, unfortunately.

Nice. Which find-replace would I use to do this? Because there is this option in several places.

Under Post Tab > Manipulate HTML Section, you can either use Find and replace in raw HTML or Find and replace in element HTML. Please note that you need to use regular expressions. If you do not know how to use them, you can use a regular expression like this. As a replacement, you can write something like this:

<figure><img title="$2" /></figure>

Please note that the plugin’s support does not cover this type of customization. I am sending this just as an example. The example replacement rule also removes the figcaption element. This regex might not work for the code you have.
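
For readers who want to experiment, here is one possible way to move a figcaption's text into the image's `title` attribute. The pattern is an assumption written for this illustration in Python, not the regular expression referred to above. It only handles a `figure` that contains an `img` directly followed by a `figcaption`, and it assumes the image has no `title` attribute yet.

```python
import html
import re

# Group 1: the img element's attributes. Group 2: the figcaption's content.
FIGURE = re.compile(
    r"<figure[^>]*>\s*<img\b([^>]*?)\s*/?>\s*"
    r"<figcaption[^>]*>(.*?)</figcaption>\s*</figure>",
    re.DOTALL | re.IGNORECASE,
)


def caption_to_title(markup: str) -> str:
    """Copy each figcaption's text into its image's title and drop the caption."""

    def repl(match):
        attrs = match.group(1).strip()
        # Strip inline tags from the caption, then escape it for attribute use.
        caption = re.sub(r"<[^>]+>", "", match.group(2)).strip()
        title = 'title="' + html.escape(caption, quote=True) + '"'
        img_attrs = " ".join(part for part in (attrs, title) if part)
        return "<figure><img " + img_attrs + " /></figure>"

    return FIGURE.sub(repl, markup)


source = (
    '<figure><img src="a.jpg" alt="A">'
    "<figcaption>Sunset over <em>the bay</em></figcaption></figure>"
)
print(caption_to_title(source))
```

Like the replacement shown above, this sketch removes the figcaption element from the output.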

Hello, the URL collection part of the plugin works properly, but I am having trouble with the post saving part. Sometimes saving does not work at all, and sometimes posts are saved every 5 minutes or at 1-hour intervals even though I selected saving every minute. I requested the cron-related logs from the server side. There are warnings like the ones below. What do I need to do?

2023-01-24 23:34:51 Warning IP Adresi mod_fcgid: stderr: #1 /var/www/vhosts/website.com/httpdocs/wp-content/plugins/wp-content-crawler/app/Objects/Crawling/Savers/PostSaver.php(207): WPCCrawler\Objects\Crawling\Savers\PostSaver->savePost(22595, Array, false, true, false, false, false), referer: https://website.com/wp-cron.php?doing_wp_cron=1674592484.0154 Apache error

2023-01-24 23:34:51 Warning IP Adresi mod_fcgid: stderr: #2 /var/www/vhosts/website.com/httpdocs/wp-content/plugins/wp-content-crawler/app/Services/SchedulingService.php(167): WPCCrawler\Objects\Crawling\Savers\PostSaver->executePostSave(22595), referer: https://website.com/wp-cron.php?doing_wp_cron=1674592484.0154 Apache error

Thank you.

Hello,

Could you also send the details of the logs? If I cannot see exactly where the error originates, I cannot help you.

Hello, I think the WordPress crons are not working properly, and I cannot see anything in a more detailed log. Instead, I looked at the link https://www.hostgator.com/help/article/how-to-replace-wordpress-cron-with-a-real-cron-job, which you previously recommended to other people. If I disable the WordPress crons and add a cron via cPanel, which of the plugin’s crons do I need to call from the server?

Hello,

You do not need to do anything specific to the plugin. As mentioned in the link, triggering WordPress’s main WP-Cron is enough.
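
As an illustration of triggering the main WP-Cron from a server-side job, here is a minimal sketch of a script that a system cron entry could run every minute. It assumes the common setup in which WordPress's built-in trigger is turned off with the `DISABLE_WP_CRON` constant in `wp-config.php` and `wp-cron.php` is requested over HTTP instead. The function names are hypothetical, and the site URL is the placeholder domain from the logs above.

```python
import urllib.request
from urllib.parse import urljoin


def wp_cron_url(site_url: str) -> str:
    """Build the wp-cron.php URL for a WordPress site's base URL."""
    return urljoin(site_url.rstrip("/") + "/", "wp-cron.php?doing_wp_cron")


def trigger_wp_cron(site_url: str, timeout: float = 30) -> int:
    """Request wp-cron.php once and return the HTTP status code."""
    with urllib.request.urlopen(wp_cron_url(site_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder domain; replace it with the real site before scheduling.
    print(wp_cron_url("https://website.com"))
```

The plugin's scheduled events run as part of that single WP-Cron request, which is why no plugin-specific cron entries are needed.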

Hello, even though I activated the license, this warning appears at certain intervals. I reactivate the license, but after a while the warning keeps coming back. Can you help?

The license key you entered for WP Content Crawler is not valid or could not be verified. To continue using WP Content Crawler, please enter your license by 20/01/2023 23:57.

Message: Your license could not be verified with the server. Please try saving your license settings again in a few minutes. If you keep seeing the warning, please contact the developer.

Number of attempts remaining before the features are disabled: 2

Could you try again after checking the Post Tab > Other Section > Trigger save_post hook option?

Hello, I tried it, but the product images still do not show. The records have been written to the database. However, I checked the products one by one, and the link does not appear to have been saved.

In that case, there is unfortunately nothing extra you can do on the plugin side. You may also need to save another custom field value.

How to replace the featured image URL before importing

Feature image on category page: https://fancy4talk.com/wp-content/uploads/2023/01/xxxx-300x300.jpg The real featured image: https://fancy4talk.com/wp-content/uploads/2023/01/xxxx.jpg

Can I remove -300x300 before getting the featured image? Thanks

You should test a category page. The post page tests do not show the featured images collected from the category pages. If the featured images are available in the post pages, I recommend you to retrieve them from the post pages instead of the category pages.
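
For the thumbnail suffix asked about above, a regular expression along these lines could be used in a find-replace rule. The sketch uses Python for demonstration. The pattern is an assumption that targets WordPress-style `-WIDTHxHEIGHT` suffixes directly before the file extension, and `full_size_url` is a hypothetical helper name.

```python
import re

# Matches "-300x300" style suffixes only when they sit right before the
# file extension at the end of the URL.
SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")


def full_size_url(url: str) -> str:
    """Drop the thumbnail size suffix, if any, from an image URL."""
    return SIZE_SUFFIX.sub("", url)


thumb = "https://fancy4talk.com/wp-content/uploads/2023/01/xxxx-300x300.jpg"
print(full_size_url(thumb))
```

URLs that end in a query string would not match this pattern and would be returned unchanged.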

I am facing this problem:

[17-Jan-2023 17:07:45 UTC] WPCC – No URL is found in the database. Site ID to check: 220, Last Crawled URL ID: does not exist

[17-Jan-2023 17:07:45 UTC] WPCC (error): URL does not exist in the database.

It means there are no post URLs waiting to be crawled in the database. Your category settings might not be correct, or all the posts waiting in the queue are already crawled.

I bought 2 licenses by mistake. Can I have a refund for one of them? Thanks!

Hi,

Could you please send a refund request through CodeCanyon?

Which email address should I send a support request to in order to change the domain? I don’t have access to the previous one, and I’m assuming it’s not advisable to post my purchase code here.

Hi,

You can use the contact form on my profile page.

sent. thanks
