Symbols in gallery/image names cause weird issues (URLs)

soaringeagle
@soaringeagle
6 years ago
3,304 posts
I'm getting timeouts in my sitemap crawler caused by weird URLs:

2/15/2018 2:38:14 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/opaulao/gallery/92036/Frizz ♥
2/15/2018 3:02:50 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/light-faerie/gallery/65355/☮
2/15/2018 3:14:32 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/lisa-ann-maynard/gallery/78045/My first three babies♡
2/15/2018 3:16:49 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/jennaleigh-moonflower/gallery/65423/..♡
2/15/2018 3:53:14 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/starslingr/gallery/71135/★hair-dreads-smDSC_2543

In the datastore, the last one in the list has this:
gallery_alt_text: !!emoji!!11!!emoji!!hair-dreads-smDSC_2543
gallery_bundle_title: !!emoji!!11!!emoji!!hair-dreads-smDSC_2543
gallery_image_download_count: 257
gallery_image_extension: jpg
gallery_image_height: 684
gallery_image_name: !!emoji!!11!!emoji!!hair-dreads-smDSC_2543
gallery_image_name_url: hair-dreads-smdsc-2543
gallery_image_size: 550098
gallery_image_time: 12/18/13 05:36:23AM
gallery_image_type: image/jpeg
gallery_image_view_count: 518
gallery_image_width: 1024
gallery_ning_id: 3754945:Photo:1807984
gallery_order: 4
gallery_pending: 0
gallery_title: !!emoji!!21!!emoji!!starslingr!!emoji!!21!!emoji!!
gallery_title_url: starslingr

Loading these weird URLs in LiteSpeed caused hundreds of connections to go into a wait state while it tried to parse them.

I will have a complete list of affected URLs in a few days.

Just FYI, I have the sitemap crawler dev team investigating as well.
They will probably run a crawl with diagnostics that shows where each URL was gathered from.

My guess...
One of the modules, like activity or profile comments, is creating these odd URLs,
probably one that's been updated in the past month (I had not run a crawl in a few weeks).



--
soaringeagle
head dreadhead at dreadlocks site
glider pilot student and member/volunteer coordinator with freedoms wings international soaring for people with disabilities

updated by @soaringeagle: 05/18/18 09:55:28PM
michael
@michael
6 years ago
7,692 posts
Check that your database has a jr_jrcore_emoji table.

Those emoji placeholders should all be getting replaced by the values in that table. The values were put there when the items were entered into the system.
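To make the placeholder idea concrete, here is a minimal sketch (not jrCore's actual code) of turning stored `!!emoji!!ID!!emoji!!` placeholders back into characters. The function name `restore_emoji_placeholders` and the in-memory `$table` array are hypothetical stand-ins for a lookup against jr_jrcore_emoji. The pairing of ID 11 with ★ is inferred by comparing the crawler log URL with the datastore entry above.

```php
<?php
// Hypothetical sketch: replace !!emoji!!<id>!!emoji!! placeholders with the
// characters they stand for. $table stands in for rows from jr_jrcore_emoji
// (emoji_id => emoji_value); this is NOT how jrCore performs the lookup.
function restore_emoji_placeholders(string $text, array $table): string
{
    return preg_replace_callback(
        '/!!emoji!!(\d+)!!emoji!!/',
        function (array $m) use ($table): string {
            $id = (int) $m[1];
            // Unknown IDs are dropped instead of leaking placeholder text
            return $table[$id] ?? '';
        },
        $text
    );
}

$table = [11 => "\u{2605}"]; // ★
echo restore_emoji_placeholders('!!emoji!!11!!emoji!!hair-dreads-smDSC_2543', $table), "\n";
// prints "★hair-dreads-smDSC_2543"
```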


--edit--
Those emoji should not come out in the URL. Use the _url item for URL titles, e.g.
gallery_title_url: starslingr
or escape with |jrCore_url_string, e.g.:
{$item.video_title|jrCore_url_string}
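For illustration only, a rough sketch of how a URL-safe title like gallery_image_name_url can be derived from a name containing emoji placeholders. The function name `make_url_title` is hypothetical and this is not the actual jrCore_url_string implementation; it is just one simple rule that happens to reproduce the `hair-dreads-smdsc-2543` value from the datastore entry above.

```php
<?php
// Hypothetical slug builder (not jrCore_url_string): drop emoji placeholders,
// lowercase, and collapse anything that is not a-z/0-9 into a single hyphen
// so the result is safe to put in a URL path.
function make_url_title(string $name): string
{
    $s = preg_replace('/!!emoji!!\d+!!emoji!!/', '', $name); // remove placeholders
    $s = strtolower($s);                                     // ASCII lowercase
    $s = preg_replace('/[^a-z0-9]+/', '-', $s);              // non-alnum runs -> '-'
    return trim($s, '-');                                    // no leading/trailing '-'
}

echo make_url_title('!!emoji!!11!!emoji!!hair-dreads-smDSC_2543'), "\n";
// prints "hair-dreads-smdsc-2543"
```

Because the character class is applied byte by byte, any raw emoji left in a name is also collapsed into a hyphen rather than reaching the URL.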

updated by @michael: 02/15/18 07:14:55PM
soaringeagle
@soaringeagle
6 years ago
3,304 posts
Yes, I do, and if I'm reading it correctly it has values like e298baefb88f?
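A value like e298baefb88f looks like the raw UTF-8 bytes of an emoji shown as hex (presumably because of how the database tool displays that column; that part is an assumption). Decoding it: E2 98 BA is U+263A (☺) and EF B8 8F is U+FE0F, the variation selector that requests emoji-style rendering. A quick check in PHP (mb_ord needs the mbstring extension):

```php
<?php
// Turn the hex shown in the emoji table back into UTF-8 text
$bytes = hex2bin('e298baefb88f');

// Split into code points with the /u flag and print each as U+XXXX
foreach (preg_split('//u', $bytes, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
    printf("U+%04X\n", mb_ord($ch, 'UTF-8'));
}
// prints:
// U+263A
// U+FE0F
```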

What I don't get is... the emojis are part of the name of the gallery or file, and they do not show up in the URL.

And yes, I just saw the edit; that's where I'm getting confused. This was never an issue before, so I do not think it's a skin template thing.
I think it's something in one of the modules (profiles or activity, if not gallery) that would have a link to gallery items but has the code incorrect.

In cases with no emoji, the gallery title and URL could be used interchangeably without any effect;
the values are the same.
Only the emoji is stripped from the name to create the URL.
I think somewhere (I don't know where) it's being called by name, not URL, when the URL is built.
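A sketch of the distinction being described: when a link is built, the *_url field should supply the last path segment, and a raw name should never go into a URL without percent-encoding. `gallery_item_link` and its parameters are hypothetical; the field names come from the datastore entry above. Note that rawurlencode('☮') gives %E2%98%AE, the same form seen in the gallery/download/gallery_image/65355/%E2%98%AE URL in the crawl log.

```php
<?php
// Hypothetical link builder: prefer the precomputed *_url field and only fall
// back to the display name, percent-encoded so emoji and spaces never end up
// raw in the href.
function gallery_item_link(string $siteUrl, string $profileUrl, int $itemId, array $item): string
{
    $slug = $item['gallery_image_name_url'] ?? '';
    if ($slug === '') {
        $slug = rawurlencode($item['gallery_image_name'] ?? '');
    }
    return "{$siteUrl}/{$profileUrl}/gallery/{$itemId}/{$slug}";
}

$item = [
    'gallery_image_name'     => '!!emoji!!11!!emoji!!hair-dreads-smDSC_2543',
    'gallery_image_name_url' => 'hair-dreads-smdsc-2543',
];
echo gallery_item_link('https://www.dreadlockssite.com', 'starslingr', 71135, $item), "\n";
// prints "https://www.dreadlockssite.com/starslingr/gallery/71135/hair-dreads-smdsc-2543"
```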

Make sense?
(I do have beta modules installed, so that might be where it's at.)


michael
@michael
6 years ago
7,692 posts
Weird characters do not show up for me in a similar location on a skin that uses the default templates.

My biggest suspect is still a template override in your skin. Does the same issue still exist on a default jrElastic2?
Attachment: characters.jpg

soaringeagle
@soaringeagle
6 years ago
3,304 posts
Odd. OK, I found a profile with the same issue:
https://www.dreadlockssite.com/✿-נα∂є-✿

So could it be an issue in the user/profile module?
Is there any way it could be a server setting (LiteSpeed)?
Something in the external apps settings or cache settings?
(Only private cache is enabled.)
I do not currently have PageSpeed enabled;
not sure of the best filters to use.

I don't think it could be a server issue, just covering all bases.
(I no longer have timeouts when running an integrity check or other things.)

No PHP settings could cause this, right?

It has got to be a template somewhere that's to blame.


soaringeagle
@soaringeagle
6 years ago
3,304 posts
I can't find out without running a full crawl with the default skin (having my site messed up for days).
It's not the gallery URL, it's how it's linked to from... somewhere, which I can't find out without a diagnostic crawl to find the sources of the links.

See what I mean? If the link is coming from, say, activity, it's harvested and added to the URLs to crawl, and the issue only shows up when the crawler tries to load that URL.
So I need to find out which page it's actually getting the URL from when it adds it to the crawl list.
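Until the crawler's diagnostics are available, one way to find which pages emit these links is to fetch a candidate page (activity, profile, gallery) and look for hrefs containing spaces or raw non-ASCII characters. This is a crude, regex-based sketch; `find_suspicious_links` is a hypothetical helper, the HTML below is made up for illustration, and a real HTML parser would be more robust.

```php
<?php
// Crude scan (hypothetical helper, not part of Jamroom): return every
// double-quoted href value containing a space or any byte outside printable
// ASCII, i.e. links that were emitted without percent-encoding.
function find_suspicious_links(string $html): array
{
    preg_match_all('/href\s*=\s*"([^"]*)"/i', $html, $m);
    return array_values(array_filter(
        $m[1],
        fn (string $href): bool => preg_match('/[^\x21-\x7E]/', $href) === 1
    ));
}

$html = '<a href="https://www.dreadlockssite.com/daydreamer/gallery/64795/' . "\u{2661}" . '">next</a>'
      . '<a href="https://www.dreadlockssite.com/starslingr/gallery/71135/hair-dreads-smdsc-2543">prev</a>';

foreach (find_suspicious_links($html) as $href) {
    echo $href, "\n"; // only the 64795/♡ link is printed
}
```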

Hopefully Inspyder will look into it over the next few days and find the causes.
Since it affects profiles too, I would guess it's caused by something like activity, and that's why it's showing up in multiple modules.

Complete list so far of pages affected:

2/15/2018 2:54:50 PM - Paused. Click Go to continue.
2/15/2018 4:12:17 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/shaun-saunders/uploaded_audio/7233/%E5%88%80%E6%BC%80%E6%8C%80%E2%80%80%E4%88%80%E6%BC%80%E7%A4%80%E7%8C%80%E2%80%80%E2%A0%80%E4%84%80%E6%B8%80%E6%90%80%E2%80%80%E5%90%80%E6%A0%80%E6%94%80%E2%80%80%E5%9C%80%E6%A4%80%E6%B8%80%E6%B8%80%E6%94%80%E7%88%80%E2%80%80%E4%A4%80%E7%8C%80%E2%A4%80
2/15/2018 5:04:42 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/opaulao/gallery/92036/Frizz ♥
2/15/2018 5:29:50 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/light-faerie/gallery/65355/☮
2/15/2018 5:38:22 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/jennaleigh-moonflower/gallery/65423/..♡
2/15/2018 5:41:27 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/lisa-ann-maynard/gallery/78045/My first three babies♡
2/15/2018 6:05:26 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/starslingr/gallery/71135/★hair-dreads-smDSC_2543
2/15/2018 6:16:46 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/tashie/gallery/106706/☺
2/15/2018 7:01:48 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/tia/gallery/27852/♥
2/15/2018 7:52:30 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/madison-gregg/gallery/100292/Gafi♥
2/15/2018 8:27:41 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/daydreamer/gallery/64795/♡
2/15/2018 8:27:42 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/daydreamer/gallery/64793/♡
2/15/2018 11:13:30 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/gee-chin/gallery/74911/☀
2/15/2018 11:13:31 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/gee-chin/gallery/74909/☀
2/16/2018 12:16:16 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/naomi-choate/uploaded_audio/11722/not-an-angel-yet
2/16/2018 12:51:52 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/stamen-ilianov-iliev/uploaded_audio/albums/%E4%90%80%E4%A8%80%E2%80%80%E4%AC%80%E4%94%80%E4%B8%80%E4%B8%80%E5%A4%80%E2%80%80%E5%9C%80%E4%A4%80%E4%B0%80%E4%90%80%E2%80%80%E2%98%80%E2%80%80%E6%98%80%E5%88%80%E4%A4%80%E4%94%80%E4%B8%80%E4%90%80%E5%8C%80%E2%80%80%E2%A8%80%E2%80%80%E2%A0%80%E6%BC%80%E5%88%80%E4%AC%80%E5%94%80%E5%90%80%E2%A4%80
2/16/2018 1:04:47 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/courtney-rose/uploaded_audio/9596/%E4%A4%80%E6%B8%80%E6%A4%80%E6%B8%80%E6%B8%80%E6%84%80%E2%80%80%E5%90%80%E6%BC%80%E7%88%80%E6%84%80%E2%80%80%E2%A0%80%E4%90%80%E4%A8%80%E2%80%80%E7%A4%80%E4%84%80%E4%B0%80%E3%84%81%E6%B4%80%E2%80%80%E5%90%80%E7%88%80%E6%A4%80%E6%88%80%E6%84%80%E6%B0%80%E2%80%80%E4%B4%80%E6%A4%80%E7%A0%80%E2%A4%80%E2%80%80%E2%B4%80%E2%80%80%E7%9C%80%E7%9C%80%E7%9C%80%E2%B8%80%E6%90%80%E6%A8%80%E7%A4%80%E6%84%80%E6%B0%80%E6%A4%80%E6%B4%80%E2%B8%80%E6%8C%80%E6%BC%80%E6%B4%80
2/16/2018 1:27:25 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/new-dreddie/uploaded_audio/8769/%E4%8C%80%E6%84%80%E6%B0%80%E6%A4%80%E6%98%80%E6%BC%80%E7%88%80%E6%B8%80%E6%A4%80%E6%84%80%E2%80%80%E4%9C%80%E7%94%80%E7%88%80%E6%B0%80%E7%8C%80%E2%80%80%E2%A0%80%E4%98%80%E6%94%80%E6%84%80%E7%90%80%E2%B8%80%E2%80%80%E5%8C%80%E6%B8%80%E6%BC%80%E6%BC%80%E7%80%80%E2%80%80%E4%90%80%E6%BC%80%E6%9C%80%E6%9C%80%E2%A4%80
2/16/2018 1:27:26 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/new-dreddie/uploaded_audio/8770/micheal-jackson-smooth-criminal-merengue-mambo-remix
2/16/2018 4:30:48 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/ianthe/uploaded_audio/7247/%E5%8C%80%E6%A0%80%E6%94%80%E2%9C%80%E7%8C%80%E2%80%80%E5%88%80%E6%BC%80%E7%A4%80%E6%84%80%E6%B0%80
2/16/2018 7:31:07 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/mamaa-wolf/uploaded_audio/albums/%E4%94%80%E6%90%80%E7%9C%80%E6%84%80%E7%88%80%E6%90%80%E2%80%80%E5%8C%80%E6%A0%80%E6%84%80%E7%88%80%E7%80%80%E6%94%80%E2%80%80%E2%98%80%E2%80%80%E5%90%80%E6%A0%80%E6%94%80%E2%80%80%E4%B4%80%E6%84%80%E6%9C%80%E6%B8%80%E6%94%80%E7%90%80%E6%A4%80%E6%8C%80%E2%80%80%E5%A8%80%E6%94%80%E7%88%80%E6%BC%80%E7%8C%80
2/16/2018 10:29:33 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/✿-נα∂є-✿
2/16/2018 10:51:00 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/judge-dread/uploaded_audio/9479/%E5%9C%80%E6%A0%80%E6%BC%80%E2%9C%80%E7%8C%80%E2%80%80%E5%90%80%E6%A0%80%E6%94%80%E2%80%80%E4%B4%80%E6%84%80%E6%B8%80
2/16/2018 11:58:55 AM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/gallery/download/gallery_image/65355/%E2%98%AE
2/16/2018 12:11:06 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/sarah6/gallery/71301/Playing my Uke ❤️
2/16/2018 12:11:07 PM - Warning: Request Timeout (try reducing "Number of Crawlers" in Advanced Settings): https://www.dreadlockssite.com/feed/timeline/sarahbachar

Note that's only 1/4 of the site crawled so far.
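A side note on the uploaded_audio entries: percent-decoding them gives characters such as U+5200, U+6F00 and U+7900, each with an ASCII code in its high byte and 0x00 in its low byte. That is the pattern produced when UTF-16LE text is decoded as UTF-16BE. Where these titles came from is not established here. The sketch below (hypothetical function name `unswap_utf16`; needs the mbstring extension) reverses the byte order on the first audio URL in the list.

```php
<?php
// Undo a UTF-16 byte-order mix-up: re-encode the text as UTF-16BE to get the
// original bytes back, then read those same bytes as UTF-16LE instead.
function unswap_utf16(string $utf8): string
{
    $bytes = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// Last path segment of the shaun-saunders uploaded_audio/7233 URL from the log
$segment = '%E5%88%80%E6%BC%80%E7%A4%80%E2%80%80%E4%88%80%E6%BC%80%E7%A4%80%E7%8C%80%E2%80%80%E2%A0%80%E4%84%80%E6%B8%80%E6%90%80%E2%80%80%E5%90%80%E6%A0%80%E6%94%80%E2%80%80%E5%9C%80%E6%A4%80%E6%B8%80%E6%B8%80%E6%94%80%E7%88%80%E2%80%80%E4%A4%80%E7%8C%80%E2%A4%80';
echo unswap_utf16(rawurldecode($segment)), "\n";
// prints "Roy Boys (And The Winner Is)"
```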


michael
@michael
6 years ago
7,692 posts
The best way to get me to understand the issue is if you can provide steps for me to take on my dev machine to reproduce it; then, when I can see it happening, I can debug it.
michael
@michael
6 years ago
7,692 posts
For now, you can see the link location here:
https://www.dreadlockssite.com/daydreamer/gallery/64795

It links to
https://www.dreadlockssite.com/daydreamer/gallery/64795/(heart mark)

in each of the forward/backward links of the gallery.
michael
@michael
6 years ago
7,692 posts
Can see it happening here. Looking into why and what to do about it.
michael
@michael
6 years ago
7,692 posts
Think we've got this fixed for the next jrCore version.

If you just can't wait, it's in /modules/jrCore/lib/util.php

This is the new version of the function:

/**
 * Replace emoji unicode characters with placeholders in a string
 * @param $string string
 * @param $replace bool set to TRUE to store emoji replacements
 * @return string
 */
function jrCore_strip_emoji($string, $replace = true){
    if (is_string($string) && !jrCore_get_flag('jrCore_strip_emoji')) {
        jrCore_set_flag('jrCore_strip_emoji', 1);
        $pattern = '/([0-9|#][\x{20E3}])|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1FFFF}][\x{FE00}-\x{FEFF}]?/u';
        if (preg_match_all($pattern, $string, $_match)) {
            $_rp = array();
            foreach ($_match[0] as $e) {
                if (strlen($e) > 1) {
                    $_rp[$e] = $e;
                }
            }
            if (count($_rp) > 0) {
                if ($replace) {
                    $tbl = jrCore_db_table_name('jrCore', 'emoji');
                    $req = "SELECT * FROM {$tbl} WHERE emoji_value IN('" . implode("','", $_rp) . "')";
                    $_rt = jrCore_db_query($req, 'emoji_value', false, 'emoji_id', false, null, false);
                    foreach ($_rp as $k => $e) {
                        if (!$_rt || !isset($_rt[$k])) {
                            $req     = "INSERT INTO {$tbl} (emoji_value) VALUES ('{$e}')";
                            $eid     = jrCore_db_query($req, 'INSERT_ID', false, null, false, null, false);
                            $_rt[$k] = $eid;
                        }
                        else {
                            $eid = $_rt[$k];
                        }
                        if ($eid && $eid > 0) {
                            // Replace with placeholder in our string
                            $string = str_replace($e, "!!emoji!!{$eid}!!emoji!!", $string);
                        }
                    }
                }
                else {
                    foreach ($_rp as $k => $e) {
                        $string = str_replace($e, '', $string);
                    }
                }
            }
        }
        jrCore_delete_flag('jrCore_strip_emoji');
        return jrCore_strip_non_utf8($string);
    }
    return $string;
}
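To see why the names in this thread are caught: every symbol in the crawler log above (♥ ♡ ☮ ★ ☺ ☀ ❤ ✿) falls in U+2600 to U+27BF, one of the ranges in the function's pattern. Below is a reduced sketch of just that branch, which deletes matches the way the `$replace = false` path does instead of storing placeholders. `strip_misc_symbols` is a hypothetical name and this is not a substitute for the full function.

```php
<?php
// Reduced version of one branch of the pattern above: a character in
// U+2600..U+27BF, optionally followed by a variation selector (U+FE00..U+FEFF).
// Matches are removed outright, mirroring the $replace = false path.
function strip_misc_symbols(string $s): string
{
    return preg_replace('/[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?/u', '', $s);
}

echo strip_misc_symbols("My first three babies\u{2661}"), "\n";   // "My first three babies"
echo strip_misc_symbols("Playing my Uke \u{2764}\u{FE0F}"), "\n"; // "Playing my Uke " (trailing space kept)
```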
soaringeagle
@soaringeagle
6 years ago
3,304 posts
You are the best!
I knew it had to be something like that.

I thought it through and thought it through, searched the templates, and my only conclusion was that it had to be coming from the core.


soaringeagle
@soaringeagle
6 years ago
3,304 posts
That's not the whole file, right? Just one function within it?


soaringeagle
@soaringeagle
6 years ago
3,304 posts
Yikes, OK, I'll wait for the next version's fix.
I tried replacing that function and it caused more issues:
forums and other profiles vanished, and so did header images (my own customization to the profile header). I must not have done it right. Any chance you can attach the whole file?


michael
@michael
6 years ago
7,692 posts
Nope, the rest of the file might not match up with your core version.

What were the issues you had when you changed it?
soaringeagle
@soaringeagle
6 years ago
3,304 posts
I searched for this line:
* Replace emoji unicode characters with placeholders in a string
then replaced that block of code
(see attached backup).

I loaded a page and noticed that the forum header image
https://www.dreadlockssite.com/dreadlocks-forums/forum
and the profile picture of the poster were missing.
I went to my profiles and clicked dreadlocks-forums to check about re-adding the image; that said "page not found, create" but then immediately redirected back to my profiles list.

Did I do something wrong in the edit?
Attachment: util.zip




soaringeagle
@soaringeagle
6 years ago
3,304 posts
I see where I went wrong:
two opening comments.
Let me try again.


soaringeagle
@soaringeagle
6 years ago
3,304 posts
OK, I believe we're good; will update if there are any other issues.

So umm, no expert at reading PHP... but was it coming from the activity feed?
I saw mention of adding items to activity up a ways;
it seemed like the most logical place that would affect multiple modules.

Wow, it seems to have had an instant effect on CPU load, and the number of current connections dropped as well.



updated by @soaringeagle: 02/17/18 06:53:12PM
