Count of a sequence in a string

Dúvidas, dicas e truques de SQL, Select, Update, Delete, cláusulas, operações com joins, Funções em SQLs, etc
Post Reply
wkinoue
Rank: Estagiário Júnior
Rank: Estagiário Júnior
Posts: 2
Joined: Fri, 13 Dec 2013 9:38 am

Hello everyone, I need a help.

I have a string that has several characters and numbers, sometimes separated by brackets ([]), by keys ({}) and by pipes (|) between them, and I need to make a count of the values ??within them. In certain circumstances, I need it to be combined.

Ex.:

Select all

[LOG]2331|1321{VP}|4524|5512]|885|4242113234{eddx]|5424|4524
In this example, it could count a single sequence or two.
- How often the sequence 4524 appear;
- How many times the sequences 4524 and 5512 appear.

I have already tried using Length, but are approximately 20 different sequences, and consultation would be huge using Length.

To count only a sequence, I used the REGEXP_COUNT function as follows:

Select all

SELECT REGEXP_COUNT('campo','\|4524\|') FROM (SELECT REPLACE(REPLACE(REPLACE(REPLACE('campo','[','|'),']','|'),'{','|'),'}','|') campo FROM 'table').
The problem is in accounting when two different sequences are:
I tried to use the regExp_COUNT function as follows:

Select all

SELECT REGEXP_COUNT('campo','\|4524\||\|5512\|') FROM 'table'
But it counts or 4524 o 5512.

How do I That the REGEXP_COUNT function Count the count when two different sequences are, or if there is any other function.

Thank you for the help.
User avatar
stcoutinho
Moderador
Moderador
Posts: 850
Joined: Wed, 11 May 2011 5:15 pm
Location: são Paulo - SP

Hello Wkinoue,

Welcome to the Glufke Forum.

For the solution of your problem, let's go to parts:
First, we will first be removed from string all that does not interest us, that is, all Characters other than "numbers" or "pipe". To do this, we can use the "regExp_Replace" function with the filter "[^ 0-9 \ |]", where we only keep the numerical digits and the pipe. Run the queries below:

Select all

SQL> SELECT 
'[LOG]2331|1321{VP}|4524|5512]|885|4242113234{eddx]|5424|4524' MINHA_STRING, 
REGEXP_REPLACE('[LOG]2331|1321{VP}|4524|5512]|885|4242113234{eddx]|5424|4524', '[^0-9\|]','') MINHA_STRING_FILTRADA  
FROM DUAL 
 
SQL> 
 
MINHA_STRING                                                  MINHA_STRING_FILTRADA  
------------------------------------------------------------  ------------------------------------------ 
[LOG]2331|1321{VP}|4524|5512]|885|4242113234{eddx]|5424|4524  2331|1321|4524|5512|885|4242113234|5424|4524
Ready! Note that you have a line only with numeric values, separated by pipe: 2331|1321|4524|5512|885|4242113234|5424|4524

Now, let's try to identify how many values ??exist on the string. You said the official separator would be the "pipe". So let's repeat the regexp_replace command to identify how many pipes there are on the string. In this case, I will use the filter "[^ |]" to keep the PIPE of the initial string:

Select all

SQL> SELECT REGEXP_REPLACE('2331|1321|4524|5512|885|4242113234|5424|4524', '[^|]','') TOTAL 
     FROM DUAL; 
 
TOTAL 
------- 
||||||| 
But it is very boring to tell the resulting "chopsticks" of the quieu. We will improve the same with the help of the Length and Trim commands:

Select all

SQL> SELECT LENGTH(TRIM(REGEXP_REPLACE('2331|1321|4524|5512|885|4242113234|5424|4524', '[^|]',''))) TOTAL 
  2  FROM DUAL; 
 
     TOTAL 
---------- 
         7 
Ok .. So until here we already have the "purified string" and how many numbers They meet on the string, based on the counts of the "Pipe" tabs.

For this, we need the "regexp_substr ... Connect by Level ..." command. Run this desire now

Select all

SQL> SELECT trim(regexp_substr('2331|1321|4524|5512|885|4242113234|5424|4524', '[^|]+', 1, LEVEL)) RESULTADO 
  2    FROM dual 
  3  CONNECT BY LEVEL <= 7+1; 
 
RESULTADO 
-------------------------------------------------------------------------------- 
2331 
1321 
4524 
5512 
885 
4242113234 
5424 
4524
Some explanations:

- Note in the regExp_subtr command that I have informed the second parameter as' [^ |] + '. This indicates for it that this is the value separator.
- Note at the level I added 1 to the value (EX: 7 + 1). The reason is that the separators are always a number less than the total number of records.
- Note that I used Length and RegExp_Replace to identify how many tabs exist. At 10g this is how to do this. In 11g I step to use the regexp_count

well .. now let's check which distribution of these values ??in string. Let's put the previous cradle inside a SELECT .. COUNT(*) FROM .. GROUP BY

Run now:

Select all

SQL> SELECT RESULTADO,COUNT(*) 
  2  FROM 
  3  ( 
  4  SELECT trim(regexp_substr('2331|1321|4524|5512|885|4242113234|5424|4524', '[^|]+', 1, LEVEL)) RESULTADO 
  5    FROM dual 
  6  CONNECT BY LEVEL <= 7+1 
  7  ) 
  8  GROUP BY RESULTADO; 
 
RESULTADO    COUNT(*) 
---------- ---------- 
4524                2 
1321                1 
5424                1 
5512                1 
2331                1 
885                 1 
4242113234          1
We then finish counting That you needed to do. I will now repeat this querie by adding in her everything we explained earlier:

Select all

SQL> SELECT RESULTADO,COUNT(*) 
  2  FROM 
  3  ( 
  4  SELECT trim(regexp_substr(REGEXP_REPLACE('[LOG]2331|1321{VP}|4524|5512]|885|4242113234{eddx]|5424|4524', '[^0-9\|]',''), '[^|]+', 1, LEVEL)) RESULTADO 
  5    FROM dual 
  6  CONNECT BY LEVEL <= LENGTH(TRIM(REGEXP_REPLACE('[LOG]2331|1321{VP}|4524|5512]|885|4242113234{eddx]|5424|4524', '[^|]','')))+1 
  7  ) 
  8  GROUP BY RESULTADO 
  9  / 
 
RESULTADO    COUNT(*) 
---------- ---------- 
4524                2 
1321                1 
5424                1 
5512                1 
2331                1 
885                 1 
4242113234          1
I developed this explanation through consultations to sites on the Internet, which I am posting here:
http://www.sqlsnippets.com/en/topic-12818.html http://stackoverflow.com/questions/1099 ... gits-again

I hope I have solved your problem, and thank you for posting this subject, since I had not encountered Still with this type of problem.

Hugs,

Sergio Coutinho
wkinoue
Rank: Estagiário Júnior
Rank: Estagiário Júnior
Posts: 2
Joined: Fri, 13 Dec 2013 9:38 am

Hello Sergio!

Thanks for your explanations. It sure will help me a lot in other consultations that I have to do.

However, I need more help. I do not think I could explain right or I could not use your explanation. I need to account for how many times two sequences appear. For example, in this string:

Select all

[LOG]|2331{VP}5354|3215|5331|982[eddx]|3443|5234|2331{VP}|4353|3215|903]|2599 
      ^^^^         ^^^^                          ^^^^          ^^^^ 
let's assume that the sequences were "2331 and 3215". In the string above, they would appear 2x.

I need to count how many times the sequence "2331 and 3215" appear in the string.

Once again, thank you!
User avatar
stcoutinho
Moderador
Moderador
Posts: 850
Joined: Wed, 11 May 2011 5:15 pm
Location: são Paulo - SP

I think I understood ..

Let's assume that [yyy] and {yyy} can be considered separators, with the same power as the pipe.

then perform this querie, based on the example that passed me:

Select all

 
SELECT RESULTADO,COUNT(*) 
 FROM 
 ( 
 SELECT trim(regexp_substr(REGEXP_REPLACE(REPLACE(REPLACE(REPLACE(REPLACE('[LOG]|2331{VP}5354|3215|5331|982[eddx]|3443|5234|2331{VP}|4353|3215|903]|2599','[','|'),']','|'),'{','|'),'}','|'),  '[^0-9\|]',''), '[^|]+', 1, LEVEL)) RESULTADO 
   FROM dual 
 CONNECT BY LEVEL <= LENGTH(TRIM(REGEXP_REPLACE(REPLACE(REPLACE(REPLACE(REPLACE('[LOG]|2331{VP}5354|3215|5331|982[eddx]|3443|5234|2331{VP}|4353|3215|903]|2599','[','|'),']','|'),'{','|'),'}','|'),'[^|]','')))+1 
 ) 
 GROUP BY RESULTADO 
/ 
 
RESULTA   COUNT(*) 
------- ---------- 
3443             1 
                 9 
3215             2 
2331             2 
982              1 
5354             1 
5331             1 
2599             1 
5234             1 
903              1 
4353             1

was not a Solution soooo beautiful .. I had to make some replaces to convert the "[", "]", "{", "}" in characters "|

of the queries outcome above, you can disregard the 9 null occurs.

I hope this adaptation solves your problem.

ATT

Sergio Coutinho
Post Reply
  • Information
  • Who is online

    Users browsing this forum: No registered users and 19 guests