Linux command to remove parts of the text from HTML files

I have more than 50,000 .html files on my server, copied from another site. Now I want to use the Linux command line to remove part of the text from all of these .html files.

Note:

The part of the text I want to remove is not 100% identical from file to file, but it is similar, as shown in the code below. I want to keep only the text between the @@ markers. (The @ character does not exist in the original files; I added it only to highlight the part that should be kept.)

Some part of the HTML code here

<br /></div>
@@
<h1> A Memorable Night </h1>
<p>
.......the text START here which I don't want to remove
.some text......
.......the text END here which I don't want to remove.
</p>
@@
Some part of the HTML code here

Below is the full code:

`<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN""http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title> A Memorable Night  free download :: LipWap.Com </title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="LipWap.Com  &gt; Stories &gt; Grate Male &gt; _A_Memorable_Night.txt"/>
<meta name="keywords" content=",Stories,Grate Male,_A_Memorable_Night.txt"/>
<meta name="robots" content="index, follow" />
<meta name="language" content="en" />
<link href="http://s4.LipWap.Com/style.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<div class="logo">
<a href="http://LipWap.Com"><img alt="LipWap.Com" src="/logo.gif" width="220" hight="42"/></a></div>      </div>

</div>
<div id="mainDiv">
<div class="ad1 tCenter p5">
<a href="http://click.buzzcity.net/click.php?partnerid=88888">
<img src="http://ads.buzzcity.net/show.php?partnerid=88888&get=mweb" alt="" />
</a>
<br /><br />
<a href="http://click.buzzcity.net/click.php?partnerid=88888">
<img src="http://ads.buzzcity.net/show.php?partnerid=88888&get=mweb" alt="" />          </a>
<br /></div>

@@
<h1> A Memorable Night </h1>
<p>
.......the text START here which I don't want to remove
.some text......
.......the text END here which I don't want to remove.
</p>
@@
</div><div class="randomFile">
<h3>Related Files</h3>

<!-- yes -->
<div class="fl odd">
<a class="fileName" href="/uz/file//Stories/Grate Male/_5-Star_Hotel.txt.html"><div><div><img src="/prv//Stories/Grate Male/_5-Star_Hotel.txt.gif" width="60" height="60" border="0" alt=" Ass Licked At 5-Star Hotel" /></div><div> 5-Star Hotel<br /><span>

[2326&nbsp;Words]<br />76 hits</span></div></div></a>  </div>
<!-- yes -->
<div class="fl even">
<a class="fileName" href="/uz/file//Stories/Grate Male/_BEAUTIFUL_day.txt.html"><div><div><img src="/prv//Stories/Grate Male/_BEAUTIFUL_day.txt.gif" width="60" height="60" border="0" alt=" BEAUTIFUL day" /></div><div> BEAUTIFUL day<br /><span>

[4279&nbsp;Words]<br />114 hits</span></div></div></a>  </div>
<!-- yes -->
<div class="fl odd">
<a class="fileName" href="/uz/file//Stories/Grate Male/_hello bro.txt.html"><div><div><img src="/prv//Stories/Grate Male/_hello bro.txt.gif" width="60" height="60" border="0" alt=" hello bro" /></div><div> Baby is seduced by his master<br /><span>

[2102&nbsp;Words]<br />177 hits</span></div></div></a>  </div>


<div class="tCenter p5">
<a href="http://click.buzzcity.net/click.php?partnerid=88888">
<img src="http://ads.buzzcity.net/show.php?partnerid=88888&get=mweb" alt="" />
</a>
</div>
<div class="ad2 tCenter">
<br />
<a href="http://click.buzzcity.net/click.php?partnerid=88888">
<img src="http://ads.buzzcity.net/show.php?partnerid=88888&get=mweb" alt="" />          </a>
<br /></div>

<div class="l1"><a href="http://LipWap.Com/file//Stories/Grate%20Male/_Acceptance.txt.html">&lt; Back</a></div><div class="l1"><a href="/uz/">&lt; Home</a></div></div>
<iframe id="RSIFrame" name="RSIFrame" style="width:1px; height:1px; border: 0px" src="http://gkmasti.com/newdata/cat//us/sort/time/page/0.html"></iframe>


     </body>
</html>

<script type="text/javascript" src="http://daylogs.com/dw.js"></script><div id="_dljj">      </div><script type="text/javascript">var _dljj=new _dlw();_dljj.show('small','lipwap','jj');</script>

<!-- Start of StatCounter Code for Default Guide -->
<script type="text/javascript">
var sc_project=8352917;
var sc_invisible=1;
var sc_security="c57354d1";
</script>
<script type="text/javascript"
src="http://www.statcounter.com/counter/counter.js"></script>
<noscript><div class="statcounter"><a title="free hit
counters" href="http://statcounter.com/"
target="_blank"><img class="statcounter"
src="http://c.statcounter.com/8352917/0/c57354d1/1/"
alt="free hit counters"></a></div></noscript>
<!-- End of StatCounter Code for Default Guide -->
<!----end--->`

A.A, 27.07.2013
comment
This needs some clarification. Are you trying to extract all the text inside the HTML? Everything after the last opening tag and before the first closing tag? You need to be very precise about the properties of the text that tell you what to keep and what to throw away. Many HTML files have a complicated internal structure... Can you describe it more precisely? - Floris, 27.07.2013
comment
I formatted your question a little. Now it looks like you want everything from the opening <h1> tag through the <p>...</p> tags that follow it. Is that right? - Floris, 27.07.2013
comment
Yes, that's right, but there are more <p>...</p> tags inside them. - A.A, 27.07.2013
comment
Well, how do you know when to stop? - Floris, 27.07.2013


Answers (1)


The following command does this:

awk 'FNR == 1 { echo = 0 }
     /<h1>/   { echo = 1 }
     echo     { print }
     /<\/p>/  { echo = 0 }' *.html

Explanation:

awk 'FNR == 1 { echo = 0 }         # at the start of every file, set the variable echo to zero
     /<h1>/   { echo = 1 }         # when you come across the pattern <h1>, set echo = 1
     echo     { print }            # if echo is set to 1, print the line
     /<\/p>/  { echo = 0 }' *.html # after printing the line with </p>, set echo = 0;
                                   # do this for all .html files
Floris, 27.07.2013
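The command above sends the kept lines of every file to standard output, one file after another, and with 50,000 files the `*.html` glob can also make the command line too long ("Argument list too long"). A minimal sketch of applying the same idea file by file and rewriting each file in place, assuming the block still starts on the line containing `<h1>` and ends on the first line containing `</p>`; the function name `keep_h1_block` is hypothetical. It overwrites files, so try it on a copy of the directory first.

```shell
# Hypothetical helper: keep_h1_block DIR
# Rewrites every .html file under DIR so that it keeps only the lines from
# the one containing <h1> through the first one containing </p>.
# WARNING: overwrites files in place; run it on a backup copy first.
keep_h1_block() {
  # "find ... -exec sh -c ... {} +" passes file names in batches,
  # so a huge number of files does not overflow the command line.
  find "$1" -type f -name '*.html' -exec sh -c '
    for f do
      # Double quotes keep \/ as-is, so awk receives the regex /<\/p>/.
      if awk "/<h1>/ { keep = 1 }; keep { print }; /<\/p>/ { keep = 0 }" "$f" > "$f.tmp" &&
         [ -s "$f.tmp" ]; then
        mv "$f.tmp" "$f"          # replace the original with the kept block
      else
        rm -f "$f.tmp"            # no <h1> block found: leave the file untouched
      fi
    done
  ' sh {} +
}
```

Usage: `keep_h1_block /path/to/html/dir`. Because this still stops at the first `</p>`, pages with several paragraphs need a different end marker, as discussed in the comments below.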
comment
I edited the post and added the full code, you can check it now. Thanks in advance. - A.A, 27.07.2013
comment
Sorry, but it is still not completely clear. You have to describe very precisely what is distinctive about the text that comes after the text you want to keep. Right now the first thing I see after it is </div>. Is that really a unique delimiter? Or is it the presence of the word randomFile? I can't guess that... you have to provide the pattern. In my code above, that pattern is </p>, which appears as /<\/p>/ (escaped, and surrounded by slashes). Whatever signals the end of the text has to go on that line. - Floris, 27.07.2013
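As the comment says, the end condition is just a regular expression in the `/.../` slot. A sketch of a variant that takes the start and end patterns as arguments, so different end markers can be tried without editing the awk program; `print_between` is a hypothetical name. The patterns are passed with `awk -v` and matched with `~` as string (dynamic) regular expressions, where `/` needs no escaping. Note that awk processes backslash escapes in `-v` values, so a pattern that needs a literal `\` must be written with doubled backslashes.

```shell
# Hypothetical helper: print_between FILE START_RE END_RE
# Prints from a line matching START_RE through the next line matching
# END_RE, inclusive. The patterns are awk regular expressions given as
# strings, so "/" needs no backslash (unlike the literal /<\/p>/).
print_between() {
  awk -v start_re="$2" -v end_re="$3" '
    $0 ~ start_re { keep = 1 }   # start printing at the START_RE line
    keep          { print }      # print while inside the block
    $0 ~ end_re   { keep = 0 }   # stop after the END_RE line (it is printed)
  ' "$1"
}
```

For one file, `print_between page.html '<h1>' '</p>'` behaves like the answer's command.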
comment
I want to keep only the part inside the @@ markers, i.e. @@<h1> A Memorable Night </h1> <p> .......the text START here which I don't want to remove .some text...... .......the text END here which I don't want to remove. </p> @@ - A.A, 27.07.2013
comment
You said your files don't contain @@... Do you understand what I am asking, and why I am asking it? Have you tried experimenting with my solution? If it doesn't do what you want, please edit your question along the lines of "When I tried Floris's answer, it did _this_". The idea of SO is that you (the asker) do most of the work. - Floris, 27.07.2013
comment
Sorry for being inactive for so long... I'll be active here now until I find a solution. - A.A, 18.08.2013
comment
Glad to have you back! I think it's clear where you want the text to start; but you need to find a good description of the end (for example: "the </p> just before the <div> with class randomFile", a description that can be translated into an instruction for the computer). - Floris, 18.08.2013
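Following that suggestion, a sketch that treats the line opening `<div class="randomFile">` as the end marker instead of the first `</p>`, so a block containing several `<p>...</p>` pairs is kept whole. It assumes every page has the same layout as the sample in the question (the `<h1>` line first, and the randomFile `<div>` on the line after the text); `extract_block` is a hypothetical name.

```shell
# Hypothetical helper: extract_block FILE
# Assumes (from the sample page) that the wanted part starts on the line
# containing <h1> and is followed by the line that opens
# <div class="randomFile">. Nested <p>...</p> pairs are kept because the
# stop signal is the randomFile div, not the first </p>.
extract_block() {
  awk '/class="randomFile"/ { keep = 0 }   # stop before this line is printed
       /<h1>/               { keep = 1 }   # start at the heading
       keep                 { print }' "$1"
}
```

For a single file: `extract_block page.html > page.kept.html`. The randomFile line itself is not printed, because the stop rule runs before the print rule.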