banana_hammock
1 Nickel

Broken table - master received SIGSEGV / cache lookup fail / cannot open relation

Jump to solution
A little background of what I'm trying to do

This is all in dev, so not a production issue.

I have a bzipped file of approx 400,000 rows, 6 columns wide I want to load into a table which inherits from a parent table. The parent table already exists, and I have a script which loads the bzipped file and automatically creates the child table. The script loads the data by bzcat-ing the table, piped into psql which executes a COPY from stdin statement. The parent table definition is:

CREATE TABLE clog_users

(

    date date

  , market text

  , partner text

  , isp_id integer

  , internal_user_id integer

  , user_id bigint 

)

DISTRIBUTED RANDOMLY;

The child tables are the same but include an additional check constraint on the date and an 'inherits' clause.

The problem

So I executed my script to load my data file, and then I interrupted it using ^C (which I understand is harmless)

I dropped my parent table using CASCADE and then re-created it for another run.

The next run of the loader script returned the following from psql:

FATAL:  Unexpected internal error: Master process received signal SIGSEGV (postgres.c:3360)

This is odd, so I execute the following SQL on my table:

jsm_partner_billing=# select * from clog_users;

ERROR:  cache lookup failed for attribute 12 of relation 21110733 (lsyscache.c:477)

Performing a count of rows on the table returns a different error:

jsm_partner_billing=# select count(1) from clog_users;

ERROR:  could not open relation with OID 21109330 (heapam.c:880)

Dropping and re-creating the table (cascade so it also includes children) results in the same error each run.

Dropping and re-creating the database also results in the same error (slightly more worrying and rather more extreme)

The question(s)
  • Is interrupting a psql COPY statement using ^C very bad?
  • What do you think the cause of this problem is?
  • Is this problem fixable?


Greenplum info

OS: Linux mdw 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux (Centos 5.5)

Version:     greenplum-db-4.1.1.3 build 4

Seg hosts: 10

Gpstate:

-Greenplum instance status summary

-----------------------------------------------------

-   Master instance                                = Active

-   Master standby                                 = No master standby configured

-   Total segment instance count from metadata     = 80

-----------------------------------------------------

-   Primary Segment Status

-----------------------------------------------------

-   Total primary segments                         = 80

-   Total primary segment valid (at master)        = 80

-   Total primary segment failures (at master)     = 0

-   Total number of postmaster.pid files missing   = 0

-   Total number of postmaster.pid files found     = 80

-   Total number of postmaster.pid PIDs missing    = 0

-   Total number of postmaster.pid PIDs found      = 80

-   Total number of /tmp lock files missing        = 0

-   Total number of /tmp lock files found          = 80

-   Total number postmaster processes missing      = 0

-   Total number postmaster processes found        = 80

0 Kudos
6 Replies
MPPaquette
1 Nickel

Re: Broken table - master received SIGSEGV / cache lookup fail / cannot open relation

Jump to solution

Hi Banana,

Hmm, it's hard to say at this point. I would run a gpcheckcat on the system to verify catalog health. In addition, I'd be intrested in setting the log for the specific event. The FATAL event would have generated a stace trace which would give me a better understanding of what happened at the time.

Thanks,

Marc Paquette

0 Kudos
banana_hammock
1 Nickel

Re: Broken table - master received SIGSEGV / cache lookup fail / cannot open relation

Jump to solution

Thanks Mark,

I ran gpcheckcat which didn't find any errors.

Here are the stack traces for the various errors.

Unexpected internal error: Master process received signal SIGSEGV (postgres.c:3360)

Command/SQL executed:

bzcat <file> | psql jsm_partner_billing

Log/Stack-trace:

",,,,,,"COPY data.clog_users_20111201( date, market, partner, isp_id, internal_user_id, user_key ) FROM stdin",0,,,,

2012-03-15 08:35:42.367734 EST,"billing","jsm_partner_billing",p19558,th-1931185280,"10.156.6.44","51690",2012-03-15 08:35:42 EST,10316120,con6658,cmd1,seg-1,,dx99460,x10316120,sx1,"FATAL","XX000","Unexpected internal error: Master process received signal SIGSEGV (postgres.c:3360)",,,,,,"COPY data.clog_users_20111201( date, market, partner, isp_id, internal_user_id, user_key ) FROM stdin",0,,"postgres.c",3360,"Stack trace:

1    0xa36ee5 postgres errstart (elog.c:454)

2    0x8e0a65 postgres <symbol not found> (postgres.c:3356)

3    0x39938302d0 libc.so.6 <symbol not found> (??:0)

4    0x637570 postgres CopyFromDispatch (copy.c:2799)

5    0x64015c postgres DoCopy (copy.c:1575)

6    0x8efb04 postgres ProcessUtility (utility.c:1046)

7    0x8ed260 postgres PortalRun (pquery.c:1522)

8    0x8e1c48 postgres <symbol not found> (postgres.c:1707)

9    0x8e4d19 postgres PostgresMain (postgres.c:4609)

10   0x84f8d1 postgres <symbol not found> (postmaster.c:6436)

11   0x856bc5 postgres PostmasterMain (postmaster.c:2272)

12   0x7655ea postgres main (main.c:212)

13   0x399381d994 libc.so.6 __libc_start_main (??:0)

14   0x475039 postgres <symbol not found> (??:0)

"

cache lookup failed for attribute 12 of relation 21374090 (lsyscache.c:477)

Command/SQL executed:

select count(1) from clog_users ;


Log/Stack-trace:

2012-03-15 08:52:06.578956 EST,"billing","jsm_partner_billing",p31036,th-1931185280,"10.156.6.44","38399",2012-03-15 08:51:01 EST,10318702,con6818,cmd6,seg-1,,dx101809,x10318702,sx1,"LOG","00000","statement: select count(1) from clog_users ;",,,,,,"select count(1) from clog_users ;",0,,"postgres.c",1526,

2012-03-15 08:52:06.630471 EST,"billing","jsm_partner_billing",p31036,th-1931185280,"10.156.6.44","38399",2012-03-15 08:51:01 EST,10318702,con6818,cmd6,seg-1,,dx101809,x10318702,sx1,"ERROR","XX000","cache lookup failed for attribute 12 of relation 21374090 (lsyscache.c:477)",,,,,,"select count(1) from clog_users ;",0,,"lsyscache.c",477,"Stack trace:

1    0xa3349a postgres <symbol not found> (elog.c:454)

2    0xa35208 postgres elog_finish (elog.c:1365)

3    0xa2826b postgres get_atttypetypmod (lsyscache.c:477)

4    0x7e27ea postgres find_indexkey_var (pathkeys.c:1461)

5    0xb513c6 postgres cdbpathlocus_from_baserel (cdbpathlocus.c:241)

6    0x826cc2 postgres create_seqscan_path (pathnode.c:1113)

7    0x7c042a postgres <symbol not found> (allpaths.c:333)

8    0x7c070b postgres <symbol not found> (allpaths.c:495)

9    0x7c17db postgres make_one_rel (allpaths.c:196)

10   0x7fbd0e postgres query_planner (planmain.c:236)

11   0x800747 postgres <symbol not found> (planner.c:1380)

12   0x802b2a postgres subquery_planner (planner.c:745)

13   0x803cb4 postgres planner (planner.c:241)

14   0x8e1a49 postgres <symbol not found> (postgres.c:800)

15   0x8e4d19 postgres PostgresMain (postgres.c:4609)

16   0x84f8d1 postgres <symbol not found> (postmaster.c:6436)

17   0x856bc5 postgres PostmasterMain (postmaster.c:2272)

18   0x7655ea postgres main (main.c:212)

19   0x399381d994 libc.so.6 __libc_start_main (??:0)

20   0x475039 postgres <symbol not found> (??:0)

"

ERROR:  could not open relation with OID 21109330 (heapam.c:880)

Command/SQL executed:

select count(1) from clog_users;

Log/Stack-trace:

2012-03-14 14:56:33.807323 EST,"gpadmin","jsm_partner_billing",p12892,th-1250180224,"[local]",,2012-03-14 14:56:19 EST,10198458,con13075,cmd17,seg-1,,dx151388,x10198458,sx1,"ERROR","XX000","could not open relation with OID 21109330 (heapam.c:880)",,,,,,"select count(1) from clog_users;",0,,"heapam.c",880,"Stack trace:

1    0xa3349a postgres <symbol not found> + 0xa3349a

2    0xa35208 postgres elog_finish + 0xb8

3    0x4c0878 postgres heap_open + 0x108

4    0x81c007 postgres expand_inherited_tables + 0x207

5    0x802538 postgres subquery_planner + 0x178

6    0x803cb4 postgres planner + 0x154

7    0x8e1a49 postgres <symbol not found> + 0x8e1a49

8    0x8e4d19 postgres PostgresMain + 0x11d9

9    0x84f8d1 postgres <symbol not found> + 0x84f8d1

10   0x856bc5 postgres PostmasterMain + 0x17d5

11   0x7655ea postgres main + 0x4ba

12   0x399381d994 libc.so.6 __libc_start_main + 0xf4

13   0x475039 postgres <symbol not found> + 0x475039

"


0 Kudos
MPPaquette
1 Nickel

Re: Broken table - master received SIGSEGV / cache lookup fail / cannot open relation

Jump to solution

Hi Banana,

I would open up a support ticket for this issue.. There's a lot of different things that need to be looked at and I think working with a Technical Support Engineer through a support ticket would be the best choice for this type of issue.

My apologies,

Marc Paquette

banana_hammock
1 Nickel

Re: Broken table - master received SIGSEGV / cache lookup fail / cannot open relation

Jump to solution

Ok, will do.

I'll update this thread with more info or the resolution when I get it.

Thanks for the help.

0 Kudos
banana_hammock
1 Nickel

Re: Broken table - master received SIGSEGV / cache lookup fail / cannot open relation

Jump to solution

Resolved!

Ok so it turns out I was the problem.

My parent/child tables had differing columns. The parent had a column named user_id, which should've been called 'user_key'.

When the column was renamed, all the problems went away.

0 Kudos
Highlighted
ViadeaZhu
1 Nickel

Re: Broken table - master received SIGSEGV / cache lookup fail / cannot open relation

Jump to solution

This issue is related to wrong syntax to inherit parent table.

Per http://www.postgresql.org/docs/8.2/static/ddl-inherit.html , we don't need to specify the columns of parent table again in the definition of child table.

In below reprodute test, when defining child_table, column "id" is not needed.

The minimum reproduce is:

1.

create table parent_table(id int,name varchar);

create table child_table(id int,newname int) INHERITS ( parent_table ) distributed by (id,newname);

2. Prepare a data file:

cat a.txt

111 '111'

[gpadmin@mdw viadea]$ cat  a.txt | psql -c "COPY child_table( id, newname ) FROM stdin"

FATAL:  Unexpected internal error: Master process received signal SIGSEGV (postgres.c:3388)

DETAIL:  (interrupt holdoff count 0, critical section count 0)

Master log:

2012-03-21 09:27:30.717632 CST,"gpadmin","viadea",p30512,th-539384128,"[local]",,2012-03-21 09:27:30 CST,852167,con40367,cmd1,seg-1,,dx21623,x852167,sx1,"FATAL","XX000","Unexpected internal error: Master process received signal SIGSEGV (postgres.c:3388)","(interrupt holdoff count 0, critical section count 0)",,,,,"COPY child_table( id, newname ) FROM stdin",0,,"postgres.c",3388,"Stack trace:

1    0xa47395 postgres errstart (elog.c:466)

2    0x8ecdb8 postgres <symbol not found> (postgres.c:3382)

3    0x3b07c302d0 libc.so.6 <symbol not found> (??:0)

4    0x649871 postgres CopyFromDispatch (copy.c:2809)

5    0x65249c postgres DoCopy (copy.c:1576)

6    0x8fa964 postgres ProcessUtility (utility.c:1055)

7    0x8f8150 postgres PortalRun (pquery.c:1497)

8    0x8ed468 postgres <symbol not found> (postgres.c:1707)

9    0x8ef8b7 postgres PostgresMain (postgres.c:4649)

10   0x85a131 postgres <symbol not found> (postmaster.c:6745)

11   0x8619d5 postgres PostmasterMain (postmaster.c:2303)

12   0x76b1aa postgres main (main.c:212)

13   0x3b07c1d994 libc.so.6 __libc_start_main (??:0)

14   0x47a709 postgres <symbol not found> (??:0)

Workaround:

Change the definition of child table to not include the columns of parent table.

Good example:

drop table parent_table cascade;

create table parent_table(id int,name varchar);

create table child_table(newname int) INHERITS ( parent_table ) distributed by (id,newname);

create table child_table2(newname int) INHERITS ( parent_table ) distributed by (id);

create table child_table3(newname int) INHERITS ( parent_table ) distributed by (newname);

[gpadmin@mdw viadea]$ cat  a.txt | psql -c "COPY child_table( id, newname ) FROM stdin"

[gpadmin@mdw viadea]$ cat  a.txt | psql -c "COPY child_table2( id, newname ) FROM stdin"

[gpadmin@mdw viadea]$ cat  a.txt | psql -c "COPY child_table3( id, newname ) FROM stdin"

[gpadmin@mdw viadea]$ psql

Timing is on.

psql (8.2.15)

Type "help" for help.

viadea=# select * from child_table;

id  | name | newname

-----+------+---------

111 |      |     111

(1 row)

Time: 20.849 ms

viadea=# select * from child_table2;

id  | name | newname

-----+------+---------

111 |      |     111

(1 row)

Time: 1.802 ms

viadea=# select * from child_table3;

id  | name | newname

-----+------+---------

111 |      |     111

(1 row)

Time: 2.011 ms